Background
In the previous post, I described how the "Pagination" concept changed the way I implement data migrations as commands. In this post, I want to introduce a second concept that has also helped me write high-quality data migration commands: the Dry Run.
A Dry Run simulates the execution of a program without actually applying its effects to the intended target or environment. Next, we will use some examples to illustrate how to implement a Dry Run in a Django command.
Before
Okay, we'll use the same example as in the previous post. We fetch all the subscriptions from the subscription model and want to copy part of each subscription's info onto its task. After selecting all the objects, we build an updated task instance for each one in a for loop, and finally use a bulk update to write all the data at once.
from django.core.management.base import BaseCommand

from model import TaskSubscriptionModel, CarrierTaskModel
from managers import TaskManager


class Command(BaseCommand):  # pragma: no cover
    def handle(self, *args, **options):
        # Load every subscription up front (no pagination yet).
        subscriptions = TaskSubscriptionModel.objects.all()
        tasks_to_update = []
        for subscription in subscriptions:
            # Build an in-memory task that carries the subscription fields.
            task_id = subscription.task.id
            task = CarrierTaskModel(
                id=task_id,
                subscriber=subscription.subscriber,
                lookup_id=subscription.lookup_id,
                tags=subscription.tags,
                expire_time=subscription.expire_time,
            )
            tasks_to_update.append(task)
        # Write all the collected tasks in a single bulk update.
        TaskManager().bulk_update(
            tasks_to_update,
            fields=["subscriber", "lookup_id", "tags", "expire_time"],
        )
What other issue does this example have, apart from the one we discussed in the previous post? We don't know how much data will be updated until the command has run, and by then it may be too late. When performing potentially risky database operations, is there a way to make sure everything goes smoothly first? Can we get a preview of how much data will be updated and, if it looks right, then proceed with the update?
After
That's the value of the Dry Run. Now, let's see how it works.
import argparse
from contextlib import contextmanager

from django.core.management.base import BaseCommand
from django.db.transaction import atomic

from model import TaskSubscriptionModel, CarrierTaskModel
from managers import TaskManager


class DoRollback(Exception):
    """Raised on purpose to make atomic() roll the transaction back."""


@contextmanager
def rollback_atomic():
    # Run the body inside a transaction, then always abort it:
    # the exception leaves atomic(), which triggers the rollback,
    # and is swallowed here so the command exits normally.
    try:
        with atomic():
            yield
            raise DoRollback()
    except DoRollback:
        pass


class Command(BaseCommand):
    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument(
            "--dry-run",
            dest="dry_run",
            action="store_true",
            default=False,
            help="Simulate the update and roll back every change instead of committing it",
        )

    def handle(self, *args, **options):
        dry_run = options["dry_run"]
        prefix = "In the dry run mode, " if dry_run else ""
        # Dry run: transaction that always rolls back. Otherwise: a normal one.
        atomic_context = rollback_atomic() if dry_run else atomic()
        with atomic_context:
            subscriptions = TaskSubscriptionModel.objects.all()
            tasks_to_update = []
            for subscription in subscriptions:
                task_id = subscription.task.id
                task = CarrierTaskModel(
                    id=task_id,
                    subscriber=subscription.subscriber,
                    lookup_id=subscription.lookup_id,
                    tags=subscription.tags,
                    expire_time=subscription.expire_time,
                )
                tasks_to_update.append(task)
            # Report the volume before touching the database.
            print(f"{prefix}number of tasks to update: {len(tasks_to_update)}")
            TaskManager().bulk_update(
                tasks_to_update,
                fields=["subscriber", "lookup_id", "tags", "expire_time"],
            )
What's the key element of the Dry Run process? The answer is "Rollback". We can combine this database feature with a context manager: in dry-run mode, the work runs inside a transaction that is deliberately aborted at the end, so nothing is persisted. For the details, please refer to the linked article in the reference section; I won't go into them here.
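To see the rollback idea in isolation, here is a minimal sketch that runs without Django. It assumes Python's built-in sqlite3 module in place of Django's atomic(), and the task table, the transaction() helper, and the migrate() function are all made up for illustration; they are not part of the command above.

```python
import sqlite3
from contextlib import contextmanager


class DoRollback(Exception):
    """Raised on purpose to force the transaction to roll back."""


@contextmanager
def transaction(conn, dry_run):
    """Commit on success; in dry-run mode, roll back even on success."""
    try:
        yield conn
        if dry_run:
            raise DoRollback()
        conn.commit()
    except DoRollback:
        # Expected in dry-run mode: discard the changes and carry on.
        conn.rollback()
    except Exception:
        # A real failure: discard the changes and let the error surface.
        conn.rollback()
        raise


def migrate(conn, dry_run):
    """Hypothetical migration: tag every untagged task, return the row count."""
    with transaction(conn, dry_run):
        cursor = conn.execute("UPDATE task SET tags = 'migrated' WHERE tags = ''")
        return cursor.rowcount


def pending(conn):
    """Count the tasks that still need the migration."""
    return conn.execute("SELECT COUNT(*) FROM task WHERE tags = ''").fetchone()[0]


# Illustrative data: two untagged tasks and one that is already tagged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, tags TEXT)")
conn.executemany("INSERT INTO task (tags) VALUES (?)", [("",), ("",), ("kept",)])
conn.commit()

previewed = migrate(conn, dry_run=True)   # reports the volume, persists nothing
after_dry = pending(conn)
applied = migrate(conn, dry_run=False)    # same code path, committed this time
after_real = pending(conn)

print("dry run would update:", previewed, "| still pending:", after_dry)
print("real run updated:", applied, "| still pending:", after_real)
```

The point of the design is that the dry run and the real run execute exactly the same statements; the only difference is whether the transaction ends in a commit or a rollback.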
This approach lets us use the "--dry-run" option to control whether the command actually commits its database changes or not.
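Django builds its command parser on top of argparse, so the flag can be tried with the standard library alone. The sketch below is only an illustration: the migrate_tasks program name is hypothetical, and the plain ArgumentParser stands in for the parser Django passes to add_arguments.

```python
import argparse


def build_parser():
    # "migrate_tasks" is a made-up command name for this example.
    parser = argparse.ArgumentParser(prog="migrate_tasks")
    parser.add_argument(
        "--dry-run",
        dest="dry_run",
        action="store_true",  # flag present -> True, flag absent -> False
        help="Simulate the update and roll back every change",
    )
    return parser


with_flag = vars(build_parser().parse_args(["--dry-run"]))
without_flag = vars(build_parser().parse_args([]))

print(with_flag)
print(without_flag)
```

With a real command, assuming it were saved as migrate_tasks.py, this would correspond to running python manage.py migrate_tasks --dry-run for the preview and the same command without the flag for the actual update.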
Summary
In this post, we discussed what a Dry Run is and how to use it to improve the quality of data migration commands. By combining Pagination (as discussed in the previous post) with Dry Run, you can create commands that are both efficient and safer to run. Enjoy the process!