An introduction to data migrations in Django

Migrations are one of the great features that come out of the box with Django. They automate the process of changing the database after the models are modified. With a few simple commands, model changes are reflected in the database. For example, say we add a new field called file_size to a model named Image. Running the makemigrations and migrate commands changes the database schema to match, without us touching the database directly.

Migrations are of two types:

  1. Schema migrations: These are the most commonly used and are often referred to simply as migrations. They change the database schema, for example by creating tables, adding new fields to tables or altering existing fields. Django generates them automatically from model changes.

  2. Data migrations: As the name suggests, these are migrations that alter data rather than the schema. Django cannot generate data migrations automatically, because what they do depends on your use case. Some examples where data migrations are useful:

    • We add a new field named full_name to our User model, which already has the fields first_name and last_name. A schema migration adds the field to the User database table, but our production database has more than a thousand users. How do we populate full_name for them? A data migration helps here: we can write one that combines first_name and last_name and saves the result to full_name.

    • Our project uses a third-party Django package which provides an option to use a custom model. Initially we do not need the custom model, but later we want to switch to it in order to add new fields. Because the package is already used in production, there is data stored in the old model, and it has to be moved to the new model. A data migration can be written to accomplish this.
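The full_name case above could be sketched roughly as follows. This is only an illustration under assumptions: the app label 'accounts' and the helper build_full_name are hypothetical, and the function relies only on apps.get_model, which is how a migration obtains the historical version of a model.

```python
def build_full_name(first_name, last_name):
    """Join the two name parts, skipping any that are empty or None."""
    parts = [part.strip() for part in (first_name, last_name) if part]
    return " ".join(part for part in parts if part)


def set_full_name(apps, schema_editor):
    # Ask the migration framework for the historical model instead of
    # importing it directly. 'accounts' is a hypothetical app label.
    User = apps.get_model("accounts", "User")
    for user in User.objects.all():
        user.full_name = build_full_name(user.first_name, user.last_name)
        user.save(update_fields=["full_name"])
```

In a migration file, set_full_name would be passed to migrations.RunPython in the operations list, in the same way as the worked example later in this post.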

Learning by example

Hopefully it is now clear what data migrations are and when to use them. So, let's write a data migration. Suppose we have a model like this in our project:

from django.db import models
from django.utils.translation import gettext_lazy as _


class Image(models.Model):
    title = models.CharField(max_length=255, verbose_name=_('title'))
    file = models.ImageField(
        verbose_name=_('file'), upload_to='uploads/',
        width_field='width', height_field='height'
    )
    width = models.IntegerField(verbose_name=_('width'), editable=False)
    height = models.IntegerField(verbose_name=_('height'), editable=False)
    created_at = models.DateTimeField(
        verbose_name=_('created at'), auto_now_add=True, db_index=True
    )

    def __str__(self):
        return self.title

The Image model has the fields title, file, width, height and created_at. The file field is an ImageField, and its value exposes a size attribute that returns the total size of the image in bytes. Instead of reading this attribute from the stored file every time we need the image size, we can save it in a new field called file_size.
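To make the cost concrete: with Django's default file-system storage, reading the size comes down to an os.path.getsize call on the stored file, which touches the disk and fails if the file is gone. The following is a plain-Python sketch without Django; the helper size_or_none and the file names are made up for illustration.

```python
import os
import tempfile


def size_or_none(path):
    """Return the file size in bytes, or None if the file cannot be read."""
    try:
        return os.path.getsize(path)
    except OSError:
        # Raised when the file is missing or inaccessible.
        return None


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "picture.jpg")  # stands in for an upload
    with open(path, "wb") as handle:
        handle.write(b"\x00" * 2048)

    existing = size_or_none(path)
    missing = size_or_none(os.path.join(tmp_dir, "gone.jpg"))

print(existing, missing)  # 2048 None
```

Storing the value in a database column means the size can be read, filtered and sorted on without going back to the file each time.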

Schema migration


The new line to add to the model above is:

file_size = models.PositiveIntegerField(null=True, editable=False)

Now Django can automatically create a schema migration to add the field to the Image table in the database.

python manage.py makemigrations

A new migration is generated with a name like 0002_image_file_size.py.

# Generated by Django 2.0.5 on 2018-05-23 08:55

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('images', '0001_initial'),
    ]

    operations = [
        migrations.AddField(
            model_name='image',
            name='file_size',
            field=models.PositiveIntegerField(editable=False, null=True),
        ),
    ]

The migration above uses a migration operation called AddField to add the new field to the Image model. We did not have to write it ourselves. It describes the change we want in the Image database table. Run the migrate command to apply the change to the database.

python manage.py migrate

Data migration


We have a new field called file_size in the Image model. Say there are more than a thousand Image objects in the database. The file_size value for all of these existing objects is NULL, so we have to update them with the file size. We will write a data migration to accomplish this.
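To see why the existing rows end up with NULL, here is a small stand-alone sketch using Python's sqlite3 module rather than Django. The table name images_image follows Django's default naming, the sample titles are made up, and the ALTER TABLE statement is only an approximation: the SQL Django actually runs depends on the database backend.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# 'images_image' follows Django's default <app_label>_<model> table naming.
connection.execute(
    "CREATE TABLE images_image (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
)
connection.executemany(
    "INSERT INTO images_image (title) VALUES (?)",
    [("sunset",), ("harbour",), ("forest",)],
)

# Roughly what adding a nullable column amounts to.
connection.execute("ALTER TABLE images_image ADD COLUMN file_size INTEGER")

# Rows that existed before the column was added have no value for it.
missing = connection.execute(
    "SELECT COUNT(*) FROM images_image WHERE file_size IS NULL"
).fetchone()[0]
print(missing)  # 3
connection.close()
```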

Create an empty migration file:

python manage.py makemigrations --empty images

Here images is the name of the Django app which contains the Image model.

The data migration that sets the file_size field to the image size is:

from django.db import migrations


def set_file_size(apps, schema_editor):
    # We can't import the Image model directly as it may be a newer
    # version than this migration expects. We use the historical version.
    Image = apps.get_model('images', 'Image')
    for image in Image.objects.filter(file_size__isnull=True):
        try:
            image.file_size = image.file.size
            image.save(update_fields=['file_size'])
        except OSError:
            # The file doesn't exist. Internally, size uses os.path.getsize,
            # which raises OSError if the file does not exist.
            # https://docs.python.org/3/library/os.path.html#os.path.getsize
            pass


def do_nothing(apps, schema_editor):
    # The reverse function is not required here as we do not
    # want to set the file size to Null again.
    pass


class Migration(migrations.Migration):

    dependencies = [
        ('images', '0002_image_file_size'),
    ]

    operations = [
        migrations.RunPython(set_file_size, do_nothing),
    ]

Run the data migration in the same way as the schema migration.

python manage.py migrate

  • A schema migration has operations such as AddField, which adds a field to the database table. Similarly, a data migration uses the RunPython operation to run custom Python code. We pass it two functions, set_file_size and do_nothing. The first is the forward function, which is called when we run the migrate command. The second is the reverse function, which is called when the migration is rolled back; without one, the migration cannot be reversed.

  • The set_file_size function fetches all the Image objects whose file_size is NULL, sets the field to the size of the file and saves each object to the database.

  • The reverse function do_nothing is empty because we do not want to set file_size back to NULL. It still has to exist so that the migration can be rolled back.

    We can roll back the data migration with:

    python manage.py migrate images 0002_image_file_size
    

    0002_image_file_size is the schema migration we created before the data migration; migrating to it unapplies everything after it. We can use the showmigrations command to see applied and unapplied migrations.

    python manage.py showmigrations images
    
    images
     [X] 0001_initial
     [X] 0002_image_file_size
     [ ] 0003_auto_20180523_0907
    

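    We can apply the migration again using the migrate command. We can also rename the data migration from 0003_auto_20180523_0907 to something meaningful, like 0003_data_migration_image_set_file_size. It is best to rename it before the migration has been applied anywhere else, since Django records applied migrations by name.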

  • Data migrations should not be used to load test data, because migrations run against the production database as well and we do not want test data there. Fixtures can be used to load test data instead.
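As a rough idea of what a fixture looks like, the sketch below builds one in Django's JSON fixture layout (a list of objects with "model", "pk" and "fields") for the Image model. The row values are invented sample data, not taken from a real project.

```python
import json

# Django's JSON fixture layout: one object per row with "model", "pk"
# and "fields". The row itself is made-up sample data.
fixture = [
    {
        "model": "images.image",
        "pk": 1,
        "fields": {
            "title": "Sample sunset",
            "file": "uploads/sunset.jpg",
            "width": 640,
            "height": 480,
            "file_size": 2048,
            "created_at": "2018-05-23T09:00:00Z",
        },
    },
]

fixture_json = json.dumps(fixture, indent=2)
print(fixture_json)
```

Saved under a hypothetical path such as images/fixtures/sample_images.json, the file could then be loaded into a development database with python manage.py loaddata sample_images, leaving production untouched.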

Hope the example is useful. In some cases we have to write the reverse function as well; it simply undoes the work performed by the forward function. Another example, which migrates data from one model to another, is in the blog post "Switch to Custom image model in a Production Wagtail project".
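For comparison, here is a minimal sketch of what a real reverse function could look like for the file_size example, in the hypothetical case where we did want a rollback to clear the values. The name unset_file_size is made up; it uses only apps.get_model and the manager's update method.

```python
def unset_file_size(apps, schema_editor):
    """Reverse step: clear file_size again, undoing the forward function."""
    # Use the historical model, exactly as the forward function does.
    Image = apps.get_model("images", "Image")
    # A single UPDATE query; no need to load and save each object.
    Image.objects.update(file_size=None)
```

It would replace do_nothing as the second argument, i.e. migrations.RunPython(set_file_size, unset_file_size), so that rolling back returns the column to its state before the data migration ran.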

Thanks for reading.

Update:

We can also use migrations.RunPython.noop, as mentioned in the official docs, instead of the do_nothing function above. Thanks to Blair Gemmer for suggesting it in the comments.

By @Parbhat Puri in
Tags : #django, #migrations, #postgres, #opensource,
