Custom Migrations

The custom migrations system tracks data migrations that run outside Django's standard migration framework, ensuring they execute exactly once across all environments.

Usage

Management Command

Create a command that inherits from IdempotentCommand:

from apps.data_migrations.management.commands.base import IdempotentCommand

class Command(IdempotentCommand):
    help = "Migrate user data to new format"
    migration_name = "migrate_user_data_v1_2024_11_21"

    def perform_migration(self, dry_run=False):
        users = User.objects.filter(needs_migration=True)

        if dry_run:
            self.stdout.write(f"Would update {users.count()} users")
            return

        updated = users.update(migrated=True)
        return updated

Optional fields:

atomic: Set to False to disable atomic migration.
disable_audit: Set to True to disable model auditing for this migration.

Run with:

python manage.py my_migration              # Execute
python manage.py my_migration --dry-run    # Preview
python manage.py my_migration --force      # Re-run

Django Migration

Use RunDataMigration to run your management command within a Django migration:

from django.db import migrations
from apps.data_migrations.utils.migrations import RunDataMigration

class Migration(migrations.Migration):
    dependencies = [("myapp", "0001_initial")]
    operations = [
        RunDataMigration("my_migration"),
    ]

This automatically handles idempotency.

Managing Migrations

Use the custom_migrations command to view and manage applied migrations:

python manage.py custom_migrations list                 # List all
python manage.py custom_migrations list --name user     # Filter by name
python manage.py custom_migrations mark <name>          # Mark as applied
python manage.py custom_migrations unmark <name>        # Remove record

Naming Convention

Use descriptive names with dates: {description}_{version}_{YYYY_MM_DD}

Examples: - migrate_user_data_v1_2024_11_21 - backfill_team_settings_2024_12_01

Two / Three-Phase Deployment Workflow

When adding new fields that require data backfilling, use a two or three-phase deployment to safely migrate data in production:

Phase 1: Add Field and Initial Migration

Goal: Add the new field and backfill existing data.

Create the data model changes:

# models.py
class User(models.Model):
    name = models.CharField(max_length=100)
    normalized_name = models.CharField(max_length=100, blank=True)  # New field

Create a Django schema migration:
```
python manage.py makemigrations
```

Create the data migration command:

# management/commands/backfill_normalized_names.py
from apps.data_migrations.management.commands.base import IdempotentCommand
from apps.users.models import User

class Command(IdempotentCommand):
    help = "Backfill normalized names for existing users"
    migration_name = "backfill_normalized_names_2024_12_15"

    def perform_migration(self, dry_run=False):
        users = User.objects.filter(normalized_name="")

        if dry_run:
            self.stdout.write(f"Would update {users.count()} users")
            return

        updated = 0
        for user in users:
            user.normalized_name = user.name.lower()
            user.save()
            updated += 1

        return updated

Deploy and run:
Deploy the PR with model and data migration
Run manually in production: python manage.py backfill_normalized_names
Verify the data was migrated correctly

Phase 2: Add Django Migration Top-Up

Goal: Automatically migrate any new records created after Phase 1.

Keep the field as optional (no model changes needed):

# models.py - unchanged from Phase 1
class User(models.Model):
    name = models.CharField(max_length=100)
    normalized_name = models.CharField(max_length=100, blank=True)  # Still optional

Create a Django migration with the data migration:

# migrations/0XXX_backfill_normalized_names_topup.py
from django.db import migrations
from apps.data_migrations.utils.migrations import RunDataMigration

class Migration(migrations.Migration):
    dependencies = [("users", "0XXX_previous_migration")]

    operations = [
        RunDataMigration("backfill_normalized_names", command_options={"force": True}),
    ]

Deploy:
The migration runs automatically during deployment
The data migration command processes any records created between Phase 1 and Phase 2
No constraint changes, so no risk of deploy failures

Phase 3: Make Field Required (Optional)

Goal: Optionally enforce the field constraint after all data is migrated.

Note: This phase is only needed if you want to make the field required. If the field can remain optional, you can stop after Phase 2.

Update the model to make the field required:

# models.py
class User(models.Model):
    name = models.CharField(max_length=100)
    normalized_name = models.CharField(max_length=100)  # Remove blank=True

Create a schema migration:
```
python manage.py makemigrations
```

This generates:

# migrations/0XXX_alter_user_normalized_name.py
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [("users", "0XXX_previous_migration")]

    operations = [
        migrations.AlterField(
            model_name="user",
            name="normalized_name",
            field=models.CharField(max_length=100),  # No longer blank=True
        ),
    ]

Deploy:
The constraint is applied to the field
All data should already be migrated from Phase 2

Why Three Phases?

Non-Blocking Deploys: Long-running data migrations can block deployments. Running manually in Phase 1 keeps deploys fast and allows you to monitor progress separately.

Deploy Safety: Phase 2 keeps the field optional during the automatic top-up migration. This prevents deployment failures from constraint violations if any unmigrated records exist.

Constraint Isolation: Phase 3 (optional) separates the constraint change from the data migration. If you need to make the field required, you can do so safely after confirming all data is migrated. If the field can remain optional, Phase 3 isn't necessary.

Performance: Run the initial backfill manually with monitoring. Large datasets can be processed in batches or during low-traffic periods.

Flexibility: Test the migration in production with the field as optional. If issues arise in Phase 2, you can fix data before optionally enforcing the constraint in Phase 3.

Zero Downtime: Application code continues working with the optional field through Phases 1 and 2. Phase 3 (if needed) only proceeds after verifying all data is migrated.

Alternative: Single-Phase for Simple Cases

For small datasets or non-critical fields, you can combine all three phases:

# migrations/0XXX_add_normalized_name.py
from django.db import migrations
from apps.data_migrations.utils.migrations import RunDataMigration

class Migration(migrations.Migration):
    dependencies = [("users", "0XXX_previous_migration")]

    operations = [
        migrations.AddField(
            model_name="user",
            name="normalized_name",
            field=models.CharField(max_length=100, blank=True),
        ),
        RunDataMigration("backfill_normalized_names"),
        migrations.AlterField(
            model_name="user",
            name="normalized_name",
            field=models.CharField(max_length=100),  # Now required
        ),
    ]

Use single-phase only when: - Dataset is small (< 10,000 records) - Migration is fast (< 30 seconds) - Field is non-critical - You have tested thoroughly in staging