Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IFRS: prototype migration framework and object_size migration (GSI-1271) #85

Merged
merged 19 commits into from
Jan 22, 2025

Conversation

TheByronHimes
Copy link
Member

@TheByronHimes TheByronHimes commented Jan 20, 2025

Adds a very rough version of the migration framework which aims to minimize the amount of related code required to be maintained across services without being too restricting/prescriptive.

Step 1: The logic for every migration should subclass MigrationDefinition and, usually, Reversible as well:

class V2Migration(MigrationDefinition, Reversible):
    """Adds `object_size` and migrates the database to version 2"""

    version = 2

    async def apply(self):
        async with self.auto_finalize("file_metadata"):
            await self.migrate_docs_in_collection(
                coll_name=METADATA_COLLECTION,
                change_function=my_change_function,
                validation_model=FileMetadata,
                id_field="file_id",
            )

    async def unapply(self):
        ...

Step 2: Define the current database version and the migration map.

In this prototype, version 0 = "DB versioning is not set up" and version 1 = "DB versioning has been created", and our migrations begin with version 2. Does not have to be that way, that's just what I landed on for now.

from my_migrations import V2Migration, V3Migration, V4Migration  # etc.

MIGRATION_MAP = {2: V2Migration, 3: V3Migration, 4: V4Migration}
DB_VERSION = 4

Step 3: Instantiate the MigrationManager via async with, then call .migrate_or_wait():

MIGRATION_MAP = {2: V2Migration, 3: V3Migration, 4: V4Migration}
DB_VERSION = 4

async def run_db_migrations(*, config: Config, target_version: int = DB_VERSION):
    """Check if the database is up to date and attempt to migrate the data if not."""
    async with MigrationManager(
        config=config,
        target_version=target_version,
        migration_map=MIGRATION_MAP,
    ) as mm:
        await mm.migrate_or_wait()

Step 4: Perform the DB migrations/version check sometime before firing up the actual service code:

async def consume_events(run_forever: bool = True):
    """Run an event consumer listening to the specified topics."""
    config = Config()
    configure_logging(config=config)
    await run_db_migrations(config=config)

    async with prepare_outbox_subscriber(config=config) as outbox_subscriber:
        await outbox_subscriber.run(forever=run_forever)

Notes

The MigrationDefinition class defines some useful logic for common operations, such as "staging" migrated data (where we rename the collections to swap the new data with the old), dropping new collections, generating temp table prefixing, applying an update function to each doc in a collection with validation, and a context manager to automate staging/dropping/error case cleanup.

The MigrationManager handles higher-level responsibilities and is ignorant of migration implementation details. It cares about whether or not the database is set up for migrations, determining which migrations it needs to run to get from version X to version Y, and running said migrations. It handles the database "lock" document to ensure only one service instance ever performs migrations, as well as maintaining a record past migrations.

Solving the catch-22 of making sure only one instance initializes the lock & versioning collections is straightforward: the code inserts a lock document with a fixed ID. If the result is a DuplicateKeyError error, stop and enter the waiting cycle (another instance was faster), otherwise continue with the init & migration logic. The service instance that performs the initialization will already have the lock acquired after finishing setup and can roll right into executing any migrations.

Currently one test case added that covers the following:

  • Migration success
  • Migration unapply success
  • Migration unapply failure due to model mismatch
  • Migration idempotence (should be able to run migration check a bunch with no effect if already up to date)

Outstanding items planned or in progress for official framework:

  • batch processing (naive one-by-one doc updates right now)
  • index copying (need to understand this better)
  • (maybe) automatically pass config to instances of MigrationDefinition - only an inconvenience for testing though
  • Testing to cover all the edge cases along with unit testing to better catch problems with helper functions
  • Better metrics (e.g. # documents affected)
  • Logging/error handling is currently unpolished, needs improvement
  • Limits or something for retries? Not sure
  • Reversible subclass thing could probably be done differently
  • Better documentation
  • Whatever ideas and critiques come up in the meantime

@TheByronHimes TheByronHimes marked this pull request as ready for review January 20, 2025 13:25
@TheByronHimes TheByronHimes requested a review from Cito January 20, 2025 13:27
Copy link
Member

@Cito Cito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work and very readable description (this can later go into the documentation).

Made some suggestions to consider in this PR or later.

services/ifrs/src/ifrs/migration_logic/_manager.py Outdated Show resolved Hide resolved
services/ifrs/src/ifrs/migration_logic/_manager.py Outdated Show resolved Hide resolved
services/ifrs/src/ifrs/migration_logic/_manager.py Outdated Show resolved Hide resolved
services/ifrs/src/ifrs/migration_logic/_manager.py Outdated Show resolved Hide resolved
services/ifrs/src/ifrs/migration_logic/_manager.py Outdated Show resolved Hide resolved
services/ifrs/src/ifrs/migration_logic/_manager.py Outdated Show resolved Hide resolved
services/ifrs/src/ifrs/migration_logic/_manager.py Outdated Show resolved Hide resolved
services/ifrs/tests_ifrs/test_migrations.py Outdated Show resolved Hide resolved
services/ifrs/tests_ifrs/test_migrations.py Outdated Show resolved Hide resolved
services/ifrs/tests_ifrs/test_migrations.py Outdated Show resolved Hide resolved
services/ifrs/tests_ifrs/test_migrations.py Outdated Show resolved Hide resolved
services/ifrs/tests_ifrs/test_migrations.py Outdated Show resolved Hide resolved
@TheByronHimes TheByronHimes merged commit b70a914 into main Jan 22, 2025
13 checks passed
@TheByronHimes TheByronHimes deleted the feature/ifrs_db_versioning_GSI-1271 branch January 22, 2025 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants