Database
We use SQLAlchemy (and its related migration tool Alembic) to manage our PostgreSQL databases.
Commonly used commands
Upgrading local and/or production db
just upgrade-db and just upgrade-prod-db are used to run the migrations present in your local /alembic/versions directory. Alembic tracks revisions by maintaining a linked list (with branching), and running one of these commands simply runs all the migrations between the targeted database's notion of head and the head present in the repo. Some other useful commands in this regard are poetry run alembic history and poetry run alembic heads. You can downgrade by running poetry run alembic downgrade -<number of steps>, e.g. downgrade -1. More can be found in the Alembic docs. Any command can be run against prod by adding the --prod flag... be careful.
Restoring your local database from a dump
just dump-and-restore-to-prod does what it says on the tin: it dumps the production database to your local machine, then drops your local schema, recreates it from scratch using the migrations present in /alembic/versions, and repopulates it from the dump. See also railbird restore-to-dump for a more configurable version of the same.
Generating a revision
just revision <message> assumes you're up to date with your Alembic revisions and have changes present in your local models that you want reflected in a new migration. Running this command creates a new revision with a name of the form <revision hash>_<message from the just revision command, snake cased>, e.g. just revision "foo bar barfoo" produces <hash>_foo_bar_barfoo.py.
Running a backfill
railbird database run-script <path to your backfill> will run the backfill of your choice against your local database. It can be run against prod by adding the --prod flag... be careful.
A Quick Guide to Adding or Extending Models
The meat and potatoes of SQLAlchemy are its models; a common task is adding a new model or extending an existing one.
Make your model (or make your changes)
Make the necessary modifications in railbird/datatypes/models. For our example here we'll add a completely new model, although the method is identical for minor changes like altering a field or adding fields to an existing model. We'll begin by adding a new file railbird/datatypes/models/<my_model>.py.
import sqlalchemy as sa
from sqlalchemy.orm import Mapped, mapped_column, relationship

from .base import Base
from .user import UserModel


class MyModel(Base):
    __tablename__ = "my_model"

    id = mapped_column(sa.BIGINT, primary_key=True)
    model_prop = mapped_column(
        sa.VARCHAR(length=128),
        nullable=True,
        unique=True,
    )
    user_id: Mapped[int] = mapped_column(
        sa.BIGINT,
        sa.ForeignKey("user.id", onupdate="CASCADE", ondelete="CASCADE"),
        nullable=True,
    )
    user: Mapped[UserModel] = relationship("UserModel", back_populates="my_model")
For good measure we would add the following relationship to the mapped UserModel:
from typing import List


class UserModel(Base):
    __tablename__ = "user"

    id = mapped_column(sa.BIGINT, primary_key=True)
    # ... the rest of the model remains unchanged, with the exception of:
    my_model: Mapped[List["MyModel"]] = relationship(
        "MyModel", back_populates="user"
    )
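As a quick sanity check (this snippet is illustrative, not code from the repo), back_populates keeps the two sides of the relationship in sync in Python even before anything is flushed to the database:

# Illustrative only: uses the MyModel and UserModel classes defined above.
user = UserModel(id=1)
thing = MyModel(id=10, model_prop="example")
thing.user = user  # assigning the many-to-one side...
assert thing in user.my_model  # ...also populates the list on the user side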
Create your migration
Now run just revision <message>. That will produce a migration like the following:
"""add my model
Revision ID: e68ab90a5f34
Revises: 3ba15c725934
Create Date: 2024-08-20 15:13:19.620015
"""
import sqlalchemy as sa
from alembic import op
revision = "e68ab90a5f34"
down_revision = "3ba15c725934"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"my_model",
sa.Column("id", sa.BIGINT(), nullable=False),
sa.Column("model_prop", sa.VARCHAR(length=128), nullable=True),
sa.Column("user_id", sa.BIGINT(), nullable=True),
sa.ForeignKeyConstraint(
["user_id"], ["railbird.user.id"], onupdate="CASCADE", ondelete="CASCADE"
),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("model_prop"),
schema="railbird",
)
def downgrade() -> None:
op.drop_table("my_model", schema="railbird")
Read over the generated migration carefully and make sure it mirrors your model. Be sure to run isort, flake8, and black on this file, as the generated migration does not conform to Railbird's style rules and will otherwise fail CI. Some caveats here:
- Alembic doesn't play well with adding or extending enums. This Stack Overflow post provides a pretty good primer for digging yourself out, and you can pattern match existing enums (e.g. ProcessingStatusEnum in models/video); see also the sketch after this list.
- As with any DB migration, the expectation should be that the downward migration exactly undoes the up. Sometimes Alembic has difficulty with this (specifically with enums). Just do the best you can there.
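For reference, here is a minimal sketch of what hand-editing an enum migration tends to look like; the type name processing_status and the value 'archived' are made-up placeholders, not names from the codebase.

from alembic import op


def upgrade() -> None:
    # Postgres appends the new value to the existing type; note that older
    # Postgres versions refuse to do this inside a transaction block.
    op.execute("ALTER TYPE processing_status ADD VALUE IF NOT EXISTS 'archived'")


def downgrade() -> None:
    # Postgres cannot drop a value from an enum type, so the honest downgrade
    # is usually a no-op (or a full drop-and-recreate of the type).
    pass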
Test and deploy
If you alter an existing model that already has a mock in tests/conftest.py, be sure to extend the mock so that your field is handled.
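For example, a hypothetical fixture (the fixture name and fields here are assumptions, not the real contents of conftest.py) would need the new field populated:

import pytest

from railbird.datatypes import models


@pytest.fixture
def my_model_mock() -> models.MyModel:
    return models.MyModel(
        id=1,
        user_id=1,
        model_prop="mock-value",  # newly added field handled here
    )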
You can test your code locally by running just upgrade-db. This will bring your db up to date with your current migrations. If for whatever reason you need to downgrade, you can run poetry run alembic downgrade -1 to undo your most recent migration.
(Optional) Backfill
Let's assume you also want to run a backfill; you have some change you want to retroactively apply to existing rows. For example, let's write a backfill that writes MyModel's id as a string to MyModel.model_prop. We'll make a new file backfill/<todays_date>_<backfill_name>, so: backfill/09-20-24_my_model_populate_model_prop.py.
import structlog
from dependency_injector.wiring import Provide, inject
from sqlalchemy import select, update
from sqlalchemy.orm import Session

from railbird.containers.main import RBDeps
from railbird.datatypes import models

logger = structlog.get_logger(__name__)


@inject
def main(session: Session = Provide[RBDeps.sync_session]):
    my_models = list(
        session.execute(
            select(models.MyModel).where(
                models.MyModel.model_prop.is_(None)
            )
        )
        .scalars()
        .unique()
    )
    for my_model in my_models:
        update_stmt = (
            update(models.MyModel)
            .where(models.MyModel.id == my_model.id)
            .values(model_prop=str(my_model.id))
        )
        session.execute(update_stmt)
    session.commit()


if __name__ == "__main__":
    main()
Depending on the amount of data involved, these backfills can take a while, so be prepared for that. We can run this backfill locally by running railbird database run-script <path to your directory>/backfill/09-20-24_my_model_populate_model_prop.py and remotely by running railbird database --prod run-script <your backfill>.
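If a backfill like the one above turns out to be slow, one option (a sketch of an alternative, not an established pattern in the repo) is to push the work into a single bulk UPDATE instead of looping row by row:

from sqlalchemy import String, cast, update

from railbird.datatypes import models

# Assumes the same injected `session` as in the script above; one server-side
# UPDATE sets model_prop to the row's id cast to a string.
stmt = (
    update(models.MyModel)
    .where(models.MyModel.model_prop.is_(None))
    .values(model_prop=cast(models.MyModel.id, String))
)
session.execute(stmt)
session.commit()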