Database
We use SQLAlchemy (and its related migration tool Alembic) to manage our PostgreSQL databases.
Commonly used commands
Upgrading local and/or production db
just upgrade-db and just upgrade-prod-db are used to run the migrations present in your local /alembic/versions directory. Alembic tracks revisions by maintaining a linked list (with branching), and running one of these commands simply runs all the migrations between the targeted database's notion of head and the head present in the repo. Some other useful commands in this regard are poetry run alembic history and poetry run alembic heads. You can downgrade by running poetry run alembic downgrade -<number of steps>, e.g. downgrade -1. More can be found in the Alembic docs. Any command can be run against prod by adding the --prod flag... be careful.
Restoring your local database from a dump
just dump-and-restore-to-prod does what it says on the tin: it dumps the production database to your local machine, then drops your local schema, recreates it from scratch using the migrations present in /alembic/versions, and repopulates it from the dump. See also railbird restore-to-dump for a more configurable version of the same.
Generating a revision
just revision <message> assumes you're up to date with your Alembic revisions and have changes present in your local models that you want reflected in a new migration. Running this command creates a new revision with a name of the form <revision hash>_<message from the just revision command, snake cased>, e.g. just revision "foo bar barfoo" produces <hash>_foo_bar_barfoo.py.
Running a backfill
railbird database run-script <path to your backfill> will run the backfill of your choice against your local database. It can be run against prod by adding the --prod flag... be careful.
A Quick Guide to Adding or Extending Models
The meat and potatoes of SQLAlchemy are its models; a common task is adding a new model or extending an existing one.
Make your model (or make your changes)
Make the necessary modifications in railbird/datatypes/models. For our example here we'll add a completely new model, although the method is identical for minor changes like altering a field or adding fields to an existing model. We'll begin by adding a new file railbird/datatypes/models/<my_model>.py.
import sqlalchemy as sa
from sqlalchemy.orm import Mapped, mapped_column, relationship

from .base import Base
from .user import UserModel


class MyModel(Base):
    __tablename__ = "my_model"

    id = mapped_column(sa.BIGINT, primary_key=True)
    model_prop = mapped_column(
        sa.VARCHAR(length=128),
        nullable=True,
        unique=True,
    )
    user_id: Mapped[int] = mapped_column(
        sa.BIGINT,
        sa.ForeignKey("user.id", onupdate="CASCADE", ondelete="CASCADE"),
        nullable=True,
    )
    user: Mapped[UserModel] = relationship("UserModel", back_populates="my_model")
For good measure we would add the following relationship to the mapped UserModel:
from typing import List


class UserModel(Base):
    __tablename__ = "user"

    id = mapped_column(sa.BIGINT, primary_key=True)
    # ... the rest of the model remains unchanged, with the exception of:
    my_model: Mapped[List["MyModel"]] = relationship(
        "MyModel", back_populates="user"
    )
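As a quick sanity check (this snippet is illustrative, not code from the repo), back_populates keeps the two sides of the relationship in sync in Python even before anything is flushed to the database:

# Illustrative only: uses the MyModel and UserModel classes defined above.
user = UserModel(id=1)
thing = MyModel(id=10, model_prop="example")
thing.user = user  # assigning the many-to-one side...
assert thing in user.my_model  # ...also populates the list on the user side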
Create your migration
Now run just revision <message>. That will produce a migration like the following:
"""add my model
Revision ID: e68ab90a5f34
Revises: 3ba15c725934
Create Date: 2024-08-20 15:13:19.620015
"""
import sqlalchemy as sa
from alembic import op
revision = "e68ab90a5f34"
down_revision = "3ba15c725934"
branch_labels = None
depends_on = None
def upgrade() -> None:
op.create_table(
"my_model",
sa.Column("id", sa.BIGINT(), nullable=False),
sa.Column("model_prop", sa.VARCHAR(length=128), nullable=True),
sa.Column("user_id", sa.BIGINT(), nullable=True),
sa.ForeignKeyConstraint(
["user_id"], ["railbird.user.id"], onupdate="CASCADE", ondelete="CASCADE"
),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("model_prop"),
schema="railbird",
)
def downgrade() -> None:
op.drop_table("my_model", schema="railbird")
Read over the generated migration carefully and make sure it mirrors your model. Be sure to run isort, flake8, and black on this file, as the generated migration does not conform to Railbird's style rules and will otherwise fail CI. Some caveats here:
- Alembic doesn't play well with adding or extending enums. This Stack Overflow post provides a pretty good primer for digging yourself out, and you can pattern match existing enums (e.g. ProcessingStatusEnum in models/video); see also the sketch after this list.
- As with any DB migration, the expectation should be that the downward migration exactly undoes the up. Sometimes Alembic has difficulty with this (specifically with enums). Just do the best you can there.
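For reference, here is a minimal sketch of what hand-editing an enum migration tends to look like; the type name processing_status and the value 'archived' are made-up placeholders, not names from the codebase.

from alembic import op


def upgrade() -> None:
    # Postgres appends the new value to the existing type; note that older
    # Postgres versions refuse to do this inside a transaction block.
    op.execute("ALTER TYPE processing_status ADD VALUE IF NOT EXISTS 'archived'")


def downgrade() -> None:
    # Postgres cannot drop a value from an enum type, so the honest downgrade
    # is usually a no-op (or a full drop-and-recreate of the type).
    pass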
Test and deploy
If you alter an existing model that already has a mock in tests/conftest.py, be sure to extend the mock so that your field is handled.
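For example, a hypothetical fixture (the fixture name and fields here are assumptions, not the real contents of conftest.py) would need the new field populated:

import pytest

from railbird.datatypes import models


@pytest.fixture
def my_model_mock() -> models.MyModel:
    return models.MyModel(
        id=1,
        user_id=1,
        model_prop="mock-value",  # newly added field handled here
    )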
You can test your code locally by running just upgrade-db. This will bring your db up to date with your current migrations. If for whatever reason you need to downgrade, you can run poetry run alembic downgrade -1 to undo your most recent migration.
(Optional) Backfill
Let's assume you also want to run a backfill; you have some change you want to retroactively apply to existing rows. For example, let's write a backfill that writes MyModel's id as a string to MyModel.model_prop. We'll make a new file backfill/<todays_date>_<backfill_name>, so: backfill/09-20-24_my_model_populate_model_prop.py.
import structlog
from dependency_injector.wiring import Provide, inject
from sqlalchemy import select, update
from sqlalchemy.orm import Session

from railbird.containers.main import RBDeps
from railbird.datatypes import models

logger = structlog.get_logger(__name__)


@inject
def main(session: Session = Provide[RBDeps.sync_session]):
    my_models = list(
        session.execute(
            select(models.MyModel).where(
                models.MyModel.model_prop.is_(None)
            )
        )
        .scalars()
        .unique()
    )
    for my_model in my_models:
        update_stmt = (
            update(models.MyModel)
            .where(models.MyModel.id == my_model.id)
            .values(model_prop=str(my_model.id))
        )
        session.execute(update_stmt)
    session.commit()


if __name__ == "__main__":
    main()
Depending on the amount of data involved, these backfills can take a while, so be prepared for that. We can run this backfill locally by running railbird database run-script <path to your directory>/backfill/09-20-24_my_model_populate_model_prop.py and remotely by running railbird database --prod run-script <your backfill>.
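If a backfill like the one above turns out to be slow, one option (a sketch of an alternative, not an established pattern in the repo) is to push the work into a single bulk UPDATE instead of looping row by row:

from sqlalchemy import String, cast, update

from railbird.datatypes import models

# Assumes the same injected `session` as in the script above; one server-side
# UPDATE sets model_prop to the row's id cast to a string.
stmt = (
    update(models.MyModel)
    .where(models.MyModel.model_prop.is_(None))
    .values(model_prop=cast(models.MyModel.id, String))
)
session.execute(stmt)
session.commit()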