Database Seeding in Python
Misata generates synthetic relational data and loads it directly into a database. No migration scripts. No manual INSERT loops. No orphan rows.
Quick start
import misata
from misata import seed_database
# Generate + seed in two lines
tables = misata.generate("A SaaS company with 1000 users.", seed=42)
report = seed_database(tables, "sqlite:///./dev.db", create=True)
print(report.total_rows) # 6,000+
print(report.table_rows) # {"users": 1000, "subscriptions": 3000, ...}
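To double-check what landed in the database independently of the report object, you can count rows per table yourself. The sketch below uses only the standard library's sqlite3 module. `count_rows` is a hypothetical helper, not part of Misata, and the demo runs on an in-memory database rather than the dev.db file created above.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    # Names come straight from sqlite_master, so quoting them is enough here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Demo on an in-memory database (hypothetical table and columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)
counts = count_rows(conn)
print(counts)
```

To inspect a seeded file instead, pass `sqlite3.connect("./dev.db")` to the same helper.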
Supported databases
Misata uses SQLAlchemy under the hood, so any SQLAlchemy-compatible database works:
# SQLite (local dev)
seed_database(tables, "sqlite:///./local.db", create=True)
# PostgreSQL
seed_database(tables, "postgresql://user:pass@localhost/mydb", create=True)
# MySQL / MariaDB
seed_database(tables, "mysql+pymysql://user:pass@localhost/mydb", create=True)
Truncate before seeding (CI / staging)
report = seed_database(
    tables,
    "postgresql://user:pass@staging-db/app",
    create=True,    # CREATE TABLE IF NOT EXISTS
    truncate=True,  # TRUNCATE before INSERT
)
Use truncate=True in CI pipelines to reset the database to a known state before
each test run.
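Why truncation matters for repeatability can be seen in a small standalone sketch. It is not Misata's implementation: `reseed` is a hypothetical helper, the `users` table and its columns are assumed for illustration, and because SQLite has no TRUNCATE statement the sketch uses an unqualified DELETE instead.

```python
import sqlite3


def reseed(conn, rows, truncate):
    """Insert rows into users, optionally clearing the table first."""
    with conn:  # one transaction: commit on success, roll back on error
        if truncate:
            # SQLite has no TRUNCATE; DELETE without WHERE empties the table.
            conn.execute("DELETE FROM users")
        conn.executemany("INSERT INTO users (email) VALUES (?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)"
)
rows = [("a@example.com",), ("b@example.com",)]

# Two seed runs without truncation: rows pile up.
reseed(conn, rows, truncate=False)
reseed(conn, rows, truncate=False)
without_truncate = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# A seed run with truncation: the table is back to a known state.
reseed(conn, rows, truncate=True)
with_truncate = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(without_truncate, with_truncate)
```

Without truncation, every pipeline run appends another copy of the seed data, so row counts and aggregate assertions drift between runs.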
Seed from a story (CLI)
misata generate \
  --story "A SaaS company with 1000 users" \
  --rows 1000 \
  --db-url sqlite:///./dev.db \
  --db-create \
  --db-truncate
Seed into existing SQLAlchemy models
If your project already uses SQLAlchemy ORM models, Misata can introspect them and seed against the existing schema:
from misata import seed_from_sqlalchemy_models
from myapp.models import Base, engine
report = seed_from_sqlalchemy_models(
    Base,
    engine,
    story="A SaaS company with 500 users",
    truncate=True,
)
print(report.total_rows)
Introspect an existing database
Misata can read a live database, build a matching SchemaConfig, and then generate data that fits your existing table structure:
from misata import schema_from_db, generate_from_schema
schema = schema_from_db("postgresql://user:pass@localhost/mydb")
print(schema.summary()) # shows your real table structure
# Generate data that matches your actual schema
tables = generate_from_schema(schema)
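For a sense of what reading a live schema involves, here is a minimal sketch that pulls column types and foreign keys out of SQLite using its built-in pragma functions. It says nothing about how `schema_from_db` works internally. `describe` is a hypothetical helper, and the two tables are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE subscriptions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        plan TEXT
    );
    """
)


def describe(conn, table):
    """Return column types and foreign keys for one table."""
    columns = {
        name: col_type
        for name, col_type in conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table,)
        )
    }
    # Each foreign key as (child column, parent table, parent column).
    foreign_keys = list(
        conn.execute(
            'SELECT "from", "table", "to" FROM pragma_foreign_key_list(?)',
            (table,),
        )
    )
    return {"columns": columns, "foreign_keys": foreign_keys}


info = describe(conn, "subscriptions")
print(info)
```

The foreign-key list is the piece a generator needs most: it determines which tables must be populated first and where child values may come from.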
Why Misata works well for seeding
Database seeding breaks in predictable ways:
| Problem | What usually happens | Misata |
|---|---|---|
| FK violations | Child rows reference missing parents | Tables generated in dependency order; FK values sampled from valid parent pool |
| Flat distributions | All rows look the same | Domain priors: log-normal amounts, Zipf categories, real demographic frequencies |
| Scale mismatch | Child table has wrong number of rows | Row counts planned proportionally per relationship |
| Repeatability | Tests produce different data each run | A fixed seed (e.g. seed=42) makes generation deterministic |
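The FK and repeatability rows of the table can be illustrated with a short standalone sketch: tables are generated parents-first, foreign-key values are drawn only from ids that already exist, and a seeded local random generator makes the output reproducible. This is an illustration of the general technique, not Misata's code. The `foreign_keys` mapping, the `row_counts` plan, and the `generate` function are all hypothetical.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical schema: table -> {fk_column: parent_table}.
foreign_keys = {
    "users": {},
    "subscriptions": {"user_id": "users"},
    "invoices": {"subscription_id": "subscriptions"},
}
# Hypothetical row plan, scaled per relationship.
row_counts = {"users": 5, "subscriptions": 15, "invoices": 30}


def generate(seed):
    rng = random.Random(seed)  # local RNG: same seed, same data
    # Map each table to the set of tables it depends on.
    graph = {table: set(fks.values()) for table, fks in foreign_keys.items()}
    data = {}
    # static_order() yields every parent before the tables that reference it.
    for table in TopologicalSorter(graph).static_order():
        parent_ids = {
            column: [row["id"] for row in data[parent]]
            for column, parent in foreign_keys[table].items()
        }
        rows = []
        for pk in range(1, row_counts[table] + 1):
            row = {"id": pk}
            for column, ids in parent_ids.items():
                # FK values are sampled only from existing parent ids.
                row[column] = rng.choice(ids)
            rows.append(row)
        data[table] = rows
    return data


first = generate(seed=42)
second = generate(seed=42)
print(first == second)
```

Because children are built only after their parents exist, an orphan row cannot be produced by construction, and a local `random.Random` instance keeps the result independent of any other code that touches the global random state.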
Batch size control
For large databases, control memory usage with batch_size:
report = seed_database(
    tables,
    "postgresql://user:pass@localhost/prod_clone",
    batch_size=10_000,  # INSERTs in batches of 10k rows
    create=True,
    truncate=True,
)
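The idea behind batching can be sketched with the standard library alone: rows are pulled from an iterator in fixed-size chunks, so only one chunk is held in memory at a time. This is a generic illustration, not Misata's internals. `batched`, `insert_in_batches`, and the `events` table are hypothetical.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while chunk := list(islice(iterator, size)):
        yield chunk


def insert_in_batches(conn, rows, batch_size):
    """Insert rows chunk by chunk and return the number of batches used."""
    n_batches = 0
    for chunk in batched(rows, batch_size):
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT INTO events (id, kind) VALUES (?, ?)", chunk
            )
        n_batches += 1
    return n_batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")

# A generator expression, so the full data set never sits in memory at once.
rows = ((i, "click") for i in range(25_000))
n_batches = insert_in_batches(conn, rows, batch_size=10_000)
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(n_batches, total)
```

Larger batches mean fewer round trips to the database but more rows held in memory per batch, which is the trade-off a batch size setting controls.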
Example: seed a test database in pytest
# conftest.py
import pytest
import misata
from misata import seed_database
from sqlalchemy import create_engine
@pytest.fixture(scope="session")
def db_engine():
    engine = create_engine("sqlite:///./test.db")
    return engine

@pytest.fixture(scope="session", autouse=True)
def seed_test_db(db_engine):
    tables = misata.generate("A SaaS company with 100 users.", seed=42)
    seed_database(tables, db_engine, create=True, truncate=True)
Every test session starts from a freshly seeded, realistic database. Because the
seed is fixed (seed=42), the data is identical across runs, so test assertions stay stable.
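One assertion worth having against a seeded database is an orphan check. The sketch below counts child rows whose foreign key matches no parent row, using a LEFT JOIN. `count_orphans` is a hypothetical helper, and the `user_id` column is an assumption about the generated schema. The demo deliberately plants one orphan in an in-memory database to show what the check catches.

```python
import sqlite3


def count_orphans(conn, child, fk_column, parent, parent_key="id"):
    """Count child rows whose foreign key matches no parent row."""
    query = (
        f'SELECT COUNT(*) FROM "{child}" AS c '
        f'LEFT JOIN "{parent}" AS p ON c."{fk_column}" = p."{parent_key}" '
        f'WHERE c."{fk_column}" IS NOT NULL AND p."{parent_key}" IS NULL'
    )
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2);
    -- user_id 99 has no matching user: a deliberate orphan.
    INSERT INTO subscriptions (id, user_id) VALUES (10, 1), (11, 2), (12, 99);
    """
)
orphans = count_orphans(conn, "subscriptions", "user_id", "users")
print(orphans)
```

In a test suite, the same helper can back an assertion such as `assert count_orphans(conn, "subscriptions", "user_id", "users") == 0`, run against the seeded test.db.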