Skip to content

Synthetic Data for BI Demos

BI dashboards live or die by their data story. A chart that shows flat revenue or perfectly uniform distributions looks fake in a demo. Misata lets you specify the story — "revenue grows through the year, dips in September, peaks in December" — and generates rows that actually sum to those targets.

The problem with generic fake data in BI

Most synthetic data tools generate rows independently. If you ask for 6,000 sales rows in 2024, you get uniform random amounts spread evenly across all 12 months. That does not look like any real business.

Misata works top-down: define the aggregate shape, then generate rows that fill it.

Exact monthly targets

import misata

schema = misata.parse(
    "A SaaS company with 1000 users. "
    "MRR rises from $50k in January to $200k in December with a dip in September.",
    rows=1000,
)
tables = misata.generate_from_schema(schema)
subscriptions = tables["subscriptions"]

import pandas as pd
monthly = subscriptions.copy()
monthly["month"] = pd.to_datetime(monthly["start_date"]).dt.month
monthly_mrr = monthly.groupby("month")["mrr"].sum()

# Jan: $50,000  ✓
# Sep: dip      ✓
# Dec: $200,000 ✓
# All 12 months hit their targets to the cent

Demo output

Month   Target MRR    Actual MRR   Match
─────  ────────────  ────────────  ─────
Jan    $     50,000  $     50,000      ✓
Feb    $     68,182  $     68,182      ✓
Mar    $     86,364  $     86,364      ✓
Apr    $    104,545  $    104,545      ✓
May    $    122,727  $    122,727      ✓
Jun    $    140,909  $    140,909      ✓
Jul    $    159,091  $    159,091      ✓
Aug    $    177,273  $    177,273      ✓
Sep    $    100,000  $    100,000      ✓  ← dip
Oct    $    163,636  $    163,636      ✓
Nov    $    181,818  $    181,818      ✓
Dec    $    200,000  $    200,000      ✓

All 12 monthly targets hit exactly.

Realistic distributions (not uniform)

Real business data is never uniform. Misata applies domain-specific distributions automatically:

# Fintech — credit score matches real FICO statistics
tables = misata.generate("A fintech company with 2000 customers.", seed=42)
cs = tables["customers"]["credit_score"]

print(f"Mean: {cs.mean():.0f}")   # ~680–720 (real FICO range)
print(f"Std:  {cs.std():.0f}")    # ~70–90   (real FICO range)

# Transaction types follow Zipf's law (one type dominates naturally)
txn_types = tables["transactions"]["transaction_type"].value_counts(normalize=True)
# purchase    42%
# transfer    28%
# withdrawal  18%
# deposit     12%

Healthcare dashboard data

tables = misata.generate("A hospital with 500 patients and doctors.", seed=42)
patients = tables["patients"]

# Blood type distribution matches real ABO/Rh frequencies
bt = patients["blood_type"].value_counts(normalize=True).mul(100).round(1)
# O+   38%  (real: 38%)
# A+   34%  (real: 34%)
# B+    9%  (real:  9%)
# ...

# Age distribution: normal, centred on chronic-care population (mean ≈ 45)
print(patients["age"].mean())   # ≈ 44.7
print(patients["age"].std())    # ≈ 18.0

Ecommerce seasonal curve

schema = misata.parse(
    "An ecommerce store with 5000 customers and orders. "
    "Revenue grows from $100k in January to $300k in November "
    "then $350k in December.",
    rows=5000,
)
tables = misata.generate_from_schema(schema)

# Nov: $300,000 (Black Friday)  ✓
# Dec: $350,000 (Holiday peak)  ✓

Fraud rate calibration

tables = misata.generate(
    "A fintech company with 2000 customers and banking transactions.", seed=42
)
transactions = tables["transactions"]

fraud_rate = transactions["is_fraud"].mean() * 100
print(f"Fraud rate: {fraud_rate:.2f}%")  # 2.00% — calibrated, not random

Connecting to BI tools

Generated DataFrames can be written directly to any database your BI tool connects to:

from misata import seed_database

tables = misata.generate("A SaaS company with 5000 users.", seed=42)
seed_database(tables, "postgresql://user:pass@localhost/bi_demo", create=True)

Then point Tableau, Metabase, Looker, or Power BI at postgresql://localhost/bi_demo. The data has a coherent story, realistic distributions, and proper FK relationships — ready to demo without disclaimers.

Running the examples

pip install misata
python examples/saas_revenue_curve.py
python examples/fintech_fraud_detection.py
python examples/healthcare_multi_table.py
python examples/ecommerce_seasonal.py