Generate CRM Synthetic Data in Python
CRM data is the backbone of every B2B sales tool — but it's also sensitive, and real pipeline data is never available early in development. Misata generates a four-table CRM dataset: companies, contacts, deals, and activities. Deal values are lognormally distributed with a median around $25k (the shape of real B2B pipelines), stage distribution reflects real funnel drop-off, and activity types mirror actual sales rep behavior — 40% email, 30% call, 20% meeting, 10% demo.
Every deal references a valid contact and company. Activities are tied to real deals and contacts. probability increases monotonically with stage advancement. close_date falls in the future for open-stage deals.
import misata
tables = misata.generate("A B2B SaaS CRM with 500 companies and a full sales pipeline", rows=500, seed=42)
print(list(tables.keys())) # ['companies', 'contacts', 'deals', 'activities']
print(tables["deals"].groupby("stage")["value"].describe())
What Misata generates
Four tables: companies → contacts and deals (both linked to companies), and activities (linked to deals and contacts). Full referential integrity throughout.
Tables and columns
| Table | Key columns |
|---|---|
companies |
company_id, name, industry, size, country, revenue, website |
contacts |
contact_id, company_id, name, email, phone, title, lead_source, created_at |
deals |
deal_id, contact_id, company_id, name, value, stage, probability, close_date, owner |
activities |
activity_id, deal_id, contact_id, type, subject, outcome, activity_date |
Realistic distributions
- Deal values lognormal ~$25k median — long tail of enterprise deals matching real B2B pipelines
- Pipeline stages: prospecting 35%, qualification 25%, proposal 20%, negotiation 12%, closed-won 8%
- Activity types: email 40%, call 30%, meeting 20%, demo 10% — matching real sales rep behavior
probabilityis correlated with stage — later stage = higher close probabilitylead_sourcevaries across organic, paid, referral, and outbound channels
Quick start
import misata
tables = misata.generate(
"B2B SaaS company with 500 accounts and a 6-month sales pipeline",
rows=500,
seed=42,
)
# Pipeline value by stage
print(tables["deals"].groupby("stage")["value"].agg(["count", "sum", "mean"]))
# Activity volume by type
print(tables["activities"]["type"].value_counts())
# Win rate
closed = tables["deals"][tables["deals"]["stage"] == "closed-won"]
print(f"Win rate: {len(closed)/len(tables['deals']):.1%}")
Common use cases
- CRM platform demos — populate a demo environment with realistic accounts, contacts, and pipeline data so prospects see a lived-in product
- Lead scoring model training — generate thousands of contacts with source, title, and deal outcomes to train and evaluate propensity models
- Revenue forecasting prototypes — build weighted pipeline models against deals with stage-accurate probability values before connecting real data
- CRM migration testing — validate ETL scripts and field mappings against a full relational dataset before touching production records
- Sales analytics dashboards — build conversion rate, activity cadence, and pipeline velocity reports on realistic CRM data
- Integration QA — test webhooks, sync jobs, and API integrations against realistic email formats, phone numbers, and company sizes
Advanced: Q4 close push narrative
tables = misata.generate(
"SaaS company with aggressive Q4 close push — deal activity spikes in October-November, "
"strong closed-won in December",
rows=1000,
seed=42,
)
Advanced: locale-aware generation
# European B2B — German, French, Spanish company names and contacts
tables = misata.generate("European B2B software company with accounts in Germany and France", rows=500)
# US enterprise — US company names, enterprise deal sizes
tables = misata.generate("US enterprise software company with Fortune 500 accounts", rows=300)
Advanced: quality-guaranteed generation
tables = misata.generate(
"B2B SaaS CRM with 500 accounts",
min_quality_score=85,
smart_correlations=True, # auto-adds company_revenue↔deal_value correlation
rows=500,
seed=42,
)