Generate Marketplace Synthetic Data in Python
Online marketplaces have a distinctive data shape: a small number of power sellers drive most of the volume, seller ratings cluster high, and listing prices follow a long-tailed distribution. If you generate seller, buyer, and listing data independently, you lose all of this structure, and your marketplace analytics tool will look wrong from the first dashboard query. Misata generates a fully wired marketplace dataset where seller volume follows a power law, ratings are beta-distributed and skewed toward 4–5 stars, and every order references a real buyer and listing.
import misata
tables = misata.generate("A freelance marketplace with 500 sellers and 2000 buyers", rows=2000, seed=42)
print(list(tables.keys())) # ['sellers', 'buyers', 'listings', 'orders']
print(tables["orders"][["amount", "status"]].describe())
What Misata generates
Four tables: sellers, buyers, listings (linked to sellers), and orders (linked to buyers and listings). Order completion rate is ~85% by default.
Tables and columns
| Table | Key columns |
|---|---|
| `sellers` | `seller_id`, `name`, `email`, `rating`, `total_sales`, `joined_at`, `country` |
| `buyers` | `buyer_id`, `name`, `email`, `total_spent`, `joined_at` |
| `listings` | `listing_id`, `seller_id`, `title`, `category`, `price`, `status`, `created_at` |
| `orders` | `order_id`, `buyer_id`, `listing_id`, `amount`, `status`, `created_at`, `completed_at` |
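The foreign-key links in the table above can be verified with a few lines of pandas. The following is a minimal sketch, not part of Misata: `orphan_count` is a hypothetical helper, and the small DataFrames are hand-built stand-ins that only borrow the column names listed above.

```python
import pandas as pd


def orphan_count(child: pd.DataFrame, fk: str, parent: pd.DataFrame, pk: str) -> int:
    """Count child rows whose foreign key has no matching parent key."""
    return int((~child[fk].isin(parent[pk])).sum())


# Hand-built stand-ins for the generated tables (illustrative values only).
sellers = pd.DataFrame({"seller_id": [1, 2]})
buyers = pd.DataFrame({"buyer_id": [10, 11]})
listings = pd.DataFrame({"listing_id": [100, 101, 102], "seller_id": [1, 1, 2]})
orders = pd.DataFrame({
    "order_id": [1000, 1001],
    "buyer_id": [10, 11],
    "listing_id": [100, 102],
})

# One entry per foreign key; a fully wired dataset reports zero orphans everywhere.
checks = {
    "listings.seller_id": orphan_count(listings, "seller_id", sellers, "seller_id"),
    "orders.buyer_id": orphan_count(orders, "buyer_id", buyers, "buyer_id"),
    "orders.listing_id": orphan_count(orders, "listing_id", listings, "listing_id"),
}
print(checks)
```

To run the same check on generated data, pass the corresponding entries of `tables` instead of the stand-ins.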
Realistic distributions
- Seller volume follows a power-law — top 10% of sellers account for ~60% of total_sales
- Seller ratings beta-distributed (skewed toward 4–5 stars) — matching real platform rating inflation
- Listing prices lognormal — affordable commodities plus premium service listings
- Order completion rate ~85% — remaining 15% in pending, disputed, or cancelled states
- `completed_at` is always after `created_at` for completed orders
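To build intuition for these shapes, the sketch below samples each one with Python's standard `random` module. It illustrates the distribution families named above and is not Misata's implementation: the Pareto exponent, the beta parameters, the lognormal parameters, and the split of the non-completed 15% are all assumptions chosen for the example.

```python
import random

rng = random.Random(42)
n_sellers = 500

# Pareto-distributed sales volume: a heavy tail where a few sellers dominate.
# alpha=1.2 is an illustrative choice, not a Misata default.
total_sales = [rng.paretovariate(1.2) for _ in range(n_sellers)]

# Beta(8, 2) has mean 0.8; rescaled to the 1-5 star range the mean is 4.2.
ratings = [1 + 4 * rng.betavariate(8, 2) for _ in range(n_sellers)]

# Lognormal prices: median exp(3.5), about 33, with a long tail of premium listings.
prices = [rng.lognormvariate(3.5, 1.0) for _ in range(n_sellers)]

# Order status with roughly 85% completed; the split of the rest is assumed.
statuses = rng.choices(
    ["completed", "pending", "disputed", "cancelled"],
    weights=[85, 7, 3, 5],
    k=2000,
)

ranked = sorted(total_sales, reverse=True)
top10_share = sum(ranked[: n_sellers // 10]) / sum(ranked)
mean_rating = sum(ratings) / len(ratings)
completed_rate = statuses.count("completed") / len(statuses)

print(f"Top 10% share of sales: {top10_share:.0%}")
print(f"Mean rating: {mean_rating:.2f}")
print(f"Completed: {completed_rate:.0%}")
```

Heavy-tailed samples vary a lot between seeds, so the top-10% share printed here will not necessarily match the ~60% figure quoted above.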
Quick start
import misata
tables = misata.generate("A freelance marketplace with 500 sellers and 2000 buyers", rows=2000, seed=42)
# Power-law seller distribution: share of total_sales held by the top 10% of sellers
total_sales = tables["sellers"]["total_sales"].sort_values(ascending=False)
top_n = max(1, len(total_sales) // 10)  # at least one seller, even for tiny tables
top10_pct = total_sales.head(top_n).sum() / total_sales.sum()
print(f"Top 10% sellers: {top10_pct:.0%} of total sales")
# Order completion rate
print(tables["orders"]["status"].value_counts(normalize=True))
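The timestamp guarantee for completed orders can be checked in the same spirit. This is a minimal sketch rather than a Misata feature: `bad_completion_times` is a hypothetical helper, the status label `"completed"` is an assumption about how the status column is encoded, and the DataFrame is a hand-built stand-in.

```python
import pandas as pd


def bad_completion_times(orders: pd.DataFrame, done_status: str = "completed") -> pd.DataFrame:
    """Return completed orders whose completed_at is missing or not after created_at."""
    done = orders[orders["status"] == done_status]
    created = pd.to_datetime(done["created_at"])
    completed = pd.to_datetime(done["completed_at"])
    # Comparisons against a missing timestamp are False, so those rows are flagged too.
    return done[~(completed > created)]


# Stand-in data: order 2 is deliberately inconsistent, order 3 is not completed.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "status": ["completed", "completed", "pending"],
    "created_at": ["2024-01-01", "2024-01-05", "2024-01-07"],
    "completed_at": ["2024-01-03", "2024-01-04", None],
})

violations = bad_completion_times(orders)
print(violations["order_id"].tolist())
```

On generated data, `bad_completion_times(tables["orders"])` should come back empty if the guarantee holds and the status label matches.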
Common use cases
- Marketplace trust and safety: generate seller profiles with varied rating histories for testing fraud detection and account suspension workflows
- Search ranking model training: use `listings` with price, category, and seller rating features to train relevance ranking models
- Commission and fee calculation testing: validate fee structures against thousands of orders with varied amounts and statuses
- Seller analytics dashboard development: build GMV, conversion rate, and listing performance reports on realistic power-law distributed seller data
- Buyer recommendation systems: use `orders` history to prototype collaborative filtering before real purchase data exists
- Dispute resolution workflow testing: generate orders in all status states, including disputed, for testing escalation logic
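As an example of the commission and fee use case, the sketch below applies a fee rule to a handful of orders. The fee structure (10% commission plus a fixed 0.30 per order, charged only on completed orders) is hypothetical and exists only for illustration, as do the sample orders.

```python
from decimal import Decimal, ROUND_HALF_UP

COMMISSION_RATE = Decimal("0.10")  # hypothetical platform rate
FIXED_FEE = Decimal("0.30")        # hypothetical per-order fee


def platform_fee(amount, status):
    """Fee for one order: charged only on completed orders, rounded to cents."""
    if status != "completed":
        return Decimal("0.00")
    # Go through str() so float amounts do not carry binary rounding noise.
    fee = Decimal(str(amount)) * COMMISSION_RATE + FIXED_FEE
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Stand-in orders using the amount and status columns described earlier.
orders = [
    {"amount": 120.00, "status": "completed"},
    {"amount": 49.99, "status": "disputed"},
    {"amount": 15.50, "status": "completed"},
]

fees = [platform_fee(o["amount"], o["status"]) for o in orders]
total = sum(fees, Decimal("0.00"))
print(fees, total)
```

A DataFrame of orders can be fed through the same function by converting it first, for example with `tables["orders"][["amount", "status"]].to_dict("records")`.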
Advanced: GMV narrative curves
tables = misata.generate(
"Marketplace with 1k sellers — Black Friday GMV spike, slow January, growing through the year",
rows=5000,
seed=42,
)
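One way to confirm that a requested curve shows up in the data is to aggregate GMV by month and look for the peak. The sketch below assumes GMV counts only orders with status `"completed"` and buckets them by `created_at`; both are choices made for this example, `monthly_gmv` is a hypothetical helper, and the DataFrame is a hand-built stand-in.

```python
import pandas as pd


def monthly_gmv(orders: pd.DataFrame, done_status: str = "completed") -> pd.Series:
    """Sum completed-order amounts per calendar month of created_at."""
    done = orders[orders["status"] == done_status].copy()
    done["month"] = pd.to_datetime(done["created_at"]).dt.to_period("M")
    return done.groupby("month")["amount"].sum()


# Stand-in data with a late-November spike; the cancelled order is excluded.
orders = pd.DataFrame({
    "created_at": ["2024-01-10", "2024-11-29", "2024-11-30", "2024-11-29"],
    "amount": [50.0, 200.0, 300.0, 80.0],
    "status": ["completed", "completed", "completed", "cancelled"],
})

gmv = monthly_gmv(orders)
print(gmv)
print("Peak month:", gmv.idxmax())
```

Applied to `tables["orders"]` from the call above, the resulting series can be plotted or compared month over month to see whether the spike and the slow January appear.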
Advanced: locale-aware generation
# Latin American marketplace — BRL/MXN pricing, regional product categories
tables = misata.generate("Brazilian ecommerce marketplace with 500 sellers", rows=2000)
# Southeast Asian gig marketplace — SGD pricing, regional skills
tables = misata.generate("Freelance platform in Southeast Asia with 300 sellers", rows=1000)
Advanced: quality-guaranteed generation
tables = misata.generate(
"Freelance marketplace with 500 sellers",
min_quality_score=85,
smart_correlations=True,
rows=2000,
seed=42,
)