
Generate Marketplace Synthetic Data in Python

Online marketplaces have a distinctive data shape: a small number of power sellers drive most of the volume, buyer ratings cluster high, and listing prices follow a long-tailed distribution. If you generate seller, buyer, and listing data independently, you lose all of this structure, and your marketplace analytics will look wrong from the first dashboard query. Misata generates a fully wired marketplace dataset in which seller volume follows a power law, ratings are beta-distributed toward 4–5 stars, and every order references a real buyer and listing.

import misata

tables = misata.generate("A freelance marketplace with 500 sellers and 2000 buyers", rows=2000, seed=42)
print(list(tables.keys()))   # ['sellers', 'buyers', 'listings', 'orders']
print(tables["orders"][["amount", "status"]].describe())

What Misata generates

Four tables: sellers, buyers, listings (linked to sellers), and orders (linked to buyers and listings). Order completion rate is ~85% by default.

Tables and columns

Table     | Key columns
sellers   | seller_id, name, email, rating, total_sales, joined_at, country
buyers    | buyer_id, name, email, total_spent, joined_at
listings  | listing_id, seller_id, title, category, price, status, created_at
orders    | order_id, buyer_id, listing_id, amount, status, created_at, completed_at
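
Because listings point at sellers and orders point at buyers and listings, a quick referential-integrity check is a useful first test on any generated dataset. The sketch below is not part of Misata; `find_orphans` is a hypothetical helper, and the small DataFrames are hand-built stand-ins that reuse the key column names from the table above, with one deliberately broken reference.

```python
import pandas as pd

def find_orphans(child: pd.DataFrame, parent: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return rows of `child` whose `key` value has no match in `parent`."""
    return child[~child[key].isin(parent[key])]

# Hand-built stand-ins for the generated tables (illustrative data only).
sellers = pd.DataFrame({"seller_id": [1, 2]})
buyers = pd.DataFrame({"buyer_id": [10, 11]})
listings = pd.DataFrame({"listing_id": [100, 101, 102], "seller_id": [1, 1, 2]})
orders = pd.DataFrame({
    "order_id": [1000, 1001, 1002],
    "buyer_id": [10, 11, 10],
    "listing_id": [100, 102, 999],  # 999 is a deliberately broken reference
})

# One entry per foreign-key relationship in the schema.
orphans = {
    "listings.seller_id": find_orphans(listings, sellers, "seller_id"),
    "orders.buyer_id": find_orphans(orders, buyers, "buyer_id"),
    "orders.listing_id": find_orphans(orders, listings, "listing_id"),
}
for name, rows in orphans.items():
    print(f"{name}: {len(rows)} orphaned row(s)")
```

On a real run you would pass `tables["listings"]`, `tables["sellers"]`, and so on, and expect every count to be zero.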

Realistic distributions

  • Seller volume follows a power law: the top 10% of sellers account for ~60% of total_sales
  • Seller ratings are beta-distributed (skewed toward 4–5 stars), matching the rating inflation seen on real platforms
  • Listing prices are lognormal: many affordable commodity listings plus a long tail of premium service listings
  • Order completion rate is ~85%; the remaining ~15% are in pending, disputed, or cancelled states
  • completed_at is always after created_at for completed orders
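
To get a feel for what these three distribution shapes look like, you can sample them directly with NumPy. This is only an illustration of the distribution families named above, not Misata's internal implementation; every parameter (the Pareto shape of 1.5, Beta(8, 2), the lognormal mean and sigma) is an assumption chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sellers = 500

# Heavy-tailed seller volume: classical Pareto draws with minimum value 1.
# (rng.pareto samples the Lomax form, so adding 1 shifts it to start at 1.)
total_sales = rng.pareto(a=1.5, size=n_sellers) + 1.0

# Ratings: Beta(8, 2) lives on [0, 1] with mean 0.8; rescale to the 1-5 star range.
ratings = 1.0 + 4.0 * rng.beta(a=8.0, b=2.0, size=n_sellers)

# Listing prices: lognormal, so most prices are modest with a long right tail.
prices = rng.lognormal(mean=3.5, sigma=1.0, size=2000)

# Share of total volume held by the top 10% of sellers.
ranked = np.sort(total_sales)[::-1]
top10_share = ranked[: n_sellers // 10].sum() / ranked.sum()

print(f"Top 10% share of volume: {top10_share:.0%}")
print(f"Mean rating: {ratings.mean():.2f}")
print(f"Median price: {np.median(prices):.2f}, mean price: {prices.mean():.2f}")
```

A mean price well above the median is the signature of the long right tail; the same comparison works on the price column of a generated listings table.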

Quick start

import misata

tables = misata.generate("A freelance marketplace with 500 sellers and 2000 buyers", rows=2000, seed=42)

# Power-law seller distribution: share of total_sales held by the top 10% of sellers
total_sales = tables["sellers"]["total_sales"].sort_values(ascending=False)
top_n = max(1, len(total_sales) // 10)  # guard against tables with fewer than 10 sellers
top10_pct = total_sales.head(top_n).sum() / total_sales.sum()
print(f"Top 10% sellers: {top10_pct:.0%} of total sales")

# Order completion rate
print(tables["orders"]["status"].value_counts(normalize=True))
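
The timestamp rule (completed_at after created_at for completed orders) can be checked in the same spirit. The sketch below is a hypothetical helper run against a hand-built orders frame; it assumes the status label for finished orders is the string "completed", which you should confirm against the values printed by `value_counts` on your own data.

```python
import pandas as pd

def bad_completion_times(orders: pd.DataFrame) -> pd.DataFrame:
    """Return completed orders whose completed_at is missing or not after created_at."""
    completed = orders[orders["status"] == "completed"]  # assumed status label
    # Comparisons against NaT evaluate to False, so missing timestamps are flagged too.
    ok = completed["completed_at"] > completed["created_at"]
    return completed[~ok]

# Illustrative data: order 2 "completes" before it was created.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "status": ["completed", "completed", "pending", "completed"],
    "created_at": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-07", "2024-01-09"]),
    "completed_at": pd.to_datetime(["2024-01-03", "2024-01-04", None, "2024-01-10"]),
})

violations = bad_completion_times(orders)
print(f"{len(violations)} completed order(s) with invalid timestamps")
```

Pending orders are ignored, so a missing completed_at only counts as a violation when the order claims to be complete.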

Common use cases

  • Marketplace trust and safety: generate seller profiles with varied rating histories to test fraud detection and account suspension workflows
  • Search ranking model training: use listings with price, category, and seller rating features to train relevance ranking models
  • Commission and fee calculation testing: validate fee structures against thousands of orders with varied amounts and statuses
  • Seller analytics dashboard development: build GMV, conversion rate, and listing performance reports on realistic, power-law-distributed seller data
  • Buyer recommendation systems: use order history to prototype collaborative filtering before real purchase data exists
  • Dispute resolution workflow testing: generate orders in every status, including disputed, to test escalation logic
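
As one concrete example, fee-calculation testing might look like the sketch below. The fee structure (10% commission with a 0.50 minimum, charged only on completed orders) is entirely hypothetical, as are the `commission` helper and the "completed" status label; the orders frame is hand-built so that both the percentage branch and the minimum-fee branch are exercised.

```python
import pandas as pd

def commission(amounts: pd.Series, rate: float = 0.10, minimum: float = 0.50) -> pd.Series:
    """Hypothetical fee rule: a percentage of the order amount with a fixed floor."""
    return (amounts * rate).clip(lower=minimum).round(2)

# Illustrative orders with mixed amounts and statuses.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount": [200.0, 3.0, 50.0, 80.0],
    "status": ["completed", "completed", "cancelled", "disputed"],
})

# Only completed orders are charged under this assumed rule.
completed = orders[orders["status"] == "completed"].copy()
completed["fee"] = commission(completed["amount"])
total_fees = completed["fee"].sum()

print(completed[["order_id", "amount", "fee"]])
print(f"Total fees: {total_fees:.2f}")
```

Swapping the hand-built frame for a generated orders table gives the same logic thousands of varied amounts and statuses to run against.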

Advanced: GMV narrative curves

tables = misata.generate(
    "Marketplace with 1k sellers — Black Friday GMV spike, slow January, growing through the year",
    rows=5000,
    seed=42,
)
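
To see whether a narrative such as a Black Friday spike actually shows up, you can aggregate order amounts by month. The sketch below uses a hand-built orders frame rather than generated output, and it assumes GMV is counted over orders whose status is "completed"; definitions of GMV vary, so adjust the filter to match yours. `monthly_gmv` is a hypothetical helper name.

```python
import pandas as pd

def monthly_gmv(orders: pd.DataFrame) -> pd.Series:
    """Sum order amounts per calendar month, counting completed orders only."""
    completed = orders[orders["status"] == "completed"]  # assumed GMV definition
    months = completed["created_at"].dt.to_period("M")
    return completed.groupby(months)["amount"].sum()

# Illustrative data with most volume concentrated in late November.
orders = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2024-01-15", "2024-06-10", "2024-11-29",
        "2024-11-29", "2024-11-30", "2024-12-02",
    ]),
    "amount": [40.0, 60.0, 120.0, 90.0, 75.0, 55.0],
    "status": ["completed", "completed", "completed",
               "completed", "cancelled", "completed"],
})

gmv = monthly_gmv(orders)
print(gmv)
print(f"Peak month: {gmv.idxmax()}")
```

With generated data, pass `tables["orders"]` instead and compare the peak month against the story described in the prompt.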

Advanced: locale-aware generation

# Latin American marketplace (Brazil): BRL pricing, regional product categories
tables = misata.generate("Brazilian ecommerce marketplace with 500 sellers", rows=2000)

# Southeast Asian gig marketplace — SGD pricing, regional skills
tables = misata.generate("Freelance platform in Southeast Asia with 300 sellers", rows=1000)

Advanced: quality-guaranteed generation

tables = misata.generate(
    "Freelance marketplace with 500 sellers",
    min_quality_score=85,
    smart_correlations=True,
    rows=2000,
    seed=42,
)