YAML Schema

Define your schema in a file, commit it to git, and reproduce the same dataset anywhere — no LLM required.

Scaffold

misata init                                     # writes misata.yaml template
misata init --story "A marketplace"             # parses a story into YAML
misata init --db postgresql://localhost/myapp   # introspects a real database

Schema structure

name: my-app
seed: 42

tables:
  users:
    rows: 1000
    columns:
      user_id:  { type: int, unique: true }
      email:    { type: text, text_type: email }
      plan:     { type: categorical, choices: [free, pro, enterprise] }
      mrr:      { type: float, min: 0, max: 2400, distribution: lognormal }
      signed_up: { type: date, start: "2022-01-01", end: "2024-12-31" }

  orders:
    rows: 5000
    columns:
      order_id: { type: int, unique: true }
      user_id:  { type: foreign_key }
      amount:   { type: float, min: 5.0, max: 500.0 }
      cost:     { type: float, min: 2.0, max: 200.0 }

relationships:
  - "users.user_id  orders.user_id"

constraints:
  - name: profit_margin
    table: orders
    type: inequality
    column_a: amount
    operator: ">"
    column_b: cost
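
To make the relationship and the profit_margin constraint concrete, here is a minimal sketch of what they assert about the generated rows. It runs on hand-written sample rows rather than Misata output, and the helpers `violations` and `orphans` are hypothetical names for illustration, not part of the Misata API. The operator table is also an assumption: only ">" appears in the schema above.

```python
import operator

# Comparison operators a constraint might name (assumed; the schema only shows ">").
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def violations(rows, constraint):
    """Return the rows that break an inequality constraint."""
    compare = OPERATORS[constraint["operator"]]
    a, b = constraint["column_a"], constraint["column_b"]
    return [row for row in rows if not compare(row[a], row[b])]


def orphans(child_rows, parent_rows, key):
    """Return child rows whose foreign key has no matching parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


# Hand-written sample rows (not Misata output), for illustration only.
users = [{"user_id": 1}, {"user_id": 2}]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 50.0, "cost": 20.0},
    {"order_id": 11, "user_id": 2, "amount": 15.0, "cost": 30.0},  # amount is not > cost
    {"order_id": 12, "user_id": 3, "amount": 80.0, "cost": 10.0},  # no user with id 3
]

# The same constraint as in the YAML above, written as a dict.
profit_margin = {
    "name": "profit_margin",
    "table": "orders",
    "type": "inequality",
    "column_a": "amount",
    "operator": ">",
    "column_b": "cost",
}

bad_margin = violations(orders, profit_margin)
bad_fk = orphans(orders, users, "user_id")
print([r["order_id"] for r in bad_margin])  # [11]
print([r["order_id"] for r in bad_fk])      # [12]
```

A dataset that honours the schema should produce two empty lists here: every order has amount greater than cost, and every orders.user_id matches a users.user_id.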

Generate

import misata

schema = misata.load_yaml_schema("misata.yaml")
tables = misata.generate_from_schema(schema)

Or from the CLI, which picks up misata.yaml automatically if it exists in the current directory:

misata generate
misata generate --output-dir data/
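
The seed in the schema is what makes a run repeatable. One way to check that property is to generate twice and compare the results. The sketch below uses a stand-in generator rather than Misata itself; `fake_generate` and `is_reproducible` are hypothetical names, and the stand-in only shows the general idea of a privately seeded random source, not how Misata is implemented.

```python
import random


def fake_generate(seed, rows=5):
    """Stand-in generator: NOT Misata, just a seeded source of rows."""
    rng = random.Random(seed)  # private RNG, independent of global random state
    return [
        {"order_id": i, "amount": round(rng.uniform(5.0, 500.0), 2)}
        for i in range(1, rows + 1)
    ]


def is_reproducible(generate, seed):
    """Run the generator twice with the same seed and compare the results."""
    return generate(seed) == generate(seed)


print(is_reproducible(fake_generate, 42))  # True
```

To run the same check against real output, you would swap in a call to misata.generate_from_schema. How to compare two results depends on the type that function returns, which this page does not state.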

Round-trip

import misata

# Build a schema programmatically, inspect it if needed, then save it as YAML
schema = misata.parse("A healthcare company with 500 patients")
misata.save_yaml_schema(schema, "healthcare.yaml")

Column types

| type | Description | Key params |
| --- | --- | --- |
| int | Integer | min, max, unique, distribution |
| float | Floating point | min, max, decimals, distribution |
| text | String / semantic | text_type (name, email, city, …) |
| categorical | Enum / factor | choices, probabilities, sampling |
| boolean | True/False | probability |
| date | Date column | start, end, format |
| foreign_key | FK reference | resolved from relationships |
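
As a rough illustration of how these parameters could drive sampling, the sketch below draws values for three of the column types using only the standard library. It is not Misata's implementation: `sample_value` is a hypothetical helper, the 0.5 default for `probability` is an assumption, and the weights in the usage example are made-up values that do not come from the schema above.

```python
import random


def sample_value(spec, rng):
    """Draw one value for a column spec; covers only three of the types above."""
    kind = spec["type"]
    if kind == "int":
        # randint includes both endpoints.
        return rng.randint(spec["min"], spec["max"])
    if kind == "categorical":
        # `probabilities` is optional; random.choices treats None as uniform.
        weights = spec.get("probabilities")
        return rng.choices(spec["choices"], weights=weights, k=1)[0]
    if kind == "boolean":
        # Assumed default of 0.5 when no probability is given.
        return rng.random() < spec.get("probability", 0.5)
    raise ValueError(f"unsupported column type: {kind}")


rng = random.Random(42)

# Illustrative weights, not taken from the schema above.
plan_spec = {
    "type": "categorical",
    "choices": ["free", "pro", "enterprise"],
    "probabilities": [0.7, 0.2, 0.1],
}
plans = [sample_value(plan_spec, rng) for _ in range(1000)]

age_spec = {"type": "int", "min": 18, "max": 90}
ages = [sample_value(age_spec, rng) for _ in range(1000)]

print({plan: plans.count(plan) for plan in plan_spec["choices"]})
print(min(ages), max(ages))
```

Passing the random source in as an argument keeps the helper deterministic for a given seed, which matches the reproducibility goal of a seeded schema.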