Skip to content

LLM-Assisted Generation

Use a language model to produce a richer schema from an open-ended story — useful for complex or unusual domains where the rule-based parser falls short.

pip install "misata[llm]"

Supported providers

Provider Env var Notes
groq GROQ_API_KEY Fast, free tier available
openai OPENAI_API_KEY GPT-4o / GPT-4-turbo
anthropic ANTHROPIC_API_KEY Claude Sonnet / Opus
gemini GOOGLE_API_KEY Gemini Pro via OpenAI-compat endpoint
ollama Fully local, no API key

Usage

from misata import LLMSchemaGenerator

gen = LLMSchemaGenerator(provider="groq")
# gen = LLMSchemaGenerator(provider="anthropic")
# gen = LLMSchemaGenerator(provider="ollama", model="llama3")

schema = gen.generate_from_story(
    "A fraud detection dataset — 2% positive rate, FICO scores, "
    "transaction velocity features, device fingerprints"
)

import misata
tables = misata.generate_from_schema(schema)

When to use it

  • Your domain is niche and the story parser returns a generic schema
  • You need column-level semantics that require world knowledge (e.g. realistic medical codes)
  • You want to iterate on schema design in natural language before committing to YAML

LLM → YAML → version control

Generate once with the LLM, save the schema to YAML, then commit it so future runs are deterministic and free.

schema = gen.generate_from_story("A logistics company …")
misata.save_yaml_schema(schema, "logistics.yaml")