Geospatial Realism
Misata generates location data that makes geographic sense. Coordinates cluster around real cities with natural scatter, postal codes follow correct format patterns, and city/country pairs are consistent.
Automatic detection
Add columns named lat, latitude, lng, longitude, postal_code, or zip_code and Misata infers the right generator automatically:
from misata.schema import SchemaConfig, Table, Column
schema = SchemaConfig(
name="Stores",
tables=[Table(name="stores", row_count=500)],
columns={"stores": [
Column(name="store_id", type="int", unique=True, distribution_params={"min": 1, "max": 501}),
Column(name="city", type="text", distribution_params={"text_type": "city"}),
Column(name="lat", type="float", distribution_params={"text_type": "latitude"}),
Column(name="lng", type="float", distribution_params={"text_type": "longitude"}),
Column(name="postal_code", type="text", distribution_params={"text_type": "postal_code"}),
]},
relationships=[],
)
How coordinates are generated
Misata has a built-in dataset of 60+ major cities across 20 countries, each with real centroid coordinates. For each row:
- A city is sampled (uniformly across the pool)
- A small Gaussian offset is added (~0.25° lat, ~0.35° lng — roughly 20–35 km scatter)
- Coordinates are rounded to 6 decimal places (~10 cm precision)
This means your data will produce realistic-looking clusters when plotted on a map — not random scattered points.
# Coordinates cluster around real cities:
# Tokyo: 35.6762, 139.6503 ± scatter
# London: 51.5074, -0.1278 ± scatter
# São Paulo: -23.5505, -46.6333 ± scatter
Postal codes
Misata generates postal codes using the prefix pattern for each city's country:
| Country | Format | Example |
|---|---|---|
| United States | {2-digit prefix}{3 digits} |
10472 |
| United Kingdom | {letter prefix}{digits} |
EC2847 |
| India | {3-digit prefix}{3 digits} |
400315 |
| Germany | {2-digit prefix}{3 digits} |
10412 |
| Japan | {3-digit prefix}{3 digits} |
100542 |
Explicit text_type usage
Column(name="latitude", type="float", distribution_params={"text_type": "latitude"})
Column(name="longitude", type="float", distribution_params={"text_type": "longitude"})
Column(name="postal_code", type="text", distribution_params={"text_type": "postal_code"})
Supported cities (sample)
New York, Los Angeles, Chicago, Houston, London, Manchester, Toronto, Vancouver, Berlin, Munich, Mumbai, Delhi, Bangalore, Sydney, Melbourne, Tokyo, Osaka, Paris, São Paulo, Singapore, Dubai, Seoul, Amsterdam, and 40+ more across North America, Europe, Asia, and Oceania.
Combining with other location columns
For fully consistent location rows, combine with city and country text types:
columns = [
Column(name="city", type="text", distribution_params={"text_type": "city"}),
Column(name="country", type="text", distribution_params={"text_type": "country"}),
Column(name="lat", type="float", distribution_params={"text_type": "latitude"}),
Column(name="lng", type="float", distribution_params={"text_type": "longitude"}),
Column(name="postal_code", type="text", distribution_params={"text_type": "postal_code"}),
]
Note: city/country and lat/lng are sampled independently — for strict geographic consistency within a row, use a custom generator or the correlation engine.