Localisation
Misata automatically generates country-accurate data (names, salary distributions, national ID formats, currencies, postcodes, and company suffixes) based on geographic cues in your story, such as a country, city, currency, or national ID scheme.
Automatic detection
import misata
# Locale detected from story — no extra flag needed
tables = misata.generate("German SaaS company in Berlin with 2k enterprise customers")
# → de_DE names, salary ~ lognormal median €45k, 5-digit postcodes, GmbH/AG suffixes
tables = misata.generate("Brazilian fintech with R$ payments and CPF verification")
# → pt_BR names, salary median ~R$33.6k, national IDs in CPF format ###.###.###-##
tables = misata.generate("Indian startup in Bangalore with ₹ salary bands and Aadhaar KYC")
# → hi_IN names, salary median ~₹350k/yr, Aadhaar 12-digit national IDs
Explicit locale
# Override detected locale
tables = misata.generate("Ecommerce store with 10k orders", locale="ja_JP")
# Or via CLI
# misata generate --story "Ecommerce store" --locale ja_JP
15 built-in locales
| Locale | Country | Currency | Salary median | National ID |
|---|---|---|---|---|
| en_US | United States | USD / $ | $62 000 | SSN ###-##-#### |
| en_GB | United Kingdom | GBP / £ | £34 000 | NIN AA######A |
| de_DE | Germany | EUR / € | €45 000 | Steuer-IdNr |
| fr_FR | France | EUR / € | €38 000 | NIR |
| pt_BR | Brazil | BRL / R$ | R$33 600 | CPF ###.###.###-## |
| es_ES | Spain | EUR / € | €27 000 | NIE |
| hi_IN | India | INR / ₹ | ₹350 000 | Aadhaar ####-####-#### |
| ja_JP | Japan | JPY / ¥ | ¥4 400 000 | My Number |
| zh_CN | China | CNY / ¥ | ¥90 000 | Resident ID |
| ar_SA | Saudi Arabia | SAR | SAR 96 000 | National ID |
| ko_KR | South Korea | KRW / ₩ | ₩42 000 000 | RRN |
| nl_NL | Netherlands | EUR / € | €42 000 | BSN |
| it_IT | Italy | EUR / € | €29 000 | Codice Fiscale |
| pl_PL | Poland | PLN | PLN 72 000 | PESEL |
| tr_TR | Turkey | TRY | TRY 720 000 | TC Kimlik |
Salary data is sourced from OECD, World Bank, and ILO figures (2023–24).
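To make the salary medians concrete, here is a minimal sketch of how a lognormal prior with a given median could be sampled. It uses only the Python standard library and is not Misata's internal code. The function name `sample_salaries` and the spread `sigma=0.5` are assumptions chosen for illustration; only the median (45 000 for de_DE) comes from the table.

```python
import math
import random
import statistics


def sample_salaries(median, n, sigma=0.5, seed=0):
    """Draw n salaries from a lognormal distribution whose median is `median`.

    sigma is an assumed spread, not a value taken from Misata.
    """
    rng = random.Random(seed)
    mu = math.log(median)  # the median of a lognormal is exp(mu)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


salaries = sample_salaries(45_000, n=10_000)  # de_DE median from the table
print(round(statistics.median(salaries)))     # close to 45000, varies with the seed
```

Setting `mu = ln(median)` pins the median regardless of `sigma`, so the spread can be tuned separately from the central value.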
Inspect a locale pack
pack = misata.get_locale_pack("de_DE")
print(pack.salary_median) # 45000
print(pack.currency_symbol) # €
print(pack.top_cities[:3]) # ['Berlin', 'Hamburg', 'Munich']
print(pack.company_suffixes) # ['GmbH', 'AG', 'UG', 'KG', 'e.K.']
print(pack.postcode_pattern) # \d{5}
print(pack.national_id_label) # Steuer-IdNr
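One possible use of the postcode pattern is to check generated values in a test. The sketch below copies the pattern string from the de_DE example output rather than calling Misata, so it runs on its own. The helper name `is_valid_postcode` is hypothetical.

```python
import re

# Pattern copied from the de_DE example output; in real use it would come
# from pack.postcode_pattern.
postcode_pattern = r"\d{5}"


def is_valid_postcode(value, pattern=postcode_pattern):
    """Return True if the whole string matches the locale's postcode pattern."""
    return re.fullmatch(pattern, value) is not None


print(is_valid_postcode("10115"))     # True
print(is_valid_postcode("1011"))      # False (too short)
print(is_valid_postcode("SW1A 1AA"))  # False (GB format, not DE)
```

`re.fullmatch` is used instead of `re.match` so that trailing characters after a valid prefix are rejected.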
Detect from a story
locale = misata.detect_locale("South Korean company in Seoul with KRW salaries")
# → "ko_KR"
locale = misata.detect_locale("A generic SaaS company")
# → "en_US" (default)
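For intuition, detection of this kind can be approximated with simple keyword scoring. The sketch below is an assumption about how such a detector might work, not Misata's actual implementation. The keyword table, `DEFAULT_LOCALE`, and `detect_locale_sketch` are all hypothetical names.

```python
# Hypothetical keyword table; the signals Misata really uses are not documented here.
LOCALE_KEYWORDS = {
    "de_DE": ("german", "berlin", "gmbh"),
    "pt_BR": ("brazil", "r$", "cpf"),
    "hi_IN": ("india", "bangalore", "aadhaar", "₹"),
    "ko_KR": ("korea", "seoul", "krw"),
}
DEFAULT_LOCALE = "en_US"


def detect_locale_sketch(story):
    """Return the locale with the most keyword hits, or the default if none match."""
    text = story.lower()
    scores = {
        locale: sum(keyword in text for keyword in keywords)
        for locale, keywords in LOCALE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT_LOCALE


print(detect_locale_sketch("South Korean company in Seoul with KRW salaries"))  # ko_KR
print(detect_locale_sketch("A generic SaaS company"))                           # en_US
```

Substring matching is crude (it cannot tell "German" from "Germanic", for example), so a real detector would likely need tokenisation and weighting.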
What locale affects
- Names: Faker locale pool (de_DE Faker generates German names, ja_JP generates Japanese names)
- Salary & age distributions: lognormal/normal priors from national statistics replace the en_US defaults
- Postcodes: pattern-generated to match the country format (e.g. 5 digits for DE, SW1A 1AA format for GB)
- National IDs: pattern-generated to match the country format (CPF, SSN, Aadhaar, etc.)
- Company suffixes: GmbH/AG for Germany, S.A./SARL for France, Ltd/PLC for UK
- Phone prefixes: country dialling code prepended
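Pattern-based generation, as used for postcodes and national IDs, can be sketched in a few lines. The example assumes the convention from the locale table, where `#` stands for a digit and `A` for an uppercase letter. The function `fill_pattern` is a hypothetical helper, not part of Misata, and it reproduces only the shape of an ID without computing check digits.

```python
import random
import string


def fill_pattern(pattern, rng=random):
    """Expand a pattern: '#' becomes a random digit, 'A' a random uppercase
    letter, and every other character is copied through unchanged."""
    out = []
    for ch in pattern:
        if ch == "#":
            out.append(rng.choice(string.digits))
        elif ch == "A":
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(42)  # seeded so that runs are reproducible
print(fill_pattern("###.###.###-##", rng))  # CPF-shaped value
print(fill_pattern("###-##-####", rng))     # SSN-shaped value
print(fill_pattern("AA######A", rng))       # NIN-shaped value
```

Because the output matches only the format, values produced this way will generally fail real checksum validation (for instance the two CPF check digits).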
Asset-backed vocabulary takes priority
If you have ingested Kaggle vocabulary assets for name columns, those always win over locale-based Faker names. Locale is the fallback, not the override.
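The priority rule can be pictured as a simple fallback. This is an illustrative sketch only: `choose_name_pool` and the sample name lists are hypothetical and do not reflect Misata's internal API.

```python
def choose_name_pool(asset_vocab, locale_pool):
    """Prefer ingested asset vocabulary; use locale-based names only when
    no asset vocabulary is available."""
    if asset_vocab:
        return asset_vocab
    return locale_pool


locale_pool = ["Anna Schmidt", "Lukas Müller"]  # stand-in for de_DE Faker names

print(choose_name_pool(["Custom Name A"], locale_pool))  # ['Custom Name A']
print(choose_name_pool([], locale_pool))                 # ['Anna Schmidt', 'Lukas Müller']
```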