Localisation

Misata generates country-accurate data automatically from a geographic signal in your story, such as a country, city, currency, or national ID scheme. The locale drives names, salary distributions, national ID formats, currencies, postcodes, and company suffixes.

Automatic detection

import misata

# Locale detected from story — no extra flag needed
tables = misata.generate("German SaaS company in Berlin with 2k enterprise customers")
# → de_DE names, lognormal salaries with median ~€45k, 5-digit postcodes, GmbH/AG suffixes

tables = misata.generate("Brazilian fintech with R$ payments and CPF verification")
# → pt_BR names, salary median ~R$33.6k, national IDs in CPF format ###.###.###-##

tables = misata.generate("Indian startup in Bangalore with ₹ salary bands and Aadhaar KYC")
# → hi_IN names, salary median ~₹350k/yr, Aadhaar 12-digit national IDs

Explicit locale

# Override detected locale
tables = misata.generate("Ecommerce store with 10k orders", locale="ja_JP")

# Or via CLI
# misata generate --story "Ecommerce store" --locale ja_JP

15 built-in locales

Locale  Country         Currency  Salary median  National ID
en_US   United States   USD / $   $62 000        SSN ###-##-####
en_GB   United Kingdom  GBP / £   £34 000        NIN AA######A
de_DE   Germany         EUR / €   €45 000        Steuer-IdNr
fr_FR   France          EUR / €   €38 000        NIR
pt_BR   Brazil          BRL / R$  R$33 600       CPF ###.###.###-##
es_ES   Spain           EUR / €   €27 000        NIE
hi_IN   India           INR / ₹   ₹350 000       Aadhaar ####-####-####
ja_JP   Japan           JPY / ¥   ¥4 400 000     My Number
zh_CN   China           CNY / ¥   ¥90 000        Resident ID
ar_SA   Saudi Arabia    SAR       SAR 96 000     National ID
ko_KR   South Korea     KRW / ₩   ₩42 000 000    RRN
nl_NL   Netherlands     EUR / €   €42 000        BSN
it_IT   Italy           EUR / €   €29 000        Codice Fiscale
pl_PL   Poland          PLN       PLN 72 000     PESEL
tr_TR   Turkey          TRY       TRY 720 000    TC Kimlik

Salary medians are sourced from OECD, World Bank, and ILO data (2023–24).
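
To see how a median from the table can parameterise a lognormal salary distribution, here is a minimal standard-library sketch. It is not Misata's implementation: the function name `sample_salaries` and the spread `sigma=0.5` are assumptions made for illustration. The only fact it relies on is that a lognormal distribution with log-mean `mu` has median `exp(mu)`.

```python
import math
import random
import statistics


def sample_salaries(median, sigma=0.5, n=10_000, seed=0):
    """Draw n salaries from a lognormal distribution with the given median.

    sigma (the spread in log space) is an illustrative assumption,
    not a value taken from Misata.
    """
    rng = random.Random(seed)
    mu = math.log(median)  # median of a lognormal is exp(mu)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


salaries = sample_salaries(45_000)  # de_DE median from the table
print(round(statistics.median(salaries)))  # close to 45000, not exactly equal
```

The sample median lands near the target, while the mean sits higher because the distribution is right-skewed, which is the usual shape of salary data.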

Inspect a locale pack

pack = misata.get_locale_pack("de_DE")

print(pack.salary_median)        # 45000
print(pack.currency_symbol)      # €
print(pack.top_cities[:3])       # ['Berlin', 'Hamburg', 'Munich']
print(pack.company_suffixes)     # ['GmbH', 'AG', 'UG', 'KG', 'e.K.']
print(pack.postcode_pattern)     # \d{5}
print(pack.national_id_label)    # Steuer-IdNr
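
The postcode pattern shown above looks like a regular expression, so one plausible use is checking a generated column against it. The sketch below assumes the pattern is a Python-compatible regex meant to match the whole value. It hard-codes the `de_DE` pattern rather than calling Misata, and the helper `invalid_postcodes` is hypothetical.

```python
import re

# Assumed to equal pack.postcode_pattern for de_DE, as printed above.
postcode_pattern = r"\d{5}"


def invalid_postcodes(values, pattern):
    """Return the values that do not fully match the given regex pattern."""
    compiled = re.compile(pattern)
    return [v for v in values if not compiled.fullmatch(v)]


sample = ["10115", "80331", "2095", "SW1A 1AA"]
bad = invalid_postcodes(sample, postcode_pattern)
print(bad)  # ['2095', 'SW1A 1AA']
```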

Detect from a story

locale = misata.detect_locale("South Korean company in Seoul with KRW salaries")
# → "ko_KR"

locale = misata.detect_locale("A generic SaaS company")
# → "en_US"  (default)
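
For intuition only, the behaviour above (geographic signal in, locale code out, `en_US` as the default) can be approximated with a keyword lookup. Misata's actual detection rules are not documented here, so the `LOCALE_HINTS` table and the `guess_locale` function below are hypothetical stand-ins, not the library's API.

```python
# Hypothetical keyword table; Misata's real detection logic may differ.
LOCALE_HINTS = {
    "ko_KR": ("south korea", "seoul", "krw", "₩"),
    "de_DE": ("german", "berlin", "gmbh"),
    "pt_BR": ("brazil", "r$", "cpf"),
    "hi_IN": ("india", "bangalore", "aadhaar", "₹"),
}
DEFAULT_LOCALE = "en_US"


def guess_locale(story):
    """Return the first locale whose hints appear in the story, else the default."""
    text = story.lower()
    for locale, hints in LOCALE_HINTS.items():
        if any(hint in text for hint in hints):
            return locale
    return DEFAULT_LOCALE


print(guess_locale("South Korean company in Seoul with KRW salaries"))  # ko_KR
print(guess_locale("A generic SaaS company"))                           # en_US
```

A substring lookup like this is order-dependent and easy to fool, which is one reason to pass `locale=` explicitly when the story is ambiguous.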

What locale affects

  • Names — Faker locale pool (de_DE Faker generates German names, ja_JP generates Japanese names)
  • Salary & age distributions — lognormal/normal priors from national statistics replace the en_US defaults
  • Postcodes — pattern-generated to match the country format (e.g. 5 digits for DE, SW1A 1AA format for GB)
  • National IDs — pattern-generated to match country format (CPF, SSN, Aadhaar, etc.)
  • Company suffixes — GmbH/AG for Germany, S.A./SARL for France, Ltd/PLC for UK
  • Phone prefixes — country dialling code prepended
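
As an illustration of what "pattern-generated" can mean for national IDs, the sketch below fills the `#` and `A` placeholders used in the locale table with random digits and uppercase letters. The helper `fill_pattern` is hypothetical and not part of Misata. It reproduces only the shape of an ID and does not compute check digits, which real schemes such as CPF include.

```python
import random
import string


def fill_pattern(pattern, rng):
    """Replace '#' with a random digit and 'A' with a random uppercase letter.

    All other characters (dots, dashes, spaces) are copied through unchanged.
    """
    out = []
    for ch in pattern:
        if ch == "#":
            out.append(rng.choice(string.digits))
        elif ch == "A":
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(42)  # seeded so runs are repeatable
cpf = fill_pattern("###.###.###-##", rng)  # pt_BR shape
ssn = fill_pattern("###-##-####", rng)     # en_US shape
nin = fill_pattern("AA######A", rng)       # en_GB shape
print(cpf, ssn, nin)
```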

Asset-backed vocabulary takes priority

If you have ingested Kaggle vocabulary assets for name columns, those always take priority over locale-based Faker names. For those columns the locale is only a fallback; it never overrides the ingested vocabulary.