Czech IPA Transcription - Help
Table of Contents
- About This Tool
- Quick Reference (Glossary)
- How to Read IPA Symbols
- Czech Pronunciation Guide
- Comprehensive Spelling-to-IPA Mapping
- Implementation Details (for Developers)
- Common Issues & Limitations
About This Tool
This Czech transcription app uses a modified version of the Wiktionary Czech Pronunciation Module running inside the browser via Wasmoon (Lua 5.4 WebAssembly). It converts standard Czech orthography into the International Phonetic Alphabet (IPA).
The system uses a comprehensive Wiktionary data dump as a lexicon to first retrieve phonemic transcriptions from the dictionary. When a word is not found in the lexicon, it falls back to generating transcriptions using the pronunciation module's rule-based approach. This combined strategy provides both high accuracy for common words and broad coverage through rule-based generation.
Dialects & Limitations
- Supported: Standard Czech (spisovná čeština) pronunciation, based on the Bohemian standard.
- Not supported: Regional variants (Moravian, Silesian) and Common Czech (obecná čeština) features such as ej for ý (dobrej for dobrý).
- Edge cases: Very recent loanwords, extremely rare compound words, and certain proper names may not be found in the lexicon. The rule-based fallback handles most of these, but some foreign words may produce approximate results.
Phonemic vs Phonetic
- Phonemic (with dictionary support): A simplified, broad transcription that shows only the sounds that are essential for distinguishing meaning (phonemes). It represents the abstract sound system of the language. Uses Wiktionary lexicon when available, falls back to rules.
Note: Unlike the German module in this tool, the Czech module currently supports only phonemic transcription. No phonetic (narrow) transcription with allophonic detail is available.
Quick Reference (Glossary)
- Phoneme
- The smallest unit of sound in a language that distinguishes meaning (e.g., /p/ vs. /b/ in Czech pas vs. bas).
- Allophone
- A variant pronunciation of a phoneme that doesn't change meaning (e.g., English aspirated [pʰ] in pin vs. unaspirated [p] in spin).
- Affricate
- A complex consonant beginning as a stop and releasing as a fricative with the same place of articulation, e.g., /t͡s/ (c), /t͡ʃ/ (č), /d͡z/ (dz), /d͡ʒ/ (dž).
- Voicing Assimilation
- A phonological process where an obstruent changes its voicing to match the following obstruent. In Czech, this is regressive: the voicing of the last obstruent in a cluster determines the voicing of the entire cluster. E.g., odpad /ˈotpat/: the d devoices before p.
- Final Devoicing
- The process where voiced obstruents become voiceless at the end of a word, e.g., led /lɛt/, mříž /mr̝iːʃ/.
- Raised Alveolar Trill (Ř)
- A unique Czech sound /r̝/ — an alveolar trill with simultaneous frication. It has a voiceless variant /r̝̊/ that appears adjacent to voiceless consonants. Written as ř in Czech orthography.
- Syllabic Consonant
- A consonant acting as a syllable nucleus (without a vowel), marked with the syllabic mark below it: /n̩/, /r̩/, /l̩/, /m̩/. E.g., prsten /ˈpr̩stɛn/.
- Diphthong
- A gliding vowel sound where the tongue changes position during articulation. Czech has three: /ou̯/, /au̯/, /ɛu̯/.
- Palatalization
- The modification of consonant articulation by raising the tongue towards the hard palate. In Czech, consonants d, t, n are palatalized before ě, i, and í.
- Glottal Stop
- A sound made by closing the glottis, represented as ʔ. In Czech it often occurs before word-initial vowels; this tool inserts it between the prepositions v/z and a following vowel-initial word.
- Sonorant
- A consonant produced with continuous, non-turbulent airflow through the mouth or nose (m, n, ɲ, r, l, j). Sonorants do not participate in voicing assimilation.
- Obstruent
- A consonant produced by obstructing airflow (stops, fricatives, affricates). Obstruents are subject to voicing assimilation in Czech.
- Syllable Boundary
- A marker (.) showing where syllables divide, affecting pronunciation and stress assignment.
- Geminate
- A doubled consonant sound. Czech simplifies geminates in pronunciation: oddych has /d/, not /dd/.
- Tie Bar
- The combining tie bar (◌͡◌) used in IPA to indicate affricates, showing that two segments form a single phonological unit, e.g., /t͡s/, /t͡ʃ/.
- Length Mark
- The mark ː in IPA indicating a long vowel, e.g., /aː/, /iː/, /uː/ vs. short /a/, /ɪ/, /u/.
- Nonsyllabic Mark
- The inverted breve below ̯ indicating that a vowel in a diphthong is not the syllable nucleus, e.g., /ou̯/, /au̯/.
- Nasal Assimilation
- When n precedes a velar stop (k, g), it assimilates to the velar nasal /ŋ/, e.g., banka /ˈbaŋka/.
- Onset
- The initial consonant(s) of a syllable (e.g., "p" in "pat"). Czech allows complex onsets like /pr/, /str/, /sk/.
- Coda
- The final consonant(s) of a syllable (e.g., "t" in "pat"). Word-final codas are subject to final devoicing.
How to Read IPA Symbols
This table helps you understand the IPA symbols used in Czech transcriptions. For each symbol, we provide approximate English equivalents where possible.
Vowel Symbols
Czech has 5 short vowels, 5 long vowels, and 3 diphthongs. Vowel length is phonemic — it distinguishes meaning.
| IPA | Example | English Approximation | Notes |
|---|---|---|---|
| /a/ | pas | "father" (short) | Short open central vowel |
| /aː/ | pás | "father" (long) | Long, written as á |
| /ɛ/ | den | "bed" | Short open-mid front vowel |
| /ɛː/ | réva | "bear" (without r) | Long, written as é |
| /ɪ/ | byl | "sit" | Short, written as i or y |
| /iː/ | bílý | "see" (without glide) | Long, written as í or ý |
| /o/ | rok | "or" (short) | Short close-mid back rounded |
| /oː/ | móda | "go" (without glide) | Long, written as ó |
| /u/ | ruka | "put" | Short close back rounded |
| /uː/ | růst | "food" (without glide) | Long, written as ů or ú |
Consonant Symbols
| IPA | Example | English Approximation | Notes |
|---|---|---|---|
| /r̝/ | řeka | No English equivalent | Raised alveolar trill — unique Czech sound, trill + frication |
| /r̝̊/ | příklad (after p) | No English equivalent | Voiceless raised trill, appears next to voiceless consonants |
| /r/ | rok | Spanish "r" (rolled) | Alveolar trill |
| /c/ | ťukat | No English equivalent | Voiceless palatal stop, written as ť |
| /ɟ/ | ďábel | "dune" (softer) | Voiced palatal stop, written as ď |
| /ɲ/ | ňadra | "canyon" (first n) | Palatal nasal, written as ň |
| /ʃ/ | šest | "shoe" | Voiceless postalveolar fricative |
| /ʒ/ | život | "measure" | Voiced postalveolar fricative |
| /t͡s/ | cena | "cats" | Voiceless alveolar affricate, written as c |
| /t͡ʃ/ | čas | "church" | Voiceless postalveolar affricate, written as č |
| /d͡z/ | leckdo | "adze" | Voiced alveolar affricate, rare; mostly c voiced before a voiced obstruent, as in leckdo /ˈlɛd͡zɡdo/ |
| /d͡ʒ/ | džbán | "judge" | Voiced postalveolar affricate, rare |
| /x/ | chata | Scottish "loch" | Voiceless velar fricative, written as ch |
| /ɦ/ | hora | "behind" (breathy) | Voiced glottal fricative, written as h |
| /ŋ/ | banka | "sing" | Velar nasal (allophonic, before k/g) |
| /ʔ/ | v autě | Cockney "bottle" | Glottal stop, inserted between preposition v/z and a following vowel |
Diacritical Marks
| Symbol | Name | Meaning | Example |
|---|---|---|---|
| ˈ | Primary stress | Main emphasis in word (always on first syllable) | /ˈpr̩stɛn/ (ring) |
| ː | Length mark | Vowel is long | /aː/ vs /a/ |
| ̩ | Syllabic | Consonant is syllable nucleus | /r̩/ in prsten |
| ̯ | Non-syllabic | Glide (part of diphthong) | /ou̯/, /au̯/ |
| ͡ | Tie bar | Affricate (single unit) | /t͡s/, /t͡ʃ/ |
| ̝ | Raised | Trill with frication (ř sound) | /r̝/, voiceless /r̝̊/ |
| ̥ | Voiceless | Normally voiced sound devoiced; written as a ring above (as in r̝̊) when a mark already sits below the letter | /r̝̊/ next to voiceless consonants |
Interactive Features
- Click words to cycle variants: Some words have multiple valid pronunciations (e.g., different pronunciations listed in the Wiktionary lexicon). Click any transcribed word to see alternative IPA forms if available.
- Audio playback: Click the speaker icon next to any word or line to hear text-to-speech pronunciation using the Czech (cs-CZ) voice (requires browser TTS support).
- Export results: Use PDF or CSV buttons to save transcriptions in your preferred format.
Multiple Pronunciation Variants
For some Czech words, the system produces multiple valid transcriptions:
- Lexicon entries with variants: Some words in the Wiktionary data dump have more than one listed pronunciation (e.g., words where both a colloquial and standard form exist).
- Lexicon vs. generated: Dictionary lookup and rule-based generation may produce slightly different results. The lexicon form is preferred when available.
When multiple variants exist, click the word to cycle through them. The currently selected variant will be used for PDF/CSV export.
Czech Pronunciation Guide
This guide explains the fundamental rules of Czech pronunciation. Czech spelling corresponds to sound very regularly; its orthography is one of the most phonemic in Europe. Once you learn the rules, you can reliably predict pronunciation from spelling.
Vowel Length & Quality
Czech has 5 vowel qualities, each with a short and long version. Vowel length is phonemic — it distinguishes meaning.
Short Vowels
- a /a/ — open central, like "father": pas /pas/
- e /ɛ/ — open-mid front, like "bed": den /dɛn/
- i/y /ɪ/ — near-close front, like "sit": byl /bɪl/
- o /o/ — close-mid back rounded, like "or": rok /rok/
- u /u/ — close back rounded, like "put": ruka /ruka/
Long Vowels
Long vowels are written with an acute accent (á, é, í, ý, ó, ú) or a ring (ů). They are roughly twice as long as short vowels and are distinct phonemes:
- á /aː/ — pás (belt) vs. pas (passport)
- é /ɛː/ (common in native words and endings): réva (vine), mléko (milk)
- í/ý /iː/ — bílý (white) vs. byl (was)
- ó /oː/ — rare, mostly in loanwords: móda (fashion)
- ú/ů /uː/ — růst (growth) vs. ruský (Russian)
Minimal Pairs: When Length Changes Meaning
| Short | IPA | Meaning | Long | IPA | Meaning |
|---|---|---|---|---|---|
| pas | /pas/ | passport | pás | /paːs/ | belt |
| dal | /dal/ | gave | dál | /daːl/ | farther |
| dráha | /ˈdraːɦa/ | track | drahá | /ˈdraɦaː/ | expensive (fem.) |
| byt | /bɪt/ | apartment | být | /biːt/ | to be |
Soft Consonants (ď, ť, ň) & ě
The letters ď, ť, and ň represent palatal consonants — articulated with the tongue raised toward the hard palate.
The Soft Consonants
- ď /ɟ/ — voiced palatal stop, like a "soft d": ďábel /ˈɟaːbɛl/
- ť /c/ — voiceless palatal stop, like a "soft t": ťukat /ˈcukat/
- ň /ɲ/ — palatal nasal, like "ny" in "canyon": kůň /kuːɲ/
The Letter ě
The letter ě is not a separate vowel — it signals that the preceding consonant is palatalized:
- dě, tě, ně: the consonant becomes palatal, e.g., děti /ˈɟɛcɪ/ (children), tělo /ˈcɛlo/ (body), něco /ˈɲɛt͡so/ (something)
- mě: a special case, pronounced /mɲɛ/, e.g., město /ˈmɲɛsto/ (city), jmění /ˈjmɲɛɲiː/ (property)
- bě, pě, vě, fě: pronounced with /jɛ/, e.g., běh /bjɛx/ (run), věc /vjɛt͡s/ (thing)
The Ř Sound (Raised Alveolar Trill)
The letter ř represents a sound unique to Czech — a raised alveolar trill /r̝/, which is an alveolar trill produced with simultaneous frication (a "buzzing r").
Pronunciation
- Voiced ř /r̝/: Between vowels or next to voiced consonants. Try saying "r" and "zh" simultaneously: řeka /ˈr̝ɛka/ (river), moře /ˈmor̝ɛ/ (sea).
- Voiceless ř /r̝̊/: Next to voiceless consonants (p, t, k, f, s, š, ch, c, č). The voicing is removed: příklad /ˈpr̝̊iːklat/ (example), tři /tr̝̊ɪ/ (three).
When Does ř Devoice?
The rule is simple: ř becomes voiceless /r̝̊/ when it stands next to a voiceless consonant (p, t, k, f, s, š, ch, c, č). In all other positions, it stays voiced.
| Context | Result | Example |
|---|---|---|
| Before a voiceless consonant | /r̝̊/ | hořký /ˈɦor̝̊kiː/ (bitter) |
| After a voiceless consonant | /r̝̊/ | tři /tr̝̊ɪ/, přes /pr̝̊ɛs/, příklad /ˈpr̝̊iːklat/ |
| Between vowels or next to voiced consonants | /r̝/ | řeka /ˈr̝ɛka/, dřevo /ˈdr̝ɛvo/ |
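The table reduces to a check of ř's immediate neighbours. The Python sketch below illustrates that check only; it is not the engine's Lua code. The input format (a list of IPA segments), the name realize_r_caron and the inclusion of ť /c/ among the voiceless triggers are assumptions, and positions the table does not cover (such as the end of a word) are not handled.

```python
R_VOICED = "r\u031d"            # r̝
R_VOICELESS = "r\u031d\u030a"   # r̝̊
# Voiceless consonants from the list above, plus ť /c/ (an assumption).
VOICELESS = {"p", "t", "c", "k", "f", "s", "\u0283", "x",
             "t\u0361s", "t\u0361\u0283"}  # ʃ, t͡s, t͡ʃ


def realize_r_caron(segments):
    """Return a copy of `segments` in which every /r̝/ that has a voiceless
    consonant immediately before or after it becomes /r̝̊/."""
    out = list(segments)
    for i, seg in enumerate(segments):
        if seg != R_VOICED:
            continue
        prev = segments[i - 1] if i > 0 else None
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if prev in VOICELESS or nxt in VOICELESS:
            out[i] = R_VOICELESS
    return out


# tři: t + r̝ + ɪ, where the preceding t devoices ř
print("".join(realize_r_caron(["t", R_VOICED, "\u026a"])))
```

For řeka or dřevo the function leaves /r̝/ unchanged, because neither neighbour is in the voiceless set.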
CH and H Sounds
Czech distinguishes two sounds that are often confused by learners:
- h /ɦ/ — voiced glottal fricative. A breathy, voiced "h", like the English "h" but with vocal cord vibration: hora /ˈɦora/ (mountain), aha /ˈaɦa/
- ch /x/ — voiceless velar fricative, like Scottish "loch" or German "Bach": chata /ˈxata/ (cottage), smích /smiːx/ (laughter)
These two are separate phonemes in Czech and distinguish meaning: hlad /ɦlat/ (hunger) vs. chlad /xlat/ (cold).
Voicing Assimilation
One of the most important Czech pronunciation rules: when two obstruents (stops, fricatives, affricates) stand next to each other, the first one changes its voicing to match the second one. This is called regressive assimilation.
Voiced–Voiceless Pairs
| Voiceless | Voiced |
|---|---|
| p | b |
| t | d |
| ť /c/ | ď /ɟ/ |
| k | ɡ |
| f | v |
| s | z |
| š /ʃ/ | ž /ʒ/ |
| ch /x/ | h /ɦ/ |
| c /t͡s/ | dz /d͡z/ |
| č /t͡ʃ/ | dž /d͡ʒ/ |
| voiceless ř /r̝̊/ | ř /r̝/ |
How It Works
- Regressive direction: The last obstruent in a cluster determines the voicing of all preceding obstruents. E.g., odpad → the d assimilates to p: /ˈotpat/
- Sonorants do not participate: m, n, ɲ, r, l, j neither trigger nor undergo assimilation, e.g., smutek /ˈsmutɛk/ keeps a voiceless s before m.
- v and ř do not trigger assimilation: these "weak" consonants do not change the voicing of a preceding consonant, e.g., svět /svjɛt/ keeps a voiceless s before v.
Examples
| Spelling | Underlying | Assimilated IPA | Explanation |
|---|---|---|---|
| odpad | /odpad/ | /ˈotpat/ | d→t before p (assim.); final d→t (devoicing) |
| bez tebe | /bɛz tɛbɛ/ | /bɛs tɛbɛ/ | z→s before t (two words) |
| lézt | /lɛːzt/ | /lɛːst/ | z→s before t (assimilation) |
| kde | /kdɛ/ | /ɡdɛ/ | k→ɡ before voiced d (assimilation) |
Final Devoicing
At the end of a word (before a word boundary), voiced obstruents become voiceless. This happens after voicing assimilation.
| Letter | Word-final IPA | Intervocalic IPA | Example |
|---|---|---|---|
| b | /p/ | /b/ | dub /dup/ (oak) |
| d | /t/ | /d/ | med /mɛt/ (honey) |
| ď | /c/ | /ɟ/ | loď /loc/ (ship) |
| g | /k/ | /ɡ/ | smog /smok/ (smog) |
| v | /f/ | /v/ | krev /krɛf/ (blood) |
| z | /s/ | /z/ | mráz /mraːs/ (frost) |
| ž | /ʃ/ | /ʒ/ | muž /muʃ/ (man) |
| h | /x/ | /ɦ/ | sníh /sɲiːx/ (snow) |
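As a rough illustration, final devoicing can be written as a right-to-left walk over the obstruents at the end of a word, using the voiced/voiceless pairs listed earlier. This is a Python sketch, not the engine's Lua code; it assumes the word is already a list of IPA segments and that voicing assimilation has already run. The names DEVOICE and final_devoicing are hypothetical, and ř is left to its own rule.

```python
# Voiced obstruent -> voiceless partner (affricates written with a tie bar).
DEVOICE = {
    "b": "p", "d": "t", "\u025f": "c", "\u0261": "k", "v": "f", "z": "s",
    "\u0292": "\u0283", "\u0266": "x",                     # ʒ -> ʃ, ɦ -> x
    "d\u0361z": "t\u0361s", "d\u0361\u0292": "t\u0361\u0283",
}


def final_devoicing(segments):
    """Devoice the run of voiced obstruents at the very end of a word."""
    out = list(segments)
    i = len(out) - 1
    # Walk left while we are still inside the word-final obstruent run.
    while i >= 0 and out[i] in DEVOICE:
        out[i] = DEVOICE[out[i]]
        i -= 1
    return out


print("".join(final_devoicing(["m", "r", "a\u02d0", "z"])))  # mraːs
```

Walking the whole final run (rather than only the last segment) means a cluster such as zd at the end of a word becomes st.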
Diphthongs (ou, au, eu)
Czech has exactly three diphthongs: gliding vowel sounds in which the tongue moves during articulation.
| Spelling | IPA | Description | Examples |
|---|---|---|---|
| ou | /ou̯/ | Like "ow" in "show" | doufat /ˈdou̯fat/ (to hope), mouka /ˈmou̯ka/ (flour) |
| au | /au̯/ | Like "ow" in "cow" | pauza /ˈpau̯za/ (pause), auto /ˈau̯to/ |
| eu | /ɛu̯/ | Like "e" in "bed" followed by "oo", in one syllable | euro /ˈɛu̯ro/, pseudonym /ˈpsɛu̯donɪm/ |
Note: The combination ou is the most common Czech diphthong. The combinations au and eu appear mostly in loanwords. Other vowel sequences (like ea, ia) are not diphthongs — they belong to separate syllables.
Syllabic Consonants (m, n, r, l)
In certain positions, the consonants m, n, r, and l can function as the nucleus of a syllable — replacing a vowel. This is marked with a syllabic mark ̩ below the consonant.
When Does This Happen?
- Between a consonant and an obstruent (stop, fricative, affricate): prsten /ˈpr̩stɛn/ (ring), vlk /vl̩k/ (wolf)
- Word-finally after an obstruent: sedm /ˈsɛdm̩/ (seven), osm /ˈosm̩/ (eight)
Common Examples
| Word | IPA | Syllabic Consonant |
|---|---|---|
| prsten | /ˈpr̩stɛn/ | /r̩/ |
| vlk | /vl̩k/ | /l̩/ |
| sedm | /ˈsɛdm̩/ | /m̩/ |
| osm | /ˈosm̩/ | /m̩/ |
| trh | /tr̩x/ | /r̩/ |
Note: Syllabic ɲ and j do not occur — only m, n, r, and l can be syllabic.
Stress Patterns
Czech stress is fixed on the first syllable of each word. This is much simpler than languages like English or Russian, where stress is unpredictable.
Basic Rules
- Simple words: Stress on the first syllable: mléko /ˈmlɛːko/, počítač /ˈpot͡ʃiːtat͡ʃ/
- Compound words: Each component may carry secondary stress on its first syllable, but the primary stress falls on the first syllable of the whole word.
- Prefixes: The stress falls on the prefix: nej- in nejlepší /ˈnɛjlɛpʃiː/ (best).
No Stress Shift
Unlike Russian or Polish, Czech stress never moves to a different syllable. It is always on the first syllable regardless of word length, prefixes, or suffixes.
Y/Ý Merger with I/Í
In modern Czech, the letters y/ý and i/í represent the exact same sounds:
- y, ý = /ɪ/, /iː/
- i, í = /ɪ/, /iː/
The distinction between y and i is almost entirely orthographic: it reflects historical pronunciation differences that no longer exist in standard Czech. The spelling rules are:
- After h, ch, k, r: Written as y/ý (chyba, kytka, ryba)
- After ž, š, č, ř, c, j: Written as i/í (žito, čistý, cíl)
- After b, f, l, m, p, s, v, z: Either, depending on the word; the words spelled with y are memorized as the vyjmenovaná slova ("listed words"), e.g., byl (was) vs. bil (beat)
- After d, t, n: Written as y/ý for plain /d/, /t/, /n/ and as i/í for palatal /ɟ/, /c/, /ɲ/ in native words, e.g., ty /tɪ/ (you) vs. ti /cɪ/ (those)
Apart from dy/ty/ny vs. di/ti/ni, the y/i distinction is important for spelling but has no effect on pronunciation in standard Czech.
Comprehensive Spelling-to-IPA Mapping
This section provides a complete mapping of Czech spelling (orthography) to pronunciation (IPA). These tables reflect the actual rules used by our transcription engine.
1. Vowels (Monophthongs)
| Letter | IPA (Short) | Example (Short) | IPA (Long) | Example (Long) |
|---|---|---|---|---|
| a | /a/ | pas /pas/ | /aː/ | pás /paːs/ |
| e | /ɛ/ | den /dɛn/ | /ɛː/ | réva /ˈrɛːva/ |
| i / y | /ɪ/ | byl /bɪl/ | /iː/ | bílý /ˈbiːliː/ |
| o | /o/ | rok /rok/ | /oː/ | móda /ˈmoːda/ |
| u | /u/ | ruka /ˈruka/ | /uː/ | růst /ruːst/ |
2. Diphthongs
| Spelling | IPA | Context / Rule | Example |
|---|---|---|---|
| ou | /ou̯/ | Always | doufat /ˈdou̯fat/ |
| au | /au̯/ | Always | pauza /ˈpau̯za/ |
| eu | /ɛu̯/ | Always | euro /ˈɛu̯ro/ |
3. Simple Consonants (One-to-One)
| Letter | IPA | Example |
|---|---|---|
| b | /b/ | balit /ˈbalɪt/ |
| d | /d/ | dům /duːm/ |
| f | /f/ | fakt /fakt/ |
| g | /ɡ/ | garáž /ˈɡaraːʃ/ |
| k | /k/ | kolo /ˈkolo/ |
| l | /l/ | les /lɛs/ |
| m | /m/ | most /most/ |
| n | /n/ | nos /nos/ |
| p | /p/ | pivo /ˈpɪvo/ |
| r | /r/ | rok /rok/ |
| s | /s/ | sen /sɛn/ |
| t | /t/ | tok /tok/ |
| v | /v/ | vidět /ˈvɪɟɛt/ |
| z | /z/ | zima /ˈzɪma/ |
| j | /j/ | jen /jɛn/ |
4. Complex Consonants & Digraphs
| Spelling | IPA | Context / Rule | Example |
|---|---|---|---|
| c | /t͡s/ | Always | cena /ˈt͡sɛna/ |
| č | /t͡ʃ/ | Always | čas /t͡ʃas/ |
| ch | /x/ | Always (single phoneme) | chata /ˈxata/ |
| š | /ʃ/ | Always | šest /ʃɛst/ |
| ž | /ʒ/ | Always | žena /ˈʒɛna/ |
| ď | /ɟ/ | Always | ďábel /ˈɟaːbɛl/ |
| ť | /c/ | Always | ťukat /ˈcukat/ |
| ň | /ɲ/ | Always | kůň /kuːɲ/ |
| ř | /r̝/ | Always (voiceless /r̝̊/ near voiceless cons.) | řeka /ˈr̝ɛka/, tři /tr̝̊ɪ/ |
| h | /ɦ/ | Always | hora /ˈɦora/ |
5. Context-Dependent Rules
| Spelling | IPA | Condition | Example |
|---|---|---|---|
| ě | /jɛ/ | After b, p, v, f | věc /vjɛt͡s/ |
| ě | /ɲɛ/ | After m (mě is read as mňe) | město /ˈmɲɛsto/ |
| ě | /ɛ/, preceding consonant palatalized | After d, t, n | děti /ˈɟɛcɪ/ |
| w | /v/ | Only in loanwords and proper names | Wagner /ˈvaɡnɛr/ |
| x | /ks/ | Only in loanwords | taxi /ˈtaksɪ/ |
| q | /k/ | Only in loanwords (very rare) | quasi /ˈkuasɪ/ (rule-based output) |
6. Special Rules
| Rule | Input | Output | Example |
|---|---|---|---|
| ex- before vowel | exaktní | /ˈɛɡzaktɲiː/ | ex → egz; ni → ɲi (palatalization) |
| exh- | exhibice | /ˈɛɡzɦɪbɪt͡sɛ/ | exh → egzh |
| i/y + vowel (word-initial) | iámbický | /ˈjaːmbɪt͡skiː/ | i → j before vowel |
| i/y + vowel (mid-word) | piáno | /ˈpɪjaːno/ | Interpolated /j/ |
| v/z before consonant | v Praze | /ˈfprazɛ/ | v→f before voiceless p; the preposition is pronounced as part of the next word |
| v/z before vowel | v autě | /ˈfʔau̯cɛ/ | v→f before ʔ; glottal stop inserted; tě→cɛ |
| n before k/g | banka | /ˈbaŋka/ | n → ŋ (nasal assimilation) |
Implementation Details (for Developers)
The Czech transcription engine is implemented as a Lua module (cs-pron_wasm.lua), a modified version of the Wiktionary Czech Pronunciation Module. It runs in the browser via Wasmoon (Lua 5.4 WebAssembly).
Processing Pipeline
Text goes through the following stages in order:
1. Preprocessing
- Text is lowercased.
- Commas and dashes are converted to IPA foot boundaries |.
- Punctuation marks (?, !) in the middle of a sentence are treated as foot boundaries.
- Spaces are normalized; word boundaries (#) and foot boundaries (##) are inserted.
- Hyphens are converted to spaces (hyphenated words are treated as separate words).
- Double n (nn) is simplified to a single n.
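A rough Python approximation of this stage follows. It is not the module's Lua code: the function name, the exact punctuation handled and the boundary layout (each word wrapped in #, so the edges of a foot read ##, with feet separated by |) are assumptions made for illustration.

```python
import re


def preprocess(text):
    """Approximate stage 1: lowercase, then mark foot and word boundaries."""
    text = text.lower()
    # Commas, en/em dashes, and ? or ! followed by more text become foot boundaries.
    text = re.sub(r"[,\u2013\u2014]|[?!](?=\s*\S)", "|", text)
    # Remaining (sentence-final) punctuation is dropped.
    text = re.sub(r"[.?!]", "", text)
    # Hyphenated words are treated as separate words.
    text = text.replace("-", " ")
    # Double n is simplified to a single n.
    text = text.replace("nn", "n")
    feet = []
    for foot in text.split("|"):
        words = foot.split()  # split() also collapses runs of whitespace
        if words:
            # Every word gets # on both sides; the extra # marks the foot edges.
            feet.append("#" + " ".join(f"#{w}#" for w in words) + "#")
    return " | ".join(feet)


print(preprocess("Ahoj, jak se máš?"))  # ##ahoj## | ##jak# #se# #máš##
```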
2. Palatalization
Before ě, i, and í, the consonants d, t, n are converted to their palatal equivalents using Unicode combining caron:
| Input | Output | Example |
|---|---|---|
| dě, tě, ně | ď+e, ť+e, ň+e | děti → ďeti → /ˈɟɛcɪ/ |
| di, ti, ni (also dí, tí, ní) | ďi, ťi, ňi | ticho → ťicho → /ˈcɪxo/ |
| mě | m+ň+e | město → mňesto → /ˈmɲɛsto/ |
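These rows can be sketched with two regular expressions and Unicode normalization. This Python version is an illustration, not the Lua source, and palatalize is a hypothetical name. It adds the combining caron (U+030C) and then applies NFC normalization, which composes d, t, n plus caron into the precomposed letters ď, ť, ň.

```python
import re
import unicodedata

CARON = "\u030c"  # combining caron


def palatalize(word):
    """Stage 2 sketch: palatalize d, t, n before ě, i, í; rewrite mě as mňe."""
    word = unicodedata.normalize("NFC", word)
    # mě -> m + ň + e
    word = word.replace("m\u011b", "mn" + CARON + "e")
    # dě, tě, ně -> ď+e, ť+e, ň+e
    word = re.sub("([dtn])\u011b", r"\1" + CARON + "e", word)
    # di/dí, ti/tí, ni/ní -> the consonant gets a caron, the vowel stays
    word = re.sub("([dtn])(?=[i\u00ed])", r"\1" + CARON, word)
    # NFC composes d/t/n + caron into the single characters ď, ť, ň
    return unicodedata.normalize("NFC", word)


for w in ["děti", "město", "ticho"]:
    print(w, "->", palatalize(w))
```

The loop prints ďeťi, mňesto and ťicho. Note that the rule applies to every d, t, n in these contexts; it has no knowledge of word origin.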
3. Special Prefixes
| Rule | Input | Output |
|---|---|---|
| ex- before vowel | exaktní | egzaktní → /ˈɛɡzaktɲiː/ |
| exh- | exhibice | egzhibice → /ˈɛɡzɦɪbɪt͡sɛ/ |
| i/y + vowel (word-initial) | iámbický | jámbický → /ˈjaːmbɪt͡skiː/ |
| i/y + vowel (mid-word) | piáno | piáno → pi+j+áno → /ˈpɪjaːno/ |
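The four rows translate directly into anchored regular expressions. The Python sketch below follows the table as written; the name special_prefixes and the vowel set are assumptions, and it is not the module's Lua code.

```python
import re
import unicodedata

VOWELS = unicodedata.normalize("NFC", "aáeéěiíoóuúůyý")


def special_prefixes(word):
    """Stage 3 sketch for one lowercase word, following the table above."""
    word = unicodedata.normalize("NFC", word)
    word = re.sub(r"^exh", "egzh", word)                # exhibice -> egzhibice
    word = re.sub(rf"^ex(?=[{VOWELS}])", "egz", word)   # exaktní -> egzaktní
    word = re.sub(rf"^[iy](?=[{VOWELS}])", "j", word)   # iámbický -> jámbický
    # i/y after a consonant and before a vowel: insert j (piáno -> pijáno)
    word = re.sub(rf"(?<=[^{VOWELS}])([iy])(?=[{VOWELS}])", r"\1j", word)
    return word


print(special_prefixes("piáno"))  # pijáno
```

The word-initial rules run before the mid-word rule, so an initial i that has already become j is not given an extra j.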
4. Character Substitution
Each Czech letter (or digraph) is replaced with its IPA equivalent using a lookup table. This is the core orthography-to-phoneme conversion:
| Input | IPA | Input | IPA |
|---|---|---|---|
| á | aː | c | t͡s |
| č | t͡ʃ | ď | ɟ |
| e | ɛ | é | ɛː |
| ě | jɛ | g | ɡ |
| h | ɦ | i | ɪ |
| í | iː | ň | ɲ |
| ó | oː | q | k |
| ř | r̝ | š | ʃ |
| ť | c | ú | uː |
| ů | uː | w | v |
| x | ks | y | ɪ |
| ý | iː | ž | ʒ |
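A lookup table like this is usually applied in a single left-to-right pass. The Python sketch below is illustrative only: the name substitute is hypothetical, and the "ch" → "x" entry is an assumption, because the table above does not list ch although the guide maps it to /x/. Letters not in the table pass through unchanged.

```python
import re
import unicodedata

# Pairs from the table above; "ch" -> "x" is an added assumption.
SUBST = {
    "á": "aː", "c": "t\u0361s", "č": "t\u0361\u0283", "ď": "ɟ",
    "e": "ɛ", "é": "ɛː", "ě": "jɛ", "g": "\u0261", "h": "ɦ", "i": "ɪ",
    "í": "iː", "ň": "ɲ", "ó": "oː", "q": "k", "ř": "r\u031d", "š": "ʃ",
    "ť": "c", "ú": "uː", "ů": "uː", "w": "v", "x": "ks", "y": "ɪ",
    "ý": "iː", "ž": "ʒ", "ch": "x",
}


def substitute(word):
    """Stage 4 sketch: replace letters and digraphs with IPA in one pass."""
    word = unicodedata.normalize("NFC", word)
    # "ch" is tried before any single character, and the output is never
    # re-scanned, so the c produced for ť is not turned into t͡s afterwards.
    return re.sub(r"ch|.", lambda m: SUBST.get(m.group(0), m.group(0)), word)


print(substitute("chata"))  # xata
```

A chain of sequential str.replace calls would be fragile here: replacing ť with c before handling c would wrongly yield t͡s.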
5. Multiple-to-Single Encoding
Multi-character IPA symbols (affricates, ř) are temporarily encoded as single characters to simplify pattern matching during voicing assimilation and other rules:
| IPA | Encoded As | Reason |
|---|---|---|
| t͡s | ʦ | Single-char for pattern matching |
| t͡ʃ | ʧ | Single-char for pattern matching |
| d͡z | ʣ | Single-char for pattern matching |
| d͡ʒ | ʤ | Single-char for pattern matching |
| r̝ | ř | Single-char for pattern matching |
| r̝̊ | ṙ | Single-char for pattern matching |
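A reversible mapping like this can be sketched as a pair of dictionaries. The Python below uses the code points shown in the table (ʦ U+02A6, ʧ U+02A7, ʣ U+02A3, ʤ U+02A4, ř U+0159, ṙ U+1E59); the function names are hypothetical, and this is not the module's Lua code.

```python
# Multi-character IPA -> single placeholder, as listed in the table.
ENCODE = {
    "t\u0361s": "\u02a6",        # t͡s -> ʦ
    "t\u0361\u0283": "\u02a7",   # t͡ʃ -> ʧ
    "d\u0361z": "\u02a3",        # d͡z -> ʣ
    "d\u0361\u0292": "\u02a4",   # d͡ʒ -> ʤ
    "r\u031d\u030a": "\u1e59",   # r̝̊ -> ṙ (must be replaced before r̝)
    "r\u031d": "\u0159",         # r̝ -> ř
}
DECODE = {single: multi for multi, single in ENCODE.items()}


def encode(ipa):
    """Replace each multi-character symbol with its placeholder."""
    for multi, single in ENCODE.items():  # dicts keep insertion order
        ipa = ipa.replace(multi, single)
    return ipa


def decode(ipa):
    """Expand placeholders back into proper IPA sequences."""
    return "".join(DECODE.get(ch, ch) for ch in ipa)


print(decode(encode("pr\u031d\u030ai\u02d0klat")))  # round-trips to prʼ̊iːklat's original form
```

Ordering matters in encode: r̝̊ begins with r̝, so it has to be replaced first, or it would be left as ř plus a stray ring.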
6. Consonantal Prepositions
The prepositions v and z are handled specially at word boundaries:
| Context | Rule | Example |
|---|---|---|
| Before a vowel | Glottal stop ʔ is inserted | v autě → fʔau̯cɛ |
| Before a consonant | The preposition is joined to the next word and assimilates in voicing to it | v Praze → fprazɛ (v→f before p) |
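A minimal Python sketch of the joining step, assuming it runs on a list of already-substituted words; the names attach_prepositions and IPA_VOWELS are hypothetical. It only glues v/z to the next word and inserts ʔ before a vowel; the change of v to f described in the table is left to the later voicing-assimilation stage.

```python
IPA_VOWELS = set("aɛɪiou")  # first characters of Czech vowels and diphthongs
GLOTTAL_STOP = "\u0294"     # ʔ


def attach_prepositions(words):
    """Join a bare v or z to the following word, inserting ʔ before a vowel."""
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in ("v", "z") and i + 1 < len(words):
            nxt = words[i + 1]
            glue = GLOTTAL_STOP if nxt[0] in IPA_VOWELS else ""
            out.append(word + glue + nxt)
            i += 2  # the following word has been consumed
        else:
            out.append(word)
            i += 1
    return out


print(attach_prepositions(["v", "auto"]))  # ['vʔauto']
```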
7. Voicing Assimilation
Regressive assimilation: the last obstruent in a cluster determines the voicing of all preceding obstruents. Sonorants (m, n, ɲ, r, l, j) neither trigger nor undergo assimilation. The consonants v and ř do not trigger assimilation.
| Input | Output | Rule |
|---|---|---|
| odpad | otpat | d→t before p (assimilation); final d→t (final devoicing) |
| kde | ɡdɛ | k→ɡ before voiced d |
| svět | svjɛt | s stays voiceless because v does not trigger assimilation |
Special case: smus → zmus (exception to prevent incorrect devoicing).
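The rule can be sketched as a right-to-left scan that carries the voicing of the nearest trigger leftward through a run of obstruents. This Python version works on one word given as a list of IPA segments and is illustrative, not the Lua implementation. The names are hypothetical; ř is simply treated as ending a cluster (its voicing is handled in step 9); assimilation across word boundaries (as in bez tebe) is not modelled; and the smus → zmus exception is omitted.

```python
# Voiceless -> voiced obstruent pairs (ř is left to the separate ř rule).
VOICED_OF = {
    "p": "b", "t": "d", "c": "\u025f", "k": "\u0261", "f": "v", "s": "z",
    "\u0283": "\u0292", "x": "\u0266",                     # ʃ/ʒ, x/ɦ
    "t\u0361s": "d\u0361z", "t\u0361\u0283": "d\u0361\u0292",
}
VOICELESS_OF = {voiced: voiceless for voiceless, voiced in VOICED_OF.items()}
OBSTRUENTS = set(VOICED_OF) | set(VOICELESS_OF)
NON_TRIGGERS = {"v"}  # v can be devoiced, but never voices what precedes it


def assimilate(segments):
    """Regressive voicing assimilation inside runs of adjacent obstruents."""
    out = list(segments)
    target = None  # True/False: voicing set by the nearest trigger to the right
    for i in range(len(out) - 1, -1, -1):
        seg = out[i]
        if seg not in OBSTRUENTS:
            target = None  # vowels, sonorants, ř and anything else end the cluster
            continue
        if target is not None:
            out[i] = VOICED_OF.get(seg, seg) if target else VOICELESS_OF.get(seg, seg)
        if seg not in NON_TRIGGERS:
            target = out[i] in VOICELESS_OF  # keys of VOICELESS_OF are voiced
    return out


print("".join(assimilate(list("odpad"))))  # otpad
```

The final d of odpad is left voiced here; in the pipeline it is handled by the next step, final devoicing.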
8. Final Devoicing
All voiced obstruents at the end of a word (before a # boundary) become voiceless. This happens after voicing assimilation.
9. Ř Devoicing
The ř (r̝) becomes voiceless (r̝̊) when adjacent to any voiceless obstruent (except r̝̊ itself).
10. Syllabic Consonants
Sonorants m, n, r, l (but not ɲ or j) become syllabic when:
- Positioned between a consonant and an obstruent: prsten → pr̩stɛn
- At word end after an obstruent: sedm → sɛdm̩
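The two conditions can be checked segment by segment. The Python sketch below is an illustration under stated assumptions: the input is a list of IPA segments, the function and set names are hypothetical, and the length mark ː is counted with the vowels so that strings split character by character still behave.

```python
SONORANTS = {"m", "n", "r", "l"}        # ɲ and j are never syllabic
VOWELS = set("a\u025b\u026aiou\u02d0")  # a ɛ ɪ i o u and the length mark ː
OBSTRUENTS = {"p", "b", "t", "d", "c", "\u025f", "k", "\u0261", "f", "v",
              "s", "z", "\u0283", "\u0292", "x", "\u0266",
              "t\u0361s", "d\u0361z", "t\u0361\u0283", "d\u0361\u0292"}
SYLLABIC = "\u0329"  # combining vertical line below


def is_consonant(seg):
    return seg[0] not in VOWELS


def mark_syllabic(segments):
    """Mark m, n, r, l syllabic between a consonant and an obstruent,
    or at the end of the word right after an obstruent."""
    out = list(segments)
    last = len(segments) - 1
    for i, seg in enumerate(segments):
        if seg not in SONORANTS:
            continue
        prev = segments[i - 1] if i > 0 else None
        nxt = segments[i + 1] if i < last else None
        between = prev is not None and is_consonant(prev) and nxt in OBSTRUENTS
        word_final = nxt is None and prev in OBSTRUENTS
        if between or word_final:
            out[i] = seg + SYLLABIC
    return out


print("".join(mark_syllabic(list("s\u025bdm"))))  # sɛdm̩
```

The checks use the original neighbours, not already-marked ones, so each sonorant is judged independently.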
11. Nasal Assimilation
n before k or g becomes /ŋ/: banka → baŋka.
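In code this is a single lookahead substitution. The Python sketch below assumes the input is the IPA string produced by the character-substitution stage, where g has already become ɡ (U+0261); the function name is hypothetical.

```python
import re


def nasal_assimilation(ipa):
    """Replace n with ŋ when the next character is k or ɡ."""
    return re.sub("n(?=[k\u0261])", "\u014b", ipa)  # ɡ is U+0261, ŋ is U+014B


print(nasal_assimilation("banka"))  # baŋka
```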
12. Stress Assignment
If the transcription has no spaces and no existing stress mark, primary stress ˈ is automatically prepended (first syllable).
13. Decoding & Geminate Elimination
The temporary single-character encodings are converted back to proper IPA multi-character sequences. Then doubled (geminate) consonants are simplified to single consonants (e.g., tt → t, tt͡s → t͡s). Finally, # word boundary markers are removed.
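The geminate and boundary clean-up can be sketched with one backreference. In this Python illustration the consonant inventory and the function name are assumptions, and it is not the module's code. Because t͡s begins with t, the pattern also turns tt͡s into t͡s.

```python
import re

# Consonant characters that may appear doubled in the decoded IPA string.
CONSONANTS = "bcdfhjklmnprstvxz\u025f\u0261\u0266\u0272\u014b\u0283\u0292\u0294"


def simplify_geminates(ipa):
    """Collapse doubled consonants, then strip # word-boundary markers."""
    # A consonant followed by the same character is reduced to one copy.
    ipa = re.sub(rf"([{CONSONANTS}])\1", r"\1", ipa)
    return ipa.replace("#", "")


print(simplify_geminates("#ott\u0361s\u025b#"))  # ot͡sɛ
```

Running the regex before removing # means doubled letters separated by a word boundary are not merged.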
Example Transformation
| Input | Key Steps | Final IPA |
|---|---|---|
| příklad | p → p; ř → r̝; í → iː; k → k; l → l; a → a; d → d; ř devoices after voiceless p → r̝̊; final d → t (devoicing); stress added | /ˈpr̝̊iːklat/ |
| město | mě → mɲɛ; s → s; t → t; o → o; stress added | /ˈmɲɛsto/ |
| odpad | o → o; d → d; p → p; a → a; d → d; d→t before p (assim.); final d→t (devoicing); stress added | /ˈotpat/ |
| prsten | p → p; r → r; s → s; t → t; e → ɛ; n → n; r becomes syllabic (between cons. and obstr.); stress added | /ˈpr̩stɛn/ |
| banka | b → b; a → a; n → n; k → k; a → a; n→ŋ before k (nasal assim.); stress added | /ˈbaŋka/ |
Lexicon Architecture
Before rule-based processing, the system looks up words in a Wiktionary data dump (339 KB, V3 prefix-compressed format). The lexicon is loaded from czech_lexicon.zip via a Web Worker and wrapped in a LargeDictionaryHandler for efficient chunked lookup.
- If a word is found in the lexicon, its phonemic transcription is used directly.
- If not found, the Lua module generates a transcription using the rules described above.
- Some words have multiple pronunciations in the lexicon — these appear as click-to-cycle variants.
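The lookup-then-fallback logic can be sketched in a few lines. This Python version is hypothetical: the real lexicon is accessed through the Web Worker and LargeDictionaryHandler, whose API is not described here, so a plain dictionary from lowercase words to lists of transcriptions stands in for it, and `rules` stands in for the Lua module.

```python
def transcribe_word(word, lexicon, rules):
    """Return candidate transcriptions for one word.

    lexicon: dict mapping a lowercase word to a list of transcriptions.
    rules:   function producing a rule-based transcription.
    """
    key = word.lower()
    entries = lexicon.get(key)
    if entries:
        # Several entries become the click-to-cycle variants.
        return list(entries)
    # Not in the lexicon: fall back to rule-based generation.
    return [rules(key)]


toy_lexicon = {"pas": ["/pas/"]}  # toy entry standing in for the real dump
print(transcribe_word("Pas", toy_lexicon, lambda w: "(rules)"))  # ['/pas/']
```

Returning a list in both branches lets the interface treat lexicon variants and a single rule-based result the same way.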
Common Issues & Limitations
Known Transcription Problems
This table shows words that the engine handles correctly, followed by known cases where the automatic transcription may be incorrect:
| Input | System Output | Correct IPA | Cause | What to Do |
|---|---|---|---|---|
| obecný | /ˈobɛt͡sniː/ | /ˈobɛt͡sniː/ | None | Correctly generated by rules |
| exhibice | /ˈɛɡzɦɪbɪt͡sɛ/ | /ˈɛɡzɦɪbɪt͡sɛ/ | None | Correctly handles ex- prefix |
| quasi | /ˈkuasɪ/ | /ˈkvazɪ/ | q → k is mapped letter by letter; Czech reads Latin qu as /kv/ and intervocalic s as /z/ | Use the Czech spelling kvazi, or check Wiktionary |
| Very rare loanwords | Approximate | Varies | Not in lexicon; rule-based fallback may not handle unusual letter combinations | Check Wiktionary; report if incorrect |
| Dialectal spellings | Standard IPA | May differ | Module uses standard Czech only | Use standard spelling for best results |
| Proper names (rare) | Approximate | Varies | Some names not in lexicon; foreign names may need special rules | Verify with native speaker or Wiktionary |
General Limitations
- Phonemic only: The Czech module produces broad phonemic transcription, not narrow phonetic transcription with allophonic detail (aspiration, exact vowel quality in context, etc.).
- Standard Czech only: Regional variants (Moravian, Silesian, Common Czech / obecná čeština) are not supported.
- No compound word splitting: Very long compound words may not be split correctly into components for stress assignment.
- Foreign proper names: Names from non-Slavic languages may produce approximate results since the rules are optimized for Czech orthography.
For technical issues or suggestions, please visit our GitHub repository.