English Transcriber (Explanation below)

Please wait until the page finishes loading
before starting a transcription

Explanation

The transcriber is a tool that enables you to convert text into Musa. For now, it only works with American English text written in the Roman alphabet, and it only produces Musa with defective punctuation. If you need full Musa punctuation, you will need to edit the Musa by hand. You will also need to transcribe by hand any words not in the dictionary. The Vocalize option adds Musa vowels as diacritics over the Roman text. It sometimes gets confused, so you will have to check the output.

The main challenge in transcription is distinguishing heteronyms (words that are spelled alike but pronounced differently, like read or project). Here, you have two choices: you can let the transcriber choose one of the heteronyms on its own, or you can tell it which one you want. However, this transcriber only knows 71 of the most common heteronyms; there are many more. We also let you choose between dialectal variants: for example, I pronounce caught to rhyme with thought, but CMUdict says it's more common to rhyme it with cot. In this case and a hundred others, I made my dialect the default :)
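To make the two choices concrete, here is a minimal sketch in Python of how a heteronym or dialect variant might be selected. It is only an illustration, not the transcriber's actual code: the VARIANTS table, the pick function and its choice parameter are all hypothetical names, and the pronunciations are hand-typed in CMUdict's ARPABET notation rather than in Musa.

```python
# Hypothetical table: each word maps to its candidate pronunciations,
# with the default listed first. Entries are hand-typed for illustration.
VARIANTS = {
    "read": ["R IY1 D", "R EH1 D"],      # present tense vs. past tense
    "caught": ["K AO1 T", "K AA1 T"],    # rhymes with thought vs. with cot
}

def pick(word, choice=None, variants=VARIANTS):
    """Return one pronunciation for word, or None if it has no variants.

    choice is a 0-based index chosen by the user; when it is None,
    the first (default) pronunciation is returned.
    """
    options = variants.get(word.lower())
    if options is None:
        return None
    if choice is None:
        return options[0]
    return options[choice]

print(pick("read"))       # default reading
print(pick("read", 1))    # the reading the user asked for
print(pick("caught"))     # dialect default
```

Listing the default first keeps the "let the transcriber choose" case and the "tell it which one" case in a single lookup.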

This transcriber uses the Carnegie Mellon University Pronouncing Dictionary (CMUdict) of North American English, which has over 134,000 entries, including many proper names. Words that are not found in the dictionary will be left untranscribed, but numerals and punctuation will be converted (using defective punctuation). One-syllable grammar words are assumed to be unstressed, but in your text, some of these may be stressed (What is THAT? What IS that? WHAT is THAT???), so please check the text after transcription.
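The lookup behavior described above can be sketched roughly as follows, in Python. This is a simplified illustration under stated assumptions, not the transcriber itself: SAMPLE_LEXICON is a tiny hand-typed stand-in for CMUdict, GRAMMAR_WORDS is a hypothetical list, the output is ARPABET rather than Musa, and numerals and punctuation are simply passed through instead of being converted.

```python
import re

# Tiny hand-typed stand-in for CMUdict (word -> ARPABET phones).
SAMPLE_LEXICON = {
    "what": ["W", "AH1", "T"],
    "is": ["IH1", "Z"],
    "that": ["DH", "AE1", "T"],
    "cat": ["K", "AE1", "T"],
}

# Hypothetical list of grammar words; the real list is not given here.
GRAMMAR_WORDS = {"what", "is", "that", "the", "a", "of"}

def destress(phones):
    """Turn a primary or secondary stress digit (1 or 2) into 0."""
    return [re.sub(r"[12]$", "0", p) for p in phones]

def count_syllables(phones):
    """In ARPABET, vowels are the phones that end in a stress digit."""
    return sum(1 for p in phones if p[-1].isdigit())

def transcribe(text, lexicon=SAMPLE_LEXICON):
    """Return one output item per token of text."""
    out = []
    # Split into runs of letters/apostrophes and runs of other symbols.
    for token in re.findall(r"[A-Za-z']+|[^A-Za-z'\s]+", text):
        key = token.lower()
        phones = lexicon.get(key)
        if phones is None:
            out.append(token)  # not in the dictionary: left as is
        elif key in GRAMMAR_WORDS and count_syllables(phones) == 1:
            out.append(" ".join(destress(phones)))  # assumed unstressed
        else:
            out.append(" ".join(phones))
    return out

print(transcribe("What is THAT zyzzyva?"))
```

Because the sketch ignores capitalization, THAT comes out unstressed just like that, which is exactly the kind of result you would need to correct by hand.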

The CMUdict distinguishes between open schwa and close schwi (which they write as IH), for instance in Lisa's versus leases. In Musa, we write the English schwa with the letter for a lower ɐ, since that's how it's usually pronounced. We write the schwi with a back letter most of the time (but not in suffixes like -ic, -ics, -ist, -ism, -ish, -ive or -ing). But in many cases, CMUdict uses a schwa where I would use a schwi, for example in words like bullet and button. Based on Flemming's work, almost all of these schwas should be raised to schwis. The main cases where a schwa is not raised are when it is initial or final, as in about, sofa, sofas, sofa's, un- and up-, or when it comes before another vowel, as in extraordinary. But the exceptions are too numerous and too diverse to be corrected algorithmically, and yet schwa/schwi occurs too often in English for us to ask you each time. So until a better solution appears, we give you two choices: accept the CMUdict readings, or raise all schwas to schwis except in the cases mentioned above.
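The second choice (raise every schwa except in the listed cases) might look something like this in Python. It is a rough sketch, not the transcriber's code: it assumes schwa is written AH0 and schwi IH0 in CMUdict's ARPABET notation, the function names are hypothetical, and it only checks the first and last sound of a word, so stem-final cases like sofas and sofa's would need extra information that the sketch does not have.

```python
def is_vowel(phone):
    """In ARPABET, vowels are the phones that end in a stress digit."""
    return phone[-1].isdigit()

def raise_schwas(phones):
    """Raise schwa (AH0) to schwi (IH0) unless it is the first or last
    sound of the word or stands directly before another vowel."""
    out = []
    last = len(phones) - 1
    for i, p in enumerate(phones):
        if p == "AH0":
            initial = i == 0
            final = i == last
            before_vowel = i < last and is_vowel(phones[i + 1])
            if not (initial or final or before_vowel):
                p = "IH0"
        out.append(p)
    return out

# Hand-typed example pronunciations, for illustration only.
print(raise_schwas("B UH1 L AH0 T".split()))   # bullet: schwa is raised
print(raise_schwas("AH0 B AW1 T".split()))     # about: initial, kept
print(raise_schwas("S OW1 F AH0".split()))     # sofa: final, kept
```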

The dictionary entries have been adjusted to account for aspiration of fortis plosives at the beginning of words and stressed syllables, glottalization of fortis plosives at the ends of words, devoicing of lenis plosives next to unvoiced sounds or pauses, and darkening of L after vowels. Coronals t and d are spelled as flaps between a stressed vowel and a reduced vowel, but not across word boundaries or despite an intervening r l m n.
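As an illustration of the flapping adjustment, here is a minimal Python sketch covering only the simplest case, where the t or d sits directly between the two vowels. Several things in it are assumptions: the function names are hypothetical, DX is the ARPABET flap symbol (CMUdict itself does not use it), any vowel with stress digit 1 or 2 is treated as stressed, and any vowel with stress digit 0 is treated as reduced. It makes no attempt at the r l m n case, and since it works on one word at a time, it never flaps across a word boundary.

```python
def stress(phone):
    """Return the stress digit of a vowel phone, or None for a consonant."""
    return int(phone[-1]) if phone[-1].isdigit() else None

def flap(phones):
    """Rewrite T or D as DX when it stands directly between a stressed
    vowel and an unstressed vowel inside a single word."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if phones[i] in ("T", "D"):
            before = stress(phones[i - 1])
            after = stress(phones[i + 1])
            if before in (1, 2) and after == 0:
                out[i] = "DX"
    return out

# Hand-typed example pronunciations, for illustration only.
print(flap("W AO1 T ER0".split()))    # water: flapped
print(flap("AH0 T AE1 K".split()))    # attack: t begins a stressed syllable, kept
```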

Here is the dictionary itself, as a text file. It contains over 134,000 English words and names, along with their Musa transcriptions. This is the same data used in the Transcriber; note that the names are pronounced as in North America, which is often different from how they are pronounced in their country of origin.

English Musa Dictionary

(In many browsers, the file will display in the browser, and you will have to save it.)


© 2002-2024 The Musa Academy musa@musa.bet 24oct24