Digital Musa

Encoding

From the human point of view, a Musa text is a sequence of half letters. But from a digital point of view, a Musa text is a sequence of full letters. In other words, the Musa encoding is essentially Alphabet Gait. Not only is it complete and compact, but unsophisticated rendering engines will still produce legible output.

Musa is encoded in the Private Use Area of Unicode, starting at E000 and ending at E2FF: three full pages. We also use the page E3xx for Musa Markup (explained on the next page). This makes Musa compatible with Unicode (and the SIL PUA) but not part of it. Since characters can't be changed once they're encoded in Unicode, it won't be appropriate to encode Musa directly in Unicode until Musa stops evolving.
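As a rough illustration of those ranges, here is a minimal Python sketch that reports which block a character falls in. The function and constant names are our own inventions for this example, not part of any Musa software, and the E3xx page is assumed to run from E300 to E3FF.

```python
import unicodedata

# Ranges taken from the paragraph above; the names are hypothetical.
MUSA_FIRST, MUSA_LAST = 0xE000, 0xE2FF      # three pages of Musa proper
MARKUP_FIRST, MARKUP_LAST = 0xE300, 0xE3FF  # assumed extent of the E3xx page

def musa_block(ch: str) -> str:
    """Return 'musa', 'markup', or 'other' for a single character."""
    cp = ord(ch)
    if MUSA_FIRST <= cp <= MUSA_LAST:
        return "musa"
    if MARKUP_FIRST <= cp <= MARKUP_LAST:
        return "markup"
    return "other"

def is_private_use(ch: str) -> bool:
    """True if Unicode classifies the character as Private Use (category Co)."""
    return unicodedata.category(ch) == "Co"

if __name__ == "__main__":
    for ch in ("\ue116", "\ue300", "a"):
        print(f"U+{ord(ch):04X}", musa_block(ch), is_private_use(ch))
```

Because every Musa codepoint is an ordinary Private Use character, software that knows nothing about Musa still stores and transmits the text unchanged.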

The first Musa codepoint, E000, has a special meaning as the end-of-text character. It indicates to computers that there's no more text to display or transmit. The ASCII equivalent is ETX (0003), while the C language uses NUL (0000) to terminate strings. The rest of the first four rows, E001-E03F, encode Musa shapes (not letters). This collection includes a few shape variants, so that the bare shapes can be used as half letters, as short versions, as keycaps, as shapes on blocks and tiles, and in other ways. Here's a list of them:

Codepoint  Unicode Name
E001       MUSA YA SHAPE
E002       MUSA FI SHAPE
E003       MUSA FA SHAPE
E004       MUSA FU SHAPE
E005       MUSA YU SHAPE
E006       MUSA NU SHAPE
E007       MUSA MU SHAPE
E008       MUSA PU SHAPE
E009       MUSA NA SHAPE
E00A       MUSA PA SHAPE
E00B       MUSA KA SHAPE
E00C       MUSA TA SHAPE
E00D       MUSA SA SHAPE
E00E       MUSA WA SHAPE
E00F       MUSA MA SHAPE
E010       MUSA LU SHAPE
E011       MUSA WI SHAPE
E012       MUSA SI SHAPE
E013       MUSA TI SHAPE
E014       MUSA KI SHAPE
E015       MUSA PI SHAPE
E016       MUSA NI SHAPE
E017       MUSA SU SHAPE
E018       MUSA KU SHAPE
E019       MUSA TU SHAPE
E01A       MUSA RI SHAPE
E01B       MUSA TURNED KA SHAPE
E01C       MUSA TURNED TA SHAPE
E01D       MUSA TURNED SA SHAPE
E01E       MUSA TURNED WA SHAPE
E01F       MUSA TURNED NA SHAPE
E020       MUSA TURNED PA SHAPE
E021       MUSA TURNED WI SHAPE
E022       MUSA BOTTOM SA SHAPE
E023       MUSA BOTTOM KA SHAPE
E024       MUSA TOP SA SHAPE
E025       MUSA TOP KA SHAPE
E026       MUSA SEMI NU SHAPE
E027       MUSA SEMI MU SHAPE
E028       MUSA SEMI PU SHAPE
E029       MUSA SEMI NA SHAPE
E02A       MUSA SEMI PA SHAPE
E02B       MUSA SEMI KA SHAPE
E02C       MUSA SEMI TA SHAPE
E02D       MUSA SEMI SA SHAPE
E02E       MUSA SEMI WA SHAPE
E02F       MUSA SEMI MA SHAPE
E030       MUSA SEMI LU SHAPE
E031       MUSA SEMI WI SHAPE
E032       MUSA SEMI SI SHAPE
E033       MUSA SEMI TI SHAPE
E034       MUSA SEMI KI SHAPE
E035       MUSA SEMI PI SHAPE
E036       MUSA SEMI NI SHAPE
E037       MUSA SEMI SU SHAPE
E038       MUSA SEMI KU SHAPE
E039       MUSA SEMI TU SHAPE
E03A       MUSA SEMI RI SHAPE
E03B       MUSA TWIN KA SHAPE
E03C       MUSA TWIN TA SHAPE
E03D       MUSA TWIN SA SHAPE
E03E       MUSA TWIN WI SHAPE
E03F       MUSA TWIN WA SHAPE
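
The end-of-text convention and the shape range can be handled with a few lines of code. The following Python sketch is only an illustration: the helper names are hypothetical, and we assume that a reader of Musa text should simply ignore everything from the first E000 onward.

```python
END_OF_TEXT = "\ue000"                    # first Musa codepoint
SHAPE_FIRST, SHAPE_LAST = 0xE001, 0xE03F  # bare shapes, not letters

def truncate_at_end_of_text(text: str) -> str:
    """Return the text up to (not including) the first end-of-text character."""
    head, _sep, _tail = text.partition(END_OF_TEXT)
    return head

def is_shape(ch: str) -> bool:
    """True if the character is one of the bare Musa shapes."""
    return SHAPE_FIRST <= ord(ch) <= SHAPE_LAST

if __name__ == "__main__":
    sample = "\ue116\ue117" + END_OF_TEXT + "ignored"
    print(len(truncate_at_end_of_text(sample)))  # number of characters kept
    print(is_shape("\ue001"), is_shape("\ue116"))
```

Text without an E000 passes through unchanged, since `str.partition` returns the whole string when the separator is absent.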

The rest of the E0xx page, the entire E1xx page, and most of the E2xx page encode letters, not shapes. Here is the complete set of codepoints; some are unused.

[Table: codepoints E000-E0FF in 16 rows (E00_ to E0F_) by 16 columns (_0 to _F); the glyph cells are not reproduced here.]

[Table: codepoints E100-E2E3 in 22 rows by 22 columns; the glyph cells are not reproduced here. Rows start at E100, E116, E12C, E142, E158, E16E, E184, E19A, E1B0, E1C6, E1DC, E1F2, E208, E21E, E234, E24A, E260, E276, E28C, E2A2, E2B8 and E2CE. Columns are the hexadecimal offsets +0 to +15.]

The hexadecimal numbers at top left, top and left add up to indicate the code point of the letter in each cell. For instance, the Musa n is at code point E116.
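
For the E1xx and E2xx pages, that addition can be sketched in Python. We assume the layout implied by the table labels: rows start at E100 and advance by hexadecimal 16 (decimal 22), and columns are offsets +0 to +15 (hexadecimal). The function names are hypothetical.

```python
GRID_BASE = 0xE100  # first row label of the letter table
ROW_STEP = 0x16     # 22 cells per row, columns +0 .. +15 (hex)
ROW_COUNT = 22      # rows E100 .. E2CE

def cell_codepoint(row: int, column: int) -> int:
    """Codepoint of the cell at a zero-based row and column."""
    if not (0 <= row < ROW_COUNT and 0 <= column < ROW_STEP):
        raise ValueError("row or column outside the 22 x 22 table")
    return GRID_BASE + row * ROW_STEP + column

def locate(codepoint: int) -> tuple[int, int]:
    """Return (row label, column offset) for a codepoint in the table."""
    row, column = divmod(codepoint - GRID_BASE, ROW_STEP)
    if not 0 <= row < ROW_COUNT:
        raise ValueError("codepoint outside the 22 x 22 table")
    return GRID_BASE + row * ROW_STEP, column

if __name__ == "__main__":
    print(f"{cell_codepoint(1, 0):04X}")  # second row, first column
    label, offset = locate(0xE1FD)
    print(f"{label:04X} +{offset:X}")
```

Under these assumptions, the Musa n at E116 is the first cell of the second row.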

The double-wide Musa logo is at E232, as if it were spelled by its two components. The Musa colon is at E1FD, as if it were spelled by two circles. E12D-E131 and E13B-E141 are used by our virtual keyboards to display control characters.

The Musa dot letter is encoded at E040 separately from the normal Unicode space at 0020. The rule is that the space between Musa text and other text is the normal space, but the dot is used within Musa text. That confounds the non-Musa end-of-line algorithms so that lines of Musa text in Alphabet gait are justified. Gaits with larger glyphs may have to leave an extra space or two at the right side of a line.
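
The spacing rule can be approximated with a regular expression. This Python sketch assumes the input was typed with normal spaces everywhere and treats any codepoint from E001 to E2FF as Musa text. The names are hypothetical, and runs of several spaces are left alone.

```python
import re

MUSA_DOT = "\ue040"
MUSA_RANGE = "\ue001-\ue2ff"  # assumed definition of "Musa text"

# A single normal space with a Musa character on both sides.
_INNER_SPACE = re.compile(f"(?<=[{MUSA_RANGE}]) (?=[{MUSA_RANGE}])")

def use_musa_dots(text: str) -> str:
    """Replace spaces inside Musa text with the Musa dot at E040.

    Spaces between Musa text and other text stay as U+0020.
    """
    return _INNER_SPACE.sub(MUSA_DOT, text)

if __name__ == "__main__":
    mixed = "See \ue116\ue117 \ue118\ue119 here"
    converted = use_musa_dots(mixed)
    print([f"{ord(c):04X}" for c in converted if c in (" ", MUSA_DOT)])
```

Only the space between the two Musa words is replaced; the spaces that border the Latin words are kept.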

The Hentrax Musa Element font includes all the Musa codepoints, even if they don't correspond to a Musa letter. You can type these invalid letters in the Editor by selecting the Hentrax font.

Digital Gaits

In Musa, the gaits are implemented using OpenType Advanced Typography, which specifies substitutions or positionings of glyphs in certain circumstances. For example, in Kana gait, a sequence of consonant+vowel is replaced by the corresponding kana. The feature set is rich enough for everything Musa needs, mostly ligatures and contextual alternates.

Since gaits are implemented as fonts, there's no need for special treatment during text entry, transmission or storage. Musa text can be searched and sorted without regard for gait, and foreign words in text that can't be written in the gait of the text will appear in Alphabet gait. On the next page, we'll explain how Musa Markup gives you a way to embed the gait in the text without changing the letters.

Musa fonts share a common naming format: a font name, the word Musa, and then a gait keyword, like Dushan Musa Alphabet or Zhouhei Musa Fangzi, followed by a style (Regular, Bold, Italic, ...). The possible gait keywords are:

Domains

The conventional extension for sites completely in Musa will be .musa or the single Musa letter at E232. However, there isn't yet a Musa superdomain, so your site could be musa.mysite.com or mysite.com/musa, for example.




© 2002-2024 The Musa Academy musa@musa.bet 25sep23