Did you know bidirectional (BiDi) languages have no upper or lower case? Do you know when to omit a vowel from BiDi text?
Even multilinguals who read languages with very different writing systems, such as English (written left-to-right, or LTR) and Japanese (traditionally written top-to-bottom), may be surprised by the unique characteristics of bidirectional languages, whose text runs right-to-left (RTL) but often includes LTR passages. In this second post of our “crash course” on bidirectional languages for localization professionals, let’s discuss these differences and the pitfalls they pose for globalization managers.
For purposes of illustration, we’ll use Arabic as the model. As discussed in our first post, Arabic is the common denominator of most BiDi languages, and it’s widely regarded as the most complex. So while the following features are not intrinsic to every BiDi language, together they give a solid picture of what you’ll encounter across the BiDi language group.
The Arabic script proper comprises 28 consonants, three long vowels, three short vowels, two diphthongs, a glottal stop, and one diacritic mark. Other BiDi languages, such as Persian and Urdu, borrowed and modified the Arabic script, often adding letters for sounds not present in Arabic, such as “v” or “p.” In addition, because many countries have two official languages, and because Western cognates and brand names are used liberally, BiDi text often contains a snippet of LTR text in a Latin-alphabet language such as Italian or French in the middle of a block of RTL text.
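Software can tell those RTL blocks from the embedded LTR snippets because every Unicode character carries a bidirectional class. The sketch below uses Python's standard `unicodedata.bidirectional` to split a mixed string into rough directional runs. It is a simplification for illustration, assuming we only care about strong letters and digits; the Unicode Bidirectional Algorithm (UAX #9), which actually decides display order, treats neutrals, embeddings, and numbers far more carefully. The helper names `coarse_direction` and `directional_runs` are hypothetical.

```python
import unicodedata

# Coarse grouping of Unicode Bidi_Class values (a simplification of UAX #9).
STRONG_RTL = {"R", "AL"}   # Hebrew letters (R), Arabic-script letters (AL)
STRONG_LTR = {"L"}         # Latin, Cyrillic, CJK, ...
NUMBERS = {"EN", "AN"}     # European digits (EN), Arabic-Indic digits (AN)


def coarse_direction(ch: str) -> str:
    """Map one character to RTL, LTR, NUMBER, or NEUTRAL."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in STRONG_RTL:
        return "RTL"
    if bidi_class in STRONG_LTR:
        return "LTR"
    if bidi_class in NUMBERS:
        return "NUMBER"
    return "NEUTRAL"  # spaces, punctuation, etc.


def directional_runs(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of characters sharing the same coarse direction."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        direction = coarse_direction(ch)
        if runs and runs[-1][0] == direction:
            runs[-1] = (direction, runs[-1][1] + ch)
        else:
            runs.append((direction, ch))
    return runs


if __name__ == "__main__":
    # Arabic "marhaban" (hello), then a Latin-script word and a number.
    sample = "\u0645\u0631\u062D\u0628\u0627 Roma 2024"
    for direction, run in directional_runs(sample):
        print(f"{direction:8} {run!r}")
```

Note that the text is stored in reading (logical) order; deciding where each run appears on screen is the renderer's job.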
In BiDi languages, long vowels are written, but short vowels are usually omitted. To illustrate how that would work in English, imagine encountering the text, “TH MN N TH MOON.” Is this “the man in the moon,” or “the men on the moon?” BiDi languages expect readers to supply topical knowledge and context analysis to fill in the missing vowels and give meaning to the sentence.
(Images via Arabicgenie.com)
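In software, this means the same word may be stored with or without its short-vowel marks, which Unicode encodes as separate combining characters following each consonant. Search and matching code therefore often compares the unvocalized form. The minimal sketch below, assuming that dropping every nonspacing mark is acceptable for your use case, reduces both كَتَبَ (kataba, “he wrote”) and كُتُب (kutub, “books”) to the same skeleton كتب, the Arabic equivalent of “TH MN N TH MOON.” The helper name is hypothetical.

```python
import unicodedata


def strip_vowel_marks(text: str) -> str:
    """Remove nonspacing combining marks (Unicode general category Mn).

    In Arabic text these are mainly the short-vowel signs (fatha, damma,
    kasra), sukun, shadda, and tanwin. This simple filter also drops any
    other nonspacing mark, such as a combining hamza, which may be more
    than some applications want.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "he wrote", fully vowelled
    kutub = "\u0643\u064F\u062A\u064F\u0628"          # "books", vowelled
    # Both print the same consonant skeleton k-t-b.
    print(strip_vowel_marks(kataba), strip_vowel_marks(kutub))
```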
Reading right-to-left means the whole experience, not just the line of text, is right-to-left. Books are shelved alphabetically from right to left. Books open from what Westerners would consider the back cover. Diagrams, illustrations, margins, and controls are all reversed, as are back and next arrows. A sequence of instructional images, such as “Assembling This Bookshelf in Five Easy Steps,” must be reversed; otherwise, it becomes a set of instructions for disassembling an existing bookshelf.
The process of recreating LTR graphics, text, content, and user interface for RTL or BiDi languages is called “mirroring,” and we’ll go into more detail in a future post. Mirroring can really trip Westerners up, and the resulting errors confound a BiDi audience.
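At the layout level, the core of mirroring is a horizontal flip: an element that sits x pixels from the left edge of its container in the LTR design sits x pixels from the right edge in the RTL design, so its new left coordinate is container width − x − element width. The sketch below, using a hypothetical `Box` type and helper names, applies that flip to a row of instruction steps such as the five bookshelf steps, so that step 1 lands at the right-hand edge where RTL readers start. It covers geometry only; UI frameworks usually provide their own layout-direction settings, and images that carry direction still have to be mirrored or redrawn deliberately.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A hypothetical UI element; x is the distance from the container's left edge."""
    x: int
    y: int
    width: int
    height: int


def mirror(box: Box, container_width: int) -> Box:
    """Flip a box horizontally inside its container (LTR position -> RTL position)."""
    return Box(container_width - box.x - box.width, box.y, box.width, box.height)


def layout_steps(count: int, step_width: int, rtl: bool) -> list[int]:
    """Return the x position of each step in a row; step 1 sits at the reading edge."""
    container_width = count * step_width
    ltr = [Box(i * step_width, 0, step_width, step_width) for i in range(count)]
    boxes = [mirror(b, container_width) for b in ltr] if rtl else ltr
    return [b.x for b in boxes]


if __name__ == "__main__":
    print(layout_steps(5, 100, rtl=False))  # step 1 at the left edge
    print(layout_steps(5, 100, rtl=True))   # step 1 at the right edge
    # Paired characters such as "(" carry Unicode's Bidi_Mirrored property,
    # so a BiDi-aware renderer flips their glyphs automatically in RTL runs.
    print(unicodedata.mirrored("("))
```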
Letter Forms and Ligatures
Arabic-based scripts are inherently cursive, which means that each letter can take several forms depending on whether it appears at the start, middle, or end of a word, plus a standalone (isolated) form. Hebrew, by contrast, is not cursive in print, and only five of its letters have both a normal form and a final form. In Arabic script, certain letters can connect only to the letter on their right (the preceding letter), never to the letter on their left. This creates slight breaks in the script flow that are subtly smaller than the breaks between words.
(Image via Ukindia)
(Image via OpenGroup)
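The choice of form follows from each letter's joining behavior. As a minimal sketch (not how production shaping engines work; they apply OpenType features in the font), the Python below picks the initial, medial, final, or isolated form for a handful of letters and emits the legacy Arabic Presentation Forms-B characters, so the result can be inspected with the standard `unicodedata` module. The joining table is a small hand-copied subset of Unicode's joining data, and the helper name is hypothetical. Notice how the right-joining ALEF in باب (“door”) forces the following BEH back into its isolated form: the small break in the script flow described above. The sketch ignores vowel marks and the mandatory lam-alef ligature.

```python
import unicodedata

# Joining behavior for a few letters (a tiny subset of Unicode's joining data,
# hard-coded because the Python standard library does not expose it).
# "D" = dual-joining: connects on both sides.
# "R" = right-joining: connects only to the preceding letter (on its right).
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u062F": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0648": "R",  # WAW
    "\u0628": "D",  # BEH
    "\u062D": "D",  # HAH
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
}


def contextual_forms(word: str) -> str:
    """Replace each listed letter with its isolated/initial/medial/final presentation form."""
    out = []
    for i, ch in enumerate(word):
        jt = JOINING_TYPE.get(ch)
        if jt is None:  # not in our table: leave the character unchanged
            out.append(ch)
            continue
        prev_jt = JOINING_TYPE.get(word[i - 1]) if i > 0 else None
        next_jt = JOINING_TYPE.get(word[i + 1]) if i + 1 < len(word) else None
        # Joins backward only if the preceding letter can connect forward.
        joins_prev = prev_jt == "D"
        # Joins forward only if this letter is dual-joining and the next letter joins at all.
        joins_next = jt == "D" and next_jt in ("D", "R")
        if joins_prev and joins_next:
            form = "MEDIAL"
        elif joins_prev:
            form = "FINAL"
        elif joins_next:
            form = "INITIAL"
        else:
            form = "ISOLATED"
        out.append(unicodedata.lookup(f"{unicodedata.name(ch)} {form} FORM"))
    return "".join(out)


if __name__ == "__main__":
    for word in ("\u0645\u062D\u0645\u062F", "\u0628\u0627\u0628"):  # Muhammad, bab ("door")
        print([unicodedata.name(c) for c in contextual_forms(word)])
```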
Arabic letters can also combine into ligatures, special shapes that depend on the letters around them. Letters are joined, or “stacked,” in different ways depending on which letters are being written. Here’s an example of the name “Muhammad” written out fully, followed by the preferred ligature form:
(Image via WikiMedia)
Some ligatures are optional and others, such as lam-alef, are mandatory, but not all fonts support ligatures. Other fonts fail to render the proper contextual letter forms at all, merely stringing the standalone forms together. In the worst case, those standalone forms are also laid out in LTR order.
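For localization work, a practical rule is that stored text should contain ordinary base letters and leave ligatures to the font and shaping engine. Unicode does encode some ligatures, including lam-alef and a “Muhammad” ligature, as legacy compatibility characters, and such characters can turn up in text copied from PDFs or older systems, where they may break searching and sorting. The sketch below, assuming NFKC normalization is acceptable for your content, folds them back into base letters with Python's standard `unicodedata.normalize`; the helper name is hypothetical. Keep in mind that NFKC also folds other compatibility characters, such as full-width Latin letters, so apply it deliberately.

```python
import unicodedata

# Two ligatures that Unicode encodes as legacy presentation-form characters.
LAM_ALEF = unicodedata.lookup("ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM")
MUHAMMAD = unicodedata.lookup("ARABIC LIGATURE MOHAMMAD ISOLATED FORM")


def fold_presentation_forms(text: str) -> str:
    """Replace ligature and contextual-form characters with the base letters they stand for."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for ligature in (LAM_ALEF, MUHAMMAD):
        base = fold_presentation_forms(ligature)
        print(f"U+{ord(ligature):04X} -> {[unicodedata.name(c) for c in base]}")
```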
As noted in our first BiDi 101 blog post, numbers are read LTR in BiDi text, so the number "one thousand" would be 1000, not 0001, amid an RTL text block.
That said, there are two number systems commonly found in BiDi languages: the Arabic number system that's most familiar to Westerners, and the Indic number system. Contrary to what you might expect, the more common system among Arabic script languages is the Indic number system (or variations of it). See the Indic, Arabic and Anglicized numeral names in Arabic below.
(Image via Daniel S. Chereck)
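In software, the two systems are simply different sets of Unicode digit characters with the same values: the Arabic-Indic digits (U+0660 to U+0669) and the Extended Arabic-Indic variants used for Persian and Urdu (U+06F0 to U+06F9). The sketch below, with a hypothetical helper name, converts a number to Arabic-Indic digits with `str.translate` and parses it back with Python's built-in `int()`, which accepts any Unicode decimal digits. Real applications should normally let a locale-aware formatter (ICU, for example) choose the digits, since conventions vary by country.

```python
import unicodedata

ARABIC_INDIC = "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669"
EXTENDED_ARABIC_INDIC = "\u06F0\u06F1\u06F2\u06F3\u06F4\u06F5\u06F6\u06F7\u06F8\u06F9"  # Persian/Urdu

TO_ARABIC_INDIC = str.maketrans("0123456789", ARABIC_INDIC)


def to_arabic_indic(number: int) -> str:
    """Render a non-negative integer with Arabic-Indic digits, most significant digit first."""
    return str(number).translate(TO_ARABIC_INDIC)


if __name__ == "__main__":
    eastern = to_arabic_indic(1000)
    print(eastern)
    # int() accepts any Unicode decimal digits, so parsing needs no lookup table.
    print(int(eastern), int(EXTENDED_ARABIC_INDIC[1] + EXTENDED_ARABIC_INDIC[0] * 3))
    # Both digit sets have "number" bidi classes (AN and EN), which the Unicode
    # Bidirectional Algorithm displays left to right even inside RTL text.
    print(unicodedata.bidirectional(eastern[0]), unicodedata.bidirectional("1"))
```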
Can You Spot the Errors?
If you've been paying attention, you should be able to spot some problems with the following real-world examples of bad BiDi script.
(Image via Slodive)
This one's easy: the script is inappropriately rendered vertically. The letters should be joined in cursive in a horizontal line of RTL text and then rotated 90 degrees counter-clockwise.
(Image via WebCertain)
In this example, the words are real Arabic words, but the letters are unjoined and displayed LTR instead of RTL.
In this example, which I captured during a recent trip to Europe, the letters are joined together, so everything is OK, right? Not so fast: the display is still LTR, so the letters appear in reverse order.
How'd you do? If you spotted even one or two of the errors, great work! You have the foundation to appreciate some of the special considerations required for proper display of BiDi script in software and websites. Please stay tuned for the next post in the series, and in the meantime, share your great real-world examples of botched BiDi scripting.