BiDi 101: Bidirectional Script Traits That Vex Localization Pros


Did you know that bidirectional (BiDi) scripts such as Arabic and Hebrew have no upper or lower case? Do you know when to omit a vowel from BiDi text?

Even for multilinguals who know languages as different as English and Japanese, bidirectional languages, which mix right-to-left (RTL) and left-to-right (LTR) text, have unique characteristics that might surprise you. In this second post of our “crash course” on bidirectional languages for localization professionals, let’s discuss these differences and the pitfalls they create for globalization managers.

For purposes of illustration, we’ll use Arabic as the model. As discussed in our first post, the Arabic script is the common denominator of most BiDi languages, and it’s widely regarded as the most complex. So while not every feature below applies to every BiDi language, together they give a solid overview of what you’ll encounter across the BiDi language group.

Alphabets

The Arabic script proper comprises 28 consonants, three long vowels, three short vowels, two diphthongs, a glottal stop, and one diacritic mark. Other BiDi languages borrowed and modified the Arabic script, often adding letters for sounds not present in Arabic, such as “v” or “p.” Meanwhile, dual official languages in many countries, along with liberal use of Western cognates and brand names, mean that BiDi text often contains snippets of LTR text in a Latin-script language such as English or French, embedded in the middle of a block of RTL text.
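
Because RTL and LTR runs share a single string, software has to decide which direction each character belongs to. As a rough sketch in Python (our own simplification, not the full Unicode Bidirectional Algorithm), the standard library’s `unicodedata.bidirectional()` reports each character’s Unicode bidi class, which is enough to split a mixed string into direction runs. The helper names `direction_of` and `strong_runs` are hypothetical, chosen for illustration.

```python
import unicodedata

def direction_of(ch: str) -> str:
    """Map a character's Unicode bidi class to a coarse direction."""
    cls = unicodedata.bidirectional(ch)
    if cls in ("R", "AL"):   # Hebrew letters are class R, Arabic letters are AL
        return "rtl"
    if cls == "L":           # Latin letters, among others
        return "ltr"
    return "neutral"         # spaces, punctuation, digits (EN/AN) and so on

def strong_runs(text: str) -> list[tuple[str, str]]:
    """Split text into runs of one strong direction.

    Neutral characters simply join the preceding run. This is a simplified
    heuristic for illustration, NOT the Unicode Bidirectional Algorithm."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        d = direction_of(ch)
        if runs and (d == "neutral" or d == runs[-1][0]):
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((d, ch))
    return runs

# "I love Paris very much": an Arabic sentence with an embedded Latin word.
for direction, run in strong_runs("أحب Paris جدا"):
    print(direction, repr(run))
```

A production renderer would rely on a full UBA implementation; the point here is only that each character carries a direction, and mixed text breaks into alternating runs.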

Vowels

In BiDi languages, long vowels are written, but short vowels are usually omitted. To illustrate how that would work in English, imagine encountering the text “TH MN N TH MOON.” Is this “the man in the moon” or “the men on the moon”? Readers are expected to use context and topical knowledge to fill in the missing vowels and give the sentence its meaning.
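
In software, this matters most for search: a user typing unvocalized Arabic should still match text that happens to carry short-vowel marks. A minimal sketch, assuming it is enough to drop the Arabic diacritics in the range U+064B to U+0652 (tanwin, fatha, damma, kasra, shadda, and sukun); real products may also need to handle other marks. The names `strip_diacritics` and `matches` are hypothetical.

```python
# Arabic diacritics U+064B..U+0652: tanwin (064B-064D), fatha, damma, kasra
# (064E-0650), shadda (0651) and sukun (0652).
ARABIC_DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))

def strip_diacritics(text: str) -> str:
    """Drop short-vowel and related marks, keeping consonants and long vowels."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)

def matches(query: str, document: str) -> bool:
    """Diacritic-insensitive substring search (illustrative only)."""
    return strip_diacritics(query) in strip_diacritics(document)

vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote", fully vocalized
bare = "\u0643\u062A\u0628"                          # the same word as usually written: k-t-b
print(strip_diacritics(vocalized) == bare)
print(matches(bare, vocalized))
```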

Direction

Text is written RTL and numerals LTR, but BiDi scripts almost never run vertically. The only intuitive way to produce vertical BiDi script is to take a standard horizontal line of RTL text and rotate it 90 degrees counterclockwise, so that it reads from top to bottom.

[Images: vertical and horizontal BiDi script]

(Images via Arabicgenie.com)

Mirroring

Reading right-to-left means the whole experience, not just the line of text, runs right-to-left. Books are shelved in alphabetical order from right to left, and they open from what Westerners would consider the back cover. Diagrams, illustrations, margins, and controls are all reversed, as are back and next arrows. A sequence of instructional images, such as “Assembling This Bookshelf in Five Easy Steps,” must be reversed; otherwise it becomes instructions for disassembling an existing bookshelf.

The process of recreating LTR graphics, text, content, and user interfaces for RTL or BiDi languages is called “mirroring,” and we’ll go into more detail in a future post. Mirroring can really trip Westerners up, and the resulting errors confound a BiDi audience.
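
At its simplest, layout mirroring is a horizontal reflection. The sketch below assumes a hypothetical `Box` type positioned by its left edge in pixels, and hypothetical helpers `mirror_box` and `mirror_steps`; real UI frameworks usually handle this through their own RTL layout support. It also uses the standard library’s `unicodedata.mirrored()` to show that Unicode mirrors some glyphs automatically but not others.

```python
import unicodedata
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Box:
    """A UI element positioned by its left edge, in pixels (hypothetical)."""
    x: int
    width: int

def mirror_box(box: Box, container_width: int) -> Box:
    # Reflect across the container's vertical center line: the gap on the
    # left in the LTR layout becomes the gap on the right in the RTL layout.
    return replace(box, x=container_width - box.x - box.width)

def mirror_steps(steps: list[str]) -> list[str]:
    # Instructional images laid out in a row: step 1 must sit at the right
    # edge for RTL readers, so the left-to-right visual order is reversed.
    return steps[::-1]

print(mirror_box(Box(x=10, width=100), container_width=800))
print(mirror_steps(["step1.png", "step2.png", "step3.png"]))

# Unicode flags characters such as "(" as Bidi_Mirrored, so renderers flip
# them inside RTL runs. Arrows such as "→" are not flagged, so "back" and
# "next" arrows have to be swapped deliberately.
print(unicodedata.mirrored("("), unicodedata.mirrored("→"))
```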

Letter Forms and Ligatures

Arabic script is inherently cursive, which means that each letter can have several forms depending on whether it appears at the start, middle, or end of a word, plus a standalone (isolated) form. Printed Hebrew letters, by contrast, do not join, and only five Hebrew letters have both a normal form and a final form. In Arabic script, certain letters can connect to a neighboring letter only on their right side, not their left. This creates slight breaks in the script flow that are subtly smaller than the spaces between words.
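
Unicode makes these forms visible. Ordinary Arabic text stores only the base letter, and the font’s shaping engine picks the positional form; for legacy compatibility, however, Unicode also encodes the forms themselves in the Arabic Presentation Forms-B block. The sketch below assumes that probing those character names with `unicodedata.lookup()` is a fair way to illustrate which forms exist; it is a lookup demo, not how shaping works. The helper `joining_forms` is hypothetical. Hebrew, in contrast, encodes its final forms as separate letters.

```python
import unicodedata

FORMS = ("ISOLATED", "FINAL", "INITIAL", "MEDIAL")

def joining_forms(letter_name: str) -> list[str]:
    """List which positional forms Unicode encodes for an Arabic letter,
    using the names of the legacy Presentation Forms-B characters."""
    found = []
    for form in FORMS:
        try:
            unicodedata.lookup(f"ARABIC LETTER {letter_name} {form} FORM")
        except KeyError:
            continue  # this letter has no such form
        found.append(form)
    return found

print(joining_forms("BEH"))  # joins on both sides: all four forms
print(joining_forms("DAL"))  # joins only on its right: isolated and final

# Hebrew letters alef..tav (U+05D0..U+05EA), including the final forms.
hebrew_finals = [chr(cp) for cp in range(0x05D0, 0x05EB)
                 if "FINAL" in unicodedata.name(chr(cp))]
print(len(hebrew_finals))
```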

[Image: Arabic letter forms]

(Image via Ukindia)

[Image: Hebrew letter forms]

(Image via OpenGroup)

Arabic letters can also combine into ligatures, whose shapes depend on the letters around them. Letters are combined, or “stacked,” in different ways depending on which letters are being written. Here’s an example of the name “Muhammad” written out fully, followed by the preferred ligature form:

[Image: “Muhammad” written out fully and as a ligature]

(Image via WikiMedia)

Some ligatures are optional and others, such as lam-alef, are mandatory, yet not all fonts support ligatures. Other fonts fail to render the proper contextual letter forms and simply string the standalone forms together. In the worst case, some fonts display standalone letter forms strung together in LTR order.
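
Ligatures are normally a rendering job: the stored text keeps the base letters, and the font substitutes the ligature glyph. Unicode nevertheless includes some precomposed legacy ligatures, such as U+FEFB (lam with alef) and U+FDF4 (Muhammad). A minimal sketch, assuming text arrives from a tool that emitted those presentation forms: NFKC normalization folds them back into ordinary letters that a proper font can shape. The helper `to_base_letters` is hypothetical.

```python
import unicodedata

LAM_ALEF = "\u0644\u0627"               # lam + alef, as stored in ordinary text
MUHAMMAD = "\u0645\u062D\u0645\u062F"   # meem, hah, meem, dal
LAM_ALEF_LIGATURE = "\uFEFB"            # legacy presentation-form ligature
MUHAMMAD_LIGATURE = "\uFDF4"            # legacy presentation-form ligature

def to_base_letters(text: str) -> str:
    """Fold legacy ligature and presentation-form characters into base letters."""
    return unicodedata.normalize("NFKC", text)

print(unicodedata.name(LAM_ALEF_LIGATURE))
print(to_base_letters(LAM_ALEF_LIGATURE) == LAM_ALEF)
print(to_base_letters(MUHAMMAD_LIGATURE) == MUHAMMAD)
```

Storing base letters and leaving the choice of ligature to the font keeps text searchable and lets each font apply its own preferred forms.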

Numerals

As noted in our first BiDi 101 blog post, numbers are read LTR in BiDi text, so the number "one thousand" would be 1000, not 0001, even in the middle of an RTL text block.

That said, two number systems are commonly found in BiDi languages: the Arabic numerals most familiar to Westerners, and the Indic number system. Contrary to what you might expect, the more common system among Arabic-script languages is the Indic number system (or variations of it). The image below shows the Indic, Arabic, and Anglicized numeral names in Arabic.

[Image: Indic, Arabic, and Anglicized numeral names in Arabic]

(Image via Daniel S. Chereck)
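
In Unicode, the Indic digits used with Arabic are the Arabic-Indic digits (U+0660 to U+0669), and Persian and Urdu use a variant, the Extended Arabic-Indic digits (U+06F0 to U+06F9). A minimal sketch, assuming a simple one-to-one digit substitution is all that is needed (real number formatting also involves separators and locale rules): the digit order stays most-significant first, exactly as with Western digits. The names `ARABIC_INDIC`, `EXTENDED_ARABIC_INDIC`, and `localize_digits` are hypothetical.

```python
import unicodedata

# Translation tables from Western digits to the two Arabic-script digit sets.
ARABIC_INDIC = str.maketrans(
    "0123456789", "".join(chr(0x0660 + i) for i in range(10)))
EXTENDED_ARABIC_INDIC = str.maketrans(
    "0123456789", "".join(chr(0x06F0 + i) for i in range(10)))

def localize_digits(number: str, table: dict[int, int]) -> str:
    """Swap Western digits for another digit set; the digit order is unchanged."""
    return number.translate(table)

thousand = localize_digits("1000", ARABIC_INDIC)
print(thousand)        # still written most-significant digit first
print(int(thousand))   # Python's int() accepts any Unicode decimal digits

# Both digit sets get their own bidi classes, and the bidi algorithm keeps
# digit sequences in left-to-right order inside RTL text.
print(unicodedata.bidirectional("1"), unicodedata.bidirectional(thousand[0]))
```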

Can You Spot the Errors?

If you've been paying attention, you should be able to spot some problems with the following real-world examples of bad BiDi script.

Example A

[Image: vertical Arabic-script tattoo]

(Image via Slodive)

This one's easy: the script is inappropriately rendered vertically. The letters should be joined in cursive in a horizontal line of RTL text, which is then rotated 90 degrees counterclockwise.

Example B

[Image: disconnected Arabic script]

(Image via WebCertain)

In this example, the words are actual Arabic words, but the letters are not joined. In addition, they are incorrectly displayed LTR instead of RTL.

Extra Credit

[Image: extra-credit example]

In this example, which I captured during a recent trip to Europe, the letters are joined together, so everything is OK, right? Not so fast: the text still runs LTR, so the letters appear in reverse order.

How'd you do? If you spotted even one or two of the errors, great work! You have the foundation to appreciate some of the special considerations required for proper display of BiDi script in software and websites. Please stay tuned for the next post in the series, and in the meantime, share your great real-world examples of botched BiDi scripting.