Marking Up Taglish

10:24 am PHT

One unique problem with having a weblog when you’re bilingual is that it’s a bit hard to mark up your multi-language text in a semantically correct way.

A not-so-recent discussion in webdev blogging circles concerned which element should be used to mark up foreign text (and other un-emphasized italics). Some say that <i> with the correct lang attribute is the most appropriate element, since by convention foreign text is displayed in an italicized font. Others contend that <i> should not be used since it is a presentational element: one should use <span> (some even advocate <em>) with the correct lang attribute.

In my weblog, I mark up foreign text using the <i> element; I was convinced by the rather passionate arguments put forth by the <i> camp. The default language of my blog is English (en-US) while the most common secondary language is Tagalog (tl). Doing The Right Thing™ with respect to correct markup of foreign text should then be easy since I’ve decided on my convention, right?

Not exactly. A couple of minor issues arise when one is bilingual, especially when you use a mixed language like Taglish (Tagalog with English). First, it’s somewhat difficult to know which parts of the text should be marked up and which should not. A Taglish sentence like the following fuses English and Tagalog grammar and vocabulary, and makes an illustrative example for our purposes.

Nagpunta ako sa shopping center kanina to meet up with my classmates at para i-print ang report namin.

The sentence is a perfectly acceptable piece of non-standard Taglish (as if there were a standard one). Translated, the sentence means, “I went to the shopping center a while ago to meet up with my classmates and to print our report.” It contains three phrases: the first and last in Tagalog, and the middle in English. Furthermore, the two Tagalog phrases have embedded English words, and the last phrase contains an English verb that’s been converted to Tagalog (i-print).

I’ve marked up that sentence according to what I consider to be the correct way. The two Tagalog phrases are marked as Tagalog, except for the two borrowed English words, while the English-now-Tagalog verb is left as Tagalog. It’s actually debatable whether I’ve marked up that verb correctly. Some nitpickers would suggest “<i lang="tl">i-</i>print” instead of “<i lang="tl">i-print</i>”, marking up only the prefix as Tagalog. A few others would, in addition, suggest spelling the embedded English words using Tagalog orthography and marking them as Tagalog (e.g., “siyoping senter”), but that looks ridiculous.

The second issue concerns the ease of marking up the text when you’re writing a blog entry. If I were using unsophisticated blogging software that required me to enter valid XHTML, I’d have to type in the following nightmare:

<i lang="tl">Nagpunta ako sa</i> shopping center <i lang="tl">kanina</i> to meet up with my classmates, <i lang="tl">at para i-print ang</i> report <i lang="tl">namin.</i>

Text markup systems like Textile and Markdown make this somewhat simpler. In Textile syntax, for example:

__[tl]Nagpunta ako sa__ shopping center __[tl]kanina__ to meet up with my classmates, __[tl]at para i-print ang__ report __[tl]namin.__

However, all the language codes add an uncomfortable amount of friction when typing entries. It would be wonderful if these text markup systems had a feature for defining my own additional markup so that I could, for instance, say that three underscores (___) should mean italics with the lang attribute set to “tl”. Textile, I think, supports this through the use of filters.

Fortunately, I decided back when I started my weblog that I would develop my own CMS and text markup system. Supporting my personal idiosyncrasies is then just a matter of changing the underlying software. I’ve now modified my blogging software to pre-filter three underscores and convert them to the markup that signifies italicized text with the lang attribute set to “tl”. In the future, I may extend my markup system software to incorporate this pre-filtering seamlessly.
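For the curious, a pre-filter of this kind can be quite small. The following is only a rough sketch in Python, not my actual CMS code: the function name prefilter_tagalog and the pattern name TAGALOG_SPAN are made up for illustration, and I am assuming here that the filter emits XHTML directly rather than my own markup.

```python
import re

# Hypothetical shorthand: ___text___ stands for Tagalog text in italics.
# The lookarounds require exactly three underscores on each side, so
# two-underscore spans (Textile-style italics) are left untouched.
TAGALOG_SPAN = re.compile(r"(?<!_)___(?!_)(.+?)(?<!_)___(?!_)", re.DOTALL)


def prefilter_tagalog(text: str) -> str:
    """Replace each ___span___ with <i lang="tl">span</i>.

    Assumption: the output is XHTML; a real pre-filter might instead
    emit whatever intermediate markup the blogging software expects.
    """
    return TAGALOG_SPAN.sub(r'<i lang="tl">\1</i>', text)


if __name__ == "__main__":
    entry = (
        "___Nagpunta ako sa___ shopping center ___kanina___ "
        "to meet up with my classmates, ___at para i-print ang___ "
        "report ___namin.___"
    )
    print(prefilter_tagalog(entry))
```

Run on the sample sentence, this should produce the same XHTML as the hand-typed version above, with far fewer keystrokes per Tagalog phrase.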
