Wednesday, April 16, 2008


Georgian is a language that started with no content in OmegaWiki. It is spoken by some 4.4 million people, mainly in Georgia, Turkey, Iran and Russia. It is written in the Georgian alphabet.

Statistics are in and of themselves not that relevant, but they do allow you to tell a story. OmegaWiki started with the content of GEMET, the GEneral Multilingual Environmental Thesaurus, so the languages that are part of this resource had a head start in size.

Today, thanks to the hard work of Sopho, Georgian is the first language to grow bigger than one of the languages supported by GEMET. The OmegaWiki statistics show that Japanese might be the next language to grow bigger than Slovenian.

Because of its beautiful characters, Georgian is my favourite example for showing the value of localisation in OmegaWiki. It is really special to see the same content optimised in such a way. :)

Saturday, April 05, 2008

Unicode 5.1

Today I learned that Unicode 5.1 has been released. According to the information I received, one major feature will be of particular relevance to Japanese, Chinese and Korean texts: it enables ideographic variation sequences. Line breaking for Polish and Portuguese hyphenation has been improved. The Indic languages will be happy with an improved text segmentation algorithm.

There are 1624 newly encoded characters. These include characters required for Malayalam and Myanmar, but there are also new characters for the Latin script. Support is new for the Cham, Lepcha, Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai scripts.

For the techies, the collation algorithms have been updated to include all the new characters. This also has an effect on contractions like the ch in the Slovak language.
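
To make the idea of a contraction concrete, here is a toy sketch in Python. It is not the Unicode Collation Algorithm, only an illustration under simplifying assumptions: the alphabet is basic Latin with "ch" placed after "h", Slovak diacritics and the other digraphs are left out, and the names `ALPHABET`, `collation_elements` and `sort_key` are made up for this example.

```python
# Toy alphabet: basic Latin letters with "ch" placed after "h", as in Slovak.
# Diacritics and the digraphs "dz"/"dž" are left out to keep the sketch short.
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {element: index for index, element in enumerate(ALPHABET)}


def collation_elements(word):
    """Split a word into sort units, treating "ch" as a single unit."""
    word = word.lower()
    elements = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            elements.append("ch")
            i += 2
        else:
            elements.append(word[i])
            i += 1
    return elements


def sort_key(word):
    # Characters outside the toy alphabet sort after all known units.
    return [RANK.get(e, len(ALPHABET)) for e in collation_elements(word)]


words = ["chata", "cesta", "hora", "ihla", "dom"]
print(sorted(words))                # plain code point order: "chata" lands among the c-words
print(sorted(words, key=sort_key))  # contraction-aware order: "chata" comes after "hora"
```

The point of the sketch is that "ch" has to be recognised as one unit before comparing, which is why a change to the collation data can change the sort order of existing words.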

Many of these things have an effect on languages supported in Wikimedia projects. My question is: when will we have support for this? Is this a function of the MediaWiki / PHP code, and is it also a function of the browser?
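
As a small illustration of why the answer probably depends on each piece of software, here is a Python sketch. It says nothing about MediaWiki, PHP or any browser. It only shows that a runtime ships its own copy of the Unicode data and can be asked which version it knows. The probe character U+1E9E (LATIN CAPITAL LETTER SHARP S) is my choice of a Latin character added in 5.1, and the variable names are made up for this example.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this Python build.
version = tuple(int(part) for part in unicodedata.unidata_version.split("."))
print("Unicode data version:", unicodedata.unidata_version)

# U+1E9E LATIN CAPITAL LETTER SHARP S is one of the Latin additions in 5.1.
probe = "\u1e9e"
name = unicodedata.name(probe, None)  # None if this runtime does not know it

knows_5_1 = version >= (5, 1) and name is not None
print("Knows Unicode 5.1 characters:", knows_5_1)
```

A runtime with older data would report a lower version and return no name for the probe character, even though the text itself can still be stored and passed along.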