Tuesday, January 26, 2010

Language specific annotations

In order to allow for transcriptions (see previous post), I implemented language specific annotations.

Until now, language specific annotations were possible only for annotations that are a list of options. For example, gender offers "masculine" and "feminine" in French, and "masculine", "feminine" and "neuter" in German (and nothing in English). All other annotations (text and link) were always available for every language.

Language specific annotations make it possible, for example, to show the annotation "pinyin" only for words in Mandarin. This now works, and is configured by adding the corresponding annotations to the DefinedMeaning of the language (e.g. Mandarin (simplified), Japanese).
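As a rough illustration of the idea, here is a minimal Python sketch. It is not the actual OmegaWiki code or database schema: the dictionary layout, the function name annotations_for and the two "global" annotation names are assumptions made up for this example. It only shows the principle that each language carries its own list of annotations, on top of those offered everywhere.

```python
# Hypothetical sketch, not OmegaWiki's real data model.
# Each language lists the annotations that apply only to its words.
LANGUAGE_ANNOTATIONS = {
    "Mandarin (simplified)": ["pinyin"],
    "Mandarin (traditional)": ["pinyin"],
    "Japanese": ["Hepburn romanization", "Hiragana", "Katakana"],
}

# Annotations offered for every language (names invented for the example).
GLOBAL_ANNOTATIONS = ["text note", "link"]


def annotations_for(language):
    """Return the annotations that should be shown for a word in `language`."""
    return GLOBAL_ANNOTATIONS + LANGUAGE_ANNOTATIONS.get(language, [])


if __name__ == "__main__":
    for lang in ("Mandarin (simplified)", "Japanese", "English"):
        print(lang, "->", annotations_for(lang))
```

With this layout, adding a transcription for a new language only means adding one entry to the per-language table.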

At the moment, the following transcriptions are available:
- pinyin for simplified Mandarin and traditional Mandarin
- revised Hepburn romanization for Japanese
- Hiragana and Katakana for Japanese (as a way of reading a word in kanji)

More transcriptions can easily be added as soon as a contributor shows interest in them.

It is possible to do more than just transcriptions with language specific annotations. For example, we could imagine having links to some public domain dictionaries (for example Webster) or authoritative English dictionaries (Oxford dictionary online), as a way of providing an attestation for a given syntrans (spelling + definition). Such a link would be available only for English words.

Other ideas and thoughts are welcome.


Tuesday, January 19, 2010

Romanizations in OmegaWiki

Romanizations are important for learning a language, since they allow us, Latin alphabet readers, to read a word written in a non-Latin script.

We are going to implement romanizations in OmegaWiki as text attributes. However, there are several competing romanization systems for each non-Latin script, and it has to be decided which systems we are going to use.

First of all, there are several ISO norms for romanizations: (Wikipedia link). There are also many other romanization systems which are not ISO norms.

Among the languages and scripts listed in Wikipedia, I have knowledge and interest in Mandarin and Japanese. So, I'll discuss them below. For the rest, help is welcome.

For Mandarin Chinese, it is clear that the ISO norm, i.e. pinyin, is to be used. It is what the books use when we learn the language and what appears in almost all dictionaries; it is the most widely used system for writing Chinese on a computer; and it is even taught to Chinese people at school.

For Japanese, the situation is not as easy. The ISO norm is the Kunrei-shiki romanization. However, the most widely used system seems to be the Hepburn romanization. It is the one that is used in my books at home. So we have the choice between several possibilities:
- should we use the Kunrei-shiki only, because it is ISO,
- should we use the Hepburn only, because it is the most widely used,
- or should we implement both?
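To make the difference between the two systems concrete, here is a small Python sketch. It is only an illustration and covers a handful of hiragana; the table and function names are invented for the example, and a real converter would need the full kana table plus rules for long vowels, small kana and so on.

```python
# Hiragana whose romanization differs: (Hepburn, Kunrei-shiki).
DIFFERENCES = {
    "し": ("shi", "si"),
    "ち": ("chi", "ti"),
    "つ": ("tsu", "tu"),
    "ふ": ("fu", "hu"),
    "じ": ("ji", "zi"),
}

# A few hiragana that are romanized identically in both systems.
SHARED = {"か": "ka", "す": "su", "な": "na", "ま": "ma", "み": "mi"}


def romanize(kana, system="hepburn"):
    """Romanize a string made only of the hiragana listed above."""
    if system not in ("hepburn", "kunrei"):
        raise ValueError("unknown system: " + system)
    index = 0 if system == "hepburn" else 1
    parts = []
    for char in kana:
        if char in DIFFERENCES:
            parts.append(DIFFERENCES[char][index])
        elif char in SHARED:
            parts.append(SHARED[char])
        else:
            raise ValueError("kana not covered by this sketch: " + char)
    return "".join(parts)


if __name__ == "__main__":
    for word in ("すし", "ふじ", "つなみ"):
        print(word, romanize(word, "hepburn"), romanize(word, "kunrei"))
```

Storing both romanizations as separate text attributes, as in the third option, would amount to keeping the output of both calls.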

For the other non-Latin scripts and languages: if you have knowledge of or interest in Cyrillic, Arabic, Hebrew, Greek, Georgian, Armenian, Thai, Korean, Indic scripts or any other script that is not in the list, you are welcome to give your thoughts on the following question (here or at the International Beer Parlour):

"Should we use the ISO norm, or another system?"


P.S.: There is also the cyrillization of Japanese. Is it a desired feature as well?

Wednesday, January 06, 2010


Thanks to Dh, it is now possible to add etymologies in OmegaWiki (in fact, it has been possible for about a month).

It has been implemented as a translatable text attribute. This means that the etymology of any word can be explained in all languages.

This may seem like overkill, and actually, the initial idea was to have only one text field, and to enter the etymology of a word in the language of that word, since we expected that only people who know a language would be interested in the etymologies of its words.

However, while I might be interested in the etymology of a Latin word, I am not able to write it in Latin, and there is also no particular reason to write it only in English in that case.
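A minimal Python sketch of what "translatable" means here, under the assumption that the attribute can be pictured as one text per language. The language codes, placeholder texts and the function name etymology_for are invented for the illustration and do not reflect how OmegaWiki stores the data.

```python
# Hypothetical picture of one translatable etymology attribute:
# the same etymology, written once per language that has a contributor.
etymology_of_some_latin_word = {
    "en": "(etymology explained in English)",
    "fr": "(etymology explained in French)",
}


def etymology_for(translations, reader_language):
    """Return the etymology text in the reader's language, or None if absent."""
    return translations.get(reader_language)


if __name__ == "__main__":
    print(etymology_for(etymology_of_some_latin_word, "fr"))
    print(etymology_for(etymology_of_some_latin_word, "la"))  # None: nobody wrote it in Latin
```

With a single text field in the language of the word, only the missing "la" entry could have been filled in.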

What is still missing is the possibility to enter etymons as links to the corresponding DefinedMeanings. The same feature is wanted for definitions, so that the words of a definition can link to the corresponding DM and avoid ambiguity; it will need further development.

Have fun adding etymologies,