Thursday, July 29, 2010

#Conceptwiki

The ConceptWiki environment is an umbrella for several categories of professional data. It is hosted by NBIC and the Concept Web Alliance, and it provides a rich environment with many types of data that have a biomedical background.

The software it uses is very similar to the software used by OmegaWiki, because of the long association between the people behind OmegaWiki and the ConceptWiki. There used to be no room for multilingual content in the ConceptWiki, but that is changing.

OmegaWiki and the ConceptWiki used to be hosted on the same server. This made it easy to connect OmegaWiki translations to ConceptWiki ontological content. For a concept like yaws, you find many translations at OmegaWiki; you also find a link to the Wikipedia article and a mapping to the UMLS part of its database.

Indonesian dance mask

There is no Commons category on the subject yet; otherwise you might find this mask there, depicting the face of someone suffering from the disease.

The plan is to bring OmegaWiki and the ConceptWiki closer together again. I hope that the ConceptWiki will become a multilingual resource and we are going to start by sharing resources.
Thanks,
      GerardM

Endangered African languages

Sorosoro is a program that aims at studying and documenting endangered languages.

They have recently published videos of native speakers presenting vocabulary in four endangered African languages: Punu, Mpongwe, Akele and Benga. The words cover body parts, numbers, colors and common phrases.



I have enabled these languages for editing at OmegaWiki and added all the words mentioned in the videos. As a result, these translations are available not only to people who speak English, French or Spanish (the three languages of the videos), but to speakers of all the other languages at OmegaWiki.

We now have 28 expressions in Akele and Benga, 59 in Mpongwe and 78 in Punu.

If you know of other vocabulary resources for endangered languages, you are more than welcome to mention them.

Thanks,
Kipcool.

Wednesday, July 21, 2010

So we like #Commons

Yesterday Kipcool surprised us with a more visible link to Wikipedia; today he added a link to Commons as well. Commons is different in that there is only one Commons link per concept.

When you search for pictures of a horse on Google or Bing, the language you search in makes a big difference. If you search for "សេះ" (Khmer for horse), for instance, you will find far fewer horses.


This is what it looks like in Arabic. As there is an annotation referring to the Arabic Wikipedia article, the Arabic article is selected.
Thanks,
      GerardM

Tuesday, July 20, 2010

So we like #Wikipedia ...

Every now and then, I am happily surprised by new functionality in OmegaWiki. This time Kipcool made our existing Wikipedia links visible. When there are references to a Wikipedia article, you will be pointed to the article in your language.


The expression dólar shows the Dutch Wikipedia article for me. It will show the Spanish article for Ascander and the English article for Kipcool (the French and German articles are not linked yet).

The link is added to the page with JavaScript. This avoids additional load on our server. I hope you like it; it is another excellent reason to dig into our annotations.
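The selection logic can be sketched as follows. This is a minimal Python illustration, not the actual feature (which runs as JavaScript in the browser). The dictionary of article titles and the function name `wikipedia_link` are hypothetical, and the sketch assumes that no link is shown when there is no article in the reader's language, as with French and German above.

```python
from __future__ import annotations

from urllib.parse import quote

# Hypothetical "Wikipedia article" annotations for one DefinedMeaning:
# Wikipedia language code -> article title. Titles are illustrative only.
WIKIPEDIA_ARTICLES = {
    "nl": "Dollar",
    "es": "Dólar",
    "en": "Dollar",
}


def wikipedia_link(articles: dict[str, str], user_lang: str) -> str | None:
    """Return a link to the article in the reader's language, or None."""
    title = articles.get(user_lang)
    if title is None:
        # Assumption: no fallback language; simply show no link.
        return None
    # Wikipedia URLs use underscores for spaces; other characters are percent-encoded.
    return f"https://{user_lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"


for lang in ("nl", "es", "en", "fr"):
    print(lang, wikipedia_link(WIKIPEDIA_ARTICLES, lang))
```

Doing this lookup in the reader's browser, as the post describes, means the server only delivers the annotations once and does no per-language work.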
Thanks,
      GerardM

Wednesday, June 09, 2010

Démoustication

For #OmegaWiki it is important to find as many translations as possible; this is what makes the concept work. The French word démoustication has as its definition "The control or reduction of a mosquito population in a certain place". This definition made me check whether "muggenbestrijding" exists in Dutch. It does, and I even found pictures of people shooting at mosquitoes with air guns.

Having only three expressions for mosquito control is OK, as the concept is there with a definition in two languages. What matters is that the word is known. When the same concept is later added starting from a different language, we can always merge the two DefinedMeanings.
Thanks,
      GerardM

Sunday, June 06, 2010

#Wikipedia interlanguage links

On a WMF mailing list there is a big discussion about interlanguage links: should they be shown by default or not? The discussion is the result of a choice by the Usability team to hide them. Many people have strong opinions; one of the best contributions is by Gregory Maxwell.

That thread is not the place to discuss improved approaches to interlanguage links. The goal of the alternative approaches below is to increase the number of people who are informed by a Wikipedia.

Alternative one
Allow interlanguage pages when there is not even a stub. We aim to inform, but when there is no article, not even a stub, we currently do not inform at all. Such a page may include a definition, but above all it should say that articles on this subject can be found in the following languages, followed by a list of the languages and the names of the articles.

This is easy to implement. The links are still maintained by bots and these referral pages do not need to show elsewhere. 

Alternative two
The technology behind OmegaWiki has supported links to Wikipedia articles for quite some time. It even refers to Commons categories filled with files on a subject.

This alternative is a bit more involved, but it allows for more functionality. It may need some fiddling to get it ready for WMF usage, but hey, it is great functionality, with many links to Wikipedia already in there.
Thanks,
     GerardM

Wednesday, June 02, 2010

#Omegawiki WOTD; adénosine

Today's word of the day is adénosine, the French word for adenosine. The entry was started by Kipcool, and I added some bits and pieces to the concept.

The first thing I did was add a load of translations. As I want to have on average more than 10 Expressions per DefinedMeaning, adding translations has more effect than starting a new DefinedMeaning.

The second thing I did was to make it part of the chemical compound class. Once adenosine was known as a chemical compound, I was able to add some annotations like the chemical formula. The last thing I did was to map it to the concept adenosine in the UMLS. This is really valuable for those who want to map the translations of OmegaWiki to the rich information that can be found in this resource.
Thanks,
      GerardM

A guest blog about the Telugu experience

 I am happy to welcome Veeven's contribution to the OmegaWiki blog.

We now have 1,000+ Telugu expressions and 35 definitions in Telugu in OmegaWiki. It is nothing compared to the number of words in the Telugu language. Still, I think it is a good first step.

To show off, here is a screen shot showing the multiple meanings for the word "వర్ణము":


I joined OmegaWiki in 2008. I did not add many expressions or definitions in those two years. About a month ago, I started working more regularly.

Working on OmegaWiki is a different experience. It is nice to see more and more of the interface change to Telugu as you work on Community class attributes and language names. One drawback is performance: when editing, pages load slowly and are non-responsive for a second or two (the browser is trying to sort the tables). I hope it gets better.

I wish more people would come forward and contribute to Telugu on OmegaWiki.

Saturday, May 29, 2010

Support for Telugu in #OmegaWiki II

When Veeven added పీడ్మోంటీ for Piedmontese, it was the last of the languages that have a translation for "Telugu" to still lack a name in Telugu.

As a consequence almost all the labels used on the concept Telugu are now available in Telugu. Because of the hard work done at translatewiki.net, the experience is great even though there are just 805 expressions in Telugu at this time.

What you do get, for instance, is a large number of translations from English to Spanish. Having these with a Telugu interface is one reason why the investment in time is worth it.

At this time there are 249 languages supported in OmegaWiki. Your language may be among them. You can make the same difference for the people that read and write your language.
Thanks,
     GerardM

Tuesday, May 18, 2010

Support for Telugu in #OmegaWiki

When Veeven added the Telugu numeral, I was really pleased, as I had been trying for some time to raise the number of Telugu expressions above 500.

I was really surprised when the labels did not change to Telugu. Happily Kipcool found that the "Wikimedia key" was missing and now everything works as designed for Telugu.
Thanks,
     GerardM

Community class attributes

In #OmegaWiki the community class attributes are a collection of words or phrases that are used as labels for attributes of a class. In the last few days a few attributes were added to the "number" class. This was to allow the expression of numbers in Thai, Laotian and Telugu.


Together with the names of languages, these 65 words are the most relevant concepts. They are effectively what localises the data for people who have selected a language for their interface.

It is really appreciated when they are completed with some urgency; this is what makes OmegaWiki functional for our readers.
Thanks,
       GerardM

Sunday, May 09, 2010

#Ambaradan release 0.6.0


The first release of Ambaradan has hit the Internet, and it intends to stay there. Congratulations to the team that has worked so hard to get to this stage; it is an awesome development.

The software as you can find it is a first alpha release, and it does not have all its functionality yet. Having observed Ambaradan for a few days, I can testify that several bug fixes and improvements have already gone in. So it is very much something to watch.

One of the really cool features is that Ambaradan provides web font support: as long as a freely licensed Unicode font can be found for your script, you will not need to install a font on your system.

It will be interesting to follow the development of Ambaradan; I hope that the ambitious specifications will slowly but surely become reality. The potential is there for it to become awesome.
Thanks,
      GerardM

Wednesday, May 05, 2010

Polyglot functionality on #OmegaWiki

#Multilingual #MediaWiki is really helped by the Polyglot extension. What it does is show, for instance, the main page in your language if a translation exists. This helps the usability of OmegaWiki, and of any other installation where multilinguality is important, a lot.

There are a few gotchas. The first is that it prefers the ISO 639-1 code when one is available. This means en instead of eng, fr instead of fra, nl instead of nld, and so on. The same is true for the Babel extension. However, this extension makes the change to the short code for you.

Given that ISO 639-6 will restore the primacy of the existing two-character codes, it makes sense for OmegaWiki to use this approach.

When Kipcool had a look at the code, he found that for OmegaWiki only a few lines were really needed... If you want to see the default or English text, you can always add "&redirect=no" to the URL.
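The code preference described above can be sketched like this. It is a hypothetical Python illustration, not Polyglot's source: the mapping table is a tiny illustrative subset, and it assumes translated pages live at subpages named after the language code (such as "Main Page/nl"), falling back to the default page when no translation exists.

```python
from __future__ import annotations

# Illustrative subset: three-letter codes that have an ISO 639-1 two-letter equivalent.
ISO_639_3_TO_1 = {"eng": "en", "fra": "fr", "nld": "nl"}


def preferred_code(code: str) -> str:
    """Prefer the two-letter ISO 639-1 code when one exists; otherwise keep the code."""
    return ISO_639_3_TO_1.get(code, code)


def localized_title(title: str, user_code: str, existing_pages: set[str]) -> str:
    """Return the translated subpage if it exists, else the default page title."""
    candidate = f"{title}/{preferred_code(user_code)}"
    return candidate if candidate in existing_pages else title


pages = {"Main Page", "Main Page/nl", "Main Page/fr"}
print(localized_title("Main Page", "nld", pages))  # Dutch reader
print(localized_title("Main Page", "ast", pages))  # Asturian has no ISO 639-1 code
```

Normalising the code before the lookup is what lets a reader whose preference is stored as "nld" still land on a page stored under "nl".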

Tuesday, May 04, 2010

Statistics, because we love them

I had barely written that we were looking at the statistics again, and considering whether we might want multiple statistics, when less than a day later we had several new statistics. Kipcool outdid himself.

Not only does it show the statistics in "your" language; when "your" language is written from right to left, so are the statistics.

Syntrans statistics in the Arabic script
As you can see, it displays nicely in Arabic characters. You will believe that it does the Latin script (boring); it also does Cyrillic, Devanagari and ..

Definitions in Georgian
These two statistics are new. The first, the "Syntrans statistics", shows all the words used for a language, and as such it includes homonyms and synonyms. The second shows the number of definitions that exist for a language.

We are really happy with the result so far, now we have to phase out the old statistics and see if we want to do some caching of results.
Thanks,
       GerardM

Sunday, May 02, 2010

Statistics ... a work in progress

At #OmegaWiki we have nice statistics, nice bar graphs that show the number of expressions for each language. The numbers are nice except.. except for the language of the labels.. they are in English. This is not really how we want to present them because we pride ourselves on our multilingual support.


Now that we are looking at our numbers again, we are considering which numbers to show: the DefinedMeanings, the Expressions or the Syntrans records. The first shows the number of concepts, the second the number of expressions used for a language, and the last also includes the homonyms. What do you think?

Obviously these numbers show the current status and you can help us improve these numbers.
Thanks,
       GerardM

Thursday, April 29, 2010

હાથકડી is the word of the day

Every day #OmegaWiki has another word of the day. Several people decide what it should be. Sometimes the word is in the news; sometimes, like today, it is a word that was added the previous day.

OmegaWiki currently supports 247 languages, and when you look at the distribution of words over the languages, you find that it tails off quite sharply. This does not mean that expressions in languages like Gujarati are not welcome. Far from it: to show that we would welcome collaborators for, for instance, Gujarati, હાથકડી is today's word of the day.
Thanks,
     GerardM

Monday, April 19, 2010

#Wikipedia article

At #OmegaWiki we have twenty translations of the phrase "Wikipedia article". This phrase is one of the Community class attributes, these are used to indicate a relation. 

The relation indicated by the "Wikipedia article" is an article in the Wikipedia of that language. Currently many articles are referred to in the English and the Spanish Wikipedia.

We would appreciate it if you would help us with a translation of this phrase in your language.
Thanks,
       GerardM

Sunday, April 18, 2010

Word not found

When an expression is not found at #OmegaWiki, you will be offered a link to create either a new article or a new expression.


Kipcool informed me about this on IRC; he also expressed his amazement at the speed at which localisations become available at translatewiki.net. There are already localisations of the new message in 10 languages.
Thanks,
      GerardM

What to do when a word is not part of your vocabulary

Every now and then, I find a word that I do not know. For a normal person, that means a trip to the dictionary; for me it is a trip to OmegaWiki, and often I find that the word is not in there yet.

Today it was the word cohort that I did not know. I knew it as something to do with the Roman military, but it also has something to do with statistics.

I like pictures in my blogs; people playing soldiers are more fun to look at than some bar charts.
Thanks,
      GerardM

Thursday, April 15, 2010

The ratio between DefinedMeanings and Expressions

#OmegaWiki as a resource provides information in many languages. Currently some 245 languages are enabled for the inclusion of expressions. I often add words that appear as Apertium concepts, and for them to be useful to Apertium, it is important to add as many translations as possible.


As a consequence, we currently have some 9.27 Expressions for every DefinedMeaning. Slowly but surely this ratio is moving upward, and this provides some secret motivation :) It does not mean that we have something like 9.27 translations for every concept; an Expression is just a string of characters, and when this string of characters is used in multiple languages or for multiple concepts, it is still only one Expression.
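To make the difference between the two ratios concrete, here is a small hypothetical calculation in Python. The records are made up for illustration, and the counting follows the rule stated in this post (an Expression is a string, counted once however often it is reused), while every (spelling, language, DefinedMeaning) record is counted as a translation.

```python
from __future__ import annotations

# Hypothetical syntrans records: (spelling, language, DefinedMeaning id).
SYNTRANS = [
    ("dólar", "spa", 1),
    ("dólar", "por", 1),
    ("dollar", "eng", 1),
    ("dollar", "nld", 1),
    ("cohort", "eng", 2),
    ("cohorte", "fra", 2),
    ("cohorte", "spa", 2),
]


def ratios(records: list[tuple[str, str, int]]) -> tuple[float, float]:
    """Return (Expressions per DefinedMeaning, translations per DefinedMeaning)."""
    defined_meanings = {dm for _, _, dm in records}
    # Per this post: a string reused across languages or concepts is one Expression.
    expressions = {spelling for spelling, _, _ in records}
    n_dm = len(defined_meanings)
    return len(expressions) / n_dm, len(records) / n_dm


print(ratios(SYNTRANS))
```

With these seven records there are 4 distinct strings and 2 concepts, so the Expression ratio is 2.0 while the translation ratio is 3.5, which is why the second number is expected to be higher.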

I would love to know what the ratio is between the number of concepts and the number of translations; it will be a higher number ..
Thanks,
    GerardM

Tuesday, April 13, 2010

15,000 #Portuguese expressions in #OmegaWiki

We are happy to have a sufficient number of expressions in the Portuguese language at OmegaWiki. It shows how OmegaWiki provides functionality that you will not find elsewhere.


The labels are shown in Portuguese when you have selected Portuguese as your default language. Better still, when you edit in OmegaWiki, the same labels can be selected in Portuguese.


We would dearly love to provide the same functionality for languages like Telugu, Kannada, Abkhaz and Guarani. We can help by pointing out how you can quickly provide a localised environment, and we can help by adding expressions for any language when we find them. In the end, we need people interested in supporting their language.
Thanks,
       GerardM

Friday, April 09, 2010

Working on the #Apertium M list


When you want to add content to #OmegaWiki, it helps to have a goal. As you can see from the abundance of "red links", there is plenty left to add for the Apertium words that start with an "M".

Apertium, as you may recall, is a free/open-source machine translation platform. Working on the Apertium lists in OmegaWiki makes sense because OmegaWiki's content is freely licensed as well, and the Apertium people are welcome to use it for their purposes.
Thanks,
     GerardM

Tuesday, March 23, 2010

OmegaWiki is back on line

When a technician does not know that a live application is running on a server, he will just turn the server off. That is what happened to OmegaWiki two weeks ago.

What happened next was that an amazing group of people got together. They found someone in the USA willing to go to the office where the server was and take it home, and in a multi-continent operation they got the data off that system.


OmegaWiki is now operational again; it runs on a server of Erik Moeller. Siebrand, Kim and Marc worked on the issue from the Netherlands, and Cyde in the USA. Once the server was up and running, it was Kipcool who got everything working again.

I am really grateful for the hosting we received in the past from Knewco, and I am really grateful and happy for the support of so many fine men who brought this project, into which so much passion, love and work has gone, back from the brink.

I have added some translations for the concept Wikipedia, and I enjoyed being able to add all the different expressions in one go. I needed them for some work at translatewiki.net.
Thanks,
      GerardM

Saturday, March 20, 2010

Off line (progress report II)

The OmegaWiki server is now connected to the Internet. The data is being moved, so, happily, this is a positive progress report :)
Thanks,
    GerardM

Friday, March 19, 2010

Off line (progress report)

Sadly, things have not progressed as quickly as we had hoped. I understand that today the server will be picked up so that the data can be transferred and a relaunch prepared.

More info when I have it ..
Thanks,
    GerardM

Saturday, March 06, 2010

Off line

OmegaWiki is off line for the moment. Our host forgot that OmegaWiki was on their server. It will be available again soon, I have been promised.

It does however mean that we are looking for hosting elsewhere.
Thanks,
     GerardM

Thursday, February 18, 2010

400,000 expressions


Today at OmegaWiki we have reached the milestone of 400,000 expressions for more than 43,000 concepts in 242 languages.

It took longer to reach than expected, because the statistics were at first too optimistic, showing 20,000 more expressions than we actually had. This is now fixed.

The statistics reveal that, among these 242 languages, 49 have more than 1,000 expressions, and 10 have more than 10,000, namely English, Castilian, German, Dutch, French, Italian, Portuguese, Swedish, Finnish and Polish. The first seven of these are the languages in which our regular contributors are native (or fluent) speakers.

What is not known is how many definitions we have in these languages. An improvement of the statistics page is needed.

Thanks,
Kipcool.

Tuesday, February 16, 2010

Adding multiple translations at once

At OmegaWiki, if you wanted to add 10 translations or synonyms to a word, you used to have to add them one by one. This meant the page was reloaded 10 times, which costs time for the contributor and load for the server.

But now, with a bit of JavaScript, it is possible to add multiple translations without reloading the page.

For this, you just have to click on the green "+" and a new row appears.

I use translations as an example, but it also works for definitions, classes, annotations, etc.

I am glad: these are my first steps in the AJAX world, and it will help us reach the 400k milestone faster.

Thanks,
Kipcool.

Tuesday, January 26, 2010

Language specific annotations

In order to allow for transcriptions (see previous post), I implemented the possibility to have language specific annotations.

Until now, language-specific annotations were only possible when the annotation is a list of options. For example, gender shows "masculine" and "feminine" in French, and "masculine", "feminine" and "neuter" in German (and nothing in English). However, the other annotations (texts and links) were always available for all languages.

Language-specific annotations allow, for example, the annotation "pinyin" to show up only for words in Mandarin. This is now possible, and it is configured by adding the corresponding annotations to the language's DefinedMeaning (e.g. Mandarin (simplified), Japanese).

At the moment, the following transcriptions are available:
- pinyin for simplified Mandarin and traditional Mandarin
- revised Hepburn romanization for Japanese
- Hiragana and Katakana for Japanese (as a way of reading a word in kanji)

More transcriptions can be easily added, as soon as a contributor shows interest in it.
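The configuration rule can be sketched as a simple lookup. This is a hypothetical Python illustration, not OmegaWiki's code: the dictionary stands in for the annotations attached to each language's DefinedMeaning (using the transcriptions listed above; the name "Mandarin (traditional)" is assumed), and the `general` parameter stands in for annotations offered for every language, here the etymology text attribute.

```python
from __future__ import annotations

# Stand-in for annotations attached to each language's DefinedMeaning.
# "Mandarin (traditional)" as a name is an assumption for illustration.
LANGUAGE_ANNOTATIONS: dict[str, list[str]] = {
    "Mandarin (simplified)": ["pinyin"],
    "Mandarin (traditional)": ["pinyin"],
    "Japanese": ["revised Hepburn romanization", "Hiragana", "Katakana"],
}


def annotations_for(language: str, general: list[str]) -> list[str]:
    """Annotations offered for a word: the general ones plus those of its language."""
    # Returns a new list, so the shared configuration is never modified.
    return general + LANGUAGE_ANNOTATIONS.get(language, [])


print(annotations_for("Japanese", ["etymology"]))
print(annotations_for("English", ["etymology"]))
```

Keeping the configuration on the language itself means adding a new transcription for a script only touches that one language entry.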

It is possible to do more than just transcriptions with language-specific annotations. For example, we could imagine links to public domain dictionaries (for example Webster) or authoritative English dictionaries (Oxford dictionary online), as a way of providing an attestation for a given syntrans (spelling + definition). Such a link would be available only for English words.

Other ideas and thoughts are welcome.

Kipcool.

Tuesday, January 19, 2010

Romanizations in Omegawiki

Romanizations are important when learning a language, since they allow us, readers of the Latin alphabet, to read a word written in a non-Latin script.

We are going to implement romanizations in OmegaWiki as text attributes. However, there are several competing romanization systems for each non-Latin script, and we have to decide which systems to use.

First of all, there are several ISO norms for romanization (listed on Wikipedia). There are also many other romanization systems that are not ISO norms.

Among the languages and scripts listed in Wikipedia, I have knowledge and interest in Mandarin and Japanese. So, I'll discuss them below. For the rest, help is welcome.

For Mandarin Chinese, it is clear that the ISO norm, i.e. pinyin, should be used. It is what is in the books when we learn the language and what appears in almost all dictionaries; it is the most used system for writing Chinese on a computer, and it is even taught to Chinese people at school.

For Japanese, the situation is not so simple. The ISO norm is the Kunrei-shiki romanization. However, the most widely used seems to be the Hepburn romanization; it is the one used in my books at home. So we have to choose between several possibilities:
- should we use the Kunrei-shiki only, because it is ISO,
- should we use the Hepburn only, because it is the most widely used,
- or should we implement both?

For the other non-Latin scripts and languages, if you have knowledge of or interest in Cyrillic, Arabic, Hebrew, Greek, Georgian, Armenian, Thai, Korean, Indic scripts or any other script that is not in the list, you are welcome to give your thoughts on the following question (here or at the International Beer Parlour):

"Should we use the ISO norm, or another system?"

Thanks,
Kipcool.

P.S.: there is also the cyrillization of Japanese. Is it a desired feature as well?

Wednesday, January 06, 2010

Etymologies

Thanks to Dh, it is now possible to add etymologies in OmegaWiki (in fact, it has been possible for about a month).

It has been implemented as a translatable text attribute. It means that the etymologies for any word can be explained in all languages.

This may seem like overkill, and in fact the initial idea was to have only one text field and to enter the etymology of a word in the language of that word, since we expected that only people who know a language would be interested in the etymologies of its words.

However, while I might be interested in the etymology of a Latin word, I am not able to write it in Latin, and there is also no particular reason to write it only in English in that case.

What is missing now is the possibility to have etymons as links to the corresponding DefinedMeanings. The same feature is also desired so that the words of a definition can link to the corresponding DM to avoid ambiguity; it will need further development.

Have fun adding etymologies,
Kipcool.