Wednesday, January 31, 2007

Greek names for languages

At OmegaWiki, we noticed that Lou has started to correct the capitalisation of the Greek names for languages. A few days ago I was surprised to learn that the Georgian names for languages were incorrectly spelled. Now it is Greek.

It is really powerful to see that once these names are corrected, the corrections are available to everybody who needs the Greek names. This reason for using OmegaWiki proves itself again.


Sunday, January 28, 2007

Latin roots etc.

Yesterday something came to mind: a dictionary that a teacher of mine at language school had. It listed Latin words with their translations into many other languages, and one thing was obvious: the words were similar in all of these languages. If you knew the word in one language and were studying another, you could derive most of the related words by following a set of rules.

So the obvious thing to do is to add these words and their translations to OmegaWiki. There is one problem with Latin, though: the "normal" Latin language should not be mixed with the taxonomic Latin that is used in science. So we need two languages, Latin and taxonomic Latin. Who knows whether suitable language codes exist somewhere in the ISO 639 standards?


Friday, January 26, 2007

OLPC needs a dictionary viewer

I had a word with the director of content for OLPC, the One Laptop Per Child project. As you know, OmegaWiki is the project that works on providing OLPC with dictionary content. We are making steady progress on all these words, but there is still so much left to do. We are getting more Expressions in many languages, but the definitions are lagging; while we do our best, it is still very much the difference between there being nothing and there being next to nothing. It does, however, show that things are getting under way...

As the moment when children get their hands on these laptops draws closer, it matters that the data can actually be used. So we need a dictionary viewer. It needs to run on Linux and it should have a small footprint. As we will provide all these languages, it will be interesting to see how the rich tapestry that OmegaWiki tries to weave will materialise on these nifty machines.

If you have a suggestion, please let us know :)


Wednesday, January 24, 2007

Georgian names for languages

Some time ago I got permission to copy content from a resource listing the names of languages. I am still grateful for that data. It got the Dutch Wiktionary going really nicely, as at the time we needed those names of languages for the user interface.

With OmegaWiki we had the same issue; we needed language names again for the user interface. This was to make it possible for people to see the labels of translations in their own language. Since the data became available we have learned a lot, for instance that languages like Danish and Italian do not capitalise the names of languages.

Today I was told that many of the names of languages in Georgian were found to be in error and had been corrected. The great news for OmegaWiki is that we only have to do this once and it is then correct everywhere. The sad thing is that these names are probably wrong in many, many Wiktionaries. There were two types of error: either the name was simply wrong, or it was the word for a person from a country instead of the name of the language.

The best I can do for the Wiktionaries is to let them know in this way, as I do not really know what needs doing.


Tuesday, January 23, 2007

Stichting Open Progress

Stichting Open Progress is the Dutch not-for-profit organisation that is the legal entity behind OmegaWiki. As OmegaWiki is growing to the extent where we have to consider contracts for hosting, grants and the like, we needed an organisation.

The need for an organisation was also felt because we already had some projects where we would have been better able to act with a legal entity backing the activities. Some of these projects are quite substantial.

Open Progress aims to develop both Open Source/Free Software and Open Content/Free Content projects. As part of its mission it gives room to projects that are aligned with the aims of the stichting (foundation). Obviously OmegaWiki is the first; from an organisational point of view, the OmegaWiki commission decides on the issues that arise. Its resolutions will be enacted for the project by the stichting, provided they are in line with Dutch law and do not circumvent the aims of the stichting. This way Open Progress hopes to make OmegaWiki a safe haven where people and organisations work in the understanding that the aims of the project will be respected.

There are two websites for Open Progress: in line with the experience of the Wikimedia Foundation, we have both an internal and an external wiki. The internal wiki will use Semantic MediaWiki to make the most of the information that we put in it. As that information will include both personal details and confidential project information, the internal wiki will be invitation-only.

Gerard Meijssen
chairman, Stichting Open Progress

Monday, January 15, 2007

Destinazione Italia

Destinazione Italia is a project of the University of Bamberg. It provides training for people learning Italian at an advanced level. Bamberg is a German university and many of its students are German, but many students have a different mother tongue. Learning a third language through a second language is less effective than learning it through one's mother tongue.

I am really proud to announce that OmegaWiki has been selected by the University of Bamberg as the platform that will host the lexicological information for "Destinazione Italia". The initial phase of the project will create many DefinedMeanings based on Italian. In the second phase we will translate these words into English, German and Spanish. The third phase is to find translations into as many other languages as we can.

Research by Zdenek Broz showed that once quality translations into German, English and Spanish are available, translations into other languages can be included whenever a different resource shares that same combination of German, English and Spanish translations. According to Zdenek's figures this will give us an accuracy of around 95%, probably better.
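To make the idea concrete, here is a minimal sketch of such pivot matching. It assumes the simplest possible rule: an external entry is accepted only when all three pivot translations agree exactly (ignoring case). This is not Zdenek Broz's actual method, which is not described here; the function names, data layout and example words are hypothetical, and the toy data says nothing about the 95% figure.

```python
from typing import Dict, List, Sequence, Tuple

# Pivot languages named in the post (ISO 639-1 codes).
PIVOTS: Tuple[str, ...] = ("de", "en", "es")


def pivot_key(translations: Dict[str, str], pivots: Sequence[str]) -> Tuple[str, ...]:
    """The combination of pivot translations, compared case-insensitively."""
    return tuple(translations[p].lower() for p in pivots)


def triangulate(meanings: Dict[str, Dict[str, str]],
                external: List[Dict[str, str]],
                target: str,
                pivots: Sequence[str] = PIVOTS) -> Dict[str, List[str]]:
    """Propose `target`-language translations for DefinedMeanings.

    meanings: hypothetical DefinedMeaning id -> {language code: translation}
    external: entries from another resource, each {language code: word}
    A word is proposed only when the external entry shares the full
    German/English/Spanish combination of the DefinedMeaning.
    """
    # Index the external resource by its pivot combination.
    index: Dict[Tuple[str, ...], List[str]] = {}
    for entry in external:
        if target in entry and all(p in entry for p in pivots):
            index.setdefault(pivot_key(entry, pivots), []).append(entry[target])

    proposals: Dict[str, List[str]] = {}
    for dm_id, translations in meanings.items():
        # Meanings without all pivot translations cannot be matched yet.
        if not all(p in translations for p in pivots):
            continue
        matches = index.get(pivot_key(translations, pivots))
        if matches:
            proposals[dm_id] = list(matches)
    return proposals


meanings = {"dm_casa": {"it": "casa", "de": "Haus", "en": "house", "es": "casa"}}
external = [
    {"de": "Haus", "en": "house", "es": "casa", "fr": "maison"},
    {"de": "Haus", "en": "home", "es": "hogar", "fr": "foyer"},  # only German matches
]
print(triangulate(meanings, external, "fr"))  # {'dm_casa': ['maison']}
```

Requiring the whole combination to match is what keeps this conservative: a single shared word, like "Haus" in the second entry, is not enough to propose a translation.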

There is a budget for getting many translations into other languages. The sweet thing is that when we are able to provide quality translations ourselves, the budget can be used for other things. This could be improving the usability of OmegaWiki; it could also be spending money on a language that is not part of the initial list of languages "Destinazione Italia" supports.

The challenge, therefore, is how much we can do with a limited budget. What is the added value of creating content in a wiki environment? When will OmegaWiki reach the tipping point where collaborating in OmegaWiki is the obvious thing to do? "Destinazione Italia" will help us reach that point. :)


Saturday, January 13, 2007

Alexa and statistics again

As you may know, Alexa is a company that tries to divine how websites rank relative to each other when it comes to traffic. It relies on functionality associated with the Internet Explorer browser. Internet Explorer comes standard with the Windows operating system and used to dominate the global market absolutely. That dominance was eroded by the Firefox browser, which has carved out a niche for itself with a worldwide share of more than 15 percent.

Usage differs from country to country; in Germany, for instance, Firefox is much more popular. It also differs from website to website: a large share of OmegaWiki visitors use Firefox, and in November our traffic was evenly split between the two browsers. This means that Alexa's reporting has a bias that severely affects its accuracy.
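A rough back-of-the-envelope calculation shows how large that bias can be. It assumes, more strongly than stated above, that Alexa only sees Internet Explorer users, that all other traffic is Firefox, and that a typical site has roughly the 15 percent worldwide Firefox share; the function names are hypothetical.

```python
def visible_fraction(firefox_share: float) -> float:
    """Fraction of a site's visitors that Alexa would see.

    Assumption: Alexa only measures Internet Explorer users, and the
    rest of the traffic is Firefox (other browsers are ignored here).
    """
    return 1.0 - firefox_share


def relative_undercount(site_firefox_share: float,
                        typical_firefox_share: float) -> float:
    """How large a site looks to Alexa compared with a typical site
    with the same real traffic (1.0 means no bias)."""
    return visible_fraction(site_firefox_share) / visible_fraction(typical_firefox_share)


# OmegaWiki in November: traffic evenly split between the two browsers.
# Typical site: Firefox at roughly the 15 percent worldwide share.
print(round(relative_undercount(0.50, 0.15), 2))  # 0.59
```

Under these assumptions OmegaWiki would appear to have only about 59% of the traffic that a typical site with the same number of visitors shows.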

Even so, Alexa provides the best statistics for OmegaWiki that we can currently offer you, because a technical issue limits the usefulness of our Webalizer statistics. Because of the name change from WiktionaryZ, there are separate OmegaWiki and WiktionaryZ statistics at Alexa. I wrote to them about it, and Alexa will merge the data in one to two weeks.

Anyway, expect better statistics in two weeks, both from Alexa and from our Webalizer. It will be interesting to see, more realistically, whether and to what extent our traffic is evolving.


Tuesday, January 09, 2007

Relation types

OmegaWiki currently includes one thesaurus. The GEMET thesaurus was a boon: it demonstrated really well that what was then WiktionaryZ is able to include a thesaurus and does a good job of showing relations.

The next step will be to demonstrate that we can reliably include multiple thesauri. This is a lot more complicated. The problem has to do with the relation types used and what they mean: you cannot infer that a phrase like "is part of" means the same in one thesaurus as in another.

This means that you have to tread carefully. The first thing you can do is treat a collection as a self-contained unit. As a consequence, its relation types would only be available to, and applicable to, the DefinedMeanings that are part of the collection.

When a collection is to be integrated, the DefinedMeanings that are conceptually the same will need to be merged. This may bring together pre-existing relations and collection relations. In effect, this may demonstrate that certain relation types are indeed the same, and consequently the collection relations may get a relation type of a higher level.

The higher-level relations are based on domains. You will agree with me that only organisms include proteins. The consequence is that in such a relation, one side has to be an organism and the other a protein.

The last level would be the universal relation types, which hold regardless of the domain. Currently ALL relation types can be applied universally. The current GEMET relation types will not remain that way: they will prove to be quite arbitrary, and I expect that at some stage we will restrict their usage. The pain of losing a tool that is quite popular will likely be offset by new functionality.
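The three levels described above (collection-scoped, domain-based and universal) can be sketched as a validity check on a relation. This is a minimal, hypothetical Python model, not OmegaWiki's actual schema: the class names, fields and example relation types are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DefinedMeaning:
    """Hypothetical, minimal stand-in for an OmegaWiki DefinedMeaning."""
    name: str
    collections: FrozenSet[str] = frozenset()  # collections it belongs to, e.g. {"GEMET"}
    domain: Optional[str] = None               # e.g. "organism" or "protein"


@dataclass(frozen=True)
class RelationType:
    """A relation type at one of three levels (hypothetical model).

    - collection set: only usable between members of that collection
    - subject/object domain set: domain level, both ends are constrained
    - nothing set: universal, valid regardless of domain
    """
    name: str
    collection: Optional[str] = None
    subject_domain: Optional[str] = None
    object_domain: Optional[str] = None

    def can_apply(self, subject: DefinedMeaning, obj: DefinedMeaning) -> bool:
        # Collection level: both DefinedMeanings must be part of the collection.
        if self.collection is not None and not (
            self.collection in subject.collections and self.collection in obj.collections
        ):
            return False
        # Domain level: each end of the relation must be in the required domain.
        if self.subject_domain is not None and subject.domain != self.subject_domain:
            return False
        if self.object_domain is not None and obj.domain != self.object_domain:
            return False
        # No restriction left: the relation type applies universally.
        return True


gemet_part_of = RelationType("is part of", collection="GEMET")
includes = RelationType("includes", subject_domain="organism", object_domain="protein")
related_to = RelationType("is related to")  # universal

forest = DefinedMeaning("forest", frozenset({"GEMET"}))
ecosystem = DefinedMeaning("ecosystem", frozenset({"GEMET"}))
human = DefinedMeaning("human", domain="organism")
insulin = DefinedMeaning("insulin", domain="protein")

print(gemet_part_of.can_apply(forest, ecosystem))  # True
print(gemet_part_of.can_apply(human, ecosystem))   # False: human is not in GEMET
print(includes.can_apply(human, insulin))          # True
print(includes.can_apply(insulin, human))          # False: ends are swapped
print(related_to.can_apply(insulin, forest))       # True
```

Promoting a collection relation type to a higher level then amounts to dropping its `collection` restriction and, where appropriate, giving it domain constraints instead.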


Sunday, January 07, 2007

IPA or the International Phonetic Alphabet

I woke up having dreamt of OmegaWiki, with a brainwave: it would be easy to include IPA in OmegaWiki. I sprinted out of bed and asked Leftmost whether it could be done. It could, and then I started to look into IPA for the first time. I have started reading, and I do admit that understanding the text is beyond me. Some facts I found:
  • IPA has approximately 107 base symbols and 55 modifiers.
  • There is a specific chart for the sounds of English.
  • Different resources use IPA in different ways.
  • There are even different symbols used for British English :(
What I have always noticed is that the English Wiktionary uses one "broad" IPA transcription. It is also made clear that English-language publications "cheat" to make transcriptions easier to understand for people who already speak English.

So yes, we can enter IPA transcriptions once we enable the feature. The problem, however, is that we should not allow IPA transcriptions optimised for English speakers: OmegaWiki is meant to be used by people with ANY language as their background.

I think that before we enable IPA transcriptions there should be some more discussion.


Wednesday, January 03, 2007

Tiny Winy Buggy Bugs ... that take loads of time to explain ...

Every now and again it happens: I send people to OmegaWiki (or, before the name change, to WiktionaryZ) and they tell me: but the dictionary is completely wrong! The reason?

Well, there is that known bug where a page gets the wrong title, like the page titled "heavy" that has the contents for "small". I have had to explain it more and more often, and all this takes time. The first impression we give when people look at such pages is that OmegaWiki is full of errors and that even simple stuff seems to be wrong. They don't know that this is a bug, and I wonder how many went away disillusioned by our product.

Is it so difficult to get this fixed? How much of our valuable time must we spend explaining the same thing over and over again?

One thing is sure: it must be fixed before we officially go live; otherwise our credibility will be seriously questioned and we will need loads of hours to explain the same bug over and over again.