OmegaWiki: May 2007

Friday, May 25, 2007

After a week of hacking, testing!!

A lot of work has been done on OmegaWiki. We have been working on functionality that is important to the organisations that we hope to collaborate with.

We have dealt with several issues:

Support multiple "data-sets" within a single OmegaWiki installation. These sets can be used to store imported "authoritative databases," such as scientific databases.
Users can navigate within a data-set or choose a different one to look at. The default set can be configured globally, for a user group, or for an individual user.
Different data-sets can have different permission levels.
DefinedMeanings in different data-sets that are identical (describing the same concept) can be mapped to each other.
When data is imported, we can choose which data-set to import it into.
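
The default data-set lookup described above can be sketched as follows. This is only an illustration, not the OmegaWiki code: the function name, the parameter names and the data-set names are hypothetical, and the precedence (individual user first, then user group, then the global setting) is an assumption, since the post only says that a default can be configured at each of the three levels.

```python
from typing import Optional


def resolve_default_dataset(
    global_default: str,
    group_default: Optional[str] = None,
    user_default: Optional[str] = None,
) -> str:
    """Return the data-set a user sees first.

    Assumed precedence (not confirmed by the post):
    individual user, then user group, then the global default.
    """
    if user_default is not None:
        return user_default
    if group_default is not None:
        return group_default
    return global_default


# Hypothetical data-set names, for illustration only.
anonymous = resolve_default_dataset("community")
group_member = resolve_default_dataset("community", group_default="authoritative")
individual = resolve_default_dataset("community", "authoritative", "sandbox")
```

With these assumptions, a user without any setting falls back to the global default, while a personal preference overrides everything else.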

Several parts of the puzzle are still missing; we are, however, at a stage where we need to test our data, so we are going to make this functionality go live soon. The first thing is to confirm that, after all the database changes and much refactored functionality, everything still works.

The next thing will be to experiment with a first authoritative or additional database. The obvious first resources are the GEMET collection and the ISO-639-6 collection. This is all in preparation for more partners that will be collaborating in the OmegaWiki environment.

More functionality will be implemented in the coming weeks:

The possibility to add multiple values without having to reload the editor each time
Allowing for annotations that are dependent on previously set values; this will for the first time provide us with terminological functionality
More functionality is in the pipeline; I think you will love it when we have it :)

Thanks,
GerardM

PS It was a fun week; we even had a day with a negative number of lines added. We had to change functionality to enable the software to run under Windows. To relax, I read several chapters of Accelerando. It was fun to watch Kim and Erik work together, and my appreciation for both grew. It was gratifying to see my dream become more of a reality :)

Sunday, May 20, 2007

Annotations, hyphenations and IPA

On OmegaWiki we annotate. In addition to the sample sentences, it is now possible to add hyphenations. A thank you to Sean Burke and Kim Bruning, who made this possible :)

It is also possible to include the International Phonetic Alphabet or IPA. On the one hand we should feel confident that people will do good work. On the other hand, a lot of the IPA notations out there are not useful because they assume that the people using them have a specific background.

In OmegaWiki we have a public that is truly multi-lingual. This is best experienced when you change the user preferences to another language. Most of the language labels may be shown in the selected language. The consequence of a multi-lingual public is that only IPA notations without language specific shortcuts are useful.

I am sure that you have an opinion about this; we hope to hear your arguments.

Thanks,
GerardM

Monday, May 14, 2007

Domains and OmegaWiki

Some days ago Lejocelyn added a feature request about Domains on OmegaWiki. As this is one of the points that are most relevant to me personally, of course I answered. Why are domains so relevant? Well, let's say we have 1.000.000 English-German expressions available to a translator, but only a certain subset is relevant to the translation at hand. Having to search all 1.000.000 expressions, with every potential result shown in the glossary window, is overkill: instead of helping you to find the right term, it would take three times as long as looking things up in a specialised dictionary (let's say about physics or medicine).

Dictionaries are general, yes, but then the amount of specialised terminology they contain is limited to what is most often used. Therefore each of us still has these very special dictionaries about just one topic, and these are our most valuable tools besides the Internet (well yes, there are terms that are not in our dictionaries, so we have to search for them in available texts about the topic we are translating).

What I would like to say with that: domains might not be relevant to somebody searching for just one word every now and then, but they are most relevant when you want to use a resource in a professional way.

Thanks for considering Domains within OmegaWiki.
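
To make the point concrete, here is a minimal sketch of a glossary lookup that can be restricted to one domain. It is not how OmegaWiki implements (or will implement) domains; the `Entry` class, the `lookup` function, the domain names and the three sample entries are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source: str          # English expression
    target: str          # German expression
    domains: frozenset   # hypothetical domain labels


def lookup(entries, term, domain=None):
    """Return translations of `term`, optionally restricted to one domain."""
    return [
        e.target
        for e in entries
        if e.source == term and (domain is None or domain in e.domains)
    ]


# Invented sample data, for illustration only.
ENTRIES = [
    Entry("current", "Strom", frozenset({"physics"})),
    Entry("current", "Strömung", frozenset({"hydrology"})),
    Entry("current", "aktuell", frozenset({"general"})),
]

all_hits = lookup(ENTRIES, "current")
physics_hits = lookup(ENTRIES, "current", domain="physics")
```

Without a domain, every sense of the term comes back; with a domain, the translator only sees the candidates that matter for the text at hand.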

Friday, May 11, 2007

OmegaWiki supports many linguistic entities

OmegaWiki aims to include all words in all languages and provide lexical, terminological and ontological information. As the discussion of what makes a language is an endless one, those languages that are included in the ISO-639 codes are the ones that are supported.

Choosing ISO-639-3 to start off with has proven a great start. It did not, however, provide the granularity needed to assign words to their linguistic entity. How do we deal with languages that are written in several scripts? How do we deal with regional differences? This iteration of the ISO-639 standard does not deal with these questions.

The implication is that this standard on its own does not suffice. By combining the data with codes from other standards it is possible to provide more granularity, but how do we deal with dialects like Westfries, which is spoken in the area where I grew up?

As the OmegaWiki project was evolving and getting traction, I got into contact with Debbie Garside. She heads Geolang, an organisation that has for a long time been preparing the next iteration of the ISO-639 standard, the ISO-639-6. The aim is to include at least 25.000 linguistic entities in a hierarchical structure. Adopting this data would allow OmegaWiki to better achieve its aim: to include all words of all languages.

When a standard is published there is a prescribed period in which the public is invited to comment on it. So far this has been done using e-mail. Experience shows that when the amount of material is too big, e-mail is not a tool that can cope. Geolang had explored the option of using Wiki technology before; sadly, this did not lead to the right synergy. OmegaWiki, however, combines an active interest in language standards with the Wiki methodology, and it even allows for the inclusion of the data in a truly hierarchical way.

By publishing the data in a wiki, in essence everybody with an interest in orthographies and dialects is invited to comment, modify and add to the hierarchical data. To make this into a standard, there will be a need to assess the community-generated data and assert the validity of the information provided. This is where the World Language Documentation Centre will play its role. As its name implies, it documents languages, and it will do so in the broadest sense of the word. Obviously an organisation like this will only function well when it is an inclusive organisation. The make-up of the current board reflects the many specialities that make up linguistics and the language industry.

It is with a fair amount of satisfaction that I can announce that Sean Burke, one of the volunteers of OmegaWiki, has imported the first batch of the ISO-DIS-639-6 data in time for the inaugural meeting of the World Language Documentation Centre. Both OmegaWiki and the WLDC will rely on collaboration to get the necessary work done. Our challenge will be to provide the infrastructure and the minimal organisation to start and sustain our projects.

With the inaugural meeting, it is proclaimed to the world that the WLDC is ready for business as an organisation. With the first data available in OmegaWiki, the first request goes out to the world to collaborate on the languages that are spoken and the orthographies that are written. It is the start of acquiring the meta data that helps us understand the data that is already out there, and consequently of turning all this data into information, because we will become better able to parse the data.

Thanks,
GerardM

Sunday, May 06, 2007

New functionality

OmegaWiki has collections. These collections serve to indicate that certain DefinedMeanings are related. Collections can serve a purpose; the GEMET collection, for instance, is the resource whose data started our project. The OLPC collection is a list of the first words that we want in all languages to start a multilingual dictionary for the OLPC project.

In these statistics, we have a tool to tell people what projects we have within OmegaWiki. This allows people to work on things that are of interest to them. The really sweet thing is that it shows each project as a work in progress: it shows what needs doing and what has already been done.

There are several projects that are dear to me and can use more attention:

The OLPC collection aims to give kids a dictionary in their language
The ISO-639-3 collection is used for the localisation of our user interface
The Swadesh collection is for linguists one of the more relevant collections when they compare languages
The Destinazione Italia collection is a project that will be used for students of advanced Italian. This is a project by the University of Bamberg.
Yes, the GEMET collection is very dear to me; it is an important resource of environmental terminology. It is used in many places and it was a boon when we were given permission to use it in our project.

When you do not see your language in a collection, just add one word to any of the DefinedMeanings that are part of the collection and the next time it will be there. When your language is not supported in OmegaWiki, let me know and I will see how to remedy this.
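
The way a language shows up in the statistics can be sketched roughly as follows. This is an illustration under assumptions, not the actual OmegaWiki statistics code: the data layout (a mapping from DefinedMeaning id to a per-language list of expressions), the function names and the sample entries are all hypothetical.

```python
from collections import Counter


def collection_coverage(collection):
    """Count, per language, how many DefinedMeanings have at least one word."""
    counts = Counter()
    for translations in collection.values():
        # `translations` maps a language code to a list of expressions.
        for language, expressions in translations.items():
            if expressions:
                counts[language] += 1
    return counts


def progress_report(collection):
    """Map each language to (done, still to do) within the collection."""
    total = len(collection)
    coverage = collection_coverage(collection)
    return {lang: (done, total - done) for lang, done in coverage.items()}


# Invented sample collection with two DefinedMeanings.
sample = {
    "dm-1": {"eng": ["water"], "nld": ["water"]},
    "dm-2": {"eng": ["tree"]},
}

report = progress_report(sample)
```

In this sketch a language only appears in the report once it has at least one word in the collection, which mirrors the advice above: add one word and your language will be listed the next time.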

Thanks,
GerardM