Friday, May 11, 2007

OmegaWiki supports many linguistic entities

OmegaWiki aims to include all words in all languages and to provide lexical, terminological and ontological information. As the discussion of what makes a language is an endless one, the languages that are included in the ISO-639 codes are the ones that are supported.

Choosing ISO-639-3 to start off with has proven a great start. It did not, however, provide the granularity needed to assign words to their linguistic entity. How to deal with languages that are written in several scripts? How to deal with regional differences? These are questions that this iteration of the ISO-639 standard does not address.

The implication is that this standard on its own does not suffice. By combining the data with codes from other standards it is possible to provide more granularity, but how to deal with dialects like Westfries, which is spoken in the area where I grew up?
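
The idea of combining codes can be sketched roughly as follows. This is only an illustration, assuming that the "other standards" are ISO 15924 for scripts and ISO 3166-1 for regions; the `LinguisticEntity` class and its hyphen-joined `tag()` label are hypothetical names of my own and say nothing about how OmegaWiki actually stores its data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinguisticEntity:
    """Hypothetical record that combines codes from several standards."""
    language: str                 # ISO 639-3 code, e.g. "srp" (Serbian)
    script: Optional[str] = None  # ISO 15924 code, e.g. "Cyrl" or "Latn"
    region: Optional[str] = None  # ISO 3166-1 alpha-2 code, e.g. "RS"

    def tag(self) -> str:
        """Join the parts that are present into one hyphen-separated label."""
        parts = [self.language, self.script, self.region]
        return "-".join(p for p in parts if p)


# Serbian is written in two scripts; the language code alone cannot tell them apart.
serbian_cyrillic = LinguisticEntity("srp", script="Cyrl", region="RS")
serbian_latin = LinguisticEntity("srp", script="Latn", region="RS")
dutch = LinguisticEntity("nld")

print(serbian_cyrillic.tag())
print(serbian_latin.tag())
print(dutch.tag())
```

A scheme like this separates scripts and regions, but it still leaves no obvious slot for a dialect such as Westfries.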

As the OmegaWiki project was evolving and gaining traction, I got into contact with Debbie Garside. She heads Geolang, an organisation that has long been preparing the next iteration of the ISO-639 standard, ISO-639-6. The aim is to include at least 25,000 linguistic entities in a hierarchical structure. Adopting this data would allow OmegaWiki to better achieve its aim: to include all words of all languages.

When a standard is published, there is a prescribed period in which the public is invited to comment on it. So far this has been done by e-mail. Experience shows that when the volume of material is too large, e-mail is not a tool that can cope. Geolang had explored the option of using wiki technology before; sadly, this did not lead to the right synergy. In OmegaWiki, however, there was an active interest in language standards, and it offered not only the wiki methodology but also a way to include the data in a truly hierarchical form.

By publishing the data in a wiki, in essence everybody with an interest in orthographies and dialects is invited to comment on, modify and add to the hierarchical data. To make this into a standard, there will be a need to assess the community-generated data and assert the validity of the information provided. This is where the World Language Documentation Centre will play its role. As its name implies, it documents languages, and it will do so in the broadest sense of the word. Obviously, an organisation like this will only function well when it is an inclusive organisation. The make-up of the current board reflects the many specialities within linguistics and the language industry.

It is with a fair amount of satisfaction that I can announce that Sean Burke, one of the volunteers of OmegaWiki, has imported the first batch of the ISO-DIS-639-6 data in time for the inaugural meeting of the World Language Documentation Centre. Both OmegaWiki and the WLDC will rely on collaboration to get the necessary work done. Our challenge will be to provide the infrastructure and the minimal organisation to start and sustain our projects.

With the inaugural meeting, the WLDC proclaims to the world that as an organisation it is ready for business. With the first data available in OmegaWiki, the first request goes out to the world to collaborate on the languages that are spoken and the orthographies that are written. It is the start of acquiring the metadata that helps us understand the data that is already out there. Because we will become better able to parse that data, we can then turn it into information.

