Thursday, July 04, 2013

View and download a list of words in any language

I was asked how it is possible to see the list of words that we have for a given language.

On the Data-search special page, you can browse the list of words in any language using the "next 100" and "previous 100" buttons, much like category pages work on Wikipedia. You can also filter your search by spelling.

For example, the list of words in Micmac - an indigenous language of North America - can be consulted here (thanks, Amqui, for contributing :) ). It looks like this, with definitions in French because my interface is set to French:

Furthermore, a list of words can now also be downloaded in CSV format from the new Ow_downloads special page, thanks to the work of Hiong3.eng5.

This page shows the lists that can be downloaded, and the date when each one was generated. Click on "regenerate" to request a new list. The server then generates the up-to-date list when it has time - in order not to slow down its normal operations - and the list is usually ready after a few minutes, as you can see by visiting the page again.
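
Once a list has been downloaded, it can be read with any CSV library. Here is a minimal Python sketch. The column layout is not described in this post, so the code simply assumes that the first column holds the spelling and that the file is UTF-8 encoded with commas as separators; the function names and the file name in the usage note are made up for illustration.

```python
import csv


def read_rows(lines, delimiter=","):
    """Parse CSV lines into a list of rows, skipping empty lines."""
    return [row for row in csv.reader(lines, delimiter=delimiter) if row]


def first_column(rows):
    """Return the first field of every row (ASSUMED here to be the spelling)."""
    return [row[0].strip() for row in rows]


def load_words(path, delimiter=",", encoding="utf-8"):
    """Read a downloaded word list from disk and return its first column."""
    # newline="" lets the csv module handle quoted line breaks itself.
    with open(path, newline="", encoding=encoding) as handle:
        return first_column(read_rows(handle, delimiter))
```

With a downloaded file saved as, say, `words.csv` (a hypothetical name), `load_words("words.csv")` would return the spellings as a Python list. If the real file uses another separator or has a header row, adjust the `delimiter` argument or drop the first entry accordingly.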

If you are interested in a language that is not in the list, please request it, thanks!

It is planned to make it possible to download translation lists for any pair of languages from that page in the future.


Tommi Pirinen said...

This might be a stupid question, but where do I request new languages? I maintain a morphological analyser of Finnish and I think this csv would be a very useful resource. And I'm looking forward to the language pair lists as well, they will be very good for rule-based machine translation.

Christoph M. said...

You can request it here on the blog, or directly on the discussion page of OmegaWiki:

I created the list for Finnish. We have (only) about 12,000 words in Finnish, so many words are probably still missing.

Thanks for your interest :)