Tuesday, June 04, 2013

When parts of speech matter

For linguists, translators, and others, the part of speech (adjective, noun, verb, etc.) is very important. In dictionaries, it is the first piece of information given about a word, and the different meanings of a word are usually sorted first by part of speech.

From a relational database point of view, however, a part of speech is just one annotation of a syntrans (the association of a word and a meaning), and it looks no more important than any other syntrans annotation, such as "gender", "international phonetic alphabet", or "area" (indicating whether a word is used only in a specific area).
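
To make the database point of view concrete, here is a minimal sketch of a syntrans as a record with free-form annotations. The class and field names are assumptions made for illustration only; they are not OmegaWiki's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Syntrans:
    """One word attached to one meaning (illustrative, not the real schema)."""
    expression: str   # the word itself
    language: str     # language of the word
    meaning_id: int   # hypothetical identifier of the meaning
    # Every annotation is stored the same way: a name and a value.
    annotations: Dict[str, str] = field(default_factory=dict)


round_noun = Syntrans(
    expression="round",
    language="English",
    meaning_id=1,
    annotations={"part of speech": "noun", "IPA": "/raʊnd/"},
)

# To the storage layer, "part of speech" is just one key among others.
for name, value in round_noun.annotations.items():
    print(name, "=", value)
```

Nothing in this structure says that "part of speech" deserves more prominence than "IPA"; that knowledge has to live in the interface.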

In Wiktionary, the interface came first, and it is clear from viewing a page whether a word is a noun, a verb, etc. For a computer it is less clear. In OmegaWiki, it is quite the opposite: the relational database came first, so the data is very computer-friendly, but we then have to build an interface on top of it and tell it what is important for a human. If parts of speech matter, we have to display them more visibly and sort meanings by part of speech.

This feature was often requested, and this is exactly what OmegaWiki does now :-)

Definitions and translations in French of the various meanings of the English word "round".

As usual in OmegaWiki, the part of speech is translated into the user's language. When the part of speech of a definition is not known, "??" is displayed.

Definitions and translations in Spanish of the English word "square".

When none of the meanings has a part of speech, "??" is not displayed, because having only "??" at the top of the page would just be confusing.

Definition and translation in Breton of the Cantonese word for France.
Yes, we do have a Breton-Cantonese dictionary :)
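
The display rule described above (group meanings by part of speech, show "??" for unknown ones, and show no heading at all when nothing is known) can be sketched as follows. This is not the OmegaWiki code; the function name, the input format, and the ordering of the headings (alphabetical, with "??" last) are assumptions.

```python
from collections import defaultdict

UNKNOWN = "??"


def group_by_part_of_speech(meanings):
    """Group (definition, part_of_speech_or_None) pairs for display.

    Returns a list of (heading, definitions) pairs. The heading is None
    when no meaning has a part of speech, so that a lone "??" never
    appears at the top of the page.
    """
    if all(pos is None for _, pos in meanings):
        return [(None, [definition for definition, _ in meanings])]

    groups = defaultdict(list)
    for definition, pos in meanings:
        groups[pos or UNKNOWN].append(definition)

    # Assumed ordering: known parts of speech alphabetically, "??" last.
    headings = sorted(h for h in groups if h != UNKNOWN)
    if UNKNOWN in groups:
        headings.append(UNKNOWN)
    return [(h, groups[h]) for h in headings]


page = group_by_part_of_speech([
    ("a circular shape", "noun"),
    ("shaped like a circle", "adjective"),
    ("a meaning not yet annotated", None),
])
for heading, definitions in page:
    print(heading, definitions)
```
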

Most of the missing part-of-speech data could be imported by bots. We already have an API to add annotations easily. We just need someone who would like to run such a bot. Any programming language will do.
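
As a rough idea of what such a bot could do, here is a sketch of its planning step. Everything in it is hypothetical: the record fields, the `source_lexicon` mapping, and the `add_annotation` callable stand in for whatever data source and API call the bot author actually uses, and none of them is the real OmegaWiki API.

```python
def plan_pos_annotations(syntranses, source_lexicon):
    """Return (syntrans_id, part_of_speech) pairs that are safe to add.

    syntranses: list of dicts with the hypothetical keys "id",
        "language", "expression" and "part_of_speech" (None if missing).
    source_lexicon: dict mapping (language, expression) to the set of
        parts of speech found in some external source.
    """
    planned = []
    for syntrans in syntranses:
        if syntrans.get("part_of_speech"):
            continue  # already annotated, nothing to do
        key = (syntrans["language"], syntrans["expression"])
        candidates = source_lexicon.get(key, set())
        # Only act when the source is unambiguous. A word such as
        # "round" has several parts of speech, and the bot cannot tell
        # which meaning each one belongs to.
        if len(candidates) == 1:
            planned.append((syntrans["id"], next(iter(candidates))))
    return planned


def run_bot(planned, add_annotation):
    """Apply the plan through a caller-supplied function (hypothetical)."""
    for syntrans_id, pos in planned:
        add_annotation(syntrans_id, "part of speech", pos)


entries = [
    {"id": 1, "language": "English", "expression": "round", "part_of_speech": None},
    {"id": 2, "language": "English", "expression": "dictionary", "part_of_speech": None},
    {"id": 3, "language": "English", "expression": "square", "part_of_speech": "noun"},
]
lexicon = {
    ("English", "round"): {"noun", "adjective", "verb"},
    ("English", "dictionary"): {"noun"},
}
plan = plan_pos_annotations(entries, lexicon)
print(plan)
```

Skipping ambiguous words leaves those meanings for a human to annotate, which seems safer than guessing.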


1 comment:

Purodha said...


(1) imho there is a grammar glitch in your post, repeated several times:

The different meanings of words are usually sorted first by part of speeches.

The different meanings of words are usually sorted first by parts of speech.

(2) How do you think that POS data should be imported, and from where? I would like to take a closer look at it if you can put me on track ;-)