Saturday, March 10, 2007

Managing data with some SQL

The great thing about OmegaWiki is that the data is in a database. You might say that this is not that special, since every wiki uses a database. Today, for the first time, we did some curation on the data: wherever a word in en-US was written exactly the same as in English, we deleted the English. One example is the word "competition"; in its history you will find the deletions.
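To give an idea of what such a clean-up can look like, here is a minimal sketch using Python's sqlite3 module. It is not the query that was actually run: the `expression` table, its columns, the language codes and the sample rows are all assumptions made up for illustration, and the real OmegaWiki schema is different.

```python
import sqlite3


def delete_duplicate_english(conn: sqlite3.Connection) -> int:
    """Delete English rows that have an identically spelled en-US row
    for the same meaning. Returns the number of rows deleted."""
    cur = conn.execute(
        """
        DELETE FROM expression
        WHERE language = 'en'
          AND EXISTS (
              SELECT 1
              FROM expression AS us
              WHERE us.language = 'en-US'
                AND us.meaning_id = expression.meaning_id
                AND us.spelling = expression.spelling
          )
        """
    )
    conn.commit()
    return cur.rowcount


def demo() -> list:
    # Hypothetical schema and sample data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expression ("
        "id INTEGER PRIMARY KEY, meaning_id INTEGER, "
        "language TEXT, spelling TEXT)"
    )
    conn.executemany(
        "INSERT INTO expression (meaning_id, language, spelling) "
        "VALUES (?, ?, ?)",
        [
            (1, "en", "competition"),
            (1, "en-US", "competition"),
            (2, "en", "colour"),
            (2, "en-US", "color"),
        ],
    )
    delete_duplicate_english(conn)
    return conn.execute(
        "SELECT meaning_id, language, spelling FROM expression ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In this sketch the English "competition" row goes away because the en-US spelling is identical, while "colour" stays because the en-US spelling differs.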

I am really grateful that Leftmost has started to use SQL to fix things for us. It saves us what is most valuable: the time of our editors.

There are other things that we can do. I have asked to have all capitalised Bulgarian words changed to lower case wherever the Russian word is lower case. This fixes something that is done consistently this way in the GEMET database. With these improvements, the GEMET data becomes usable for other purposes, things like data mining :)
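The Bulgarian fix could be sketched along the same lines. Again this is only an illustration under assumptions: the `expression` table, the language codes and the sample words are hypothetical. SQLite's built-in lower() only handles ASCII by default, so the sketch registers a small Python helper, here called `py_lower`, to lower-case Cyrillic text.

```python
import sqlite3


def lowercase_bulgarian(conn: sqlite3.Connection) -> int:
    """Lower-case Bulgarian spellings when a Russian spelling for the
    same meaning is already entirely lower case. Returns rows updated."""
    # py_lower is a helper defined here, not a built-in SQLite function.
    conn.create_function(
        "py_lower", 1, lambda s: s.lower() if s is not None else None
    )
    cur = conn.execute(
        """
        UPDATE expression
        SET spelling = py_lower(spelling)
        WHERE language = 'bg'
          AND spelling <> py_lower(spelling)
          AND EXISTS (
              SELECT 1
              FROM expression AS ru
              WHERE ru.language = 'ru'
                AND ru.meaning_id = expression.meaning_id
                AND ru.spelling = py_lower(ru.spelling)
          )
        """
    )
    conn.commit()
    return cur.rowcount


def demo() -> dict:
    # Hypothetical schema and sample data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expression ("
        "id INTEGER PRIMARY KEY, meaning_id INTEGER, "
        "language TEXT, spelling TEXT)"
    )
    conn.executemany(
        "INSERT INTO expression (meaning_id, language, spelling) "
        "VALUES (?, ?, ?)",
        [
            (1, "bg", "ВОДА"),
            (1, "ru", "вода"),
            (2, "bg", "ЕВРОПА"),
            (2, "ru", "Европа"),
        ],
    )
    lowercase_bulgarian(conn)
    rows = conn.execute(
        "SELECT meaning_id, spelling FROM expression WHERE language = 'bg'"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(demo())
```

The second sample pair is left alone on purpose: where the Russian word is itself capitalised, the Bulgarian word is not touched.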

Thanks,
GerardM

1 comment:

GerardM said...

Sean has run another nice query: for all the words in Bulgarian that were upper case and where the Russian equivalent is lower case, make the Bulgarian lower case as well. This fixed a lot of the GEMET data. All the Bulgarian words used to be in upper case...

Thanks,
GerardM