Monday, September 03, 2007

Connecting data from different databases

In OmegaWiki there are different datasets. They come from different origins and each has a different emphasis. We are working on connecting the data in these different datasets. Currently over four percent of our Community data is connected to data in the UMLS.

These connections are not without problems. The UMLS does not have the same (lexical) outlook; it is quite happy for a singular and a plural to be part of the same concept. In OmegaWiki we do not support the notion of plurals yet. For the UMLS it is not a problem to include "Geologists", as it is a subject heading. We have connected it to "geologist".

Lyme disease has several synonyms that are problematic from a lexical point of view; only "Lyme borreliosis" is what I would expect to find in a dictionary. This does not necessarily mean that "Borreliosis, Lyme" is not useful to have. The Community database knows some 15 translations and thereby adds value to the English-only content for Lyme disease.

With four percent of the Community database connected, we have in reality not yet scratched the surface of the UMLS. The UMLS is a well-explored resource, and I am sure that many other resources have already made connections to it. I hope we will find people and organisations willing to share the work that they have already done.

