C/ Tajo, s/n, 28670 Villaviciosa de Odón
Intelligent systems for information access increasingly integrate text mining and content analysis techniques with semantic resources such as ontologies. In the ISIS and SINAMED projects, text categorization, automatic summarization, and ontologies play a central role in improving access to information in a specific biomedical domain: patient clinical records and the associated biomedical scientific literature. The two projects are being developed by a consortium of research groups from three universities (Universidad Europea de Madrid, Universidad de Huelva, Universidad Complutense de Madrid), a hospital (Hospital de Fuenlabrada, Madrid), and a software development company (Bitext).
The SINAMED and ISIS projects focus on access to information contained in a specific part of the biomedical domain: patient clinical records and the related scientific literature. The two projects are closely related, with different but complementary orientations.
Information access and content analysis techniques
Automatic text categorization can be applied, for example, to classify medical reports using standard descriptors such as Medical Subject Headings (MeSH). However, the variability of language and the lack of the data needed for effective learning limit the effectiveness of these systems. Moreover, text categorization has rarely been applied in the biomedical environment, and its use on medical information written in Spanish is virtually nonexistent. Such problems can be addressed with lexical-semantic resources. In the medical domain, specific resources are available, such as UMLS (Unified Medical Language System), which make this possible. Applications of ontologies include natural language processing (NLP), knowledge discovery, and support for interoperability. However, we must distinguish between biomedical ontologies and biomedical terminologies. Biomedical ontologies provide an organizational framework for the concepts and entities involved in biological processes, arranged in a system of hierarchical and associative relations that allows reasoning about medical knowledge. In contrast, biomedical terminologies promote a standard way of naming the concepts of the domain (Bodenreider et al., 2003).
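To make the idea of categorizing reports under standard descriptors concrete, the following is a minimal sketch of a multinomial naive Bayes classifier. It is not the method used in ISIS or SINAMED; the training sentences, the whitespace tokenizer, and the function names (tokenize, train, classify) are assumptions made for illustration, and the two labels are used only as MeSH-style example headings.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real system would do much more."""
    return text.lower().split()


def train(labeled_docs):
    """Collect per-label document and word counts from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            smoothed = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (hypothetical); labels are illustrative MeSH-style headings.
TRAINING = [
    ("chest pain and elevated troponin", "Cardiovascular Diseases"),
    ("arrhythmia detected on ecg", "Cardiovascular Diseases"),
    ("persistent cough and wheezing", "Respiratory Tract Diseases"),
    ("shortness of breath with pneumonia", "Respiratory Tract Diseases"),
]

model = train(TRAINING)
print(classify("patient reports chest pain", model))
```

With only four training sentences the classifier depends entirely on word overlap, which illustrates the data scarcity and language variability problems mentioned above: a report that describes the same condition with unseen vocabulary contributes no evidence at all.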
In information access settings, summaries (single- or multi-document) have proven useful, improving effectiveness in tasks such as ad hoc or interactive retrieval. Applying summarization to the medical domain poses a number of challenges that have not been sufficiently addressed in previous work. Two of them stand out. First, most summarization systems are designed to handle documents written in a single language (mainly English), although text collections and resources exist in other languages (especially Spanish). Second, most systems work with documents belonging to a very restricted domain. It is therefore necessary to develop techniques that can be applied to broader domains or, at least, that can be easily adapted from one subdomain to another.
As in text categorization, we believe that integrating knowledge from resources such as UMLS, which also has some bilingual components, can play an important role in solving both problems.
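As an illustration of single-document summarization, the sketch below implements a simple frequency-based extractive summarizer: each sentence is scored by the average corpus frequency of its content words, and the top-scoring sentences are returned in their original order. This is a generic baseline, not the approach of the projects described here; the stopword list, the naive sentence splitter, and the sample text are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller,
# language-specific one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was",
             "with", "for", "on"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"\w+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


# Hypothetical sample text, written for this example.
TEXT = ("The patient was admitted with chest pain. "
        "Chest pain improved after treatment. "
        "The family was informed.")

print(summarize(TEXT, max_sentences=1))
```

Both the stopword list and the sentence splitter are tied to English here, which is one concrete form of the single-language limitation described above.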
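The following sketch shows one way a bilingual lexical resource could help: surface terms in Spanish and English are mapped to shared concept identifiers, so that documents in either language yield the same features. The lexicon is a hypothetical toy table, its identifiers are invented and are not real UMLS identifiers, and the substring matching is a deliberate simplification.

```python
# Hypothetical toy lexicon: Spanish and English terms mapped to made-up
# concept identifiers (these are NOT real UMLS identifiers).
TOY_LEXICON = {
    "heart attack": "CONCEPT_MI",
    "myocardial infarction": "CONCEPT_MI",
    "infarto de miocardio": "CONCEPT_MI",
    "pneumonia": "CONCEPT_PNEUMONIA",
    "neumonía": "CONCEPT_PNEUMONIA",
}


def concepts_in(text, lexicon=TOY_LEXICON):
    """Return the sorted concept ids whose terms occur in the text.

    Matching is plain case-insensitive substring search, which is enough
    for a sketch but would over-match in real clinical text.
    """
    lowered = text.lower()
    return sorted({cid for term, cid in lexicon.items() if term in lowered})


print(concepts_in("Paciente con infarto de miocardio y neumonía"))
print(concepts_in("History of heart attack"))
```

Because synonyms and translations collapse to one identifier, a classifier or summarizer built on concept features rather than raw words would need fewer training examples per language, which is the kind of benefit the paragraph above anticipates.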
The ISIS project brings together several partner organizations that actively contribute to its technological development: GSI-Universidad Europea de Madrid, Universidad de Huelva, Bitext, and Hospital de Fuenlabrada.