MEDICAL MINER

Principal Investigator:

Contact:
buenaga<at>uem.es

Address:
C/ Tajo, s/n, 28670 Villaviciosa de Odón

Duration:
2010-2013

Project Page:

Outline

Translational medicine is an emerging effort in medical practice that seeks to transfer scientific results from the laboratory to clinical practice for patient diagnosis and treatment (also referred to as “bench-to-bedside”). This change in perspective has occurred recently as a result of the genomics and bioinformatics revolution. However, it has been accompanied by a serious problem: the large amounts of information being generated are causing a major bottleneck in medical research and its application. This information comes both in structured form, mainly from molecular biology research, and as text, from published research results.

The main goal of the project is to analyse, experiment with and develop new text mining and data mining techniques, in an interrelated way, for intelligent medical information systems. New techniques of both types will be developed that are more efficient, interrelated, and better adapted to specific aspects of the domain. An intelligent information access system based on them will be developed, offering advanced functionality for interrelating medical information, mainly text and data from clinical records and scientific documentation, and making use of standard resources of the domain (e.g. UMLS, SNOMED, Gene Ontology). An open source platform integrating all these elements will be developed. An evaluation will be conducted, analysing the efficacy of the new techniques as well as of the whole system, in an open environment with end users.

Specific Goals

The project’s overall objective is to explore, experiment with and develop new, closely intertwined technologies for text mining and data mining as a key element of intelligent medical information systems. We will design new techniques of both types that are more efficient, interconnected, and better adapted to specific problems in the medical domain. Based on them, we will implement an intelligent information access system that offers advanced features for interrelating medical information, mainly clinical records (text and data) and scientific documentation. The overall objective is broken down into the following specific objectives:

O1. Development of new text mining techniques adapted to the bilingual medical domain. These include the basic operations of text categorization, text summarization, named entity recognition and information extraction. The techniques should address problems of special relevance in the domain, such as the treatment of negation and the use of medical lexical-semantic resources (UMLS, SNOMED, MMTx), and should be capable of processing texts in both English and Spanish.
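As an illustration of the negation problem mentioned in O1, the following is a minimal sketch of a trigger-and-scope rule in the style of well-known rule-based approaches. It is not the project's method: the trigger lists, the scope terminators, the five-token window and the function name `negated_terms` are all assumptions chosen to keep the example small.

```python
import re

# Hypothetical, deliberately tiny trigger lists (assumption); a real system
# would rely on a curated lexicon for each language.
NEGATION_TRIGGERS = {
    "en": ["no evidence of", "denies", "without", "no"],
    "es": ["sin evidencia de", "niega", "sin", "no"],
}
SCOPE_TERMINATORS = {"but", "however", "pero", "aunque"}  # assumption
PUNCTUATION = {".", ",", ";", ":"}
MAX_SCOPE_TOKENS = 5  # assumption: negation affects at most 5 following tokens


def negated_terms(sentence, terms, lang="en"):
    """Return the subset of `terms` that fall inside a negation scope."""
    tokens = re.findall(r"\w+|[.,;:]", sentence.lower())
    # Try longer triggers first so "no evidence of" wins over "no".
    triggers = sorted(
        (t.split() for t in NEGATION_TRIGGERS[lang]), key=len, reverse=True
    )
    negated = set()
    i = 0
    while i < len(tokens):
        match = next((t for t in triggers if tokens[i:i + len(t)] == t), None)
        if match is None:
            i += 1
            continue
        start = i + len(match)
        scope = []
        # The scope ends at punctuation, a terminator word, or the window limit.
        for tok in tokens[start:start + MAX_SCOPE_TOKENS]:
            if tok in SCOPE_TERMINATORS or tok in PUNCTUATION:
                break
            scope.append(tok)
        scope_text = " ".join(scope)
        for term in terms:
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", scope_text):
                negated.add(term)
        i = start
    return negated


print(negated_terms("The patient denies chest pain but reports fever.",
                    ["chest pain", "fever"]))
print(negated_terms("Paciente sin fiebre, pero con tos.",
                    ["fiebre", "tos"], lang="es"))
```

In both calls only the first term lies inside the negation scope, so a downstream classifier could treat "chest pain" and "fiebre" as absent findings rather than as positive evidence.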

O2. Development of new data mining techniques focused on the analysis of personal health information. This covers the analysis, design, adaptation and/or implementation of new classification and clustering techniques that can use explicit biological knowledge (provided by the text mining techniques) to improve overall performance on specific problems of the medical domain.

O3. Development of a platform for integrating and sharing knowledge between text mining and data mining techniques. A standard based on software components (plug-ins) will be adopted for coding, storage and multiplatform execution, allowing the developed techniques to be integrated. We will develop a non-intrusive, MVC-based programming environment that allows (i) the interconnection, implementation and integration of techniques with well-defined inputs and outputs and (ii) independent problem design. The platform will enhance integration with existing resources such as FreeLing and GATE (language analysis), RapidMiner and Weka (machine learning), ontologies and lexical resources (Gene Ontology, UMLS, WordNet and EuroWordNet), medical text analysis tools (MMTx) and implementations of particular techniques that address domain-specific aspects (BioConductor). The software is geared to the automatic processing of the texts and information sources used in the project: scientific text from Medline, BioMed Central and the TREC Genomics Track, and clinical information from I2B2, CMC and project-specific sources.
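The idea in O3 of techniques as components with well-defined inputs and outputs can be sketched as follows. This is only an illustrative assumption about what such a contract might look like, not the platform's actual design: the `Component` base class, the two toy components and `run_pipeline` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Component(ABC):
    """Hypothetical plug-in contract: each technique declares its inputs and outputs."""

    inputs: List[str] = []
    outputs: List[str] = []

    @abstractmethod
    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Consume the declared inputs and return the declared outputs."""


class Tokenizer(Component):
    inputs = ["text"]
    outputs = ["tokens"]

    def run(self, data):
        return {"tokens": data["text"].lower().split()}


class TermCounter(Component):
    inputs = ["tokens"]
    outputs = ["counts"]

    def run(self, data):
        counts: Dict[str, int] = {}
        for token in data["tokens"]:
            counts[token] = counts.get(token, 0) + 1
        return {"counts": counts}


def run_pipeline(components: List[Component], data: Dict[str, Any]) -> Dict[str, Any]:
    """Run components in order, checking that each one's inputs are available."""
    data = dict(data)  # do not mutate the caller's dictionary
    for component in components:
        missing = [key for key in component.inputs if key not in data]
        if missing:
            raise ValueError(
                f"{type(component).__name__} is missing inputs: {missing}"
            )
        result = component.run({key: data[key] for key in component.inputs})
        data.update({key: result[key] for key in component.outputs})
    return data


result = run_pipeline([Tokenizer(), TermCounter()], {"text": "fever no fever"})
print(result["counts"])
```

Because each component only sees the keys it declares, a text mining step and a data mining step can be developed independently and then chained, which is one way to read the "non-intrusive" requirement.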

O4. Design of an intelligent medical information system capable of text and data mining. We will develop a system that offers advanced features for interrelating the information in clinical and scientific documentation, including the implementation and testing of data mining algorithms. We will design and implement interrelated search algorithms with fully operational interaction and interfaces, suitable for medical diagnosis and prediction.

O5. Evaluation of efficiency, effectiveness and usability. Assessments will be conducted with groups of end users of appropriate size, in two types of environment:

  • open environment: These experiments will be designed to measure the usability of the interface and the satisfaction of users with the system in medical tasks. The usability parameters to be evaluated are the average execution time of queries of interest and user satisfaction.
  • controlled environment: These experiments are aimed at evaluating the improvements in the effectiveness of the text and data mining algorithms, and will be carried out on standard evaluation collections such as Ohsumed, the TREC Genomics Track, CMC and the I2B2 Challenge.
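The two kinds of measurement described in O5 can be sketched in a few lines. The text above names average query execution time explicitly. Precision, recall and F1 are an assumption here: they are common effectiveness measures on evaluation collections, but the project description does not say which measures are used. The function names are hypothetical.

```python
from statistics import mean


def precision_recall_f1(predicted, relevant):
    """Compare system output with a gold standard (set-based measures)."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_query_time(durations_seconds):
    """Mean execution time over a set of user queries, in seconds."""
    return mean(durations_seconds)


# Controlled environment: documents returned versus documents judged relevant.
p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

# Open environment: timings collected from user sessions (made-up values).
print(average_query_time([1.0, 2.0, 3.0]))
```

User satisfaction, the other usability parameter, would normally come from questionnaires and is not covered by this sketch.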

Participants

The MedicalMiner initiative brings together three distinct but complementary research groups, which lead the following subprojects:

References

  • [Maña2007] Maña López M., Mata J., Carrero F., Gómez Hidalgo J.M., 2007. Feature Engineering and Quick Prototyping of Gene Mention Classifiers. Second BioCreAtIvE Challenge Workshop: Critical Assessment of Information Extraction in Molecular Biology. Spanish National Cancer Research Centre (CNIO), Madrid.
  • [Mata2008] Mata J., Maña M.J., Bermúdez J.M., Cruz N.P., Jiménez P., 2008. Handling Negation in Classification of Clinical Texts. In Proceedings of the AMIA Workshop on Challenges in Natural Language Processing for Clinical Data, Washington, D.C. (USA), November 2008.
  • [Carrero2008] Carrero F., Cortizo J.C., Gómez J.M., Buenaga M., 2008. In the Development of a Spanish Metamap. In Proceedings of the Seventeenth ACM Conference on Information and Knowledge Management (California, USA), CIKM’08, ACM, New York.
  • [Buenaga2008] Buenaga M., Gachet D., Maña M., Villa M., Mata J., 2008. Clustering and Summarizing Medical Documents to Improve Mobile Retrieval. ACM SIGIR Workshop on Mobile Information Retrieval, Singapore, July 2008, ACM Press.
  • [GómezPerez2009] Gómez-Perez J., Kohler S., Buenaga M., Rubio M., et al., 2009. Towards Interoperability in eHealth Systems: A Three-Dimensional Approach Based on Standards and Semantics. Healthinf 2009, International Conference on Health Informatics, Porto, Portugal, January 2009.