Personalized medicine (PM) seeks to identify therapies that allow safe and effective individualized treatment of specific patients. One of the great difficulties in carrying out this clinical practice effectively is that there are currently no flexible information systems capable of providing accurate, up-to-date and interrelated knowledge based on stratified access to multiple heterogeneous data sources. All this information, generated in experimental studies, clinical trials and daily clinical practice, and more recently through biomedical sensors and large, freely available and interlinked data sets (Open and Linked Data), should become an extraordinary source of knowledge for the advancement of PM. However, PM currently faces great challenges. It is necessary to integrate heterogeneous information scattered across multiple sources of different genre, domain, structure and scale, in which the textual component also plays a very important role. To meet these challenges, this project proposes the coordinated application of information integration techniques covering heterogeneous sources, together with text and data mining techniques, to facilitate the extraction of the associated knowledge.

The main objective of the project is to design tools for integrated and intelligent access to related information in order to extract useful knowledge in the context of PM. We propose three usage scenarios: (i) assistance to healthcare professionals during decision making in clinical settings, (ii) access for dependent and chronic patients to relevant information about their health status, and (iii) support for evidence-based training of new medical students. More effective techniques will be proposed for operations such as summarization, retrieval of images from text, information retrieval, named entity recognition, and extraction of information from large data sets coming from sensors and open data. Tools will be implemented to obtain knowledge mainly from public biomedical resources. We will design an architecture and Web application framework that enables the integration of text and data mining processes and techniques, and of information, in a fast, consistent and reusable way (via plugins). Finally, we will develop intelligent user-support tools for the three scenarios defined: decision making for diagnosis and treatment, patients, and training. In addition, experiments will be conducted to evaluate both effectiveness and usability through systematic and user-based evaluations. For the former, we will participate in competitions such as TREC Medical Records, CLEF, TAC, DDIExtraction, i2b2, BioCreative, the CoNLL Shared Task or the BioNLP Shared Task. Evaluations with users will consider both open and controlled environments.

Related projects

MedicalMiner (TIN-2009-14057-C03-01) is a project for the integration of explicit knowledge in text mining techniques for the development of translational medicine tools.

FlipIT! - Flipped Classroom in the European Vocational Education (2015-1-HU01-KA202-013555). The project is funded by the European Commission (European ICT Sector Skills Alliance - VET open course for mobile apps creators (AppSkil), public contract 554271-EPP-1-2014-1-UK-EPPKA2-SSA). It covers the analysis and subsequent development of a methodology based on the Flipped Classroom pedagogical model applied to VET. A MOOC for teachers will be developed, and pre- and post-tests will be carried out.

BIDAMIR (BIomedical Data Mining and Information Retrieval, TIC 07629). Excellence Research Project of the Junta de Andalucía. The main objective of this project is the development of an intelligent clinical information system that allows access to textual information and the extraction of useful knowledge from structured data sources.

VirtualcloudCarer (TSI-020100-2011-83) is a project for the development of a Service Platform that is highly personalized for each of the actors involved: dependents, family, primary care professionals, hospitals and social services. On the one hand, it allows sensing and telemonitoring of the dependent person and his or her environment, both at home and outside; on the other, it makes possible the integration into the Digital Society of people whose physical capacities are severely degraded, adapting the systems and the environment to them and not vice versa.

MAVIR (Improving Access and Visibility of Multilingual Web Information for the Community of Madrid, S2009 / TIC1542). The MAVIR Consortium is a research network co-financed by the Community of Madrid within the IV Regional Plan of Scientific Research and Technological Innovation (IV PRICIT) and formed by a multidisciplinary team of scientists, technicians, linguists and documentalists to develop an integrative effort in the areas of research, training and technology transfer.

NAVIGA (E!4583 / CDTI) is a European project funded by the "Eurostars Eureka" initiative. The Naviga Project aims to reduce the risk that social groups with special vulnerabilities, such as the elderly or disabled, may become victims of digital progress, by giving these groups tools to stay active both mentally and socially.


Intelligent Systems Group (GSI), European University of Madrid (Madrid-UEM). Subproject 1 (UEM-IPHealth): Integration of access methods to open sources of information and sensor data for health education and decision-making (TIN2013-47153-C3-1-R)

Laboratory for Information Retrieval and Mining of Texts and Data (Laberinto), University of Huelva (Huelva-UHU). Subproject 2 (UHU-IPHealth): Text and data mining to support decision-making and learning in the field of health (TIN2013-47153-C3-2-R)

Next Generation Computer Systems Group (SING), University of Vigo (Ourense-UVigo). Subproject 3 (UVigo-IPHealth): Platform for Integration of Intelligent Techniques for Biomedical Information Analysis (TIN2013-47153-C3-3-R)



GSI (UEM)

Manuel de Buenaga Rodríguez
Diego Gachet Paez
Enrique Puertas Sanz
Margarita Rubio Alonso
María Asunción Hernando Jerez
María José Busto Martínez
María Teresa Villalba de Benito
María Cruz Gaya López
Rosa Belén Mohedano del Pozo
Fernando Aparicio Galisteo
Rafael Muñoz Gil
María de la Luz Morales Botello

Laberinto (UHU)

Manuel J. Maña López
Jacinto Mata Vázquez
Miguel Á. Vélez Vélez
Manuel de la Villa Cordero
Noa Patricia Cruz Díaz

SING (UVigo)

Florentino Fernández Riverola
Reyes Pavón Rial
Rosalía Laza Fidalgo
José Ramón Méndez Reboredo
Daniel González Peña
Miguel Reboiro Jato
Fernando Díaz Gómez
Francisco José González Cabrera
Mª del Carmen Rodríguez Otero
Eva María Lorenzo Iglesias
M. Lourdes Borrajo Diz
Adrián Seara Vieira
Rubén Romero González
David Ruano Ordás



From the perspective of the end user, BioClass is a platform that focuses on applying reasoning models to text classification. It is designed to work with the results of an information retrieval process over a text database, where the retrieved documents may or may not be relevant to a specific topic. BioClass takes this data as input and offers multiple filters and machine learning algorithms to handle the automatic classification problem. From a developer's perspective, BioClass offers an abstraction layer over the classification process, so that the developer can reuse its architecture and plug in new reasoning models.
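As a rough illustration of the kind of task described above (deciding whether retrieved documents are relevant to a topic), the following is a minimal Python sketch of a stopword filter followed by a multinomial Naive Bayes classifier. It is not BioClass code and does not use its API: the class name, function names, stopword list and toy documents are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, standing in for a "filter" step).
STOPWORDS = {"the", "of", "a", "in", "and", "to", "for", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic runs only, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy "retrieved documents", labelled by hand for the example.
retrieved = [
    ("BRCA1 mutation increases breast cancer risk", "relevant"),
    ("Tumor suppressor genes and cancer therapy response", "relevant"),
    ("Hospital parking fees rise again this year", "not_relevant"),
    ("New cafeteria menu announced for hospital staff", "not_relevant"),
]
classifier = NaiveBayesTextClassifier().fit(*zip(*retrieved))
prediction = classifier.predict("Gene mutation linked to cancer therapy outcome")
print(prediction)
```

Any other model exposing the same `fit` and `predict` methods could be swapped in here, which is the general idea behind putting an abstraction layer over the classification step.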


Busclimed (Biomedical information finder for mobile devices) is an application for looking up information about medical terms through the use of Linked Data. The patient or health care professional can view terms related to a particular disease, symptom or drug, consult scientific articles about them from sources such as PubMed or MedlinePlus, and also see information from the National Drug File when the searched term is a drug.

Download the Android app from the Google Play Store

Big Data Analytics in Cardiology

Big Data analysis technologies are having, and will continue to have, a huge impact on health. The benefits of "Big Data" include improved quality and accuracy of clinical decisions, faster processing of large amounts of data, and detection of diseases at an early stage. Here we use tools compatible with Big Data technology, together with the R software, to predict the mortality of patients in an intensive care unit.


The objective of the ClinLaP (Clinical Language Analysis Platform) application is to help patients better understand what their clinical history says by showing relevant information extracted from various biomedical sources. To do this, the system has a language analysis module that automatically extracts the relevant medical terms from the text and lets the user obtain the meaning of each term with a single click. It also shows related information of interest, such as symptoms, treatments, medications and scientific articles in which the disease or drug is mentioned.
This application received the "Fujitsu Linked Open Data 2015" award.
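
The term extraction step described above can be pictured with a simple dictionary lookup. The Python sketch below is only an illustration under stated assumptions: it is not ClinLaP's language analysis module, and the glossary, its definitions and the function names are hypothetical. A real system would draw terms and meanings from biomedical sources rather than a hard-coded dictionary.

```python
import re

# Hypothetical mini-glossary used only for this example.
GLOSSARY = {
    "hypertension": "Persistently high blood pressure.",
    "type 2 diabetes": "A chronic condition affecting how the body processes glucose.",
    "diabetes": "A group of diseases that result in high blood sugar.",
    "metformin": "An oral drug commonly used to treat type 2 diabetes.",
}


def build_pattern(terms):
    """Compile one case-insensitive regex; longer terms come first so that
    'type 2 diabetes' is preferred over the shorter 'diabetes'."""
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def annotate(text, glossary=GLOSSARY):
    """Return (start, end, matched text, definition) for each glossary term found."""
    pattern = build_pattern(glossary)
    return [
        (m.start(), m.end(), m.group(0), glossary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


note = "Patient with Type 2 diabetes and hypertension, currently on metformin."
annotations = annotate(note)
for start, end, term, definition in annotations:
    print(f"{term} [{start}:{end}] -> {definition}")
```

The character offsets returned for each match are what a user interface would need in order to make each term clickable in the original text.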


jARVEST (Java web harvesting library) is a simple web scraping tool. It is implemented through a powerful domain-specific language based on JRuby, which makes development possible with minimal code.


DISSUM is a tool to support the labeling of a biomedical corpus of clinical progress notes for the semi-automatic creation of hospital discharge reports using automatic summarization techniques. Annotating a corpus for summarization, which is essential for applying machine learning and quality assessment techniques, is a complex task: it requires chronological access to the documents, selection of meaningful sentences within size constraints, and classification of these sentences by type. The tool manages the whole process, making the work of the human annotators easier and less costly.

