Personal projects

Text Mining in WEKA Cookbook

Over my years of working on text mining, I have had a very good experience with the WEKA Machine Learning library, so I have decided to share it with the community through my "Text Mining in WEKA Cookbook", a collection of text mining recipes that use this wonderful library.

I publish my recipes (tutorials, experiments, hints) mainly as posts on my blog. I hope you enjoy them, and feel free to contact me to suggest topics for new posts.

The SMS Spam Corpus

The SMS Spam Corpus v.0.1 is a set of tagged SMS messages that my colleague Enrique Puertas Sánz and I collected for SMS Spam research.

The corpus has been used in several research studies on SMS Spam filtering. An updated and greatly improved version by my colleague Tiago A. Almeida has been released as the SMS Spam Collection. I recommend using this newer collection in your research.

Research projects

These are some of the Research & Development projects I have been involved in.

At Optenet

Demons (2010-2013): DEcentralized, cooperative and privacy-preserving MONitoring for trustworthinesS - a framework for security collaboration among ISPs.

WENDY (2010-2012): WEb-access coNfidence for chilDren and Young - age detection in Social Networks.

Negobot (2010): A Conversational Agent Based on Game Theory for the Detection of Paedophile Behaviour.

At the Universidad Europea de Madrid

ISSE (2007-): Semantic-Based Interoperability for Health Services.

MAVIR (2006-): Joint Research Program on Information Access and Natural Language Processing.

SINAMED (2006-): Text classification for information access in biomedicine.

FISME (2005): An R&D contract with Vodafone for the prospective analysis of content-based SMS spam filtering.

ISIS (2005): Intelligent unified access to the patient medical record.

TEFILA I & II (2002-04): Web filtering in corporations.

POESIA (2002-04): Public Opensource Environment for a Safer Internet Access - Web filtering in schools.

Hermes (2000-01): the bilingual, personalized newspaper.

CODI (1999-2000): a photograph recommender system.

Mercurio (1999-2000): the Spanish personalized newspaper.