C/ Tajo, s/n, 28670 Villaviciosa de Odón
Nowadays many newspapers offer digital access to their contents. Moreover, users can subscribe to newspapers' services and receive daily news by e-mail. Unfortunately, most of these services are simple transcriptions of the printed version. More advanced systems include user-profiling options that allow users to define what kind of information they want to receive.
There are two main approaches to defining user interests about content. First, category-based systems list some categories –usually newspaper services are required to automatically categorize each news item for each user-defined category to build each user's final message.
In recent years our efforts have been oriented towards offering personalized information access by integrating all the previously described user-profiling systems in a monolingual setting, as in MERCURIO.
However, many circumstances favour multilingual information systems, especially in the current European Union context, where information flow involves interactions with documents in several languages. Requirements of this kind promote automatic translation and multilingual services: for instance, official EU bodies make use of translation systems like EC Systran, multilingual information access systems like CELEX, and multilingual systems for analysing content on the Internet such as POESIA. In this paper we present Hermes, a multilingual news filtering system that allows users to receive personalized messages containing the most interesting news extracted from the digital versions of several European newspapers, in several languages.
Hermes has been designed to provide personalized news according to user interests, sending by e-mail a message containing a set of news items (each with a title and a summary). To generate each message, both user and news information are retrieved, formally represented and processed to obtain the final news representation for each user. As we describe below, news selection is based on the Vector Space Model (VSM), applying a simplification of the Rocchio algorithm that was previously applied with satisfactory results.
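The text does not spell out which simplification of Rocchio is used, so the following is only an illustrative sketch of a Rocchio-style profile update restricted to positive feedback. The function name `rocchio_update`, the sparse dictionary representation and the default values of `alpha` and `beta` are assumptions made for the example, not details taken from Hermes.

```python
from collections import Counter


def rocchio_update(profile, relevant_docs, alpha=1.0, beta=0.75):
    """Rocchio-style update using positive feedback only (assumed variant).

    profile       -- sparse vector {term: weight} describing the user
    relevant_docs -- list of sparse vectors for items judged relevant
    Returns alpha * profile + beta * centroid(relevant_docs).
    """
    # Scale the existing profile by alpha.
    updated = Counter({term: alpha * weight for term, weight in profile.items()})
    if relevant_docs:
        n = len(relevant_docs)
        # Add beta times the centroid of the relevant documents.
        for doc in relevant_docs:
            for term, weight in doc.items():
                updated[term] += beta * weight / n
    return dict(updated)


# Hypothetical data: a profile and two news items the user found relevant.
profile = {"football": 1.0}
relevant = [{"football": 1.0, "madrid": 2.0}, {"madrid": 2.0}]
new_profile = rocchio_update(profile, relevant)
```

The full Rocchio formulation also subtracts a term for non-relevant documents; dropping it is one common simplification, chosen here only for illustration.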
Then, the system acquires news information. Every day Hermes connects to two e-newspapers, one in Spanish and one in English, and gets the textual content of each news item. These items are processed to obtain their summaries and an internal representation according to the VSM: each news item is mapped to a vector that weights the relevance of each term. Each item is also categorized so that it can be sent to the users who selected the resulting category. The categorization process relies on a generally accepted category system (the first-level categories of Yahoo! and Yahoo! Spain), which was processed to extract a VSM representation for each category.
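As a minimal sketch of the VSM-based categorization described above, a news item can be mapped to a term vector and assigned to the category whose vector is most similar under the cosine measure. Raw term frequency, whitespace tokenization, the function names and the toy category descriptions are all assumptions made for illustration; the actual term weighting used by Hermes is not specified here.

```python
import math
from collections import Counter


def tf_vector(text):
    """Map a text to a sparse term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def categorize(item_vector, category_vectors):
    """Return the name of the category most similar to the item."""
    return max(category_vectors,
               key=lambda name: cosine(item_vector, category_vectors[name]))


# Hypothetical category vectors built from short toy descriptions.
categories = {
    "sports": tf_vector("football match league goal"),
    "business": tf_vector("market shares bank economy"),
}
item = tf_vector("the bank raised rates as the market fell")
best = categorize(item, categories)
```

The same `cosine` function could score news items against a user profile vector, since profiles and items live in the same term space.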
The evaluation process involved a group of 23 users drawn from the research team, a group of students (Computer Science and Journalism) and some users related to neither computers nor journalism. The evaluation, carried out by the selected users, takes into account quantitative and qualitative aspects of the system. Qualitative analysis has been made using data taken from objective parameters of the system and a final assessment that reflects user satisfaction with the system and its results. The evaluators then produced a report showing positive and negative aspects of the system.
After evaluating Hermes we can conclude that the main goal has been reached. Hermes is one of the first cross-lingual personalized e-news systems, and has proven to be a valid prototype for demonstrating and evaluating how several text analysis and natural language processing techniques can be combined to improve communication across different language environments. Of course, several improvements can still be made to the system.