C/ Tajo, s/n, 28670 Villaviciosa de Odón
The objective of the TEFILA project is the design, development, evaluation, and promotion of techniques for building advanced tools for filtering information on the WWW that are more flexible, configurable, and effective than current ones, aimed at Internet service providers. These techniques will allow Internet service providers to offer client companies a new value-added service. This service gives client companies greater assurance that the Internet is used productively and profitably in the workplace.
The TEFILA 2 project covers the design, development, evaluation, and promotion of techniques for building advanced tools for filtering information on the WWW that are more flexible, configurable, and effective than current ones, aimed at Internet service providers. These techniques will allow Internet service providers to offer client companies a new value-added service. This service gives client companies greater assurance that the Internet is used productively and profitably in the workplace.
Scientific and technological objectives
Current tools for filtering Internet content are very limited, mainly because they employ overly simplistic techniques, cover few content domains (usually only pornographic content), usually filter content in a single language (typically English), and lack the flexibility to be adapted to other content domains and languages.
The TEFILA 2 project aims to develop a set of innovative techniques for producing filtering systems that are more effective, flexible, and configurable than today's. The main scientific contributions of TEFILA 2 fall within the following research areas:
- Natural Language Engineering. Content filtering is, specifically, a document categorization task [Sebastiani, 2002]. First, the filtering mechanisms currently in use must be improved to increase categorization effectiveness. Moreover, techniques are needed for building a multilingual categorization system, given the multilingual nature of the Internet.
- Machine Learning. Learning techniques significantly reduce the effort of developing text classification systems. Cost-sensitive learning methods need to be investigated, because users typically consider failing to block harmful content more damaging than blocking harmless content. Machine learning will enable software agents to learn criteria for recognizing objectionable Internet content in different formats (text and images). In addition, agents need to learn how to coordinate their actions to solve the problem of filtering objectionable content collectively.
- Agent Technology. We will integrate this technology into the project to provide a distributed and collaborative view of knowledge. Diversifying and specializing the operation of the system and its component parts gives the system greater adaptability and robustness when filtering online information.
- Image Analysis. We will develop image classification techniques that learn from real databases to complement text categorization, and object search techniques to flag images containing relevant elements such as symbols of violent groups and the like.
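The cost-based learning mentioned under Machine Learning can be illustrated with a simple expected-cost decision rule. The following is only a minimal sketch, not the project's method: the function names and the cost values (a missed harmful page assumed to cost five times a wrongly blocked harmless page) are hypothetical, and the probability `p_harmful` is assumed to come from some text or image classifier.

```python
def block_threshold(cost_miss: float, cost_false_block: float) -> float:
    """Probability of harm above which blocking has the lower expected cost.

    cost_miss: assumed cost of letting a harmful page through.
    cost_false_block: assumed cost of blocking a harmless page.
    """
    return cost_false_block / (cost_false_block + cost_miss)


def should_block(p_harmful: float,
                 cost_miss: float = 5.0,
                 cost_false_block: float = 1.0) -> bool:
    """Block when the expected cost of blocking is below that of allowing."""
    expected_cost_allow = p_harmful * cost_miss
    expected_cost_block = (1.0 - p_harmful) * cost_false_block
    return expected_cost_block < expected_cost_allow


if __name__ == "__main__":
    # With asymmetric costs, a page with a 30% estimated chance of being
    # harmful is blocked; with equal costs it would be allowed.
    print(should_block(0.3))            # asymmetric costs (hypothetical 5:1)
    print(should_block(0.3, 1.0, 1.0))  # symmetric costs
```

Raising the assumed cost of a miss lowers the blocking threshold, so the filter errs on the side of blocking, which matches the user preference described above.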
To evaluate the techniques proposed in TEFILA 2, a prototype Internet content filtering tool will be developed incrementally. The prototype will be open source, which reduces development cost and widens its distribution.
Additionally, we will investigate and evaluate an alternative architecture that offers the filter as a web service accessible to other applications (e.g., browsers, proxy tools, etc.).