C/ Tajo, s/n, 28670 Villaviciosa de Odón
The “Buscador Geoléxico” (Geolexical Search Engine) combines algorithms and heuristics that relate the physical distances between places with the lexical distances between the words associated with those places. Based on these criteria, it returns, in response to a query, a set of places reasonably close to a given location whose name or description contains words lexically close to the search terms.
The obvious application of this system is the search for places of interest near the user’s location, obtained either from a GPS receiver or from a mobile locator service, i.e., a POI (points of interest) search application for the user’s surroundings.
Currently, much work is being devoted to incorporating natural language as a man-machine interface. Many research centers, universities and companies are actively working to obtain concrete results and, ultimately, to broaden the possibilities of interpreting text written by humans in order to provide services of any kind.
The most common document search in current systems treats the collection with a binary criterion of discrimination: a document either contains a given word or it does not, with no middle ground. The result is computed by checking whether each keyword appears in each document, and the documents that contain them are returned; no interdependence between documents is considered during the search.
Classical search models basically measure the relevance of a document by counting the occurrences of the query words within it. A deficiency of this way of measuring relevance is that it ignores the semantic context of each word. As a result, two fundamental problems arise when retrieving information with these methods: synonymy (different words with the same meaning) and polysemy (the same term with different meanings). Classical models fail to mark as relevant documents that contain terms with the same meaning as one of the query words, which harms the factor called “recall”; conversely, they return as relevant documents that contain the query terms used with a different meaning, which reduces “precision”.
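The classical occurrence-counting scheme described above can be sketched in a few lines of Python. The documents and the query are hypothetical; the second document illustrates the synonymy problem, since it is about the same topic but scores zero:

```python
# A minimal sketch of classical keyword relevance scoring: count how
# often each query term occurs in the document. Documents are
# hypothetical; "car" vs "automobile" illustrates the synonymy problem.
from collections import Counter

def relevance(query, document):
    """Sum of occurrence counts of each query term in the document."""
    counts = Counter(document.lower().split())
    return sum(counts[term] for term in query.lower().split())

docs = [
    "the car dealership sells every car model",
    "the automobile showroom opens on Monday",
]

scores = [relevance("car", d) for d in docs]
print(scores)  # the second document is about cars, yet it scores 0
```

The second document is never retrieved for the query “car”, even though it is relevant, which is exactly the loss of recall the text refers to.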
The current trend in the state of the art (Google could be cited as the main player in this type of solution) is oriented, in its newest versions, towards latent semantic indexing (LSI), which provides a significant advance. This methodology incorporates statistics, probability and correlation to help deduce the semantic distance between words.
In addition to storing the lists of words that appear in each document of the collection, the method examines the collection as a whole, and can thus discern which documents contain the same words.
LSI does not merely extract the keywords of a document and list them in a database; it studies the collection of documents, recognizing and identifying the words that are common among them, and from this draws conclusions about the semantic relationships between the words used in those documents. The process then finds documents that include or make use of these semantically close words. The resulting documents are indexed as closely related to a context, according to latent semantic indexing.
Latent Semantic Indexing proposes a method to solve the problems of the traditional approaches. The idea is to move from a set of terms to a set of entities that capture the latent structure in the association between terms and documents. To analyze this structure, the latent semantic analysis method (two-mode factor analysis) based on the Singular Value Decomposition was chosen.
LSI considers documents that have many words in common to be semantically related and, conversely, those with few words in common to be semantically distant. This idea fits surprisingly well with how people process information.
However, it should be remembered that LSI techniques do not actually understand what the words mean, even though the results may appear “smart”.
When searching a database with LSI indexing, the system compares document similarity values for each keyword and does not require that two semantically close documents share all keywords. Not all the words need to match for the search to offer meaningful results.
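The LSI process described in the preceding paragraphs can be sketched with NumPy: build a term-document matrix, truncate its Singular Value Decomposition to a few latent dimensions, and compare documents in the reduced space. The tiny corpus below is hypothetical:

```python
# A minimal latent semantic indexing sketch: term-document matrix,
# truncated SVD, and cosine similarity in the latent space.
import numpy as np

docs = [
    "human machine interface",
    "user interface system",
    "graph tree structure",
]
terms = sorted({w for d in docs for w in d.split()})

# Term-document count matrix A (rows = terms, columns = documents).
A = np.array([[d.split().count(t) for d in docs] for t in terms], dtype=float)

# Singular Value Decomposition, truncated to k latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: a document in latent space

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 1 share "interface", so they end up much closer in
# the latent space than either is to document 2.
print(cos(doc_vecs[0], doc_vecs[1]), cos(doc_vecs[0], doc_vecs[2]))
```

Note that the two related documents share only one word, yet their latent-space similarity is high; this is the property described above, that semantically close documents need not share all keywords.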
The “Buscador Geoléxico” goes even beyond LSI: its first goal is to find how to structure a data model that joins the physical-distance dimension with the lexical-semantic one.
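The source does not specify the scoring model, but the core idea of joining the two dimensions can be sketched as a weighted combination of a physical distance (haversine) and a lexical distance (normalized Levenshtein edit distance). The weights, function names and place data below are illustrative assumptions, not the project’s actual implementation:

```python
# A hypothetical sketch: score a candidate place by combining its
# physical distance to the user with the lexical distance between the
# query and the place name. Lower scores are better. Weights are
# illustrative assumptions.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def geolex_score(query, place_name, user, place, w_geo=0.1, w_lex=1.0):
    """Weighted sum of physical and (length-normalized) lexical distance."""
    d_geo = haversine_km(user[0], user[1], place[0], place[1])
    d_lex = levenshtein(query.lower(), place_name.lower()) / max(len(query), len(place_name))
    return w_geo * d_geo + w_lex * d_lex
```

With a score of this shape, candidate places can simply be sorted ascending, so that a place with a slightly less similar name can still outrank a lexically perfect match that is much farther away.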
Experience of the group in similar projects
The activities of the Intelligent Systems research group of the Universidad Europea de Madrid (UEM) directly related to the project are:
- Intelligent Information Access
- Language Engineering, Ontologies and Multilingual Environments
- Advanced Interface Design and Ubiquitous Computing
Related research projects:
- MAVIR: Improving the Access and Visibility of Networked Multilingual Information in the Community of Madrid
- ISSE: Semantic-based Interoperability for Electronic Health
- NIMOV: System using mobile technologies for patient monitoring: early discharge of underweight children
- SINAMED: Design and integration of summarization and automatic text categorization techniques for access to bilingual information in the biomedical domain
- MobiHelp: System for tracking and tracing at-risk patients (feasibility study)
- ISIS: Intelligent System for Integrated Access to Information Patient History and Related Medical Documentation
- ALLES: “Advanced Long Distance Language Education System”
- Observatorio de tecnologías de la traducción (OTT)
- TEFILA2: Filtering techniques based on Language Engineering, machine learning and agents
- POESIA (Public Opensource Environment for a Safer Internet Access)
Among the results of the project is the development of a pilot that combines the following technologies: the Google Maps application, Google Translator, the WordNet ontology and the Android platform.