Models based on language
Although the earlier models (vector, Boolean, probabilistic and relevance feedback) are very useful, natural language processing has become very important today. Another model to take into account, therefore, is the language-based model. These models rely on a body of knowledge to decipher and interpret texts and to obtain a list of descriptors automatically.
Natural language is highly ambiguous: depending on the context in which it appears, a single word can mean a great variety of things. For this reason, documentary languages for knowledge representation, such as thesauri and ontologies, are used to help interpret natural language. For documents on the Web, other techniques (metadata and semantic languages) also exist to represent the knowledge the documents contain and to retrieve information from them. Among these languages, the best known is XML (eXtensible Markup Language).
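The idea of using a thesaurus to turn free text into a list of descriptors can be sketched as follows. This is a minimal illustration, not the method of any particular system: the stopword list and the tiny thesaurus (mapping non-preferred words to a preferred descriptor) are hypothetical, and real systems also handle stemming, phrases and word-sense disambiguation.

```python
import re

# Hypothetical stopword list and thesaurus. Each non-preferred word maps to
# its preferred descriptor, so synonyms collapse onto one controlled term.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "for", "to", "is", "on"}
THESAURUS = {
    "car": "automobile",
    "cars": "automobile",
    "automobiles": "automobile",
    "search": "retrieval",
    "searching": "retrieval",
}

def extract_descriptors(text: str) -> list[str]:
    """Return the sorted, de-duplicated descriptors found in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    descriptors = set()
    for token in tokens:
        if token in STOPWORDS:
            continue
        # Use the preferred term if the thesaurus knows the word,
        # otherwise keep the word itself as a free descriptor.
        descriptors.add(THESAURUS.get(token, token))
    return sorted(descriptors)

print(extract_descriptors("Searching for cars and automobiles"))  # → ['automobile', 'retrieval']
```

Here "cars" and "automobiles" end up as the single descriptor "automobile", which is the kind of normalization a thesaurus contributes.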
If all documents on the Web were structured, retrieving information would be quick and easy; unfortunately this is not the case, and a large percentage of documents are unstructured.
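To illustrate why structure helps, the sketch below contrasts a fielded search over a small XML collection with a plain substring search over the same content as unstructured text. The collection, its field names (`title`, `subject`) and the helper functions are hypothetical; only the standard-library `xml.etree.ElementTree` module is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured collection: each document has explicit fields.
XML_COLLECTION = """<collection>
  <doc id="d1"><title>Ranking with Bayesian networks</title><subject>inference networks</subject></doc>
  <doc id="d2"><title>Notes on term weighting</title><subject>vector model</subject></doc>
</collection>"""

# Similar content as unstructured plain text.
PLAIN_COLLECTION = {
    "d1": "Ranking with Bayesian networks. Topic: inference networks.",
    "d2": "Notes on term weighting; unlike inference networks, the vector model uses no graph.",
}

def find_by_field(xml_text, field, value):
    """Structured retrieval: compare only the content of one named field."""
    root = ET.fromstring(xml_text)
    return [doc.get("id") for doc in root.iter("doc")
            if (doc.findtext(field) or "").strip().lower() == value.lower()]

def find_in_text(collection, phrase):
    """Unstructured retrieval: any occurrence of the phrase counts as a match."""
    return [doc_id for doc_id, text in collection.items()
            if phrase.lower() in text.lower()]

print(find_by_field(XML_COLLECTION, "subject", "inference networks"))  # → ['d1']
print(find_in_text(PLAIN_COLLECTION, "inference networks"))            # → ['d1', 'd2']
```

The fielded query returns only the document whose subject is inference networks, while the text search also matches d2, which merely mentions the phrase.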
Models based on inference nets
Inference networks have two origins: the probabilistic retrieval model and Bayesian networks. An inference network is made up of two networks: the query network and the document network.
The query network is built when the user submits a query. It has two types of nodes: query nodes and term nodes (the terms of the documents). Arcs (represented by arrows) connect each term node to the corresponding query nodes.
The document network, in contrast, is fixed: it does not change from one query to the next. Like the query network, it has two types of nodes, in this case documents and the terms of those documents. From each document node, arcs lead to the terms that index it.
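The two networks described above can be represented as lists of arcs that meet at the shared term nodes. This is a minimal sketch under simplifying assumptions: the index contents and function names are hypothetical, and the arcs carry no probabilities yet.

```python
from collections import defaultdict

# Hypothetical index: the terms that index each document.
DOCUMENT_INDEX = {
    "d1": {"bayesian", "network", "retrieval"},
    "d2": {"vector", "retrieval"},
}

def build_document_network(index):
    """Document network: fixed arcs document -> term, built once at indexing time."""
    return [(doc, term) for doc, terms in index.items() for term in sorted(terms)]

def build_query_network(query_id, query_terms):
    """Query network: built for each new query, with arcs term -> query."""
    return [(term, query_id) for term in query_terms]

def connected_documents(doc_arcs, query_arcs):
    """Join both networks on their shared term nodes."""
    query_terms = {term for term, _ in query_arcs}
    linked = defaultdict(set)
    for doc, term in doc_arcs:
        if term in query_terms:
            linked[doc].add(term)
    return dict(linked)

doc_arcs = build_document_network(DOCUMENT_INDEX)
query_arcs = build_query_network("q1", ["bayesian", "retrieval"])
print(connected_documents(doc_arcs, query_arcs))
```

Because the document network is fixed, `doc_arcs` can be computed once and reused, while `query_arcs` is rebuilt for every query.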
Since the model derives from the probabilistic model, the next step is to estimate the probabilities. Once they have been estimated, inference takes place: each document is instantiated in turn, and the model calculates the probability that the query is satisfied given the instantiated document.
The model introduces a series of random variables that represent whether the information request has been satisfied. These random variables are binary (true or false).
The relevance of a particular document is measured by the evidential support that observing that document ($d_j$) gives to the query ($q$). It is represented as follows:
$P(q \wedge d_j)$
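A minimal sketch of this computation, assuming one common textbook formulation: $P(q \wedge d_j) = \sum_{\vec{k}} P(q \mid \vec{k})\, P(\vec{k} \mid d_j)\, P(d_j)$, where $\vec{k}$ ranges over all true/false assignments of the binary term variables. The term probabilities, the uniform prior $P(d_j) = 1/N$ and the choice of an AND query for $P(q \mid \vec{k})$ are assumptions made for illustration, and enumerating every assignment is only feasible for a tiny vocabulary.

```python
from itertools import product

# Hypothetical estimates of P(k_i = true | d_j); a term absent from a document gets 0.
TERM_PROBS = {
    "d1": {"bayesian": 0.8, "retrieval": 0.5},
    "d2": {"retrieval": 0.9},
}
TERMS = ["bayesian", "retrieval"]

def p_query_given_state(state, query_terms):
    """P(q | k) for an assumed AND query: 1 if every query term is true."""
    return 1.0 if all(state[t] for t in query_terms) else 0.0

def p_state_given_doc(state, probs):
    """P(k | d_j), treating the binary term variables as independent."""
    p = 1.0
    for term, is_true in state.items():
        p_true = probs.get(term, 0.0)
        p *= p_true if is_true else 1.0 - p_true
    return p

def rank(query_terms):
    """Instantiate each document in turn and compute P(q AND d_j)."""
    prior = 1.0 / len(TERM_PROBS)  # uniform P(d_j)
    scores = {}
    for doc, probs in TERM_PROBS.items():
        total = 0.0
        # Sum over all 2^t assignments k of the binary term variables.
        for values in product([False, True], repeat=len(TERMS)):
            state = dict(zip(TERMS, values))
            total += p_query_given_state(state, query_terms) * p_state_given_doc(state, probs) * prior
        scores[doc] = total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(rank(["bayesian", "retrieval"]))
print(rank(["retrieval"]))
```

For the two-term query only d1 gets a non-zero score (0.8 × 0.5 × 0.5 = 0.2), while for the single-term query d2 ranks first (0.45 against 0.25 for d1).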
This page was developed for a Computer Engineering course at Carlos III University of Madrid, specifically Information Retrieval and Access.