Probabilistic Model
As its name suggests, this model is based on calculating the probability that a document is relevant to the query. If we take any document from a collection, there is a certain probability that it is relevant to the query posed. We therefore have to analyze the characteristics that make a document relevant.
The formula for obtaining the probability of relevance would be:
P(relevance) = m / N, where m is the number of relevant documents and N is the total number of documents in the collection.
To calculate relevance, the model uses a series of weights assigned to the characteristics of the document. Relevance is assessed through the index terms, known as descriptors, together with the weights that have been established for them. The aim is to retrieve the documents that best match the descriptors used in the query.
Because weights are used, a degree of relevance can be calculated for each document, and the results can be ranked by it, as in the vector model or the extended Boolean model.
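The ranking idea can be illustrated with a minimal sketch. The descriptor weights, document identifiers and the `rank` function below are hypothetical, chosen only for illustration, and the score is assumed to be a plain sum of the weights of the query descriptors found in each document.

```python
from typing import Dict, List, Tuple

# Hypothetical descriptor weights per document (illustrative values only).
DOC_WEIGHTS: Dict[str, Dict[str, float]] = {
    "d1": {"retrieval": 0.75, "probabilistic": 0.5},
    "d2": {"retrieval": 0.25, "boolean": 1.0},
    "d3": {"vector": 0.5},
}


def rank(query_terms: List[str],
         doc_weights: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score each document by summing the weights of the query descriptors
    it contains, then sort by score, highest first."""
    scores = []
    for doc_id, weights in doc_weights.items():
        score = sum(weights.get(term, 0.0) for term in query_terms)
        scores.append((doc_id, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank(["retrieval", "probabilistic"], DOC_WEIGHTS)
print(ranking)
```

With these made-up weights, the document that carries both query descriptors is ranked first and the one that carries neither is ranked last.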
The major problem with this model is that it needs an initial hypothesis before it can be applied: the set of relevant documents and the weights both have to be initialised. In addition, the terms that appear in the documents are assumed to be independent of one another, and the initial estimation of the probabilities is complex.
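To make the need for a starting hypothesis concrete, here is a sketch of one common textbook initialisation for a binary independence style model. The text does not say which hypothesis it has in mind, so the choices below are assumptions: every term is given probability 0.5 of appearing in a relevant document, and its probability of appearing in a non-relevant document is approximated by the fraction of the collection that contains it, with a 0.5 smoothing constant. The function names are hypothetical.

```python
import math
from typing import Dict, List, Set


def initial_term_weights(docs: Dict[str, Set[str]]) -> Dict[str, float]:
    """Initial weight per term when no relevance information is available.

    Assumed starting hypothesis (not taken from the text):
      P(term | relevant)     = 0.5 for every term
      P(term | non-relevant) = share of documents containing the term
    """
    n_docs = len(docs)
    doc_freq: Dict[str, int] = {}
    for terms in docs.values():
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1

    weights: Dict[str, float] = {}
    for term, n_i in doc_freq.items():
        p = 0.5
        # Smoothed so that q is never exactly 0 or 1.
        q = (n_i + 0.5) / (n_docs + 1.0)
        weights[term] = math.log(p / (1.0 - p)) + math.log((1.0 - q) / q)
    return weights


def score(query: List[str], doc_terms: Set[str],
          weights: Dict[str, float]) -> float:
    """Under the independence assumption, the score is a simple sum of the
    weights of the terms shared by the query and the document."""
    return sum(weights.get(term, 0.0) for term in query if term in doc_terms)


docs = {
    "d1": {"model", "probabilistic"},
    "d2": {"model", "vector"},
    "d3": {"model", "boolean"},
    "d4": {"model"},
}
weights = initial_term_weights(docs)
print(score(["probabilistic", "model"], docs["d1"], weights))
```

Under this assumed initialisation, a term that occurs in every document ("model") receives a lower weight than a rare one ("probabilistic"). The independence assumption is what allows the per-term weights to be added up.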
Relevance feedback
This technique rewrites the user's query using the relevant documents obtained from an initial search. The reformulation is intended to produce a more accurate set of results and to recalculate the weights of the relevant terms.
As more relevant results are obtained, those that lose relevance are discarded.
The modification of the query can be done in two ways: manual or automatic. In the manual mode, as its name suggests, the user indicates which documents are relevant. The automatic mode assumes that the first n documents retrieved are relevant.
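The difference between the manual and automatic modes can be sketched as follows. The function name `select_relevant`, its parameters and the document identifiers are hypothetical; the only behaviour taken from the text is that manual feedback relies on the user's judgement, while automatic feedback assumes the first n results are relevant.

```python
from typing import List, Optional, Set


def select_relevant(ranked_ids: List[str],
                    user_marked: Optional[Set[str]] = None,
                    n: int = 3) -> List[str]:
    """Choose the documents used to reformulate the query.

    Manual mode: keep only the documents the user marked as relevant,
    in their ranked order.
    Automatic mode (no user input): assume the first n results are relevant.
    """
    if user_marked is not None:
        return [doc_id for doc_id in ranked_ids if doc_id in user_marked]
    return ranked_ids[:n]


initial_results = ["d4", "d1", "d7", "d2"]
manual = select_relevant(initial_results, user_marked={"d1", "d2"})
automatic = select_relevant(initial_results, n=2)
print(manual, automatic)
```

The automatic mode needs no effort from the user, but it inherits any mistakes in the initial ranking, since the top n results are trusted without being checked.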
The algorithm used to obtain the most relevant documents is Rocchio's.
This technique is intended to move the query closer to the relevant documents. The results are generally very good, because it considerably improves the retrieval of documents relevant to the query.
However, if the query contains a poorly chosen word, the results will be worse.
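A minimal sketch of a Rocchio-style update is given below, assuming queries and documents are represented as term-weight vectors (plain dictionaries here). The parameter values alpha = 1.0, beta = 0.75 and gamma = 0.15 are commonly quoted defaults rather than values from the text, and the example vectors are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def centroid(vectors: List[Vector]) -> Vector:
    """Average term weight over a list of vectors (empty list gives {})."""
    if not vectors:
        return {}
    total: Dict[str, float] = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query: Vector, relevant: List[Vector], non_relevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75,
            gamma: float = 0.15) -> Vector:
    """Move the query towards the centroid of the relevant documents and
    away from the centroid of the non-relevant ones."""
    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    new_query: Vector = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel_c.get(term, 0.0)
                  - gamma * non_c.get(term, 0.0))
        if weight > 0:  # terms with non-positive weight are dropped
            new_query[term] = weight
    return new_query


new_query = rocchio(
    query={"jaguar": 1.0},
    relevant=[{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "feline": 1.0}],
    non_relevant=[{"jaguar": 1.0, "car": 1.0}],
)
print(sorted(new_query))
```

In this sketch the reformulated query gains the terms "cat" and "feline" from the relevant documents, while "car" is dropped because it only appears in the non-relevant one. The original query terms are carried over with weight alpha, so a poorly chosen word in the original query remains in the reformulated one.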
This page was developed for a Computer Engineering course at Carlos III University of Madrid, specifically Information Retrieval and Access.