Stepping Stones & Pathways: An Alternative Interpretation to User Queries

IR systems have evolved from difficult-to-use, expert-mediated retrieval systems into the gateway through which everyday users access large collections, such as the ACM Digital Library, ResearchIndex, or the WWW itself, where browsing alone is impractical. Studies show that in the case of the WWW, the largest collection ever available, at least 85% of users rely on search engines to find material on the Web, yet the "inability to find relevant information" is still one of the top reasons for user dissatisfaction. The causes of this "inability" are varied. We propose to explore a different way to interpret the information request in a user query and, proceeding from that, a different way to present the query result and to allow user feedback on it.

Through most of its modern history, the IR field has shared a common interpretation of the intention behind a user query. In this interpretation, a user query is the practical description of an information need: it describes a concept the user is interested in, and its objective is to retrieve all documents closely related to that concept. The input to the IR system is a set of words describing the query, possibly combined with logical connectors and other modifiers. The output of the system is a list of documents ranked by the likelihood that their content matches the concept.
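To make the conventional interpretation concrete, here is a minimal sketch of ranked retrieval. It assumes a plain TF-IDF score as a stand-in for "the likelihood that document content matches the concept"; the function name `rank_documents` and the scoring formula are illustrative choices, not part of this project.

```python
import math
from collections import Counter


def rank_documents(query, docs):
    """Rank documents against a single-concept query (conventional view).

    Assumption: a simple TF-IDF sum stands in for the likelihood that a
    document matches the query concept; real systems use richer models.
    `docs` maps document ids to raw text.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for terms in tokenized.values() for term in set(terms))
    query_terms = query.lower().split()
    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                # +1 smoothing keeps terms that occur everywhere from scoring zero.
                idf = math.log(n_docs / df[term]) + 1.0
                score += (tf[term] / len(terms)) * idf
        scores[doc_id] = score
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "belief networks for information retrieval",
    "d2": "user feedback in digital libraries",
    "d3": "retrieval with belief networks and user feedback",
}
print(rank_documents("belief networks", docs))  # → ['d1', 'd3', 'd2']
```

The output is a single flat ranking: every document is judged against the one concept the query is assumed to describe, which is exactly the assumption questioned next.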

While this interpretation has worked successfully most of the time, it neglects a nontrivial observation: the query represents a single concept in the user's mind, but it may not be a single concept in the document space. That is, a query that is a single concept from the user's point of view may span more than one topic when the topics in a collection and their relationships are analyzed.

We propose a new interpretation of a user query. In this interpretation, a query represents two related, separable concepts; by separable we mean that each of the two concepts can be identified from the query formulation. The objective of the query is then to retrieve one or more sequences of documents that support a valid set of relationships between the two concepts. The input to the IR system is still a set of words, but now it represents two concepts. The output of the system is a connected network of chains of evidence. Each chain is a sequence of concepts (stepping stones), each logically connected to the previous and next one, and the chains provide a rationale (a pathway) for the connection between the two original concepts. To increase the user's understanding of a chain, it is desirable that the stepping stones be justified by concrete documents, along with the connections (relationships) among those documents.

We define and refine a probabilistic retrieval scheme that enhances retrieval through a) a framework based on belief networks that combines multiple sources of evidence, and b) user feedback at the document, group, and relationship levels.
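One way to picture the stepping-stones output is as path enumeration over a concept graph. The sketch below is a hypothetical illustration, not the project's algorithm: it assumes the concepts have already been identified, that `graph` maps each concept to its neighbors together with the ids of documents that justify each link, and that a pathway is any cycle-free chain of at most `max_hops` links. Breadth-first search returns shorter chains first.

```python
from collections import deque


def find_pathways(graph, source, target, max_hops=3):
    """Enumerate chains of concepts (stepping stones) linking source to target.

    Hypothetical data model: graph[concept] is a dict mapping each neighbor
    concept to a list of supporting document ids. Returns a list of
    (concept_chain, evidence) pairs, where evidence[i] holds the documents
    supporting the link between concept_chain[i] and concept_chain[i + 1].
    """
    pathways = []
    queue = deque([[source]])  # breadth-first: shorter chains are found first
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            evidence = [graph[a][b] for a, b in zip(path, path[1:])]
            pathways.append((path, evidence))
            continue
        if len(path) - 1 == max_hops:
            continue  # hop budget exhausted; do not extend this chain
        for neighbor in graph.get(node, {}):
            if neighbor not in path:  # keep chains simple (no cycles)
                queue.append(path + [neighbor])
    return pathways


# Hypothetical concept graph; names and document ids are made up.
graph = {
    "query expansion": {"thesaurus": ["doc7"], "relevance feedback": ["doc2"]},
    "thesaurus": {"digital libraries": ["doc4"]},
    "relevance feedback": {"user studies": ["doc9"]},
    "user studies": {"digital libraries": ["doc5"]},
}
for chain, evidence in find_pathways(graph, "query expansion", "digital libraries"):
    print(" -> ".join(chain), evidence)
```

Keeping the supporting document ids on every link reflects the stated goal that both the stepping stones and the connections among them be justified by concrete documents.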
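The belief-network framework is described here only at a high level. As a hedged illustration of what "combining multiple sources of evidence" can mean, the sketch below uses a noisy-OR gate, a standard way to combine independent causes in Bayesian networks. Whether the project uses noisy-OR, a weighted sum, or another combination is not stated here, and the per-source `weights` are an assumption; they are one plausible place where user feedback could enter.

```python
def noisy_or(beliefs, weights):
    """Combine independent sources of evidence with a noisy-OR gate.

    beliefs[i] is the probability that source i supports the hypothesis
    (e.g. that a document is relevant); weights[i] in [0, 1] scales how
    much that source is trusted. The combined belief fails only if every
    source independently fails to support the hypothesis.
    """
    if len(beliefs) != len(weights):
        raise ValueError("beliefs and weights must have the same length")
    p_all_fail = 1.0
    for belief, weight in zip(beliefs, weights):
        p_all_fail *= 1.0 - weight * belief
    return 1.0 - p_all_fail


# Two fully trusted sources, each giving 0.5 support.
print(noisy_or([0.5, 0.5], [1.0, 1.0]))  # → 0.75
```

Adding a source can only raise the combined belief under noisy-OR, which suits evidence that accumulates; down-weighting a source the user rejects lowers its contribution without discarding it.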

The first-year report of 18 August 2003 is available in three forms: PDF, Word, and XML.

For more information contact Edward A. Fox, fox@vt.edu, +1-540-231-5113.

This page last updated 18 August 2003.