A generic ranking function discovery framework by genetic programming for information retrieval

https://doi.org/10.1016/j.ipm.2003.08.001

Abstract

Ranking functions play a substantial role in the performance of information retrieval (IR) systems and search engines. Although there are many ranking functions available in the IR literature, various empirical evaluation studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). Moreover, it is often difficult and very expensive for human beings to design optimal ranking functions that work well in all these contexts. In this paper, we propose a novel ranking function discovery framework based on Genetic Programming and show through various experiments how this new framework helps automate the ranking function design/discovery process.

Introduction

The information retrieval (IR) field is undergoing dramatic development and change due to advances in information technology and computation techniques. The large amount of digital information increasingly available in our society makes information retrieval research one of the most exciting and important fields. According to SearchEngineWatch.com, 75% of online users use search engines to traverse the web, which indicates the importance of information retrieval in our daily life. Despite recent advances in information retrieval and search technologies, studies (Gordon & Pathak, 1999) show that the performance of search engines is not quite up to the expectations of end users. Users often spend considerable time sifting through hit lists full of irrelevant results.

There are various reasons for this dissatisfaction: imprecise query formulation, unfamiliarity with system usage, and so on. We argue in this paper that the ranking strategies adopted by these search engines also deserve part of the blame. Ranking strategies, often called ranking functions in IR, are used to order search results in decreasing order of estimated relevance to a user's search query. Most IR systems use a single fixed ranking strategy to support the information seeking tasks of all users for all queries, irrespective of the heterogeneity of end users and queries. This is so-called "consensus search", in which the relevancy computed for the entire population is presumed appropriate for each user (Pitkow et al., 2002). One major benefit of consensus search is that all users get the same results, which fosters result sharing among users (Pitkow et al., 2002). In many other cases, however, users prefer search results tailored to their own personal preferences, so-called personalized search or personalized ranking (Pitkow et al., 2002). Most current search engines do not support such an advanced personalized search feature.

Both consensus search and personalized search require a good ranking function to obtain good performance. Although various ranking functions are available, most of them were manually designed by IR experts based on heuristics, experience, or observation. Even for ranking functions grounded in probabilistic theory, such as the one used in Okapi (see Eq. (2)), performance on each individual query is not guaranteed: a theoretically justified ranking function may work reasonably well on average over a set of queries yet still fail on particular queries. Indeed, various ranking function evaluations and comparative studies (Salton, 1989; Zobel & Moffat, 1998) showed that these ranking functions do not work consistently well across queries. Moreover, designing a personalized ranking function for each individual query requires a great deal of human effort. Finding an optimal ranking function for a particular query or a group of queries remains a design challenge for IR research.
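For concreteness, a widely used formulation of the Okapi BM25 ranking function can be sketched as follows. The constants k1 and b and the exact idf variant below are illustrative defaults; Eq. (2) of the paper may differ in detail.

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.2, b=0.75):
    """Okapi-style BM25 score of one document for a bag-of-words query.

    doc_tf : term -> frequency of the term in this document
    df     : term -> number of documents in the collection containing it
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue
        # inverse document frequency, smoothed to stay non-negative
        idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # term-frequency saturation with document-length normalization
        norm = tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```

Ranking then amounts to scoring every candidate document with such a function and sorting the results in decreasing order of score.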

In this paper, we introduce a systematic and automatic discovery framework to aid the ranking function design process. This ranking function discovery is based on an artificial intelligence technique called Genetic Programming, which is widely used in various optimal design and data mining applications (Koza, 1992). We show through various experiments using real textual data that the new ranking function discovery framework is a flexible and powerful discovery tool for optimal ranking function design.

The remainder of this paper is organized as follows. In Section 2 we review related research on ranking function design and evaluation. In Section 3 we give a formal definition of the ranking function discovery problem and describe our ranking function discovery framework based on Genetic Programming. Section 4 discusses several experiments validating our ranking function discovery framework. We discuss related work in Section 5 and conclude the paper with implications of this study and future research directions in Section 6.

Section snippets

Prior research on ranking function design and evaluation

IR systems use ranking functions to order documents according to the documents' estimated match with a user query. To facilitate this relevance estimation process, both documents and user queries need to be transformed into a form that can be effectively processed by computers. One of the most successful models is the so-called Vector Space Model (VSM) (Salton, 1971, Salton, 1989).
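Under the VSM, both documents and queries are represented as weighted term vectors and compared by a similarity measure such as the cosine. A minimal tf-idf sketch follows; the weighting scheme here is a standard textbook variant, not necessarily the exact one used in this study.

```python
import math
from collections import Counter

def build_df(docs):
    """Document frequency of each term over a tokenized corpus."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def tfidf_vector(tokens, df, n_docs):
    """Sparse tf-idf vector: weight(t) = tf(t) * log(N / df(t))."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if df.get(t)}

def cosine(u, v):
    """Cosine similarity of two sparse vectors (dicts term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```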

The VSM is the underlying model for this study for two reasons:

  • (1) Ease of interpretation. The VSM is a well-grounded

Nature of the ranking function discovery problem

The problem of finding a good ranking function is illustrated in Fig. 1. "1" and "0" stand for "relevant" and "non-relevant", respectively, in the "Rele." column of both document tables.

The problem of finding a ranking function can be formalized as follows:

Given as input a user query (a set of queries) and a set of training documents with known relevance judgments for the query (queries), a ranking function is sought by the discovery framework that can potentially rank all relevant documents
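The discovery framework searches a space of such candidate ranking functions. As a sketch of the representation involved (the feature names and operators below are illustrative, not the paper's exact terminal and function sets), a candidate function can be encoded as an expression tree over per-term weighting evidence, and its fitness measured by the quality of the ranking it induces on the training documents, e.g. average precision:

```python
def eval_tree(tree, feats):
    """Evaluate an expression tree on one term's features.

    A tree is either a feature name (terminal, a string) or a tuple
    (operator, left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return feats[tree]
    op, left, right = tree
    a, b = eval_tree(left, feats), eval_tree(right, feats)
    if op == '+':
        return a + b
    if op == '*':
        return a * b
    if op == '/':                      # protected division, standard in GP
        return a / b if b != 0 else 0.0
    raise ValueError(f"unknown operator {op!r}")

def score(tree, query_terms, doc_feats):
    """Document score: sum of the tree's value over matching query terms."""
    return sum(eval_tree(tree, doc_feats[t])
               for t in query_terms if t in doc_feats)

def average_precision(ranked_doc_ids, relevant):
    """Fitness of a candidate function: average precision of its ranking."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked_doc_ids, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

A tf-idf-like candidate such as `('*', 'tf', ('/', 'N', 'df'))` can then be scored against the training judgments; genetic programming evolves a population of such trees by selection, crossover, and mutation toward higher fitness.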

Experiments

To test the ranking function discovery framework, we used the Associated Press (AP) news collection from the TREC conference (Harman, 1996) as our textual data. This news collection contains more than 240,000 news articles from 1988 to 1990 and covers a variety of domains and topics. It has been used widely in the IR field to test new retrieval algorithms. More specifically, we used AP88 (79,919 documents) as the training data, AP89 (84,678 documents) as the validation data, and AP90 (78,321

Related work

There have been several efforts at ranking function optimization in the IR literature.

The earliest work was done by Fox (1983), who used polynomial regression to optimize the ranking function. Fuhr et al. (Fuhr & Buckley, 1991; Fuhr & Pfeifer, 1994) used probabilistic models as machine learning approaches. The concept of relevance description used in Fuhr and Buckley (1991) and Fuhr and Pfeifer (1994) is very similar to the weighting evidence (tf, df, …) we used for ranking. The difference in

Conclusions

In this paper, by effectively leveraging the clues of different weighting features used by many IR experts, we demonstrated that a machine intelligence tool like GP can help automate the discovery of better ranking functions for a variety of contexts, a task that would otherwise be very tedious and difficult for any human being. More specifically, the new ranking function discovery framework based on GP can be used to effectively discover either personalized ranking functions for each individual

References (32)

  • N. Fuhr et al. A probabilistic learning approach for document indexing. ACM Transactions on Information Systems (1991)
  • N. Fuhr et al. Probabilistic information retrieval as combination of abstraction, inductive learning and probabilistic assumptions. ACM Transactions on Information Systems (1994)
  • F. C. Gey (1994). Inferring probability of relevance using the method of logistic regression. In The proceedings of...
  • M. Gordon. Probabilistic and genetic algorithms for document retrieval. Communications of the ACM (1988)
  • M. Gordon. User-based document clustering by redescribing subject descriptions with a genetic algorithm. Journal of the American Society for Information Science (1991)
  • D. K. Harman (1993). Overview of the first text retrieval conference (TREC-1). In D. K. Harman (Ed.), Proceedings of...
    An earlier version of this paper was presented at the 2000 International Conference on Information Systems by Fan, Gordon, and Pathak (2000).
