DESCRIPTION OF THE RUTGERS INFORMATION RETRIEVAL DATABASE (RIRD) DERIVED FROM A STUDY OF INFORMATION SEEKING AND RETRIEVING

Tefko Saracevic, Ph.D.
School of Communication, Information and Library Studies
Rutgers, The State University of New Jersey
4 Huntington Street
New Brunswick, N.J. 08903

SOURCE

The Rutgers Information Retrieval Database (RIRD) was assembled during a study conducted from 1985 to 1987. This and several other studies are part of a long-term effort whose collective aim is to contribute to a formal, scientific characterization of the elements involved in information seeking, searching and retrieving, particularly in relation to the cognitive context and the human decisions and interactions involved. The objectives of the study that resulted in RIRD were to conduct observations under as close to real-life conditions as possible related to: (1) the user CONTEXT of questions in information retrieval; (2) the structure and classification of QUESTIONS; (3) cognitive traits and decision making of SEARCHERS; and (4) different SEARCHES of the same question.

The approach was as real-life as possible (rather than laboratory-based) in the following sense:

* Users posed questions related to their current research or work and evaluated the answers as to relevance and utility accordingly. The texts of questions and the restrictions on questions in RIRD are exactly as written by the users. The evaluations of answers in RIRD as to their relevance were provided by the users. Users were not paid, but did receive a free search.

* Searchers were professionals, i.e. searching was a part of their regular job. They were paid for their time.

* Searching was done on existing files on DIALOG. There were no time restrictions on searching.

* Answers provided to users were full records as available from the given DIALOG files. RIRD answers were downloaded from DIALOG files.
Thus, RIRD consists of users, questions, answers and evaluations derived under realistic conditions as found in many regular information retrieval applications. The study is exhaustively described in a series of articles and a Final Report with elaborate appendices [1-4], so only a brief description is provided here.

EXPERIMENTAL SET-UP

Forty users posed one question each, described the context of their questions and evaluated the answers as to relevance and utility. Thirty-nine professional searchers were recruited. Each searcher was tested on four cognitive tests and indicated his/her subject expertise and frequency of online searching.

For each question there were nine searches; of those, five were done by different searchers and four were done on the basis of different sources for search terms. Thus, there were 360 searches for the 40 questions. Searching was done on DIALOG. For each question a single, most appropriate DIALOG file was chosen; altogether, 22 DIALOG files were used. A union of the outputs from the 9 searches for each question was created and sent to the user who posed the question for evaluation. In a separate experiment involving another group of 21 searchers, the user questions were classified as to various characteristics (domain, clarity, specificity, complexity and presupposition).

Counting the answers search by search WITHOUT elimination of duplicates (i.e. the SUM of the answers produced by different searches for the same question), the 360 searches for the 40 questions retrieved 8956 answers, of which 2749 were judged by users as relevant (R), 2538 as partially relevant (P) and 3669 as not relevant (N). Counting the answers after duplicates were eliminated (i.e. the UNION of answers produced by different searches for the same question), the 360 searches retrieved 5411 UNIQUE answers, of which 1343 were judged as relevant (R), 1448 as partially relevant (P) and 2620 as not relevant (N).
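As a quick sanity check on these aggregate figures, the R/P/N counts can be verified to sum to the reported totals, and overall precision derived from them. This is a minimal sketch in Python; the strict/lenient distinction (counting only R, or R and P, as relevant) is a common convention, not a figure taken from the project's reports:

```python
# Aggregate judgment counts reported above for the 360 searches (40 questions).
sum_counts = {"R": 2749, "P": 2538, "N": 3669}    # search by search, duplicates included
union_counts = {"R": 1343, "P": 1448, "N": 2620}  # after duplicate elimination

# The per-category counts sum to the reported totals.
assert sum(sum_counts.values()) == 8956   # SUM of answers
assert sum(union_counts.values()) == 5411  # UNION of unique answers

# Overall precision over the unique answers:
strict = union_counts["R"] / 5411                         # only R counts
lenient = (union_counts["R"] + union_counts["P"]) / 5411  # R and P count
print(f"strict precision:  {strict:.3f}")   # 0.248
print(f"lenient precision: {lenient:.3f}")  # 0.516
```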
RIRD contains these 5411 evaluated answers to the 40 questions.

FILES IN RIRD

RIRD consists of 4 files:

1. QUESTION FILE (Q-FILE): contains the written texts of the 40 questions as given by the users and the indications as to what types of search the users wanted.

2. ANSWER FILE (A-FILE): contains the full records of answers exactly as retrieved from the given DIALOG file. The only addition to the answers is the first line of each answer, which identifies the RIRD question number, the DIALOG file searched, and the sequence number in which the answer was presented to the user.

3. EVALUATION FILE (E-FILE): contains, for each question, the DIALOG acquisition number of each answer (a number assigned by the database producer or by DIALOG that uniquely identifies each document in the file) and, next to the acquisition number, the user evaluation (R for relevant, P for partially relevant or N for not relevant) of that answer.

4. RESULT FILE (R-FILE): contains the summary of search results for each question as to the numbers of answers retrieved, their evaluations, the user's utility ratings and the performance figures for the 9 different searches for each question.

QUESTION FILE (Q-FILE)

The question file is intended for information and not for searching. Users of RIRD can use the texts of the questions and the users' prescriptions of constraints on the questions to construct their own searches of the answer file. The Q-file contains the texts of the 40 questions, each subdivided as follows:

QUESTION NUMBER: a three-digit number for the question, e.g. 002 refers to question number 2.

A1. BRIEF TITLE: question title.

A2. QUESTION STATEMENT: the text of the question as written by the user. Any subdivisions (e.g. paragraphs, numbering) are reproduced as given by the users.

TYPE OF SEARCH REQUESTED: users were given the choice to provide further information about the desired search along the following characteristics:

B. Specificity of the search: precise or broad.

C. Application for which the search was requested.

D. Language of answers: English only or other languages.

E. Any suggestions for a specific DIALOG file(s) to be searched. (Some users suggested subject areas to be searched; others left this blank.)

ANSWER FILE (A-FILE)

The A-file consists of 40 subfiles of answers, one for each of the 40 questions, identified by a six-digit number (explained below) prefixed by the letter Y. Duplicate answers (i.e. the same answers retrieved by different searches for a question) are eliminated. Thus, the file contains altogether 5411 answers for the 40 questions. (The number of answers for each question individually is given in the Result File (R-File).)

Each answer in the A-file is subdivided as follows:

First line: RIRD-assigned number identifying the question number, the DIALOG file searched and the sequence of the answer provided. The number is prefixed by the letter Y. Then a six-digit number follows: the first three digits refer to the question number and the last three to the DIALOG file searched. (DIALOG gives a name and a number to each of its files.) This is followed by a space and a number signifying the sequence of the given answer in relation to the total number of answers provided. For instance, "Y002218 1" identifies the 1st answer presented for question 002, which was searched in File 218.

Second line: Acquisition number assigned by the database producer or DIALOG. This number is used in the Evaluation File to indicate whether the given answer was judged relevant, partially relevant or not relevant by the user.

All other lines: fields as assigned by the database producer and described in the DIALOG blue sheet for each of its files.

An inverted file was created from the answer file for searching.

EVALUATION FILE (E-FILE)

For each answer in the Answer File, the Evaluation File has the user relevance evaluation.
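The first-line identifier format described above can be parsed mechanically. The following is a minimal sketch; the function name and the dictionary layout are our own illustration, not part of RIRD:

```python
def parse_answer_header(line: str) -> dict:
    """Parse an A-file first line such as 'Y002218 1'.

    Layout (per the description above): the letter Y, a three-digit
    question number, a three-digit DIALOG file number, a space, and
    the sequence number of the answer.
    """
    ident, seq = line.split()
    if not ident.startswith("Y") or len(ident) != 7:
        raise ValueError(f"unexpected identifier: {ident!r}")
    return {
        "question": int(ident[1:4]),     # e.g. '002' -> question 2
        "dialog_file": int(ident[4:7]),  # e.g. '218' -> DIALOG File 218
        "sequence": int(seq),            # position in the answers sent to the user
    }

print(parse_answer_header("Y002218 1"))
# -> {'question': 2, 'dialog_file': 218, 'sequence': 1}
```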
The E-file has the list of DIALOG acquisition numbers (the second line in the A-file), each followed by a letter: R for judged relevant, P for partially relevant and N for not relevant. The E-file can be used to check the performance of new searches (e.g. based on new search algorithms, search rules or instructions) as to relevance and to derive desired measures, such as precision, recall, relevance odds and others for which a relevance judgement is necessary.

RESULT FILE (R-FILE)

Each record in the Result File is identified by the question number and the DIALOG file (database) number that was searched. There are three parts identifying different results for each question. The results pertain only to the project from which RIRD was derived.

The first part gives the retrieved number (#) of answers (called here `abstracts') and their evaluations, including the overall precision. Note: included are data on the numbers of answers evaluated and not evaluated. Because of users' reactions to the size of output, for all except 3 questions where the size exceeded 150 answers, only the first 150 were sent; since DIALOG output is organized on a last-in-first-out principle, these represented the chronologically newest documents. `Not evaluated' here means the number of answers retrieved but not sent to the user.

The second part provides the user's evaluation of the utility of all answers as a whole. Five utility measures were used:

1. Time the user spent in evaluation
2. Dollar value assigned to the answers (for many questions users could not assign a monetary value)
3. Worth of the answers on a Likert scale from 1 (practically worthless) to 5 (worth much more than the time it has taken)
4. Contribution to the resolution of the problem from 1 (nothing contributed) to 5 (substantial contribution)
5. Degree of satisfaction with the results of the search from 1 (dissatisfied) to 5 (satisfied).
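The evaluation-checking use of the E-file can be sketched in a few lines. Here precision is computed for a set of retrieved acquisition numbers against R/P/N judgments; the one-number-and-letter-per-line format follows the E-file description above, but the function names, the strict/lenient option and the sample acquisition numbers are our own assumptions:

```python
def load_judgments(lines):
    """Map acquisition number -> judgment letter (R, P or N), one pair per line."""
    judgments = {}
    for line in lines:
        acc, letter = line.split()
        judgments[acc] = letter
    return judgments

def precision(retrieved, judgments, lenient=False):
    """Fraction of retrieved items judged relevant.

    Strict precision counts only R as relevant; lenient counts R and P.
    Retrieved items without a judgment are ignored.
    """
    relevant = {"R", "P"} if lenient else {"R"}
    judged = [acc for acc in retrieved if acc in judgments]
    if not judged:
        return 0.0
    return sum(1 for acc in judged if judgments[acc] in relevant) / len(judged)

# Hypothetical E-file fragment (acquisition numbers are made up):
ej = load_judgments(["86123456 R", "86123457 P", "86123458 N"])
print(precision(["86123456", "86123457", "86123458"], ej))                # strict: 1 of 3
print(precision(["86123456", "86123457", "86123458"], ej, lenient=True))  # lenient: 2 of 3
```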
The third part contains performance figures for each of the nine searches (SEAR) used for each question: by 5 different searchers and 4 different search types. The searches are identified by a three-digit number. Numbers starting with 0 (zero) identify searches done by different searchers; the last two digits are the searcher's number (there were altogether 39 searchers). Numbers starting with 1, 2, 3 or 4 identify the different types of searches based on different sources:

Type 1 - search based on a taped problem statement by the user, without recourse to the written question
Type 2 - search based on the taped problem statement plus the written question
Type 3 - search based on the terms in the written question without any elaboration, as if done automatically
Type 4 - search based on the terms in the written question plus terms from an appropriate thesaurus for elaboration

The data given for the nine searches are labeled A to M, as explained in the record itself. Recall was measured as comparative recall in relation to the union of the output of all nine searches.

NOTE: To encourage further research and the testing of various hypotheses that can be formulated on the basis of the data of this project, a tape in ASCII format is available containing all the data files, together with over 30 SPSSX programs used in the analysis. Contact: Paul Kantor, Tantalus Inc., 3257 Ormond Rd, Cleveland Heights, OH 44118.

DIALOG FILES (DATABASES) SEARCHED

As mentioned, altogether 22 different DIALOG files were used. The following is the list of the DIALOG file used for each question.
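Comparative (relative) recall against the union of all nine searches can be sketched as follows. The data structures and the choice to count only R as relevant are our own illustration, not the R-file layout; the project's reports also give figures treating partially relevant answers as relevant:

```python
def comparative_recall(search_hits, judgments):
    """Relative recall of each search: the relevant items it retrieved,
    divided by the relevant items in the union of all searches.

    search_hits: dict mapping search id -> set of acquisition numbers
    judgments:   dict mapping acquisition number -> 'R', 'P' or 'N'
    Only items judged R are counted as relevant here (an assumption).
    """
    union = set().union(*search_hits.values())
    relevant_union = {acc for acc in union if judgments.get(acc) == "R"}
    if not relevant_union:
        return {sid: 0.0 for sid in search_hits}
    return {sid: len(hits & relevant_union) / len(relevant_union)
            for sid, hits in search_hits.items()}

# Toy example: two searches ('001' = searcher 1, '101' = a Type 1 search)
# over three judged items.
hits = {"001": {"a1", "a2"}, "101": {"a2", "a3"}}
j = {"a1": "R", "a2": "R", "a3": "N"}
print(comparative_recall(hits, j))  # -> {'001': 1.0, '101': 0.5}
```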
QUESTION   DIALOG FILE   DIALOG FILE
NUMBER     NUMBER        NAME

 1          11           PsycINFO
 2         218           NURSING & ALLIED HEALTH
 3          64           CHILD ABUSE & NEGLECT
 4         154           MEDLINE
 5         148           TRADE & INDUSTRY INDEX
 6           6           NTIS
 7          75           MANAGEMENT CONTENTS
 8         154           MEDLINE
 9          37           SOCIOLOGICAL ABSTRACTS
10         154           MEDLINE
11         154           MEDLINE
12          13           INSPEC
13          15           ABI/INFORM
14         151           HEALTH PLANNING & ADMINISTRATION
15         154           MEDLINE
16          11           PsycINFO
17           5           BIOSIS PREVIEWS
18          15           ABI/INFORM
19          75           MANAGEMENT CONTENTS
20          15           ABI/INFORM
21          37           SOCIOLOGICAL ABSTRACTS
22         108           AEROSPACE DATABASE
23          32           METADEX
24         191           ART LITERATURE INTERNATIONAL
25           1           ERIC
26          38           AMERICA: HISTORY AND LIFE
27          13           INSPEC
28          38           AMERICA: HISTORY AND LIFE
29           8           COMPENDEX PLUS
30          71           MLA BIBLIOGRAPHY
31          61           LISA
32           8           COMPENDEX PLUS
33           8           COMPENDEX PLUS
34          13           INSPEC
35         154           MEDLINE
36          90           FOREIGN TRADE & ECON ABSTRACTS
37          16           PTS PROMT
38          61           LISA
39          15           ABI/INFORM
40          16           PTS PROMT

REFERENCES

1. Saracevic, T., Kantor, P., Chamis, A.Y., Trivison, D. "A Study in Information Seeking and Retrieving. I. Background and Methodology." JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE 39(3):161-176, 1988.

2. Saracevic, T., Kantor, P. "A Study in Information Seeking and Retrieving. II. Users, Questions and Effectiveness." JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE 39(3):177-196, 1988.

3. Saracevic, T., Kantor, P. "A Study in Information Seeking and Retrieving. III. Searchers, Searches, and Overlap." JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE 39(3):197-216, 1988.

4. Saracevic, T., Kantor, P., Chamis, A.Y., Trivison, D. EXPERIMENTS ON THE COGNITIVE ASPECTS OF INFORMATION SEEKING AND RETRIEVING. Final Report for National Science Foundation Grant IST-8505411. Springfield, VA: National Technical Information Service; 1987 (PB87-157699/AS). Bethesda, MD: Education Research Information Center; 1987 (ED 281530).
ACKNOWLEDGMENTS

The project was sponsored by an NSF grant (IST85-05411) and a DIALOG grant for search time, and was conducted at Case Western Reserve University and Rutgers University; the statistical analysis was done at Tantalus, Inc. under Paul Kantor. We gratefully acknowledge the splendid cooperation of our users, searchers and classification judges; although remaining anonymous, as is the custom, they made the study possible. The project was managed and the data collected by Alice Chamis and Donna Trivison. Our further thanks go to Elizabeth Logan and Nancy Woelfl for technical advice; to Jun-Min Jeong, J.J. Lee, Moula Cherikh and Altay Guvenir for programming; and to Betty Turock and Louise Su for conducting the classification experiment. The achievements would not have been possible without all these people. Permissions given by the database producers to download and reproduce the answers are also gratefully acknowledged.