IRList Digest           Tuesday, 25 August 1987      Volume 3 : Issue 29

Today's Topics:
   Announcement - Abstracts from next ACM SIGIR Forum (part 1 of 4)

News addresses are ARPANET: fox%vtopus.cs.vt.edu@relay.cs.net
   BITNET: foxea@vtvax3.bitnet   CSNET: fox@vt   UUCPNET: fox@vtopus.uucp

----------------------------------------------------------------------

Date: Mon, 10 Aug 87 15:17:43 CDT
From: nancy@usl-vb.usl.edu (Nancy )
Subject: Abstracts from next ACM SIGIR Forum - sent by Raghavan

ABSTRACTS (part 1 of 4)
(Chosen by G. Salton from recent issues of journals in the retrieval
area).

1. FUZZY RELATIONAL DATABASES: REPRESENTATIONAL ISSUES AND REDUCTION
   USING SIMILARITY MEASURES

Henri Prade and Claudette Testemale
Laboratoire Langages et Systemes Informatiques
Universite Paul Sabatier
118 Route de Narbonne
31062 Toulouse Cedex, France

Until now, the idea of a fuzzy database has been investigated along
different lines: Some authors have dealt with the imprecision of
attribute values by modeling, using fuzzy similarity relations, the
extent to which these values could be regarded as interchangeable.
Others have used possibility distributions for representing fuzzily
known or incompletely known attribute values. The first approach, which
cannot accommodate incomplete information, is restated in the framework
of rough sets extended to fuzzy relations. Besides, in the second one,
similarity measures between attribute values can be introduced and
computed; then a comparison of the two approaches is provided. The
proposed similarity measure, based on a fuzzy Hausdorff distance,
estimates the mismatch between two possibility distributions. From
storage and query-evaluation points of view, it may be interesting to
gather items having similar attribute values. Thus the similarity
measures previously considered can be used for the reduction of the
fuzzy database.
When several items have sufficiently similar values for each attribute
in a relation, the reduction is performed by taking for each attribute
the union of these similar values. The consequences of the reduction
process on query evaluation are studied.

(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38,
No. 2, pp. 118-126, 1987)

2. KNOWLEDGE-ASSISTED DOCUMENT RETRIEVAL: II. THE RETRIEVAL PROCESS

Gautam Biswas, James C. Bezdek, Viswanath Subramanian, and
Marisol Marques.
Department of Computer Science
University of South Carolina
Columbia, South Carolina 29208

This article presents our conceptual model of the retrieval process of
a document-retrieval system. The retrieval mechanism input is an
unambiguous intermediate form of a user query generated by the language
processor using the method described previously. Our retrieval
mechanism uses a two-step procedure. In the first step a list of
documents pertinent to the query is obtained from the document
database, and then an evidence-combination scheme is used to compute
the degree of support between the query and individual documents. The
second step uses a ranking procedure to obtain a final degree of
support for each document chosen, as a function of individual degrees
of support associated with one or more parts of the query. The end
result is a set of document citations presented to the user in a ranked
order in response to the information request. Numerical examples are
given to illustrate various facets of the overall system, which has
been proto-typically implemented in modular form to test system
response to changes in model parameters.

(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38,
No. 2, pp. 97-110, 1987)

3. KNOWLEDGE-ASSISTED DOCUMENT RETRIEVAL: I. THE NATURAL-LANGUAGE
   INTERFACE

Gautam Biswas, James C.
Bezdek, Marisol Marques, and Viswanath Subramanian
Department of Computer Science
University of South Carolina
Columbia, South Carolina 29208

In this article we describe the conceptual model and processing of
(constrained) natural-language queries in information retrieval
systems. A language interface based on fuzzy set techniques is proposed
to handle the uncertainty inherent in natural-language semantics. The
conceptual model is developed and exemplified in the context of
document retrieval. Specifically, the user query is considered to be a
triple,

     q = (q_c, q_y, q_n)

where q_c indicates the part of the query that deals with concepts and
operators that link these concepts, q_y identifies the publication
period the user is interested in, and q_n pertains to the number of
documents to be retrieved. We describe query decomposition using an
augmented transition network parser and the assignment of functions and
relations needed by each portion of the query to represent
uncertainties inherent in the natural language. The output of the
natural-language interface is then passed to a knowledge-based
retrieval mechanism that will be described in a companion article
(Part II).

(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38,
No. 2, pp. 83-96, 1987)

4. A NOTE ON WEIGHTED QUERIES IN INFORMATION RETRIEVAL SYSTEMS

Ronald R. Yager
Machine Intelligence Institute
Iona College
New Rochelle, New York 10801

Several authors have suggested the introduction of fuzzy set
methodologies as a means for improving the performance of
information-retrieval systems [1-8]. In a recent survey [9] of
information-retrieval technologies, Bartschi discusses the fuzzy set
model among other models. A problem of considerable interest to
designers of fuzzy set retrieval systems concerns itself with the
evaluation of the retrieval status function in the situation in which
the query terms or the search criteria have weights indicating their
importance to the requester.
A number of approaches have been suggested for this problem, but
Bartschi [9] points out some of the difficulties with each of these
proposed methods. We suggest an alternative methodology for handling
weighted queries in a fuzzy environment.

(JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38,
No. 1, pp. 23-24, 1987)

5. A GRAPHICAL DATABASE INTERFACE FOR CASUAL, NAIVE USERS

Clifford Burgess
Computer Science Department
University of Southern Mississippi
Hattiesburg, Mississippi 39403
and
Kathleen Swigger
Computer Science Department
North Texas State University
Denton, Texas 76203

This paper is concerned with some aspects of database interfaces for
casual, naive users. A ``casual user'' is defined as an individual who
wishes to execute queries once or twice a month, and a ``naive user''
is someone who has little or no expertise in operating computers. The
study focuses on a specific group of casual, naive users, analyzes
their needs and proposes a solution. The proposed interface consists of
a graphical display of a model of a database and a natural language
query language. One of the unique properties of the database interface
is that it allows the user to see local item names within the context
of a global structure. The interface was then tested to determine
whether it was acceptable to the user population and to discover the
level of graphical model that the users would find most comfortable.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp. 511-521,
1986)

6. COMPRESSION OF INDEX TERM DICTIONARY IN AN INVERTED-FILE ORIENTED
   DATABASE: SOME EFFECTIVE ALGORITHMS

Janusz L. Wisniewski
Applied Informatics Department
Nicholas Copernicus University
Grudziadzke 5/7, Torun, Poland

A new method of index term dictionary compression in an
inverted-file-oriented database is discussed.
A technique of word coding that generates short fixed-length codes
obtained from the index terms themselves by analysis of monogram and
bigram statistical distributions is described. Transformation of the
index term dictionary into a code dictionary preserves a word-to-word
discrimination with a rate of three synonyms per 1300 terms, at
compression ratio up to 90% and at low cost in terms of the CPU time
expenditure. When applied in a computer network environment, it offers
substantial savings in communication channel utilization at negligible
response time degradation. Experimental data for the 26,113 index term
dictionary of the New York Times Info Bank available via a computer
network are presented.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp. 493-501,
1986)

7. COMPUTER USE OF A MEDICAL DICTIONARY TO SELECT SEARCH WORDS

John O'Connor
Computer Science and Electrical Department
Packard Lab #19
Lehigh University
Bethlehem, Pennsylvania 18015

In a preceding experiment in text-searching retrieval for cancer
questions, search words were humanly selected with the aid of a medical
dictionary and cancer textbooks. Recall results were (1) using only
stems of question words (humanly stemmed): 20%; (2) adding dictionary
search words: 29%; (3) adding also textbook search words: 70%. For the
experiment reported here, computer procedures for using the medical
dictionary to select search words were developed. Recall results were
(1) for question stems (computer stemmed): 19%; (2) adding search words
computer selected from the dictionary: 24%. Thus the computer
procedures compared to human use of the dictionary were 50% successful.
Human and computer false retrieval rates were almost equal. Some
hypotheses about computer selection of search words from textbooks are
also described.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp. 477-486,
1986)

8. IMPLEMENTING AGGLOMERATIVE HIERARCHIC CLUSTERING ALGORITHMS FOR USE
   IN DOCUMENT RETRIEVAL

Ellen M.
Voorhees
Department of Computer Science
Cornell University
Ithaca, New York 14853

Searching hierarchically clustered document collections can be
effective [6], but creating the cluster hierarchies is expensive, since
there are both many documents and many terms. However, the information
in the document-term matrix is sparse: Documents are usually indexed by
relatively few terms. This paper describes the implementations of three
agglomerative hierarchic clustering algorithms that exploit this
sparsity so that collections much larger than the algorithms' worst
case running times would suggest can be clustered. The implementations
described in the paper have been used to cluster a collection of 12,000
documents.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp. 465-476,
1986)

9. NATIONAL SCIENCE FOUNDATION SUPPORT FOR COMPUTER AND INFORMATION
   SCIENCE AND ENGINEERING

Harold E. Bamford and Charles N. Brownstein
National Science Foundation
Washington, D. C. 20550

The National Science Foundation has supported research in the
information sciences for 25 years, initially through its Office of
Scientific Information, later through the Office of Science Information
Service and the Division of Science Information, and most recently
through the Division of Information Science and Technology. The
Foundation has also supported research in computer science and
engineering, most recently through the Division of Computer Research
and the Division of Computer and Information Engineering. On May 1,
1986 all these elements were brought together to form the Directorate
for Computer and Information Science and Engineering (CISE), one of the
five research branches of the Foundation. A more persuasive
demonstration of the Foundation's commitment to this dynamic new field
of research would hardly have been possible.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp.
449-452, 1986)

[Note: continued in next 3 issues - Ed]

------------------------------

END OF IRList Digest
********************