Date: Sun, 7 Sep 86 16:00:44 edt
From: vtisr1!irlistrq
To: fox
Subject: IRList Digest V2 #39
Status: R

IRList Digest          Sunday, 7 September 1986      Volume 2 : Issue 39

Today's Topics:
   Call for Papers - Call for contributions to ACM SIGIR Forum
   Abstracts - Appearing in latest issue of ACM SIGIR Forum, Part 2

News addresses are
   ARPANET: fox%vt@csnet-relay.arpa
   BITNET: foxea@vtvax3.bitnet
   CSNET: fox@vt
   UUCPNET: seismo!vtisr1!irlistrq

----------------------------------------------------------------------

Date: Sun, 7 Sep 86 11:48:53 edt
From: fox (Ed Fox)
Subject: call for papers for ACM SIGIR Forum, fall 1986

It is time to gather short articles, book reviews, abstracts, announcements, etc. for the next Forum. I will be putting out this issue, so send electronic versions (unless you say otherwise, they may appear in IRList too) or paper copies (in camera-ready form, single spaced). I look forward to receiving your materials in the next few weeks. Many thanks, Ed Fox (co-editor for Forum).

------------------------------

Date: Wed, 23 Jul 1986 13:06 CST
From: Vijay V. Raghavan
Subject: SIGIR FORUM Abstracts [Part 2 - Ed]

[Note: Members of ACM SIGIR should have received the spring/summer Forum, and can find these on pages 30-31. The rest will appear in machine-readable form in later issues of IRList. - Ed]

ABSTRACTS (Chosen by G. Salton or V. Raghavan from 1984 issues of journals in the retrieval area)

10. TESTING OF A NATURAL LANGUAGE RETRIEVAL SYSTEM FOR A FULL TEXT KNOWLEDGE BASE
    Lionel M. Bernstein and Robert E. Williamson
    Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD 20209

"A Navigator of Natural Language Organized Data" (ANNOD) is a retrieval system which combines probabilistic, linguistic, and empirical means to rank individual paragraphs of full text by their similarity to natural language queries posed by users.
ANNOD includes common word deletion, word root isolation, query expansion by a thesaurus, and application of a complex empirical matching (ranking) algorithm. The Hepatitis Knowledge Base, the text of a prototype information system, was the file used for testing ANNOD. Responses to a series of users' unrestricted natural language queries were evaluated by three testers. Information needed to answer 85 to 95% of the queries was located and displayed in the first few selected paragraphs. The system was successful in locating information in both the classified (listed in the Table of Contents) and unclassified portions of the text. Development of this retrieval system resulted from the complementarity of and interaction between computer science and medical domain expert knowledge. Extension of these techniques to larger knowledge bases is needed to clarify their proper role. (JASIS, Vol. 35(4): 235-247; 1984)

11. A COMPARISON OF THE COSINE CORRELATION AND THE MODIFIED PROBABILISTIC MODEL
    W. Bruce Croft
    Computer and Information Science Dept., University of Massachusetts, Amherst, MA 01003

It has been pointed out that the comparison between the performance of the cosine correlation and the modified probabilistic model was incomplete. In particular, the term weights used for the cosine correlation were term frequencies within the document text. Salton has for some time used a term weight known as 'tf.idf' in his retrieval experiments with the cosine correlation. This weight consists of the within-document term frequency (sometimes normalized by the maximum frequency) multiplied by the inverse document frequency weight. Although the inverse document frequency weight can be regarded as a product of the retrieval process, it has also been used as part of the indexing process, in that the weight is assigned to the terms in the document representatives. In this note, we present the results of retrieval experiments with the cosine correlation and the tf.idf weights.
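[As an illustration only (not the authors' implementation), the tf.idf weighting just described can be sketched as below; the tokenization, the max-frequency normalization variant, and the log form of the inverse document frequency are assumptions. - Ed]

```python
import math

def tfidf_vectors(docs):
    """Build tf.idf term-weight vectors for a small document collection.

    tf  = within-document term frequency, normalized by the maximum
          frequency in that document (one assumed variant);
    idf = log(N / n_t), where n_t is the number of documents
          containing term t (an assumed form of the idf weight).
    """
    N = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency n_t for each term.
    df = {}
    for tokens in tokenized:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1

    vectors = []
    for tokens in tokenized:
        counts = {}
        for term in tokens:
            counts[term] = counts.get(term, 0) + 1
        max_tf = max(counts.values())
        vectors.append({t: (c / max_tf) * math.log(N / df[t])
                        for t, c in counts.items()})
    return vectors

def cosine(u, v):
    """Cosine correlation between two sparse term-weight vectors."""
    num = sum(w * v.get(t, 0.0) for t, w in u.items())
    den = (math.sqrt(sum(w * w for w in u.values())) *
           math.sqrt(sum(w * w for w in v.values())))
    return num / den if den else 0.0
```

A query would be weighted the same way and compared against each document vector with `cosine`, ranking documents by descending score.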
The comparison of these results to those obtained with the modified probabilistic model leads to some interesting conclusions about the cosine correlation. (Information Technology, Vol. 3, No. 2: 113-115; April 1984)

12. SCIENTIFIC INQUIRY: A MODEL FOR ONLINE SEARCHING
    Stephen P. Harter
    School of Library and Information Science, Indiana University, Bloomington, IN 47405

Scientific inquiry is proposed as a philosophical and behavioral model for online information retrieval. The nature of scientific research and the concepts of variable, hypothesis formulation and testing, operational definition, validity, reliability, assumption, and the cyclical nature of research are established. A case is made for the inevitability of end-user searching. It is argued that the model is of interest not only for its own sake, for the intellectual parallels that can be established between two apparently disparate human activities, but also as a useful framework for discussion and analysis of the online search process from an educational and evaluative viewpoint. (JASIS, Vol. 35(2): 110-117; 1984)

13. A DRILL AND PRACTICE PROGRAM FOR ONLINE RETRIEVAL
    Bert R. Boyce
    School of Library and Information Science, Louisiana State University, LA 70803
    David Martin, Barbara Francis, and Mary Ellen Sievert
    Department of Information Science, University of Missouri at Columbia, 110 Stewart Hall, Columbia, MO 65211

DAPPOR, a drill and practice program for online retrieval, provides reinforcement to students engaged in learning the basic command protocols of the major vendors of bibliographic databases. The DAPPOR evaluation program overcomes the difficult problems of determining the correctness of a user response in a highly flexible environment. The coding of answer definitions and the process of recursive reduction used by the evaluation program are described. (JASIS, Vol. 35(2): 129-134; 1984)

14. TWO PARTITIONING TYPE CLUSTERING ALGORITHMS
    Fazli Can and Esen A.
Ozkarahan
    Arizona State University, Tempe, AZ 85287

In this article, two partitioning type clustering algorithms are presented. Both algorithms use the same method for selecting cluster seeds; however, the assignment of documents to the seeds differs. The first algorithm uses a new concept called the "cover coefficient" and is a single-pass algorithm. The second uses a conventional measure for document assignment to the cluster seeds and is a multipass algorithm. The concept of clustering, a model for seed-oriented partitioning, the new centroid generation approach, and an illustration of both algorithms are also presented in the article. (JASIS, Vol. 35(5): 268-276; 1984)

15. ARTIFICIAL INTELLIGENCE: UNDERLYING ASSUMPTIONS AND BASIC OBJECTIVES
    Nick Cercone
    Computing Science Department, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
    Gordon McCalla
    Department of Computational Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada S7N 0W0

Artificial intelligence (AI) research has recently captured media interest, and it is fast becoming our newest "hot" technology. AI is an interdisciplinary field which derives from a multiplicity of roots. In this article we present our perspectives on the methodological assumptions underlying research efforts in AI. We also discuss the goals (design objectives) of AI across the spectrum of subareas it comprises. We conclude by discussing why there is increased interest in AI and whether current predictions of the future importance of AI are well founded. (JASIS, Vol. 35(5): 280-290; 1984)

16.
NATURAL LANGUAGE PROCESSING
    Ralph Grishman
    Department of Computer Science, New York University, 251 Mercer Street, New York, NY 10012

Natural language processing has two primary roles to play in the storage and retrieval of large bodies of information: providing a friendly, easily learned interface to information retrieval systems, and automatically structuring texts so that their information can be more easily processed and retrieved. This article outlines the organization of a natural language interface for data retrieval (a "question-answering system") and some of the approaches being taken to text structuring. It closes by describing a few of the research issues in computational linguistics and a possibility for using interactive natural language processing for information acquisition. (JASIS, Vol. 35(5): 291-296; 1984)

17. EXPERT SYSTEMS: A TUTORIAL
    N. Shahla Yaghmai
    School of Library and Information Science, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI 53102
    Jacqueline A. Maxin
    Computer Services, The H.W. Wilson Company, Bronx, NY 10452

Expert systems are intelligent computer applications that use data, a knowledge base, and a control mechanism to solve problems of sufficient difficulty that significant human expertise is necessary for their solution. Expert systems use artificial intelligence problem-solving and knowledge-representation techniques to combine human expert knowledge about a problem area with human expert methods of conceptualizing and reasoning about that problem area. As a result, it is expected that such systems can reach a level of performance comparable to that of a human expert in a specialized problem area. The high-level knowledge base and associated control mechanism of expert systems are in essence a model of the expertise of the best practitioners of the problem area in question; hence, human users are provided with expert opinions about problems in that area.
Expert systems do not pretend to give final or ultimate conclusions to displace human decision making; they are intended for consulting purposes only. (JASIS, Vol. 35(5): 297-305; 1984)

18. APPROACHES TO MACHINE LEARNING
    Pat Langley
    The Robotics Institute, Carnegie-Mellon University, Pittsburgh, PA 15213
    Jaime G. Carbonell
    Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213

The field of machine learning strives to develop methods and techniques to automate the acquisition of new information, new skills, and new ways of organizing existing information. This article reviews the major approaches to machine learning in symbolic domains, illustrated with occasional paradigmatic examples. (JASIS, Vol. 35(5): 306-316; 1984)

19. ARTIFICIAL INTELLIGENCE: A SELECTED BIBLIOGRAPHY
    Compiled by Linda C. Smith
    Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801

The literature of artificial intelligence (AI) is scattered over many books, journals, conference proceedings, and technical reports. This selected annotated bibliography, arranged by type of material, can serve as an introduction to that literature. (JASIS, Vol. 35(5): 317-319; 1984)

20. AUTOMATIC SEARCH TERM VARIANT GENERATION
    K. Sparck Jones and J. I. Tait
    Computer Laboratory, University of Cambridge

The paper describes research designed to improve automatic pre-coordinate term indexing by applying powerful general-purpose language analysis techniques to identify term sources in requests, and to generate variant expressions of the concepts involved for document text searching. (Journal of Documentation, Vol. 40, No. 1, March 1984, pp. 50-66)

21. HIERARCHIC AGGLOMERATIVE CLUSTERING METHODS FOR AUTOMATIC DOCUMENT CLASSIFICATION
    Alan Griffiths, Lesley A.
Robinson and Peter Willett
    Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, UK

This paper considers the classifications produced by applying the single linkage, complete linkage, group average, and Ward clustering methods to the Keen and Cranfield document test collections. Experiments were carried out to study the structure of the hierarchies produced by the different methods, the extent to which the methods distort the input similarity matrices during the generation of a classification, and the retrieval effectiveness obtainable in cluster-based retrieval. The results suggest that the single linkage method, which has been used extensively in previous work on document clustering, is not the most effective procedure of those tested, although it should be emphasized that the experiments used only small document test collections. (Journal of Documentation, Vol. 40, No. 3, September 1984, pp. 175-205)

22. PROBABILISTIC AUTOMATIC INDEXING BY LEARNING FROM HUMAN INDEXERS
    S. E. Robertson
    Department of Information Science, City University, Northampton Square, London EC1V 0HB
    P. Harding
    Inspec, Station House, Nightingale Road, Hitchin, Hertfordshire SG5 1RJ

A probabilistic model previously used in relevance feedback is adapted for use in automatic indexing of documents (in the sense of imitating human indexers). The model fits with previous work in this area (the 'adhesion coefficient' method), in effect merely suggesting a different way of arriving at the adhesion coefficients. Methods for the application of the model are proposed. The independence assumptions used in the model are interpreted, and the possibility of a dependence model is discussed. (Journal of Documentation, Vol. 40, No. 4, December 1984, pp. 264-270)

------------------------------

END OF IRList Digest
********************