Date: Mon, 3 Feb 86 20:26 EST
To: irdis at vpi
Subject: IRList Digest V2 #5

IRList Digest           Monday, 3 Feb 1986      Volume 2 : Issue 5

Today's Topics:
   Announcement - AAP Author's Guide Available
   Abstracts - Pt. 1 Selection by Raghavan from IR Journals

----------------------------------------------------------------------

Date: Sat, 25 Jan 86 14:21:41 est
From: vtvax5::foxea@vtcs1.VT
Subject: AAP Author's Guide Available

[Extract from Newsletter of: Electronic Manuscript Project -- January 7, 1986 (202) 232-3335 -- Ed]

AAP through its Electronic Manuscript Project has been developing a Standard for the preparation, markup and exchange of electronic manuscripts since 1983. ... The project results are now ready for distribution. ... An Author's Guide and other reference documents will be announced during the next few months.

The Standard is the technical reference manual. ... is an application of a draft international standard known as the Standard Generalized Markup Language (SGML). The AAP Standard applies to the descriptive markup of the type of materials AAP members publish.

This first document provides information for: Technical specialists ... implementing the AAP Standard ... Developers and vendors of SGML processing tools. ... shipment in February 1986 ... single copy is $100 ...

[Note: form describing price options, copies of AAP Standard are available from Carol Risher, AAP, 2005 Mass. Ave., NW, Washington DC 20036 - Ed]

------------------------------

Date: Sun, 26 Jan 86 20:36:45 est
From: "V.J. Raghavan"
Date: Fri, 24 Jan 86 19:21:25 cst
Subject: submission to IR list

[Part 1 of retrieval abstracts - Ed]

ABSTRACTS
(Chosen by V. Raghavan from recent issues of journals in the retrieval area)

1.
REQUIREMENTS FOR QUERY EVALUATION IN WEIGHTED INFORMATION RETRIEVAL

Martin Bartschi
Department for Scientific and Engineering Computer Applications
BBC Brown Boveri & Company, Limited
CH-5401 Baden, Switzerland

In this article, a general mathematical framework for information retrieval models is presented, giving more insight into the evaluation mechanisms that may be used in IR. In this framework a number of requirements for operators improving recall and precision are investigated. It is shown that these requirements can be satisfied by using descriptor weights and one combination operator only. Moreover, information items and queries - virtual items - can be treated in exactly the same way, reflecting the fact that they both are descriptions of a number of concepts. Evaluation is performed by formulating similarity homomorphisms from query descriptions to retrieval status values that allow ranking items accordingly. For good comparisons, however, ranking must be done by the achieved proportion of perfect similarity rather than by similarity itself. In term independence models, this normalization process may satisfy the homomorphism requirement of algebra under certain conditions, but it contradicts the requirement in the term dependence case. Nevertheless, by demanding separability for the unnormalized part of the evaluation measure, it can be guaranteed that the query is evaluated to each item descriptor by descriptor, and the normalization value must be calculated only once for the whole query, using the query as a whole. Also, for each real item, the normalization value must have been calculated only once, using the item as a whole.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 21, No. 4, pp. 291-303, 1985)

2. A METHOD OF MEASURING INFORMATION IN LANGUAGE, APPLIED TO MEDICAL TEXTS

Daniel B.
Gordon and Naomi Sager
Linguistic String Project
New York University
251 Mercer Street
New York, NY 10012

In this study, quantitative measures of the information content of textual material have been developed based upon analysis of the linguistic structure of the sentences in the text. It has been possible to measure such properties as: (1) the amount of information contributed by a sentence to the discourse; (2) the complexity of the information within the sentence, including the overall logical structure and the contributions of local modifiers; (3) the density of information based on the ratio of the number of words in a sentence to the number of information-contributing operators. Two contrasting types of texts were used to develop the measures. The measures were then applied to contrasting sentences within one type of text. The textual material was drawn from narrative patient records and from the medical research literature. Sentences from the records were analyzed by computer and those from the literature were analyzed manually, using the same methods of analysis. The results show that quantitative measures or properties of textual information can be developed which accord with intuitively perceived differences in the informational complexity of the material.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 21, No. 4, pp. 269-289, 1985)

3. A COMPARATIVE STUDY OF MULTIPLE ATTRIBUTE TREE AND INVERTED FILE STRUCTURES FOR LARGE BIBLIOGRAPHIC FILES

S. V. Nageswara Rao and S. Sitharama Iyengar
Department of Computer Science
Coates Hall, Louisiana State University, Baton Rouge, LA 70803, U.S.A.

C. E. Veni Madhavan
School of Automation, Indian Institute of Science, Bangalore - 560012, India

A variety of data structures such as inverted file, multi-lists, quad tree, k-d tree, range tree, polygon tree, quintary tree, multidimensional tries, segment tree, doubly chained tree, the grid file, d-fold tree, super B-tree, Multiple Attribute Tree (MAT), etc.
have been studied for multidimensional searching and related problems. Physical data base organization, which is an important application of multi-dimensional searching, is traditionally and mostly handled by employing the inverted file. This study proposes the MAT data structure for bibliographic file systems, by illustrating the superiority of the MAT data structure over the inverted file. Both methods are compared in terms of preprocessing, storage, and query costs. Worst-case complexity analysis of both methods, for a partial match query, is carried out in two cases: (a) when the directory resides in main memory, (b) when the directory resides in secondary memory. In both cases, the MAT data structure is shown to be more efficient than the inverted file method. Arguments are given to illustrate the superiority of the MAT data structure in the average case also. An efficient adaptation of the MAT data structure, which exploits the special features of the MAT structure and of bibliographic files, is proposed for bibliographic file systems. In this adaptation, suitable techniques for fixing and ranking of the attributes for the MAT data structure are proposed. Conclusions and proposals for future research are presented.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 21, No. 5, pp. 433-442, 1985)

4. TOWARD RELATIONSHIPS-QUERYING IN DOCUMENT RETRIEVAL SYSTEMS

Henryk Rybinski, Janusz Rolecki, Januez Getta and Hanna Popowska
Institute for Scientific, Technical and Economic Information
00-926 Warszawa, ul. Zurawia 3/5, Poland

A concept of an end-user query language with facilities for expressing relationships between objects kept in a data base is presented. The idea of nesting these facilities in a typical document system query language is shown. Special kinds of referring terms are designed. Examples of usage of the new facilities are attached.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 21, No. 5, pp. 419-431, 1985)

5.
ON RELATIVE INDEXING IN FUZZY RETRIEVAL SYSTEMS

Ronald Rousseau
Katholieke Industriele Hogeschool, Zeedijk 101, 8400 Oostende, Belgium

In their study on relative indexing, Choros and Danilowicz give a modification function for weights of descriptors of documents. We show that this function lacks some desirable properties and we give a different modification function which does not have these drawbacks.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 21, No. 5, pp. 415-417, 1985)

6. COMPUTER-AIDED SEARCHING OF BIBLIOGRAPHIC DATA BASES: ONLINE ESTIMATION OF THE VALUE OF INFORMATION

David R. Morehead and William B. Rouse
Center for Man-Machine Systems Research, Georgia Institute of Technology, Atlanta, GA 30332, U.S.A.

The study presents and synthesizes the results of a series of five experiments on human information-seeking behavior in three different information-seeking environments. The first three experiments utilized a highly-controlled, simulated information-seeking task developed to study human search strategies in citation networks. Emphasis in the fourth and fifth experiments was placed on assessing the value of information for humans in realistic search environments. Subjects searched on a topic of their own choice in a data base of fiction in Experiment Four and a data base of technical literature in Experiment Five. After summarizing the experimental results, a conceptual model of how humans value information is presented. The model is then used as a basis for a broad interpretation of the empirical results. Implications of both the empirical and modeling results are considered for the areas of information retrieval logic, system flexibility, retrieval methods, types of aiding, online estimation of information value, and computerizing versus computer-aiding.

(INFORMATION PROCESSING & MANAGEMENT, Vol. 21, No. 5, pp. 387-399, 1985)

7.
BRIEF COMMUNICATION: A NOTE ABOUT INFORMATION SCIENCE RESEARCH

Gerard Salton
Department of Computer Science, Cornell University, Ithaca, NY 14853

This note deals with the relationship between information science research and practice. The impression that the field is moribund and that the research output is uniformly inferior is not supported by an examination of the information retrieval literature.

(JASIS, 36(4): 268-271; 1985)

8. A PROBABILISTIC THEORY OF INDEXING AND SIMILARITY MEASURE BASED ON CITED AND CITING DOCUMENTS

K. L. Kwok
Computer Science Department, Queens College, City University of New York, Flushing, NY 11367

A new model of viewing a document based on the citing-cited relationship between documents is introduced. Using Bayes' decision theory, it is shown how a source document may be indexed and weighted by its set of relevant cited or citing document features, corresponding to a one pass relevance feedback Model 1 (probabilistic indexing) or Model 2 (probabilistic retrieval) system of [8]. Once every document in a collection has been so indexed, various forms of similarity measures based on probability of topical relevance between documents are derivable, including asymmetric, symmetric, and the relationship with Model 3 of [8]. Applications to retrieval and document clustering are also discussed.

(JASIS, 36(5): 342-351; 1985)

9. A BIBLIOMETRIC DISTRIBUTION WHICH REALLY WORKS

H. S. Sichel
Department of Statistics, University of the Witwatersrand, Johannesburg, South Africa

The Generalized Inverse Gaussian-Poisson Distribution is suggested as an all-embracing mathematical model for bibliometric frequency distributions. Twelve examples are given which show that the new model cannot be rejected by virtue of an objective chi-squared test.

(JASIS, 36(5): 314-321; 1985)

[rest of the abstracts submitted will be in Issue 6 - Ed]

------------------------------

END OF IRList Digest
********************