Date:     Mon, 3 Feb 86 20:26 EST
To:       irdis at vpi
Subject:  IRList Digest V2 #5

IRList Digest           Monday, 3 Feb 1986      Volume 2 : Issue 5

Today's Topics:
   Announcement - AAP Author's Guide Available
   Abstracts - Pt. 1 Selection by Raghavan from IR Journals

----------------------------------------------------------------------

Date: Sat, 25 Jan 86 14:21:41 est
From: vtvax5::foxea@vtcs1.VT
Subject: AAP Author's Guide Available

[Extract from Newsletter of:
 Electronic Manuscript Project -- January 7, 1986 (202) 232-3335 -- Ed]

AAP through its Electronic Manuscript Project has been developing a
Standard for the preparation, markup and exchange of electronic
manuscripts since 1983. ...  The project results are now ready for
distribution.  ...  An Author's Guide and other reference documents will be
announced during the next few months.

The Standard is the technical reference manual.   ...   is an application of a
draft international standard known as the Standard Generalized Markup
Language (SGML).  The AAP Standard applies to the descriptive markup of
the type of materials AAP members publish.

This first document provides information for:
   Technical specialists ... implementing the AAP Standard ...
   Developers and vendors of SGML processing tools.

...  shipment in February 1986  ... single copy is $100 ...

[Note:  A form describing price options, and copies of the AAP Standard,
are available from Carol Risher, AAP, 2005 Mass. Ave., NW, Washington DC
20036 - Ed]

------------------------------

Date: Sun, 26 Jan 86 20:36:45 est
From: "V.J. Raghavan" <ihnp4!sask!regina!raghavan@ucbvax.berkeley.edu>
Date: Fri, 24 Jan 86 19:21:25 cst
Subject: submission to IR list [Part 1 of retrieval abstracts - Ed]

                            ABSTRACTS
(Chosen  by  V.  Raghavan from recent issues of journals  in  the 
retrieval area)


1.   REQUIREMENTS  FOR  QUERY EVALUATION IN WEIGHTED  INFORMATION 
     RETRIEVAL

     Martin Bartschi
     Department   for   Scientific   and   Engineering   Computer 
     Applications
     BBC Brown Boveri & Company, Limited
     CH-5401  Baden, Switzerland

          In  this article,  a general mathematical framework for 
     information  retrieval  models  is  presented,  giving  more 
     insight  into the evaluation mechanisms that may be used  in 
     IR.    In  this  framework  a  number  of  requirements  for 
     operators  improving recall and precision are  investigated.  
     It  is  shown that these requirements can  be  satisfied  by 
     using  descriptor weights and one combination operator only.  
     Moreover,  information  items and queries - virtual items  - 
     can be treated in exactly the same way,  reflecting the fact 
     that  both are descriptions of a number  of  concepts.  
     Evaluation   is   performed   by   formulating    similarity 
     homomorphisms  from  query descriptions to retrieval  status 
     values  that  allow ranking  items  accordingly.   For  good 
     comparisons,  however,  ranking must be done by the achieved 
     proportion  of  perfect similarity rather than  by  similarity 
     itself.   In  term independence models,  this  normalization 
     process may satisfy the homomorphism requirement  of algebra 
     under certain conditions, but it contradicts the requirement 
     in  the term dependence case.   Nevertheless,  by  demanding 
     separability  for  the unnormalized part of  the  evaluation 
     measure, it can be guaranteed that the query is evaluated against 
     each  item descriptor by descriptor,  and the  normalization 
     value  must  be  calculated only once for the  whole  query, 
     using the query as a whole.   Also,  for each real item, the 
     normalization  value  must have been calculated  only  once, 
     using the item as a whole.

     (INFORMATION PROCESSING & MANAGEMENT, Vol. 21, No. 4,
     pp. 291-303, 1985)
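
     The normalization requirement can be illustrated with a short
     Python sketch.  It makes two assumptions.  The separable,
     unnormalized part is taken to be a dot product accumulated
     descriptor by descriptor.  The "proportion of perfect
     similarity" is taken to be the cosine: the raw score divided by
     the geometric mean of the query's and the item's self-scores.
     These are familiar choices picked for illustration, not
     necessarily the operators in Bartschi's paper.  Queries and
     items share one representation.  Each normalization value is
     computed once, from the description as a whole.

```python
import math
from typing import Dict, List, Tuple

# A description (query or item) maps descriptor -> weight.  Queries and
# items use the same representation.
Description = Dict[str, float]


def raw_score(query: Description, item: Description) -> float:
    """Unnormalized, separable part: accumulated descriptor by descriptor."""
    return sum(w * item[t] for t, w in query.items() if t in item)


def self_norm(desc: Description) -> float:
    """Normalization value for one description, computed from it as a whole
    (the square root of its score against itself)."""
    return math.sqrt(raw_score(desc, desc))


def rank(query: Description,
         items: Dict[str, Description]) -> List[Tuple[str, float]]:
    """Rank items by cosine, one possible 'proportion of perfect similarity'."""
    q_norm = self_norm(query)  # computed once for the whole query
    item_norms = {name: self_norm(d) for name, d in items.items()}  # once per item
    scores = []
    for name, d in items.items():
        denom = q_norm * item_norms[name]
        # Empty descriptions get 0 rather than a division by zero.
        scores.append((name, raw_score(query, d) / denom if denom else 0.0))
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    q = {"retrieval": 1.0, "fuzzy": 0.5}
    docs = {"a": {"retrieval": 1.0, "fuzzy": 0.5},
            "b": {"retrieval": 1.0},
            "c": {"clustering": 1.0}}
    print(rank(q, docs))
```

     With this toy data, item "a" has the same description as the
     query and ranks first, then "b", then "c".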

2.   A  METHOD OF MEASURING INFORMATION IN LANGUAGE,  APPLIED  TO 
     MEDICAL TEXTS

     Daniel B. Gordon and Naomi Sager
     Linguistic String Project
     New York University
     251 Mercer Street
     New York, NY 10012

          In this study, quantitative measures of the information 
     content  of textual material have been developed based  upon 
     analysis of the linguistic structure of the sentences in the 
     text.   It  has been possible to measure such properties  as: 
     (1)  the amount of information contributed by a sentence  to 
     the discourse;  (2) the complexity of the information within 
     the  sentence,  including the overall logical structure  and 
     the  contributions  of local modifiers;  (3) the density  of 
     information  based on the ratio of the number of words in  a 
     sentence   to   the   number   of   information-contributing 
     operators.
          Two contrasting types of texts were used to develop the 
     measures.   The  measures were then applied  to  contrasting 
     sentences within one type of text.  The textual material was 
     drawn  from  narrative patient records and from the  medical 
     research  literature.    Sentences  from  the  records  were 
     analyzed  by  computer and those from  the  literature  were 
     analyzed manually,  using the same methods of analysis.  The 
     results  show  that quantitative measures of  properties  of 
     textual  information  can  be developed  which  accord  with 
     intuitively   perceived  differences  in  the  informational 
     complexity of the material.

     (INFORMATION PROCESSING & MANAGEMENT,  Vol.  21, No. 4, pp. 
     269-289, 1985)
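
     The density measure (3) is simple arithmetic once the operators
     have been identified.  In the Python sketch below, the operator
     flags are supplied by hand.  They stand in for the linguistic
     analysis described in the study, which is not reproduced, and
     the function name is hypothetical.

```python
from typing import Sequence


def information_density(words: Sequence[str],
                        is_operator: Sequence[bool]) -> float:
    """Words per information-contributing operator in one sentence.

    is_operator[i] says whether words[i] was identified as an
    information-contributing operator.  The flags are hypothetical inputs
    standing in for the study's linguistic analysis.
    """
    if len(words) != len(is_operator):
        raise ValueError("one flag per word is required")
    n_ops = sum(1 for flag in is_operator if flag)
    if n_ops == 0:
        raise ValueError("sentence has no information-contributing operators")
    return len(words) / n_ops
```

     With six words and three flagged operators, the ratio is
     6 / 3 = 2.0 words per operator.  Under this ratio, fewer words
     per operator means more densely packed information.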

3.   A COMPARATIVE STUDY OF MULTIPLE ATTRIBUTE TREE AND  INVERTED 
     FILE STRUCTURES FOR LARGE BIBLIOGRAPHIC FILES

     S. V. Nageswara Rao and S. Sitharama Iyengar
     Department of Computer Science
     Coates Hall,
     Louisiana State University,
     Baton Rouge, LA 70803, U.S.A.

     C. E. Veni Madhavan
     School of Automation,
     Indian Institute of Science,
     Bangalore - 560012, India

          A  variety  of data structures such as  inverted  file, 
     multi-lists,  quad tree, k-d tree, range tree, polygon tree, 
     quintary tree,  multidimensional tries, segment tree, doubly 
     chained  tree,  the grid file,  d-fold tree,  super  B-tree, 
     Multiple  Attribute Tree (MAT),  etc.,  have been studied for 
     multidimensional searching and related  problems.   Physical 
     data base organization, which is an important application of 
     multi-dimensional  searching,  is  traditionally and  mostly 
     handled by employing inverted files.  This study proposes the
     MAT  data  structure  for  bibliographic  file  systems,  by 
     illustrating  its  superiority  over  the  inverted  file.   Both 
     methods are compared in terms of 
     preprocessing,   storage,   and  query  costs.    Worst-case 
     complexity analysis of both the methods, for a partial match 
     query,  is  carried  out in two cases:  (a)  when  directory 
     resides  in  main  memory,  (b) when  directory  resides  in 
     secondary  memory.   In both cases,  MAT data  structure  is 
     shown  to  be more efficient than the inverted file  method.  
     Arguments  are  given to illustrate the superiority  of  the MAT 
     data  structure  in  the average  case  as well.   An  efficient 
     adaptation of MAT data structure,  that exploits the special 
     features  of  MAT  structure  and  bibliographic  files,  is 
     proposed   for   bibliographic  file   systems.    In   this 
     adaptation,  suitable  techniques for fixing and ranking  of 
     the   attributes  for  MAT  data  structure  are   proposed.  
     Conclusions and proposals for future research are presented.

     (INFORMATION PROCESSING & MANAGEMENT,  Vol.  21,  No. 5, pp. 
     433-442, 1985)
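
     A partial match query specifies values for some attributes and
     leaves the rest open.  The Python sketch below answers such
     queries in two ways.  The inverted file intersects one posting
     set per specified attribute.  The tree branches on attribute i
     at level i: it follows one branch where a value is given and
     all branches where it is not.  This is a simplified in-memory
     illustration, not the authors' implementation.  Class names and
     sample records are hypothetical.  All records are assumed to
     have the same number of attributes, and the attribute order is
     simply the tuple order.  The paper's techniques for fixing and
     ranking attributes, and its directory placement in main or
     secondary memory, are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple

Record = Tuple[str, ...]
Query = Sequence[Optional[str]]  # None means "attribute left unspecified"


class InvertedFile:
    """One posting set of record ids per (attribute position, value)."""

    def __init__(self, records: Sequence[Record]) -> None:
        self.n = len(records)
        self.postings: Dict[Tuple[int, str], Set[int]] = defaultdict(set)
        for rid, rec in enumerate(records):
            for pos, value in enumerate(rec):
                self.postings[(pos, value)].add(rid)

    def partial_match(self, query: Query) -> List[int]:
        result: Optional[Set[int]] = None
        for pos, value in enumerate(query):
            if value is None:
                continue
            hits = self.postings.get((pos, value), set())
            result = set(hits) if result is None else result & hits
        # No attribute specified: every record matches.
        return sorted(range(self.n) if result is None else result)


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.ids: List[int] = []  # filled only at the leaf level


class MultipleAttributeTree:
    """Level i of the tree branches on attribute i (fixed attribute order)."""

    def __init__(self, records: Sequence[Record]) -> None:
        self.root = _Node()
        for rid, rec in enumerate(records):
            node = self.root
            for value in rec:
                node = node.children.setdefault(value, _Node())
            node.ids.append(rid)

    def partial_match(self, query: Query) -> List[int]:
        out: List[int] = []

        def walk(node: _Node, depth: int) -> None:
            if depth == len(query):
                out.extend(node.ids)
                return
            value = query[depth]
            if value is None:  # unspecified: follow every branch
                for child in node.children.values():
                    walk(child, depth + 1)
            elif value in node.children:  # specified: follow one branch
                walk(node.children[value], depth + 1)

        walk(self.root, 0)
        return sorted(out)
```

     Both structures return the same record ids, but they differ in
     cost.  The inverted file touches one posting set per specified
     attribute.  The tree's work depends on how many branches the
     unspecified levels force it to visit, which is why the order of
     the attributes matters.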

4.   TOWARD RELATIONSHIPS-QUERYING IN DOCUMENT RETRIEVAL SYSTEMS

     Henryk Rybinski, Janusz Rolecki, Janusz Getta
     and Hanna Popowska
     Institute for Scientific, Technical and Economic Information
     00-926 Warszawa, ul. Zurawia 3/5, Poland

          A concept of an end-user query language with facilities for
     expressing relationships between objects kept in a data base
     is  presented.   The  idea of nesting  these  facilities  in 
     a typical document retrieval system query language is shown.
     Special kinds of referring terms are designed.  Examples of the
     use of the new facilities are given.

     (INFORMATION PROCESSING & MANAGEMENT,  Vol.  21,  No. 5, pp. 
     419-431, 1985)

5.   ON RELATIVE INDEXING IN FUZZY RETRIEVAL SYSTEMS

     Ronald Rousseau
     Katholieke Industriele Hogeschool,
     Zeedijk 101,
     8400 Oostende, Belgium

          In  their  study  on  relative  indexing,   Choros  and 
     Danilowicz  give  a  modification  function  for  weights  of 
     descriptors of documents.   We show that this function lacks 
     some   desirable   properties  and  we  give   a   different 
     modification function which does not have these drawbacks.

     (INFORMATION PROCESSING & MANAGEMENT,  Vol.  21,  No.  5, pp. 
     415-417, 1985)

6.   COMPUTER-AIDED SEARCHING OF BIBLIOGRAPHIC DATA BASES: ONLINE 
     ESTIMATION OF THE VALUE OF INFORMATION

     David R. Morehead and William B. Rouse
     Center for Man-Machine Systems Research, 
     Georgia Institute of Technology,
     Atlanta, GA  30332,  U.S.A.

          The  study  presents and synthesizes the results  of  a 
     series  of  five experiments  on  human  information-seeking 
     behavior     in    three    different    information-seeking 
     environments.    The  first  three  experiments  utilized  a 
     highly-controlled,    simulated   information-seeking   task 
     developed  to  study  human search  strategies  in  citation 
     networks.   Emphasis in the fourth and fifth experiments was 
     placed  on assessing the value of information for humans  in   
     realistic search environments.   Subjects searched on a topic 
     of  their own choice in a data base of fiction in Experiment 
     Four  and a data base of technical literature in  Experiment 
     Five.    After  summarizing  the  experimental  results,   a 
     conceptual   model  of  how  humans  value  information   is 
     presented.   The  model is then used as a basis for a  broad 
     interpretation  of the empirical results.   Implications  of 
     both  the empirical and modeling results are considered  for 
     the   areas   of   information   retrieval   logic,   system 
     flexibility,  retrieval  methods,  types of  aiding,  online 
     estimation  of information value,  and computerizing  versus 
     computer-aiding.

     (INFORMATION PROCESSING & MANAGEMENT,  Vol.  21,  No. 5, pp. 
     387-399, 1985)

7.   BRIEF  COMMUNICATION:  A  NOTE  ABOUT  INFORMATION   SCIENCE 
     RESEARCH

     Gerard Salton
     Department of Computer Science,
     Cornell University, Ithaca, NY 14853

          This   note   deals  with  the   relationship   between 
     information  science research and practice.   The impression 
     that  the field is moribund and that the research output  is 
     uniformly inferior is not supported by an examination of the 
     information retrieval literature.

     (JASIS, 36(4): 268-271; 1985)

8.   A  PROBABILISTIC THEORY OF INDEXING AND  SIMILARITY  MEASURE 
     BASED ON CITED AND CITING DOCUMENTS

     K. L. Kwok
     Computer Science Department,
     Queens College, City University of New York,
     Flushing, NY 11367

          A  new model of viewing a document based on the 
     citing-cited relationship between documents is  introduced.   Using 
     Bayes'  decision theory,  it is shown how a source  document 
     may  be indexed and weighted by its set of relevant cited or 
     citing  document  features,  corresponding  to  a  one-pass 
     relevance feedback Model 1 (probabilistic indexing) or Model 
     2  (probabilistic  retrieval) system  of  [8].   Once  every 
     document in a collection has been so indexed,  various forms
     of  similarity  measures  based on  probability  of  topical 
     relevance   between  documents  are   derivable,   including 
     asymmetric,  symmetric, and the relationship with Model 3 of 
     [8].  Applications to retrieval and document clustering are 
     also discussed.

     (JASIS, 36(5): 342-351; 1985)
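
     The Python sketch below illustrates weighting a source document
     by the features of its cited and citing documents.  It treats
     those documents as the relevant sample and computes the classic
     Robertson/Sparck Jones log-odds relevance weight, with 0.5 added
     to each cell of the 2x2 table.  This standard
     binary-independence weight is swapped in for illustration; it is
     not Kwok's derivation.  The function name and toy data are
     hypothetical.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, Sequence

Doc = FrozenSet[str]  # a document represented as its set of index features


def relevance_weights(relevant: Sequence[Doc],
                      collection: Sequence[Doc]) -> Dict[str, float]:
    """Robertson/Sparck Jones log-odds weight for every feature seen in
    `relevant`, with 0.5 added to each cell of the 2x2 contingency table.

    Assumption: `relevant` is a subset of `collection`; here it plays the
    role of the documents a source document cites or is cited by.
    """
    N, R = len(collection), len(relevant)
    n = Counter(t for d in collection for t in d)  # documents containing t
    r = Counter(t for d in relevant for t in d)    # relevant docs containing t
    weights: Dict[str, float] = {}
    for t, r_t in r.items():
        n_t = n[t]
        odds_rel = (r_t + 0.5) / (R - r_t + 0.5)
        odds_nonrel = (n_t - r_t + 0.5) / (N - n_t - R + r_t + 0.5)
        weights[t] = math.log(odds_rel / odds_nonrel)
    return weights
```

     Features concentrated in the cited and citing documents get
     large positive weights.  Features spread evenly over the whole
     collection get weights near or below zero.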


9.   A BIBLIOMETRIC DISTRIBUTION WHICH REALLY WORKS

     H. S. Sichel
     Department of Statistics, 
     University of the Witwatersrand,
     Johannesburg, South Africa

          The  Generalized Inverse Gaussian-Poisson  Distribution 
     is  suggested  as an all-embracing  mathematical  model  for 
     bibliometric  frequency distributions.   Twelve examples are 
     given  which show that the new model cannot be  rejected  by 
     virtue of an objective chi-squared test.

     (JASIS, 36(5): 314-321; 1985)
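
     The Python sketch below computes Generalized Inverse
     Gaussian-Poisson probabilities.  It writes the distribution as a
     Poisson distribution mixed over a generalized inverse Gaussian,
     with alpha > 0, 0 < theta < 1 and real gamma:
     P(n) = (1-theta)^(gamma/2) / K_gamma(alpha*sqrt(1-theta))
            * (alpha*theta/2)^n / n! * K_(n+gamma)(alpha),
     for n = 0, 1, 2, ...  This parametrization is an assumption and
     may differ in detail from the paper's.  K is the modified Bessel
     function of the second kind.  It is evaluated here by simple
     numerical integration so the sketch needs only the standard
     library; this is adequate for moderate orders but is not a
     production Bessel routine.  Parameter fitting and the
     chi-squared test are not shown.

```python
import math


def bessel_k(nu: float, x: float, h: float = 0.01) -> float:
    """Modified Bessel function of the second kind, K_nu(x), for x > 0.

    Uses K_nu(x) = integral over t in [0, inf) of exp(-x cosh t) cosh(nu t) dt
    and the trapezoidal rule.  The rule converges very fast here because the
    integrand is smooth, even in t, and decays double-exponentially.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    nu = abs(nu)  # K_{-nu} = K_nu
    t_peak = math.asinh(nu / x)  # where nu*t - x*cosh(t) is largest

    def f(t: float) -> float:
        c = x * math.cosh(t)
        # cosh(nu t) * exp(-x cosh t), split so cosh(nu t) cannot overflow alone
        return 0.5 * (math.exp(nu * t - c) + math.exp(-nu * t - c))

    total = 0.5 * f(0.0)  # half weight at the t = 0 end of the interval
    k = 1
    while True:
        term = f(k * h)
        total += term
        # Stop once past the peak and the remaining terms are negligible.
        if k * h > t_peak and term <= 1e-17 * total:
            break
        k += 1
    return h * total


def gigp_pmf(n: int, alpha: float, theta: float, gamma: float) -> float:
    """P(N = n), n = 0, 1, 2, ..., for alpha > 0, 0 < theta < 1, real gamma."""
    # (alpha*theta/2)^n / n!, computed in log space to avoid overflow.
    log_scale = n * math.log(alpha * theta / 2) - math.lgamma(n + 1)
    norm = (1 - theta) ** (gamma / 2) / bessel_k(gamma, alpha * math.sqrt(1 - theta))
    return norm * math.exp(log_scale) * bessel_k(n + gamma, alpha)
```

     A quick sanity check is that the probabilities sum to 1 over n.
     Another is that bessel_k(0.5, x) matches the closed form
     sqrt(pi/(2x)) * exp(-x).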
[rest of the abstracts submitted will be in Issue 6 - Ed]

------------------------------

END OF IRList Digest
********************