IRList Digest           Tuesday, 25 August 1987      Volume 3 : Issue 29

Today's Topics:
   Announcement - Abstracts from next ACM SIGIR Forum (part 1 of 4)

News addresses are ARPANET: fox%vtopus.cs.vt.edu@relay.cs.net
   BITNET: foxea@vtvax3.bitnet   CSNET: fox@vt   UUCPNET: fox@vtopus.uucp

----------------------------------------------------------------------
     
Date: Mon, 10 Aug 87 15:17:43 CDT
From: nancy@usl-vb.usl.edu (Nancy )
Subject: Abstracts from next ACM SIGIR Forum - sent by Raghavan

                           ABSTRACTS (part 1 of 4)

(Chosen by G. Salton from recent issues of journals in the retrieval area).

1.  FUZZY RELATIONAL DATABASES: REPRESENTATIONAL ISSUES AND REDUCTION USING
    SIMILARITY MEASURES
    Henri Prade and Claudette Testemale
    Laboratoire Langages et Systemes Informatiques
    Universite Paul Sabatier
    118 Route de Narbonne
    31062 Toulouse Cedex, France
       Until now, the idea of a fuzzy database has been investigated along two
    different lines. Some authors have dealt with the imprecision of attribute
    values by using fuzzy similarity relations to model the extent to which
    these values can be regarded as interchangeable. Others have used
    possibility distributions to represent fuzzily known or incompletely known
    attribute values. The first approach, which cannot accommodate incomplete
    information, is restated in the framework of rough sets extended to fuzzy
    relations. Moreover, in the second approach, similarity measures between
    attribute values can be introduced and computed, and the two approaches
    are then compared. The proposed similarity measure, based on a fuzzy
    Hausdorff distance, estimates the mismatch between two possibility
    distributions. From the standpoints of storage and query evaluation, it
    may be useful to group items having similar attribute values, so the
    similarity measures considered above can be used to reduce the fuzzy
    database. When several items have sufficiently similar values for each
    attribute in a relation, the reduction is performed by taking, for each
    attribute, the union of these similar values. The consequences of the
    reduction process for query evaluation are studied.
    (JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 2,
    pp. 118-126, 1987)
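
The two operations this abstract relies on, measuring the mismatch between possibility distributions and merging sufficiently similar tuples by union, can be sketched in Python. This is an illustration under stated assumptions, not the authors' definitions: distributions are taken to be discrete and normalized over a numeric domain, the "fuzzy Hausdorff distance" is approximated as the mean ordinary Hausdorff distance between alpha-cuts at four arbitrarily chosen levels, pointwise max is used as the fuzzy union, and the threshold in the hypothetical helper merge_if_similar is a free parameter.

```python
from typing import Dict, List, Optional, Sequence

# A discrete possibility distribution: domain value -> possibility degree in [0, 1].
Possibility = Dict[float, float]
Item = Dict[str, Possibility]  # attribute name -> possibility distribution


def alpha_cut(pi: Possibility, alpha: float) -> List[float]:
    """Domain values whose possibility degree is at least alpha."""
    return [x for x, degree in pi.items() if degree >= alpha]


def hausdorff(a: Sequence[float], b: Sequence[float]) -> float:
    """Ordinary Hausdorff distance between two finite, non-empty sets of numbers."""
    d_ab = max(min(abs(x - y) for y in b) for x in a)
    d_ba = max(min(abs(x - y) for x in a) for y in b)
    return max(d_ab, d_ba)


def fuzzy_hausdorff(p: Possibility, q: Possibility,
                    levels: Sequence[float] = (0.25, 0.5, 0.75, 1.0)) -> float:
    """Mean Hausdorff distance over alpha-cuts (an assumed construction).

    Both distributions must be normalized (some value has degree 1) so that
    every alpha-cut, including the 1.0 cut, is non-empty.
    """
    distances = [hausdorff(alpha_cut(p, a), alpha_cut(q, a)) for a in levels]
    return sum(distances) / len(distances)


def fuzzy_union(p: Possibility, q: Possibility) -> Possibility:
    """Pointwise max: the usual fuzzy union of two possibility distributions."""
    return {x: max(p.get(x, 0.0), q.get(x, 0.0)) for x in set(p) | set(q)}


def merge_if_similar(item1: Item, item2: Item, threshold: float) -> Optional[Item]:
    """Merge two tuples by per-attribute union if every attribute is close enough."""
    if all(fuzzy_hausdorff(item1[k], item2[k]) <= threshold for k in item1):
        return {k: fuzzy_union(item1[k], item2[k]) for k in item1}
    return None
```

With a = {30.0: 0.5, 31.0: 1.0, 32.0: 0.5} ("about 31") and b = {31.0: 0.5, 32.0: 1.0, 33.0: 0.5} ("about 32"), every alpha-cut of b is the corresponding cut of a shifted by one unit, so fuzzy_hausdorff(a, b) returns 1.0; with threshold=1.0, merge_if_similar({"age": a}, {"age": b}, 1.0) collapses the two tuples into one whose age distribution is {30: 0.5, 31: 1.0, 32: 1.0, 33: 0.5}.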

2.  KNOWLEDGE-ASSISTED DOCUMENT RETRIEVAL: II. THE RETRIEVAL PROCESS
    Gautam Biswas, James C. Bezdek, Viswanath Subramanian, and Marisol
    Marques
    Department of Computer Science
    University of South Carolina
    Columbia, South Carolina 29208
       This article presents our conceptual model of the retrieval process of
    a document-retrieval system. The input to the retrieval mechanism is an
    unambiguous intermediate form of the user query, generated by the language
    processor using the method described previously (Part I). Our retrieval
    mechanism uses a two-step procedure. In the first step, a list of
    documents pertinent to the query is obtained from the document database,
    and an evidence-combination scheme is then used to compute the degree of
    support between the query and individual documents. The second step uses
    a ranking procedure to obtain a final degree of support for each chosen
    document as a function of the individual degrees of support associated
    with one or more parts of the query. The end result is a set of document
    citations presented to the user in ranked order in response to the
    information request. Numerical examples illustrate various facets of the
    overall system, which has been implemented as a modular prototype to test
    system response to changes in model parameters.
    (JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 2,
    pp. 97-110, 1987)
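
The two-step procedure can be pictured with a small Python sketch. The abstract does not spell out the evidence-combination scheme or the ranking function, so standard fuzzy operators stand in for both here: a query part's support for a document is the maximum over its concepts (fuzzy OR) of a hypothetical concept-document degree, and the final degree of support is the minimum over query parts (fuzzy AND). The Index layout and all names are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index: concept -> {document id: degree to which the document is about it}.
Index = Dict[str, Dict[str, float]]
# A query as a conjunction of parts; each part is a disjunction of concepts.
QueryParts = List[List[str]]


def candidate_documents(index: Index, query: QueryParts) -> Set[str]:
    """Step 1a: every document indexed under at least one concept in the query."""
    return {doc for part in query for concept in part for doc in index.get(concept, {})}


def part_support(index: Index, part: List[str], doc: str) -> float:
    """Step 1b: support one query part gives a document (fuzzy OR = max)."""
    return max(index.get(concept, {}).get(doc, 0.0) for concept in part)


def rank(index: Index, query: QueryParts) -> List[Tuple[str, float]]:
    """Step 2: final support = min over query parts (fuzzy AND), best first."""
    scored = [(doc, min(part_support(index, part, doc) for part in query))
              for doc in candidate_documents(index, query)]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

For an index {"fuzzy sets": {"d1": 0.9, "d2": 0.4}, "retrieval": {"d1": 0.6, "d3": 0.8}, "databases": {"d2": 0.7}} and the query [["fuzzy sets"], ["retrieval", "databases"]], rank returns [("d1", 0.6), ("d2", 0.4), ("d3", 0.0)]: d3 is retrieved in the first step but receives no support from the "fuzzy sets" part.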

3.  KNOWLEDGE-ASSISTED DOCUMENT RETRIEVAL: I. THE NATURAL-LANGUAGE INTERFACE
    Gautam Biswas, James C. Bezdek, Marisol Marques, and Viswanath
    Subramanian
    Department of Computer Science
    University of South Carolina
    Columbia, South Carolina 29208
       In this article we describe the conceptual model and processing of
    (constrained) natural-language queries in information-retrieval systems.
    A language interface based on fuzzy set techniques is proposed to handle
    the uncertainty inherent in natural-language semantics. The conceptual
    model is developed and exemplified in the context of document retrieval.
    Specifically, the user query is modeled as a triple $q = (q_c, q_y, q_n)$,
    where $q_c$ is the part of the query that deals with concepts and the
    operators that link these concepts, $q_y$ identifies the publication
    period the user is interested in, and $q_n$ pertains to the number of
    documents to be retrieved. We describe query decomposition using an
    augmented transition network parser, and the assignment of the functions
    and relations needed by each portion of the query to represent the
    uncertainties inherent in natural language. The output of the
    natural-language interface is then passed to a knowledge-based retrieval
    mechanism that will be described in a companion article (Part II).
    (JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 2,
    pp. 83-96, 1987)
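
The query triple can be held in a simple data structure. The Python sketch below is only an illustration: the field types, the trapezoidal membership function used for a vague period such as "recent", its breakpoints, and the crisp integer for $q_n$ are assumptions, since the abstract does not give the article's actual functions and relations.

```python
from dataclasses import dataclass
from typing import Callable


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Membership rising on [a, b], equal to 1 on [b, c], falling on [c, d].

    Assumes a < b <= c < d.
    """
    def mu(x: float) -> float:
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)
    return mu


@dataclass
class Query:
    concepts: str                      # q_c: concepts and the operators linking them
    period: Callable[[float], float]   # q_y: fuzzy publication period
    count: int                         # q_n: number of documents wanted (crisp here)


# Hypothetical query: "ten recent papers on fuzzy retrieval".
q = Query(concepts="fuzzy AND retrieval",
          period=trapezoid(1980, 1985, 1987, 1988),
          count=10)
```

Here q.period(1983) is 0.6, any year from 1985 through 1987 scores 1.0, and 1979 scores 0.0, so documents can be graded by how well their publication year fits the requested period rather than filtered by a hard cutoff.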

4.  A NOTE ON WEIGHTED QUERIES IN INFORMATION RETRIEVAL SYSTEMS
    Ronald R. Yager
    Machine Intelligence Institute
    Iona College
    New Rochelle, New York 10801
       Several authors have suggested the introduction of fuzzy set
    methodologies as a means of improving the performance of
    information-retrieval systems [1-8]. In a recent survey [9] of
    information-retrieval technologies, Bartschi discusses the fuzzy set model
    among other models. A problem of considerable interest to designers of
    fuzzy set retrieval systems is the evaluation of the retrieval status
    function when the query terms or search criteria carry weights indicating
    their importance to the requester. A number of approaches have been
    suggested for this problem, but Bartschi [9] points out some of the
    difficulties with each of them. We suggest an alternative methodology for
    handling weighted queries in a fuzzy environment.
    (JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, Vol. 38, No. 1,
    pp. 23-24, 1987)
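
To make the problem concrete: for a weighted conjunctive query, each term contributes an importance weight w_i and the document's membership degree a_i, and the retrieval status value must combine them. The Python sketch uses one treatment common in the fuzzy-set literature, a weighted minimum over max(1 - w_i, a_i), with its dual, a weighted maximum over min(w_i, a_i), for disjunctions. It is background for the problem Yager addresses and is not necessarily the alternative methodology proposed in this note.

```python
from typing import Sequence


def _check(weights: Sequence[float], degrees: Sequence[float]) -> None:
    """Require exactly one importance weight per query term."""
    if len(weights) != len(degrees):
        raise ValueError("one weight is needed per query term")


def weighted_and(weights: Sequence[float], degrees: Sequence[float]) -> float:
    """Weighted fuzzy AND: min over terms of max(1 - w, a).

    A term with weight 0 contributes 1 and is effectively ignored; a term with
    weight 1 contributes its plain membership degree a.
    """
    _check(weights, degrees)
    return min(max(1.0 - w, a) for w, a in zip(weights, degrees))


def weighted_or(weights: Sequence[float], degrees: Sequence[float]) -> float:
    """Weighted fuzzy OR: max over terms of min(w, a)."""
    _check(weights, degrees)
    return max(min(w, a) for w, a in zip(weights, degrees))
```

For weights [1.0, 0.5] and degrees [0.75, 0.25], weighted_and returns 0.5 and weighted_or returns 0.75; setting the second weight to 0.0 makes weighted_and ignore that term and return 0.75.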

5.  A GRAPHICAL DATABASE INTERFACE FOR CASUAL, NAIVE USERS
    Clifford Burgess
    Computer Science Department
    University of Southern Mississippi
    Hattiesburg, Mississippi 39403
              and
    Kathleen Swigger
    Computer Science Department
    North Texas State University
    Denton, Texas 76203
       This paper is concerned with some aspects of database interfaces for
    casual, naive users. A "casual user" is defined as an individual who
    wishes to execute queries once or twice a month, and a "naive user" is
    someone who has little or no expertise in operating computers. The study
    focuses on a specific group of casual, naive users, analyzes their needs,
    and proposes a solution. The proposed interface consists of a graphical
    display of a model of the database and a natural-language query language.
    One distinctive property of the interface is that it lets the user see
    local item names within the context of a global structure. The interface
    was then tested to determine whether it was acceptable to the user
    population and to discover which level of graphical model the users would
    find most comfortable.
    (INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp. 511-521, 1986)

6.  COMPRESSION OF INDEX TERM DICTIONARY IN AN INVERTED-FILE ORIENTED
    DATABASE: SOME EFFECTIVE ALGORITHMS
    Janusz L. Wisniewski
    Applied Informatics Department
    Nicholas Copernicus University
    Grudziadzka 5/7, Torun, Poland
       A new method of index term dictionary compression in an
    inverted-file-oriented database is discussed. A word-coding technique is
    described that generates short fixed-length codes from the index terms
    themselves by analyzing monogram and bigram statistical distributions.
    Transforming the index term dictionary into a code dictionary preserves
    word-to-word discrimination, with a rate of three synonyms (distinct terms
    receiving the same code) per 1300 terms, at compression ratios of up to
    90% and at low CPU-time cost. When applied in a computer network
    environment, the method offers substantial savings in communication
    channel utilization with negligible degradation of response time.
    Experimental data are presented for the 26,113-term index term dictionary
    of the New York Times Info Bank, which is available via a computer
    network.
    (INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp. 493-501, 1986)
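
The two figures of merit in this abstract, the compression ratio and the synonym rate (distinct terms that end up sharing a code), can be computed for any fixed-length coding scheme. In the Python sketch below the coding rule is a hypothetical stand-in, not Wisniewski's algorithm: a code is the term's first letter followed by its bigrams ordered from rarest to most common in the dictionary, truncated or padded with "_" to a fixed length. Reading "compression ratio" as the fraction of characters saved is also an assumption.

```python
from collections import Counter
from typing import Dict, List


def bigram_counts(terms: List[str]) -> Counter:
    """Frequency of each adjacent character pair across the whole dictionary."""
    counts: Counter = Counter()
    for term in terms:
        counts.update(term[i:i + 2] for i in range(len(term) - 1))
    return counts


def encode(term: str, counts: Counter, length: int) -> str:
    """Hypothetical fixed-length code: first letter, then the term's rarest bigrams."""
    bigrams = [term[i:i + 2] for i in range(len(term) - 1)]
    rarest_first = sorted(bigrams, key=lambda bg: counts[bg])  # stable on ties
    code = term[0] + "".join(rarest_first)
    return code[:length].ljust(length, "_")


def evaluate(terms: List[str], length: int) -> Dict[str, float]:
    """Space saved and code collisions ("synonyms") for a dictionary of non-empty terms."""
    terms = sorted(set(terms))
    counts = bigram_counts(terms)
    codes = [encode(t, counts, length) for t in terms]
    synonyms = len(terms) - len(set(codes))  # terms beyond the first on each shared code
    original_chars = sum(len(t) for t in terms)
    return {
        "space_saved": 1 - length * len(terms) / original_chars,
        "synonyms_per_1300_terms": 1300 * synonyms / len(terms),
    }
```

On the toy dictionary ["cat", "cart", "car"] with 2-character codes, "cat" and "car" both become "ca", so evaluate reports 40% of characters saved and one synonym among three terms. With this scheme, lengthening the code can only remove synonyms, at the cost of less compression.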

7.  COMPUTER USE OF A MEDICAL DICTIONARY TO SELECT SEARCH WORDS
    John O'Connor
    Computer Science and Electrical Engineering Department
    Packard Lab #19
    Lehigh University
    Bethlehem, Pennsylvania 18015
       In a preceding experiment in text-searching retrieval for cancer
    questions, search words were selected by hand with the aid of a medical
    dictionary and cancer textbooks. Recall results were (1) using only stems
    of question words (stemmed by hand): 20%; (2) adding dictionary search
    words: 29%; (3) adding textbook search words as well: 70%. For the
    experiment reported here, computer procedures for using the medical
    dictionary to select search words were developed. Recall results were
    (1) for question stems (computer stemmed): 19%; (2) adding search words
    selected by computer from the dictionary: 24%. Thus, compared with human
    use of the dictionary, the computer procedures were 50% successful. Human
    and computer false retrieval rates were almost equal. Some hypotheses
    about computer selection of search words from textbooks are also
    described.
    (INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp. 477-486, 1986)

8.  IMPLEMENTING AGGLOMERATIVE HIERARCHIC CLUSTERING ALGORITHMS FOR USE IN
    DOCUMENT RETRIEVAL
    Ellen M. Voorhees
    Department of Computer Science
    Cornell University
    Ithaca, New York 14853
       Searching hierarchically clustered document collections can be
    effective [6], but creating the cluster hierarchies is expensive, since
    there are both many documents and many terms. However, the information in
    the document-term matrix is sparse: documents are usually indexed by
    relatively few terms. This paper describes the implementations of three
    agglomerative hierarchic clustering algorithms that exploit this sparsity,
    so that collections much larger than the algorithms' worst-case running
    times would suggest can be clustered. The implementations described in
    the paper have been used to cluster a collection of 12,000 documents.
    (INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp. 465-476, 1986)
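
The role of sparsity can be illustrated with a Python sketch (not Voorhees's implementation). Document-document similarities are accumulated through an inverted file, so only pairs of documents that share at least one term are ever touched; single-link clustering, one standard agglomerative hierarchic method, then follows from processing those pairs in decreasing order of similarity and merging clusters with a union-find structure. Binary term sets and a shared-term count as the similarity are simplifying assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def sparse_similarities(docs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count shared terms, visiting only document pairs that co-occur in a posting list."""
    postings: Dict[str, List[str]] = defaultdict(list)
    for doc_id in sorted(docs):            # sorted, so each pair is keyed (smaller, larger)
        for term in docs[doc_id]:
            postings[term].append(doc_id)
    sims: Dict[Tuple[str, str], int] = defaultdict(int)
    for doc_ids in postings.values():
        for i, a in enumerate(doc_ids):
            for b in doc_ids[i + 1:]:
                sims[(a, b)] += 1
    return dict(sims)


def single_link(docs: Dict[str, Set[str]]) -> List[Tuple[str, str, int]]:
    """Single-link merges (doc in one cluster, doc in the other, similarity), in order."""
    parent = {d: d for d in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges: List[Tuple[str, str, int]] = []
    edges = sorted(sparse_similarities(docs).items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), sim in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:               # the pair links two different clusters
            parent[root_b] = root_a
            merges.append((a, b, sim))
    return merges
```

For four documents where d1 and d2 share two terms, d1 and d3 share one, and d3 and d4 share one, only those three pairs are scored (the other three pairs are never compared), and single_link returns the merges [("d1", "d2", 2), ("d1", "d3", 1), ("d3", "d4", 1)].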

9.  NATIONAL SCIENCE FOUNDATION SUPPORT FOR COMPUTER AND INFORMATION SCIENCE
    AND ENGINEERING
    Harold E. Bamford and Charles N. Brownstein
    National Science Foundation
    Washington, D. C. 20550
       The National Science Foundation has supported research in the
    information sciences for 25 years, initially through its Office of
    Scientific Information, later through the Office of Science Information
    Service and the Division of Science Information, and most recently through
    the Division of Information Science and Technology. The Foundation has
    also supported research in computer science and engineering, most recently
    through the Division of Computer Research and the Division of Computer and
    Information Engineering. On May 1, 1986, all these elements were brought
    together to form the Directorate for Computer and Information Science and
    Engineering (CISE), one of the five research branches of the Foundation.
    A more persuasive demonstration of the Foundation's commitment to this
    dynamic new field of research would hardly have been possible.
    (INFORMATION PROCESSING & MANAGEMENT, Vol. 22, No. 6, pp. 449-452, 1986)
[Note: continued in next 3 issues - Ed]

------------------------------
     
END OF IRList Digest
********************