Date:     Tue, 11 Feb 86 00:58 EST
To:       irdis at vpi
Subject:  IRList Digest V 2 #8

IRList Digest           Monday, 10 Feb 1986      Volume 2 : Issue 8
 
Today's Topics:
   Email - Change in Origin of IRList (repeat of msg in Issue 7)
   Query - Opportunities for well trained graduate with MLS in NYC?
         - Central source on new technology and development?
   Abstracts - Articles selected by Salton or Raghavan (pt. 2 of 3)
 
----------------------------------------------------------------------
 
>From fox Mon Feb  3 09:40 EST 1986
Subject: Changing of site of origin of IRlist (repeat of msg in Issue 7)
 
Dear IRList Subscribers,
 
In the interest of reliability and cost savings, IRlist will be sent from
   seismo!vtisr1!irlistrq
instead of
   fox%vpi@csnet-relay
 
To be sure there is no mishap, Issue 7 is being sent from irlistrq with
this message, and Issue 8 is being sent from the CSNET address above.  If
you receive one issue but not the other, please notify me at one of the
addresses given below. Also, if you are missing any issues (V1 1-28, V2 1-8),
feel free to contact me if you can't get a copy from another user or site.
 
I hope this works smoothly! - Ed
 __________________________________________________________________
UUCP: seismo!vtisr1!irlistrq
ARPA: vtisr1!irlistrq@seismo foxea@vtvax3.bitnet@wiscvm
      fox@vtcs1.bitnet@wiscvm fox%vpi@csnet-relay
CSNET:fox@vpi
BITNET:foxea@vtvax3 fox@vtcs1
 
------------------------------
 
From: KJP%ibm-sj.arpa@CSNET-RELAY
Date: 4 Feb 86 13:50:36 EST
 
 ...
   [...]   just completed her MLIS degree in
Library and Information Science from Berkeley.  She concentrated in
Information Systems (i.e., took several programming, data-base, etc. courses)
and is now looking for a job in the New York City area.  Her ultimate
goal is to go into MIS or become a data-base administrator.  The problem
that she is encountering is that, on the East Coast, people look at an
MLS degree and conclude that you are able to do only library work.  On
the other hand, had she remained in the San Francisco area where companies
know about Berkeley's MLIS program, she could have had two job offers:
as a programmer with [...], or as a data-base administrator with a
[...] company.  So, my question for you is: Do you know of any
companies in the NY area which look beyond the "MLS" label to see that
this degree is well-suited for non-traditional "library" jobs?  Any help
would be appreciated.
 
  Thanks,
   Ken Perry
[Note: Information Science is indeed an area where employers must really
look at the individual's background and gauge ability for the task at hand!
The person mentioned might try the Information Industry Association, 316
Penn. Ave., SE, Ste. 400, Washington D.C. 20003 (202) 544-1969 or
JOBLINE at American Society for Information Science, 1424 Sixteenth St., N.W.,
Suite 404, Washington D.C. 20036 (202) 462-1000 to file a resume and ask to
be listed in announcements.
    Readers - if you have other suggestions, send them to Ken, or to me
to forward. - Ed]
 
------------------------------
 
Date:         3-FEB-1986 16:20:31
From:        ARCHIVE%vax3.oxford.ac.uk@cs.ucl.ac.uk
     
...
I'd be interested in IRList - I do a lot of work at present in that area
using special architectures to give content addressing capabilities. Use my
personal account LOU @ OX.VAX1 rather than ARCHIVE tho.
...
Incidentally, a company called Sydney has been showing a CD-ROM version of the
Library of Congress Catalogue around here lately; also word has reached us
from California of a CD-ROM version of the Thesaurus Linguae Graecae. Is there
any central place where information about these evolving technologies can be
obtained?
     
Best wishes, Lou Burnard
     
[Note: There is an annual publication of the Amer. Society of Inf. Science
called ARIST.  Volume 20 is the next due.  They have good surveys on many topics
such as one in Vol. 19 by Chuck Goldstein on Storage Technology.  ASIS has 
numerous special interest groups to try to cover the field.  [Too] many 
conferences are being held -- March 4-7 in Seattle will be the 1st Int'l Conf. 
on CD ROM.
    Does anyone have other comments on information sources? - Ed]
 
------------------------------
 
From: "V.J. Raghavan"<raghavan%uregina1.bitnet@CSNET-RELAY>
Date: Fri, 24 Jan 86 19:20:08 cst
To: IRList%vpi.csnet@CSNET-RELAY
Subject: submission to IR list [long set of abstracts - Ed]
 
 
 
                            ABSTRACTS
 
(Chosen by G.  Salton or V. Raghavan from 1983 issues of journals 
 in the retrieval area)                         
 
11.  INFORMATION RETRIEVAL AT THE SEDGWICK MUSEUM 
 
     M.F. Porter
     Dept. of Earth Sciences, University of Cambridge,
     Downing Street, Cambridge CB2 3EQ, UK
 
          The Sedgwick Museum at the University of Cambridge  now 
     has  a  high  quality  and comprehensive  online  IR  system 
     covering   its  collection  of  450,000  catalogued   fossil 
     objects.    The   indexing  process,   and   the   retrieval 
     capabilities  are  described in detail,  and an  example  is 
     given  of  how  the  IR  system is  used  with  real  museum 
     enquiries.  It is also shown how the IR system is used as an 
     aid  in many different aspects of data management,  such  as 
     catalogue  updating and editing,  and dealing with loans  of 
     specimens and movements of specimens between drawers.
 
     (INFORMATION TECHNOLOGY: RESEARCH & DEVELOPMENT, Vol. 2, No. 
     4, pp. 169-186, 1983)
 
12.  THE UTAH TEXT RETRIEVAL PROJECT
 
     L.A. Hollaar
     Dept. of Computer Science, University of Utah,
     Salt Lake City, UT 84112
 
          The  Utah Text Retrieval Project seeks  well-engineered 
     solutions to the implementation of large (over 50 x 10**9
     characters), inexpensive (less than a dollar a query), rapid 
     (average  response  time  of 10  seconds)  text  information 
     retrieval  systems.   It  was  established in  1980  in  the 
     Department  of  Computer Science at the University of  Utah, 
     and  is an outgrowth of a similar project at the  University 
     of Illinois with which the author was associated.
          At  the  present  time,  the project  has  three  major 
     components.   Perhaps  the  best known is the work  on  the 
     specialized   processors,   particularly   search   engines, 
     necessary to achieve the desired performance and cost.   The 
     other  two concern the user interface to the system and  the 
     system's  internal  structure.   The work on user  interface 
     development  is  not only concentrating on  the  syntax  and 
     semantics  of  the query language,  but also on the  overall 
     environment the system presents to the user.   Environmental 
     enhancements  include  convenient ways to  'browse'  through 
     retrieved documents,  access to other information  retrieval 
     systems   through  gateways  supporting  a  common   command 
     interface,  and interfaces to word processing systems.   The 
     system's  internal  structure is based on a high-level  data 
     communications  protocol linking the user  interface,  index 
     processor, search processor, and other system modules.  This 
     allows  them  to  be  easily  distributed  in  a   multi- or 
     specialized-processor  configuration.   It  also allows  new 
     modules, such as a knowledge-based query reformulator, to be 
     added.
 
     (INFORMATION TECHNOLOGY: RESEARCH & DEVELOPMENT, Vol. 2, No. 
     4, pp. 155-168, 1983)
 
13.  A GENERALIZED TERM DEPENDENCE MODEL IN INFORMATION RETRIEVAL
 
     C.T. Yu
     Dept. of Information Engineering, 
     University of Illinois-Chicago Circle, Chicago, Illinois, 60680
 
     C. Buckley
     Dept. of Computer Science, Cornell University, Ithaca, NY 14853
 
     K. Lam
     Dept. of Statistics, Hong Kong University, Hong Kong
 
     G. Salton
     Dept. of Computer Science, Cornell University, Ithaca, NY 14853
 
          The tree dependence model has been used successfully to 
     incorporate  dependencies between certain term pairs in  the 
     information retrieval process,  while the Bahadur Lazarsfeld 
     Expansion  (BLE)  which specifies dependencies  between  all 
     subsets  of  terms  has  been used  to  identify  productive 
     clusters of items in a clustered database environment.   The 
     successes of these models are unlikely to be accidental;  it 
     is of interest therefore to examine the similarities between 
     the two models.
          The  disadvantage  of the BLE model is the  exponential 
     number  of terms appearing in the full expression,  while  a 
     truncated  BLE  system  may  produce  negative   probability 
     values.   The  disadvantage of the tree dependence model  is 
     the  restriction to dependencies between certain term  pairs 
     only  and  the exclusion of  higher-order  dependencies.   A 
     generalized  term  dependence model is  introduced  in  this 
     study  which does not carry the disadvantages of either  the 
     tree  dependence  or  the  BLE  models.   Sample  evaluation 
     results  are  included to illustrate the operations  of  the 
     generalized system.
 
     (INFORMATION TECHNOLOGY: RESEARCH & DEVELOPMENT, Vol. 2, No. 
     4, pp. 129-154, 1983)
 
14.  FULLY AUTOMATIC BOOK INDEXING
 
     Martin Dillon
     School of Library Science, University of North Carolina
 
     Laura K. McDonald
     Information Systems, Blue Cross-Blue Shield of North Carolina
 
          The  Fully  Automatic Syntactically-based  Indexing  of 
     Text  (FASIT)  system represents the contents of a  document 
     without  a  full  parse or semantic analysis  of  the  text.  
     Content-bearing  units  are isolated and then  grouped  into 
     quasi-synonymous  classes  whose main term is used to  index 
     the document.   Previous experiments with FASIT demonstrated 
     its  usefulness in an associational  retrieval  environment; 
     the  experiment  described here explores FASIT's value as  a 
     book-indexing  system.    It  is  difficult  to  avoid   the 
     conclusion that this indexing approach offers the promise of 
     being practical and effective.
 
     (JOURNAL OF DOCUMENTATION, Vol. 39, No. 3, pp. 135-154, 1983)
 
15.  EXTENDED BOOLEAN INFORMATION RETRIEVAL
 
     Gerard Salton
     Cornell University
 
     Edward A Fox
     International Institute for Tropical Agriculture, Ibadan, Nigeria
 
     Harry Wu
     ITT Programming Technology Center
 
          A new, extended Boolean information-retrieval system is 
     introduced  that is intermediate between the Boolean  system 
     of  query processing and the vector-processing  model.   The 
     query structure inherent in the Boolean system is preserved, 
     while  at the same time weighted terms may  be  incorporated 
     into both queries and stored documents; the retrieved output 
     can  also be ranked in strict similarity order with the user 
     queries.  A conventional retrieval system can be modified to 
     make use of the extended system.   Laboratory tests indicate 
     that  the extended system produces better  retrieval  output 
     than either the Boolean or the vector-processing system.
 
     (COMMUNICATIONS OF THE ACM, Vol. 26, No. 11, pp. 1022-1036, 1983)
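
     The abstract gives no formulas, so the following is only a sketch
     of the p-norm formulation usually associated with this model.  The
     function and parameter names are illustrative assumptions.  Term
     weights are taken to lie between 0 and 1.  With p = 1 the AND and
     OR scores coincide (a vector-like average); as p grows they approach
     the strict Boolean min and max.

```python
from typing import Sequence


def _check(doc_weights: Sequence[float], query_weights: Sequence[float]) -> float:
    """Validate inputs and return the normalising sum of query weights (p applied later)."""
    if len(doc_weights) != len(query_weights):
        raise ValueError("doc_weights and query_weights must have the same length")
    if not any(a > 0 for a in query_weights):
        raise ValueError("at least one query weight must be positive")
    return 0.0


def pnorm_or(doc_weights: Sequence[float],
             query_weights: Sequence[float],
             p: float = 2.0) -> float:
    """Similarity of a document to an OR query under the p-norm sketch."""
    _check(doc_weights, query_weights)
    num = sum((a ** p) * (d ** p) for a, d in zip(query_weights, doc_weights))
    den = sum(a ** p for a in query_weights)
    return (num / den) ** (1.0 / p)


def pnorm_and(doc_weights: Sequence[float],
              query_weights: Sequence[float],
              p: float = 2.0) -> float:
    """Similarity of a document to an AND query under the p-norm sketch."""
    _check(doc_weights, query_weights)
    num = sum((a ** p) * ((1.0 - d) ** p) for a, d in zip(query_weights, doc_weights))
    den = sum(a ** p for a in query_weights)
    return 1.0 - (num / den) ** (1.0 / p)
```

     For a two-term query with equal weights, a document containing
     only one of the terms scores higher under pnorm_or than under
     pnorm_and for any p > 1, and sorting documents by either score
     gives a ranked output.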
 
16.  HIERARCHICAL   FILE  ORGANIZATION  AND  ITS  APPLICATION  TO 
     SIMILAR-STRING MATCHING
 
     Tetsuro Ito and Makoto Kizawa
     University of Library and Information Science,
     Ibaraki, Japan
 
          The  automatic  correction  of  misspelled  inputs   is 
     discussed from a viewpoint of similar-string matching.  First 
     a  hierarchical file organization based on a linear ordering 
     of  records  is  presented  for  retrieving  records  highly 
     similar  to any input query.   Then the spelling problem  is 
     attacked  by constructing a hierarchical file for a  set  of 
     strings  in  a dictionary of English  words.   The  spelling 
     correction  steps proceed as follows:   (1) find one of  the 
     best-match  strings which are most similar to a  query,  (2) 
     expand the search area for obtaining the good-match strings, 
     and  (3) interrupt the file search as soon as  the  required 
     string  is displayed.   Computational experiments verify the 
     performance  of  the  proposed  methods  for  similar-string 
     matching under the UNIX time-sharing system.
 
     (ACM TRANSACTIONS ON DATABASE SYSTEMS,  Vol.  8,  No. 3, pp. 
     410-433, 1983)
 
17.  INDEXING AND RETRIEVAL STRATEGIES FOR NATURAL LANGUAGE  FACT 
     RETRIEVAL
 
     Janet L. Kolodner
     Georgia Institute of Technology
 
          Researchers  in  artificial intelligence have  recently 
     become   interested  in  natural  language  fact  retrieval; 
     currently,  their research is at a point where it can  begin 
     contributing to the field of Information Retrieval.  In this 
     paper,  strategies  for  a natural language  fact  retrieval 
     system  are  mapped  out,  and  approaches to  many  of  the 
     organization  and  retrieval problems  are  presented.   The 
     CYRUS system,  which keeps track of important people and  is 
     queried  in  English,  is presented and used  to  illustrate 
     those solutions.
 
     (ACM TRANSACTIONS ON DATABASE SYSTEMS,  Vol.  8,  No. 3, pp. 
     434-463, 1983)
 
18.  PARTIAL MATCH RETRIEVAL USING HASHING AND DESCRIPTORS
 
     K. Ramamohanarao, John W. Lloyd, and James A. Thom
     University of Melbourne
 
          This  paper  studies a partial-match  retrieval  scheme 
     based  on hash functions and descriptors.   The emphasis  is 
     placed  on  showing  how the use of a  descriptor  file  can 
     improve the performance of the scheme.   Records in the file 
     are  given  addresses according to hash functions  for  each 
     field in the record.  Furthermore, each page of the file has 
     associated with it a descriptor, which is a fixed-length bit 
     string,  determined  by the records actually present in  the 
     page.   Before  a  page  is accessed to see if  it  contains 
     records  in the answer to a query,  the descriptor  for  the 
     page  is  checked.   This  check may show that  no  relevant 
     records are on the page and,  hence,  that the page does not 
     have  to be accessed.   The method is shown to have  a  very 
     substantial performance advantage over pure hashing schemes, 
     when  some fields in the records have large key  spaces.   A 
     mathematical  model  of the scheme,  plus an  algorithm  for 
     optimizing performance, is given.
     
     (ACM TRANSACTIONS ON DATABASE SYSTEMS,  Vol.  8,  No. 4, pp. 
     552-576, 1983)
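
     The descriptor check described above can be sketched as follows.
     This is a simplified illustration, not the authors' scheme: the
     descriptor width, the use of CRC-32 to pick a bit, and all names
     are assumptions, and the hash-based assignment of records to pages
     is left out (pages are simply given as lists).

```python
import zlib
from typing import List, Optional, Sequence, Tuple

Record = Tuple[str, ...]
Query = Sequence[Optional[str]]   # None marks an unspecified field

DESCRIPTOR_BITS = 64              # hypothetical fixed descriptor length


def field_bit(field_index: int, value: str) -> int:
    """Map one (field, value) pair to a single bit of the descriptor."""
    key = f"{field_index}:{value}".encode("utf-8")
    return 1 << (zlib.crc32(key) % DESCRIPTOR_BITS)


def page_descriptor(records: Sequence[Record]) -> int:
    """OR together the bits of every field value actually present in the page."""
    descriptor = 0
    for record in records:
        for i, value in enumerate(record):
            descriptor |= field_bit(i, value)
    return descriptor


def query_descriptor(query: Query) -> int:
    """Bits that any matching record must have set in its page descriptor."""
    descriptor = 0
    for i, value in enumerate(query):
        if value is not None:
            descriptor |= field_bit(i, value)
    return descriptor


def matches(record: Record, query: Query) -> bool:
    """True if the record agrees with the query on every specified field."""
    return all(q is None or q == v for v, q in zip(record, query))


def partial_match(pages: Sequence[Sequence[Record]],
                  query: Query) -> Tuple[List[Record], int]:
    """Return matching records and the number of pages actually read."""
    descriptors = [page_descriptor(page) for page in pages]  # stored with the file in practice
    wanted = query_descriptor(query)
    hits: List[Record] = []
    accessed = 0
    for page, descriptor in zip(pages, descriptors):
        if (descriptor & wanted) != wanted:
            continue              # descriptor proves no record here can match
        accessed += 1
        hits.extend(r for r in page if matches(r, query))
    return hits, accessed
```

     A page is skipped only when its descriptor lacks a bit the query
     requires, so no matching record is ever missed; bit collisions can
     only cause an unnecessary page read, never a wrong answer.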
 
19.  OUTLINE OF A GENERAL PROBABILISTIC RETRIEVAL MODEL
 
     Abraham Bookstein
     University of Chicago
 
          For reasons of technical convenience, current retrieval 
     algorithms based on probabilistic reasoning are derived from 
     models  that  assume  patrons  evaluate  documents  using  a 
     two-valued relevance scale.  This paper extends the theory by 
     describing  a model which includes a more general  relevance 
     scale.   This  model permits a re-examination of the earlier 
     theory as a special case of that developed here and leads to 
     a more satisfying interpretation of the ranking principle of 
     the earlier models.
 
     (JOURNAL OF DOCUMENTATION, Vol. 39, No. 2, June 1983, pp. 63-72)
 
20.  TEXT ANALYSIS AND BASIC CONCEPT STRUCTURES
 
     John M. Weiner
     University of Southern California, School of Medicine,
     2025 Zonal Avenue, Los Angeles,  CA 90033,  U.S.A.
 
          Information  specialists frequently are called upon  to 
     analyze unfamiliar subjects.   With the growth in volume and 
     topic,  specialists  will  require techniques to  deal  with 
     textual  material  rapidly  and  effectively.    This  paper 
     describes  a method of text analysis designed to  facilitate 
     extraction  of terms related to a single  characteristic  or 
     concept.  The term extraction is performed by completing the 
     sentence: "The characteristic of interest is described by
     (- descriptive term -)."  Using this method,  the analyst can 
     extract  attributes  of the basic characteristic  and  terms 
     representing  related  characteristics.    With  these   two 
     classes  of  terms,  the analyst can build a  basic  concept 
     structure describing the subject matter.  Prior knowledge of 
     the  subject  is not required.   The method  is  illustrated 
     using pathological descriptions of female genital cancers.
 
     (INFORMATION PROCESSING & MANAGEMENT,  Vol.  19,  No. 5, pp. 
     313-319, 1983)
 
21.  AUTOMATIC  SPELLING  CORRECTION USING A  TRIGRAM  SIMILARITY 
     MEASURE
 
     Richard C. Angell, George E. Freund and Peter Willett
     Department of Information Studies, University of Sheffield
     Sheffield  S10 2TN England
 
          A  nearest neighbour search procedure is described  for 
     the  automatic correction of  misspellings.   The  procedure 
     involves  the replacement of a misspelt word by that word in 
     a dictionary which best matches the misspelling,  the degree 
     of  match  being calculated using a  similarity  coefficient 
     based  on  the number of trigrams common to the  two  words.  
     Experiments  with  a collection of 1544 misspellings  and  a 
     dictionary  of  64,636  words  suggest  that  the  procedure 
     results in the unique identification of the correct spelling 
     for over 75% of the misspellings if the correct form of  the 
     word  is  in  the dictionary,  and that this figure  may  be 
     increased  to  over  90%  if  near,   rather  than  nearest, 
     neighbours are acceptable.
 
     (INFORMATION PROCESSING & MANAGEMENT,  Vol.  19,  No. 4, pp. 
     255-261, 1983)
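
     A minimal sketch of trigram-based nearest-neighbour correction is
     given below.  The abstract says only that the coefficient is based
     on the number of common trigrams; the Dice form 2c/(n1 + n2), the
     two-blank padding at each end of a word, and all function names
     are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def trigrams(word: str, pad: str = "  ") -> Counter:
    """Multiset of character trigrams of the blank-padded, lower-cased word."""
    padded = f"{pad}{word.lower()}{pad}"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def dice(a: str, b: str) -> float:
    """Dice coefficient 2c / (n1 + n2) over the trigrams of the two words."""
    ta, tb = trigrams(a), trigrams(b)
    common = sum((ta & tb).values())          # trigrams shared by both words
    return 2.0 * common / (sum(ta.values()) + sum(tb.values()))


def nearest(word: str, dictionary: Iterable[str]) -> str:
    """Dictionary entry that best matches the (possibly misspelt) word."""
    return max(dictionary, key=lambda entry: dice(word, entry))


def near_neighbours(word: str, dictionary: Iterable[str], k: int = 3) -> List[str]:
    """The k best-matching entries, for when near neighbours are acceptable."""
    return sorted(dictionary, key=lambda entry: dice(word, entry), reverse=True)[:k]
```

     For example, nearest("retreival", ["retrieval", "retrieve",
     "interval"]) selects "retrieval", which shares 7 of its 11 padded
     trigrams with the misspelling.  A linear scan like this is only
     practical for small word lists; the sketch says nothing about how
     the search over a 64,636-word dictionary was organized.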
 
------------------------------
 
END OF IRList Digest
********************