Date:     Thu, 10 Oct 85 12:55 EST
To:       irdis at vpi
Subject:  IRList Digest V1 #14

IRList Digest           Thursday, 10 Oct 1985      Volume 1 : Issue 14

Today's Topics:
   Query - Pointers to commercial and research IR systems
   Article - Dissertation Abstract on Clustering & Cluster Search
   Announcement - Seminar on Form of Linguistic Knowledge
		  [rest of digest extracted from AIList issues]
                Discussion - change to workstation computing environment
                Discussion - design issues for building environments
                Tech reports & bibliography - more sources
   References - CSLI Reports
              - Recent Technical Reports
              - Recent Articles
              - Recent Articles

----------------------------------------------------------------------

From: Abel.pa@XEROX
Date: Tue, 1 Oct 85 10:25:25 PDT
Subject: Re: IRList Digest V1 #12

Ed,

In your editorial comment on the query from Ramesh Astik [...]
you mentioned commercial systems and some expert systems which may meet
Ramesh's needs.  I would be interested in pointers to literature about
these systems.  If you feel that there is general interest in the list
of these references, then please post it on the IRlist.  If not, then I
would appreciate your sending it to me.  
	Thanks,

	Mark Abel
	System Concepts Laboratory
	Xerox PARC
 [Mark:
     In my earlier comment I mentioned a good reference book on the
   area.  There is a good deal of literature, appearing in Journal
   of the Amer. Soc. of Inf. Science, Journal of Documentation, Inf.
   Proc. and Management, and various ACM publications.  Two of the
   earliest and best known systems are SMART (worked on at Harvard,
   Cornell, VPI, etc.) and SIRE (developed at Syracuse and George
   Mason Univ.).  A new version of SMART is available from Chris Buckley 
   or Gerard Salton at Cornell, for UNIX 4.2BSD.  SIRE, marketed by 
   KNM, Inc. (contact gmu90x!mkoll@seismo) is similar to SMART.  
   AIRS, Inc., in MD, is developing a new product based on some work
   in SMART, which should have a nice user interface.
     The Montreal Conf. in June 1985 led to Proc. of the Eighth Annual 
   Int. ACM SIGIR Conf. on R&D in Inf.  Ret., ACM Order No. 606850, 
   which has some papers of interest.  RUBRIC (p.243-51) is one expert 
   system being developed (contact fuzzy1@aids-unix.arpa).  Another, 
   CODER, being implemented at Virginia Tech, is previewed (p. 42-53).  
   A paper by Williamson (p. 252-266) mentions a new commercial system 
   derived from the early work on SMART.  
     Marcus, at MIT, has worked on CONIT, which gives a uniform interface 
   to many information vendors and databases.  Current efforts include
   adding search expertise.  This approach differs from most of the ones
   above, since it is a front-end to existing systems.
     Many of the developers of systems mentioned here, and of others,
   are IRList readers, so I hope there will be more comments! - Ed]

------------------------------

From: Ellen Voorhees <ellen%gvax.cs.cornell.edu@CSNET-RELAY>
Date: Fri, 4 Oct 85 14:34:35 edt
Subject: Thesis Abstract

    The following is the abstract of my doctoral dissertation.  The thesis
was written under the direction of Gerard Salton at Cornell University.
It will soon be a technical report;  anyone interested in obtaining
a copy may do so by writing to
     Technical Reports Librarian
     405 Upson Hall
     Department of Computer Science
     Cornell University
     Ithaca, New York 14853.          

Ellen Voorhees
ellen@gvax.cs.cornell.edu (arpanet)
{ihnp4,allegra}!cornell!ellen (UUCP)
crnlcs%ellen (BITNET)



  The Effectiveness and Efficiency of Agglomerative Hierarchic Clustering
                        in Document Retrieval

    The major component of a document retrieval system is the component that
searches the document collection and selects the documents to be returned in
response to a query.  Since users wait for the results of the search, the
component must be efficient as well as effective.  The main goal of this thesis
is to compare clustered file searches and inverted file searches in order to
determine under what circumstances one search is to be preferred over the
other.  A preliminary goal is to define a good cluster search.

    Three types of agglomerative clustering strategies, the single link, the
complete link, and the group average link methods, are investigated.  Searches
of the single link hierarchy, the cluster hierarchy used extensively in
previous research, are shown to be inferior to searches of the other hierarchy
types.  Searches of the group average link and complete link hierarchies
perform similarly for small collections; for larger collections, searches of
the complete link hierarchy are more effective.  A top-down search of the
group average link hierarchy is the most time efficient search asymptotically.
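
    As an illustration of the three strategies compared above, here is a
minimal Python sketch.  It is not the code used in the thesis.  It assumes
documents are term-weight dictionaries compared by cosine similarity; the
names cosine, LINKAGE and agglomerate are hypothetical, and the naive loop is
meant for toy inputs only.  The three methods differ only in how the
similarity between two clusters is derived from the pairwise document
similarities.

```python
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two term-weight dicts (assumed measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sum(w * w for w in u.values()) ** 0.5
    nv = sum(w * w for w in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0


# How each method scores a pair of clusters from the cross-pair similarities.
LINKAGE = {
    "single": max,                                  # most similar pair
    "complete": min,                                # least similar pair
    "average": lambda sims: sum(sims) / len(sims),  # mean over all pairs
}


def agglomerate(docs, method):
    """Merge the two most similar clusters until one remains.

    Returns the merges in order as (left, right, similarity), where each
    cluster is a tuple of document indexes.
    """
    link = LINKAGE[method]
    clusters = [(i,) for i in range(len(docs))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            sims = [cosine(docs[i], docs[j]) for i in a for j in b]
            score = link(sims)
            if best is None or score > best[0]:
                best = (score, a, b)
        score, a, b = best
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((a, b, score))
    return merges


# Four toy documents forming a chain: 0 ~ 1 ~ 2 ~ 3.
DOCS = [{"a": 1}, {"a": 1, "b": 1}, {"b": 1, "c": 1}, {"c": 1}]
```

    On the toy chain in DOCS, all three methods make the same first two
merges and differ only in the similarity assigned to the final merge, which
is the quantity that shapes the resulting hierarchy.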

    The experimental evidence suggests that the difference in the efficiency
and effectiveness of the complete link and group average link searches is due
to the restricted depth of the complete link hierarchy.  The depth of the group
average link hierarchy increases as the size of the collection increases, but
the depth of the complete link hierarchy does not.  Thus the largest clusters
in the complete link hierarchy are not very large, and the clusters can be
accurately represented by centroids.  Since the depth of the hierarchy does not
increase with collection size, searches of the complete link hierarchy should
remain effective for larger collections.

    The top-down search of the complete link hierarchy is somewhat more
effective than the inverted file search.  The relative efficiency of the two
searches depends on the relative efficiency of accessing a page and computing
a similarity, since the cluster search accesses many more pages but computes
fewer similarities than the inverted file search.  For an inexpensive
similarity measure, the inverted file search is much more efficient.
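
    The trade-off described in the last paragraph can be sketched with a
simple linear cost model, written here in Python.  The model, the function
names, and the page and similarity counts below are illustrative assumptions,
not figures from the thesis: each search is charged a fixed cost per page
accessed and a fixed cost per similarity computed.

```python
def search_cost(pages, similarities, page_cost, sim_cost):
    """Total cost of one search under a simple linear cost model."""
    return pages * page_cost + similarities * sim_cost


def cheaper_search(cluster, inverted, page_cost, sim_cost):
    """Return 'cluster' or 'inverted', whichever costs less.

    cluster and inverted are (pages, similarities) tuples; ties go to
    'inverted'.
    """
    c = search_cost(*cluster, page_cost, sim_cost)
    i = search_cost(*inverted, page_cost, sim_cost)
    return "cluster" if c < i else "inverted"


def break_even_ratio(cluster, inverted):
    """sim_cost / page_cost ratio above which the cluster search is cheaper.

    Assumes the cluster search reads more pages and computes fewer
    similarities than the inverted file search.
    """
    extra_pages = cluster[0] - inverted[0]
    saved_sims = inverted[1] - cluster[1]
    return extra_pages / saved_sims


# Made-up (pages accessed, similarities computed) counts for one query.
CLUSTER = (200, 300)
INVERTED = (50, 1500)
```

    With these made-up counts the cluster search only pays off once a
similarity computation costs more than one eighth of a page access, which is
the sense in which an inexpensive similarity measure favors the inverted file.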

------------------------------

From: Peter de Jong <DEJONG%MIT-OZ@MIT-MC>
Date: Mon, 30 Sep 1985  09:41 EDT
Subject: Cognitive Science Calendar


Tuesday  1, October  7:30pm   Room: 34-401 (Grier Conference Room)

            MIT Center For Cognitive Science

           "On the Form of Linguistic Knowledge"

Authors:    Janet Dean Fodor and Stephen Crain
            University of Connecticut

Speaker:    Janet Dean Fodor

Commentary: Professor Howard Lasnik
            Department of Linguistics
            University of Connecticut

            Professor Scott Weinstein
            Department of Philosophy
            MIT

The Center for Cognitive Science is pleased to announce the first
Cognitive Science Seminar of the fall semester.  The seminars will
again be held on Tuesday evenings from 7:30 to 9:30 pm.

Each program will consist of presentations of recent papers in
Linguistics, Philosophy, Psychology and Artificial Intelligence.
The format of each program includes commentary, reply, open discussion
and a brief refreshment break. 

Copies of the paper are available from Karen Persinger
Room 20B-225  (617) 253-7358

------------------------------

Date: Tue 24 Sep 85 11:10:07-PDT
From: Ana Haunga <HAUNGA@SUMEX-AIM.ARPA>
Subject: Seminar - KSL/Symbolic Systems Resources Group (SU)

[Note: all of the issues below impact information storage and retrieval
research and practice.  How about some discussion on this now!  Would
someone be interested in running a panel on this topic at some upcoming 
conference? - Ed]
		  
      [rest of digest extracted from AIList issues]
[Copied from AIList Digest Sunday, 29 Sep 1985  Volume 3 # 131 - Ed]


                     KSL/Symbolic Systems Resources Group

                        Tom Rindfleisch and Bill Yeager
                              Stanford University

This is the first of several SIGLunches this fall that will summarize work in
each of the five sublabs of the Stanford Knowledge Systems Laboratory (KSL),
including the Heuristic Programming Project, HELIX Group, Medical Computer
Science Group, Logic Group, and Symbolic Systems Resources Group (SSRG).  This
week's talk will consist of a brief overview of the KSL as an AI laboratory and
a survey of SSRG research and development activities.

Since 1980, the computing environment for KSL research has been moving slowly
away from central time-shared mainframes (like the SUMEX 2060 and VAX) toward
networked Lisp workstations.  Improvements in workstation performance, falling
prices, better packaging, and a wider vendor selection are now accelerating
this transition.  Over the next five years, we are proposing to phase out the
SUMEX research mainframes so that all KSL computing will be workstation-based
-- not only research program development but common tasks like text processing,
mail, file management, and budgeting.  This raises several important issues
that will require a community system software effort comparable to that in the
1970's that led to the current TOPS-20 and UNIX environments.

How can the user computing environment be improved using workstation bitmapped
graphics and AI methods for more intelligent systems/applications programs?

How can user displays connect flexibly to workstations -- from home, over
remote networks like ARPANET, and locally over Ethernet?

How can the considerable computing power distributed among many workstations be
combined to support individual user tasks?

What are the impacts on network protocols and services (file servers, gateways,
printing, etc.) of large numbers of workstations?

------------------------------

Date: Tue 24 Sep 85 10:35:38-PDT
From: Terry Winograd <WINOGRAD@SU-CSLI.ARPA>
Subject: Seminar Series - Software Environments (CSLI)

[Copied from AIList Digest Sunday, 29 Sep 1985  Volume 3 # 131 - Ed]


New project meeting on environments
Mondays 1-2 in the trailer classroom, Ventura
[Future meetings will be from 12 to 1:15.]

Beginning Monday, Sept. 30 there will be a weekly meeting on
environments for working with symbolic structures (this includes
programming environments, specification environments,  document
preparation environments, "linguistic workstations", and grammar-
development environments).  As a part of doing our research, many
of us at CSLI have developed such environments, sometimes as a matter
of careful design, and sometimes by the seat of the pants.  In this
meeting we will present to each other what we have done, and also look
at work done elsewhere (both through guest speakers and reading
discussions).

The goal is to look at the design issues that come up in building
environments and to see how they have been approached in a variety of
cases.  We are not concerned with the particular details ("pop-up menus
are/aren't better than pull-down menus") but with more fundamental
problems.  For example:

  What is the nature of the underlying structure the environment supports:
  chunks of text? a data-base of relations? a tree or graph structure?
  How is this reflected in the basic mode of operation for the user?

  How does the user understand the relation between objects (and
  operations on them) that appear on the visible representation (screen
  and/or hardcopy) and the corresponding objects (and operations) on
  some kind of underlying structure?  How is this maintained in a
  situation of multiple presentations (different views and/or multiple
  windows)?  How is it maintained in the face of breakdown (system
  failure or catastrophic user error in the middle of an edit, transfer,
  etc.)?

  Does the environment deal with a distributed network of storage and
  processing devices?  If so, does it try to present some kind of
  seamless "information space" or does it provide a model of objects
  and operations that deals with moving things (files, functions, etc.)
  from one "place" to another, where different places have relevant
  different properties (speed of access, security, shareability, etc.)?

  How is consistency maintained between separate objects that are
  conceptually linked (source code and object code, formatter source
  and printer-ready files, grammars and parse-structures generated from
  them, etc.)?  To what extent is this simply left to user convention,
  supported by bookkeeping tools, or automated?

  What is the model for change of objects over time?  This includes
  versions, releases, time-stamps, reference dates, change logs, etc.
  How is information about temporal and derivational relationships
  supported within the system?

  What is the structure for coordination of work?  How is access to the
  structures regulated to prevent "stepping on each other's toes"? to
  facilitate joint development? to keep track of who needs to do what
  when?

Lurking under these are the BIG issues of ontology, epistemology,
representation, and so forth.  Hopefully our discussions on a more down-
to-earth level will be guided by a consideration of the larger picture
and will contribute to our understanding of it.

The meeting is open to anyone who wishes to attend.  Topics will be
announced in advance in the newsletter.  The first meeting will
be devoted to a general discussion of what should be addressed and to
identifying the relevant systems (and corresponding people) within
CSLI, and within the larger (Stanford, Xerox, SRI) communities in
which it exists.

------------------------------

Date: 27 Sep 1985 20:03-CST
From: leff%smu.csnet@CSNET-RELAY.ARPA
Subject: Bibliographies

[Copied from AIList Digest Sunday, 29 Sep 1985  Volume 3 # 130 - Ed]

You have probably noticed the announcement of the new tech
report list.  [...]  The thing that started me on this
was the response of AIList readers to my lists of tech reports.
From what filled up my mailbox, it was obvious that many if not
most of your readers were not seeing the tech report lists and
a substantial fraction of those did not even know that tech reports
existed!  Hopefully this list will serve a useful function for
everyone.  It was something that should have been done a long
time ago.  [...]

I have increased the number of magazines from which my bibliographies
(type 1) are drawn.  We now have added ComputerWorld as well as a
few minor magazines.  ComputerWorld did a very good job on IJCAI-85
and I found material there that was no place else.  [...]

According to bib, we now have 430 documents sent to you since I
changed formats to machine readable.  This does not include
information sent to you in other formats.


  [I would like to thank Laurence for providing his services to
  AIList and the net community.  It's a heck of a hobby, but he
  does a great job.  -- KIL]
  [We are also grateful!  In IRList I will copy only those items
  I think will be of interest to IRList readers. - Ed]
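
The machine-readable entries in these lists use %-tagged fields (%A author,
%T title, %J journal, %D date, and so on), which appears to be the refer/bib
convention.  As a rough illustration, the Python sketch below reads such
records into dictionaries.  It assumes records are separated by blank lines
and that an untagged line continues the previous field; the name parse_refer
is hypothetical and this is not the bib program itself.

```python
def parse_refer(text):
    """Parse %-tagged bibliography records into a list of dicts.

    Each dict maps a one-letter tag to a list of values, so repeated
    fields such as %A (author) are all kept.
    """
    records = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record (assumed separator).
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line.startswith("%") and len(line) >= 2 and line[1].isalpha():
            last_key = line[1]
            current.setdefault(last_key, []).append(line[2:].strip())
        elif last_key is not None:
            # Untagged line: continuation of the previous field.
            current[last_key][-1] += " " + line.strip()
    if current:
        records.append(current)
    return records


SAMPLE = """\
%A Masaru Tomita
%T Feasibility Study of Personal/Interactive Machine Translation
Systems
%D JUL 1985
%I Carnegie Mellon Computer Science

%A Jeannette M. Wing
%A Farhad Arbab
%T Geometric Reasoning: A New Paradigm for Processing Geometric Information
%D JUL 1985
%I Carnegie Mellon Computer Science
"""

for record in parse_refer(SAMPLE):
    print("; ".join(record["A"]), "-", record["T"][0])
```

Keeping every field as a list is a design choice made so that multi-author
entries need no special handling.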

------------------------------

Date: Wed 25 Sep 85 16:54:08-PDT
From: Emma Pease <Emma@SU-CSLI.ARPA>
Subject: New CSLI Reports

[Copied from AIList Digest Sunday, 29 Sep 1985  Volume 3 # 129 - Ed]

                             NEW CSLI REPORTS

      Report No. CSLI-85-31, ``A Formal Theory of Knowledge and Action''
   by Robert C. Moore, and Report No. CSLI-85-32, ``Finite State
   Morphology: A Review of Koskenniemi'' by Gerald Gazdar, have just been
   published.  These reports may be obtained by writing to David Brown,
   CSLI, Ventura Hall, Stanford, CA 94305 or Brown@SU-CSLI.

------------------------------

Date: 27 Sep 1985 19:49-CST
From: leff%smu.csnet@CSNET-RELAY.ARPA
Subject: Recent Technical Reports

[Copied from AIList Digest Sunday, 29 Sep 1985  Volume 3 # 129 - Ed]

Addresses for requests:

Carnegie Mellon University, CS Department, Pittsburgh, PA 15213
Stanford University, Department of Computer Science, Stanford, CA 94305

%A Anne von der Leith Gardner
%T An Artificial Intelligence Approach to Legal Reasoning
%R STAN-CS-85-1045
%I Stanford University Department of Computer Science
%D JUN 1984, 205 pages (microfiche only $2.00)

%A Masaru Tomita
%T An Efficient Context-free Parsing Algorithm for Natural Languages
and its Applications
%D MAY 1985
%I Carnegie Mellon Computer Science

%A Gary L. Bradshaw
%T Learning to Recognize Speech Sounds: A Theory and Model
%D JUN 1985
%I Carnegie Mellon Computer Science

%A Masaru Tomita
%T Feasibility Study of Personal/Interactive Machine Translation
Systems
%D JUL 1985
%I Carnegie Mellon Computer Science

%A Jaime G. Carbonell
%A Masaru Tomita
%T New Approaches to Machine Translation
%D JUL 1985
%I Carnegie Mellon Computer Science

%A Robert E. Frederking
%T Syntax and Semantics in Natural Language Parsers
%D MAY 1985
%I Carnegie Mellon Computer Science

%A Jeannette M. Wing
%A Farhad Arbab
%T Geometric Reasoning: A New Paradigm for Processing Geometric Information
%D JUL 1985
%I Carnegie Mellon Computer Science

------------------------------

Date: 27 Sep 1985 20:02-CST
From: leff%smu.csnet@CSNET-RELAY.ARPA
Subject: Recent Articles

[Copied from AIList Digest Sunday, 29 Sep 1985  Volume 3 # 129 - Ed]

%A Elaine Marsh
%A Carol Friedman
%T Transporting the Linguistic String Project from a Medical
to a Navy Domain
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Jonathan Slocum
%A Carol F. Justus
%T Transportability to Other Languages: The Natural Language
Processing Project in the AI Program at MCC
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Samuel S. Epstein
%T Transportable Natural Language Processing Through Simplicity
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Bozena Henisz Thompson
%A Frederick B. Thompson
%T ASK is Transportable in Half a Dozen Ways
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Fred J. Damerau
%T Problems and Some Solutions in Customization of Natural Language Database
Front Ends
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Carole D. Hafner
%A Kurt Golden
%T Portability of Syntax and Semantics in Datalog
%J ACM Transactions on Office Information Systems
%D APR 1985

[Note: Thanks to Bruce Ballard for editing this special issue of
TOOIS, and to the authors for giving a nice rounded presentation. - Ed]

%A Appelbaum
%A Ruspini
%T ARIES: A Tool for Inference Under Conditions of Imprecision and
Uncertainty
%J BOOK15

%A Klinger
%T Search Processes for the Application of Artificial Intelligence
%J BOOK15

%A Rosenberg
%T ERIK, An Expert Ship Message Interpreter: New Mechanism for Flexible
Parsing
%J BOOK15

%A Obermeier
%A de Hilster
%T DIID -- A Data-independent Interface for Databases -- The
AI Perspective
%J BOOK15

%A Roger Schank
%A Steven Shwartz
%T The Role of Knowledge Engineering in Natural Language Systems
%J BOOK17

%A Jaime Carbonell
%T The Role of User Modeling in Natural Language Interface Design
%J BOOK17

%A Gerald Hice
%A Stephen Andriole
%T Artificial Intelligent Videotex
%J BOOK17

%A Dexter Fletcher
%T Intelligent Instructional Systems in Training
%J BOOK17

------------------------------

Date: 27 Sep 1985 20:00-CST
From: leff%smu.csnet@CSNET-RELAY.ARPA
Subject: Recent Articles

[Extracted from AIList Digest Sunday, 29 Sep 1985  Volume 3 # 130 - Ed]

%A A. Lansner
%A O. Ekeberg
%T Reliability and Speed of Recall in an Associative Network
%J IEEE Transactions on Pattern Analysis and Machine Intelligence
%V PAMI-7
%N 4
%D JUL 1985
%P 490-498

%A Sol Libes
%T Bytelines
%J Byte
%D SEP 1985
%P 420
%V 10
%N 9
%X Kurzweil Applied Intelligence speech recognition KVS-3000
%X Kurzweil has introduced a KVS-3000 that can handle 1000 words
continuous speech with 100 per cent accuracy.  It is selling at
$3000.00 in quantity and comes in PC, multibus and RS232C versions.
It is speaker adaptive and its performance increases the more
it talks with the same user.

%A Jean Renard Ward
%A Barry Blesser
%T Interactive recognition of Handprinted Characters for Computer
Input
%J IEEE Computer Graphics and Applications
%V 5
%N 9
%P 24-37
%K Character Recognition Pencept
%X discusses the human interface issues once you have character
recognition on your computer; i.e., how best to interface handwritten
character recognition with your product.

%A S. Jerrold Kaplan
%T Designing a Portable Natural Language Database Query System
%J ACM Transactions on Database Systems
%V 9
%N 1
%D MAR 1984
%P 1-19

%A C. Hornsby
%A H. C. Leung
%T The Design and Implementation of a Flexible Retrieval Language
for a Prolog Database System
%J SIGPLAN
%V 20
%N 9
%D SEP 1985
%P 43-51
%X implementation of a database management system in PROLOG

%A Jeffry Beeler
%T Symantec package out
%J ComputerWorld
%V 19
%N 38
%P 12
%D SEP 23, 1985
%K Symantec natural language microcomputer data base system interface
%X a new product which uses natural language to interface with a
database

------------------------------

END OF IRList Digest
********************