Date: Thu, 10 Oct 85 12:55 EST
To: irdis at vpi
Subject: IRList Digest V1 #14

IRList Digest           Thursday, 10 Oct 1985      Volume 1 : Issue 14

Today's Topics:
   Query - Pointers to commercial and research IR systems
   Article - Dissertation Abstract on Clustering & Cluster Search
   Announcement - Seminar on Form of Linguistic Knowledge
   [rest of digest extracted from AIList issues]
   Discussion - change to workstation computing environment
   Discussion - design issues for building environments
   Tech reports & bibliography - more sources
   References - CSLI Reports
              - Recent Technical Reports
              - Recent Articles
              - Recent Articles

----------------------------------------------------------------------

From: Abel.pa@XEROX
Date: Tue, 1 Oct 85 10:25:25 PDT
Subject: Re: IRList Digest V1 #12

Ed,

In your editorial comment on the query from Ramesh Astik [...] you mentioned commercial systems and some expert systems which may meet Ramesh's needs. I would be interested in pointers to literature about these systems. If you feel that there is general interest in the list of these references, then please post it on the IRList. If not, then I would appreciate your sending it to me.

Thanks,
Mark Abel
System Concepts Laboratory
Xerox PARC

[Mark: In my earlier comment I mentioned a good reference book on the area. There is a good deal of literature, appearing in Journal of the Amer. Soc. of Inf. Science, Journal of Documentation, Inf. Proc. and Management, and various ACM publications. Two of the earliest and best known systems are SMART (worked on at Harvard, Cornell, VPI, etc.) and SIRE (developed at Syracuse and George Mason Univ.). A new version of SMART is available from Chris Buckley or Gerard Salton at Cornell, for UNIX 4.2BSD. SIRE, marketed by KNM, Inc. (contact gmu90x!mkoll@seismo), is similar to SMART. AIRS, Inc., in MD, is developing a new product based on some work in SMART, which should have a nice user interface.

The Montreal Conf. in June 1985 led to Proc. of the Eighth Annual Int. ACM SIGIR Conf. on R&D in Inf. Ret., ACM Order No. 606850, which has some papers of interest. RUBRIC (p. 243-51) is one expert system being developed (contact fuzzy1@aids-unix.arpa). Another, CODER, being implemented at Virginia Tech, is previewed (p. 42-53). A paper by Wiliamson (p. 252-266) mentions a new commercial system derived from the early work on SMART. Marcus, at MIT, has worked on CONIT, which gives a uniform interface to many information vendors and databases. Current efforts include adding search expertise. This approach differs from most of the ones above, since it is a front-end to existing systems.

Many of the developers of systems mentioned here, and of others, are IRList readers, so I hope there will be more comments! - Ed]

------------------------------

From: Ellen Voorhees
Date: Fri, 4 Oct 85 14:34:35 edt
Subject: Thesis Abstract

The following is the abstract of my doctoral dissertation. The thesis was written under the direction of Gerard Salton at Cornell University. It will soon be a technical report; anyone interested in obtaining a copy may do so by writing to

     Technical Reports Librarian
     405 Upson Hall
     Department of Computer Science
     Cornell University
     Ithaca, New York 14853

Ellen Voorhees
ellen@gvax.cs.cornell.edu (arpanet)
{ihnp4,allegra}!cornell!ellen (UUCP)
crnlcs%ellen (BITNET)

The Effectiveness and Efficiency of Agglomerative Hierarchic Clustering in Document Retrieval

The major component of a document retrieval system is the component that searches the document collection and selects the documents to be returned in response to a query. Since users wait for the results of the search, the component must be efficient as well as effective. The main goal of this thesis is to compare clustered file searches and inverted file searches in order to determine under what circumstances one search is to be preferred over the other. A preliminary goal is to define a good cluster search.

Three types of agglomerative clustering strategies, the single link, the complete link, and the group average link methods, are investigated. Searches of the single link hierarchy, the cluster hierarchy used extensively in previous research, are shown to be inferior to searches of the other hierarchy types. Searches of the group average link and complete link hierarchies perform similarly for small collections; for larger collections, searches of the complete link hierarchy are more effective. A top-down search of the group average link hierarchy is the most time-efficient search asymptotically.

The experimental evidence suggests that the difference in the efficiency and effectiveness of the complete link and group average link searches is due to the restricted depth of the complete link hierarchy. The depth of the group average link hierarchy increases as the size of the collection increases, but the depth of the complete link hierarchy does not. Thus the largest clusters in the complete link hierarchy are not very large, and the clusters can be accurately represented by centroids. Since the depth of the hierarchy does not increase with collection size, searches of the complete link hierarchy should remain effective for larger collections.

The top-down search of the complete link hierarchy is somewhat more effective than the inverted file search. The relative efficiency of the two searches depends on the relative cost of accessing a page and computing a similarity, since the cluster search accesses many more pages but computes fewer similarities than the inverted file search. For an inexpensive similarity measure, the inverted file search is much more efficient.
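The abstract compares three agglomerative linkage rules, which differ only in how the similarity between two clusters is derived from document-document similarities. As a rough illustration (not the thesis's experimental code), the Python sketch below clusters toy document vectors under each rule, assuming cosine similarity as the document-document measure. It also adds a hypothetical linear cost model for the page-access versus similarity-computation tradeoff described at the end of the abstract. The function names, toy vectors and cost parameters are illustrative assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two term-weight vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_similarity(a, b, sim, method):
    """Similarity of clusters a and b (tuples of document indices)."""
    pair_sims = [sim[i][j] for i in a for j in b]
    if method == "single":    # single link: most similar cross-cluster pair
        return max(pair_sims)
    if method == "complete":  # complete link: least similar cross-cluster pair
        return min(pair_sims)
    if method == "average":   # group average link: mean over all cross pairs
        return sum(pair_sims) / len(pair_sims)
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(docs, method):
    """Naive O(n^3) agglomerative clustering (hypothetical helper).

    Returns the merge sequence as (cluster_a, cluster_b, similarity) tuples.
    """
    n = len(docs)
    sim = [[cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    clusters = [(i,) for i in range(n)]  # start from singleton clusters
    merges = []
    while len(clusters) > 1:
        # Pick the most similar pair of current clusters under this linkage.
        best_s, x, y = max(
            (cluster_similarity(clusters[x], clusters[y], sim, method), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        merges.append((clusters[x], clusters[y], best_s))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


def search_cost(pages_read, similarities, page_cost, sim_cost):
    """Hypothetical linear cost model: page accesses plus similarity computations."""
    return pages_read * page_cost + similarities * sim_cost


if __name__ == "__main__":
    # Toy 3-term document vectors (illustrative only).
    docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]
    for method in ("single", "complete", "average"):
        for a, b, s in agglomerate(docs, method):
            print(method, a, b, round(s, 4))
```

On these toy vectors all three rules merge the documents in the same order but report different similarities at the final merge, which is where the linkage choice shows up. With the cost model, a search that reads more pages but computes fewer similarities loses whenever similarities are cheap relative to page accesses, matching the abstract's conclusion; the sketch does not build centroids or perform the top-down cluster search the thesis actually evaluates.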
------------------------------

From: Peter de Jong
Date: Mon, 30 Sep 1985 09:41 EDT
Subject: Cognitive Science Calendar

Tuesday, 1 October, 7:30 pm
Room: 34-401 (Grier Conference Room)
MIT Center for Cognitive Science

"On the Form of Linguistic Knowledge"

Authors:    Janet Dean Fodor and Stephen Crain
            University of Connecticut
Speaker:    Janet Dean Fodor
Commentary: Professor Howard Lasnik
            Department of Linguistics
            University of Connecticut
            Professor Scott Weinstein
            Department of Philosophy
            MIT

The Center for Cognitive Science is pleased to announce the first Cognitive Science Seminar of the fall semester. The seminars will again be held on Tuesday evenings from 7:30 to 9:30 pm. Each program will consist of presentations of recent papers in Linguistics, Philosophy, Psychology and Artificial Intelligence, followed by commentary, reply, open discussion and a brief refreshment break.

Copies of the paper are available from Karen Persinger, Room 20B-225, (617) 253-7358.

------------------------------

Date: Tue 24 Sep 85 11:10:07-PDT
From: Ana Haunga
Subject: Seminar - KSL/Symbolic Systems Resources Group (SU)

[Note: all of the issues below affect information storage and retrieval research and practice. How about some discussion on this now? Would someone be interested in running a panel on this topic at an upcoming conference? - Ed]

[rest of digest extracted from AIList issues]

[Copied from AIList Digest Sunday, 29 Sep 1985 Volume 3 # 131 - Ed]

KSL/Symbolic Systems Resources Group
Tom Rindfleisch and Bill Yeager
Stanford University

This is the first of several SIGLunches this fall that will summarize work in each of the five sublabs of the Stanford Knowledge Systems Laboratory (KSL): the Heuristic Programming Project, HELIX Group, Medical Computer Science Group, Logic Group, and Symbolic Systems Resources Group (SSRG).
This week's talk will consist of a brief overview of the KSL as an AI laboratory and a survey of SSRG research and development activities.

Since 1980, the computing environment for KSL research has been moving slowly away from central time-shared mainframes (like the SUMEX 2060 and VAX) toward networked Lisp workstations. Improvements in workstation performance, falling prices, better packaging, and a wider vendor selection are now accelerating this transition. Over the next five years, we are proposing to phase out the SUMEX research mainframes so that all KSL computing will be workstation-based -- not only research program development but common tasks like text processing, mail, file management, and budgeting.

This raises several important issues that will require a community system software effort comparable to the one in the 1970's that led to the current TOPS-20 and UNIX environments. How can the user computing environment be improved using workstation bitmapped graphics and AI methods for more intelligent systems/applications programs? How can user displays connect flexibly to workstations -- from home, over remote networks like ARPANET, and locally over Ethernet? How can the considerable computing power distributed among many workstations be combined to support individual user tasks? What are the impacts of large numbers of workstations on network protocols and services (file servers, gateways, printing, etc.)?

------------------------------

Date: Tue 24 Sep 85 10:35:38-PDT
From: Terry Winograd
Subject: Seminar Series - Software Environments (CSLI)

[Copied from AIList Digest Sunday, 29 Sep 1985 Volume 3 # 131 - Ed]

New project meeting on environments
Mondays 1-2 in the trailer classroom, Ventura
[Future meetings will be from 12 to 1:15.]

Beginning Monday, Sept. 30, there will be a weekly meeting on environments for working with symbolic structures (this includes programming environments, specification environments, document preparation environments, "linguistic workstations", and grammar-development environments). As part of our research, many of us at CSLI have developed such environments, sometimes as a matter of careful design, and sometimes by the seat of the pants. In this meeting we will present to each other what we have done, and also look at work done elsewhere (both through guest speakers and reading discussions).

The goal is to look at the design issues that come up in building environments and to see how they have been approached in a variety of cases. We are not concerned with particular details ("pop-up menus are/aren't better than pull-down menus") but with more fundamental problems. For example:

What is the nature of the underlying structure the environment supports: chunks of text? a data-base of relations? a tree or graph structure? How is this reflected in the basic mode of operation for the user?

How does the user understand the relation between objects (and operations on them) that appear on the visible representation (screen and/or hardcopy) and the corresponding objects (and operations) on some kind of underlying structure? How is this maintained in a situation of multiple presentations (different views and/or multiple windows)? How is it maintained in the face of breakdown (system failure or catastrophic user error in the middle of an edit, transfer, etc.)?

Does the environment deal with a distributed network of storage and processing devices? If so, does it try to present some kind of seamless "information space", or does it provide a model of objects and operations that deals with moving things (files, functions, etc.) from one "place" to another, where different places have different relevant properties (speed of access, security, shareability, etc.)?

How is consistency maintained between separate objects that are conceptually linked (source code and object code, formatter source and printer-ready files, grammars and parse-structures generated from them, etc.)? To what extent is this simply left to user convention, supported by bookkeeping tools, or automated?

What is the model for change of objects over time? This includes versions, releases, time-stamps, reference dates, change logs, etc. How is information about temporal and derivational relationships supported within the system?

What is the structure for coordination of work? How is access to the structures regulated to prevent "stepping on each other's toes"? To facilitate joint development? To keep track of who needs to do what when?

Lurking under these are the BIG issues of ontology, epistemology, representation, and so forth. Hopefully our discussions on a more down-to-earth level will be guided by a consideration of the larger picture and will contribute to our understanding of it.

The meeting is open to anyone who wishes to attend. Topics will be announced in advance in the newsletter. The first meeting will be devoted to a general discussion of what should be addressed and to identifying the relevant systems (and corresponding people) within CSLI and within the larger (Stanford, Xerox, SRI) communities in which it exists.

------------------------------

Date: 27 Sep 1985 20:03-CST
From: leff%smu.csnet@CSNET-RELAY.ARPA
Subject: Bibliographies

[Copied from AIList Digest Sunday, 29 Sep 1985 Volume 3 # 130 - Ed]

You have probably noticed the announcement of the new tech report list. [...] The thing that started me on this was the response of AIList readers to my lists of tech reports. From what filled up my mailbox, it was obvious that many if not most of your readers were not seeing the tech report lists, and a substantial fraction of those did not even know that tech reports existed! Hopefully this list will serve a useful function for everyone.
It was something that should have been done a long time ago. [...]

I have increased the number of magazines from which my bibliographies (type 1) are drawn. We have now added ComputerWorld as well as a few minor magazines. ComputerWorld did a very good job on IJCAI-85, and I found material there that appeared nowhere else. [...]

According to bib, we have sent you 430 documents since I changed to a machine-readable format. This does not include information sent to you in other formats.

[I would like to thank Laurence for providing his services to AIList and the net community. It's a heck of a hobby, but he does a great job. -- KIL]

[We are also grateful! In IRList I will copy only those items I think will be of interest to IRList readers. - Ed]

------------------------------

Date: Wed 25 Sep 85 16:54:08-PDT
From: Emma Pease
Subject: New CSLI Reports

[Copied from AIList Digest Sunday, 29 Sep 1985 Volume 3 # 129 - Ed]

NEW CSLI REPORTS

Report No. CSLI-85-31, ``A Formal Theory of Knowledge and Action'' by Robert C. Moore, and Report No. CSLI-85-32, ``Finite State Morphology: A Review of Koskenniemi'' by Gerald Gazdar, have just been published. These reports may be obtained by writing to David Brown, CSLI, Ventura Hall, Stanford, CA 94305 or Brown@SU-CSLI.
------------------------------

Date: 27 Sep 1985 19:49-CST
From: leff%smu.csnet@CSNET-RELAY.ARPA
Subject: Recent Technical Reports

[Copied from AIList Digest Sunday, 29 Sep 1985 Volume 3 # 129 - Ed]

Addresses for requests:
Carnegie Mellon University, CS Department, Pittsburgh, PA 15213
Stanford University, Department of Computer Science, Stanford, CA 94305

%A Anne von der Lieth Gardner
%T An Artificial Intelligence Approach to Legal Reasoning
%R STAN-CS-85-1045
%I Stanford University Department of Computer Science
%D JUN 1984, 205 pages (microfiche only $2.00)

%A Masaru Tomita
%T An Efficient Context-free Parsing Algorithm for Natural Languages and its Applications
%D MAY 1985
%I Carnegie Mellon Computer Science

%A Gary L. Bradshaw
%T Learning to Recognize Speech Sounds: A Theory and Model
%D JUN 1985
%I Carnegie Mellon Computer Science

%A Masaru Tomita
%T Feasibility Study of Personal/Interactive Machine Translation Systems
%D JUL 1985
%I Carnegie Mellon Computer Science

%A Jaime G. Carbonell
%A Masaru Tomita
%T New Approaches to Machine Translation
%D JUL 1985
%I Carnegie Mellon Computer Science

%A Robert E. Frederking
%T Syntax and Semantics in Natural Language Parsers
%D MAY 1985
%I Carnegie Mellon Computer Science

%A Jeannette M. Wing
%A Farhad Arbab
%T Geometric Reasoning: A New Paradigm for Processing Geometric Information
%D JUL 1985
%I Carnegie Mellon Computer Science

------------------------------

Date: 27 Sep 1985 20:02-CST
From: leff%smu.csnet@CSNET-RELAY.ARPA
Subject: Recent Articles

[Copied from AIList Digest Sunday, 29 Sep 1985 Volume 3 # 129 - Ed]

%A Elaine Marsh
%A Carol Friedman
%T Transporting the Linguistic String Project from a Medical to a Navy Domain
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Jonathan Slocum
%A Carol F. Justus
%T Transportability to Other Languages: The Natural Language Processing Project in the AI Program at MCC
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Samuel S. Epstein
%T Transportable Natural Language Processing Through Simplicity
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Bozena Henisz Thompson
%A Frederick B. Thompson
%T ASK is Transportable in Half a Dozen Ways
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Fred J. Damerau
%T Problems and Some Solutions in Customization of Natural Language Database Front Ends
%J ACM Transactions on Office Information Systems
%D APR 1985

%A Carole D. Hafner
%A Kurt Godden
%T Portability of Syntax and Semantics in Datalog
%J ACM Transactions on Office Information Systems
%D APR 1985

[Note: Thanks to Bruce Ballard for editing this special issue of TOOIS, and to the authors for giving a nice rounded presentation. - Ed]

%A Appelbaum
%A Ruspini
%T ARIES: A Tool for Inference Under Conditions of Imprecision and Uncertainty
%J BOOK15

%A Klinger
%T Search Processes for the Application of Artificial Intelligence
%J BOOK15

%A Rosenberg
%T ERIK, An Expert Ship Message Interpreter: New Mechanism for Flexible Parsing
%J BOOK15

%A Obermeier
%A de Hilster
%T DIID -- A Data-independent Interface for Databases -- The AI Perspective
%J BOOK15

%A Roger Schank
%A Steven Shwartz
%T The Role of Knowledge Engineering in Natural Language Systems
%J BOOK17

%A Jaime Carbonell
%T The Role of User Modeling in Natural Language Interface Design
%J BOOK17

%A Gerald Hice
%A Stephen Andriole
%T Artificial Intelligent Videotex
%J BOOK17

%A Dexter Fletcher
%T Intelligent Instructional Systems in Training
%J BOOK17

------------------------------

Date: 27 Sep 1985 20:00-CST
From: leff%smu.csnet@CSNET-RELAY.ARPA
Subject: Recent Articles

[Extracted from AIList Digest Sunday, 29 Sep 1985 Volume 3 # 130 - Ed]

%A A. Lansner
%A O. Ekeberg
%T Reliability and Speed of Recall in an Associative Network
%J IEEE Transactions on Pattern Analysis and Machine Intelligence
%V PAMI-7
%N 4
%D JUL 1985
%P 490-498

%A Sol Libes
%T Bytelines
%J Byte
%D SEP 1985
%P 420
%V 10
%N 9
%X Kurzweil Applied Intelligence speech recognition KVS-3000
%X Kurzweil has introduced the KVS-3000, which can handle 1000 words of continuous speech with 100 per cent accuracy. It is selling at $3000.00 in quantity and comes in PC, Multibus and RS232C versions. It is speaker adaptive, and its performance improves the more it is used by the same speaker.

%A Jean Renard Ward
%A Barry Blesser
%T Interactive Recognition of Handprinted Characters for Computer Input
%J IEEE Computer Graphics and Applications
%V 5
%N 9
%P 24-37
%K Character Recognition Pencept
%X discusses the human interface issues once you have character recognition on your computer, i.e., how best to integrate handwritten character recognition into your product.

%A S. Jerrold Kaplan
%T Designing a Portable Natural Language Database Query System
%J ACM Transactions on Database Systems
%V 9
%N 1
%D MAR 1984
%P 1-19

%A C. Hornsby
%A H. C. Leung
%T The Design and Implementation of a Flexible Retrieval Language for a Prolog Database System
%J SIGPLAN
%V 20
%N 9
%D SEP 1985
%P 43-51
%X implementation of a database management system in PROLOG

%A Jeffry Beeler
%T Symantec package out
%J ComputerWorld
%V 19
%N 38
%P 12
%D SEP 23, 1985
%K Symantec natural language microcomputer data base system interface
%X a new product which uses natural language to interface with a database

------------------------------

END OF IRList Digest
********************