IRList Digest           Tuesday, 9 December 1986      Volume 2 : Issue 68

Today's Topics:
   Query - Information resource management
   Discussion - bib/refer software development
   Abstracts - Knowledge-based software components catalogue
             - Tech Reports on IR from Virginia Tech in 1986
   CSLI - Quantified and referring NPs, pronouns and anaphora

News addresses are
   ARPANET: fox%vt@csnet-relay.arpa
   BITNET: foxea@vtvax3.bitnet
   CSNET: fox@vt
   UUCPNET: seismo!vtisr1!irlistrq

----------------------------------------------------------------------

Date: Wed, 3 Dec 86 22:27 EST
From: V6M@PSUVM.bitnet
Subject: IRM notes, hints, research, rumors ANYTHING

Dear Professor,

I have a PRESSING need for help with the buzz words "INFORMATION RESOURCE
MANAGEMENT". I'd appreciate any help that the group could give me on this
topic, which was a hot term around 1978-81 but seems to have died out.

SeeAlso..     Data Resource Management
              End User Computing
              Information Center (used incorrectly by DPMA types as a
                synonym for End User Computing)
RelatedTerm.. Thesaurus

BTW... is anybody using a thesaurus in building or querying a commercial
DBMS in a business application environment????  HELP!!!!

Vince Marchionni
215 337 1400 ext 274

------------------------------

Subject: Re: Six character limit in indxbib [really bib/refer]
Date: 29 Nov 86 12:35:56 +1000 (Sat)
Message-Id: <10.533612156@mulga>
From: John Shepherd

> In IRList Digest V2 #53, Mitchell Wyle writes:
>
> For starters, I shall use Unix's addbib, sortbib, roffbib, indxbib,
> lookbib suite of programs. The manuals say that one can change the
> options of indxbib when it stems, stop lists etc. I have found the list
> of the 100 most common words (/usr/lib/refer/eign), but I can't
> figure out how to change the stemming from 6 characters.

I perused the sources here and it looked like the 6 was hardwired into the
"mkey" program (which is the component of "indxbib" that scans the bib
files and finds the keys).
It seems that "indxbib" passes any options it gets straight on to "mkey",
but I couldn't find any documentation on "mkey". It does have a number of
options (e.g. to set the minimum number of characters to consider in
keys), but none to extend the key length.

> In IRList Digest V2 #61, David Brown writes:
>
> Any information on this would be very welcome here too - we are currently
> using indxbib/lookbib etc. and the lower-level utilities they call as the
> basis of our (small and rudimentary) online catalogue, but are having
> problems with false drops caused by this 6 character limit.

We have done quite a bit of work with bibliographies here at the University
of Melbourne over the last few years. We started out using the "refer"
system, but after finding that it couldn't quite do what we wanted in some
places, we eventually switched over to Tim Budd's "bib" system, which we
found more flexible and which seemed easier to use than "refer". Note that
the data formats they use are (with a few minor exceptions) identical.

We also thought that "lookup" ("bib"'s version of "lookbib") would form a
useful basis for an on-line retrieval system, and Isaac Balbin (whom you
may know from his Logic Programming Bibliography) wrote a nice interactive
front-end called "seebib" (it also knows how to deal with "refer"
databases). It allows you to scan forwards and backwards through a list of
answers to a bibliographic query, and to save some or all of the matches
in files.

Considering their simplicity, both these systems do a remarkable job as
data retrieval packages. However, they still have a number of
disadvantages, not the least of which is the necessity to rebuild the
entire index each time a reference is added. Has anyone had any experience
with systems with similar functionality to bib/refer (particularly their
usefulness as troff pre-processors) but which use more general database
systems?
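The false drops described above can be illustrated with a minimal Python sketch. It assumes a key is simply the lowercased word cut to six characters; the names KEY_LEN, make_key, build_index and lookup are hypothetical, and this is not the mkey source, which also drops common words (cf. /usr/lib/refer/eign) and may apply other rules not modelled here.

```python
# Hypothetical sketch of a truncated-key inverted index; NOT the mkey source.
# Assumption: a key is just the lowercased word cut to KEY_LEN characters
# (mkey's common-word list and any other rules are ignored here).

KEY_LEN = 6  # the limit reported as hardwired in mkey


def make_key(word: str, key_len: int = KEY_LEN) -> str:
    """Lowercase a word and keep only its first key_len characters."""
    return word.lower()[:key_len]


def build_index(records: dict[str, str], key_len: int = KEY_LEN) -> dict[str, set[str]]:
    """Map each truncated key to the ids of the records containing it."""
    index: dict[str, set[str]] = {}
    for rec_id, text in records.items():
        for word in text.split():
            index.setdefault(make_key(word, key_len), set()).add(rec_id)
    return index


def lookup(index: dict[str, set[str]], word: str, key_len: int = KEY_LEN) -> set[str]:
    """Return the ids of records whose key matches the query word's key."""
    return index.get(make_key(word, key_len), set())


# Hypothetical sample records, for illustration only.
records = {
    "r1": "Information retrieval systems",
    "r2": "Information theory",
    "r3": "Informal logic",
}

six = build_index(records)              # keys such as "inform", "retrie"
ten = build_index(records, key_len=10)  # keys such as "informatio", "informal"

print(sorted(lookup(six, "informal")))              # ['r1', 'r2', 'r3']: two false drops
print(sorted(lookup(ten, "informal", key_len=10)))  # ['r3']
```

Here a query for "informal" also retrieves the two records that mention only "information", since both words reduce to the key "inform"; with ten-character keys only r3 matches. Truncation also acts as a crude stemmer ("retrieval" and "retrieve" share the key "retrie"), so lengthening the key trades some recall for precision.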
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
John Shepherd
Department of Computer Science,
University of Melbourne,           CSNET: jas%mulga.oz@australia
Parkville, 3052,                   ARPA:  jas%mulga.oz@seismo.css.gov
AUSTRALIA                          UUCP:  ...!munnari!mulga!jas

------------------------------

Date: Fri Nov 28 12:48 EST 1986
From: fox
To: seismo!ihnp4!hoqam!wbf
Subject: reusability

Bill: There is an interesting technical report about software reusability
and IR that you might want to obtain.

   A Knowledge-Based Software Components Catalogue
   Ian Sommerville and Murray Wood
   Research report CS/ST/5/86
   Software Technology Research Group
   Department of Computer Science
   University of Strathclyde
   Glasgow, Scotland
   Tel: 041-552 4400

Abstract:

There is currently a growing interest in the reuse of previously designed,
coded, tested, and documented software, primarily for reasons of economy
and reliability. One of the major problems in moving to a paradigm based on
reuse rather than re-invention is the storage and retrieval of software
components. In this paper we argue that conventional keyword-based
techniques are insufficient as a method of describing software components
for storage and retrieval in a software components catalogue. Our approach
to the problem of software component description is based on an attempt to
identify the basic concepts of the software component domain and the
relationships between those concepts. These concepts and their
relationships can be represented by what we have termed software function
frames. We describe a prototype implementation of a cataloging system based
on these ideas.

------------------------------

Date: Sat Nov 29 07:34 EST 1986
From: fox
Subject: IR Related Tech Reports for 1986, Virginia Tech Dept of Comp. Sci.

Copies of the following may be ordered by writing to

   emtront%vt@csnet-relay.arpa
   emtront@vtcs1.bitnet
   Elizabeth Tront, Dept.
   of Computer Science, Virginia Tech
   Blacksburg VA 24061

A Comparison of Two Methods for Soft Boolean Operator Interpretation in
Information Retrieval
Edward A. Fox, Sharat Sharan
TR 86-1

ABSTRACT

Information retrieval systems are generally given Boolean logic queries by
users or search intermediaries, so that an efficient and effective search
for relevant documents can be carried out automatically. Previous work has
shown that an extended interpretation of Boolean queries can dramatically
improve search effectiveness. Experimental evidence is given on the
relative performance of the p-norm method and a parameterized fuzzy-logic
approach suggested by Paice. Regression analysis supports expected results
of parameter settings and gives further insight into why the p-norm scheme
is superior.

A Knowledge-Based System for Composite Document Analysis and Retrieval:
Design Issues in the CODER Project
Edward A. Fox, Robert K. France
TR 86-6

ABSTRACT

The CODER (COmposite Document Expert/extended/effective Retrieval) Project
aims at applying a variety of methods developed in the realm of artificial
intelligence to improve the performance of information retrieval systems.
Logic programming, expert systems, blackboards, user models, natural
language processing, and knowledge representation will be applied to
handle a collection of more than three years of issues of the AIList
ARPANET Digest. This paper gives background, describes related work,
explains the design principles and architecture, and closes with future
plans.

Architecture of an Object-Oriented Expert System for Composite Document
Analysis, Representation, and Retrieval
Edward A. Fox, Robert K. France
TR 86-10

ABSTRACT

The CODER project is a multi-year effort to investigate how best to apply
artificial intelligence methods to increase the effectiveness of
information retrieval systems.
The use of individually tailored specialist experts coupled with
standardized blackboard modules for communication and control, and
external knowledge bases for maintenance of factual world knowledge,
allows for quick prototyping and flexibility under change. The system is
structured as a set of communicating modules, designed under an
object-oriented paradigm, using TCP/IP, UNIX, Mu-Prolog, and C.

Expert Retrieval for Computer Message Systems
Edward A. Fox
TR 86-13

ABSTRACT

This paper describes how information storage and retrieval and artificial
intelligence methods can be integrated with modern computers and networks
to provide access for broad classes of users to archives of electronic
mail, digests, and bulletin boards. A status report is given on the
COmposite Document Expert/extended/effective Retrieval project, designed to
employ communities of experts, operating on multiple communicating
computers, for free text analysis, indexing, and retrieval. Details of the
document-type expert are included to illustrate the approach.

A Call for Integrating Advanced Information Retrieval Models with CD-ROM /
Microcomputer Systems
Edward A. Fox
TR 86-14

ABSTRACT

Recent advances in computer hardware and storage devices will allow
inexpensive personal systems to be used by individuals to rapidly access
vast collections of text. Research into database management, artificial
intelligence, and information retrieval can be applied to develop advanced
retrieval systems. Retrieval models based on browsing, extended Boolean,
vector, probabilistic, and artificial intelligence approaches have all been
advanced as more effective for searchers than conventional methods. The
CODER project aims to integrate these techniques. Ultimately it is hoped
that CD-ROM-based information retrieval systems will be released with many
of the capabilities mentioned.

Building the CODER Lexicon: The Collins English Dictionary and Its Adverb
Definitions
Edward A. Fox, Robert C.
Wohlwend, Phyllis R. Sheldon, Qi Fan Chen, and Robert K. France
TR 86-23

ABSTRACT

In order to support some of the processing desired in the CODER (COmposite
Document Expert/extended/effective Retrieval) project, and to allow
experimentation in information retrieval and natural language processing,
a lexicon was constructed from the machine-readable Collins Dictionary of
the English Language. Characteristics of the dictionary, conversion from
typesetter form to Prolog relations, and comparisons of the result with an
earlier effort for Webster's Seventh New Collegiate Dictionary are
discussed. Finally, a grammar for adverb definitions is presented, together
with a description of defining formulae that usually indicate the type of
the adverb.

Development of the CODER System: A Test-bed for Artificial Intelligence
Methods in Information Retrieval
Edward A. Fox
TR 86-40

ABSTRACT

The CODER (COmposite Document Expert/extended/effective Retrieval) system
is a test-bed for investigating the application of artificial intelligence
methods to increase the effectiveness of information retrieval systems.
Particular attention is being given to analysis and representation of
heterogeneous documents, such as electronic mail digests or messages, which
vary widely in style, length, topic, and structure. Since handling passages
of various types in these collections is difficult even for experimental
systems like SMART, it is necessary to turn to other techniques being
explored by information retrieval and artificial intelligence researchers.
The CODER system architecture involves communities of experts around active
blackboards, accessing knowledge bases that describe users, documents, or
lexical items of various types. Most of the lexical knowledge base
construction work is now complete, and experts for search and temporal
reasoning can perform a variety of processing tasks. User information and
queries are being gathered, and the first prototype is nearly complete.
It appears that a number of artificial intelligence techniques are needed
to best handle such common, but complex, document analysis and retrieval
tasks.

------------------------------

Date: Sat, 15 Nov 86 01:28:42 est
From: EMMA@CSLI.STANFORD.EDU
Subject: CSLI Calendar, November 13, No. 7

[Extract - Ed]

NEXT WEEK'S SEMINAR
Quantified and Referring Noun Phrases, Pronouns and Anaphora
Mark Gawron and Stanley Peters
November 13 and 20, 1986

A variety of interactions have been noted between scope ambiguities of
quantified noun phrases, the possibility of interpreting pronouns as
anaphoric, and the interpretation of elliptical verb phrases. Consider,
for example, the following contrast, first noted in Ivan Sag's 1976
dissertation.

   (1) John read every book before Mary did.
   (2) John read every book before Mary read it.

The second sentence can be interpreted either to mean that each book was
read by John before Mary, or instead that every book was read by John
before Mary read any. The first sentence has only the former
interpretation.

The seminar will describe developments in situation theory pertinent to the
semantics of various quantifier phrases in English, as well as of
'referring' noun phrases including pronouns, and of anaphoric uses of
pronouns and elliptical verb phrases. We aim to show how the theory of
situations and situation semantics sheds light on a variety of complex
interactions such as those illustrated above. (This seminar is a
continuation of the seminar held on November 13.)

------------------------------

END OF IRList Digest
********************