IRList Digest           Monday, 28 November 1988      Volume 4 : Issue 56

Today's Topics:
   Query - CD-ROMs of Value
   Discussion - CD-ROM Network Server
              - Pedagogical Models for IR
   Announcement - Workshop on Evaluation of NLP Systems
                - Staffing at NSF
                - Times Collection for IR Research
                - New CSLI Publications

News addresses are
   Internet: fox@vtopus.cs.vt.edu
   BITNET: foxea@vtcc1.bitnet (replaces foxea@vtvax3)

----------------------------------------------------------------------

Date: Tue, 11 Oct 88 11:16 N
From:
Subject: CD-ROM data of value?

Dear Ed,

...

CD-Rom interfaces for microcomputers have reached the point where they
become affordable for small departments and for home use. However, is
there already any real use for them? Which data CDs are at this moment
available (what is it called, what is on it, is it in a standard format,
where can you get it, what does it cost)?

=========

An extension to this: what data will be on the Virginia Disk(s)?

Best wishes,

Hans van Halteren
COR_HVH @ HNYKUN52.BITNET

[Note: Virginia Disc One is still in process - we have added a Time
collection from Cornell via UMass, a Rutgers collection from Tefko
Saracevic, and have almost all of the LISA collection from P. Willet -
are still waiting for last piece of that. Regarding other CD-ROMs - I
think this year marks the turning point in price and availability - many
new titles coming out in High Sierra or ISO form at lower and lower
prices. - Ed]

------------------------------

Date: Tue, 11 Oct 88 12:12:20 PDT
From: PAAAAA7@CALSTATE
Subject: CD Roms and Such

Mr. Fox;

I was just given a copy of your CD-Rom letter to Dr. Dick Botting, and
thought I would pass along a product I just found out about. Meridian
Data Inc. markets an interface to plug a High Sierra CD-ROM into a
Novell Network. Sounds like something we all could use! If you are
interested, please let me know and I will grab the address for you.
-Rich McGee
 Computer Center
 CSUSB

------------------------------

Date: Fri, 30 Sep 88 20:11 PDT
To: Ed A Fox
From: Marcia Bates
Subject: Pedagogical models for IR

[Note: see earlier discussion in issues 39, 40, 46 - Ed.]

I just came onto IRLIST, so I do not know the text of the original
question regarding pedagogical models. However, I have some suggestions
that may be of interest.

First, one must ask whether the interest is in searches only within a
given automated system, or in the overall process of retrieving info.
Some of the most important questions a searcher--whether end user or
expert--must decide in a real-life search are which system to use and
whether to use a manual or online system. Some of the most serious
errors made by librarian-students and by practitioners involve
initiating searches on systems that are quite inappropriate for the
query in hand--looking for statistical data on bibliographic databases,
for example.

If these broader questions are of interest, then there is a growing
literature both in the online area and in the area of "bibliographic
instruction" in the library field. IRLIST members are probably more
familiar with the online literature than with BI. For BI, Constance
Mellon's 1987 book Bibliographic Instruction: The Second Generation is
a good place to start. Some other interesting recent articles dealing
with college students are:

Stoan, Stephen K. "Research in Library Skills..." College and Research
   Libraries, 45 (Mar 84): 99-109.

Dunn, Kathleen "Psychological Needs and Source Linkages in
   Undergraduate Information Seeking Behavior." College and Research
   Libraries 47 (Sept 86): 475-481.

See also the book and articles by Nigel Ford, who has dealt in great
detail with higher education students.
If, on the other hand, interest is in automated systems primarily, or
in the design of search interfaces, a number of my own articles deal
with many of these issues (often dealing with both manual and online
systems simultaneously), with models or model fragments being more or
less explicit. See my:

"Information Search Tactics" Journal of ASIS 30 (July 1979): 205-214.
   Describes classes of search models, classes of tactics, and 29
   particular search tactics.

"Idea Tactics" JASIS 30 (Sept 1979): 280-289.
   17 additional tactics.

"Search Techniques." Annual Review of Information Science and
   Technology 16 (1981): 139-169.
   Reviews lib/info sci literature on psychology of searching to that
   date.

"Locating Elusive Science Information: Some Search Techniques" Special
   Libraries 75 (Apr 84): 114-120.
   Drawing on model of the scientific publication cycle, the searcher
   is shown how to locate info thought to be inaccessible.

"The Fallacy of the Perfect 30-item Online Search." RQ 24 (Fall 84):
   43-50.
   Discusses psychological traps for end users and intermediaries that
   degrade quality of online searches.

"An Exploratory Paradigm for Online Information Retrieval." In
   Intelligent Information Systems for the Information Society, ed. by
   B.C. Brookes. Amsterdam: Elsevier, 1986, p. 91-99.
   Model of broad classes of info seeking plus discussion of two major
   types of search in automated systems.

"Subject Access in Online Catalogs: A Design Model." JASIS 37 (Nov 86):
   357-376.
   Model incorporating psychological and linguistic factors in design
   of subject search interface for online catalogs. Drawing upon recent
   research and thinking, departs dramatically from a number of
   traditional assumptions.

"How to Use Information Search Tactics Online." ONLINE 11 (May 87):
   47-54.
   Groups and applies search tactics in online environment to help at
   several stages of search, as well as in cases where too many or too
   few items retrieved.

"How to Use Controlled Vocabulary More Effectively in Online
   Searching." ONLINE, Nov. 88, in press.
   Argues that features of various indexing and classification systems
   used in databases must be taken into account in online search
   formulation, i.e., different indexing systems require different
   strategies, so searcher must be able to recognize controlled
   vocabulary types.

See also David Bawden, "Information Systems and the Stimulation of
Creativity" Journal of Info Science 12 (86): 203-216.

--Marcia J. Bates   ifq0mjb@uclamvs.bitnet
  (Graduate School of Library and Information Science
   120 PLB
   UCLA
   Los Angeles, CA 90024-1520 USA)

------------------------------

Date: Fri, 2 Sep 88 12:19:15 EDT
From: palmer@BURDVAX.PRC.UNISYS.COM
Subject: nl evaluation workshop

                     CALL FOR PARTICIPATION

  Workshop on Evaluation of Natural Language Processing Systems

                            Dec 8-9
             Wayne Hotel, Wayne, PA (Philadelphia)

There has been much recent interest in the difficult problem of
evaluating natural language systems. With the exception of natural
language interfaces there are few working systems in existence, and
they tend to be concerned with very different tasks and use equally
different techniques. There has been little agreement in the field
about training sets and test sets, or about clearly defined subsets of
problems that constitute standards for different levels of performance.
Even those groups that have attempted a measure of self-evaluation have
often been reduced to discussing a system's performance in isolation -
comparing its current performance to its previous performance rather
than to another system. As this technology begins to move slowly into
the marketplace, the need for useful evaluation techniques is becoming
more and more obvious. The speech community has made some recent
progress toward developing new methods of evaluation, and it is time
that the natural language community followed suit. This is much more
easily said than done and will require a concentrated effort on the
part of the field.
There are certain premises that should underlie any discussion of
evaluation of natural language processing systems:

   (1)  It should be possible to discuss system evaluation in general
        without having to state whether the purpose of the system is
        "question-answering" or "text processing." Evaluating a system
        requires the definition of an application task in terms of I/O
        pairs which are equally applicable to question-answering, text
        processing, or generation.

   (2)  There are two basic types of evaluation: a) "black box
        evaluation" which measures system performance on a given task
        in terms of well-defined I/O pairs; and b) "glass box
        evaluation" which examines the internal workings of the system.
        For example, glass box performance evaluation for a system that
        is supposed to perform semantic and pragmatic analysis should
        include the examination of predicate-argument relations,
        referents, and temporal and causal relations.

Given these premises, the workshop will be structured around the
following three sessions:

   1)  Defining "glass box evaluation" and "black box evaluation."

   2)  Defining criteria for "black box evaluation."
       A Proposal for establishing task oriented benchmarks for NLP
       Systems
       (Session Chair - Beth Sundheim)

   3)  Defining criteria for "glass box evaluation."
       (Session Chair - Jerry Hobbs)

Several different types of systems will be discussed, including
question-answering systems, text processing systems and generation
systems.

[Note: too late for sending in papers - sorry for not getting this out
sooner - I have omitted that section since it is no longer relevant -
Ed.]

Martha Palmer
Unisys Research & Development
PO Box 517
Paoli, PA 19301
palmer@prc.unisys.com
(215) 648-7228

------------------------------

Date: Tue, 27 Sep 88 23:41 EDT
From: EHRICH@vtcs1.cs.vt.edu
Subject: NSF staffing

Recently someone asked me where to send something at NSF.
Here is the staffing of the CCR (Computer and Computation Research) and
IRIS (Information, Robotics, and Intelligent Systems) divisions:

CCR:
   Director: Peter Freeman
   Deputy Director: Helen Gigley
   Theory: Errol Lloyd
   Computer Architecture: Zeke Zalcstein
   Software Engineering: John Gannon
   Numeric and Symbolic Computation: Kamal Abdali
   Software Systems: Thomas Keenan (returning after October 15)

IRIS:
   Director: Y.T. Chien
   Deputy Director: Bruce Barnes (surprise, folks?)
   Robotics and Machine Intelligence: Ken Laws
   Knowledge and Cognitive Systems: Henry Hamburger
   Database and Expert Systems: Maria Zemankova
   Interactive Systems: Hal Bamford
   Information Technology and Organizations: Laurence Rosenberg

------------------------------

Date: Mon, 10 Oct 88 17:00 EDT
From: krovetz@UMass
Subject: times collection

Ed,

Just wanted to let you know that I finally did get the old Times
collection from Cornell (thanks to Chris Buckley). I have four files:
the documents themselves, the queries, a stopword list, and a set of
relevance judgements. Chris said he tried SMART on it and got an
average performance of 64% precision (averaged over the 25%, 50% and
75% recall levels). This is pretty high, but the collection is also
rather small (425 documents and 83 queries).

-bob

[Note: I am putting that collection and a few others onto Virginia Disc
One - thanks to Bob for his perseverance in sending the many files over
BITNET, till we have them all - Ed.]

------------------------------

Date: Wed, 12 Oct 88 17:13:00 PDT
From: Emma Pease
Subject: CSLI Calendar, October 13, 4:4

[Extract - Ed.]

...

                        NEW PUBLICATIONS

The following reports have recently been published. They may be
obtained, or a full list acquired, by writing to Trudy Vizmanos, CSLI,
Ventura Hall, Stanford, CA 94305-4115, or publications@csli.stanford.edu.

112. Bare Plurals, Naked Relatives, and Their Kin.
     Dietmar Zaefferer   $2.50
113. Events and ``Logical Form''.
     Stephen Neale   $2.00
114. Backwards Anaphora and Discourse Structure: Some Considerations.
     Peter Sells   $2.50
115. Toward a Linking Theory of Relation Changing Rules in LFG.
     Lori Levin   $4.00
116. Fuzzy Logic.
     L. A. Zadeh   $2.50
117. Dispositional Logic and Commonsense Reasoning.
     L. A. Zadeh   $2.00
118. Intention and Personal Policies.
     Michael Bratman   $2.00
119. Propositional Attitudes and Russellian Propositions.
     Robert C. Moore   $2.50
120. Unification and Agreement.
     Michael Barlow   $2.50
121. Extended Categorial Grammar.
     Suson Yoo and Kiyong Lee   $4.00
122. The Situation in Logic---IV: On the Model Theory of Common
     Knowledge.
     Jon Barwise   $2.00
123. Unaccusative Verbs in Dutch and the Syntax-Semantics Interface.
     Annie Zaenen   $3.00
124. What Is Unification? A Categorical View of Substitution, Equation
     and Solution.
     Joseph A. Goguen   $3.50
125. Types and Tokens in Linguistics.
     Sylvain Bromberger   $3.00
126. Determination, Uniformity, and Relevance: Normative Criteria for
     Generalization and Reasoning by Analogy.
     Todd Davies   $4.50
127. Modal Subordination and Pronominal Anaphora in Discourse.
     Craige Roberts   $4.50
128. The Prince and the Phone Booth: Reporting Puzzling Beliefs.
     Mark Crimmins and John Perry   $3.50
129. Set Values for Unification-Based Grammar Formalisms and Logic
     Programming.
     William Rounds   $4.00
130. Fifth Year Report of the Situated Language Research Program.
     free
131. Locative Inversion in Chichewa: A Case Study of Factorization in
     Grammar.
     Joan Bresnan and Jonni M. Kanerva   $5.00
132. An Information-Based Theory of Agreement.
     Carl Pollard and Ivan A. Sag   $4.00

------------------------------

END OF IRList Digest
********************