IRList Digest           Saturday, 18 Jan 1986      Volume 2 : Issue 2

Today's Topics:
   Query - Dictionary and Lexicon Efforts, and IR
   Discussion - Oxford Text Archive address
                Suggested texts for Natural Language Processing Course
   Announcement - ACM Tran. on Office Inf. Sys.: news from Editor
   References - North-Holland 1985/6 catalog extract

----------------------------------------------------------------------

Date: Thu, 16 Jan 86 14:40:23 GMT
From: Bran Boguraev
Subject: machine-readable dictionaries
Acknowledge-to: bkb@uk.ac.cam.cl

Hello,

Somewhat belatedly [post Xmas vac] I came across your message on the IRList to Lynn Bates re machine-readable dictionaries. I have been working with Longman's dictionary of contemporary English for about a year and a half now, and feel that it is possibly the best of existing dictionaries in machine-readable form. Among other nice features [like the limited, and relatively small, set of basic defining words, together with guaranteed non-circularity in the word sense definitions], it utilises a very elaborate system of grammar codes, which defines idiosyncratic syntactic behaviour of individual word senses; in addition the dictionary contains semantic tagging [independent of, and in fact orthogonal to, the word sense definitions] and subject coding. An awful lot of stuff [some of it only available on the tape], potentially very useful to computational linguists and IR types [I am guessing about the IR applications, which will bring me to the point of this message].

Incidentally, the grammar codes in the Longman dictionary are a rational reconstruction / redesign / elaboration of the more rudimentary coding system employed by OALDCE; also the typesetting tape is in a considerably cleaner and more directly usable form than the OALDCE [we didn't have to do the massive editing and proof-reading, for example, that Roger Mitton will have spent a whole year on by the end of this spring].
I am working on a paper for a forthcoming workshop on "automating the lexicon"; it will look at the relevance and use of dictionary entries [from existing machine readable dictionaries] for computational linguistics research. I am planning to do a survey of how information available in machine-readable form is used, directly or indirectly, to perform a variety of tasks in different application environments - parsing within strict grammatical frameworks, (robust) semantic interpretation, phrasal analysis, information retrieval, and so forth... I would also like to look at the ways people use more or less "raw" dictionaries to build customised lexicons for their particular application. I would be happy to post the paper, in a month or so, if anyone out there is interested.

You mention in a different message to John Roach that you are working on building a NL lexicon from Collins and OALDCE. I would greatly appreciate any material - notes, comments, papers and/or drafts - which addresses or even touches upon these and related issues. I would also appreciate some sort of indication of how the IR community uses, and would like to use, information obtained from MRDs for IR research --- where / how can I get pointers to this?

You also mention work by Evens, Ahlswede and others to parse the dictionary entries. Could you point me at some references for this or put me in touch with them? Here in Cambridge we have a set of programs to analyse word sense definitions and extract their meaning in the form of small semantic networks; these are subsequently used for building sort hierarchies and applied to various NL [front end] applications.

I apologise for a long-winded request for information; I know very little about IR work and have been following the IRList only for the last couple of months. Please post all or subset(s) of this message to the list, if you think anyone would be interested, or that it will get a response.
I have papers describing our work with the Longman's tape here in more detail.

Thanks for your help,

Bran Boguraev
University of Cambridge Computer Laboratory
Corn Exchange Street
Cambridge CB2 3QG
England

[Bran: Thank you for your very humble message! I would welcome papers on your work, which I see relates well to a number of U.S. activities. To address some of your comments:
1) I have heard about the Longman's tapes but alas they rejected my request for permission to use a copy for research.
2) The Virginia Tech effort has just been written up for some conferences and will be described later this winter in some technical reports. I can add you to the distribution list if you like.
3) Some references follow.
Regards, Ed]

----- references that may be of interest -------

%A Thomas Ahlswede
%D 1981
%T A Linguistic String Grammar for Adjective Definitions from Webster's Seventh Collegiate Dictionary
%I Illinois Inst. of Tech.
%R MS Thesis
%C Chicago, IL

%A T. Ahlswede
%A M. Evens
%D April 1983
%T Generating a Relational Lexicon from a Machine-Readable Dictionary
%J Proceedings SRI Workshop on Machine-Readable Dictionaries
%C Menlo Park, CA

%A Thomas Ahlswede
%A Martha Evens
%T A Lexicon for a Medical Expert System
%J Proceedings Workshop on Relational Models of the Lexicon
%C Stanford Univ.
%D June 29, 1984
%O To appear as chapter in book ed. by Martha Evens.

%A Thomas E. Ahlswede
%D July 1985
%T A Tool Kit for Lexicon Building
%P 268-276
%J Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics
%C Chicago, Illinois

%A Robert Amsler
%D 1980
%T The Structure of the Merriam-Webster Pocket Dictionary
%R Dissertation
%I Univ. of Texas at Austin
%C Austin, TX

%A Robert A. Amsler
%T Machine-Readable Dictionaries
%P 161-209
%E Martha E. Williams
%B Annual Review of Inf. Science and Tech. Volume 19
%I Knowledge Industry Publications
%D 1984
%C White Plains, NY

%A Martha W. Evens
%A Raoul N. Smith
%D 1978
%T A Lexicon for a Computer Question-Answering System
%J Amer. J. Comp. Ling.
%V 4
%N 83 (Microfiche)

%A Martha W. Evens
%A Bonnie E. Litowitz
%A Judith A. Markowitz
%A Raoul N. Smith
%A Oswald Werner
%T Lexical-semantic relations: a comparative survey
%I Linguistic Research, Inc.
%C Carbondale, Ill.
%D 1980

------------------------------

From: Stavros Macrakis
Date: Wed, 15 Jan 86 13:17:33 EST
In-Reply-To: Ed Fox's message of Sun, 15 Dec 85 18:13 EST
Subject: Oxford Text Archive

Thanks again for the information on the Oxford Text Archive. I just received their information packet. By the way, you might want to mention to others that their net address from Arpanet is

   Archive%uk.ac.ox.vax3@ucl-cs.arpa @cs.ucl.ac.uk

-s

------------------------------

From: WALLFESH%UCONNVM.BITNET%wiscvm.wisc.edu@CSNET-RELAY
Date: Fri, 17 Jan 1986 09:19:34 EST
Subject: Re: Text for NLP course

"Strategies for Natural Language Processing", by Wendy Lehnert, might be an ideal text. I think it's published by LEA.

Last spring, the grad NLP course here used "In-Depth Understanding" by Michael Dyer and "Dynamic Memory" by Roger Schank. The course was directed toward students with some NLP background. This spring, "Inside Computer Understanding", and possibly another book, will be used. I'm not sure whether the focus of the course has changed.

Sande Wallfesh

------------------------------

From: "Robert B. Allen at petrus.UUCP"
Date: Thu, 9 Jan 86 15:08:37 est

The January, 1986 issue of the ACM Transactions on Office Information Systems has several papers which may be of interest to readers of irlist:

R.H. Trigg and M. Weiser
   TEXTNET: A Network-Based Approach to Text Handling
W.P. Jones and S.T. Dumais
   Spatial and Symbolic Filing
J. Donahue and J. Widom
   Whiteboards: A Graphical Database Tool
S. Hudson and R. King
   A System for Constructing Semantically Knowledgeable Editors
P. Martin and D. Tsichritzis
   Complete Logical Routings in Computer Mail Systems

I also have the pleasure of announcing that Bruce Croft has joined the Editorial Board.

Bob Allen

------------------------------

From: fox (Ed Fox)
Date: Sat, 11 Jan 86 15:58:13 est
Subject: North-Holland 1985/6 Catalog - some entries of interest

%E Rickheit, G.
%E Strohner, H.
%T Inferences in Text Processing (Advances in Psychology, 29)
%D 1985
%Z This volume critically evaluates the present state of research in the domain of inferences in text processing and indicates new areas of research. The book is structured around the following theoretical aspects: ... representational ... procedural ... contextual

%E Le Ny, J.-F.
%E Kintsch, W.
%T Language and Comprehension
%D 1982
%Z The study of language comprehension and production has become the focus of much recent work in psychology. This book presents an overview of current developments in this area. ... cognitive (lexical, conceptual, imaged) structures, importance attached to syntactic versus semantic processing, emphasis on reading activity or on pragmatics, modelization and simulation ...

%E Flammer, A.
%E Kintsch, W.
%T Discourse Processing (Advances in Psychology, 8)
%D 1982
%Z ... 46 selected and edited contributions from the Intl Symp, held in Fribourg in 1981, and represents a truly international overview of the developments in research on written and oral discourse. The main themes are: text structure, coherence, inference, memory processes, attention and control, goal perspectives, and educational implications

%E Bara, B.G.
%E Guida, G.
%T Computational Models of Natural Language Processing (Fundamental Studies in Computer Science, 9)
%D 1984
%Z ... illustration of models for natural language processing, and the discussion of their role in the development of computational studies of language. ...
intended for scholars acquainted with natural language research, but not necessarily specialists in all the disciplines included in the area of natural language processing

%E Charniak, E.
%E Wilks, Y.
%T Computational Semantics (Fund. Studies in Comp. Sci., 4)
%D 1976
%Z ... frank discussion of the fundamental disputes ... by a number of experts ... alternative approaches to computational semantics ... links ... to discussions in linguistics, philosophy, and psycholinguistics

%E Dahl, V.
%E Saint-Dizier, P.
%T Natural Language Understanding and Logic Programming
%D 1985
%Z ... most recent developments in computational linguistics as viewed from a logic programming standpoint. ... Topics covered include formal representations of natural language, (logic) grammar formalisms, methods for analysis, specific linguistic problems such as coordination, and questions such as the convenience of programming in natural language.

******************************and many others************************

------------------------------

END OF IRList Digest
********************