IRList Digest           Tuesday, 3 November 1987      Volume 3 : Issue 39

Today's Topics:
   Discussion - Lexicon development: terms used by library catalog searchers
              - Barriers to library access
   Announcement - CELEX lexical databases
                - ACM CHI '88 Workshop on user interface consistency
                - Position in advanced software development (relating to IR)
   COGSCI - Parsing free word order languages
          - Experience, memory, and reasoning
   CSLI - External Language and Internal Representation
        - Situated Automata

News addresses are
   Internet or CSNET: fox@vtcs1.cs.vt.edu
   BITNET: fox@vtcs1.bitnet

----------------------------------------------------------------------

Date: Wed, 14 Oct 87 10:25:04 CST
From: JEFF HUESTIS
Subject: LEXICON DEVELOPMENT

I've been doing some keyword work with our catalog (an online KWOC index has been the only thing ... possible to do at this time because of other commitments on programmer time) which, of course, included some word counting for development of a "stoplist". The results are, not surprisingly, somewhat different from the distribution found in the Brown Corpus, the top 400 words from which formed our original stoplist. I thought something like this approach would be useful in doing anything with term-weighting. However, it would presumably duplicate the resource you already have in your VTLS data.

[Note: we could generate such statistics, but they may differ from yours due to the difference between library collections - that might be interesting to see. - Ed]

On the other hand, we also have several million log records from user searches in our online catalog. The subject searches would, like the catalog data, be skewed toward information retrieval, as compared with the Brown Corpus, and might be useful as representing the concept space of real people, as opposed to librarians, computer scientists, and other information workers. Again, the value of the data, as I see it, is primarily related to term weighting.
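The word counting described above can be illustrated with a minimal sketch. It assumes a hypothetical list of catalog title strings and a hypothetical reference stoplist standing in for the Brown Corpus top words; the function names and sample data are illustrative only and are not taken from the catalog system discussed here.

```python
import re
from collections import Counter


def word_counts(records):
    """Count lowercase alphabetic words across a list of text records."""
    counts = Counter()
    for text in records:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def candidate_stoplist(counts, size):
    """Return the `size` most frequent words, ties broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]


def not_in_reference(local_stoplist, reference_stoplist):
    """Words frequent locally but absent from a reference stoplist."""
    reference = set(reference_stoplist)
    return [word for word in local_stoplist if word not in reference]


# Hypothetical sample data, for illustration only.
titles = [
    "The history of the library",
    "A history of the book",
    "The catalog of the library",
]
reference = ["the", "of", "a", "and"]  # stand-in for a general-English list

counts = word_counts(titles)
local_top = candidate_stoplist(counts, 4)
collection_specific = not_in_reference(local_top, reference)
```

Words that rank high locally but are missing from the general-English list are the ones where a collection-specific distribution differs, which is the comparison suggested in this message.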
Let me know if you're interested in this data, either in raw or condensed form.
. . .
Of course, given that you're working with AILIST, maybe the Brown Corpus is the most appropriate text for word counting, other than the digests themselves. But comparing the different distributions may still be useful, if someone wants to do it.

[Note: this might be of interest to others - has anyone developed statistics of this type? Does anyone have comments? One thing I might be able to do is to put up a merged word list with frequency info. from various sources, on one of the CDROMs we will be mastering - let me know if anyone thinks these statistics might be of value for weighting experiments. - Ed]

------------------------------

Date: Sun, 11 Oct 87 23:48:02 EDT
From: dws@EDDIE.MIT.EDU (Don W. Saklad)
Subject: barriers to library access

Subtle, pervasive censorship by library officials at our Boston Public Library still continues after the appointment of the new director. Savvy visitors and users ask for documentation such as annual reports and library system manuals so as to know the system and thus develop effective and efficient techniques, like people who've made careers in libraries. Reference services decline these inquiries and related requests, sometimes as whimsical or intrusive.

The point is that the 130-year-old public library should conserve and maintain archives. Even our library board's public meetings should be accessible, instead of the intimidation that has discouraged civic interest. Also, marketing the library should encourage civic participation with comment, criticism, praise and suggestions. BPL public relations seems to take a formulaic, paternalistic approach to constituent groups: that through long experience they know what's best for everyone without asking.
------------------------------

Date: Wed, 7 Oct 87 14:31 N
From:
Subject: CELEX lexical databases
To: foxea@vtvax3
X-Original-To: foxea@vtvax3, CELEX

We think the work of the CELEX project may be of interest to the readers of your digest, and would be grateful if you could include this short notice in a future edition.

With thanks,
Marcel Bingley
Gavin Burnage
-- CELEX Nijmegen --

================================================================================
C E L E X - CENTRE FOR LEXICAL INFORMATION
=============================================

CELEX is a new and rapidly-developing project undertaken by several Dutch institutions which aims to provide extensive information on the English and Dutch languages for use in many types of research. Detailed information on orthography, morphology, phonology, word frequency, syntactical categories etc. has been collected and collated from several sources and, by means of the ORACLE relational database management system, structured to form a highly flexible and wide-ranging source of lexical information.

Newsletters detailing the development and current standing of the first stage of the CELEX project (the first stage covers all but semantic information, which will be added in the second stage beginning 1989) are now available to anyone who is interested. If you have not already been placed on our mailing list, then please send your surface and electronic addresses to:

   CELEX@HNYMPI52

------------------------------

Date: Sat, 17 Oct 87 15:42:53 DNT
From: Jakob Nielsen Tech Univ of Denmark
Subject: Announcement of ACM CHI'88 Workshop on user interface consistency

Call for Participation

CHI'88 Workshop on
COORDINATING USER INTERFACES FOR CONSISTENCY

A limited attendance, invitational workshop on Coordinating User Interfaces for Consistency is being organized for the ACM CHI'88 Conference in Washington, DC. The workshop will be held on Monday, May 16, 1988.
The goal of the workshop is to discuss methods for coordinating the design of user interfaces for consistency and to produce a set of recommendations for people responsible for this aspect of user interface design. This workshop will not cover the subject of user interface standards in the sense of how a standard is arrived at or what standards should be recommended. (The workshop will focus on HOW user interfaces can be made to look and feel similar rather than WHAT they should look and feel like.)

One of the most important aspects of usability is consistency in user interfaces. Consistency should apply both within the individual application and across complete computer systems and product families. Practical methods for coordinating user interface design are not well known.

Issues of interest would include:

* What we mean by consistency
* User Interface Architectures
* In-house standards
* Methods for quality assurance of compliance with consistency rules
* Methods for coordinating small-scale projects
* Methods for coordinating large-scale projects
* Automated methods for checking user interface consistency

Participation in the workshop will be limited to twelve people. Individuals wishing to attend the workshop may request an invitation by submitting a two-page position paper. Applicants should also briefly list major projects in user interface coordination in which they have participated.

Position papers are due no later than March 1, 1988. Four copies, single-spaced, should be sent (use airmail to ensure arrival by March 1) to:

   Jakob Nielsen
   Department of Computer Science
   Technical University of Denmark
   DK-2800 Lyngby Copenhagen
   DENMARK

   Telephone: International access +45-1-38 23 20
   BITnet: datJN@NEUVM1
   ArpaNET: datJN%NEUVM1.bitnet@csnet-relay

Please submit position papers in hardcopy. Notification to invited participants will be mailed by airmail March 15, 1988.
Invited participants will also be sent copies of the selected position papers along with a final agenda for the workshop.

------------------------------

Date: Thu, 15 Oct 87 10:05:26 EDT
Subject: Reply to your message and a job posting
From: j.a.king%dayton.ncr.com@RELAY.CS.NET

. . .
Enclosed is a job-posting for a position at NCR R&D. If you know of any doctoral students or other interested parties, please forward this posting.
. . .
Thanks.

Jim King
j.a.king@dayton.ncr.com

Job Posting
NCR Corporation
Available Position in Advanced Software Development
October 15, 1987

Consulting Analyst - AI

Responsibilities: Programming in the areas of intelligent interface design, information retrieval, planning, knowledge-based systems and other areas of advanced office information systems - through the use of object-oriented techniques.

Preferred Background: Applicant should possess a solid background in UNIX AI workstation environments, specifically Symbolics. Experience with object-oriented programming, e.g. Flavors, Smalltalk, Common Loops, C++, etc. is required. Minimum of a B.S. in Computer Science and two years' experience required; graduate-level degree preferred.

Location: NCR Corporate Research and Development Division in Dayton, Ohio.

Contact: Nelson Hazeltine or James King at (513)-445-1060 or 1090.

------------------------------

Date: Mon, 5 Oct 1987 18:46 EDT
From: Peter de Jong
Subject: Cognitive Science Calendar [Extract - Ed]

Date: Friday, 2 October 1987 16:17-EDT
From: Paul Resnick
Re: AI Revolving Seminar-- Michael Kashket

Thursday 8, October 4:00pm Room: NE43- 8th floor Playroom

The Artificial Intelligence Lab
Revolving Seminar Series

Order Parser Word A Free-

Mike Kashket (kash@oz.ai.mit.edu)
MIT AI Lab

Free-word order languages (where the words of a sentence may be spoken in virtually any order with little effect on meaning) pose a great problem for traditional natural language parsers.
Standard, rule-based parsers have operated with some degree of success on fixed-word order languages (e.g., English), relying on the order between words to drive the construction of the parse tree. In order to cover the varying sequences of free word order, however, these parsers have had to use grammars that contain one rule for each permutation of a sentence. The result was a linguistically uninteresting parse that did not even represent the basic distinction between the verb's subject and object.

A shift from rule-based to principle-based parsing seems to be the answer. A parser grounded on a linguistically principled theory---in this case, the recently developed Government-Binding theory---has a grammar that consists of independent modules, each representing a different facet of the language. For order phenomena two representations are mandated: one that encodes linear precedence, and one that encodes hierarchical, syntactic relations (such as subject and object). In this scheme, linear ordering is represented only where it is syntactically relevant.

This new parsing technique should also work for fixed-word order languages. Here we take advantage of the parameters of GB theory. The claim is that, rather than allowing unconstrained differences between grammars, we can account for the variation among languages of the world by encoding the grammar for each language with only a simple, finite list of parameter settings. For ordering phenomena, there are two parameters: the part of speech that identifies the subject and the object, and whether words or morphemes are involved.

In this talk, I will present an implemented, GB-based parser that handles Warlpiri, a free-word order aboriginal language of central Australia. I will also discuss the promise of this approach for handling fixed-order languages such as English.

Ngakarnanyarra nyanyi. (All come.)
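The remark above about grammars needing one rule per permutation can be made concrete with a small sketch. This is not the parser described in the talk; the constituent labels and function names are hypothetical, and the code only illustrates how the number of order-specific rules grows factorially while an order-free description stays a single entry.

```python
from collections import Counter
from itertools import permutations


def permutation_rules(constituents):
    """List one order-specific rule for each ordering of the constituents."""
    return [" ".join(order) for order in permutations(constituents)]


def same_constituents(sequence_a, sequence_b):
    """Order-free comparison: True if both sequences hold the same items."""
    return Counter(sequence_a) == Counter(sequence_b)


# Hypothetical constituent labels, for illustration only.
three = permutation_rules(["Subject", "Object", "Verb"])
four = permutation_rules(["Subject", "Object", "Verb", "Auxiliary"])

# An order-free description treats any ordering as the same clause.
matches = same_constituents(
    ["Verb", "Subject", "Object"], ["Object", "Verb", "Subject"]
)
```

With three constituents an order-specific grammar needs 6 rules, and with four it needs 24, whereas the order-free comparison needs no additional entries as the clause grows.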
------------------------------

Date: Tue, 20 Oct 1987 09:58 EDT
From: Peter de Jong
Subject: Cognitive Science Calendar [Extract - Ed]

Date: Monday, 19 October 1987 17:51-EDT
From: Paul Resnick
Re: AI Revolving Seminar Thursday-- Janet Kolodner

Thursday 22, October 4:00pm Room: NE43- 8th floor Playroom

The Artificial Intelligence Lab
Revolving Seminar Series

Experience, Memory, and Reasoning

Janet Kolodner

Much of the reasoning people do is based on previous experiences similar to their current situation. The process of using a previous experience to reason about a current one is called case-based reasoning. In case-based reasoning, a reasoner remembers a previous case and then adapts it to fit the current situation. A reasoner that uses case-based reasoning can take reasoning shortcuts, avoid previously-made errors, and focus on important parts of a problem and important knowledge that might otherwise have been missed.

To build usable case-based reasoning systems on the computer, we must discover the best ways to make case-based inferences, how to best organize and retrieve cases in memory, and how to integrate case-based with other reasoning methods. In this talk, I will present several case-based reasoning methods and present some of the problems involved in developing case-based problem solving systems. Examples will come from several expert and common-sense domains, and examples from several experimental programs will be shown.

------------------------------

Date: Wed 14 Oct 87 17:42:31-PDT
From: Emma Pease
Subject: CSLI Calendar, Oct. 15, 3:3 [Extract - Ed]

External Language and Internal Representation
Pat Hayes (Hayes.pa@xerox.com)
October 22

Language evolved, and is used, for communication between intelligent agents. Internally represented information is used quite differently, and different assumptions must be made in thinking about ways of encoding it for use inside a mind.
In particular, communication can assume an intelligent decoder on the other end but is severely constrained by the bandwidth of speech, while internal representations seem to have much wider channels of communication available between their component parts but must be explicit and detailed to an extent that would be inappropriate for a `natural' language. I will argue that general talk of `information' ignores this important distinction and is therefore sometimes confusing in discussions of situated agency.

------------------------------

Date: Thu 22 Oct 87 09:09:09-PDT
From: Emma Pease
Subject: CSLI Calendar, Oct. 22, 3:4 [Extract - Ed]

An Introduction to Situated Automata
Part I: Basic Concepts
Stan Rosenschein
October 29

This is the first of two lectures on the situated-automata approach to the analysis and design of embedded systems. This approach seeks to ground our understanding of embedded systems in a rigorous, objective analysis of their informational properties, where information is modeled mathematically in terms of correlations between states of the system and conditions in the environment.

In this talk we motivate the general framework, present the central mathematical ideas on how information is carried in the states of automata, and relate the mathematical properties of the model to key theoretical issues in AI including the nature of knowledge, its representation in machines, the role of syntactic deduction, "nonmonotonic" reasoning, and the relation of knowledge and action. Some general technological implications of the approach, including reduced reliance on conventional symbolic inference and increased opportunities for parallelism, will be discussed.

The second lecture will describe the application of the situated-automata perspective to specific problems arising in the design of integrated intelligent agents, including problems of perception, planning and action selection, and linguistic communication.
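One way to read the phrase "correlations between states of the system and conditions in the environment" is sketched below. This is only an assumed interpretation for illustration, not the formal model presented in the lectures: a machine state is taken to carry the information that a condition holds if the condition is true in every environment that can co-occur with that state. The state names, environment fields, and function name are all hypothetical.

```python
def carries_information(joint_states, machine_state, condition):
    """True if `condition` holds in every environment paired with the state.

    `joint_states` is a list of (machine_state, environment) pairs that are
    assumed to enumerate all situations that can occur together.
    """
    environments = [env for state, env in joint_states if state == machine_state]
    # A state that never occurs carries no information in this sketch.
    return bool(environments) and all(condition(env) for env in environments)


# Hypothetical enumeration of possible situations, for illustration only.
joint = [
    ("alarm_on", {"door_open": True, "light": True}),
    ("alarm_on", {"door_open": True, "light": False}),
    ("alarm_off", {"door_open": False, "light": True}),
]

door_info = carries_information(joint, "alarm_on", lambda env: env["door_open"])
light_info = carries_information(joint, "alarm_on", lambda env: env["light"])
```

Under this reading the state `alarm_on` carries the information that the door is open, because the two always occur together, but it carries no information about the light, which varies independently of the state. No symbolic inference inside the machine is involved in either judgment.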
------------------------------

END OF IRList Digest
********************