IRList Digest           Tuesday, 15 November 1988      Volume 4 : Issue 53

Today's Topics:
   Query - Public domain / low cost hypermedia software
         - Books on information science
         - IR courses in Boston area / medical informatics
         - CDROMs and comparisons of IR systems
   Email - Any problems with format code in dissertation file
   Abstracts - Comparing extended Boolean approaches
             - Perfect Hashing Functions
   Announcement - Reminder about ACM Doc. Proc. Systems Conf.

News addresses are
   Internet: fox@vtopus.cs.vt.edu
   BITNET: foxea@vtcc1.bitnet (replaces foxea@vtvax3)

----------------------------------------------------------------------

Date: Wed, 14 Sep 88 08:39:18 SST
From: Desai
Subject: Looking for Public domain software ...

Is there any hypertext / hypermedia software in the public domain (source inclusive) that can be obtained for a reasonable fee? Any input will be useful and thanks in advance.

With best regards
Desai

------------------------------

Date: Fri, 23 Sep 88 12:54:49 EDT
From: David Johnson
Subject: books on information science

Can anyone tell me the name and whereabouts of any bookstore that regularly maintains a stock, preferably large and varied, of information-science books???

------------------------------

Date: Tue, 11 Oct 88 12:21:52 EDT
From: pattison@harvard.harvard.edu
Subject: IR courses in the Boston area?

A friend asked me to post this; I will handle email responses.

Dr. Fox:

I am sending you this note via the address of an officemate of mine, as I do not yet have one. My name is Bill Hersh, and I am currently doing a research fellowship in Medical Informatics at Harvard School of Public Health. In case you don't know, Medical Informatics is an interdisciplinary field that does research into computer applications in medicine.

[Note: Yes, I am familiar with some work in that field and am glad that there is interest in bringing the two areas together since they have so many points in common.
- Ed]

My personal area of interest is in the development of interactive electronic textbooks of medicine. In my reviewing of the literature, I have come across the field of Information Retrieval, and the work that you, Salton, and others have done has been very interesting to me.

[Note: Nancy Roderer at Columbia is doing some interesting work in connection with their IAIMS effort. At Columbia there is development of CTIM which relates to your area - Ed.]

I am writing you this note to inquire if you know of any people doing work or giving courses in IR in the Boston area. My fellowship covers tuition for courses, so I would be able to take any courses from any school in the Boston area. I have a good background in computer science, and I have already read Salton's book, so I think I would be able to handle any course in the subject.

[Note: I hope Dick Marcus at MIT can help with this - Ed]

In addition to reading Salton's book and many recent articles in Information Processing and Management and the Journal of ASIS, I have also joined SIGIR. Would you recommend any other avenues for becoming more knowledgeable in the field?

[Note: You have a good start. There are articles in many places that are hard to track down for people entering the field. The Annual Review of Information Science and Technology is a good way to find those works since it has such a carefully done bibliography with each article. I suggest you come to IR conferences such as the SIGIR one in Cambridge MA in June 1989 - and invite SIGIR to have a panel at some of the Medical Informatics conferences - perhaps some jointly sponsored event would help bridge the gap? - Ed.]

William Hersh, MD
Fellow in Medical Informatics
Dept. of Radiology Computer Science
Brigham and Women's Hospital
75 Francis St.
Boston, MA 02115
617-732-6505

------------------------------

From: microsof!jerryd@beaver.cs.washington.edu
Subject: CDROMs and comparison articles
Date: Mon Oct 10 14:46:55 1988

...
Are there any other recent discs or reports out comparing commercially available information retrieval engines or systems? (I've subscribed to SIGIR, so I'll be seeing upcoming information there.)

Jerry J. Dunietz
Microsoft Corporation
uw-beaver!microsoft!jerryd (UUCP)

------------------------------

Date: Sat, 27 Aug 88 14:57 EDT
From:
Subject: RE: IRList Digest V4 #44 special format codes

I noticed that in this Digest the symbols ".]" and ".[" appeared in the leftmost column. It has been my experience that some mailers or--more likely--gateways refuse to deal with lines starting with a ".". I would be interested in knowing if Digest V4 #44 got through unscathed to all the intended recipients on the list.

Peter Junger
JUNGER@CWRU

[Note: I moved the period to column 2 in the next set - hope that helps - Ed.]

------------------------------

Date: Wed, 7 Sep 88 12:55:53 edt
From: whayl@vtopus.cs.vt.edu (Whay Choong Lee)
Subject: Experimental Comparison of ... Interpreting Boolean Queries

Experimental Comparison of Schemes for Interpreting Boolean Queries

Whay C. Lee and Edward A. Fox
Department of Computer Science
VPI&SU, Blacksburg, VA 24061

ABSTRACT

The standard interpretation of the logical operators in a Boolean retrieval system is in general too strict. A standard Boolean query rarely comes close to retrieving all and only those documents which are relevant to the user. An AND query is often too narrow and an OR query is often too broad. The choice of the AND results in retrieving on the left end of a typical average recall-precision graph, while the choice of the OR results in retrieving on the right end, implying a tradeoff between precision and recall. This study basically examines various proposed schemes, the P-norm, Classical Fuzzy-Set, MMM, Paice and TIRS, which provide means to soften the interpretation of the logical operators, and thus to attain both high precision and high recall search performance.
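
As a rough illustration of how such softened operators behave, the sketch below scores one document against a flat AND or OR query under the classical fuzzy-set, MMM, Paice and P-norm schemes. The formulas follow the commonly published definitions of these schemes rather than anything stated in this abstract, the query terms are treated as unweighted, and the function names and parameter values (c = 0.7, r = 0.7, p = 2) are illustrative assumptions, not the settings used in the study. TIRS is omitted.

```python
from typing import Sequence


def fuzzy_and(w: Sequence[float]) -> float:
    """Classical fuzzy-set AND: the weakest term weight decides."""
    return min(w)


def fuzzy_or(w: Sequence[float]) -> float:
    """Classical fuzzy-set OR: the strongest term weight decides."""
    return max(w)


def mmm_and(w: Sequence[float], c: float = 0.7) -> float:
    """MMM AND: mostly the minimum, softened by the maximum.

    c is an assumed coefficient; it should exceed 0.5 so min dominates.
    """
    return c * min(w) + (1.0 - c) * max(w)


def mmm_or(w: Sequence[float], c: float = 0.7) -> float:
    """MMM OR: mostly the maximum, softened by the minimum."""
    return c * max(w) + (1.0 - c) * min(w)


def _paice(w: Sequence[float], r: float, descending: bool) -> float:
    # Geometric weighting r**0, r**1, ... over the sorted term weights,
    # so every term contributes, not only the extreme ones.
    ordered = sorted(w, reverse=descending)
    coeffs = [r ** i for i in range(len(ordered))]
    return sum(c * x for c, x in zip(coeffs, ordered)) / sum(coeffs)


def paice_and(w: Sequence[float], r: float = 0.7) -> float:
    """Paice AND: ascending order, so small weights count most."""
    return _paice(w, r, descending=False)


def paice_or(w: Sequence[float], r: float = 0.7) -> float:
    """Paice OR: descending order, so large weights count most."""
    return _paice(w, r, descending=True)


def pnorm_or(w: Sequence[float], p: float = 2.0) -> float:
    """P-norm OR: normalized distance from the all-zero point."""
    return (sum(x ** p for x in w) / len(w)) ** (1.0 / p)


def pnorm_and(w: Sequence[float], p: float = 2.0) -> float:
    """P-norm AND: one minus normalized distance from the all-one point."""
    return 1.0 - (sum((1.0 - x) ** p for x in w) / len(w)) ** (1.0 / p)


if __name__ == "__main__":
    weights = [0.9, 0.1]  # hypothetical document weights for two query terms
    for name, fn in [
        ("fuzzy", (fuzzy_and, fuzzy_or)),
        ("MMM", (mmm_and, mmm_or)),
        ("Paice", (paice_and, paice_or)),
        ("P-norm", (pnorm_and, pnorm_or)),
    ]:
        print(f"{name:7s} AND={fn[0](weights):.4f}  OR={fn[1](weights):.4f}")
```

For a document that matches one term strongly and the other weakly, the strict fuzzy-set AND and OR return the two extremes, while each softened scheme returns a value in between, which is what allows documents to be ranked rather than simply accepted or rejected.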

Each of the above schemes has shown great improvement over the standard Boolean scheme in terms of retrieval effectiveness. The differences in retrieval effectiveness between P-norm, Paice and MMM are shown to be relatively small. However, related performance results obtained give evidence of the ranking: P-norm, Paice, MMM and then TIRS. This study employs the INNER PRODUCT function for computing the similarity between a document point and a query point in TIRS. There may be other choices of similarity functions for TIRS, but irrespective of the function used, the TIRS approach, having to deal with associated min-terms rather than the original query, is difficult to realize and involves far greater computational overhead than the other schemes. The P-norm scheme, being a distance-based approach, has greater intuitive appeal than the Paice or MMM scheme. However, in terms of computational overhead required of each scheme, both the Paice and MMM are superior to P-norm. The Paice and MMM schemes are essentially variations of the classical fuzzy-set scheme. Both perform much better than the classical fuzzy-set scheme in terms of retrieval effectiveness.

CR Categories and Subject Descriptors:
   H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing,
   H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval -- Query Formulation, Retrieval Models, Search Process,
   H.3.6 [Information Storage and Retrieval]: Library Automation.

General terms: algorithms, experimentation, design, performance

Additional Keywords and Phrases: Boolean retrieval, logical operators, P-norm, Paice, MMM, TIRS, Fuzzy-set, effectiveness

------------------------------

Date: Wed, 14 Sep 88 09:13 EDT
From: Edward A. Fox
Subject: New technical report on Perfect Hashing

Tech report 88-30 has title "A More Cost Effective Algorithm for Finding Perfect Hash Functions" with authors: Edward A.
Fox, Qi-Fan Chen, Lenwood Heath, Sanjeev Datta

[Note: this is similar to the paper that will be presented at ACM CSC 89 in February. It is about an O(n^3) algorithm that was used in a beta copy of Virginia Disc One for developing perfect hash functions for over 300 sets, each with 256 words - Ed.]

------------------------------

Date: Wed, 7 Sep 88 22:43 5
From: ORBETON@nuhub.acs.northeastern.edu
Subject: DP88 Adv Prog Info

[Sent as final reminder - Ed.]

ACM Conference on Document Processing Systems
December 5 - 9, 1988
Santa Fe, New Mexico

Advance Program Information

The ACM Conference on Document Processing Systems is an inaugural, international conference bringing together researchers, developers, and users to examine the theory, development, and application of document processing systems for generating, disseminating, searching, and viewing information. It is sponsored by the Association for Computing Machinery's Special Interest Groups on Graphics (SIGGRAPH), Computer-Human Interaction (SIGCHI), and Office Information Systems (SIGOIS), in cooperation with the Los Alamos National Laboratory and SIGIR (Special Interest Group on Information Retrieval).

Technical Program

The technical program offers 24 papers describing recent work relating to significant problems, including research results or the innovative application of document processing technology. A representative sampling includes:

+ Conceptual Documents: A Mechanism for Specifying Active Views in Hypertext
+ Translating Among Processable Multi-Media Document Formats Using ODA
+ Adding Browsing Semantics to the Hypertext Model
+ Automatic Text Indexing Using Complex Identifiers
+ Formalizing the Figural: Aspects of a Foundation for Document Manipulation
+ Why Switch from Paper to Electronic Manuals?
  A Military Perspective
+ Evolution of an SGML Application Generator
+ Auto-Updating as a Technical Documentation Tool
+ The LaserROM Project: A Case Study in Document Processing Systems
+ An Adaptation of Dataflow Methods for WYSIWYG Document Processing

Courses

On Monday, December 5, nine courses will provide an in-depth look at a wide range of topics related to document processing issues and techniques. The courses are arranged into six full-day tracks. Since seating is limited, early registration is strongly encouraged. The courses are offered in addition to the conference itself and carry separate fee schedules. Course offerings are as follows:

Structured Documents. Richard Furuta, University of Maryland, Vania Joloboff, BULL Research Center, Vincent Quint, INRIA. For those interested in a conceptual framework that organizes the field of structured document processing systems.

Introduction To The Office Document Architecture (ODA). Heather Brown, University of Kent at Canterbury. A general introduction to ODA concentrating on the document structures provided and the types of content currently allowed (especially text).

Implementation And Conformance Of ODA/ODIF Systems. Wally Wedel, NBI, Inc., Frank Dawson, IBM Corporation. An introduction to the recently approved international standard entitled Information Processing--Document Architecture and Interchange Format.

Introduction To The Standard Generalized Markup Language (SGML). Donald D. Chamberlin, IBM Almaden Research Center. Introduces the SGML Standard, including its purpose, its current status, and the syntax of the language itself from the points of view of authors and designers.

Implementation Of SGML Systems. Lynne A. Price, Hewlett Packard, Jim Heath, National Bureau of Standards, Peter Sharpe, SoftQuad Inc. Presents the programmer's view, discusses possible design strategies for SGML software, and shares experiences with prospective implementors.

Digital Typography: A Primer.
Richard Rubinstein, Digital Equipment Corporation. Provides the basic background necessary to understand the issues in digital output of text.

Introduction To Hypertext And Hypermedia. Jakob Nielsen, Technical University of Denmark. Provides an introduction to the concepts of hypertext (non-sequential writing) and hypermedia (multi-media hypertext).

CD-ROM Publishing And Access. Edward A. Fox, Virginia Polytechnic Institute and State University. Addresses two phases of document processing: document production and document dissemination.

Advanced Methods Of Document Retrieval. Norbert Fuhr, Technische Hochschule Darmstadt. Shows how modern IR techniques can be adapted to multi-media, multi-type document bases in order to increase retrieval effectiveness as well as user friendliness.

Demonstrations and Panel Sessions

Demonstrations by their creators of 12 systems, closely related to the technical program and tutorial topics, will provide attendees with a close-up look at new experimental and commercial systems and concepts. Eight panels will stimulate thought and discussion. They will provide an alternative format for presenting varying views.

Book Exhibit and Technical Tour

Books and technical journals in the area of document processing systems from a variety of publishers will be available for browsing and purchase or subscription. You are invited to Los Alamos National Laboratory on Friday, December 9 to visit the Bradbury Science Museum and, if you are a U.S. citizen, the Central Computing Facility, which houses one of the world's largest scientific computing centers. Transportation will be provided.

Social Functions

Course attendees are invited to a reception Sunday evening. Technical program attendees are invited to a reception Tuesday evening at the Museum of International Folk Art.

Registration Information

Space is limited for the technical program and all courses. On-site registration is available only as space permits.
Member Discounts are available to current ACM, SIGGRAPH, SIGCHI, SIGOIS, or SIGIR members. Students are especially invited to attend this conference and will benefit from a discounted fee schedule. Interested students are sought for volunteer work at the conference in exchange for complimentary registration fees.

Fees -- Technical Program
                  By Nov 7    After Nov 7
   Member           $225         $300
   Non-member       $300         $375
   Student          $100         $100

Fees -- Courses
                  By Nov 7    After Nov 7
   Member           $200         $275
   Non-member       $230         $305
   Student          $100         $100

Santa Fe

Santa Fe, so rich in tradition and cultural diversity, has been called "The City Different." Santa Fe, at 7,000 feet above sea level, is nestled in northern New Mexico's desert highlands at the foot of the Sangre de Cristo Mountains, where the cultures of the Indians, Hispanics, and Anglos meet and mingle. With its unique character and charm, Santa Fe resembles an old-world village. Distinct from every other American city, Santa Fe proves to its visitors that you don't have to leave the country to visit an enchanting foreign land.

For a copy of the Advance Program which is available now, contact Peter Orbeton, Lotus Development, 161 First St., Cambridge, MA 02142; e-mail Orbeton.chi@xerox.com

[Note - registration materials should be sent to
   Dr. Lynne A. Price
   Registration Chair
   Hewlett-Packard
   3200 Hillview Ave
   Palo Alto, CA 94304
Note: see V4 #51 for program info - Ed.]

------------------------------

END OF IRList Digest
********************