Date: Fri, 16 Aug 85 18:18 EST
To: irdis at vpi
Subject: IRList V 1 No. 2
Reply-to: IRList%vpi@csnet-relay.arpa (or fox@vpics1 on BITNET)
US-Mail: Dr. Edward A. Fox, Dept. of Computer Science, VPI&SU (also called Virginia Tech), Blacksburg VA 24061
Phone: (703) 961-5113 or 6931
Subject: IRList Digest V1 #2

IRList Digest           Friday, 16 Aug 1985      Volume 1 : Issue 2

Today's Topics:
    Politics - Sensitivity of Mailing Lists in Australia
    EMAIL - Distribution List for Australia
    Research Interests - Human and Silicon Memory
                       - Distributed Workstations and Backend Search
    Article - Online Access Aids for Documentation

----------------------------------------------------------------------

From: John Shepherd
Date: 23 Jul 85 12:57:22 +1000 (Tue)
Subject: Adding Australia to IRList

Ed,

We here at Melbourne University would be very interested in receiving and contributing to the Information Retrieval Digest. In order to cut transmission costs we plan to set up a mail alias on the Australian gateway machine (munnari), and distribute the list from there ...

As you can see, we will handle requests from our end, and simply add people to the mailing list.

I have one major question, related to a recent experience with the Parallel Symbolic computation digest. A number of people here at Melbourne were put onto that mailing list, and received the first few issues. Suddenly, they received mail telling them that they would no longer be receiving the list because they were "foreign nationals" and there was some problem with "technology transfer". Do you have any feelings on this matter? (Perhaps Parallel Symbolic computation is slightly more sensitive because of its military connections).

...
Regards,
John Shepherd (jas@mungunni)
UUCP:  {seismo,ukc,mcvax,ubc-vision}!munnari!jas
ARPA:  munnari!jas@seismo.ARPA
CSNET: jas@munnari.oz

-------

From: pje@munnari [Note- this is reachable by UUCP from SEISMO.ARPA - Ed]
Date: 23 Jul 85 12:55:25 +1000 (Tue)
Subject: IRList

Would you please add IR-List@munnari to the IRList subscription list. We provide for redistribution of mailing list items to all sites in the "oz" domain. So if you receive any subscription requests from sites in Australia, please refer them to IR-List-Request@munnari.

Thank you.
Peter Eden.

[Australian members please note! After this issue, you will be deleted from the list, so be sure to get added as mentioned. - Ed]

-------

From: Wolf-Dieter Batz
Date: 24-July-85

[Comments on interests by new members are most welcome! Please feel free to respond to the implied question, folks! - Ed]

I'm interested in human memory phenomena as well as in silicon-based memory systems. The latter interest especially concentrates on an expert system for political issues.

Cordial Thanx & kind regards --- Wodi

-------

Subj:
From: Lee Hollaar
Date: Tue 30 Jul 85 06:25:42-MDT

Would you please add me to your distribution list for the IR digest. I am the principal investigator of a project developing a distributed workstation environment for information handling and retrieval, and am directing the implementation of a high-speed backend search engine (announced as a product through a spin-off company). I'm also the Vice-Chair of SIGIR.

[SIGIR Forum should be of interest to all readers. It is distributed to all members of the ACM Special Interest Group on Inf. Ret. - Ed]

Lee Hollaar
Hollaar@Utah-20

-------

From: "girill terry%d.mfenet"@LLL-MFE
Date: Wed, 31 Jul 85 07:26 pst
Subject: Access aids bibliography for IRlist distribution

IRlist readers who did not attend the ACM Twelfth Annual User Services Conference, November 1984, in Reno, Nevada, may find this bibliographic outline interesting.
It covers recently discussed access aids for online documentation...

Online Access Aids For Documentation:
A Bibliographic Outline

T. R. Girill
National Magnetic Fusion Energy Computer Center
University of California
Livermore, CA 94550

Background for a Panel on Online Documentation and User Services
ACM SIGUCCS Twelfth User Services Conference
Reno, Nevada
November 12, 1984

Computer documentation poses three information management problems, access problems that readers face when they try to find answers to questions using the documentation. All three problems arise from a mismatch between what readers want or expect and what document authors provide in (1) vocabulary, (2) text structure, and (3) text scope.

Techniques exist for addressing this threefold mismatch in traditional, offline documentation. But these offline techniques are often limited in breadth or power, or prove prohibitively complex or expensive to apply.

Online solutions for each access problem also exist, and are much more effective than their traditional counterparts. They apply computational tools already successfully tested elsewhere, and they all share the property of adapting a document's terminology, structure, or scope to meet reader needs. The relative benefits of these adaptive access aids can make a well-designed online documentation system very helpful to answer-seeking users.

This outline reviews the three kinds of mismatch that cause access problems for readers, and inventories the online techniques that respond to each problem. Implementation details are already available in other published articles and books, for which I provide full bibliographic references.

Vocabulary Mismatch
-------------------

Mismatch between the vocabulary in a document and the query or search terms used by prospective readers poses the first major access problem.

1. That a document's vocabulary affects the ease with which one can read it is well known.

2.
   But vocabulary also strongly affects the retrieval of answer passages from documents.

   a. Common practice selects index terms and keywords chiefly from the text of the document itself. [Browning]

   b. But these text terms are only a small fraction of all possible terms. This makes the index entries and keywords inappropriate for many readers, who then cannot locate answer passages even when the text does contain relevant answers and when the searcher has a specific question formulated. [Sullivan]

3. Psychological experiments confirm the seriousness of this mismatch of terms. [Furnas]

   a. For passages flagged with a single keyword, the probability that a user's search term will match that keyword is often less than 20%.

   b. Even if keyword choice exploits empirical evidence of user preferences, the match probability remains below 40%.

   c. One must assign as many as 15 distinct keywords to a sought passage to raise the probability of a first-time match by users to 60-80%. "Different people, contexts, and motives give rise to so varied a list of names that no single name, no matter how well chosen, can do very well." [Furnas, p. 1796]

4. Although professional indexers know techniques to enhance index-entry and keyword choice, offline indexes can never adapt to reader needs at the time of use.

5. On the other hand, actively "negotiating" search terms with documentation users while they look for answers, via an interactive interface to online text, can dramatically improve term-match (and hence search-success) probability.

   a. Programs that "begin with user's words and look for interpretations--that is, try to recognize every possible word that the users generate, and use empirical data to determine what they mean by that word" by iterative guessing, can yield match probabilities around 90%. [Furnas, p. 1797]

   b. Even when keywords come exclusively from a standard, limited thesaurus, "relevance feedback" from users during a search improves performance. [Doszkocs]

   c.
      Soliciting and exploiting feedback on term correlations in online documents during a search promotes success by letting users recognize, rather than try to imagine, the semantic relationships they seek. [Doyle]

   d. Soliciting and exploiting feedback on term frequencies in online documents during a search quickly reveals and isolates relevant passages amidst large collections. [Burket]

   e. "Query optimization" programs can exploit synonymy and other semantic relations to convert queries posed in user's terms into faster-searching, less-expensive queries behind the scenes. [Barr]

Structure Mismatch
------------------

Mismatch between a document's structure and the structure expected or needed by the reader poses the second major access problem.

1. That a document's structure affects its coherence and clarity is well known.

2. But document structure also strongly affects the ease with which readers can find answer passages. [Gerrie, p. 117]

3. In this access-support role, the structure of offline documents is intrinsically limited. [Swigger] [Wright]

   a. No matter how astute its organization, a printed document can have only one hierarchical structure.

   b. Any single structure will fail to match the needs, expectations, and distinctions of some who encounter it.

   c. The more diverse a document's audience, the more common and severe this mismatch will be.

   d. Even a single reader may bring to a document different needs and interests at different times, although its structure remains unchanged.

4. Techniques to adapt document structure to user needs have been developed to cope with these limitations.

   a. Two techniques apply both offline and online.

      1. Providing more than one outline or "retrieval guide" for a document, where each embodies quite different distinctions, can improve access. [Sprowl]

      2. Converting the table of contents or text hierarchy into a decision tree for locating passages can improve access. [Wright]

   b.
      Even in these cases, online implementation is often more successful than offline.

      1. Programs exist to easily generate outlines and decision trees online. [Gaffney]

      2. Updating is easier and more frequent if the outlines and decision trees are kept online. [Sprowl]

      3. Use can be monitored online, and the (empirically) most common choices can then be offered first.

5. Furthermore, some adaptive techniques are only available online: computer-directed searching techniques can actively adapt to suit readers in ways impossible with traditional publications.

   a. Document distribution and passage retrieval programs can keep a model of the user, constantly modified in light of past requests and performance. By relying on this model, the software can actively map the user's needs onto the document's structure, even if his interests change during a search. [Oddy]

   b. Online search interfaces can infer a user's goals from his overt request (including indirect or implicit goals). They can then plan a "cooperative response" that furthers those search goals, even when the user might have been unable to formulate them himself (something no book can do). [Wilensky] [Jackson]

   c. Integrating online documentation with the program it explains can provide an alternative access structure, one tied to practical applications and actual user tasks. Query-in-depth (increasingly detailed) passage display can make integrated documentation even more adaptive. [Houghton]

   d. Online passage-retrieval programs can use data structures much more complex and sophisticated than any writer could support or any reader could exploit offline. Linked lists, associative networks, and other relation-rich structures support diverse search paths, multiple classifications, and elaborate cross references among passages. [Price]

Scope Mismatch
--------------

Mismatch between a document's scope and the scope expected or needed by readers poses the third major access problem.

1.
   That the scope of a document's passages affects readers' ability to learn from them and notice key features in them is well known. [Fleming] [Conklin]

2. But scope also strongly influences the ease with which readers can find adequate answers.

   a. If passages are too small or sparse (compared to what readers need), then readers must "jump, detour, and change [search] directions" frequently to find complete answers. [Weiss, p. 10]

   b. If passages are too large or detailed (compared to what readers need), they intimidate and confuse readers, leading to awkward rereading and lengthy searches. [Bethke]

3. The ability of offline documents to provide passages of suitable scope is limited.

   a. Documents that avoid redundancy to keep passages small must rely on an elaborate web of cross references to provide detailed answers. Following these reference chains can be awkward and confusing.

   b. Documents that avoid cross references by including many intentionally redundant (hence complete) passages automatically grow in bulk. Their large size can intimidate readers and boost costs.

4. Techniques to adapt passage scope to reader needs are available online that would be impractical or impossible in offline documentation.

   a. Freed from the offline constraints of page size and binding, one can package online text in intellectually meaningful chunks, in display units whose scope is based on content alone. [Badre] [Rothenberg]

   b. Online cross references between such meaningful chunks minimize access delays and reader confusion because they lead directly and precisely to related text, which seldom occurs offline. [Girill]

   c. Online passage-display programs can support virtual redundancy, by showing a set of text lines wherever it is relevant but without storing duplicate copies of the lines. This gives the reader the quick-answer benefits of a highly redundant document, yet without an increase in total size, and hence without intimidation, unwieldy bulk, or higher storage costs.
      [Luk]

References
----------

Badre, Albert. "Designing Chunks for Sequentially Displayed Information," in Albert Badre and Ben Shneiderman, Eds., Directions in Human/Computer Interaction (Norwood, NJ: Ablex Publishing Co., 1982), pp. 179-193.

Barr, Avron and Edward A. Feigenbaum. The Handbook of Artificial Intelligence (Los Altos, CA: William Kaufmann, Inc., 1982), vol. 2, Ch. VII-D, "Artificial Intelligence in Database Management."

Bethke, F. J. Ease-of-Use Study Group Report (San Jose, CA: IBM Santa Teresa Laboratory, 1979).

Browning, Christine. Guide to Effective Software Technical Writing (Englewood Cliffs, NJ: Prentice Hall, 1984), Ch. 8.

Burket, T. G., P. Emrath, and D. J. Kuck. "The Use of Vocabulary Files for On-line Information Retrieval," Information Processing and Management, 15 (1979), 281-289.

Conklin, E. J., K. Ehrlich, and D. D. McDonald. "An Empirical Investigation of Visual Salience and its Role in Text Generation," Cognition and Brain Theory 6 (1983), 197-225.

Doszkocs, Tamas and Barbara A. Rapp. "Searching MEDLINE in English: A Prototype User Interface with Natural Language Query, Ranked Output, and Relevance Feedback," Proceedings of the ASIS 42nd Annual Meeting, vol. 16 (White Plains, NY: Knowledge Industry Publications, 1979), 131-137.

Doyle, Lauren B. "Semantic Road Maps for Literature Searchers," Journal of the Association for Computing Machinery, 8 (October 1961), 553-578.

Fleming, Malcolm and W. Howard Levie. Instructional Message Design (Englewood Cliffs, NJ: Educational Technology Publications, 1978), Ch. 2.

Furnas, G. W., T. K. Landauer, L. M. Gomez, and S. J. Dumais. "Statistical Semantics: Analysis of the Potential Performance of Key-Word Information Systems," Bell System Technical Journal, 62 (July 1983), 1753-1806.

Gaffney, P. W., J. W. Wooten, and K. A. Kessel. NITPACK--A Numerical Interactive Tree Package (Oak Ridge, TN: Oak Ridge National Laboratory, 1982), Report ORNL/CSD-89.

Gerrie, Brenda.
Online Information Systems (Arlington, VA: Information Resources Press, 1983), Ch. 4, "The Retrieval Process."

Girill, T. R. "Display Units for Online Passage Retrieval: A Comparative Analysis," Proceedings of the 31st International Technical Communication Conference (Seattle: Society for Technical Communication, 1984), ATA87-90.

Houghton, Raymond C. Jr. "Online Help Systems: A Conspectus," Communications of the Association for Computing Machinery, 27 (February 1984), 126-133.

Jackson, Peter and Paul Lefrere. "On the Application of Rule-Based Techniques to the Design of Advice-Giving Systems," International Journal of Man-Machine Studies, 20 (January 1984), 63-86.

Luk, Clement and T. R. Girill. "DOCUMENT: An Interactive, Online Solution to Four Documentation Problems," Communications of the Association for Computing Machinery, 26 (May 1983), 328-337.

Oddy, R. N. "Information Retrieval Through Man-Machine Dialogue," Journal of Documentation, 33 (March 1977), 1-14.

Price, Lynne A. "Using Offline Documentation Online," SIGSOC Bulletin, 13 (January 1982), 15-20.

Rothenberg, J. H. "Online Tutorials and Documentation for the SIGMA Message Service," Proceedings of the AFIPS National Computer Conference, vol. 48 (Montvale, NJ: AFIPS Press, 1979), 863-867.

Sprowl, James A. "Computer-Assisted Legal Research--An Analysis of Full-Text Document Delivery Systems," American Bar Foundation Research Journal, 175 (1976), 175-226.

Sullivan, Patricia, and Carol Janik. "Adapting Manuals to a Variety of Audiences: Information Access," Proceedings of the 31st International Technical Communication Conference (Seattle: Society for Technical Communication, 1984), WE187-190.

Swigger, Keith. "A Structured Model for Software Documentation," 13th ASIS Midyear Meeting, Bloomington, IN, May 21, 1984, 9 pp.

Weiss, Edmond. "Usability: Toward a Science of User Documentation," Computerworld, 17 (January 1983), 9-16.

Wilensky, Robert.
Talking to UNIX in English: An Overview of an Online Consultant (Berkeley: Computer Science Division, University of California, 1982), Report UCB/CSD 82/104.

Wright, Patricia. "Presenting Technical Information: A Survey of Research Findings," Instructional Science, 6 (April 1977), 93-134.

--------------------------

END OF IRList Digest
********************