From vtisr1!irlistrq Mon Sep 8 14:30:59 1986
Date: Mon, 8 Sep 86 14:30:49 edt
From: vtisr1!irlistrq
To: fox
Subject: IRList Digest V2 #41
Status: R

IRList Digest           Monday, 8 September 1986      Volume 2 : Issue 41

Today's Topics:
   Query - Back issues, applying IR ideas to software systems?
   Announcement - Xerox PARC Forum on NoteCards
   Announcement - News on National Archives storage
   Discussion - Comments regarding News on National Archives storage
   Abstracts - Appearing in latest issue of ACM SIGIR Forum, Part 4 of 4

News addresses are
   ARPANET: fox%vt@csnet-relay.arpa
   BITNET: foxea@vtvax3.bitnet
   CSNET: fox@vt
   UUCPNET: seismo!vtisr1!irlistrq

----------------------------------------------------------------------

Date: Fri, 5 Sep 86 00:42:30 CDT
From: seismo!gswd-vms.ARPA!marick%turkey (Brian Marick)
Subject: back issues of the mailing list

There's a line of information retrieval/organization research that stretches back to Vannevar Bush and Memex, through Engelbart and NLS, up to Xerox NoteCards and TextNet. I'd like to apply those ideas about organizing and retrieving information to the (I feel) analogous problems involved in the maintenance and enhancement of large software systems by smallish groups of people.

The more I learn about the Bush-Engelbart-... family tree, the better. Would back issues of the IRList help me? If so, is there any way to get to those back issues?

Thanks much.

Brian Marick, Wombat Consort
Gould Computer Systems -- Urbana && University of Illinois
...ihnp4!uiucdcs!ccvaxa!marick
ARPA: Marick@GSWD-VMS

[Note: yes, see the Welcome message I will send you for details - Ed]

[Note: Dr. William Frakes at AT&T Bell Laboratories and some of his colleagues have been interested in searching large software collections for "relevant" modules, which is a little like what you are talking about. Perhaps Bill and others will comment on your ideas. Please let us know what further developments result.
- Ed]

------------------------------

Date: Fri, 5 Sep 86 13:25:06 PDT
From: Hibbert.pa@Xerox.COM
Subject: PARC Forum September 11: NoteCards

[Forwarded from: AI-ED Digest   Friday, 5 Sep 1986   V.1: Issue 30 - Ed]

PARC Forum
Thursday, September 11, 1986
4:00PM, PARC Auditorium

Frank Halasz
Randy Trigg
Tom Moran
Intelligent Systems Lab
Xerox PARC

NoteCards: An Experimental Environment for Idea Processing and Information Management

NoteCards is an extensible environment designed to help people formulate, structure, compare, and manage ideas. It was developed here at PARC as a vehicle for our research on the nature of idea processing tasks and the ways in which computers can be used to support intellectual work. As part of this research, we have been actively seeding a community of NoteCards users inside Xerox and at a number of university, government, and industrial sites. NoteCards is currently being used by more than 50 people engaged in idea processing tasks ranging from writing research papers through designing parts for photocopiers.

In this forum, we will briefly demonstrate the current version of NoteCards and discuss the major design considerations that drive its development. We will describe the NoteCards user community and the range of clever applications that are being developed using NoteCards. Finally, we will assess how well the system meets the needs of its users. Specifically, we will argue that NoteCards is very successful in supporting the task of managing and organizing large collections of ideas, but is relatively less suited to the task of formulating and structuring these ideas. We will also argue that the system lacks adequate support for collaborative work. These assessments will be used to motivate and briefly describe the current research directions of the NoteCards project.

------------------------------

Date: 4 Sep 86 19:58 PDT
From: William Daul / McDonnell-Douglas / APD-ASD
Author: Mitch Betts (ComputerWorld)
Subject: ComputerWorld 9/1/86 p.31 "National Archives' Storage Under Scrutiny"
Comment: I thought this might be of interest to you. It is copied without permission. --Bi//
Keywords: National Archives, information retrieval, information storage, archives, historians

WASHINGTON, D.C. -- The prestigious National Research Council has issued a report urging the National Archives not to use magnetic media or optical disks to permanently store historical documents.

Optical disks and magnetic media last only 10 to 20 years for archival purposes, and the rapid pace of change in hardware and software technology suggests that it may be impossible to read the historical records in the centuries to come, according to the report, "Preservation of Historical Records."

William Holmes, director of the National Archives and Records Administration's archival research and evaluation staff, stated that he agrees with the research report's conclusions. He said that although the agency plans a pilot test of digital imaging and optical-disk technology, optical disks will be used only for public retrieval and not for permanent storage.

"Even if the operating systems and documentation problems somehow are dealt with, what is the archivist to do when the machine manufacturer declares the hardware obsolete or simply goes out of business?" the research report asked.

"Will there be an IBM or a Sony in the year 2200? If they still exist, will they maintain a 1980-1990 vintage machine?" the report continued.

An example of the problem occurred in the mid-1970s when archivists discovered that there were only two computers that could read the 1960 U.S. census; one was in the Smithsonian Institution and the other was in Japan.

The inescapable conclusion, the researchers said, is that long-term archives would be committed to an expensive file conversion program every 10-20 years if they use electronic media for permanent storage.

------------------------------

Date: 5 Sep 86 09:34 CDT
From: "Don Young"@csnet-relay.csnet, "Augmentation Systems Division"@csnet-relay.csnet, MDC
Subject: Re: ComputerWorld 9/1/86 p.31 "National Archives' Storage Under Scrutiny"

[Note: this is a follow up to previous message. - Ed]

Thanks for putting this article on-line. Yes, the National Archive folks have two major problems:

   1. The question that they ask us: "WILL YOU BE AROUND AS A VENDOR TO SUPPORT YOUR PRODUCT OVER THE LONG TERM?"

   2. Problem with finding the proper recording devices for long term storage.

The positive thing in the article is that they confirmed that they are going to run a pilot test. This pilot test could be with ASD.

Also, the Air Force/Navy Standard Multiuser small Computer Requirements Contract (RFI at this point) describes Augment On-Line Files in good detail as a requirement. The specification is Augment coupled with the methodology used by AFCC. Will hope that the RFP states the same when available next month.

------------------------------

Date: Wed, 23 Jul 1986 13:06 CST
From: Vijay V. Raghavan
Subject: SIGIR FORUM Abstracts [Part 4 of 4 - Ed]

[Note: Members of ACM SIGIR should have received the spring/summer Forum, and can find these on pages 39-42. The previous parts have appeared in machine readable form in earlier issues of IRList. - Ed]

ABSTRACTS
(Chosen by G. Salton or V. Raghavan from 1984 issues of journals in the retrieval area)

30. STRUCTURE OF HIERARCHIC CLUSTERINGS: IMPLICATIONS FOR INFORMATION RETRIEVAL AND FOR MULTIVARIATE DATA ANALYSIS

F.
Murtagh
Department of Computer Science, University College Dublin, Dublin 4, Ireland

Hierarchic clustering methods may be used to condense information for a user, as they are in multivariate data analysis, or to achieve computational advantages, as they are in information retrieval. The structure of the hierarchic classification produced has a direct bearing on the effectiveness and utility of using cluster analysis, yet this important feature of the classification has only been implicitly referred to in the literature to date. In this study, three different coefficients are defined, each of which quantifies the symmetry-asymmetry (balancedness-unbalancedness) of hierarchic clusterings on a scale from 0 to 1. Using examples of data from the areas of information retrieval and of multivariate data analysis, a number of hierarchic clustering methods are discussed in terms of the hierarchies they produce.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp. 611-617, 1984).

31. AUTOMATIC INDEXING OF FULL TEXTS

Dr. Zdenek Jonak
Central Office of Scientific, Technical and Economic Information, Prague, Czechoslovakia

The article deals with the preparation of query description using a semantic analyser method based on the analysis of semantic structure of documents. The aim of the paper is to demonstrate the efficiency of this method in the field of automatic indexing. The results obtained by means of this method are compared with results of automatic indexing performed by some traditional methods and with the results of indexing done by human indexers.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp. 619-627, 1984).

32. ASPECTS AND THE OVERLAP FUNCTION

Marilyn M. Levine
Dr. Levine's Information Machine, 823 N. 2nd Street, Room 200, Milwaukee, WI 53203, USA

Leonard P.
Levine
Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA

It is intuitively clear that putting the cart before the horse is not the same as putting the horse before the cart. It is equally clear that a history of philosophy is different from a philosophy of history. Yet there is no logical relationship, like the AND/OR/NOT functions, which would enable manipulation of these permuted, non-commutative, relationships. In this paper we present a system for automatic handling of ordered sets, states based on these sets, and of differing points of view regarding a Universe of Discourse. We call what we are dealing with aspects and we represent them by means of a new logical function called the Overlap function.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp. 629-636, 1984).

33. A COMPARISON OF TWO METHODS FOR BOOLEAN QUERY RELEVANCY FEEDBACK

G. Salton and E. Voorhees
Department of Computer Science, Cornell University, Ithaca, NY 14853, USA

E. A. Fox
Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA

The relevance feedback process uses information derived from an initially retrieved set of documents to improve subsequent search formulations and retrieval output. In a Boolean query environment this implies that new query terms must be identified and Boolean operators must be chosen automatically to connect the various query terms. In this study two recently proposed automatic methods for relevance feedback of Boolean queries are evaluated and conclusions are drawn concerning the use of effective feedback methods in a Boolean query environment.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 5/6, pp. 637-651, 1984).

34. ORGANIZATION OF CLUSTERED FILES FOR CONSECUTIVE RETRIEVAL

J. S. Deogun
University of Nebraska

V. V. Raghavan and T. K. W.
Tsou
University of Regina

This paper studies the problem of storing single-level and multilevel clustered files. Necessary and sufficient conditions for a single-level clustered file to have the consecutive retrieval property (CRP) are developed. A linear time algorithm to test the CRP for a given clustered file and to identify the proper arrangement of objects, if CRP exists, is presented. For the single-level clustered files that do not have CRP, it is shown that the problem of identifying a storage organization with minimum redundancy is NP-complete. Consequently, an efficient heuristic algorithm to generate a good storage organization for such files is developed. Furthermore, it is shown that, for certain types of multilevel clustered files, there exists a storage organization such that the objects in each cluster, for all clusters in each level of the clustering, appear in consecutive locations.

(ACM TRANSACTIONS ON DATABASE SYSTEMS, Vol. 9, No. 4, December 1984, Pages 646-671)

35. LASER OPTICAL DISK: THE COMING REVOLUTION IN ON-LINE STORAGE

Larry Fujitani

Commercially available only recently, the optical disk drive uses a laser beam to burn impressions onto a plastic disk. Employing a highly focused beam rather than a diffuse magnetic field to write, the laser optical disk drive yields storage densities up to 10 times those of magnetic disks.

(COMMUNICATIONS OF THE ACM, Vol. 27, Number 6, June 1984)

36. AUTOMATIC SPELLING CORRECTION IN SCIENTIFIC AND SCHOLARLY TEXT

Joseph J. Pollock and Antonio Zamora

An automatic spelling correcting algorithm corrects most of the 50,000 misspellings culled from 25,000,000 words of text from seven scientific and scholarly databases. It uses a similarity key to identify words in a large dictionary that are most similar to a particular misspelling, and then an error-reversal test to select from these the most plausible correction(s).

(COMMUNICATIONS OF THE ACM, Vol. 27, Number 4, April 1984)

37.
THE DATA-DOCUMENT DISTINCTION IN INFORMATION RETRIEVAL

David C. Blair

The speed and effectiveness of document retrieval systems can be greatly improved by reducing the number of logical decisions required of the user. Based on the weighting of single terms by the user, the proposed system provides an optimized search strategy by combining the terms to yield the highest probabilities and then calculating the size of the retrieval set in each case.

(COMMUNICATIONS OF THE ACM, Vol. 27, Number 4, April 1984)

------------------------------

END OF IRList Digest
********************