IRList Digest           Sunday, 24 January 1988      Volume 4 : Issue 1

Today's Topics:
   Email - Welcome message with latest info on submission, etc.
   Announcement - Free text retrieval software for Mac, SUN, VAX, etc.
   Abstracts - Software Psychology Society Newsletter for Winter 1988

News addresses are
   Internet or CSNET: fox@vtopus.cs.vt.edu
   BITNET: foxea@vtvax3.bitnet

----------------------------------------------------------------------

Date: Sun, 24 Jan 88 13:13:35 est
From: fox (Ed Fox)
Subject: welcome message to IRList to start off the year

Welcome to the IRList.  I am the moderator of the IRList discussion.
I am responsible for composing the digest from pending submissions,
controlling the volume and frequency of mail, keeping an archive, and
answering administrative requests.

You may submit material for the digest to a variety of places,
depending on what network you are on and how quickly and reliably you
want mail to reach me.  We do not have to pay for mail deliveries, but
they do vary in speediness and reliability.  Possibilities include:

   If on ARPANET and can use domains, or on CSNET, use
      fox@vtopus.cs.vt.edu
      foxea%vtvax3.bitnet@cunyvm.cuny.edu
   If on ARPANET and can't use domains use
      fox%vtopus.cs.vt.edu@csnet-relay.arpa
   If on BITNET, use
      foxea@vtvax3
   If on UUCPNET, use something like
      ... seismo!vtvax3.bitnet!foxea

As you might expect, archival copies of all digests will be kept; feel
free to ask for recent back issues.  Note that FTP is now finally
possible but details have yet to be worked out regarding access.
Meanwhile, all communication must be by EMAIL or phone or letter.

IRList is open to discussion of any topic (vaguely) related to
information retrieval.  Certainly, any material relating to ACM SIGIR
(the Special Interest Group on Information Retrieval of the Association
for Computing Machinery) is of interest.  Our field has close ties to
artificial intelligence, database management, information and library
science, linguistics, ...
A partial list of suitable topics is:

   Information Management/Processing/Science/Technology

   AI Applications to IR           Hardware aids for IR
   Abstracting                     Hypertext and Hypermedia
   CD-ROM / CD-I / ...             Indexing/Classification
   Citations                       Information Display/Presentation
   Cognitive Psychology            Information Retrieval Applications
   Communications Networks         Information Theory
   Computational Linguistics       Knowledge Representation
   Computer Science                Language Understanding
   Cybernetics                     Library Science
   Data Abstraction                Message Handling
   Dictionary analysis             Natural Languages, NL Processing
   Document Representations        Optical disc technology and applications
   Electronic Books                Pattern Recognition, Matching
   Evidential Reasoning            Probabilistic Techniques
   Expert Systems in IR            Speech Analysis
   Expert Systems use of IR        Statistical Techniques
   Full-Text Retrieval             Thesaurus construction
   Fuzzy Set Theory

Contributions may be anything from tutorials to rampant speculation.
In particular, the following are sought:

   Abstracts of Papers, Reports, Dissertations
   Address Changes
   Bibliographies
   Conference Reports
   Descriptions of Projects/Laboratories
   Half-Baked Ideas
   Histories
   Humorous, Enlightening Anecdotes
   Questions
   Requests
   Research Overviews
   Seminar Announcements/Summaries
   Work Planned or in Progress

The only real boundaries to the discussion are defined by the topics of
other mailing lists.  Please do not send communications to both this
list and AIList or the Prolog list, except in special cases.  I will
try not to overlap much with NL-KR, except when we both receive
materials from contributors or from some bulletin board or researchers.

PLEASE "sign" subscriptions with full name and address so that people
can reach you from Internet and/or BITNET (many other networks can be
reached through them and are certainly urged to participate).

Editing of contributions will usually be limited to text justification
and spelling corrections.  Editorial remarks and elisions will be
marked with square brackets.
The author will be contacted if significant editing is required.

I have no objection to distributing material that is destined for
conference proceedings or any other publication.  I support ACM SIGIR
Forum and, unless you request otherwise, may encourage inclusion of
submissions in whole or in part in future paper versions of the FORUM.
Indeed, this is one form of solicitation for FORUM contributions!
Both IRList and the FORUM are unrefereed, and opinions are always those
of the author and not of any organization unless there are other
indications.  Copies of list items should credit the original author,
not necessarily the IRList.

If you are interested in submitting to Information Processing and
Management (IP&M), I would like to entertain a discussion with you as
well.  The same goes for The Laserdisk Professional, a new publication
about CD-ROM and optical discs.

The list does not assume copyright, nor does it accept any liability
arising from remailing of submitted material.  Further, no liability is
accepted for use of such materials for information retrieval research,
including distribution of test collections.  I reserve the right,
however, to refuse to remail any contribution that I judge to be of
commercial purpose, obscene, libelous, irrelevant, or pointless.

Replies to public requests for information should be sent, at least in
"carbon" form, to this list unless the request states otherwise.  If
necessary, I will digest or abstract the replies to control the volume
of distributed mail.  However, PLEASE DO contribute!  I would rather
deal with too much material than with too little.

   -- Ed Fox

Edward A. Fox, Assistant Professor, Dept. of Computer Science,
Virginia Tech (VPI&SU), McBryde Hall Rm. 562, Blacksburg VA 24061
(703) 961-5113 or 6931

------------------------------

Date: 27 Dec 87 09:35 EST
From: science@nems.ARPA (Mark Zimmermann)
Subject: free text retrieval software for SUN, VAX, Macintosh

Hi there!
Ed, if you could forward this note to SUN-SPOTS and/or to Igor Metz,
who asked about text retrieval software for the Sun, I'd greatly
appreciate it -- I am terrible at figuring out addresses to send things
to from here, and my mailer is even worse.

I wrote up a bunch of programs in C about 6 months ago that run on Sun,
VAX, Macintosh, etc., which generate simple complete inverted indices
to every word in an ascii text file.  (Leaving out 'stop words' turns
out to be something of a waste of the computer's time and doesn't save
a significant amount of disk space either.)

If anybody wants to see copies of the best of these programs, 'qndxr.c'
and 'brwsr.c', and can get me an address on the net to send them to
(from arpanet, from a picky mailer) I'd be more than happy to do so.

'qndxr.c' is about 50 kB long, including comments, and seems pretty
transportable ... I've sent out dozens of copies and haven't heard of
any bugs from the latest version.  It takes an arbitrarily-large text
file (disk space limits you, until you get to 2 or 4 GB where my 32-bit
pointers run out) and breaks it up into chunks that fit into memory,
then does a quicksort on pointers to every word in the chunk, and
writes the resulting chunks of index files to disk ... then, it goes
through and merges the chunks of index together until there is a single
(pair) of index files (one holding keys, the other holding pointers to
every occurrence of words).  Very very simple ... I'm working on
extensions, but more on that later.  Current version seems to build
indices at roughly 10-15 MB/hour pace on a Sun or Mac II, and at 3-4
MB/hour on a Mac Plus....

'brwsr.c' lets you browse through the index ... gives you a display of
words and their occurrence rates, like:

       100 aardvark
      9876 aaron
        21 aarons
   etc.
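
[Editorial sketch, not Mark's code.  The indexing pass described above
(cut the text into chunks, sort pointers to every word in each chunk,
merge the sorted chunks into a key file and a pointer file) might look
roughly like this in Python rather than the original C.  Every name
here (chunk_bounds, index_chunk, build_index, show_counts) is made up
for illustration.  In-memory lists stand in for the index files on
disk, Python's built-in sort stands in for the quicksort, and the
single k-way merge is an assumption about a step the message does not
spell out.]

```python
import heapq
import re

WORD = re.compile(r"\w+")  # assumption: a "word" is a run of letters/digits


def chunk_bounds(text, chunk_size):
    """Yield (start, end) spans of roughly chunk_size characters.

    Spans are only cut at whitespace, so no word is split between chunks.
    """
    start, n = 0, len(text)
    while start < n:
        end = min(start + chunk_size, n)
        while end < n and not text[end].isspace():
            end += 1
        yield start, end
        start = end


def index_chunk(text, start, end):
    """Return the sorted (word, offset) pairs for one chunk of the text."""
    pairs = [(m.group().lower(), m.start())
             for m in WORD.finditer(text, start, end)]
    pairs.sort()
    return pairs


def build_index(text, chunk_size=64):
    """Merge the per-chunk runs into a key list and a pointer list.

    keys     -- [word, position of first pointer, occurrence count]
    postings -- character offsets of every occurrence, grouped by word
    """
    runs = [index_chunk(text, s, e) for s, e in chunk_bounds(text, chunk_size)]
    keys, postings = [], []
    for word, offset in heapq.merge(*runs):
        if not keys or keys[-1][0] != word:
            keys.append([word, len(postings), 0])
        keys[-1][2] += 1
        postings.append(offset)
    return keys, postings


def show_counts(keys):
    """Format the keys as a count-and-word listing, one word per line."""
    return "\n".join(f"{count:6d} {word}" for word, _, count in keys)
```

[Calling build_index(text) and then print(show_counts(keys)) gives a
listing in the same spirit as the one above.  A small chunk_size is a
convenient way to exercise the merge step on a short test text.]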
If you are interested in aardvarks, you can pop down into a complete
key-word-in-context display of the occurrences of the string aardvark
(all 100 of them), like:

   was eaten by a voracious aardvark in 1492, when his boat landed...
   took the left leg of his aardvark and painted it blue without a...
   among the earliest known aardvark civilizations.  Now it can be...
   etc.

Then, if any of these lines of the KWIC display look promising, you can
pop down into the full text around that chosen line, and read, copy to
a file of notes, etc.  The C code for brwsr is also about 50 kB long
including comments.

I have been spending the past few weeks rewriting most of the above to
integrate it into HyperCard (Macintosh program ... my routines become
external functions and commands) ... should have some good stuff to
start distributing in a few weeks, if all goes well.  My sabbatical
time is running out, so my work will be slower next year, alas.

Oh, I forgot to mention, 'brwsr.c' above has simple proximity searching
... you can define a working subset of the dataspace to include, for
example, only words within a few sentences of '1492', in which case the
index display shows the counts in that subset, e.g.,

       1/100 aardvark
     17/9876 aaron
        2/21 aarons
   etc.

Now, if you ask for a KWIC display of aardvark, you only see the one
occurrence in the neighborhood of '1492'.

I use my Macintosh versions of brwsr and qndxr all the time ... have
accumulated over 12 MB of text from the past year or so of arpanet and
usenet and delphi digests, mostly related to Macintosh programming,
information retrieval, etc. -- it's easy to browse and pull out tidbits
that I vaguely recall the existence of.

As stated earlier, the programs are free (at the moment), but I can't
afford to spend a lot of time distributing them or supporting them at
that price, and my time will be even scarcer starting next week.
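
[Editorial sketch, not Mark's code.  The KWIC display and the proximity
subset described above can be illustrated in a few lines of Python
rather than the original C.  All function names are invented for this
note.  The "within a few sentences" neighborhood is approximated here
by a plain character radius, which is a simplifying assumption and not
how brwsr is known to work.]

```python
import re

WORD = re.compile(r"\w+")  # assumption: a "word" is a run of letters/digits


def occurrences(text):
    """Map each lowercased word to the character offsets where it starts."""
    index = {}
    for m in WORD.finditer(text):
        index.setdefault(m.group().lower(), []).append(m.start())
    return index


def kwic(text, offsets, width=25):
    """Return one line per occurrence, key word starting at column `width`."""
    lines = []
    for off in offsets:
        left = text[max(0, off - width):off].replace("\n", " ")
        right = text[off:off + width].replace("\n", " ")
        lines.append(left.rjust(width) + right)
    return lines


def near(index, anchor, radius):
    """Keep only occurrences within `radius` characters of the anchor word."""
    anchors = index.get(anchor, [])
    return {word: [o for o in offs
                   if any(abs(o - a) <= radius for a in anchors)]
            for word, offs in index.items()}


def subset_counts(index, subset):
    """Format 'count in subset / total count' for every word, sorted."""
    return [f"{len(subset[w])}/{len(index[w])} {w}" for w in sorted(index)]
```

[With index = occurrences(text) and subset = near(index, "1492", 200),
subset_counts(index, subset) gives the in-subset/total listing, and
kwic(text, subset["aardvark"]) shows only the occurrences that fall in
the neighborhood of '1492'.]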
Best, ^z   (Mark Zimmermann, 'science@nems.arpa')

------------------------------

Date: Sun, 27 Dec 87 16:50:37 EST
Subject: Software Psychology Newsletter - Winter 1988
From: "Ben Shneiderman"

... Happy New Year...Ben

___________________________________________

                  SOFTWARE PSYCHOLOGY SOCIETY
                        POTOMAC CHAPTER

VOLUME 12  NUMBER 2                                   WINTER 1988

Note: All meetings will be held at the George Washington University's
Marvin Center (800 21st Street, N.W.) between 10:00 AM and noon.
Coffee and doughnuts will be provided by the Department of Electrical
Engineering and Computer Sciences.

Send correspondence for this newsletter to: Software Psychology
Society, c/o Skip Williamson, Knowledge Systems, Inc., 5705 Stillwell
Rd., Rockville, MD 20851.

____________________

January 8                                             Room 413-414

   PERCEPTION AND COMPREHENSION OF COLOR CHARACTERS ON COLOR BACKGROUNDS

   John T. Christian (1) and Bruce H. Thomas (2)
   Computer Sciences Corporation, System Sciences Division
   8728 Colesville Rd., Silver Spring, MD

   (1) now at CSC, 4600 Powder Mill Rd., Beltsville, MD 20705
   (2) now at National Bureau of Standards, Gaithersburg, MD 20899

Often during the design of a user-system interface, human factors
engineers are asked if and how color can be used to code information.
Frequently the response is that color can be used but in limited ways
(e.g., follow cultural stereotypes and use fewer than six colors).  On
the other hand, designers and practitioners in the software field
(e.g., word processing and presentation graphics) have used color,
sometimes in highly artistic ways, to enhance processing of presented
information.  With the advent of improved high resolution color
graphics monitors, more people want to use color coding presumably as a
strategy to improve productivity.  The basis for making color coding
decisions, for example in a word processing task, is unclear at best.
In an attempt to sort out the consequences of color coding information
for user productivity, several experiments were conducted.  We
investigated the effects of color character - color background
combinations on people's perception and comprehension of information in
timed target detection and reading comprehension tasks.  In two other
studies, we examined cultural stereotypes for coding meteorological
parameters.  The mixed results may serve as a palette to color future
decisions on color coding.

____________________

February 12                                           Room 413-414

   THE OPERATING PERSONNEL PERFORMANCE MODEL:
   AN AID TO DESIGNERS OF AUTOMATED SYSTEMS

   Sylvia B. Sheppard, Elizabeth D. Murphy, Lisa J. Stewart
   Computer Technology Associates, Inc.,
   14900 Sweitzer Lane, Laurel, MD 20702

   Walter Truszkowski, NASA Goddard Space Flight Center, Greenbelt, MD

A theoretical model has been designed to predict the performance of
users of automated systems.  The Operating Personnel Performance Model
is based on the premise that user performance in control rooms can be
predicted from a knowledge of the cognitive, sensory, and motor demands
imposed on the users in the performance of their tasks and from a
knowledge of the capabilities required to meet those demands.  Two
studies, related to the Network Control Center (NCC) and the Georgia
Tech. Multi-Satellite Operations Control Center (GT-MSOCC) at NASA -
Goddard Space Flight Center, were conducted to test the model's
predictive validity.  The results supported the conclusion that the
model is an aid in the rapid, systematic evaluation of design
alternatives.

____________________

March 11                                              Room 413-414

   USER-CENTERED DESIGN OF AN INTELLIGENT DATABASE SYSTEM

   Elizabeth Roop, Carlow Associates Incorporated
   8315 Lee Highway, Suite 410, Fairfax, VA 22031

To facilitate the dissemination of its collection and compilation of
UCI (User-Computer Interface) reports and literature, the U.S.
Army's Human Engineering Lab is applying an intense R&D effort toward a
fully automated, intelligent database system.  Specifications for this
state-of-the-technology system were based on user preferences,
determined by surveys of the intended users of the system, and a review
of the current literature for both hardware and software techniques.

Results from early testing of the prototype will be presented.  The
prototype includes hardware to scan documents for rapid data entry, a
character recognition server to divide the data into separate ASCII and
bitmap files, and the latest in WORM (Write Once Read Many) drive
technology.  HEL's database system is supported by an intelligent front
end, which provides new features for a traditional query and retrieval
subsystem, hypertext capability, and personal files.

------------------------------

END OF IRList Digest
********************