From vtisr1!irlistrq Sun Sep 7 17:30:50 1986
Date: Sun, 7 Sep 86 17:30:43 edt
From: vtisr1!irlistrq
To: fox
Subject: IRList Digest V2 #40
Status: R

IRList Digest     Sunday, 7 September 1986     Volume 2 : Issue 40

Today's Topics:
   Query - Mail list digest indexing?
   Announcement - Oxford Text Archive shortlist, new acquisitions
   Call for Papers - IFIP WG6.5 Conf. on Message Handling Systems
   Abstracts - Appearing in latest issue of ACM SIGIR Forum, Part 3

News addresses are
   ARPANET: fox%vt@csnet-relay.arpa
   BITNET:  foxea@vtvax3.bitnet
   CSNET:   fox@vt
   UUCPNET: seismo!vtisr1!irlistrq

----------------------------------------------------------------------

Date: Sat, 30 Aug 86 00:43:45 edt
From: Ewgorc@CS.UCL.AC.UK
Subject: indexing of mailing-list items

mailing list digest indexing

Dear Mr. Fox,

I've just found out from the moderator of the AIList that you have
built a system for the automatic indexing of mailing-list items. I am
currently implementing a similar system for an M.Sc. project, and I
would be very grateful if you could tell me a little about your
system, as a comparison would be useful as part of my project report.
(The only similar system I'd heard about before was one designed by an
M.Sc. student at Queen Mary College, University of London: this was
very much oriented towards research into user-modelling, and was never
actually implemented.)

My system is designed to run under UNIX on a Sun workstation. It first
identifies the mailing-list to which each new message belongs, copying
it into an appropriate directory (and splitting digests into several
files, one for each item or for "Today's Topics"). An index file for
each is then produced: indexing terms are taken from profiles set up
by potential readers, and from a dummy profile maintained by the
system administrator which will be particularly useful in the early
days of the system before many readers have created large profiles.
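
[Illustration, not part of Tim Miles's message: a minimal Python sketch
of the digest-splitting step described above. It assumes, as in this
digest, that items are separated by lines consisting only of hyphens.
The 30-hyphen minimum and the names SEPARATOR and split_digest are
choices made for this example; nothing here is taken from the actual
system, which the message says was written in C.]

```python
import re

# Assumption: an item boundary is a line made up only of 30 or more hyphens.
SEPARATOR = re.compile(r"^-{30,}\s*$")


def split_digest(text):
    """Split a digest into its preamble and items, returned as strings."""
    parts, current = [], []
    for line in text.splitlines():
        if SEPARATOR.match(line):
            # A separator closes the piece collected so far.
            parts.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    parts.append("\n".join(current).strip())
    # Drop empty pieces, e.g. when two separators are adjacent.
    return [p for p in parts if p]
```

Dash lines inside a forwarded message that arrive prefixed with "- "
do not match the separator pattern, so such a message stays in one
piece. Each returned string could then be written to its own file.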

The indexing process is based on "fgrep": the number of times each
term occurs in the text of the message is noted (terms occurring in
the "Subject" header are given an especially high score, but other
header lines are disregarded). Each reader's profile is then compared
to the index files, adding up the scores for each term which occurs in
both the profile and the index of a message, and then dividing by the
number of lines in the text, to obtain the interest value of the
message to that reader. Messages for which the interest value is
greater than or equal to a threshold (which may be given in the
profile, although a default is provided) are mailed to a file
associated with the reader, using details of the match - the interest
value and a list of the matched terms - as a new Subject header.

The three main programs of the system are written in C: although much
of the work could possibly have been done by "grep", "awk" and similar
utilities, I felt C code would be more maintainable (e.g. once the
basic system is in operation, I would like to allow readers to link
terms together by "and" and "or").

I would be interested to know if my system differs radically from
yours in any way, particularly in the algorithms used for indexing and
matching. For example, does your system perform any semantic analysis
of messages rather than just string-matching, and if so how?

Thanks in advance for any information you can give me.

Tim Miles
BP Research Centre,
Sunbury-on-Thames,
England.

[Tim: Sounds like an interesting and useful project. Yes, my system is
different. I will try to fill in a few comments and give pointers to
related works - other readers are invited to add to this discussion.

1) Thomas Malone at MIT has worked on the "Information Lens" which
   deals with mail handling and profiles.
2) Michael Mauldin (see IRList V2 #25, 30) is working on FERRET which
   is an application and extension of methods like FRUMP to mail
   indexing and retrieval.
3) The SMART system at Cornell and Virginia Tech has been used to
   index and retrieve collections of mail and mail digest messages.
4) The CODER system is under development at Virginia Tech to use AI
   methods to analyze and retrieve (parts of) messages from an archive
   of messages from AIList Digest. A brief discussion appears in
   IRList V2 #26, and there have been several conference papers and
   technical reports on it.
5) Your approach is like several SDI (selective dissemination of
   information) systems that matched profiles against new files of
   recent abstracts.
6) Some online interactive systems for searching archives have
   recently become available on networks like BITNET.
7) R. Korfhage has recently done work on profiles and queries in
   information retrieval.

Hope this is the kind of info you wanted. Regards, Ed]

------------------------------

Date: 29-AUG-1986 14:33:03
From: LOU%UK.AC.OXFORD.VAX1@AC.UK
Subject: Oxford Text Archive news

OXFORD TEXT ARCHIVE

A new edition of the Oxford Text Archive Shortlist was published in
August 1986. Send your name and address for a free copy! (Second and
subsequent copies cost $3 each.)

Recent acquisitions include TWO different parsed/structured versions
of the Oxford Advanced Learner's Dictionary, one produced by Roger
Mitton at Birkbeck College; the other by Rick Kazman at University of
Waterloo. Both versions are available under the OTA's standard
conditions of use.

------------------------------

Subject: Call for Papers WG 6.5
From: Peter Schicker
Date: 13 Jun 86 7:55 -0100 BST

[Forwarded msg below was sent to me for redistribution - Ed]

Hugh and Stef:

Could you post the following call for papers on the US and European
nets, please.

CALL FOR PAPERS

IFIP WG 6.5 International Working Conference on
MESSAGE HANDLING SYSTEMS
(State of the Art and Future Directions)

27 to 29 April 1987
Munich
Fed. Rep. of Germany

Program:

The purpose of the conference is to provide an international forum for
the exchange of information on the technical, economic, social, and
political impacts of computer message and office systems. The
conference format will be two days of conference paper presentations
followed by one day of workshops. Papers are desired in the following
topic areas:

   MHS Interconnection and Interworking
   Interconnection of X.400 Systems (Private and public)
   Gateways to X.400 Systems
   X.400 Shell to non-X.400 Systems
   Interworking between X.400 and the Postal System
   Interworking with other Architectures (e.g., DIA/DCA, All-In-1, etc.)
   Multi-Vendor Private Message Systems
   Documents and Messages
   Document and Message Architectures
   Multimedia Documents and Messages
   Graphics (GKS) vs. Facsimile
   Communication of Business Forms and Trade Documents
   Directory Services
   Naming and Addressing
   Public Directory Systems
   Interworking between Public and Private Directory Systems
   New Access Protocols
   Mailbox Services
   Extensions to X.400 Series Recommendations
   Message Management
   Personal Message Management
   Message and Document Filing and Retrieval
   Group Communication
   Distribution Lists
   Organization of Message Flow
   Real-Time Conferencing
   Models for Group Communication
   Workstations and User Interface
   Workstation and Cluster Design
   Backup and Archiving
   User Interface Issues
   Message Editing
   Security Aspects
   Authentication
   Confidentiality
   Impacts of MHS
   Social and Behavioral Impacts
   Impacts on Organizations
   Impacts on Nations
   Impacts on Relieving Impairment
   Policy Issues
   Public Policy Issues in MHS
   Transborder Data Flow
   Legal Status of MHS
   Privacy and Confidentiality

- ------------------------------------------------------------------------

Instructions to Authors:

Prospective Authors are invited to submit for review unpublished
original contributions (not exceeding 5000 words) which describe
recent developments on any design or service aspect of computer
message systems.
Accepted papers will appear in the Conference Proceedings published by
North-Holland Publishing Company.

Deadlines:

   Today            Send a postcard with your name, telephone, and
                    EMail address to:

                       Message Systems '87
                       Mrs. Stenzel
                       Siemens AG
                       D-AP.11
                       Otto Hahn Ring 6
                       D-8000 Munich 83
                       Fed. Rep. of Germany

                    This will ensure that you will receive further
                    information about the conference. Please indicate
                    also the provisional title if you intend to submit
                    a paper.

   Sept. 30, 1986   Draft versions of papers required
   Nov. 30, 1986    Notification of acceptance
   Jan. 31, 1987    Camera-ready papers required

Papers should be submitted to:

   Peter Schicker
   Zellweger Telecommunications AG
   CH-8634 Hombrechtikon
   Switzerland

------------------------------

Date: Wed, 23 Jul 1986 13:06 CST
From: Vijay V. Raghavan
Subject: SIGIR FORUM Abstracts [Part 3 - Ed]

[Note: Members of ACM SIGIR should have received the spring/summer
Forum, and can find these on pages 37-39. The rest will appear in
machine readable form also in a later issue of IRList. - Ed]

ABSTRACTS
(Chosen by G. Salton or V. Raghavan from 1984 issues of journals in
the retrieval area)

23. RANKING TECHNIQUES AND THE EMPIRICAL LOG LAW

Bertram C. Brookes
64 Abbots Gardens, London N2 0JH, England

Four empirical laws of bibliometrics - those of anomalous numbers, of
Lotka, Zipf and Bradford, together with Laplace's notorious "law of
succession" and de Solla Price's cumulative advantage distribution,
are shown to be almost identical. Some of these laws are expressed as
frequency distributions, some are frequency-ranked. A simple model
which discriminates these various forms is described. It shows that
the frequency forms conform with an inverse square law over the
appropriate interval and that the equivalent rank distribution - the
Log Law - has the Df Q(r) = log b (r + 1) where b is the rank
interval.
It is further shown that frequency distributions discard empirical
statistical information which the equivalent rank distributions retain
for analysis, so that rank distributions have theoretical advantages
in this field. The paper concludes with comments on the analysis of
the empirical hybrid forms which arise. The reduction of the above
laws, empirical and hypothetical, to a single law is achieved by NOT
equating the ordinals 1st, 2nd, 3rd,... to the numbers 1, 2, 3,... as
is commonly done.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 1-2, pp. 37-46,
1984).

24. HUMAN INFORMATION SEEKING AND DESIGN OF INFORMATION SYSTEMS

William B. Rouse and Sandra H. Rouse
Center for Man-Machine Systems Research, Georgia Institute of
Technology, Atlanta, GA 30332, USA

The literature of psychology, library science, management, computer
science, and systems engineering is reviewed and integrated into an
overall perspective of human information seeking and the design of
information systems. The nature of information seeking is considered
in terms of its role in decision making and problem solving, the
dynamics of the process, and the value of information. Discussions of
human information seeking focus on basic psychological studies,
effects of cognitive style, and models of human behavior. Design
issues considered include attributes of information systems, analysis
of information needs, aids for information seeking, and evaluation of
information systems.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 1-2, pp. 129-138,
1984).

25. EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS FOR RESEARCH IN
    INFORMATION SCIENCE

David F. Haas and Donald H. Kraft
Department of Computer Science, Louisiana State University,
Baton Rouge, LA 70803, USA

This is a paper about research designs in information science. We look
at a sample of current research and compare its designs with an
abstract ideal of experimental research design to see how closely they
approximate it.
We then consider ways in which research in our field might be brought
closer to the ideal. We do this because we believe that experimental
and quasi-experimental designs offer unique advantages over other
research designs, especially in the production of knowledge that can
be applied to the solution of practical problems in information and in
software science.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 1-2, pp. 229-237,
1984).

26. THE RELATION BETWEEN THEORY AND METHODOLOGY FOR DESIGNING
    EXPERIMENTS IN INFORMATION SCIENCE

Charles Pearson
Catronix Corporation, 151 Sixth St. NW, Suite 100, Atlanta, GA 30313,
USA

The relation between theory and experiment in Information Science is
the same as that in any other science. This relation is examined in
some detail in order to provide a better understanding for designing
experiments in Information Science.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 1-2, pp. 239-241,
1984).

27. MATHEMATICAL MODELS OF TEXT

Harold P. Edmundson
Department of Computer Science, University of Maryland,
College Park, MD 20742, USA

An object of serious study in the information sciences is printed
language, called text. This paper presents numerous examples of
mathematical models of text and in so doing exposes some interesting
results and problems associated with the linguistic, mathematical, and
computational aspects of current research involving text. First, the
notions of mathematical models and modeling are reviewed. Then the
graphemic, morphological, syntactic, and semantic linguistic levels of
text analysis are distinguished and discussed. Next, numerous
deterministic models and stochastic models of text are treated in some
detail. Finally, these research accomplishments in information science
are summarized and future research is discussed.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 1-2, pp. 261-268,
1984).

28. FUZZY PROBABILITIES

Lotfi A. Zadeh
Computer Science Division, Department of Electrical Engineering and
Computer Sciences and the Electronics Research Laboratory, University
of California, Berkeley, CA 94720, USA

The conventional approaches to decision analysis are based on the
assumption that the probabilities which enter into the assessment of
the consequences of a decision are known numbers. In most realistic
settings, this assumption is of questionable validity since the data
from which the probabilities must be estimated are usually incomplete,
imprecise or not totally reliable. In the approach outlined in this
paper, the probabilities are assumed to be fuzzy rather than real
numbers. It is shown how such probabilities may be estimated from
fuzzy data and a basic relation between joint, conditional and
marginal fuzzy probabilities is established. Manipulation of fuzzy
probabilities requires, in general, the use of fuzzy arithmetic, and
many of the properties of fuzzy probabilities are simple
generalizations of the corresponding properties of real-valued
probabilities.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 3, pp. 363-372,
1984).

29. ENTROPIES WITH AND WITHOUT PROBABILITIES: APPLICATIONS TO
    QUESTIONNAIRES

Bruno Forte
Department of Applied Mathematics, University of Waterloo,
Waterloo, Ontario, Canada N2L 3G1

Entropy is a basic quantity in Information Theory. As it measures the
amount of uncertainty one has in an alternative, it is "conditional"
upon all kinds of "information" that has been given. New motivations
for measures of uncertainty and information are provided. A more
natural interpretation of the entropies in the "mixed theory" and the
entropies for a random vector is given. The proposed new approach in
measuring uncertainty is illustrated with examples, in particular,
from questionnaires theory.

(INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 3, pp. 397-405,
1984).

------------------------------

END OF IRList Digest
********************