From vtisr1!irlistrq Sun Sep  7 17:30:50 1986
Date: Sun, 7 Sep 86 17:30:43 edt
From: vtisr1!irlistrq
To: fox
Subject: IRList Digest V2 #40
Status: R

IRList Digest           Sunday, 7 September 1986      Volume 2 : Issue 40

Today's Topics:
   Query - Mail list digest indexing?
   Announcement - Oxford Text Archive shortlist, new acquisitions
   Call for Papers - IFIP WG6.5 Conf. on Message Handling Systems
   Abstracts - Appearing in latest issue of ACM SIGIR Forum, Part 3

News addresses are ARPANET: fox%vt@csnet-relay.arpa  BITNET: foxea@vtvax3.bitnet
   CSNET: fox@vt   UUCPNET: seismo!vtisr1!irlistrq
----------------------------------------------------------------------

Date: Sat, 30 Aug 86 00:43:45 edt
From: Ewgorc@CS.UCL.AC.UK
Subject: indexing of mailing-list items

mailing list digest indexing
Dear Mr. Fox,

I've just found out from the moderator of the AIList that you have built
a system for the automatic indexing of mailing-list items. I am currently
implementing a similar system for an M.Sc. project, and I would be very
grateful if you could tell me a little about your system, as a comparison
would be useful as part of my project report. (The only similar system
I'd heard about before was one designed by an M.Sc. student at
Queen Mary College, University of London : this was very much oriented
towards research into user-modelling, and was never actually implemented.)

My system is designed to run under UNIX on a Sun workstation.
It first identifies the mailing-list to which each new message belongs,
copying them into an appropriate directory (and splitting digests into
several files, one for each item or for "Today's Topics").
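
The splitting step described above might look roughly like this in
Python. This is a sketch under stated assumptions, not the C code the
letter refers to: the letter does not say how the list is identified or
how item boundaries are found, so the title-line pattern, the
hyphen-line separator rule, and the names list_name and split_digest
are all hypothetical.

```python
import re

# Assumption: items are separated by lines made only of hyphens
# (30 or more), as in this digest.
SEPARATOR = re.compile(r"^-{30,}\s*$")
# Assumption: the list name is the word before "Digest" on the title line.
TITLE = re.compile(r"^(\w+) Digest\b")


def list_name(digest_text):
    """Return the mailing-list name from the title line, or None."""
    for line in digest_text.splitlines():
        match = TITLE.match(line)
        if match:
            return match.group(1)
    return None


def split_digest(digest_text):
    """Split a digest into parts: the preamble ("Today's Topics")
    first, then one string per item. Empty parts are dropped."""
    parts, current = [], []
    for line in digest_text.splitlines():
        if SEPARATOR.match(line):
            parts.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    parts.append("\n".join(current).strip())
    return [part for part in parts if part]
```

Each returned part could then be written to its own file in a directory
named after list_name(digest_text).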

An index file for each is then produced : indexing terms are taken from
profiles set up by potential readers, and from a dummy profile maintained
by the system administrator which will be particularly useful in the early
days of the system before many readers have created large profiles.
The indexing process is based on "fgrep" : the number of times each term
occurs in the text of the message is noted (terms occurring in the "Subject"
header are given an especially high score, but other header lines are
disregarded.)
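
As a rough illustration of the indexing step described above, here is a
Python sketch. It is not the author's C program. The weight given to
the Subject header (SUBJECT_WEIGHT), the case folding, and the name
index_message are assumptions; the letter says only that Subject
occurrences receive "an especially high score".

```python
SUBJECT_WEIGHT = 5  # assumption: the letter gives no actual value


def index_message(message, terms):
    """Return {term: score} for the terms that occur in the message.

    Headers are the lines before the first blank line. Among them only
    "Subject" is scored; every other header line is ignored.
    """
    head, _, body = message.partition("\n\n")
    subject = ""
    for line in head.splitlines():
        if line.lower().startswith("subject:"):
            subject = line[len("subject:"):]
    body, subject = body.lower(), subject.lower()
    index = {}
    for term in terms:
        needle = term.lower()
        # Fixed-string (substring) matching in the spirit of fgrep.
        score = body.count(needle) + SUBJECT_WEIGHT * subject.count(needle)
        if score:
            index[term] = score
    return index
```

Because matching is by substring, the term "profile" also matches
"profiles"; a real system might or might not want that.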

Each reader's profile is then compared to the index files,
adding up the scores for each term which occurs in both the profile and
the index of a message, and then dividing by the number of lines in the
text, to obtain the interest value of the message to that reader : messages
for which the interest value is greater than or equal to a
threshold (which may be given in the profile, although a default is
provided) are mailed to a file associated with the reader, using
details of the match - the interest value and a list of the matched
terms - as a new Subject header.
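
The matching rule described above can be sketched in Python as follows.
This is an illustration only: the default threshold value, the layout
of the generated Subject header, and the names interest and
match_subject are assumptions not taken from the letter.

```python
DEFAULT_THRESHOLD = 0.1  # assumption: the letter gives no default value


def interest(index, profile_terms, line_count):
    """Return (interest value, sorted matched terms) for one message.

    The value is the sum of the index scores of the terms that occur in
    both the profile and the index, divided by the number of text lines.
    """
    matched = sorted(term for term in profile_terms if term in index)
    total = sum(index[term] for term in matched)
    value = total / line_count if line_count else 0.0
    return value, matched


def match_subject(index, profile_terms, line_count,
                  threshold=DEFAULT_THRESHOLD):
    """Return a replacement Subject header if the interest value
    reaches the threshold, else None."""
    value, matched = interest(index, profile_terms, line_count)
    if value < threshold:
        return None
    return "Subject: interest %.2f; matched: %s" % (value, ", ".join(matched))
```

For example, an index of {"indexing": 7, "profile": 1} for a four-line
message and a profile containing both terms gives an interest value of
8 / 4 = 2.0.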

The three main programs of the system are written in C : although much
of the work could possibly have been done by "grep", "awk" and similar
utilities, I felt C code would be more maintainable (e.g. once the
basic system is in operation, I would like to allow readers to link
terms together by "and" and "or").

I would be interested to know if my system differs radically from yours
in any way, particularly in the algorithms used for indexing and
matching. For example, does your system perform any semantic analysis
of messages rather than just string-matching, and if so how?

Thanks in advance for any information you can give me.

Tim Miles
BP Research Centre, Sunbury-on-Thames, England.

[Tim: Sounds like an interesting and useful project. Yes, my system is
different. I will try to fill in a few comments and give pointers to
related works - other readers are invited to add to this discussion.
1) Thomas Malone at MIT has worked on the "Information Lens" which
  deals with mail handling and profiles.
2) Michael Mauldin (see IRList V2 #25, 30) is working on FERRET which
  is an application and extension of methods like FRUMP to mail
  indexing and retrieval.
3) The SMART system at Cornell and Virginia Tech has been used to
  index and retrieve collections of mail and mail digest messages.
4) The CODER system is under development at Virginia Tech to use AI
  methods to analyze and retrieve (parts of) messages from an archive
  of messages from AIList Digest.  A brief discussion appears in IRList
  V2 #26, and there have been several conference papers and technical
  reports on it.
5) Your approach is like several SDI (selective dissemination of
  information) systems that matched profiles against new files of
  recent abstracts.
6) Some online interactive systems for searching archives have recently
  become available on networks like BITNET.
7) R. Korfhage has recently done work on profiles and queries in
  information retrieval.

Hope this is the kind of info you wanted. Regards, Ed]

------------------------------

Date:     29-AUG-1986 14:33:03
From:     LOU%UK.AC.OXFORD.VAX1@AC.UK
Subject:  Oxford Text Archive news

OXFORD TEXT ARCHIVE

A new edition of the Oxford Text Archive Shortlist was published in
August 1986. Send your name and address for a free copy! (second and subsequent
copies cost $3 each)

Recent acquisitions include TWO different parsed/structured versions of the
Oxford Advanced Learner's Dictionary, one produced by Roger Mitton at
Birkbeck College; the other by Rick Kazman at University of Waterloo.
Both versions are available under the OTA's standard conditions of use.

------------------------------

Subject: Call for Papers WG 6.5
From: Peter Schicker <schicker%ean.cs.nott.ac.uk@cs.ucl.ac.UK>
Date: 13 Jun 86 7:55 -0100 BST

[Forwarded msg below was sent to me for redistribution - Ed]

Hugh and Stef:
Could you post the following call for papers on the US and European nets,
please.

             CALL  FOR  PAPERS

      IFIP WG 6.5 International Working Conference on

              MESSAGE HANDLING SYSTEMS
          (State of the Art and Future Directions)

             27 to 29 April 1987
                   Munich
            Fed. Rep. of Germany

Program:
The purpose of the conference is to provide an international forum for the
exchange of information on the technical, economic, social, and political
impacts of computer message and office systems. The conference format will
be two days of conference paper presentations followed by one day of
workshops.


Papers are desired in the following topic areas:

MHS Interconnection and Interworking
  Interconnection of X.400 Systems (Private and public)
  Gateways to X.400 Systems
  X.400 Shell to non-X.400 Systems
  Interworking between X.400 and the Postal System
  Interworking with other Architectures (e.g., DIA/DCA, All-In-1, etc.)
  Multi-Vendor Private Message Systems

Documents and Messages
  Document and Message Architectures
  Multimedia Documents and Messages
  Graphics (GKS) vs. Facsimile
  Communication of Business Forms and Trade Documents

Directory Services
  Naming and Addressing
  Public Directory Systems
  Interworking between Public and Private Directory Systems

New Access Protocols
  Mailbox Services
  Extensions to X.400 Series Recommendations

Message Management
  Personal Message Management
  Message and Document Filing and Retrieval

Group Communication
  Distribution Lists
  Organization of Message Flow
  Real-Time Conferencing
  Models for Group Communication

Workstations and User Interface
  Workstation and Cluster Design
  Backup and Archiving
  User Interface Issues
  Message Editing

Security Aspects
  Authentication
  Confidentiality

Impacts of MHS
  Social and Behavioral Impacts
  Impacts on Organizations
  Impacts on Nations
  Impacts on Relieving Impairment

Policy Issues
  Public Policy Issues in MHS
  Transborder Data Flow
  Legal Status of MHS
  Privacy and Confidentiality

- ------------------------------------------------------------------------

Instructions to Authors:

Prospective Authors are invited to submit for review unpublished original
contributions (not exceeding 5000 words) which describe recent developments
on any design or service aspect of computer message systems.

Accepted papers will appear in the Conference Proceedings published by
North-Holland Publishing Company.

Deadlines:

Today               Send a postcard with your name, telephone, and EMail
                    address to:
                       Message Systems '87
                       Mrs. Stenzel
                       Siemens AG
                       D-AP.11
                       Otto Hahn Ring 6
                       D-8000 Munich 83
                       Fed. Rep. of Germany
                    This will ensure that you will receive further information
                    about the conference. Please indicate also the provisional
                    title if you intend to submit a paper.
Sept. 30, 1986      Draft versions of papers required
Nov. 30, 1986       Notification of acceptance
Jan. 31, 1987       Camera-ready papers required

Papers should be submitted to:

Peter Schicker
Zellweger Telecommunications AG
CH-8634 Hombrechtikon
Switzerland

------------------------------

Date:         Wed, 23 Jul 1986 13:06 CST
From:         Vijay V. Raghavan <RAGHAVAN@UREGINA1.bitnet>
Subject:      SIGIR FORUM Abstracts [Part 3 - Ed]

[Note: Members of ACM SIGIR should have received the spring/summer
 Forum, and can find these on pages 37-39. The rest will appear in
 machine readable form also in a later issue of IRList. - Ed]

                            ABSTRACTS

(Chosen by G.  Salton or V. Raghavan from 1984 issues of journals
in the retrieval area)

23. RANKING TECHNIQUES AND THE EMPIRICAL LOG LAW

    Bertram C. Brookes
    64 Abbots Gardens, London N2 0JH, England

    Four  empirical  laws of bibliometrics - those  of  anomalous
    numbers, of Lotka, Zipf and Bradford, together with Laplace's
    notorious "law of succession" and de Solla Price's cumulative
    advantage  distribution,  are shown to be  almost  identical.
    Some  of these laws are expressed as frequency distributions,
    some   are   frequency-ranked.     A   simple   model   which
    discriminates  these various forms is  described.   It  shows
    that  the frequency forms conform with an inverse square  law
    over  the  appropriate interval and that the equivalent  rank
    distribution - the Log Law - has the Df

         Q(r) = log b (r + 1)

    where b is the rank interval.

    It  is  further shown that  frequency  distributions  discard
    empirical  statistical information which the equivalent  rank
    distributions  retain  for  analysis,  so  that  rank
    distributions offer theoretical advantages in this field.

    The  paper  concludes  with comments on the analysis  of  the
    empirical  hybrid forms which arise.   The reduction  of  the
    above  laws,  empirical and hypothetical,  to a single law is
    achieved by NOT equating the ordinals 1st,  2nd,  3rd,...  to
    the numbers 1, 2, 3,... as is commonly done.

    (Information Processing & Management,  Vol.  20, No. 1-2, pp.
    37-46, 1984).


24. HUMAN INFORMATION SEEKING AND DESIGN OF INFORMATION SYSTEMS

    William B. Rouse and Sandra H. Rouse
    Center for Man-Machine Systems Research, Georgia Institute of
    Technology, Atlanta, GA 30332, USA

    The  literature of psychology,  library science,  management,
    computer  science,  and systems engineering is  reviewed  and
    integrated  into an overall perspective of human  information
    seeking and the design of information systems.  The nature of
    information  seeking  is considered in terms of its  role  in
    decision  making  and problem solving,  the dynamics  of  the
    process,  and the value of information.  Discussions of human
    information  seeking  focus on basic  psychological  studies,
    effects  of  cognitive style,  and models of human  behavior.
    Design  issues considered include attributes  of  information
    systems,  analysis of information needs, aids for information
    seeking, and evaluation of information systems.

    INFORMATION PROCESSING AND MANAGEMENT,  Vol.  20, No. 1-2, pp.
    129-138, 1984.


25. EXPERIMENTAL  AND QUASI-EXPERIMENTAL DESIGNS FOR RESEARCH  IN
    INFORMATION SCIENCE

    David F. Haas and Donald H. Kraft
    Department  of Computer Science,  Louisiana State University,
    Baton Rouge, LA 70803, USA

    This  is  a  paper  about  research  designs  in  information
    science.  We look at a sample of current research and compare
    its  designs with an abstract ideal of experimental  research
    design  to  see how closely they  approximate  it.   We  then
    consider ways in which research in our field might be brought
    closer  to  the ideal.   We do this because we  believe  that
    experimental  and  quasi-experimental  designs  offer  unique
    advantages  over other research designs,  especially  in  the
    production  of knowledge that can be applied to the  solution
    of practical problems in information and in software science.

    (INFORMATION  PROCESSING AND MANAGEMENT  Vol.  20,  No.  1-2,
    pp. 229-237, 1984).


26. THE  RELATION BETWEEN THEORY AND METHODOLOGY  FOR   DESIGNING
    EXPERIMENTS IN INFORMATION SCIENCE

    Charles Pearson
    Catronix Corporation,  151 Sixth St. NW, Suite 100, Atlanta, GA
    30313, USA

    The  relation  between theory and experiment  in  Information
    Science  is  the  same as that in any  other  science.   This
    relation  is  examined in some detail in order to  provide  a
    better understanding for designing experiments in Information
    Science.

    (INFORMATION PROCESSING AND MANAGEMENT,  Vol. 20, No. 1-2, pp.
    239-241, 1984)


27. MATHEMATICAL MODELS OF TEXT

    Harold P. Edmundson
    Department  of  Computer  Science,  University  of  Maryland,
    College Park, MD  20742, USA

    An  object  of serious study in the information  sciences  is
    printed language,  called text.  This paper presents numerous
    examples  of  mathematical  models of text and  in  so  doing
    exposes some interesting results and problems associated with
    the  linguistic,  mathematical,  and computational aspects of
    current research involving text.

    First,  the  notions of mathematical models and modeling  are
    reviewed.   Then the graphemic, morphological, syntactic, and
    semantic linguistic levels of text analysis are distinguished
    and  discussed.   Next,  numerous  deterministic  models  and
    stochastic  models  of  text  are  treated  in  some  detail.
    Finally,   these   research  accomplishments  in  information
    science are summarized and future research is discussed.

    (INFORMATION PROCESSING AND MANAGEMENT, Vol. 20, No. 1-2, pp.
    261-268, 1984).



28. FUZZY PROBABILITIES

    Lotfi A. Zadeh
    Computer   Science   Division,   Department   of   Electrical
    Engineering   and  Computer  Sciences  and  the   Electronics
    Research Laboratory,  University of California,  Berkeley, CA
    94720, USA.

    The conventional approaches to decision analysis are based on
    the  assumption that the probabilities which enter  into  the
    assessment  of  the  consequences  of a  decision  are  known
    numbers.   In most realistic settings,  this assumption is of
    questionable   validity  since  the  data  from   which   the
    probabilities  must  be  estimated  are  usually  incomplete,
    imprecise or not totally reliable.

    In the approach outlined in this paper, the probabilities are
    assumed  to be fuzzy rather than real numbers.   It is  shown
    how such probabilities may be estimated from fuzzy data and a
    basic relation between joint,  conditional and marginal fuzzy
    probabilities   is   established.    Manipulation  of   fuzzy
    probabilities  requires,   in  general,   the  use  of  fuzzy
    arithmetic, and many of the properties of fuzzy probabilities
    are simple generalizations of the corresponding properties of
    real-valued probabilities.

    (INFORMATION PROCESSING AND MANAGEMENT.  Vol. 20, No. 3, 363-
    372, 1984).

29. ENTROPIES WITH AND WITHOUT PROBABILITIES
    APPLICATIONS TO QUESTIONNAIRES

    Bruno Forte
    Department  of Applied Mathematics,  University of  Waterloo,
    Waterloo, Ontario, Canada   N2L 3G1

    Entropy  is  a basic quantity in Information Theory.   As  it
    measures the amount of uncertainty one has in an alternative,
    it is "conditional" upon all kinds of "information" that  has
    been given.   New motivations for measures of uncertainty and
    information  are provided.   A more natural interpretation of
    the  entropies in the "mixed theory" and the entropies for  a
    random  vector  is  given.   The  proposed  new  approach  in
    measuring  uncertainty  is  illustrated  with  examples,   in
    particular, from questionnaires theory.

    (INFORMATION PROCESSING AND MANAGEMENT,  Vol. 20, No. 3, 397-
    405, 1984).

------------------------------

END OF IRList Digest
********************