IRList Digest           Wednesday, 23 December 1987      Volume 3 : Issue 50

Today's Topics:
   Report - Vassar workshop on text encoding standard for the humanities

News addresses are
   Internet or CSNET: fox@vtopus.cs.vt.edu
   BITNET: foxea@vtvax3.bitnet

----------------------------------------------------------------------

Date: Wed, 2 Dec 87 22:50:46 est
From: amsler@flash.bellcore.com (Robert Amsler)
Subject: Text Encoding Standard for the Humanities - Vassar Workshop report

[The following is a summary prepared by Michael Sperberg-McQueen for
the HUMANIST mailing list of the first workshop on the preparation of
an encoding standard for text in the humanities held at Vassar
College last month. As an attendee and steering committee member, I
would be willing to answer further questions concerning this effort
for the IRLIST or NL-KR communities.  The effort to develop a standard for
encoding texts in the humanities is just starting and anyone with
interest in this noble and ambitious goal should not feel the
slightest hesitancy about becoming a part of the effort.  What is at
stake is nothing less than the creation, use and preservation of our
global electronic cultural heritage - R. Amsler, (amsler@flash.bellcore.com)]

Contributor: "Michael Sperberg-McQueen" <U18189@UICVM>

A followup on the current status of the ACH effort to formulate
guidelines for text encoding practices.

   ******************************************************************
   * NOTE: The following encoding conventions have been used to     *
   *       represent French accents throughout this message:        *
   *                                                                *
   *   To Represent Accents  --  Pour la representation des accents *
   *    /       acute accent - accent aigu                          *
   *    `       grave accent - accent grave                         *
   *                                                                *
   * The accent codes are typed    Les codes pour les accents se    *
   * AFTER the letter, and are     trouvent APRES la lettre qu'ils  *
   * used with both upper and      modifient, et s'utilisent avec   *
   * lower case letters.           les majuscules aussi bien que    *
   *                               les minuscules.                  *
   ******************************************************************
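
   For readers who wish to restore the accented characters by machine,
   the convention in the note above can be undone along the following
   lines.  This is only a minimal sketch, not part of the original
   message: the function name decode_accents and the table ACCENTED are
   hypothetical, and the sketch assumes that the codes mark only the
   letters that take these accents in French (acute on e; grave on a,
   e, u), so that a slash in a string such as "M/C 135" is left alone.

```python
import re

# Hypothetical lookup table: letter + accent code -> accented letter.
# Assumption: only the French combinations are decoded (acute on e;
# grave on a, e, u), in both upper and lower case.
ACCENTED = {
    "e/": "é", "E/": "É",
    "a`": "à", "A`": "À",
    "e`": "è", "E`": "È",
    "u`": "ù", "U`": "Ù",
}

# The accent code is typed AFTER the letter it modifies.
_CODE = re.compile(r"[eE]/|[aeuAEU]`")


def decode_accents(text: str) -> str:
    """Replace each letter-plus-code pair with the accented letter."""
    return _CODE.sub(lambda m: ACCENTED[m.group()], text)


if __name__ == "__main__":
    print(decode_accents("Re/daction des directives"))
    print(decode_accents("de/ja` en vigueur"))
```

   A slash that follows an "e" in ordinary English text would be
   decoded by mistake, so a sketch like this is best applied only to
   the French passages.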


On November 12 and 13, 1987, 31 representatives of professional
societies, universities, and text archives met to consider the
possibility of developing a set of guidelines for the encoding of texts
for literary, linguistic, and historical research. The meeting was
called by the Association for Computers and the Humanities and funded
by the National Endowment for the Humanities.  The list of participants
is appended to this document.

The participants heartily endorsed the idea of developing encoding
guidelines. In order to guide such development, they agreed on
the following principles:


       The Preparation of                 Re/daction des directives
     Text Encoding Guidelines             pour le codage des textes

                         Poughkeepsie, New York
                            13 November 1987

1.  The guidelines are intended   1.  Le but des directives est de cre/er
    to provide a standard format      un format standard pour l'e/change
    for data interchange in           des donne/es utilise/es pour la
    humanities research.              recherche dans les humanite/s.

2.  The guidelines are also       2.  Les directives sugge/reront
    intended to suggest principles    e/galement des principes pour
    for the encoding of texts         l'enregistrement des textes
    in the same format.               destine/s a` utiliser ce format.

3.  The guidelines should         3.  Les directives devraient

  a.  define a recommended          a.  de/finir une syntaxe recommande/e
      syntax for the format             pour exprimer le format,

  b.  define a metalanguage         b.  de/finir un me/ta-langage
      for the description               de/crivant les syste`mes de
      of text-encoding schemes,         codage des textes,

  c.  describe the new format       c.  de/crire par le moyen de ce
      and representative                me/talangage, aussi bien qu'en
      existing schemes both in          prose, le nouveau syste`me de
      that metalanguage and             codage aussi bien qu'un choix
      in prose.                         repre/sentatif de syste`mes
                                        de/ja` en vigueur.

4.  The guidelines should         4.  Les directives devraient proposer
    propose sets of coding            des syste`mes de codage utilisables
    conventions suited for            pour un large e/ventail
    various applications.             d'applications.

5.  The guidelines should         5.  Sera incluse dans les directives
    include a minimal set of          l'e/nonciation d'un syste`me de
    conventions for encoding          codage minimum, pour guider
    new texts in the format.          l'enregistrement de nouveaux textes
                                      conforme/ment au format propose/.

6.  The guidelines are to be      6.  Le travail d'e/laboration des
    drafted by committees on:         directives sera confie/ a` quatre
                                      comite/s centre/s sur les sujets
                                      suivants:

  a.  text documentation            a.  la documentation des textes,

  b.  text representation           b.  la repre/sentation des textes,

  c.  text interpretation           c.  l'analyse et l'interpre/tation
      and analysis                      des textes

  d.  metalanguage definition       d.  la de/finition du me/talangage et
      and description of                son utilisation pour de/crire le
      existing and proposed             nouveau syste`me aussi bien que
      schemes                           ceux qui existent de/ja`.

    co-ordinated by a steering        Ce travail sera coordonne/ par un
    committee of representatives      comite/ d'organisation ou`
    of the principal                  sie`geront des repre/sentants des
    sponsoring organizations.         principales associations qui
                                      soutiennent cet effort.

7.  Compatibility with existing   7.  Dans la mesure du possible, le
    standards will be maintained      nouveau syste`me sera compatible
    as far as possible.               avec les syste`mes de codage
                                      existants.

8.  A number of large text        8.  Des repre/sentants de plusieurs
    archives have agreed in           grandes archives de textes en forme
    principle to support the          lisible par machine acceptent en
    guidelines in their function      principe d'utiliser les directives
    as an interchange format.         en tant que description des formats
    We encourage funding agencies     pour l'e/change de leurs donne/es.
    to support development of         Nous encourageons les organismes
    tools to facilitate this          qui fournissent des fonds pour la
    interchange.                      recherche de soutenir le
                                      de/veloppement de ce qui est
                                      ne/cessaire pour faciliter cela.

9.  Conversion of existing        9.  En convertissant des textes
    machine-readable texts to         lisibles par machine de/ja`
    the new format involves the       existants, on remplacera
    translation of their              automatiquement leur codage actuel
    conventions into the syntax       par ce qui est ne/cessaire pour les
    of the new format.  No            rendre conformes au format nouveau.
    requirements will be made for     Nul n'exigera l'ajout
    the addition of information       d'informations qui ne sont pas
    not already coded in the          de/ja` repre/sente/es dans ces
    texts.                            textes.

                                         (trad. P. A. Fortier)

                            ******************

The further organization and drafting of the guidelines will be
supervised by a steering committee selected by the three sponsoring
organizations:  ACH (the Association for Computers and the Humanities),
ACL (the Association for Computational Linguistics), and ALLC (the
Association for Literary and Linguistic Computing).  Drafts of the
guidelines will be submitted for comment to an editorial committee with
representatives of all participating organizations (in addition to the
sponsors, thus far:  the Modern Language Association, the Association
for Computing Machinery Special Interest Group for Information
Retrieval, and the Association of American Publishers; the following
groups have indicated interest informally but have not yet formally
pledged participation, in most cases pending a formal vote: the
Linguistic Society of America, the Association for Documentary Editing,
the American Philological Association).  The American Anthropological
Association, plus several organizations within Europe, are now being
asked to consider participation.

The interchange format defined by the guidelines is expected to be
compatible with the Standard Generalized Markup Language defined
by ISO 8879, if that proves compatible with the needs of research.  The
needs of specialized research interests will be addressed wherever it
proves possible to find interested groups or individuals to do the
necessary work and achieve the necessary consensus.  Formation of
specific working groups will be announced later; in the meantime, those
interested in working on specific problems are invited to contact
either Dr. C. M. Sperberg-McQueen, Computer Center, University of
Illinois at Chicago (M/C 135), P.O. Box 6998, Chicago IL 60680 (on
Bitnet: U18189 at UICVM), or Prof. Nancy Ide, Dept. of Computer
Science, Vassar College, Poughkeepsie NY 12601 (on Bitnet:  IDE at
VASSAR).

                                                 - N.I., C.M.S-McQ

------------------------------------------------------------------------------

                    List of Participants

  NOTE: Association names are given following the names of their
        representatives at this meeting.

   Helen Aguera, National Endowment for the Humanities
   Robert A. Amsler, Bell Communications Research
   David T. Barnard, Department of Computing and Information Science,
      Queen's University, Ontario
   Lou Burnard, Oxford Text Archive
   Roy Byrd, IBM Research
   Nicoletta Calzolari, Istituto di linguistica computazionale, Pisa
   David Chesnutt  (Assoc. for Documentary Editing, American Historical
      Assoc.), Department of History, University of South Carolina
   Yaacov Choueka (Academy of the Hebrew Language), Department of
      Mathematics and Computer Science, Bar-Ilan University
   Jacques Dendien, Institut National de la Langue Francaise
   Paul A. Fortier, Department of Romance Languages, University of
      Manitoba
   Thomas Hickey, OCLC Online Computer Library Center
   Susan Hockey  (Association for Literary and Linguistic Computing),
      Oxford University Computing Service
   Nancy M. Ide (Association for Computers and the Humanities),
      Department of Computer Science, Vassar College
   Stig Johansson, International Computer Archive of Modern English,
      University of Oslo
   Randall Jones  (Modern Language Association), Humanities Research
      Computing Center, Brigham Young University
   Robert Kraft, Center for the Computer Analysis of Texts, University of
      Pennsylvania
   Ian Lancashire, Center for Computing in the Humanities, University of
      Toronto
   D. Terence Langendoen (Linguistic Society of America), Graduate
      Center, City University of New York
   Charles (Jack) Meyers, National Endowment for the Humanities
   Junichi Nakamura, Department of Electrical Engineering, Kyoto
      University
   Wilhelm Ott, Universitaet Tuebingen
   Eugenio Picchi, Istituto di linguistica computazionale, Pisa
   Carol Risher (Association of American Publishers), Association of
      American Publishers, Inc.
   Jane Rosenberg, National Endowment for the Humanities
   Jean Schumacher, Centre de traitement e/lectronique de textes,
      Universite/ catholique de Louvain a` Louvain-la-neuve
   J. Penny Small (American Philological Association), U.S. Center for
      the Lexicon Iconographicum Mythologiae Classicae, Rutgers
      University
   C.M. Sperberg-McQueen, Computer Center, University of Illinois at
      Chicago
   Paul Tombeur, Centre de traitement e/lectronique de textes,
      Universite/ catholique de Louvain a` Louvain-la-neuve, Belgium
   Frank Tompa, New Oxford English Dictionary Project, University of
      Waterloo
   Donald E. Walker (Association for Computational Linguistics), Bell
      Communications Research
   Antonio Zampolli, Istituto di linguistica computazionale, Pisa, Italy

------------------------------

END OF IRList Digest
********************