IRList Digest           Wednesday, 23 December 1987      Volume 3 : Issue 50

Today's Topics:
    Report - Vassar workshop on text encoding standard for the humanities

News addresses are
    Internet or CSNET: fox@vtopus.cs.vt.edu
    BITNET: foxea@vtvax3.bitnet

----------------------------------------------------------------------

Date: Wed, 2 Dec 87 22:50:46 est
From: amsler@flash.bellcore.com (Robert Amsler)
Subject: Text Encoding Standard for the Humanities - Vassar Workshop report

[The following is a summary, prepared by Michael Sperberg-McQueen for the
HUMANIST mailing list, of the first workshop on the preparation of an
encoding standard for text in the humanities, held at Vassar College last
month. As an attendee and steering committee member, I would be willing to
answer further questions about this effort from the IRLIST or NL-KR
communities.

The effort to develop a standard for encoding texts in the humanities is
just starting, and anyone interested in this noble and ambitious goal
should not feel the slightest hesitation about becoming part of it. What
is at stake is nothing less than the creation, use, and preservation of
our global electronic cultural heritage.
- R. Amsler (amsler@flash.bellcore.com)]

Contributor: "Michael Sperberg-McQueen"

A follow-up on the current status of the ACH effort to formulate
guidelines for text encoding practices.

******************************************************************
* NOTE: The following encoding conventions have been used to     *
* represent French accents throughout this message:              *
*                                                                *
*    /   acute accent                                            *
*    `   grave accent                                            *
*                                                                *
* The accent codes are typed AFTER the letter, and are used      *
* with both upper and lower case letters.                        *
******************************************************************

On November 12 and 13, 1987, 31 representatives of professional
societies, universities, and text archives met to consider the
possibility of developing a set of guidelines for the encoding of texts
for literary, linguistic, and historical research. The meeting was called
by the Association for Computers and the Humanities and funded by the
National Endowment for the Humanities. The list of participants is
appended to this document.

The participants heartily endorsed the idea of developing encoding
guidelines. In order to guide such development, they agreed on the
following principles:

        The Preparation of Text Encoding Guidelines
        Poughkeepsie, New York
        13 November 1987

1. The guidelines are intended to provide a standard format for data
   interchange in humanities research.

2. The guidelines are also intended to suggest principles for the
   encoding of texts in the same format.

3. The guidelines should
   a. define a recommended syntax for the format,
   b. define a metalanguage for the description of text-encoding
      schemes,
   c. describe the new format and representative existing schemes, both
      in that metalanguage and in prose.

4. The guidelines should propose sets of coding conventions suited for
   various applications.

5. The guidelines should include a minimal set of conventions for
   encoding new texts in the format.

6. The guidelines are to be drafted by committees on:
   a. text documentation,
   b. text representation,
   c. text interpretation and analysis,
   d. metalanguage definition and description of existing and proposed
      schemes,
   co-ordinated by a steering committee of representatives of the
   principal sponsoring organizations.

7. Compatibility with existing standards will be maintained as far as
   possible.

8. A number of large text archives have agreed in principle to support
   the guidelines in their function as an interchange format. We
   encourage funding agencies to support the development of tools to
   facilitate this interchange.

9. Conversion of existing machine-readable texts to the new format
   involves the translation of their conventions into the syntax of the
   new format. No requirements will be made for the addition of
   information not already coded in the texts.

(French version translated by P. A. Fortier)

******************

The further organization and drafting of the guidelines will be
supervised by a steering committee selected by the three sponsoring
organizations: ACH (the Association for Computers and the Humanities),
ACL (the Association for Computational Linguistics), and ALLC (the
Association for Literary and Linguistic Computing).

Drafts of the guidelines will be submitted for comment to an editorial
committee with representatives of all participating organizations. In
addition to the sponsors, these are thus far the Modern Language
Association, the Association for Computing Machinery Special Interest
Group on Information Retrieval, and the Association of American
Publishers. The following groups have indicated interest informally but
have not yet formally pledged participation, in most cases pending a
formal vote: the Linguistic Society of America, the Association for
Documentary Editing, and the American Philological Association. The
American Anthropological Association, plus several organizations within
Europe, are now being asked to consider participation.

The interchange format defined by the guidelines is expected to be
compatible with the Standard Generalized Markup Language (SGML) defined
by ISO 8879, if that proves compatible with the needs of research.
The needs of specialized research interests will be addressed wherever
it proves possible to find interested groups or individuals to do the
necessary work and achieve the necessary consensus. Formation of
specific working groups will be announced later; in the meantime, those
interested in working on specific problems are invited to contact either

    Dr. C. M. Sperberg-McQueen, Computer Center,
    University of Illinois at Chicago (M/C 135),
    P.O. Box 6998, Chicago IL 60680
    (on Bitnet: U18189 at UICVM)

or

    Prof. Nancy Ide, Dept. of Computer Science,
    Vassar College, Poughkeepsie NY 12601
    (on Bitnet: IDE at VASSAR).

- N.I., C.M.S-McQ

------------------------------------------------------------------------------

List of Participants

NOTE: Where a participant represented an association at this meeting,
the association's name is given in parentheses after the participant's
name.

Helen Aguera, National Endowment for the Humanities
Robert A. Amsler, Bell Communications Research
David T. Barnard, Department of Computing and Information Science,
    Queen's University, Ontario
Lou Burnard, Oxford Text Archive
Roy Byrd, IBM Research
Nicoletta Calzolari, Istituto di linguistica computazionale, Pisa
David Chestnutt (Assoc. for Documentary Editing, American Historical
    Assoc.), Department of History, University of South Carolina
Yaacov Choueka (Academy of the Hebrew Language), Department of
    Mathematics and Computer Science, Bar-Ilan University
Jacques Dendien, Institut National de la Langue Francaise
Paul A. Fortier, Department of Romance Languages, University of Manitoba
Thomas Hickey, OCLC Online Computer Library Center
Susan Hockey (Association for Literary and Linguistic Computing),
    Oxford University Computing Service
Nancy M. Ide (Association for Computers and the Humanities),
    Department of Computer Science, Vassar College
Stig Johansson, International Computer Archive of Modern English,
    University of Oslo
Randall Jones (Modern Language Association), Humanities Research
    Computing Center, Brigham Young University
Robert Kraft, Center for the Computer Analysis of Texts,
    University of Pennsylvania
Ian Lancashire, Center for Computing in the Humanities,
    University of Toronto
D. Terence Langendoen (Linguistic Society of America), Graduate Center,
    City University of New York
Charles (Jack) Meyers, National Endowment for the Humanities
Junichi Nakamura, Department of Electrical Engineering, Kyoto University
Wilhelm Ott, Universitaet Tuebingen
Eugenio Picchi, Istituto di linguistica computazionale, Pisa
Carol Risher (Association of American Publishers), Association of
    American Publishers, Inc.
Jane Rosenberg, National Endowment for the Humanities
Jean Schumacher, Centre de traitement e/lectronique de textes,
    Universite/ catholique de Louvain a` Louvain-la-neuve
J. Penny Small (American Philological Association), U.S. Center for the
    Lexicon Iconographicum Mythologiae Classicae, Rutgers University
C.M. Sperberg-McQueen, Computer Center, University of Illinois at Chicago
Paul Tombeur, Centre de traitement e/lectronique de textes,
    Universite/ catholique de Louvain a` Louvain-la-neuve, Belgium
Frank Tompa, New Oxford English Dictionary Project,
    University of Waterloo
Donald E. Walker (Association for Computational Linguistics),
    Bell Communications Research
Antonio Zampolli, Istituto di linguistica computazionale, Pisa, Italy

------------------------------

END OF IRList Digest
********************