Date: Tue, 17 Sep 85 18:41 EST
To: irdis at vpi
Subject: IRList Digest V1 #9

IRList Digest           Tuesday, 17 Sep 1985      Volume 1 : Issue 9

Today's Topics:
     EMAIL - Back issues, help needed
     Query - Relationship between Videotex and IR
           - Takers for offer of Tax Expertise Available for Expert System
     Announcement - Seminar on Speech Recognition and NL Processing (BBN)
     Article - The Inter-Network Database Server

----------------------------------------------------------------------

From: Rabjohns.Henr@XEROX
Date: 4 Sep 85 08:32:07 EDT (Wednesday)

...
The IRList digest sounds like a good one to me, would it be possible to become a member of the dl and also could you send me any back issues of the digest that you might have already generated.

Thanks in advance,
Douglas T. Rabjohns

[Note: Virginia Tech is a member of CSNET, connected through Phonenet. We are polled twice daily for mail. We cannot FTP nor can other sites directly access our computers over Internet. Would any ARPAnet sites like to keep an archive of IRList messages so that others can FTP? It would be much easier for those who missed issues to get them that way. Also, at the present time, sending back issues repeatedly is very expensive for my department. Thanks, Ed]

------------------------------

From: Tom Scott
Date: Thu, 5 Sep 85 12:32:28 edt
Subject: The Relationship between Videotex and Information Retrieval

This month's "Hardcopy" (September 1985, p.18) has a half-page article on the relationship between videotex and information retrieval which may be of interest to the readers of IRList. The author, Pam Jones, quotes Leslie Townsend of International Resource Development as follows:

     Videotex has been called a technology, when it's really not so much a technology as it is an information retrieval method .... The line between what were software packages for information retrieval and hardware for videotex systems are [sic] essentially becoming negligible.

I'd like to hear more about this.
Perhaps someone would submit an article to IRList, detailing the relationship between videotex and information retrieval. What exactly is videotex? What are its theoretical basis, practical applications, and technology? How do the theory, practice, and technology of videotex compare and contrast to the theory, practice, and technology of information retrieval?

Tom Scott                      CSNET:   scott@bgsu
Dept. of Math. & Stat.         ARPANET: scott%bgsu@csnet-relay
Bowling Green State Univ.      UUCP:    osu-eddie!bgsuvax!scott
Bowling Green OH 43403-0221    ATT:     419-372-2636

------------------------------

Date: Thu 29 Aug 85 13:26:29-CDT
From: Charles Petrie
Subject: Tax Expertise Available for Expert System

[Copied from AIList Digest V3 N 116 - 2 Sept]

Prof. Lewis Solomon is a specialist in tax law and is interested in working with someone on an expert system in that domain. He would also like to hear about existing systems. His US Mail address is:

     George Washington University
     National Law Center
     Washington, D.C. 20052
     Phone #(202)676-6753

------------------------------

Date: 20 Aug 1985 16:08-EDT
From: AHAAS at BBNG.ARPA
Subject: Seminars - Speech Recognition and NL Processing (BBN)

[Copied from AIList Digest V3 N 117 - 3 Sept, where it was:]
[Forwarded from the MIT bboard by SASW@MIT-MC.]

There will be an AI seminar on Monday August 26 at 10:30 in the second floor conference room at 10 Moulton St. Jean-Francois Cloarec and Michel Gilloux of Centre Nationale d'etudes des Telecommunications (CNET), Lannion, France will speak. Their abstract:

     SERAC : An Expert System for Acoustic-Phonetic Speech Recognition

We present a knowledge based approach to speech recognition at the phonetic level. SERAC is a production system generating phonetic hypotheses for continuously spoken French sentences. We give the motivations for using such an approach and we describe the knowledge representation language. Then we present the knowledge base and report some preliminary results.

There will be another talk by Karen Sparck Jones the next morning, August 27th, at 10:00 in the 2nd floor conference room. Her abstract:

     Natural Language Processing Research at the Computer Laboratory, University of Cambridge

The talk will outline recent and current work at the Laboratory. This includes both research with a semantic stimulus and research driven by parsing issues. The semantic work is concerned with interpretation problems like reference resolution, and with techniques for representation and inference involving general as well as domain knowledge, in the context of such tasks as database query and construction, paraphrase, and indexing. The parsing work includes projects on grammar construction, morphological analysis, and the use of a large machine-readable dictionary, and research on finite state techniques for compositional interpretation and on robust phrase-based parsing strategies.

------------------------------

From: Henry Nussbacher
Date: Fri, 30 Aug 85 10:39 O

          How does the Inter-Network Database Server Work

                         Henry Nussbacher

This article will attempt to describe all the components that go into the database server that is currently running on host BITNIC in Bitnet. The work on developing this inter-network database server is being funded by the Bitnet Development and Operations Center (BITDOC).

DATABASE is an information retrieval system that will allow users from any network to have access to various types of information contained within a full-blown database system.

DATABASE is a server machine (daemon) that runs on the VM operating system at BITNIC. It can accept commands in many different ways. Users within Bitnet can send interactive messages, punch files (record length of 80), print files (record length of 133), and Note files (IBM standard for electronic mail). In addition, users that are not located within Bitnet, but reside in any of the other networks that are connected to the Internet (Mailnet, Arpanet, Csnet, UUCP, etc.)
can send RFC822 mail to DATABASE%BITNIC.BITNET@WISCVM.ARPA and the server will accept it as a command.

Language used
=============

DATABASE is written in Rexx (approximately 1600 lines), a high level macro language for VM. Rexx is a combination of Algol (parsing capabilities), C and Pascal (structured programming - Do, Do While, Do Until, Do Forever, Select-When-Otherwise) and PL/1 (functions - Index, Verify, Substr, Translate, Justify, etc.). Rexx has developed quite a following among the hackers who use Bitnet since it was created using the best properties of existing high level languages while leaving out the parts that everyone hates (e.g. declares - all Rexx variables are self declaring, etc.).

By definition, Rexx has access to the VM file system. What needed to be created was hooks into the VM mail system and into a VM database system called Spires.

Interface to Bitnet mail system
===============================

It was decided that the most common aspect among all computer networks in the world was RFC822 mail. Upon this basis, the mail interface was developed. It is a separate subroutine within DATABASE, so that when X.400 mail becomes more accepted, additional coding can be done without affecting any of the other segments of the code.

Most Bitnet sites run a package developed at Columbia University which is another system server to handle mail files. It performs all the validity checking and routing of mail. When a mail file arrives at DATABASE, it is parsed to find out from where it came. The first non-blank line after the RFC822 header starts the command stream. Multiple commands can be coded within one mail file for delivery to DATABASE. The commands are passed into Spires for handling and the results are stuffed into the VM file system. DATABASE then takes the resultant file and places it inside an RFC822 envelope and sends it to the mailer server for handling. If the "From:" field that was sent is invalid (i.e.
xyzzy@Mitre-Bedford instead of the correct form of xyzzy@Mitre-Bedford.ARPA) then the mailer server will kick out the mail since Bitnet cannot determine where to send the mail back to.

The Spires Database System
==========================

There are many database packages that are available for VM systems: Focus, SQL, Adabas, etc. Spires (Stanford Public Information Retrieval System) was selected due to its functionality and its ability to accept commands from Rexx. Spires can locate a single record within a database of half a million records after only 4 disk reads (maximum). Spires is an index based database system. The definer can define indices for any field that he/she so wishes. Spires is the result of over 10 years of development at Stanford University and has such advanced functions as phonetic search capability as well as all the standard items one expects to find in a database package (report generator, sequential processing, etc.).

Arpanet Digests
===============

One of the first projects was the incorporation of selected Arpanet digests into the DATABASE system. The auto-digest loader is written in Rexx (approximately 800 lines) and has tables to control which digests to accept and the properties they contain. Certain digests are digested (examples: Ai-List, Info-Ibmpc) and some are rebroadcast immediately (examples: Info-Nets, Security).

Digested digests fall into two categories. Some follow the standard of having exactly 30 '-' (hyphens) separating individual entries along with 70 hyphens separating the table of contents from the individual entries. These digests, in addition, are sequenced and have a title line for the digest as their first line. Examples of these digests would be Ai-List, Info-Ibmpc, Info-Kermit, and Sf-Lovers. The other category is digested digests that are not sequenced and that do not have a title line. An example of this style of digest is Info-Graphics.
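
The separator convention just described can be sketched in a few lines. What follows is only an illustration in Python, not the Rexx auto-digest loader itself: the function name split_digest, its return shape, and the sample digest are all hypothetical, and the sketch assumes each separator stands alone on its own line and that anything after the last 30-hyphen line is a trailer rather than an entry.

```python
TOC_SEPARATOR = "-" * 70    # divides the table of contents from the entries
ENTRY_SEPARATOR = "-" * 30  # follows each individual entry


def split_digest(text):
    """Split a sequenced digest into (title, contents, entries, trailer).

    Hypothetical helper: the first non-blank line is taken as the title
    line, everything else before the 70-hyphen line as the table of
    contents, and each block closed by a 30-hyphen line as one entry.
    """
    title = None
    toc_lines = []
    entries = []
    current = []
    in_entries = False

    for line in text.splitlines():
        stripped = line.strip()
        if not in_entries:
            if stripped == TOC_SEPARATOR:
                in_entries = True
            elif title is None:
                if stripped:
                    title = stripped
            else:
                toc_lines.append(line)
        elif stripped == ENTRY_SEPARATOR:
            entry = "\n".join(current).strip()
            if entry:  # skip empty blocks between adjacent separators
                entries.append(entry)
            current = []
        else:
            current.append(line)

    # Whatever follows the final entry separator (e.g. an end-of-digest line).
    trailer = "\n".join(current).strip()
    return title, "\n".join(toc_lines).strip(), entries, trailer


# A made-up digest, used only to exercise the function.
sample = "\n".join([
    "Sample Digest   Volume 1 : Issue 1",
    "Today's Topics:",
    "  First topic",
    "-" * 70,
    "From: a@example",
    "Subject: First",
    "",
    "Body one.",
    "-" * 30,
    "From: b@example",
    "",
    "Body two.",
    "-" * 30,
    "End of Sample Digest",
])

title, contents, entries, trailer = split_digest(sample)
```

Matching on exact hyphen counts mirrors the "exactly 30" and "70" wording above; a digest of the unsequenced, untitled kind (the Info-Graphics style) would need a different rule, which this sketch does not attempt.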

As Arpanet forums arrive, they are parsed into their individual entries and added to the appropriate database subfile. This process of database addition is performed independently of the functioning of the DATABASE server.

Further documentation
=====================

To receive a detailed list of the valid command structure as accepted by DATABASE, send an RFC822 piece of mail to the address as stated above with the single line of HELP. You should receive in return an RFC822 piece of mail with introductory documentation on how to use DATABASE. In order to receive further information on Arpanet digest searching, issue the command HELP ARPANET.

Appendix - HELP ARPANET
========   ============

This service (Arpanet digests) will be available as of mid-October 1985.

DATABASE - Bitnet Inter-network Database Server    (last updated 08/16/85)
--------

This Inter-network database server is currently under development by the Development and Operations Center (BITDOC) of Bitnet. Suggestions and comments should be forwarded to:

     Henry Nussbacher
     Bitnet:   HANK@BITNIC
     Internet: HANK%BITNIC.BITNET@WISCVM.ARPA

-------------------------------------------------------------------------

How to search Arpanet digests
=============================

This document is meant for advanced users who have mastered the beginning help file.

For those who don't know what Arpanet digests are: There are currently over 100 discussion forums that are maintained within Arpanet. These range from discussions about the Apple Macintosh to new standards for networks to information about computer security. Some of these discussions appear as a digest; a moderator receives all contributions and creates a formatted digest that is sent out to all subscribers. The other form of discussion is an immediate redistribution list, where contributions to the discussion are immediately rebroadcast to all individuals who have registered for that discussion.

DATABASE now has the facility to search various selected Arpanet digests. In order to receive a list of all valid subfiles, issue the command 'LIST':

     DRINKS         Demonstration subfile
     EXPLAIN        Database System subfile
     MOVIES         Demonstration subfile
     PATHFINDER     Database System subfile
     PRESIDENTS     Demonstration subfile
     RECIPES        Demonstration subfile
     RESTAURANT     Demonstration subfile
     INFO-GRAPHICS  Arpanet discussion forum (digest)
     AI-LIST        Arpanet discussion forum (digest)
     INFO-NETS      Arpanet discussion forum (digest)

Arpanet "digested" digests, when loaded into the database, are pulled apart into their individual components, so that when you perform a search against a particular "digested" digest, you will not receive the entire digest but rather just the entry that pertains to your search request.

The following fields are defined as indices for all subfiles that arrive from Arpanet:

     Goal Records:  ENTRIES, ENTRY
     Simple Index:  SD, SPIRES-DATE
     Simple Index:  SPIRES-TIME, ST
     Simple Index:  GRANDSEQ, GS
     Simple Index:  SUBJECT, T, TITLE
     Simple Index:  FROM
     Simple Index:  DATE
     Simple Index:  SEQ, SEQUENCE
     Simple Index:  TEXT

SPIRES-DATE (or SD) allows a user to find entries based upon the date they were added.

Examples:  FIND SD > 08/01/85 (IN AI-LIST
           would find all entries that have been added to Ai-List after 08/01/85.

           FIND SD < 07/01/85 (IN INFO-NETS
           would find all entries that have been added to Info-Nets before July 1st, 1985.

SPIRES-TIME (or ST) allows you to refine your search even further when wishing to find entries that have been added after (or before) a particular day and time.

Note should be taken that these indices of date and time are not the date and time fields as mentioned within the Arpanet digest but rather the actual date and time that the data was loaded into the DATABASE system.

GRANDSEQ (or GS) is a unique integer number that is given to each Arpanet digest as it is added. Certain immediate redistribution digests (like Info-Nets) do not supply any sequencing number.
This sequence number is assigned by an alternate conferencing system within Bitnet so that duplication of entries will not occur. This sequence number will generally be different than the sequence number as assigned by a "digested" digest moderator.

Examples:  FIND GS 96 (IN AI-LIST
           would find all individual entries from an Ai-List digest that was assigned a Grand sequence number of 96.

           FIND GS > 10 (IN INFO-NETS TABLE
           would find all entries that have been assigned a Grand sequence number greater than 10. In addition, since the list may be quite long, this example has specified the TABLE option, which will display a concise list of which entries have been found.

SUBJECT (or TITLE or T) is the 'Subject:' header line that generally appears on each Arpanet digest entry.

Examples:  FIND SUBJECT PIXEL (IN INFO-GRAPHICS
           FIND TITLE PROLOG (IN AI-LIST
           FIND SUBJECT WORKST* (IN INFO-GRAPHICS

FROM is the 'From:' header line that appears on each Arpanet digest entry.

Examples:  FIND FROM STRING DEC (IN INFO-NETS
           would find any entry in Info-Nets that had a 'From:' field with the character string 'DEC' anywhere within the field.

           FIND FROM HENRY (IN AI-LIST
           would find any entry in Ai-List that has the word HENRY in the 'From:' field.

DATE is the 'Date:' header line that appears on each Arpanet digest entry.

Examples:  FIND DATE EDT (IN INFO-IBMPC
           would find all entries that have the word EDT in the 'Date:' field of the entry.

           FIND DATE JUL OR DATE JUN (IN INFO-NETS
           would find all entries that have the word 'JUN' or 'JUL' in their 'Date:' field.

SEQUENCE (or SEQ) is only valid for Arpanet "digested" digests. Each "digested" digest is assigned a sequence number by the moderator.

Examples:  FIND SEQ 102 (IN AI-LIST
           FIND SEQ > 90 (IN INFO-IBMPC TABLE

TEXT is the text of the entry. This is defined as the section of an entry that follows a blank line after the 'Date:', 'From:' and 'Subject:' (optional) fields. This entire section is keyword searchable.

Examples:  FIND TEXT EARN (IN INFO-NETS
           FIND TEXT PROLOG AND TEXT LISP (IN AI-LIST
           FIND TEXT XENIX AND SEQ > 85 (IN INFO-IBMPC
           FIND (TEXT UNIX OR TEXT XENIX) AND DATE PST (IN INFO-IBMPC

------------------------------

END OF IRList Digest
********************