IRList Digest           Wednesday, 26 November 1986      Volume 2 : Issue 60

Today's Topics:
   Email - Problems with the size of our distribution
         - **URGENT** Request to members to set up re-distribution lists
   Query - Address of Ellen Voorhees?
   Abstracts - Reference, and Query on Bit String Use
             - One Abstract on Bit Strings, Others on Associative Networks
             - Bibliography on Bit String Use

News addresses are
   ARPANET: fox%vt@csnet-relay.arpa   BITNET: foxea@vtvax3.bitnet
   CSNET: fox@vt                      UUCPNET: seismo!vtisr1!irlistrq

----------------------------------------------------------------------

Date: Fri, 21 Nov 86 13:25:51 EST
From: seismo!rick (Rick Adams)
Subject: Re: ... my distribution list? ...

While we're on the subject of the distribution list, it is getting too
big to handle. We can not send to multiple recipients at the same host.
It is very expensive to do.

[Note: in case you did not know, Rick Adams at seismo has been letting
me send 1 message to their computer and then the distribution list is
expanded there to go out over various networks. So, we had best follow
his requests! - Ed]

At an absolute minimum, you need to remove the 50 or so bitnet addresses
and get them onto a mail forwarder on a bitnet site. The people at
BITNIC can help you and are encouraging this behavior. It is making a
big impact on the arpa/bitnet gateway.

I would appreciate it if you can get local forwarding on many of the
other sites as well. The general rule is that if there are more than 2
recipients on a host (bitnet counts as a host because it is a relay
machine) there should be a local forwarder. There should also be one on
csnet-relay, but they aren't very cooperative about those things
usually.

---rick

[Note: I have taken the following steps as a result of this message:
1) Bitnet recipients will receive mail directly from me from
   foxea@vtvax3.bitnet. This will load our network for a while but
   should give Bitnet recipients very fast service.
2) Remaining recipients will still be reached from seismo.
However, as you can see in the next message, I am encouraging your
setting up of re-distribution points. Thanks for your cooperation! - Ed]

------------------------------

Date: Sat Nov 22 18:13 EST 1986
From: fox
Subject: Request for setting up distribution lists!

I am asking for cooperation of all IRList recipients to help with
setting up of distribution lists. Your computer system administrator
can set up an alias, such as
   IRList-dist@site-address
so that I can just send to that and the 1 copy to your computer can
then be replicated to all local readers who are interested.

There are a number of aspects to changeover. First, there are a number
of lists already, and some of you could be added to one rather than
receive a direct copy:
   bboard.IRList@r20.utexas.edu
   cmu-IRList@cmu-cs-pt.arpa
   dist-irlist@louie.udel.edu
   incoming-IRList@SU-CSLI.arpa
   incoming-IRList@sumex-aim.arpa
   ir-list@lsu.csnet
   ir@bellcore.arpa
   irlist-bboard@red.rutgers.edu
   irlist-disty@MIT-Multics.arpa
   irlist-inbox@mcc.arpa
   irlist-incoming@TI-CSL.csnet
   irlist-local@scrc-stony-brook.arpa
   irlist-p@brl.arpa
   irlist@nlm-vax.arpa
   irlist@umass-cs.csnet
   irlist@usc-cse.csnet
   munnari!IR-List
   palladian-irlist@live-oak.lcs.mit.edu
   ucl-irlist@ucl-cs.arpa

Second, there are several MIT addresses (which could, perhaps, be tied
in with above MIT-Multics or other distributions):
   media-lab.mit.edu
   mit-hermes.arpa
   mit-mc.arpa
   OZ.AI.MIT.EDU@XX.LCS.MIT.EDU
   xx.lcs.mit.edu
Each of these sites, at least, should have a re-distribution address.
Similarly there are several at CMU:
   cad.cs.cmu.edu
   CMU-CS-G.arpa
   CMU-CS-K.arpa
   sei.cmu.edu

Also, there are several lll sites:
   lll-mfe.arpa
   lll-tis-a.arpa
   lll-tis-b.arpa
   lll-tis.arpa
   sav@LLL-MFE.arpa
   sds.mfenet@lll-mfe.arpa

Further, there are several sites at Digital:
   bartok.dec@decwrl.dec.com
   closet.DEC@decwrl.dec.com
   gvaic2.dec@decwrl.dec.com
   newton.dec@decwrl.dec.com
   sprite.DEC@decwrl.dec.com
   whoaru.DEC@decwrl.dec.com

Finally, there are many sites with >1 recipient, where each person
should contact their computer system administrator, ask for a
distribution list to be set up, and ask to be added to that. Then, the
system administrator can tell me the new address and what old addresses
are handled by it:
   allegra
   apple.csnet
   cornell.arpa
   cs.dal.cdn@ubc.csnet
   cs.ucl.ac.uk
   gmr.csnet
   gmu90x
   hans@oslo-vax.arpa
   harvard.arpa
   hplabs.arpa
   ibm-sj.arpa
   ihnp4!hoqam
   mitre-bedford.arpa
   mitre.arpa
   njit-eies.MAILNET
   northeastern.csnet
   nyu-csd2.arpa
   nyu.arpa
   smu.csnet
   sri-ai.arpa
   sri-nic.arpa
   uchicago.csnet
   unl.csnet
   usc-isi.arpa
   utah-20.arpa

Thanks for your cooperation! - Ed

------------------------------

Date: Wed, 19 Nov 86 00:41:09 est
From: kraft@LSU.CSNET
Subject: where is ellen voorhees

Ed, do you have a forwarding address for ellen voorhees? She seems to
have left Cornell and no one there seems to have heard where she is?
Thanks, Don

[Note: I have heard that she is working at a company in the Princeton
NJ area but would welcome further details myself too. - Ed]

------------------------------

Date: Tue, 28 Oct 86 15:14:51 -0100
From: Wyle
Subject: New reference and question on bit strings

Ed:

Here is an entry to the bibliography. I don't think it has been
published yet. Entries on bitstrings will follow.

I am looking for literature references related to bit strings and
signature records used in text indexing. Does anyone in IR digest list
know of a good place to start looking? Who are the key players in
"signature records" used in IR?
[Note: there was an article that surveyed related matters, that might
help fill out your bibliography further:
   Faloutsos, Christos. Access Methods for Text. ACM Computing
   Surveys, 17(1), March 1985.
Thanks for the references! - Ed]

%A M Domenig
%A P Shann
%T Towards a dedicated database system for dictionaries
%B Proceedings of the 11th International conference on Computational
Linguistics, August 25-29 1986
%C Bonn
%I IKP Universitaet Bonn

------------------------------

Date: Wed, 29 Oct 86 09:04:11 -0100
From: Wyle
Subject: Yet more new references

Here is a bit string entry and some associative networks references of
particular interest (using connectionism to index text):

%A D R McGregor
%A J R Malone
%T The Fact Database System - a system using generic associative networks
%J Research and Development in Information Technology
%V 1
%P 55-72
%D 1982

%A D R McGregor
%A J R Malone
%T The Fact System - a hardware-oriented approach
%B Database management systems: a technical comparison
%E P J King
%S Computer state of the art reports
%C Maidenhead
%I Pergamon Infotech
%D 1983
%P 99-112

%A S E Fahlman
%T A system for representing and using real-world knowledge
%C Cambridge Massachusetts
%I MIT Press
%D 1979

%A J R Quinlan
%T Induction over large databases
%R HPP 79-14
%S Heuristic Programming Project
%C Stanford California
%I Stanford University Press
%D 1979

%A R S Michalski
%T A theory and methodology of inductive learning
%J Artificial Intelligence
%V 19
%D 1982
%P 189-249

%A D R McGregor
%A J R Malone
%T Generic associative hardware, its impact on database systems
%B Proceedings of an IEEE Colloquium on Associative Methods and
Database engines
%D May 1982

%A M L Minsky
%A S Papert
%T Perceptrons: threshold function geometry
%C Cambridge Massachusetts
%I MIT Press
%D 1986

%A S A Feldman
%A D Ballard
%T Computing with connections
%R TR72 14727
%C Rochester, New York
%I Rochester Institute of Technology, Computer Science Department
%D 1981

%A K C Mohan
%A P Willett
%T Nearest neighbor searching in serial files using text signatures
%J Information Science and Technology (Netherlands)
%V 11
%N 1
%P 31-39
%D 1985
%X A nearest neighbor search procedure is described for use with serial
files of textual data. The procedure involves the grouping of records
into blocks, each of which is characterized by a fixed length bit
string. A comparable query bit string may be matched against each of
these bit strings, and an upper bound calculation used to identify those
blocks which need to be inspected in detail if the document that is most
similar to the query is to be identified. Experiments with three small
collections of documents and queries are used to test the efficiency of
the approach.

------------------------------

Date: Tue, 4 Nov 86 11:28:42 -0100
From: Wyle
Subject: Bit string bibliography references

As promised, here are bibliographic citations on bit string entries.
Our librarian can now send the references electronically (and
error-free), so there should henceforth be fewer errors. ...

%A A F Harding
%A M F Lynch
%A P Willett
%O Author's current address: Dept. of Information Studies, Univ. of
Sheffield, Sheffield, England.
%T Document retrieval using a serial bit string search
%J Inf-Process-Manage (GB)
%V 19
%N 1
%P 1-8
%D 1983
%K information-retrieval-systems
%K file-organisation
%K serial-bit-string-search
%K best-match-retrieval-system
%K serial-file-organisation
%X An experimental best match retrieval system is described based on the
serial file organisation. Documents and queries are characterised by
fixed length bit strings and the time-consuming character-by-character
term match is preceded by a bit string search to eliminate large numbers
of documents which cannot possibly satisfy the query. Two methods, one
fully automatic and one partially manual in character, are described for
the generation of such bit string characterisations.
Retrieval experiments with a large document test collection show that
the two-level search can increase substantially the efficiency of serial
searching while maintaining retrieval effectiveness, and that a
single-level search based only upon the bit strings results in only a
small decrease in effectiveness in some cases.

%A K D MacLaury
%O Author's current address: Res. Libraries Group Inc., Stanford, CA, USA.
%T Automatic merging of monographic data bases-use of fixed-length keys
derived from title strings.
%J J-Libr-Autom (USA)
%V 12
%N 2
%P 143-155
%D June 1979
%K library-mechanisation
%K monographic-data-bases
%K title-strings
%K bibliographic-files
%K optimized-character-position-key
%K Harrison-bit-string-key
%K fixed-length-keys
%K automatic-merging
%K library-mechanisation
%X To find duplicate records in machine-readable bibliographic files,
two different fixed-length keys were developed for finding matching
titles. Each had different characteristics and functions. An optimized
character position key was developed for comparing all titles in the
files and a Harrison bit string key, tolerant of typographic errors and
other small differences, was used for comparing titles within small
groups of records that were potential matches.

%A E Mumprecht
%O Author's current address: IBM Corp., Armonk, NY, USA
%T Efficient bit string handling with standard processing units
%J IBM-Tech-Disclosure-Bull (USA)
%V 26
%N 10A
%P 4912-4914
%D March 1984
%K data-handling
%K semiconductor-storage
%K storage-management-and-garbage-collection
%K high-resolution-graphics
%K image-handling
%K data-processing
%K bit-string-handling
%K standard-processing-units
%K storage-reference-instructions
%K microprocessors.
%X The method described enhances the power of ordinary storage reference
instructions in standard processing units, e.g., microprocessors.

%A K Ramamohanarao
%A J W Lloyd
%A J A Thom
%O Author's current address: Dept of Computer Sci, Univ of Melbourne,
Parkville, Victoria, Australia
%T Partial-match retrieval using hashing and descriptors
%J ACM-Trans-Database-Syst (USA)
%V 8
%N 4
%P 552-576
%D December 1983
%K database-management-systems
%K information-retrieval-systems
%K database-management-systems
%K hashing
%K descriptors
%K partial-match-retrieval-scheme
%K addresses
%K mathematical-model
%X This paper studies a partial-match retrieval scheme based on hash
functions and descriptors. The emphasis is placed on showing how the use
of a descriptor file can improve the performance of the scheme. Records
in the file are given addresses according to hash functions for each
field in the record. Furthermore, each page of the file has associated
with it a descriptor, which is a fixed-length bit string, determined by
the records actually present in the page. Before a page is accessed to
see if it contains records in the answer to a query, the descriptor for
the page is checked. This check may show that no relevant records are on
the page and, hence, that the page does not have to be accessed. The
method is shown to have a very substantial performance advantage over
pure hashing schemes, when some fields in the records have large key
spaces. A mathematical model of the scheme, plus an algorithm for
optimizing performance, is given.

%A R Sacks-Davis
%A K Ramamohanarao
%O Author's current address: Dept of Computing, Royal Melbourne Inst of
Technol, Melbourne, Victoria, Australia.
%T Partial-match retrieval based on superimposed coding
%B Proceedings of the 6th Australian Computer Science Conference,
Sydney, NSW, Australia, 10-12 Feb. 1983.
%J Aust. Comput. Sci. Commun. (Australia)
%V 5
%N 1
%P 166-176
%D February 1983
%K information-retrieval
%K record-retrieval
%K partial-match-retrieval
%K data-files
%K superimposed-coding
%K descriptor-file
%X This paper describes a method for partial-match retrieval on very
large data files. The method is based on superimposed coding techniques.
Associated with the data file is a descriptor file containing bit
strings which describe the records. In order to retrieve records
efficiently a two level descriptor file is proposed. An analysis of this
scheme is presented.

%A R Sacks-Davis
%A K Ramamohanarao
%O Author's current address: Dept of Computing, Royal Melbourne Inst. of
Technol, Melbourne, Victoria, Australia
%T A two level superimposed coding scheme for partial match retrieval
%J Inf-Syst (GB)
%V 8
%N 4
%P 273-280
%D 1983
%K database-management-systems
%K information-retrieval.
%K DBMS
%K information-retrieval
%K two-level-superimposed-coding-scheme
%K partial-match-retrieval
%K very-large-data-files
%K descriptor-file, bit-strings
%K descriptor-file
%X The authors describe a method for partial-match retrieval on very
large data files. The method is based on superimposed coding techniques.
Associated with the data file is a descriptor file containing bit
strings which describe the records. In order to retrieve records
efficiently a two level descriptor file is proposed. An analysis of this
scheme is presented.

------------------------------

END OF IRList Digest
********************
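
Illustration: the descriptor-file idea that recurs in the abstracts of
this issue (a fixed-length bit string per record, a second bit string
per page or block, and a bit test that rules pages out before the text
is examined) can be sketched in a few lines of Python. This is only a
rough sketch, not any of the cited authors' implementations. The 64-bit
width, the three bits per term, the SHA-256 based hashing, the page
size, and every function name below are assumptions made for the
example.

```python
import hashlib

SIGNATURE_BITS = 64   # assumed descriptor width, not taken from the papers
BITS_PER_TERM = 3     # assumed number of bit positions chosen per term


def term_bits(term):
    """Return a mask with up to BITS_PER_TERM bits set for one term."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    mask = 0
    for i in range(BITS_PER_TERM):
        # Two digest bytes pick each bit position; positions may collide.
        chunk = digest[2 * i:2 * i + 2]
        mask |= 1 << (int.from_bytes(chunk, "big") % SIGNATURE_BITS)
    return mask


def signature(terms):
    """Superimpose (bitwise OR) the masks of all terms into one bit string."""
    sig = 0
    for term in terms:
        sig |= term_bits(term)
    return sig


def build_index(records, page_size=2):
    """Group records into pages, each with a page-level descriptor."""
    pages = []
    for start in range(0, len(records), page_size):
        page_records = records[start:start + page_size]
        record_sigs = [signature(r.split()) for r in page_records]
        page_sig = 0
        for sig in record_sigs:
            page_sig |= sig
        pages.append((page_sig, list(zip(record_sigs, page_records))))
    return pages


def search(pages, query_terms):
    """Return (records containing every query term, number of pages opened)."""
    query_sig = signature(query_terms)
    wanted = {t.lower() for t in query_terms}
    hits = []
    pages_opened = 0
    for page_sig, entries in pages:
        if (page_sig & query_sig) != query_sig:
            continue  # some query bit is missing: no record here can match
        pages_opened += 1
        for record_sig, record in entries:
            if (record_sig & query_sig) != query_sig:
                continue  # record-level descriptor rules this record out
            # Descriptors can produce false drops, so confirm on the text.
            if wanted <= {w.lower() for w in record.split()}:
                hits.append(record)
    return hits, pages_opened
```

A call such as search(build_index(records), ["retrieval"]) returns the
matching records together with a count of the pages that had to be
opened. The bit tests can never discard a true match, because every bit
set for a query term is also set in any record (and page) that contains
that term; they can only let through some non-matching candidates, which
the final text comparison removes.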