Date: Mon, 26 Aug 85 14:59 EST To: irdis at vpi Subject: IRList V. 1 No. 4 Reply-to: IRList%vpi@csnet-relay.arpa (or fox@vpics1 on BITNET) US-Mail: Dr. Edward A. Fox, Dept. of Computer Science, VPI&SU (also called Virginia Tech), Blacksburg VA 24061 Phone: (703) 961-5113 or 6931 Subject: IRList Digest V1 #4 IRList Digest Monday, 26 Aug 1985 Volume 1 : Issue 4 Today's Topics: EMAIL - digest formats Query - ACM SIGIR Officers Abstracts - Panel at SIGIR 85 Conf. in Montreal Query - map databases (question and summary) - response about World Databank II ---------------------------------------------------------------------- Subj: From: David.Anderson@CMU-CS-K Date: 17 Aug 1985 11:45-EDT Subject: Re: IRList V 1 No. 2 It would be nice if you could change the format of the IRList digests to match the "standard" ARPAnet digest layout, so that those of us with digest reading software could read the digests more effectively. Basically the lines of -------- aren't quite right -- if you need details, let me know and I'll look them up. --david [Note: Thanks to David and some others that pointed this out! I believe the digests are in correct form now. That is, 70 dashes at top, and 30 dashes separating messages, with a blank line before. - Ed] ------------------------------ From: Aviezri Fraenkel Date: Sun, 18 Aug 85 09:53:43 -0200 Ed, Who are the new SIGIR Officers? Could you pls send me a list of names with their functions? Thanks very much. Best wishes, Aviezri S. Fraenkel, Professor. Bitnet: fraenkel@wisdom.bitnet From Arpanet: fraenkel%wisdom.bitnet@wiscvm.arpa From Csnet: fraenkel%wisdom.bitnet@csnet-relay.arpa [Note: here is the requested information. Remember, ACM SIGIR needs new members! The new officers are energetic and have good plans. - Ed Chairman: Professor Clement Yu Secretary: Department of Elect. Eng. John Tolle, Ph.D. and Computer Science OCLC, Inc. Univ. of Illinois at Chicago 6565 Frantz Road Chicago, Illinois 60680 Dublin, Ohio 43017-0702 (312) 996-2318 (614) 764-6000 Vice Chairman: Treasurer: Professor Lee A. Hollaar Donna Williamson Department of Computer Science National Library of Medicine University of Utah 8600 Rockville Pike 3160 Merrill Engineering Bldg. Bethesda, Maryland, 20894 Salt Lake City UT 84112 301-496-1936 801-581-3203 donna@nlm-vax.arpa hollaar@utah-20.arpa ------------------------------ From: Aviezri Fraenkel Date: Sun, 18 Aug 85 09:30:23 -0200 Ed, At the recent SIGIR Conference in Montreal, a panel on Theoretical and Algorithmic Aspects of Information Storage and Retrieval took place. I thought that the abstracts of the panel might be of general interest, so I forward them to you hereby: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Panel on THEORETICAL AND ALGORITHMIC ASPECTS OF INFORMATION STORAGE AND RETRIEVAL Chairman: Aviezri S. Fraenkel The main aim of the panel is to portray a large number of examples illustrating how theoretical and algorithmic approaches can lead to a better understanding of problems or to improved solutions to problems in Information Storage and Retrieval. Abstracts of the addresses follow below. Each address will probably be limited to 10 minutes -- the mean free attention span at the end of a strenuous day! -- followed by a 5 minute question and answer period. SELECTION OF CLUSTERS Clement Yu The University of Illinois at Chicago Department of Electrical Engineering and Computer Science Illinois 60680, U.S.A. The problem of deciding the clusters to be searched is considered. If terms are distributed independently, a simple algorithm is given for the problem. The algorithm is then modified to take into consideration dependencies between terms. It is shown that the algorithm has low time complexity and yields accurate estimation. ON THE PHYSICAL ORGANIZATION OF CLUSTERED FILES Vijay V. Raghavan University of Regina Department of Computer Science Regina S4S 0A2, Canada The process of classification involves placing a set of objects into affinity classes or clusters, in such a way that the objects within a class are more similar to each other than they are to objects outside the class. In information retrieval, the documents comprising the bibliographic data base may be classified in order to identify clusters of documents that exhibit similar characteristics. In addition, there may also be a need to cluster index terms used to represent documents or to obtain a clustering of user queries. Since the objects within a cluster are to be retrieved jointly, it is desirable that they be kept in close proximity in physical storage. When the clusters are pairwise disjoint, a simple scheme called the cluster-inverted organization provides the answer. More generally, when clusters are allowed to overlap or when multi- level clusterings are employed, specialized algorithms are needed to ensure that physical proximity of objects of a cluster is achieved while keeping the redundancy in stored objects as small as possible. Some recent results in this area are identified. In particular, it is pointed out that for a single level, overlapping clustering, identifying minimum redundacy organization is NP-complete. This result justifies the development of a heuristic algorithm that is efficient, although not always optimal. The results for multi-level clusterings deal with the issue of having one arrangemet of the objects that would simultaneously achieve consecutive retrieval for clusters in every level of a clustering. THE USE OF FUZZY SUBSET THEORY IN INFORMATION RETRIEVAL Donald H. Kraft Department of Computer Science Louisiana State University Baton Rouge, Louisiana 70803, U.S.A. Substantial work has been done on the application of fuzzy subset theory to information retrieval. Boolean query processing has been generalized to allow for weights to be attached to individual index term and to terms in a query. However, problems still remain in terms of preservation of the Boolean lattice structure. Interesting applications using weights to generate the linguistic structure of queries have been considered. Moreover, work is ongoing to develop retrieval performance measures taking ranked output into consideration. SOME RECENT ADVANCES IN STRING-MANIPULATION ALGORITHMS Alberto Apostolico Computer Science Department Purdue University West Lafayette, Indiana 47907, U.S.A. The talk will briefly review some recent progress in (serial as well as parallel) Algorithmics on Strings which seem of relevance to common tasks of Information Storage and Retrieval. One such task is the well known, and widely investigated, pattern matching problem, i.e., the problem of finding the first (or all) occurrences of an assigned string (the pattern) in a longer string (the text). Some practically relevant variations of the pattern matching problem have also been enhanced to some extent, although the main questions posed by them are still open: examples are the so-called pattern matching with don't care and the approximate pattern matching ,i.e., pattern matching within an assigned edit distance. Other advances revolve around index structures built around the notion of Suffix Tree. Such structures have been found suitable to such tasks as multiple pattern matching in time linear in the sum of the lengths of the patterns, computing the complete substring statistics with or without overlaps, supporting some data compression techniques, building inverted files associated with a set of text files, etc. Finally, recent progress in problems related to common subsequences of strings will be pin-pointed. ROBUST DATA COMPRESSION Aviezri S. Fraenkel The Weizmann Institute of Science Department of Applied Mathematics Rehovot 76100, Israel For speeding up data retrieval or data encryption, or for reducing communication costs, it is often desirable to compress data. Many of the standard encodings used for compression are very error-sensitive in the sense that a single error may render the entire tail of the encoded string undecipherable. A new class of robust universal and uniquely decipherable encodings, based on exotic numeration systems, will be reviewed briefly. MEASUREMENT-THEORY AND EVALUATION MEASURES IN INFORMATION RETRIEVAL Peter Bollmann, Technische Universitaet Berlin If we get measurement-values by the application of an evaluation measure we want to make statements using these values. Such statements include the comparison of these values, of differences, of percentages or of arithmetic means. These comparisons are not meaningful for all kinds of scales. For example the comparison of two arithmetic means is meaningful for an interval scale but not for an ordinal scale. Measurement-theory yields criteria which assumptions scales have to fulfil and hence whether a statement is meaningful or not. In order to have a foundation for measurement in information retrieval the basic concepts of measurement-theory have to be interpreted in terms of information retrieval. Examples will be given. With best wishes, Aviezri S. Fraenkel. Bitnet: fraenkel@wisdom.bitnet From Arpanet: fraenkel%wisdom.bitnet@wiscvm.arpa From Csnet: fraenkel%wisdom.bitnet@csnet-relay.arpa ------------------------------ From: thrapp@noscvax.UUCP (Gary R. Thrapp) Subject: summary, map generation on micros Date: 31 Jul 85 17:33:57 GMT [Forwarded from WORKS Digest V.5 No. 31. - Ed] A couple of weeks ago I submitted the following request: **************************************************************** I am interested in obtaining software to generate geographic maps on Sun workstations and other micros of various word sizes. I expect this will involve a map database and software to access it. The preferred features are to accept a map center and radius (or latitude-longitude boundaries) and generate a display with shore outlines or filled landforms with a lat-lon grid. If the software used high level generic graphics calls for lines and polygons it could help for portability. A while ago I checked into getting a database called World Databank II, however I was told it took 9 reel-to-reel tapes to hold it. I am interested in something with perhaps less resolution but that would fit on micros with limited storage. I plan to post a summary of responses I receive by mail. Thank you. Gary R. Thrapp Naval Ocean Systems Center San Diego, CA MILNET/ARPANET: thrapp@nosc UUCP: {ihnp4,akgua,decvax,dcdwest,ucbvax}!sdcsvax!noscvax!thrapp **************************************************************** Thank you all for the responses I received. There are some possibilities here given some work. However my interest was in a medium resolution map database of a couple of megabytes or so with landforms and high level source code to display it. I was considering something simple that could be easily ported to various micros. If someone knows of such software please send me mail. For those who expressed interest in the CIA generated map database and software, here is the information I have found: World Databank II, 5 volume set ( > 400 megabytes, perhaps much more) $660. CAM Fortran software to access and display the map database $400. National Technical Information Service 5285 Fort Royal Road Springfield, VA 22161 Products 703-487-4600 Sales 703-487-4650 Here is an edited summary of the responses that I received, minus references to others that I don't have permission to post: **************************************************************** For ibm-type pc's, you might consider routines from a company called Golden Software. They have a series of programs, fairly cheap that take x,y,z coordinates and plot 3-D and topographic maps. Various rotations, translations, and slices are possible. **************************************************************** How about World Database I? It's about 5 megabytes. **************************************************************** I think you can get whatever map information you want, in up to 1/2 kilometer (or better) resolution, from the National Center for Cartieographic Studies or something. Check in a book called "Information USA," it's listed there. They can provide it on 9-track tape in various formats. If too much resolution is a problem, you can write some programs to take, for instance, a costal outline and do vector sums until you reduce it to something reasonable. **************************************************************** I am writing software to display maps (color or black & white) on Sun workstations. The software is in C, utilizing the SunCGI low-level graphics package. CGI is an ANSI Computer Graphics Interface standard currently under development. Also I am using the the multiple tools/windows, subwindows and icon facilities (Pixwin) of the Sun workstations. As for the map information itself, I'm planning on using the US Geological Survey (USGS) digital cartographic standards. There are two parts to the standards: Digital Line Graphs (DLG), such as rivers and roads; Digital Elevation Models (DEM), ground elevations. The DEM is available also in a modified format (a tighter gridding of elevation points) from the Defense Mapping Agency (DMA), and also from USGS. I'm currently working with the DLG attributes only. The USGS has completely mapped the USA with a 1:250,000 map scale and is slowly digitizing on a 1:24,000 scale. Eventually I believe the digital standards will apply to overseas maps as well, the DMA is already using a modified DEM standard. I plan to allow zooming between 1:250,000 and 1:24,000 scale maps, panning, toggling of features, etc. The software will be mostly mouse driven with an eye toward easing the usage by non-typists. Various dynamic features, such as military units, will be represented by icons (standard military symbols) which can be manipulated by a mouse. In general most of the software will be portable to other systems, the only exception will be the Pixwin features of the Sun workstations. I am using the ANSI CGI and the USGS DLG/DEM standards with regards to portability to, and compatabilty with, other machines and/or software. As for the storage of data on micros, that will be a problem. I found that maps are very detailed, dense pieces of information. I suggest that the map data base be eventually stored on optically encoded disks (giga-bytes). **************************************************************** This won't help with a Sun, but it might give you someting to look into: Software Concepts, Stamford CT (203-357-0522) sells a low-priced graphics program for the IBM-PC that is supposed to provide a 3-D world map. It costs $69.95 and fits on one disk. The features seem impressive for this package - turning the globe N, E, S or W, zoom, etc. It also does some distance calculating and plotting. All this info from an articel in PC Week of a few months ago. **************************************************************** We have the NCAR graphics package up and running on our Suns. One of the routines is called SUPMAP, and its purpose is to draw maps in various projections centered at various points. It may have all of the capabilities you need. My only experience with it is running the test program provided by NCAR to see if it worked as expected (it did). The map database is about 1Mbyte. I don't know how hard it would be to port the NCAR package to other micros. The thing is written in Fortran 66, but is very machine dependent, and we had to hack hard to get it to work on the Suns and the Vax. ------------------------------ From: rick@seismo.UUCP (Rick Adams) Subject: Re: summary, map generation on micros Date: 14 Aug 85 22:41:57 GMT [Forwarded from WORKS Digest V.5 No. 31. - Ed] Actually the World Databank II is really only about 20 Mbytes. It comes on 5 tapes and looks like 400 Mbytes of data because, as delivered, it has everything in fixed length records of ascii characters. Once on your target machine, by writing binary floating point numbers instead of the characters, you can get it down to a reasonable size. You can reduce it even smaller by throwing out the rivers and state/province boundaries. ---rick ------------------------------ END OF IRList Digest ********************