Date: Tue, 8 Apr 86 09:05:40 est
From: vtisr1!irlistrq
To: fox
Subject: IRList Digest V2 #19
Status: RO

IRList Digest Tuesday, 8 Apr 1986 Volume 2 : Issue 19

Today's Topics:
Article - Research proposal: Software development for information structures
CSLI - Categories of Correspondence

----------------------------------------------------------------------

Date: Fri, 4 Apr 86 01:09:34 est
From: vtcs1::in% (Peter_Smit%ub-mts%umich-mts.mailnet@mit-multics.ARPA)
Subject: seventh possible PhD research project

[Note: This is a long message, but Peter would like feedback. I have left this in a narrow column (though in future it is better if submissions are done more normally), which I hope won't hurt readability too much. The copy I edited had the 1st letter of most lines omitted, so in case I erred in editing, you can try to figure out the correct form yourself, since lines are as before. - Ed]

University of Michigan Peter Smit 8603 2803
Urban, Technological and Environm. Planning

TITLE: Software development strategies for new information structures.

BACKGROUND: Providing retrieval services of existing literature sources for the public at large, as is done at present by some on-line systems and some laser disk services, is not enough anymore. There are many such services already. Moreover, retrieval by category or by a combination of keywords is not good enough because, in response to the general queries that non-specialists are most likely to ask, it responds with impractically large numbers of references. While all of these may include that general concept, usually only a few provide good introductory readings for a novice. Finally, the readings to which retrieval services typically refer contain parts that are redundant, side tracks irrelevant to the user's purposes and specialized unexplained terminology.
People in a practical frame of mind would like to know what all these articles and books really have to say about their particular problem, but they don't have the time to read even a selection. Rather, there may be some demand for a service that provides short summaries of general as well as specific topics, with menus of choices after each summary to get to related or more specific summary items. The advantage of this approach is, first, that in practice, the use of menus is easier than having to learn command codes or retrieval languages, which is often necessary in keyword retrieval systems. Also, the specially re-written summaries can be made short enough to fit on a single screen display. Furthermore, general overview items can be retrieved in response to general queries, and more specific items in response to queries containing more specialized search terms. However many times different source texts repeat each other in mentioning some basic facts, the users will have to read those general introductory items only once because one summary item will cover all those instances. Moreover, items can be made to include menus or references to other items that deal with similar topics, provide evidence, mention exceptions, or are otherwise related. Finally, just like that list or menu of related items, a list can be provided of all the books, articles and other source texts that mention the topic described in the item. The latter will help the reader to assess the strength or truth value of the contents of the items. The disadvantage of the proposed linked, general and specific summary items is that, in order to make these, all literature has to be picked apart, its elements summarized and compared to similarly dissected source material.
This kind of analysis is different from abstracting because some of the desired summaries go into much more depth than typical abstracts do, and the summaries may be segmented in strange ways to fit the divisions as to what is mentioned in other source texts as well. Not only is this a lot of work, but it also involves judgements that cannot be left to a computer: are particular paragraphs of two articles so similar as to be redundant, does one paragraph support the other or generalize from the other, are they different but related or do they directly oppose each other, etc. In fact, not even all people will be able to analyze and summarize all texts. In a field like environmental care, there are specialties like nature preservation, traffic planning, housing, historic preservation, economic development, utilities, water quality management, emergency preparedness, recreation, etc. Probably only people from the right specialty, or better yet, a panel of those specialists, will find their content analyses and summaries accepted by others in the field. Making new information items for a service as proposed above, then, will be a very labor intensive endeavour. Ways should be sought in which computers may assist in getting that work done.

APPROACH: While the computer cannot interpret texts well enough to formulate the new sets of summary items and link these with existing items, it can help in setting priorities for new texts to process and it can provide administrative support to the specialists who do the analysis, reformulation and overview building. Computer programs to assist in retrieval from the envisioned integrated sets of text items should be no problem in principle because several retrieval files with that structure are operational already (e.g. Bernstein and Williamson's ANNOD, files on the English PRESTEL or the Dutch VIDITEL, etc.).
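
As a minimal sketch of the linked summary items described above, the following Python fragment models one item with its title line, its short summary, its menu of related items and its list of source texts. The class name SummaryItem, its field names, the helper menu_for and the example data are all hypothetical, chosen only for illustration; they are not taken from ANNOD, PRESTEL or VIDITEL.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryItem:
    """One screen-sized summary item (hypothetical structure)."""
    item_id: str
    title: str        # title line used to identify the item in menus
    summary: str      # short text intended to fit on a single screen
    # relation label (e.g. "more specific", "evidence") -> ids of related items
    related: dict = field(default_factory=dict)
    # citations of the source texts that mention this topic
    sources: list = field(default_factory=list)


def menu_for(item, items):
    """Build the numbered menu shown after an item.

    Returns a list of (choice number, relation label, title) tuples.
    """
    rows = []
    for relation, item_ids in item.related.items():
        for item_id in item_ids:
            rows.append((len(rows) + 1, relation, items[item_id].title))
    return rows


# Invented example data, for illustration only.
ITEMS = {
    "water": SummaryItem(
        "water", "Water quality management", "General overview ...",
        related={"more specific": ["runoff"]},
        sources=["(citation of a source text)"],
    ),
    "runoff": SummaryItem("runoff", "Urban runoff", "A more specific item ..."),
}

for number, relation, title in menu_for(ITEMS["water"], ITEMS):
    print(f"{number}. [{relation}] {title}")
```

Keeping the relation label with each link lets a menu distinguish items that provide evidence from items that mention exceptions or go into more detail, which is one possible reading of the menus proposed in the BACKGROUND section.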
However, software for entering, editing and connecting new items into a file of this kind may not exist because most of these files are small, custom made and may not expand so much as have details within their items updated or altered. Research questions are, then,

(A) Is there software that does part of the job, such as:

1. Maintain a classification of subscribers as to their area of specialty and the kind of feedback or reward that has made them willing to process a new text in the past.

2. Perform a word frequency analysis on newly incoming titles in order to determine the area of specialty of which they seem to be a part.

3. Administer the citation (or the entire new source text if there is room) to several subscribers in the proper area of specialty and ask, when they sign on, if they would be willing to process it.

4. Keep track of who is beginning to respond and how fast they seem to go. Put out this request to even more participants if the process is taking too long.

5. While helpful subscribers are retrieving items to see how they relate to what the new text is saying, let them work at a reduced rate. Provide a rebate on later system usage, lottery tickets, names of famous people who are waiting for this information to be added, or whatever incentive is appropriate and indicated by item 1.

6. Provide wordprocessing support and encourage the use of standard layouts and formats that are used throughout the file. Let it be a matter of a few keystrokes to add the new citation to lists of source material that are shown on the bottom of other items.

7. Prod the analyst-editors for general summaries that introduce a particular term or argument as used in the new text, for title lines to identify items in the menus of related items, for connections to other texts even where the author of the source text did not indicate these in the new text or its bibliography, for lists of synonyms, criteria to distinguish between look-alike terms or for anything else that may help in retrieval.

8. Facilitate debate when different specialists disagree, as is done in consensus facilitation in the Delphi process and in computer conferencing. Show them each other's analyses, perhaps in the form of item-maps as well as the items themselves, and ask them to pick the preferred formulation or pinpoint the areas of disagreement. Collect particular third opinions or hold a poll if necessary.

9. Enter items about which there is minimal agreement in a provisional way with indication of the nature and extent of uncertainty or disagreement still pending. Enter the items properly once sufficient agreement has been reached, replace the revised ones and delete the provisional ones.

10. Keep soliciting user feedback. Group comments by the item or relation to which they pertain. Call the matter to the attention of a specialist in the proper area if the proportion of users that leaves a comment is high.

11. Analyze usage frequencies and browsing patterns so as to identify areas where additional information would be most welcome and to spot loops etc. where users are getting lost. Propose clarifications even where users did not have the awareness to leave comments.

12. Analyze specialists' summarizing quality by tabulating the number of adjustments that have to be made later to their work, because some specialists are better at doing their thing than at talking or editing about it.

Not all of these parts have the same priority, but if programs for any of them exist, it would be good to make programs for the other parts compatible and to enable all of them to work from the same data formats. A further question, if different software suppliers can offer some of these parts, is, of course,

(B) What would be a good development strategy for getting the missing parts written, in terms of cost, duration and flexibility? What effort of writing database reformatting programs can be justified to accommodate less compatible programs?
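
The word frequency analysis on incoming titles mentioned under (A) might look roughly like the Python sketch below. It counts how often words from each specialty's vocabulary occur in a title and proposes the specialty with the highest count. The vocabularies in SPECIALTY_TERMS and the function names are assumptions made up for this illustration; a working file would presumably derive its vocabularies from titles that specialists have already processed.

```python
import re
from collections import Counter

# Hypothetical seed vocabularies, invented for this sketch.
SPECIALTY_TERMS = {
    "traffic planning": {"traffic", "road", "transit", "parking", "highway"},
    "water quality management": {"water", "river", "pollution", "sewage", "runoff"},
    "housing": {"housing", "rent", "dwelling", "tenant", "residential"},
}


def tokenize(title):
    """Split a title into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", title.lower())


def score_title(title, specialty_terms=SPECIALTY_TERMS):
    """Count, per specialty, the title words found in that specialty's vocabulary."""
    counts = Counter(tokenize(title))
    return {
        specialty: sum(counts[term] for term in terms)
        for specialty, terms in specialty_terms.items()
    }


def guess_specialty(title, specialty_terms=SPECIALTY_TERMS):
    """Return the best-scoring specialty, or None if no vocabulary word occurs."""
    scores = score_title(title, specialty_terms)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_specialty("Sewage runoff and river water pollution"))
```

Returning None for titles that match no vocabulary leaves room for a person to route the text by hand, rather than forcing every new title into some specialty.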
If you know of (A) any appropriate software, even if only for part of the job, or of (B) any strategies for complex package development, I should like to hear from you. Leave a message before the end of April for Peter Smit on UB, using the attached electronic address. Thank you.

------------------------------

Date: Fri, 4 Apr 86 01:08:43 est
From: EMMA@su-csli.ARPA
To: friends@su-csli.arpa
Subject: Calendar, April 3, No. 10

[Extract - Ed]

C S L I   C A L E N D A R   O F   P U B L I C   E V E N T S
April 3, 1986 Stanford Vol. 1, No. 10

Categories of Correspondence
Brian C. Smith (Briansmith.pa@xerox)
[April 3]

Photographs, sentences, balsa airplane models, images on computer screens, Turing machine quadruples, architectural blueprints, set-theoretic models of meaning and content, maps, parse trees in linguistics, and so on and so forth, are all representations---complex, structured objects that somehow stand for or correspond to some other object or situation (or, if you prefer, are `taken by an interpreter' to stand for or correspond to that represented situation). It is important, in trying to make sense of representation more generally, to identify the ways in which the structure or composition of a representation can be used to signify or indicate what it represents. Strikingly, received theoretical practice has no vocabulary for such relations. On the contrary, standard approaches generally fall into one of two camps: those (like model-theory, abstract data types, and category theory) that identify two objects when they are roughly isomorphic, and those (like formal semantics) that take the ``designation'' relation---presumably a specific kind of representation---to be strictly non-transitive. The latter view is manifested, for example, in the strict hierarchies of meta-languages, the notion of a ``use/mention'' confusion, etc.
Unfortunately, the first of these approaches is too coarse-grained for our purposes, ignoring many representational details important for computation and comprehension, while the latter is untenably rigid---far too strict to cope with representational practice. A photographic copy of a photograph of a sailboat, for example, can sometimes serve perfectly well as a photo of the sailboat. Similarly, it would be pedantic to deny, on the grounds of use/mention hygiene, that the visual representation `12' on a computer screen `must not be taken to represent a number,' but rather viewed as representing a data structure that in turn represents a number. And yet there are clearly times when the latter reading is to be preferred. In practice, representational relations, from the simplest to the most complex, can sometimes be composed, sometimes not. How does this all work? Our approach starts very simply, identifying the structural relations that obtain between two domains when objects of one are used to correspond to objects of the other. For example, we call a representation `iconic' when its objects, properties, and relations correspond, respectively, to objects, properties, and relations in the represented domain. Similarly, a representation is said to `absorb' anything that represents itself. Thus the grammar rule `EXP -> OP(EXP1,EXP2)', for a formal language of arithmetic, absorbs left-to-right adjacency; model-theoretic accounts of truth typically absorb negation; etc. A representation is said to `reify' any property or relation that it represents with an object. Thus first-order logic reifies the predicates in the semantic domain, since they are represented by (instances of) objects---i.e., predicate letters---in the representation. A representation is called `polar' when it represents a presence by an absence, or vice versa, as for example when the presence of a room key at the hotel desk is taken to signify the client's absence. 
By developing and extending a typology of this sort, we aim to categorize representation relations of a wide variety, and to understand their composition, their use in inference and computation.

------------------------------

END OF IRList Digest
********************