IRList Digest            Saturday, 27 August 1988      Volume 4 : Issue 45

Today's Topics:
   Query - NY Times Collection
   Abstracts - Dissertations selected by S. Humphrey [Part 2 of 5]

News addresses are
   Internet: fox@fox.cs.vt.edu or fox%fox.cs.vt.edu@dcssvx.cc.vt.edu
   BITNET: foxea@vtcc1.bitnet (replaces foxea@vtvax3)

----------------------------------------------------------------------

Date: Tue, 2 Aug 88 16:52 EDT
From: krovetz@UMass
Subject: NY Times test collection

Ed,
     Bruce Croft told me you might know about a test collection of articles from the NY Times. He said it was pretty small. Could it be sent in a mail message?
                                        Thanks,
                                        Bob

[Note: There is an old collection used in some of the SMART studies that might be available from Cornell - this is the only one other than the big collections (probably without queries) of NY Times documents made at several locations. Let me know if you turn up something and we can put it on Virginia Disc One (which is now almost ready!) - Ed]

------------------------------

Date: Wed, 3 Aug 88 13:36:58 EDT
From: "Susanne M. HUMPHREY"
Subject: dissertation abstracts

[Note: Part 2 of 5 - Ed.]

.[
AN University Microfilms Order Number ADG87-29523.
AU YU, JUNG-PIN.
IN The University of Iowa Ph.D 1987, 203 pages.
TI LIBRARY SEARCHES OF INFRARED SPECTRA AND MULTICOMPONENT ANALYSIS.
DE Chemistry, Analytical.
AB Odd moments of the cross-correlation function and average wavenumber methods are used for library searches. The absorbances of a spectrum are represented by 24 values if the spectrum is divided into twelve regions. The indirect spectral data base is split into twelve subsets according to the spectral regions with the most intense integrated intensities. Only a few subsets are used in a search. Several versions of integrated intensity filters are applied to pre-select the library entries.
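The pre-filtering scheme described above lends itself to a compact sketch. The following is a hypothetical modern illustration, not the author's code (the function names, the toy 24-point spectra, and the use of a plain Pearson correlation in place of the odd-moment and average-wavenumber measures are all invented for illustration): each library spectrum is filed under its most intense region, a query consults only the matching subset, and the surviving candidates are ranked by correlation coefficient.

```python
import numpy as np

N_REGIONS = 12  # the abstract divides each spectrum into twelve regions


def most_intense_region(spectrum):
    """Index of the region with the largest integrated intensity."""
    regions = np.array_split(np.asarray(spectrum), N_REGIONS)
    return int(np.argmax([r.sum() for r in regions]))


def build_index(library):
    """Partition library entries into subsets keyed by dominant region."""
    index = {}
    for name, spec in library.items():
        index.setdefault(most_intense_region(spec), []).append(name)
    return index


def search(query, library, index, top=5):
    """Pre-filter by the query's dominant region, then rank the surviving
    candidates by Pearson correlation coefficient with the query."""
    candidates = index.get(most_intense_region(query), [])
    scored = [(np.corrcoef(query, library[n])[0, 1], n) for n in candidates]
    return sorted(scored, reverse=True)[:top]
```

The 0.9000 correlation threshold mentioned in the abstract could be applied as a final cut on the returned scores.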
The best method of this work presently has about an 84% chance of locating an experimental spectrum among the best five matches if this spectrum is in the library and the correlation coefficient between them is not less than 0.9000. The average search time per spectrum is approximately 30 seconds with a Leading Edge Model D computer equipped with an 8087 co-processor. The search results by the correlation coefficient method are compared, and the effects of baseline shift, noise, and frequency shift are also investigated.

A factor analysis of n-component spectra generates n orthogonal spectra. Proper combinations of any two of the abstract spectra can eliminate one component and yield new spectra with n - 1 components. The orthogonal spectra obtained by a factor analysis of the newly generated spectra have n - 1 components each. Iterative subtraction processes finally resolve all pure component spectra. A minimum entropy method and a minimum pathlength method are used to evaluate the component elimination procedures. The relative component fractions of a sample with indeterminate pathlength can be determined by using normalized pure component spectra as references. The relative component fractions of a mixture are constant for each component across all samples. Q matrix and T matrix methods are investigated. The singular value decomposition method is found to be useful in spectral reconstruction.
.]

.[
AN This item is not available from University Microfilms International ADG05-62008.
AU CASAS-RAPOSO, IGNACIO ANTONIO.
IN University of Toronto (Canada) Ph.D 1986.
TI PROPHET: A LAYERED ANALYTICAL MODEL FOR PERFORMANCE PREDICTION OF DATABASE SYSTEMS.
DE Computer Science.
AB The efficiency and performance of database systems are substantially affected by many decisions which are required throughout the development stages. Unfortunately, the impact on performance of many design alternatives is not yet well understood.
In this thesis we propose an analytical performance model of database systems that provides a means of quantitative comparison of design alternatives. Four levels of abstraction are distinguished for design choices, models of representation, and related performance measures. Queuing network models are used at the bottom, most concrete level. Several subproblems within the layered framework are studied in detail, including schema and internal access plan models and a performance model for database buffer management. We also extend and refine several existing modeling techniques, such as schema-to-internal database mappings and models of internal database structures. All the proposed submodels are integrated within a generalized framework for database performance analysis, named PROPHET, with detailed models of computation at the four layers of abstraction. The applicability and feasibility of the framework are illustrated with applications to several present-day commercial DBMSs, including IMS, MISTRESS and SYSTEM 2000. We also describe the design and implementation of a specialization of our generalized framework to SYSTEM 2000 environments. We report preliminary results of the validation of the SYSTEM 2000 performance predictor against data measured in an actual operational environment.
.]

.[
AN University Microfilms Order Number ADG88-00136.
AU CHIN, YU-TUNG.
IN University of Rhode Island Ph.D 1987, 194 pages.
TI SYSTEM DESIGN AND IMPLEMENTATION OF A COMPUTERIZED RHODE ISLAND WATER RESOURCES INFORMATION SYSTEM.
DE Computer Science.
AB Water resources data from various federal and state agencies of Rhode Island were collected, analyzed and reorganized. The nature of the data was studied. A conceptual model representing the entities and relationships among the data was derived by means of a semantic data model. The conceptual model was then mapped into a logical file structure represented by a pseudo Data Definition Language.
A distributed configuration was chosen to implement the database in order to achieve the desired functional performance and cost effectiveness. Finally, the logical structure was realized by appropriate Database Management Systems on mainframe and personal computers. Programs were developed to support data entry as well as to maintain the integrity and security of the database. Applications of the system in water resources decision-making and management were demonstrated by examples.
.]

.[
AN This item is not available from University Microfilms International ADG05-62262.
AU FALOUTSOS, CHRISTOS NICK.
IN University of Toronto (Canada) Ph.D 1986.
TI SIGNATURE FILES: AN ACCESS METHOD FOR TEXTUAL MESSAGES.
DE Computer Science.
AB Signature files seem to be a promising method for text retrieval and message retrieval. According to this method, the messages are stored sequentially in one file (the "text file") while abstractions of the messages ("signatures") are stored sequentially in another file (the "signature file"). In order to resolve a query, the signature file is scanned first and many non-qualifying messages are immediately rejected. Those non-qualifying messages that pass the signature test are called false drops and are undesirable. Careful design of the signature extraction method and an appropriate signature size can restrict the average number of false drops. The screening capacity of some signature extraction methods is examined analytically. New methods are proposed and their performance is compared with that of one of the older methods. Finally, the relationship between the false drop probability and the information loss during signature extraction is investigated.
.]

.[
AN This item is not available from University Microfilms International ADG05-61631.
AU ICAZA, JOSE IGNACIO.
IN University of Waterloo (Canada) Ph.D 1987.
TI ADAPTIVE SELECTION OF QUERY PROCESSING STRATEGIES.
DE Computer Science.
AB Approximate cost models that are traditionally needed to evaluate query execution strategies are sometimes unavailable or too unreliable for particular environments. Our new approach for strategy selection is based on feeding back the actual costs of query execution under different strategies, rather than on assumption-laden estimates of these costs. We propose three different methods for using these feedback costs, under a common general framework for adaptive systems. In optimal selection we define the problem of designing an optimal decision policy for a single query class, where the policy specifies which strategy to execute for each query. This problem is mapped to some versions of the bandit problem in statistics, which can be solved using Bayesian statistics and dynamic programming. We present and analyze a program that derives the optimal policy. In approximate selection we use a learning automaton to select strategies. The learning algorithm by Thathachar and Sastry is modified to handle the dynamic environments of databases. This adaptive algorithm is tested using existing databases and query loads, showing how this approach determines the best execution strategies over time and changes which strategies are selected as the environment changes. Finally, the method of query class partitioning considers the problem of evolving optimal query partitionings, in which each partition includes all queries that have a common optimal strategy. We present a novel adaptive classification scheme that uses several learning automata.
.]

.[
AN University Microfilms Order Number ADG88-04134.
AU KEARNS, TIMOTHY G.
IN Air Force Institute of Technology Ph.D 1987, 351 pages.
TI A METHODOLOGY, BASED ON ANALYTICAL MODELING, FOR THE DESIGN OF PARALLEL AND DISTRIBUTED ARCHITECTURES FOR RELATIONAL DATABASE QUERY PROCESSORS.
DE Computer Science.
AB The design of faster relational database query processors to improve the data retrieval capability of a database was the goal of this research. The emphasis was on evaluating the potential of parallel implementations to allow the use of multiprocessing. First, the theoretical aspects of applying relational operations to distributed data were considered, to provide an underlying model of data distribution and the parallel processing environment. Next, analytical models were constructed to evaluate various implementations of the select, project, and join relational operations and the update operations of addition, deletion, and modification, for a range of data structures and architectural configurations. To predict the performance of the query processor in all cases, the individual operator models needed to be extended to models for complex queries consisting of several relational operations. The solution to modeling multi-step queries was the use of a general form to express a query. This normal form query tree uses combined operations to express relational-algebra-equivalent queries in a standard form. This standard tree form was then used to construct analytical models for multi-step queries. These models provide the capability to model the potential of different forms of parallelism in solving complex queries. The results of the analytical models lead to a logical design for a multiprocessor query processor. When the analytical model results for complex queries are compared to benchmarks of some current database systems, this logical query processor, using multiple processors and employing parallelism, illustrates the potential for an improved query processor based on parallel processing.
.]

.[
AN University Microfilms Order Number ADG88-01732.
AU LUCCI, STEPHEN.
IN City University of New York Ph.D 1987, 241 pages.
TI STRING MATCHING: A COMPARATIVE STUDY OF ALGORITHMS, AND ITS RELATION TO PROBLEMS OF PARALLEL AND DISTRIBUTED COMPUTING.
DE Computer Science.
AB Our purpose in this work is to provide a comparative study of string matching algorithms and their relationship to problems in parallel and distributed computing. We begin our discussion in chapters one and two with the Knuth-Morris-Pratt and Boyer-Moore algorithms. These optimal serial approaches to string matching provide a reference point against which their parallel counterparts in chapters three through five may be judged. Galil's parallel algorithm in chapter three searches for prefixes of the pattern of increasing length which occur within the text string. Vishkin's algorithm, described in chapter four, employs the concept of duels between various text positions to obtain a sparse set of indices within the text at which the pattern might begin. In chapter five the Rabin-Karp algorithm is outlined. This approach uses hashing functions called fingerprints, rather than character comparisons, as its basic operation. It is a serial algorithm which, unlike other serial approaches, is readily parallelizable. In chapter six we outline an architecture for a parallel processor for Galil's algorithm. The "parallel string processor" (PSP) is an SIMD computer with 2^16 PEs and a perfect shuffle interconnection network. In the process of specifying our design we consider sorting algorithms, connection networks, and various paradigms for data broadcasting. Hopefully, this work foreshadows the rapidly unfolding technological breakthroughs that will serve as a bridge to fifth generation systems.
.]

.[
AN This item is not available from University Microfilms International ADG03-75884.
AU SHEU, CHEN-YU.
IN University of California, Berkeley Ph.D 1986.
TI QUERY PROCESSING AND COMMUNICATION CONTROL IN OBJECT BASES AND DISTRIBUTED OBJECT BASES.
DE Computer Science.
.]

.[
AN This item is not available from University Microfilms International ADG05-62253.
AU TUORI, MARTIN IAN.
IN University of Toronto (Canada) Ph.D 1987.
TI A FRAMEWORK FOR BROWSING IN THE RELATIONAL DATA MODEL.
DE Computer Science.
AB This thesis addresses problems in HCI (Human-Computer Interaction) with information systems, especially database management systems (DBMSs). Inexperienced or occasional users are often unable or unwilling to cope with the linguistic, structural and intentional demands of traditional interfaces, especially high-level query languages. The primal form of interaction is looking, or searching. A four-part taxonomy of searching is given (intention, structure, language and modality) and used as a descriptive basis for a review of relevant HCI literature and interface approaches. The most important form of searching for casual use is browsing, defined here as searching without an explicit goal or intention. A three-part framework is presented for the design and evaluation of browsing interfaces: conceptual models of the interface, design guidelines, and evaluation criteria. A closed-loop model of the user interface is presented, along with a novel categorization of relational database interfaces by the way a user may approach the information (data-model-oriented, object-oriented, and domain-oriented) and by structural level (system, database, table and tuple). Two selection methods for domain-oriented, table-level browsing are presented: multidimensional range query and point of view using nearest neighbours. Their strengths and weaknesses are examined, and the nearest-neighbours approach is developed in detail. Problems arising from this method are examined, solutions are offered, and an implementation supporting both methods is presented and discussed. The descriptive taxonomy and development framework presented in this thesis offer a basis for understanding computer-assisted searching and for assessing the contributions of various interface approaches.
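The point-of-view selection method just described can be made concrete with a small sketch. This is a hypothetical illustration of the idea only; the table, attribute names, city-block distance, and range normalization below are invented for illustration and are not taken from the thesis:

```python
def nearest_neighbours(rows, point_of_view, attrs, k=3):
    """Rank table rows by city-block distance from a point-of-view tuple,
    normalizing each attribute by its range so no one column dominates."""
    spans = {a: (max(r[a] for r in rows) - min(r[a] for r in rows)) or 1.0
             for a in attrs}

    def dist(row):
        return sum(abs(row[a] - point_of_view[a]) / spans[a] for a in attrs)

    return sorted(rows, key=dist)[:k]


# Hypothetical table of apartments browsed by price and area.
apartments = [
    {"id": 1, "price": 400, "area": 50},
    {"id": 2, "price": 900, "area": 120},
    {"id": 3, "price": 450, "area": 55},
    {"id": 4, "price": 700, "area": 90},
]
view = {"price": 430, "area": 52}
closest = nearest_neighbours(apartments, view, ["price", "area"], k=2)
# closest holds the rows with id 3 and id 1, in that order
```

Range normalization is one simple way to keep an attribute with large units (price) from swamping one with small units (area); as the abstract notes, the thesis examines the problems arising from this method and offers solutions.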
Previous literature on HCI is based on the premise of a well-defined intention; this premise must be rejected in favour of a model of HCI in which the user's intention is variable. Computerized information systems capable of supporting a wide range of users will result from the integration of many interface approaches; browsing offers a useful approach for initial, exploratory search, especially by inexperienced or occasional users.
.]

.[
AN University Microfilms Order Number ADG88-04620.
AU TURBYFILL, CAROLYN.
IN Cornell University Ph.D 1988, 286 pages.
TI COMPARATIVE BENCHMARKING OF RELATIONAL DATABASE SYSTEMS.
DE Computer Science.
AB New and diverse architectures have been developed to meet rising expectations of both functionality and performance from relational database systems. A comprehensive, portable, comparative benchmark is needed to evaluate the performance tradeoffs inherent in these different architectures. In this thesis, we develop an experimental framework for systematically evaluating and comparing relational database systems across diverse architectures. We describe and motivate the design of a scalable, portable benchmark for relational database systems, the AS3AP benchmark (ANSI SQL Standard Scalable And Portable). The AS3AP benchmark is designed to provide meaningful measures of database processing power and to be a useful tool for system designers. We introduce a new performance metric, the equivalent database ratio, to be used in comparing systems. The equivalent database ratio is the ratio of database sizes on two different systems for which equivalent performance on a set of test queries is obtained. The benchmark consists of two parts: tests of the access methods and building blocks, and tests of the optimizer. In the tests of the access methods and building blocks, access methods and functions common to implementations of relational database systems are tested.
The relations and queries used in these tests are designed to increase the likelihood that the access method or program branch targeted by the test will actually be chosen by the query optimizer. The tests of the access methods and building blocks are used to compute the equivalent database ratio. In the tests of the optimizer, assumptions typically made by query optimizers, such as the uniform distribution of data values or the random placement of tuples, are systematically violated. Comparisons between queries are used to indicate whether the optimizer has made a correct choice. Queries are included to illustrate current limitations in the state of the art in query optimization. For a representative cross section of access methods, estimates of the best-case response time for the benchmark queries on disk-bound systems are computed. The response time estimates and the systematic benchmarking methodology provide for clear interpretation of benchmark results.
.]

[Note: continued in next issue - Ed]

------------------------------

END OF IRList Digest
********************