%A J.T. Hsieh
%A A.R. Pleszkun
%A J.R. Goodman
%T Performance evaluation of the PIPE computer architecture
%R Technical Report #?
%I Computer Sciences Department, University of Wisconsin at Madison
%D November 1984

%A R. Agrawal
%A D.J. DeWitt
%T Whither hundreds of processors in a database machine?
%R Technical Report #?
%I Computer Sciences Department, University of Wisconsin at Madison
%D January 1984

%A H. Boral
%A D.J. DeWitt
%T A methodology for database system performance evaluation
%R Technical Report #?
%I Computer Sciences Department, University of Wisconsin at Madison
%D January 1984

%A R. Agrawal
%A D.J. DeWitt
%T Recovery architectures for multiprocessor database machines
%R Technical Report #573
%I Computer Sciences Department, University of Wisconsin at Madison
%D January 1985

%A B.P. Miller
%T Parallelism in distributed programs: measurement and prediction
%R Technical Report #574
%I Computer Sciences Department, University of Wisconsin at Madison
%D January 1985

%A J.R. Goodman
%A H.C. Young
%T Code scheduling methods for some architectural features in PIPE
%R Technical Report #579
%I Computer Sciences Department, University of Wisconsin at Madison
%D February 1985

%A J.R. Goodman
%T Cache memory optimization to reduce processor/memory traffic
%R Technical Report #580
%I Computer Sciences Department, University of Wisconsin at Madison
%D February 1985

%A H-T. Chou
%A D.J. DeWitt
%T An evaluation of buffer management strategies
for relational database systems
%R Technical Report #584
%I Computer Sciences Department, University of Wisconsin at Madison
%D February 1985

%A M.J. Carey
%A D.J. DeWitt
%T Extensible database systems
%R Technical Report #585
%I Computer Sciences Department, University of Wisconsin at Madison
%D February 1985

%A M.J. Carey
%A H. Lu
%T Some experimental results on distributed join algorithms
in a local network
%R Technical Report #587
%I Computer Sciences Department, University of Wisconsin at Madison
%D March 1985

%A J.T. Hsieh
%A A.R. Pleszkun
%A M.K. Vernon
%T Performance evaluation of a pipelined VLSI architecture
using the graph model of behaviour (GMB)
%R Technical Report #589
%I Computer Sciences Department, University of Wisconsin at Madison
%D March 1985

%A R. Agrawal
%A M.J. Carey
%A L.W. McVoy
%T The performance of alternative strategies of dealing with deadlocks
in database management systems
%R Technical Report #590
%I Computer Sciences Department, University of Wisconsin at Madison
%D March 1985

%A B.P. Miller
%T A measurement system for distributed programs
%R Technical Report #592
%I Computer Sciences Department, University of Wisconsin at Madison
%D April 1985

%A M.A. Holliday
%A M.K. Vernon
%T A generalized timed Petri net model for performance analysis
%R Technical Report #593
%I Computer Sciences Department, University of Wisconsin at Madison
%D May 1985

%A M.A. Holliday
%A M.K. Vernon
%T Exact performance estimates
for multiprocessor memory and bus interference
%R Technical Report #594
%I Computer Sciences Department, University of Wisconsin at Madison
%D May 1985

%A R.R. Meyer
%T Network optimization
%R Technical Report #595
%I Computer Sciences Department, University of Wisconsin at Madison
%D May 1985

%A M.L. Scott
%T Design and implementation of a distributed systems language
%R Technical Report #596
%I Computer Sciences Department, University of Wisconsin at Madison
%D May 1985

%A H-T. Chou
%T Buffer management of database systems
%R Technical Report #597
%I Computer Sciences Department, University of Wisconsin at Madison
%D May 1985

%A Honesty Cheng Young
%T Evaluation of a decoupled computer architecture
and the design of a vector extension
%R Technical Report #603
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1985
%X Abstract: To meet increasing demands for computing power, systems must
exploit parallelism at all levels. A balance between the processing speeds
of a machine's subsystems is critical for overall system performance. A
high performance system must have a fast scalar mode as well as a vector
mode which reduces the fraction of work that the scalar mode must process.
PIPE (Parallel Instructions and Pipelined Execution) is a very high
performance computer architecture intended for a heavily pipelined VLSI
implementation. The primary goal of the PIPE architecture is fast
execution of scalar programs. PIPE has several novel features in order to
meet this goal. These features include architectural queues
which reduce the effective memory accessing delay; a prepare-to-branch
instruction which decreases the penalty incurred by branches; and
decoupled execution mode which increases the instruction initiation rate.
In this study, a compiler (for a subset of Pascal) has been developed to
evaluate the effectiveness of these architectural features. A code
scheduler takes advantage of the architectural queues and the
prepare-to-branch instruction. Software pipelining (which can prefetch
operands across loop boundaries) also takes advantage of these features.
A code generator generates two instruction streams (of sequential tasks),
that run on the two processors required for PIPE's decoupled mode.
A queue-based vector extension is proposed. This extension has the
following characteristics: (1) two level control used to support
out-of-order instruction initiation, (2) multiple classes of registers, (3)
a very short vector start-up time (one clock period), (4) branch
instructions used only for implementing high level language control
structures, (5) elements of a queue may be read repeatedly, and (6) easily
vectorizable property.
We demonstrate that the original PIPE architecture supports fast execution
of scalar programs, and that the vector extension facilitates
vectorization.

%A Carl deBoor
%A Klaus Hollig
%A Sherman Riemenschneider
%T Convergence of cardinal series
%R Technical Report #604
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1985
%X Abstract: The result of this paper is a generalization of our
characterization of the limits of multivariate cardinal splines.
Let $~~M sub n~~$
denote the $n$-fold convolution of a compactly supported function
 $~~M~ epsilon ~L sub 2 (R sup d )~~$ and denote by
.EQ
S sub n~:=~"{"sum from {j epsilon Z sup d}~c(j)M sub n ( cdot ~-~j)~:~c~ epsilon
 ~l sub 2 (Z sup d )"}"
.EN
the span of the translates of $~~M sub n$. We prove that there exists a set
 $~~OMEGA~~$ with $~~vol sub d ( OMEGA )~=~(2 pi ) sup d~~$ such that for any
 $~~f epsilon ~L sub 2 (R sup d ),$
.EQ
dist(f, S sub n )~->~0~~as~~n~->~ inf ,
.EN
if and only if the support of the Fourier transform of $f$ is contained in
 $~~OMEGA bar$.

%A Tobin J. Lehman
%A Michael J. Carey
%T A study of index structures for main memory database management systems
%R Technical Report #605
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1985
%X Abstract: One approach to achieving high performance in a database
management system is to store the database in main memory rather than on disk.
One can then design new data structures and algorithms oriented towards making
efficient use of CPU cycles and memory space rather than minimizing disk
accesses and using disk space efficiently. In this paper we present some
results on index structures from an ongoing study of main memory database
management systems. We propose a new index structure, the T-tree, and we
compare it to existing index structures in a main memory database environment.
Our results indicate that the T-tree provides good overall performance in main
memory.

%A O. L. Mangasarian
%A T.-H. Shiau
%T Error bounds for monotone linear complementarity problems
%R Technical Report #606
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1985
%X Abstract: We give a bound on the distance between an arbitrary point and
the solution set of a monotone linear complementarity problem in terms of a
condition constant which depends on the problem data only and a residual
function of the violations of the complementarity problem conditions by the
point considered. When the point satisfies the linear inequalities of the
complementarity problem, the residual consists of the complementarity
condition $~~x(Mx~+~q)~~$ plus its square root:$~~(x(Mx~+~q)) sup half $.
This latter term is essential and without which the error bound cannot hold.
We also show that another natural residual that has been employed to bound
errors for strictly monotone linear complementarity problems, fails to bound
errors for the monotone case considered here.

%A Pudukkottai K. Subramanian
%T Iterative methods of solution for complementarity problems
%R Technical Report #607
%I Computer Sciences Department, University of Wisconsin at Madison
%D June 1985
%X Abstract: Many problems in optimization theory such as linear programming,
quadratic programming and problems arising in diverse areas such as
economic equilibria, electronic circuit simulation and free boundary
problems can be formulated as complementarity problems. It is well known
that where the matrices involved are large sparse matrices, the usual
pivoting methods are not very efficient and sometimes even fail. This
thesis is a contribution to the ongoing research in the area of iterative
methods for the solution of linear and nonlinear complementarity problems.
We begin by considering complementarity problems where the operators are
monotone and consider their Tihonov regularizations. We obtain bounds for
the solutions of the perturbed problems and in particular, estimates for
the rate of growth to these solutions. In the case of linear
complementarity problems with positive semidefinite matrices, these results
reduce the solution of the LCP to the solution of a sequence of positive
definite symmetric quadratic programs. We propose SOR (successive
overrelaxation) algorithms to solve these subproblems.
In the case of complementarity problems with nonempty interior, we show
that given any preassigned feasibility tolerance our algorithm solves the
problem by solving a Tihonov regularization problem for a single value of
the parameter.
We consider monotone complementarity problems as fixed point problems. We
prove convergence of iterative algorithms which find fixed points of
carefully constructed projection maps using summability methods.
Since the solution of the nonlinear complementarity problem is equivalent
to the solution of a system of nonlinear equations, we develop damped
Gauss-Newton methods which under appropriate hypotheses solve this system
with a local quadratic convergence rate. We present computational results
which show that the use of SOR methods in conjunction with the Gauss-Newton
methods is computationally effective.

%A David P. Anderson
%A Lawrence H. Landweber
%T A grammar based methodology for protocol specification and implementation
%R Technical Report #608
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1985
%X Abstract: A new methodology for specifying and implementing communication
protocols is presented. This methodology is based on a formalism called
"Real-Time Asynchronous Grammars" (RTAG), which uses a syntax similar to that
of attribute grammars to specify allowable message sequences. In addition
RTAG provides mechanisms for specifying data-dependent protocol activities,
real-time constraints, and concurrent activities within a
protocol entity. RTAG encourages a top-down approach to protocol design
that can be of significant benefit in expressing and reasoning about highly
complex protocols. As an example, an RTAG specification is given for part
of the Class 4 ISO Transport Protocol (TP-4).
Because RTAG allows protocols to be specified at a highly detailed level,
major parts of an implementation can be automatically generated from a
specification. An RTAG parser
can be written which, when combined with an RTAG specification of a protocol
and a set of interface and utility routines, constitutes an implementation
of the protocol. To demonstrate the viability of RTAG for implementation
generation, an RTAG parser has
been integrated into the kernel of the 4.2 BSD Unix operating system, and
has been used in conjunction with the RTAG TP-4 specification to obtain an
RTAG-based TP-4 implementation in the DoD Internet domain.

%A Seymour V Parter
%T On the distribution of the singular values of Toeplitz matrices
%R Technical Report #609
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1985
%X Abstract: In 1920, G. Szego proved a basic result concerning the
distribution of the eigenvalues $"{"lambda sub j sup (n) "}"~~$
of the Toeplitz sections $~~T sub n [f]~~$ where
 $f( THETA )~ epsilon ~L sub inf (- pi ,~ pi )~~$
is a real-valued function. Simple examples show that
this result cannot hold in the case where $~~f( THETA )~~$ is not real valued.
In this note, we give an extension of this theorem for the singular values
of $T sub n [f]~~$ when $~~f( THETA )~=~f sub 0 ( THETA ) R sub 0 (
THETA )~~$
with $~~f sub 0 ( THETA )~~$ real-valued and $R sub 0 ( THETA )~~$
continuous, periodic (with period $2 pi )~$ and $~~|R sub 0 ( THETA )|~=~1$.
In addition, we apply the basic theorem of Szego to resolve a question of
C. Moler.

%A Seymour V. Parter
%T On an estimate for the three-grid MGR multigrid method
%R Technical Report #610
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1985
%X Abstract: The MGR $[ nu ]$ algorithm of Ries, Trottenberg and Winter,
Algorithm 2.1 of Braess and Algorithm 4.1 of Verfurth are all algorithms for the
numerical solution of the discrete Poisson equation based on red-black
Gauss-Seidel smoothing iterations. In this work we consider the extension
of the MGR[0] method to the general diffusion equation $~~-\(gr cdot p
\(gr u~=~f$.
In particular, for the three grid scheme we extend an interesting and
important result of Ries, Trottenberg and Winter whose results are based on
Fourier analysis and hence intrinsically limited to the case where
 $~~OMEGA~~$ is a rectangle. Let $~~OMEGA~~$ be a general polygonal domain
whose sides have slope $~~+- 1~,~~0~~$ and $~~inf .~~$ Let $~~epsilon sup 0~~$
be the error before a single multigrid cycle and let $~~epsilon sup 1~~$
be the error after this cycle.
Then $|| epsilon sup 1 || sub L sub h ~<= ~ 1 over 2 (1+Kh)|| epsilon sup
0 || sub L sub h~~$ where $||~~|| sub L sub h~~$ denotes the
energy or operator norm. When $~~p(x,y)~==~$ constant, then $~~K~==~0$.

%A Aaron Jonathan Gordon
%T Ordering errors in distributed programs
%R Technical Report #611
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1985
%X Abstract: With processors becoming small and inexpensive, many researchers
attempt to decrease program runtime by combining processors into a
multicomputer and running programs distributed over these processors.
Debugging in a distributed environment is different from debugging on a
uniprocessor. On a uniprocessor, the order in which a process's events
occur is deterministic. In a distributed environment events occur
concurrently on different processors. The order in which events occur
cannot be easily determined; a program that works correctly one time may
fail subsequently if the timing between processors changes. Traditional
methods of debugging (such as putting in print statements and recompiling
the program or recompiling the program with a debug flag on) are
inadequate since they change the program and therefore change the timing.
For this research, I have investigated distributed program bugs that depend
on the relative order between events. These ordering errors
include events which always occur in the wrong order and events whose order
of occurrence is time-dependent. In this research, I characterize these
timing errors and show necessary conditions for their occurrence. Using
my model of a distributed system, I prove which features can be used in
suggestions to those writing distributed programs, developing distributed
programming languages and designing distributed operating systems. I then
explain drawbacks to preventing ordering errors and show ways to detect
them as they occur. Finally, I describe a tool (called TAP) to aid the
programmer in discovering the causes of ordering errors in running
programs. I also show that TAP is useful in finding other types of
distributed program bugs.

%A David P. Anderson
%T A grammar-based methodology for protocol specification and implementation
%R Technical Report #612
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1985
%X Abstract: A new methodology for specifying and implementing communication
protocols is presented. This methodology is based on a formalism called
"Real-Time Asynchronous Grammars" (RTAG), which uses a syntax similar to
that of attribute grammars to specify allowable message sequences. In
addition RTAG provides mechanisms for specifying data-dependent protocol
activities, real-time constraints, and concurrent activities within a
protocol entity. RTAG encourages a top-down approach to protocol design
that can be of significant benefit in expressing and reasoning about highly
complex protocols. As an example, an RTAG specification is given for part
of the Class 4 NBS Transport Protocol (TP-4).
Because RTAG allows protocols to be specified at a highly detailed level,
major parts of an implementation can be automatically generated from a
specification. An RTAG parser can be written which, when combined with an
RTAG specification of a protocol and a set of interface and utility
routines, constitutes an implementation of the protocol. To demonstrate
the viability of RTAG for implementation generation, an RTAG parser has
been integrated into the kernel of the 4.2 BSD Unix operating system, and
has been used in conjunction with the RTAG T-4 specification to obtain a
working TP-4 implementation in the DOD Internet environment.

%A Barton P. Miller
%A Cui-qing Yang
%T Measuring distributed programs: a hierarchical and interactive approach
%R Technical Report #613
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1985
%X Abstract: We have designed an interactive tool, called IPS, for
performance measurement and analysis of distributed programs. IPS is based
on a hierarchical model of distributed computation which maps the program's
behavior to different levels of abstraction. The hierarchical model unifies
performance data from the whole program level down to procedure and
statement level. Users are able to maneuver through the hierarchy and
analyze the program measurement results at various levels of details. IPS
allows the programmer to interactively evaluate the performance history of
a distributed program. Because of the regular structure of the hierarchy,
IPS can provide information to guide the programmer to the cause of
performance bottlenecks. Critical path analysis is used, in conjunction
with simple performance metrics to direct the programmer in identifying
bottlenecks.

%A Yaoshuang Qu
%A Lawrence H. Landweber
%A Miron Livny
%T Performance modeling and optimization of PaRing
%R Technical Report #614
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1985
%X Abstract: This paper introduces the PaRing, a new token ring architecture,
and studies analytical approaches for modeling and optimizing its
performance. The PaRing is distinguished from earlier ring architectures
by its ability to support concurrent multiple communication paths on a
single loop in an efficient and fair manner. A multiple server queueing
network model, in which a communication path is viewed as being served by a
server, is used to determine packet waiting time for symmetric traffic.
The key to the modeling is to convert the multiple system into multiple
independent and identical single server systems. The model is verified by
simulation. The performance of the PaRing depends on the number of
concurrent paths. An optimization method is developed to maximize the
number of concurrent paths.

%A Klaus Hollig
%A Charles A. Micchelli
%T Divided differences, hyperbolic equations and lifting distributions
%R Technical Report #615
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1985
%X Abstract: We discuss the relationship between divided differences,
fundamental function of hyperbolic equations, multivariate interpolation
and polyhedral splines.

%A Yaoshuang Qu
%A Lawrence H. Landweber
%A Miron Livny
%T PaRing: a token ring local area network with concurrency
%R Technical Report #616
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1985
%X Abstract: In this paper we introduce the PaRing, a token ring
architecture. The PaRing ring is distinguished from earlier ring
architectures by its ability to concurrently support multiple communication
paths on a single loop in an efficient and fair manner. This is
accomplished by allowing more than one station to transmit a packet at a
time. The result is higher throughput and shorter response time than is
the case for the standard token ring or the register insertion ring.

%A Koujuch Liou
%T Design of pipelined memory systems for decoupled architectures
%R Technical Report #617
%I Computer Sciences Department, University of Wisconsin at Madison
%D October 1985
%X Abstract: Design of memory systems for decoupled architectures is the
theme of this dissertation. The decoupled architecture uses hardware
queues to architecturally decouple memory request generation from
algorithmic computation. This results in an implementation that has two
separate instruction streams that communicate via hardware queues. Thus,
performance is improved through parallelism and efficient memory
referencing.
Techniques for increasing memory bandwidth and algorithms for servicing
memory requests are incorporated into the memory system designs within
these two constraints: (1) the operands placed in the hardware queue must
be in the correct order, and (2) the needed operands are the only operands
that can be placed in the hardware queue.
Techniques such as pipelining, interleaving, servicing requests out of
arrival order, and cache memory are investigated. Two strategies for
servicing memory requests are studied: (1) to service requests according to
their priorities, and (2) to minimize the total request service time. For
the first strategy, the priority of each request type is derived from the
characteristics of memory reference and possible bottleneck during
decoupled computations. The second strategy results in a request
scheduling policy, Free-Module-Request-First,
that is proven to be able to minimize the total request service time.
A sequence control scheme must be used with the Free-Module-Request-First
scheduling policy in order to deliver the memory outputs to the hardware
queue in the correct order. This sequence control scheme is also used to
track cache hits and misses, so that a data cache can be implemented in the
memory system without difficulty.
The designed data cache can not only support flexible fetch and replacement
cache algorithms, it can also detect memory access hazards and
short-circuit the Read-After-Write requests. Therefore, the penalty of
memory access hazards can be greatly reduced.
The combination of the designed data cache and the pipelined interleaved
memory system using Free-Module-Request-First scheduling policy results in
a high-performance memory system, that is capable of servicing memory
nearly no conflict delay under the particular workload defined in the
trace files.

%A Mary K. Vernon
%A Mark A. Holliday
%T Performance analysis of multiprocessor cache consistency protocols
using generalized timed Petri nets
%R Technical Report #618
%I Computer Sciences Department, University of Wisconsin at Madison
%D November 1985
%X Abstract: We use an exact analytical technique, based on Generalized Timed
Petri Nets (GTPNs), to study the performance of shared bus cache consistency
protocols for multiprocessors. We develop a general framework within which the key
characteristics of the Write-Once protocol and four enhancements that have
been combined in various ways in the literature can be identified and
evaluated. We then quantitatively assess the performance gains for each of
the four enhancements. We consider three levels of data sharing in our
workload models. One of the enhancements substantially improves system
performance in all cases. Two enhancements are shown to have negligible
effect over the range of workloads analyzed. The fourth enhancement shows
a small improvement for low levels of sharing, but shows more substantial
improvement as sharing is increased, if we assume a "good access pattern".
The effects of two architectural parameters, the blocksize and the
main memory cycle time are also considered.

%A Wei-Chung Hsu
%T Register allocation for VLSI processors
%R Technical Report #619
%I Computer Sciences Department, University of Wisconsin at Madison
%D November 1985
%X Abstract: The performance of a single-chip CPU is often restricted by
limited off-chip memory bandwidth. In this report, we study how to
effectively use registers - one kind of on-chip memory - to reduce off-chip
memory access. To use a limited number of registers efficiently, a good
register allocation should handle spilling effectively. We study the
minimization of Load/Store instructions in local register allocation. Both
an optimal and an efficient heuristic algorithm are proposed for local
register allocation. We also suggest the use of a trace optimization
technique for global register allocation.

%A Klaus Hollig
%T Multivariate splines
%R Technical Report #620
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1985
%X Abstract: Multivariate B-splines are defined as volume densities of convex
polyhedra. Two special cases, simplex splines and box splines, give rise
to natural generalizations of univariate spline functions. While simplex
splines yield smooth piecewise polynomial spaces on fairly general
triangular meshes, box splines correspond to regular triangulations and
share many of the computationally attractive features of tensor products.
In this paper, the basic properties of these new classes of spline
functions are discussed as well as their application to surface
approximation.

%A Noga Alon
%A Amnon Barak
%A Udi Manber
%T On disseminating information reliably without broadcasting
%R Technical Report #621
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1985
%X Abstract: A general scheme for collecting information in a multicomputer
system is presented. The scheme allows data to be exchanged efficiently
and quickly among many machines without broadcasting. In addition, certain
operations can be performed on the data while it is being collected.
Several applications to distributed computing are discussed. As a
byproduct of the combinatorial part of the paper, a problem of Babai and
Erdos in combinatorial group theory is solved.

%A Carl de Boor
%A Klaus Hollig
%T B-splines without divided differences
%R Technical Report #622
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1985
%X Abstract: This note develops the basic B-spline theory without using
divided differences. Instead, the starting point is the definition of
B-splines via recurrence relations. This approach yields very simple
derivations of basic properties of spline functions and algorithms.

%A David Kamowitz
%T SOR and MGR[$nu$] experiments on the Crystal multicomputer
%R Technical Report #623
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1985
%X Abstract: This report describes implementations of the red/black SOR
algorithm and of the MGR[$nu$] multigrid algorithm on the Crystal
multicomputer. Rates of convergence and observed efficiencies for both
algorithms are compared.

%A Hongjun Lu
%T Distributed query processing with load balancing in local networks
%R Technical Report #624
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1985
%X Abstract: This thesis presents a new approach to distributed query
processing in locally distributed database systems, load-balanced query
processing (LBQP), which integrates distributed query processing and load
balancing. Several observations about previous research work in
distributed query processing motivated this study. First, only a few query
processing algorithms have been developed specifically for distributed
databases based on local networks. Second, the use of multiple copies of
data to improve performance in a distributed database system has not been
addressed by most existing algorithms. Finally, and perhaps most
importantly, existing query optimization algorithms have considered only
the static characteristics of a distributed database. The algorithms
reported here consider the dynamic load status of the system and the
existence of multiple copies of data to provide better performance than is
achievable with purely static planning techniques.
The dynamic query allocation problem for a distributed database system with
fully-replicated data was studied first using simulation. Two new
heuristic algorithms were proposed for dynamically choosing a processing
site for a newly arrived query in a local network environment. Both of
these heuristics use knowledge about queries obtained during query
optimization, such as their estimated CPU and I/O requirements. A
simulation model was developed and used to study these heuristics and
compare them to an algorithm that simply balances the number of jobs at
each site. The results of this study indicate that knowledge of query
processing requirements can be used effectively to improve overall system
performance through dynamic query allocation.
In order to obtain empirical results regarding distributed query processing
in local area networks, a testbed was built using an experimental
distributed system, the Crystal multicomputer, to conduct experiments on
the performance of distributed join algorithms. Eight different
distributed join methods were implemented using the testbed. Join queries
with a variety of relation sizes, join selectivities, and join column value
distributions were experimentally studied. The performance results
obtained indicate that pipelined join methods outperform sequential methods
over a wide range of join queries. It was also found that the
communications costs in a local network environment are certainly not a
dominant factor with respect to performance.
A three-phase load-balanced query processing algorithm, Algorithm LBQP, was
developed based on these experimental results and the results of the study
of dynamic query allocation. This algorithm first statically generates
a processing plan for a query in a locally distributed database system,
ignoring the physical storage sites of the relations referenced by the
query. A dynamic query unit allocation algorithm is then applied to the
plan to determine the processing sites for each relation. Finally,
specific processing methods for the distributed joins in the resulting plan
are selected. A model of distributed database systems with
partially-replicated data was used to investigate the performance of the
dynamic query unit algorithm of LBQP. The results show significant
improvements in performance, including improvements in both the mean waiting
time for queries and the overall system throughput.

%A Michael J. Carey
%A Hongjun Lu
%T Load balancing in a locally distributed database system
%R Technical Report #625
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1985
%X Abstract: Most previous work on query optimization in distributed
database systems has focused on finding optimal or near-optimal processing
plans based solely on static system characteristics, and few researchers
have addressed the problem of copy selection when data is replicated. This
paper describes a new approach to query processing for locally distributed
database systems. Our approach uses load information to select the
processing site(s) for a query, dynamically choosing from among those sites
that have copies of relations referenced by the query. Query compilation
is used to produce a statistically-optimized logical plan
for the query, and then a dynamic optimization phase converts this logical
plan into an executable physical plan
at run-time. This paper motivates the separation of static and dynamic
optimization, presents algorithms for the various phases of the
optimization process, and describes a simulation study that was undertaken
to investigate the performance of this approach. Our simulation results
indicate that load-balanced query processing can provide improvements in
both query response times and overall system throughput as compared to
schemes where execution sites are either statically or randomly selected.

%A R. R. Meyer
%T Trust region methods for piecewise-linear approximation
%R Technical Report #626
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1985
%X Trust region methods are analyzed for piecewise-linear
approximation algorithms for linearly-constrained nonlinear optimization.
Convergence to the optimal value is demonstrated for continuously
differential convex objectives and for certain classes of nondifferentiable
convex objectives. Computationally, this approach has the nice property
that the approximation is generally more accurate than a linear
approxmation yet the subproblems to be solved at each iteration remain
linear programs. The method is also well-suited to convex network
optmization, since it preserves the network structure of the subproblems,
thereby allowing the use of the very fast techniques available for linear
network optimization. For problem classes such as traffic assignment in
which the critical coupling between variables occurs in the objective
function, the separability of the approximation makes possible a
decomposition into independent subproblems that may be solved in parallel
in a multiprocessing environment.

%A W. Harry Plantinga
%A Charles R. Dyer
%T An algorithm for constructing the aspect graph
%R Technical Report #627
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1985
%X Abstract: In this paper we consider the size of aspect graphs, first in
the convex case, and then in the general case. In particular, we give
upper and lower bounds on the maximum size (including vertex labels) of
 $~ THETA (n sup 3 )~$ and $~~ THETA (n sup 5 )~~$ respectively, and
algorithms for constructing the aspect graph which run in time
 $~~O (n sup 4 )~~$ and
 $~~O (n sup 6 )~~$. The algorithm for the general case makes use of a new
object representation called the aspect representation or asp.
We also show a different way to label the aspect graph in order to save
a factor of n in the asymptotic size and construction time (at the expense
of retrieval time) in both the convex and general cases, and we suggest
alternatives to the aspect graph which require less space and store more
information.

%A Sheldon Klein
%T The invention of computationally plausible knowledge systems
in the upper paleolithic
%R Technical Report #628
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1985
%X Abstract: The problem of computing human behavior by rules can become
intractable with large scale knowledge systems if the human brain, like a
computer, is a finite state automaton. The problem of making such
computations at a pace fast enough for ordinary social interaction can be
solved if appropriate constraints apply to the structure of those rules.
There is evidence that systems of such constraints were invented in the
Upper Paleolithic, and were of sufficient power to guarantee that the time
necessary for computation of behavior would increase only linearly with
increases in the size and heterogeneity of world knowledge systems.
Fundamentally, there was just one type of computational invention, capable
of unifying the full range of human sensory domains, and consisting of an
analogical reasoning method in combination with a global classification
scheme. The invention may have been responsible for the elaboration of
language and culture structures in a process of co-evolution. The encoding
of the analogical mechanism in iconic visual imagery and myth structures
may have given rise to the phenomenon of Shamanism. The theory is testable,
and one of its implications is that the structuralism of Levi-Strauss has
an empirical foundation.

%A Jiawei Han
%T Pattern-based and knowledge-directed query compilation
for recursive data bases
%R Technical Report #629
%I Computer Sciences Department, University of Wisconsin at Madison
%D January 1986
%X Abstract: Expert database systems (EDS's) comprise an interesting class of
computer systems which represent a confluence of research in artificial
intelligence, logic, and database management systems. They involve
knowledge-directed processing of large volumes of shared information and
constitute a new generation of knowledge management systems.
Our research is on the deductive augmentation of relational database
systems, especially on the efficient realization of recursion. We study
the compilation and processing of recursive rules in relational database
systems, investigating two related approaches: pattern-based recursive rule
compilation and knowledge-directed recursive rule compilation and planning.
Pattern-based recursive rule compilation is a method of compiling and processing
recursive rules based on their recursion patterns. We classify recursive rules
according to their processing complexity and develop three kinds of algorithms
for compiling and processing different classes of recursive rules: transitive
closure algorithms, SLSR wavefront algorithms, and stack-directed compilation
algorithms. These algorithms, though distinct, are closely related. The more
complex algorithms are generalizations of the simpler ones, and all apply the
heuristics of performing selection first and utilizing previous processing
results (wavefronts) in reducing query processing costs. The algorithms are
formally described and verified, and important aspects of their behavior are
analyzed and experimentally tested.
To further improve search efficiency, a knowledge-directed recursive rule
compilation and planning technique is introduced. We analyze the issues raised
for the compilation of recursive rules and propose to deal with them by
incorporating functional definitions, domain-specific knowledge, query
constants, and a planning technique. A prototype knowledge-directed relational
planner, RELPLAN, which maintains a high level user view and query interface,
has been designed and implemented, and experiments with the prototype are
reported and illustrated.
%Y $5.70

%A Raphael Finkel
%A A. P. Anantharaman
%A Sandip Dasgupta
%A Tarak S. Goradia
%A Prasanna Kaikini
%A Chun-Pui Ng
%A Murali Subbarao
%A G. A. Venkatesh
%A Sudhanshu Verma
%A Kumar A. Vora
%T Experience with Crystal, Charlotte and Lynx
%R Technical Report #630
%I Computer Sciences Department, University of Wisconsin at Madison
%D February 1986
%X Abstract: This paper describes the most recent implementations of
distributed algorithms at Wisconsin that use the Crystal multicomputer, the
Charlotte operating system, and the Lynx language. This environment is an
experimental testbed for design of such algorithms. Our report is meant to
show the range of applications that we have found reasonable in such an
environment and to give some of the flavor of the algorithms that have been
developed. We do not claim that the algorithms are the best possible for
these problems, although they have been designed with some care. In
several cases they are completely new or represent significant
modifications of existing algorithms. We present distributed
implementations of B trees, systolic arrays, prolog tree search, the
travelling salesman problem, incremental spanning trees, nearest-neighbor
search in k-d trees, and the Waltz constraint-propagation algorithm. Our
conclusion is that the environment, although only recently available, is
already a valuable resource and will continue to grow in importance in
developing new algorithms.
%Y Cost: $1.44

%A O. L. Mangasarian
%A R. De Leone
%T Error bounds for strongly convex programs and (Super)Linearly
convergent iterative schemes for the least 2-norm solution of linear
programs
%R Technical Report #631
%I Computer Sciences Department, University of Wisconsin at Madison
%D February 1986
%X Abstract: Given an arbitrary point (x,u) in $R sup n$ x $R sup m$+, we
give bounds on the Euclidean distance between x and the unique solution x
to a strongly convex program in terms of the violations of the Karush-Kuhn-
Tucker conditions by the arbitrary point (x,u). These bounds are then used to
derive linearly and superlinearly convergent iterative schemes for obtaining
the unique least 2-norm solution of a linear program. These schemes can be
used effectively in conjunction with the successive overrelaxation (SOR)
methods for solving very large sparse linear programs.
%Y Cost: $0.00

%A Yeshayahu Artsy
%A Hung-Yang Chang
%A Raphael Finkel
%T Interprocess communication in Charlotte
%R Technical Report #632
%I Computer Sciences Department, University of Wisconsin at Madison
%D February 1986
%X Abstract: Charlotte is a distributed operating system that provides a
powerful interprocess communication mechanism. This mechanism differs from
most others in that it employs duplex links, does not buffer messages in
the kernel, and does not block on communication requests. A link is a bound
communication channel between two processes upon which messages can be sent,
received, awaited, or canceled. Processes may acquire new links, destroy their
end of the link, or enclose their end as part of an outgoing message. A link
is thus a dynamic capability-like entity that permits only the holder access to
its communication resource.
We first introduce the Charlotte interprocess communication structure and
discuss our motivation for choosing the specific operations. We then
describe distributed computing environments in which Charlotte is
applicable. Next, we discuss how we implemented the
interprocess-communication facility. We close with a comparison between
Charlotte and related research and justify our design and its
implementation cost.
%Y Cost: $0.00

%A Hongjun Lu
%A Michael J. Carey
%T Load-balanced task allocation in locally distributed computer systems
%R Technical Report #633
%I Computer Sciences Department, University of Wisconsin at Madison
%D February 1986
%X Abstract: A task force is a distributed program which consists of a set of
communicating tasks. This paper presents an algorithm for allocating the
tasks in a task force to a set of execution sites in a locally distributed
computer system. The main objective of this dynamic task allocation
algorithm is to balance the load of the system, and the algorithm takes the
current load status of the system into account when making its allocation
decisions. In addition to balancing the system's load, the algorithm also
tries to minimize the communications costs between the tasks to the extent
possible within the constraints imposed by load balancing. The paper
begins by quantitatively defining the load unbalance factor, and the
problem of load-balanced task allocation is then defined. A heuristic
algorithm for finding solutions to the problem is presented, and allocations
generated by the heuristic algorithm are compared to those obtained via
exhaustive search. The results of this comparison indicate that the heuristic
algorithm finds the optimal allocation in most cases. The execution time and
the complexity of the heuristic task allocation algorithm are also addressed.
%Y Cost: $0.00

%A Seymour V. Parter
%T Remarks on multigrid convergence theorems
%R Technical Report #634
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1986
%X Abstract: Multigrid has become an important iterative method for the
solution of discrete elliptic equations. However, there is much to be done
in the theory of convergence proofs. At the present time there are two
general two-level methods for general convergence proofs: an algebraic
approach and a duality approach. While these theories do not give
sharp estimates they provide good, general rigorous convergence theorems.
In this note we study the relationship between these theories. While the
approach and thought-process leading to these theories are different, the
results are essentially the same. Indeed, the basic estimated required by
these theories are the same.

%A David J. DeWitt
%A Robert H. Gerber
%A Goetz Graefe
%A Michael L. Heytens
%A Krishna B. Kumar
%A M. Muralikrishna
%T GAMMA: a high performance dataflow database machine
%R Technical Report #635
%I Computer Sciences Department, University of Wisconsin at Madison
%D March 1986
%X Abstract: In this paper, we present the design, implementation techniques,
and initial performance evaluation of Gamma. Gamma is a new relational
database machine that exploits dataflow query processing techniques. Gamma
is a fully operational prototype consisting of 20 VAX 11/750 computers.
The design of Gamma is based on what we learned from building our earlier
multiprocessor database machine prototype (DIRECT) and several years of
subsequent research on the problems raised by the DIRECT prototype.
In addition to demonstrating that parallelism can really be made to work in a
database machine context, the Gamma prototype shows how parallelism can be
controlled with minimal control overhead through a combination of the use
of algorithms based on hashing and the pipelining of data between
processes. Except for 2 messages to initiate each operator of a query tree
and 1 message when the operator terminates, the execution of a query is
entirely self-scheduling.
%Y Cost: $1.80

%A Eric Norman
%T Tracking the elusive eureka
%R Technical Report #636
%I Computer Sciences Department, University of Wisconsin at Madison
%D March 1986
%X Abstract: The algebraic approach to computation of FP is used to transform
recursive definitions of the Fibonacci and factorial functions into
iterative code. The approach is similar to the one used to solve many
differential equations and also similar to what many call "higher order" or
"semantic" unification. An algorithmic procedure is not obtained; the
intent is to demonstrate the usefulness of the algebraic approach.
%Y Cost: $0.00

%A Tobin J. Lehman
%A Michael J. Carey
%T Query processing in main memory database management systems
%R Technical Report #637
%I Computer Sciences Department, University of Wisconsin at Madison
%D March 1986
%X Abstract: Most previous work in the area of main memory database systems
has focused on the problem of developing query processing techniques that
work well with a very large buffer pool. In this paper, we address query
processing issues for memory resident relational databases, an environment
with a very different set of costs and priorities. We present an architecture
for a main memory DBMS, discussing the ways in which a memory resident database
differs from a disk-based database. We then address the problem of processing
relational queries in this architecture, considering alternative algorithms for
selection, projection, and join operations and studying their performance. We
show that a new index structure, the T Tree, works well for selection and join
processing in memory resident databases. We also show that hashing methods
work well for processing projections and joins, and that an old join method,
sort-merge, still has a place in main memory.
%Y Cost: $0.00

%A Michael J. Carey
%A David J. DeWitt
%A Joel E. Richardson
%A Eugene J. Shekita
%T Object and file management in the EXODUS extensible database system
%R Technical Report #638
%I Computer Sciences Department, University of Wisconsin at Madison
%D March 1986
%X Abstract: This paper describes the design of the object-oriented storage
component of EXODUS, an extensible database management system currently
under development at the University of Wisconsin. The basic abstraction in
the EXODUS storage system is the storage object, an uninterpreted
variable-length record of arbitrary size; higher level abstractions such as
records and indices are supported via the storage object abstraction. One
of the key design features described here is a scheme for managing large
dynamic objects, as storage objects can occupy many disk pages and can grow
or shrink at arbitrary points. The data structure and algorithms used to
support such objects are described, and performance results from a
preliminary prototype of the EXODUS large-object management scheme are
presented. A scheme for maintaining versions of large objects is also
described. We then describe the file structure used in the EXODUS storage
system, which provides a mechanism for grouping and sequencing through a
set of related storage objects. In addition to object and file management,
we discuss the EXODUS approach to buffer management, concurrency control,
and recovery, both for small and large objects.
%Y Cost: $0.00

%A Mark A. Holliday
%A Mary K. Vernon
%T The GTPN analyzer: numerical methods and user interface
%R Technical Report #639
%I Computer Sciences Department, University of Wisconsin at Madison
%D April 1986
%X Abstract: The GTPN (Generalized Timed Petri Net) is a performance model
based on Petri Nets. The state space for a model of the system is
automatically constructed and analyzed using results from Markov chain
theory. We address some of the key computational issues involved in the
Markov chain theory approach. In particular, we discuss two types of
algorithms. The first type compute the absorption probabilities and mean
time to absorption. The second type compute the steady state probability
distribution for a single, possibly periodic, recurrent Markov chain class.
We also describe the GTPN's user interface for input of the model description,
execution of the tool, and the output of the performance results.
%Y Cost: $2.28

%A Klaus Hollig
%T Box splines
%R Technical Report #640
%I Computer Sciences Department, University of Wisconsin at Madison
%D April 1986
%X Abstract: This report gives an introduction to multivariate cardinal spline
theory. It is based on joint work with Carl de Boor and Sherman
Riemenschneider and the particular topics discussed are: recurrence
relations for box splines, approximation order, interpolation, convergence
to functions of exponential type and subdivision algorithms.
%Y Cost: $0.00

%A Richard A. Brualdi
%A Rachel Manber
%T On strong digraphs with a unique minimally strong subdigraph
%R Technical Report #641
%I Computer Sciences Department, University of Wisconsin at Madison
%D April 1986
%X Abstract: In this paper we determine the maximum number of edges that a
strong digraph can have if it has a unique minimally strong subdigraph. We
show that this number equals n (n + 1)/2 + 1, a surprisingly large number.
Furthermore we show that there is, up to an isomorphism, a unique strong
digraph which attains this maximum.
%Y Cost: $0.00

%A Michael D. Chang
%A Michael Engquist
%A Raphael Finkel
%A Robert Meyer
%T A parallel algorithm for generalized networks
%R Technical Report #642
%I Computer Sciences Department, University of Wisconsin at Madison
%D June 1986
%X Abstract: A parallel algorithm for the solution of linear generalized
network optimization problems is presented. This method decomposes the
original network into initial collections of quasi-trees that are
distributed among a set of processors that optimize in parallel their
corresponding "local" problems. Periodically, the processors exchange
information on their local problems, and one or more quasi-trees may be
moved between processors as desirable links between quasi-trees on
different processors are determined or for load balancing purposes. This
procedure has been implemented on the CRYSTAL multicomputer, and
computational results have been obtained on problems with up to 28,000
variables. For problems whose optimal solutions contain numerous
quasi-trees, the efficiency of this parallel implementation is about 50%.
%Y Cost: $0.00

%A Charles V. Stewart
%A Charles R. Dyer
%T Convolution algorithms on the pipelined image-processing engine
%R Technical Report #643
%I Computer Sciences Department, University of Wisconsin at Madison
%D May 1986
%X Abstract: In this paper we present two algorithms for computing an N by N
convolution on the Pipelined Image-Processing Engine (PIPE). PIPE is a
special-purpose machine for low-level vision consisting of eight processing
stages connected in a pipelined fashion. Each stage can compute a number
of different basic image functions. Each of the algorithms decomposes the
solution into a sequence of 3 by 3 neighborhood operations, shifts and
additions. The first algorithm divides the results of the 3 by 3 partial
convolutions into groups of concentric rings of images and successively
collapses the images on the outer ring onto the next ring, merging the
results until a single result image is computed. The second algorithm
divides the partial convolution images into eight sectors and computes each
sector independently. For a 27 by 27 convolution of a 256 by 256 image the
first algorithm requires 0.633 seconds, while the second requires 0.683
seconds. These algorithms for PIPE are also shown to compare favorably
with algorithms for arbitrary sized kernels on fast general purpose
hardware, the MPP and Warp.
%Y Cost: $0.00

%A Michael J. Carey
%A David J. DeWitt
%A Daniel Frank
%A Goetz Graefe
%A Joel E. Richardson
%A Eugene J. Shekita
%A M. Muralikrishna
%T The architecture of the EXODUS extensible DBMS: a preliminary report
%R Technical Report #644
%I Computer Sciences Department, University of Wisconsin at Madison
%D May 1986
%X Abstract: With non-traditional application areas such as engineering
design, image/voice data management, scientific/statistical applications,
and artificial intelligence systems all clamoring for ways to store and
efficiently process larger and larger volumes of data, it is clear that
traditional database technology has been pushed to its limits. It also
seems clear that no single database system will be capable of
simultaneously meeting the functionality and performance requirements of
such a diverse set of applications. In this paper we describe the
preliminary design of EXODUS, an extensible database system that will
facilitate the fast development of high-performance, application-specific
database systems. EXODUS provides certain kernel facilities, including a
versatile storage manager and a type manager. In addition, it provides an
architectural framework for building application-specific database systems,
tools to partially automate the generation of such systems, and libraries of
software components (e.g., access methods) that are likely to be useful for
many application domains.
%Y Cost: $0.00

%A Klaus Hollig
%T Geometric continuity of spline curves and surfaces
%R Technical Report #645
%I Computer Sciences Department, University of Wisconsin at Madison
%D June 1986
%X Abstract: We review $beta$-spline theory for curves and show how some of
the concepts can be extended to surfaces. Our approach is based on the
Bezier form for piecewise polynomials which yields simple geometric
characterizations of smoothness constraints and shape parameters. For
curves most of the standard "spline calculus" has been developed. We
discuss in particular the construction of B-splines, the conversion for
B-spline to Bezier representation and interpolation algorithms. A
comparable theory for spline surfaces for general meshes does at present
not exist. We merely describe how to join triangular and rectangular
patches and discuss the corresponding $beta$-spline constraints in terms of
the Bezier representation.
%Y Cost: $0.00

%A Leonard Uhr
%T Parallel, hierarchical software/hardware pyramid architectures
%R Technical Report #646
%I Computer Sciences Department, University of Wisconsin at Madison
%D June 1986
%X Abstract: This paper examines pyramid-structured software/hardware
systems, both programs and multi-computers. It explores how efficient
parallel computers can be designed. It poses the extremely difficult
problem of perception, and briefly surveys the major alternative
approaches. It examines the basic structures with which pyramids can be
built, and describes pyramid multi-computers. It sketches out how pyramid
hardware can be used to execute programs, with special emphasis on the
"recognition cone" structures being developed to model living visual
systems. It briefly suggests how pyramids can be augmented and embedded
into larger software/hardware structures. In addition, it gives a short
history of massively parallel hierarchical pyramid structures.
%Y Cost: $0.00

%A O.L. Mangasarian
%A R. De Leone
%T Parallel successive overrelaxation methods
for symmetric linear complementarity problems and linear programs
%R Technical Report #647
%I Computer Sciences Department, University of Wisconsin at Madison
%D June 1986
%X Abstract: A parallel successive overrelaxation (SO) method is proposed for
the solution of the fundamental symmetric linear complementarity problem.
Convergence is established under a relaxation factor which approaches the
classical value of 2 for a loosely coupled problem. The parallel SOR
approach is then applied to solve the symmetric linear complementarity
problem associated with the least norm solution of a linear program.
%Y Cost: $0.00

%A Barton P. Miller
%A Jong-Deok Choi
%T Breakpoints and halting in distributed programs
%R Technical Report #648
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1986
%X Abstract: Interactive debugging requires that the programmer be able to
halt a program at interesting points in its execution. This paper
presents an algorithm for halting a distributed program in a consistent
state, and presents a definition of distributed breakpoints with an
algorithm for implementing the detection of these breakpoints. The Halting
Algorithm extends Chandy and Lamport's algorithm for recording global state
and solves the problem of processes that are not fully connected or
frequently communicating. The definition of distributed breakpoints is
based on those events that can be detected in a distributed system. Events
that can be partially ordered are detectable and form the basis for the
breakpoint predicates, and from the breakpoint definition comes the
description of an algorithm that can be used in a distributed debugger to
detect these breakpoints.

%A Raphael Finkel
%A Bahman Barzideh
%A Chandreshekhar W. Bhide
%A Man-On Lam
%A Donald Nelson
%A Ramesh Polisetty
%A Sriram Rajaraman
%A Igor Steinberg
%A G. A. Venkatesh
%T Experience with Crystal, Charlotte and Lynx: second report
%R Technical Report #649
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1986
%X Abstract: This paper describes several recent implementations of
distributed algorithms at Wisconsin that use the Crystal multicomputer, the
Charlotte operating system, and the Lynx language. This environment is an
experimental testbed for design of such algorithms. Our report is meant to
show the range of applications that we have found reasonable in such an
environment and to give some of the flavor of the algorithms that have been
developed. We do not claim that the algorithms are the best possible for
these problems, although they have been designed with some care. In
several cases they are completely new or represent significant
modifications of existing algorithms. We present distributed
implementations of PLA folding, a heuristic for the travelling-salesman
problem, incremental update of spanning trees, ray tracing, the simplex
method, and the Linda programming language. Together with our previous
report, this paper leads us to conclude that the environment is a valuable
resource and will continue to grow in importance in developing new
algorithms.

%A Barton P. Miller
%A David L. Presotto
%A Michael L. Powell
%T DEMOS/MP: the development of a distributed operating system
%R Technical Report #650
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1986
%X Abstract: The DEMOS/MP operating system has moved from a super computer
with a simple addressing structure to a network of microcomputers. This
transformation was done without significant changes to the semantics of the
original DEMOS, i.e., existing DEMOS programs should run on DEMOS/MP.
The changes to DEMOS were simplified by the structure of its primitive
objects and the functions over those objects. The structure of DEMOS links
and processes were the major contributors to the simplicity. The changes
made to produce DEMOS/MP involved the internal structure of link,
modification to parts of the kernel, and limited changes to the various
system processes.

%A Leonard Uhr
%T Toward a computational information-processing model of object perception
%R Technical Report #651
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1986
%X Abstract: Integrating What Is Known Into a Running Model. This paper
describes the first version of an information-processing model (to be
programmed for an appropriately structured parallel-serial multi-computer
network) for the visual recognition of complex objects. The model will be
run to exhibit its behavior, and tested, evaluated, and compared with
living visual systems, and with other models. It should also serve, where
the brain's mechanisms are not yet known, as a test-bed to evaluate
alternate possible structures.
Such an enterprise now appears to be feasible. Consider these highlights
of pertinent facts:
 .The retina and primary visual cortex, with their massively parallel and
to some extent serial structure of processes, detect spots, colors,
families of oriented edges, and other features.
 .Much is now known about the cortex's columnar structure, and the topology
of links within and between columns, hypercolumns, modules, and areas.
 .Roughly 20 cortical visual areas have been discovered. A great deal is
known about the many links between them, the way they map information, and
the processes each effects.
 .The recent firm evidence for neurons in the temporal lobe that respond
selectively, in 70 to 200 msec, to face, different particular faces,
complex parts of faces, and other complex objects, strongly suggests that
these neurons are involved at a culminating stage in the complex structure
of processes that perceives patterned objects. Temporal lobe lesions make
complex objects like faces unrecognizable, while leaving other visual
processes largely undisturbed.
 .The brain's massive parallelism makes this enormous speed possible. The
relatively slow msec response times of its basic computing elements, the
neurons, means that the maximum possible serial depth of processes is a few
dozen to a few hundred at most.
This paper first organizes key known facts about the visual system and
describes the model. (Later sections give more detail, and pinpoint
important facts still unknown.) It also briefly examines this model's
relation to parallel computer architectures, to other models for visual
perception, and to computer vision programs, emphasizing those from which
this model grew.
The goal is to develop a model/theory that exhibits the brain/mind's
behavior on difficult recognition tasks, and suggests plausible mechanisms
where facts are not yet uncovered, or firm. The running program should
show that the model is precise and consistent, and how well the separate
parts work together. It should suggest new questions and experiments.
This first version incorporates only what appears essential to achieve a
simple yet realistic working model. The final discussion examines possible
improvements.

%A Mark A. Holliday
%T Deterministic time and analytical models of parallel architectures
%R Technical Report #652
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1986
%O Ph.D. thesis
%X Abstract: In parallel computer architectures many events are of constant
duration. For example, a fetch of a cache block, ignoring resource
contention, takes a fixed number of clock cycles. Analytical performance
modeling techniques that include deterministic time have previously been
proposed. However, serious restrictions on the set of allowed models are
imposed in these previous techniques. In this dissertation we extend these
previous modeling techniques by removing those restrictions. Those
restrictions fall into two classes: those involving the construction of
the state space and those involving the analysis of the state space. We
propose interpretations for performance measures when those restrictions
are removed. In this general case, the state space represents an
arbitrary, finite state, discrete parameter Markov Chain. We also present
algorithms that efficiently construct and analyze the state space in the
general case.
Our technique is called Generalized Timed Petri Nets (GTPN). It has been
implemented in a tool and has been used to develop models for several
interesting architectures. The two most important studies examine
bus-based multiprocessors. Performance degradation due to memory and bus
interference in multiprocessors with a single-stage interconnection network
has been frequently examined. Using the GTPN we are able to derive exact
performance estimates in the important case when memory access times are
constant and interrequest times are non-zero. Previously only approximate
estimates and simulation results existed.
Caches are an important means for reducing memory contention in bus-based
multiprocessors. Our second study is the first analytical performance
comparison of the key features of protocols that maintain cache
consistency when a single shared bus is assumed.

%A Raphael Finkel
%A Michael L. Scott
%A William K. Kalsow
%A Yeshayahu Artsy
%A Hung-Yang Chang
%A Prasun Dewan
%A Aaron J. Gordon
%A Bryan Rosenburg
%A Marvin H. Solomon
%A Cui-Qing Yang
%T Experience with Charlotte:
simplicity versus function in a distributed operating system
%R Technical Report #653
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1986
%X Abstract: This paper presents a retrospective view of the Charlotte
distributed operating system, which is intended as a testbed for developing
techniques and tools for exploiting large-grain parallelism to solve
computation-intensive problems. Charlotte was constructed over the course
of approximately 5 years, going through several distinct versions as the
underlying hardware and our ideas for implementation changed. Charlotte
rests on several underlying design decisions: (1) it is a software layer
on the Crystal multicomputer, (2) processes do not share memory, (3)
communication is on reliable, symmetric, bi-directional paths named by
capabilities, and (4) absolute information is stored at each end of a
communication path. Our implementation taught us that our dual goals of
simplicity and function were not easily reached. In particular, the issue
of simplicity is quite complex; quests for simplicity in various areas often
conflict with each other. This paper explores how the design decisions we
made to satisfy our goals incurred implementation cost and required extra
levels of software, but resulted in a high-quality testbed for
experimentation in distributed algorithm design.

%A Seymour V. Parter
%T Estimates for multigrid methods based on red-black Gauss-Seidel smoothings
%R Technical Report #654
%I Computer Sciences Department, University of Wisconsin at Madison
%D July 1986
%X Abstract: The MGR[v] algorithms of Ries, Trottenberg and Winter, the
algorithms 2.1 and 6.1 of Braess and the Algorithm 4.1 of Verfurth are all
multigrid algorithms for the solution of the discrete Poisson equation (with
Dirichlet boundary conditions) based on red-black Gauss-Seidel smoothing.
Both Braess and Verfurth give explicit numerical upper bounds on the rate of
convergence of their methods in convex polygonal domains.
In this work we reconsider these problems and obtain improved estimates for
the h-2h Algorithm 4.1 as well as W-cycle estimates for both schemes in
non-convex polygonal domains. The proofs do not depend on the strengthened
Cauchy inequality.

%A Yeshayahu Artsy
%A Hung-Yang Chang
%A Raphael Finkel
%T Processes migrate in Charlotte
%R Technical Report #655
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1986
%X Abstract: Process migration in a distributed operating system is a facility
to dynamically relocate processes among component computers. In recent
years, several studies have been conducted concerning the need for process
migration and the algorithms to perform it efficiently. Only a few
successful implementations of process migration have been reported.
A process-migration facility has been implemented in Charlotte, a
message-based distributed operating system. Process migration policy is
decided by a user-level utility called the Starter, while the mechanism is
handled by the kernel. A distributed squad of Starters can enforce regional
and global process migration policy. Several migration efforts can proceed
in the system concurrently. Migration can be aborted due to a change in the
load; the migrating process can be rescued in many cases of machine failure.
Migration is transparent to the migrating process and to other processes
communicating with it. Once a process is migrated out of a machine, no trail
of stubs or dangling links is left to interfere with future communication.
This paper gives an overview of Charlotte, discusses the design of the
process migration facility, details its implementation, and gives some
preliminary cost results. Process migration was implemented in mid-1985
and has been used experimentally since then.

%A Tobin Jon Lehman
%T Design and performance evaluation of a main memory relational database
system
%R Technical Report #656
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1986
%O Ph.D. thesis
%X Abstract: Most previous work in the area of main memory database systems
has focused on the problem of developing techniques that work well with a
very large buffer pool. This dissertation addresses the problems of
database architecture design, query processing, concurrency control, and
recovery for a memory resident relational database, an environment with a very
different set of costs and priorities. An architecture for a memory-resident
database system is presented, along with a discussion of the differences between
memory-resident database systems and conventional disk-based database
systems. Index structures are then studied for a memory-resident database
environment. The T Tree, a new index structure designed for use in this
environment, is introduced and compared with several existing index
structures: Arrays, AVL Trees, B Trees, Extendible Hashing, Linear Hashing,
Modified Linear Hashing and Chained Bucket Hashing. The T Tree is shown to
perform well in a memory-resident environment. Several of the index
structures are then used to examine relational join and projection
algorithms for a main memory database environment. Experimental results
show that a Merge Join algorithm that uses a T Tree index is usually the
best method, and that a simple Hash Join algorithm is usually second best.
Recovering a memory-resident database is different from recovering a
disk-oriented database, so a different approach is taken in this
dissertation. Existing proposals for memory-resident database recovery
treat the database as a single entity, so recovery and checkpoint
operations are applied to the entire database. A new design is proposed
that allows logging, checkpointing and recovery to be done at the relation
or index level, providing a form of demand recovery. After a crash
with undemanded relations being recovered by a background task. Finally, the
cost issues for concurrency control are different for a memory-resident
database system. Locking is more costly on a per database access basis, so it
must be made more efficient. Multiple granularity locking is desirable, but it would be too expensive if several levels of locks needed checking for every database reference. An algorithm is presented that uses locks with a dynamic level of granularity, with locks being escalated or de-escalated in size to meet
the system's concurrency requirements.

%A Allan Bricker
%A Tad Lebeck
%A Barton P. Miller
%T DREGS: a distributed runtime environment for game support
%R Technical Report #657
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1986
%X Abstract: DREGS, a distributed environment for game support, simplifies
the task of implementing multi-player games. DREGS solves the problems of
concurrency control, synchronization, and communication as they apply to
distributed games. DREGS provides support for the update and control of
replicated objects and uses a central arbitration scheme to enforce a
strict ordering of events. DREGS has been implemented to run under any
4.3BSD Unix compatible system and operates across a network of heterogeneous
architectures.
A game description language, GDL, provides a framework for
programming multi-player distributed games. GDL is a simple language that
generates complex distributed programs. It is an object-based language,
where objects are defined in terms of their input events and their
corresponding actions. The programmer writes the game as if it were a
serial program, without concern for concurrent activities. GDL also frees
the programmer from the details of device and network interfaces.
The combination of the DREGS runtime and GDL frees a game designer from the
distributed aspects of multi-player games. This freedom allows designers
to concentrate their efforts on better, more interesting games. DREGS has
been used to implement an N-way talk program, a tank game, and a flying
saucer space game.

%A Matthew T. Thazhuthaveetil
%T A structured memory access architecture for Lisp
%R Technical Report #658
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1986
%O Ph.D. thesis
%X Abstract: Lisp has been a popular programming language for well over 20
years. The power and popularity of Lisp are derived from its extensibility and
flexibility. These two features also contribute to the large semantic gap
that separates Lisp from the conventional von Neumann machine, typically
leading to the inefficient execution of Lisp programs This dissertation
investigates how the semantic gap can be bridged.
We identify function calling, environment maintenance, list access, and
heap maintenance as the four key run-time demands of Lisp programs, and
survey the techniques that have been developed to meet them in current Lisp
machines. Previous studies have revealed that Lisp list access streams
show spatial locality as well as temporal locality of access. While the
presence of temporal locality suggests the use of fast buffer memories, the
spatial locality displayed by a Lisp program is implementation dependent
and hence difficult for a computer architect to exploit. We introduce the
concept of structural locality as a generalization of spatial locality,
and describe techniques that were used to analyze the structural locality
shown by the list access streams generated from a suite of benchmark Lisp
programs. This analysis suggests architectural features for improved Lisp
execution.
The SMALL Lisp machine architecture incorporates these features. It
partitions functionality across two specialized processing elements whose
overlapped execution leads to efficient Lisp program evaluation.
Trace-driven simulations of the SMALL architecture reveal the advantages of
this partition. In addition, SMALL appears to be a suitable basis for the
development of a multi-processing Lisp system.

%A Udi Manber
%T Using mathematical induction to design computer algorithms
%R Technical Report #660
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1986
%X Abstract: An analogy between proving mathematical theorems and designing
computer algorithms is explored in this paper. This analogy provides an
elegant methodology for designing algorithms, explaining their behavior,
and understanding their key ideas. The paper identifies several
mathematical proof techniques, mostly based on induction, and presents
analogous algorithm design techniques. Each of these techniques is
illustrated by several examples of algorithms.

%A M. A. Sridhar
%T Efficient algorithms for multiple pattern matching
%R Technical Report #661
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1986
%O Ph.D. thesis
%X Abstract: We address the problem of finding an occurrence of one of
several given patterns in a given text string.
The Aho-Corasick algorithm solves this problem by constructing a modified
finite automaton and using this to scan the text string left to right.
This yields an algorithm that runs in time linear in the text length. The
Boyer-Moore algorithm scans the text right to left, looking for an
occurrence of one pattern. It has a sublinear average running time, and
can be modified to be linear-time in the worst case.
We extend the Boyer-Moore algorithm to handle multiple patterns. Two new
algorithms are developed and analyzed. Both operate by remembering
previous matches. Given a text string of length N and patterns with maximum
length D, the first algorithm remembers no more than 1 + log D previous
matches, and consults O(N log D) text characters. Algorithms for performing
the necessary preprocessing are also developed.
The second algorithm is designed for a different time-space tradeoff. For
any given k, it consults O(kN log D) text characters, and remembers no more
than t/k nonperiodic matches at any time, where t is the number of patterns.
The dominating feature of a sublinear average-case running time is retained
by both algorithms.

%A Dennis Draheim
%A Bart Miller
%A Steven Snyder
%T A reliable and secure Unix connection service
%R Technical Report #662
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1986
%X Abstract: Distributed programs require a method for processes residing on
different machines to identify each other and establish communication. One
method is to provide a special connection service to perform this task. A
good connection service should be easy to use. It should allow arbitrary
processes to connect to each other as well as helping client processes to
connect to server processes. It should provide location transparency; that
is, the programmer should not have to know the network address of a process
to connect to it. The connection service should be reliable. It should
provide a way for a process to establish the identity of the user
associated with the process to which it has connected, and to communicate
securely with that process.
We have implemented a connection service for Berkeley Unix that is
reliable, available, secure, and easy to use. The connection service
achieves ease of use through a simple interface based on the library
routine meet. Meet allows one process to connect to another by specifying
arbitrary names for itself and the other process. The connection service
imposes no naming conventions of its own so it can be used with most name
spaces and naming services. The service is location-transparent. It also
provides a routine for posting services.
Reliable and available service is provided by replicating connection
servers. Each server knows about all pending connection requests. The
connection service provides continuous service as long as at least one
server is running. Connections can be authenticated by an authentication
server that works in cooperation with the connection server. Secure
communication is achieved via the RSA public-key encryption algorithm.
The connection server was put in regular use in June 1986. Our limited
experience indicates that it satisfies an important need of Unix users.

%A David Kamowitz
%T Experimental results for multigrid and transport problems
%R Technical Report #663
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1986
%X Abstract: An experimental study of the applicability of a multigrid
algorithm to the solution of the neutron transport problem in a slab is
described. Only the simplest choices are made for the components of the
algorithm. Experimental results indicate that the coarse grid operator
obtained from finite differences works better and is cheaper than the
Galerkin choice.

%A Charles V. Stewart
%A Charles R. Dyer
%T A scheduling algorithm for the pipelined image-processing engine
%R Technical Report #664
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1986
%X Abstract: In this paper we present a heuristic scheduling algorithm for
the National Bureau of Standards' Pipelined Image-Processing Engine (PIPE).
PIPE is a special purpose machine for low-level image processing consisting
of a linearly connected array of processing stages. A program is specified
as a planar directed acyclic graph where each node represents an operation
on an image, and each arc represents an image output by one operation and
input to another. The algorithm uses the greedy approach to schedule
operations on a stage. It uses several heuristics to control the movement
of images between stages. The worst case time for the schedule generated
by the algorithm is O(N) times the optimal schedule, where N is the maximum
width of the graph. Several examples are given where the algorithm generates
a nearly optimal schedule.

%A Ze-Nian Li
%A Leonard Uhr
%T Comparative timings for a neuron recognition program
on serial and pyramid computers
%R Technical Report #665
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1986
%X Abstract: This paper examines the time needed by a program that recognizes
neurons in photomicrographs when it is executed on a conventional serial
computer (a VAX 11/750), and compares this with the time that would be
needed if it were executed on an appropriate parallel-serial pyramid
multi-computer. As the size of the image array increases from 64x64 to
4,096x4,096 the estimates of the pyramid's increases in speed grow from 350
times as fast as the VAX to 1,276,300 times as fast.

%A Prasun Dewan
%T Automatic generation of user interfaces
%R Technical Report #666
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1986
%O Ph.D. thesis
%X Abstract: In traditional interactive programming environments, each
application individually manages its interaction with the human user. The
result is duplication of effort in implementing user interface code and
non-uniform - hence confusing - input conventions. It would be useful if
the user interface of an application could be automatically generated. This
idea requires an application-independent model of user interaction together
with a programming environment that supports the model.
Recent work in user interface design has suggested that editing can be used as
a general model of interaction. This dissertation presents an approach that
supports this model.
This approach allows applications to be created as objects. An object is a
instance of a class that describes the data encapsulated by the object and the
methods to manipulate them. Between each object and the user is a dialogue
manager. The dialogue manager receives messages from the object, which name
variables that can be edited by the user. It displays the variables using the
data definition in the class of the object, offers the user a structure editing
interface to modify them, and sends new values back in messages to the object.
The object can then execute methods to make its internal data consistent with
the displayed data. Thus, from the point of view of the objects, the user
appears to be another object that can send and receive messages. From the
point of view of the user, the objects appear to be data that can be edited.
The dialogue manager acts as an intermediary between the object and the user,
translating between the languages of object interaction and user
interaction. A dialogue manager is provided automatically by the
environment.
The utility of our approach is demonstrated through discussion, examples,
and implementation of its major components.

%A Umakishore Ramachandran
%T Hardware support for interprocess communication
%R Technical Report #667
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1986
%O Ph.D. thesis
%X Abstract: Access to system services in traditional uniprocessor operating
systems are requested via protected procedure calls, whereas in
message-based operating systems they are requested via message passing.
Since message exchange is the basic kernel mechanism in message-based
operating systems, the performance of the system depends crucially on the
rate of message exchange. The advent of local area networking has sparked
interest in message-based operating systems.
One of the main problems in existing multicomputer message-based operating
systems has been the slowness of interprocess communication. This speed
limitation is often due to the processing overhead associated with message
passing. Previous studies implicitly assume that only communication
between processes on different nodes in the network is expensive. However,
we show that there is considerable processing overhead for local
communication as well. Therefore, we propose hardware support in the form
of a message coprocessor.
Our solution to the message-passing problem has two components: a software
partition, and a hardware organization. To determine an effective solution
we followed a three-step process: First, we profiled the kernels of four
operating systems to identify the major components of system overhead in
message passing. The results suggested a partitioning of the software
between the host and the message coprocessor. Second, we implemented such
a partition on a multiprocessor and measured its performance. Based on
these results, we proposed bus primitives for supporting the interactions
between the host, the message coprocessor, and the network devices. We
designed the bus in detail to show the feasibility of the primitives from
the point of view of hardware implementation. Through the simplicity of
the design, we show that our bus proposal is cost effective in this
environment. Finally, we used Timed Petri nets to model and analyze
several system architectures and show the effectiveness of our software
partition and hardware organization for solving the message-passing
problem.

%A Gilbert Verghese
%A Shekhar Mehta
%A Charles R. Dyer
%T Image processing algorithms for the pipelined image-processing engine
%R Technical Report #668
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1986
%X Abstract: In this paper we describe nine basic vision algorithms for the
National Bureau of Standards' Pipelined Image-Processing Engine (PIPE).
The algorithms presented are: local peak detection, median filtering,
thinning, the Hough transform for line detection, photometric stereo, n-bit
point operations, detecting edges at multiple resolutions, stereo vision,
and multiresolution, model-based object recognition.

%A Mary Vernon
%A John Zahorjan
%A Edward D. Lazowska
%T A comparison of performance Petri nets and queueing network models
%R Technical Report #669
%I Computer Sciences Department, University of Wisconsin at Madison
%D September 1986
%X Abstract: Queueing networks (QNs) and performance Petri nets (PPNs) are
two approaches to answering performance questions about computer systems.
While QNs were originally designed to model resource contention among
independent jobs and PPNs were designed to model parallel systems
containing synchronization, there is clearly some overlap in the systems
that can be modeled using each. In this paper we address this overlap with
the goal of indicating whether the two models are fundamentally the same,
or whether there are intrinsic differences that make one approach more
powerful than the other for particular application domains.
There seem to be two characteristics of a modeling approach that are most
important in determining how it may be profitably employed. The first is
notation: what models can be expressed, and perhaps more importantly, how
convenient is it to specify a particular class of models? The second
feature is the evaluation technique: what performance measures can be
computed from the model, and what computational effort is required to do
so? Our comparison of QNs and PPNs therefore concentrates on these two
aspects.
It is conceivable that the shortcomings of either approach in a particular
environment might be addressed by adopting features of the other better
suited to that environment. We consider this possibility for making
improvements to the notations and evaluation efficiencies of the two
modeling approaches.

%A Bryan S. Rosenburg
%T Automatic generation of communication protocols
%R Technical Report #670
%I Computer Sciences Department, University of Wisconsin at Madison
%D October 1986
%O Ph.D. thesis
%X Abstract: This dissertation describes an effort to improve
application-level communication efficiency by using information provided
by, and derived from, the application itself.
We define a simple language for describing the static form of conversations
between application processes. A conversation description serves as a form
of service specification for the conversation's underlying communication
protocol. It specifies the types of messages required by the conversation
and provides a regular expression that defines the set of legal sequences
of messages between the conversation's participants.
We also define structures we call plans that application processes can use
dynamically to participate in conversations. A plan is a regular
expression a process can construct to describe its desire to send or
receive messages allowed by its active conversations. The plan mechanism
is a generalization of the CSP guarded alternative construct. It allows
processes to describe future as well as immediate intentions.
Conversation descriptions and plans contain application-specific
information that can be used to enhance the efficiency of the application's
communication. Other useful information can be derived from measurements
made while the application is running. We present strategies for
collecting and using information from these sources. These strategies
attempt to use application-specific information to reduce the number of
low-level messages needed to accomplish the application's communication.
We have implemented a version of the protocol generation system that
supports application processes to be executed on the Crystal multicomputer.
We describe several typical applications and evaluate their performance.
We show that application-specific information from several sources can be
used to significantly improve the efficiency of the application's
communication.

%A Thomas Reps
%T Incremental evaluation for attribute grammars with unrestricted movement
between tree modifications
%R Technical Report #671
%I Computer Sciences Department, University of Wisconsin at Madison
%D October 1986
%X Abstract: This paper concerns the design of editors that perform checks on
a language's context-dependent constraints. Our particular concern is the
design of an efficient, incremental analysis algorithm for systems based on
the attribute-grammar model of editing.
With previous incremental evaluation algorithms for arbitrary noncircular
attribute grammars, the editing model required there to be a restriction on
the operation that moves the editing cursor: moving the cursor was limited
to just a single step in the tree - either to the parent node or to one of
the child nodes of the current cursor location. This paper describes a new
updating algorithm that can be used when an arbitrary movement of the
cursor in the tree is permitted. After an operation that restructures the
tree, the tree's attributes can be updated with a cost of O
((1 + |AFFECTED|) m) m), where m is the size of the tree and AFFECTED is the
subset of the tree's attributes that require new values, when the cost is
amortized over a sequence of tree modifications. The editing cursor may be
moved from its current location to any other node of the tree in a single,
unit-cost operation.

%A Robert Howard Gerber
%T Dataflow query processing using multiprocessor hash-partitioned algorithms
%R Technical Report #672
%I Computer Sciences Department, University of Wisconsin at Madison
%D October 1986
%O Ph.D. thesis
%X Abstract: In this thesis, we demonstrate that hash-partitioned query
processing algorithms can serve as a basis for a highly parallel, high
performance relational database machine. In addition to demonstrating that
parallelism can really be made to work in a database machine context, we
will show that such parallelism can be controlled with minimal overhead
using dataflow query processing techniques that pipeline data between
highly autonomous, distributed processes.
For this purpose, we present the design, implementation techniques, and
initial performance evaluation of Gamma, a new relational database machine.
Gamma is a fully operational prototype consisting of 20 VAX 11/750
computers. The Gamma architecture illustrates that a high performance
database machine can be constructed without the assistance of special
purpose hardware components.
Finally, a simulation model of Gamma is presented that accurately reflects
the measured performance of the actual Gamma prototype. Using this
simulation model, we explore the performance of Gamma for large
multiprocessor systems with varying hardware capabilities.

%A Raphael Finkel
%A Gautam Das
%A Dhruva Ghoshal
%A Kamal Gupta
%A Ganesh Jayaraman
%A Mukesh Kacker
%A Jaspal Kohli
%A Viswanathan Mani
%A Ananth Raghaven
%A Michael Tsang
%A Sriram Vajapeyam
%T Experience with Crystal, Charlotte and Lynx - third report
%R Technical Report #673
%I Computer Sciences Department, University of Wisconsin at Madison
%D November 1986
%X Abstract: This paper describes several recent implementations of
distributed algorithms at Wisconsin that use the Crystal multicomputer, the
Charlotte operating system, and the Lynx language. This environment is an
experimental testbed for design of such algorithms. Our report is meant to
show the range of applications that we have found reasonable in such an
environment and to give some of the flavor of the algorithms that have
been developed. We do not claim that the algorithms are the best possible
for these problems, although they have been designed with some care. In
several cases they are completely new or represent significant
modifications of existing algorithms. We present distributed
implementations of the stable marriage problem, finding roots of an
equation, Gaussian elimination, finding minimal dominating sets, PLA
folding, the Hough transform, the Banker's algorithm, the n-queens problem,
and quick-sort. Together with our previous two reports, this paper leads
us to conclude that the environment is a valuable resource and will
continue to grow in importance in developing new algorithms.

%A Susan Horwitz
%T Adding relational databases to existing software systems: implicit relations
and a new relational query evaluation method
%R Technical Report #674
%I Computer Sciences Department, University of Wisconsin at Madison
%D November 1986
%X Abstract: Interactive software systems should include query handlers.
Query handlers based on the relational database model are attractive
because the model provides a uniform, non-procedural approach to query
writing. Standard relational database systems require that all information
be stored in relations; however, the data structures used by existing
software systems are generally non-relational, and it is impractical to
replace them with relations.
A new kind of relations, implicit relations, and a new approach to query
evaluation based on the use of access functions allow software systems to
include relational query facilities without giving up existing non-relational
data structures. The new query-evaluation method can also be used in
traditional relational databases, and may be more efficient than traditional
evaluation methods when applied to queries that use set operations.

%A R. J. Chen
%A R. R. Meyer
%T A scaled trust region method for a class of convex optimization problems
%R Technical Report #675
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1986
%X Abstract: Piecewise-linear approximation of nonlinear convex objective
functions in linearly constrained optimization produces subproblems that
may be solved as linear programs. This approach to approximation may be
used for nonseparable as well as separable functions, and for the former
class (the focus of this paper), it lies between linear and quadratic
approximation, we consider two devices: rectangular trus regions and
dynamic scaling. The use of rectangular trust regions in conjunction with
the type of piecewise-linear approximation considered here actually serves
to simplify rather than complicate the approximating problems. This is a
result of the equivalence of the trust region and the use of a limited
number of segments in the approximation. The approach to dynamic scaling
considered here may be applied to problems in which each objective function
term is a convex function of a linear function of the variables. This
scaling device allows the algorithm to adjust the approximation between an
underestimating function (corresponding to a linear approximation) and an
overestimating function (the nonseparable analog of the overestimate
associated with separable approximation of convex functions.) The scaling
factor is adjusted in accordance with the acceptance criteria associated
with the trust region method.
Computational experience is cited for some large-scale problems arising
from traffic assignment applications. The algorithm considered here also
has the property that it allows such problems to be decomposed into a set
into a set of smaller optimization problems at each major iteration. These
smaller problems correspond to linear single-commodity networks, and may be
solved in parallel. Results are given for the distributed solution of
these problems on the CRYSTAL multicomputer.

%A Anil Allan Pal
%T Generating execution facilities for integrated programming environments
%R Technical Report #676
%I Computer Sciences Department, University of Wisconsin at Madison
%D November 1986
%O Ph.D. thesis
%X Abstract: This thesis presents an approach to the problem of generating
execution facilities for integrated programming environments from
specifications of the dynamic semantics of programming languages. The
approach is based on techniques used in semantics-directed compiler
generators, using a denotational semantic description of the language.
These techniques are adapted to the special nature of an integrated
programming environment, in particular the need to provide incremental
translation and interactive execution.
In interpreters generated using our system, programs are translated into
denotations that are represented as linked structures containing pointers
to the compiled code of denotation functions. This representation is
compact, provides reasonable execution efficiency, and is easy to maintain
incrementally during program modification.
The correspondence between the executable representation and the parse tree
of the program can be exploited to permit the user to interact with the
program at runtime in terms of source-language constructs, thus providing
support for interactive execution. We show how many of the features of
previous hand-coded integrated programming environments can be incorporated
naturally into the generated execution facilities.

%A Mitali Bhattacharyya
%A David Cohrs
%A Barton Miller
%T Implementation of a visual Unix process connector
%R Technical Report #677
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1986
%X Abstract: UPCONN is a tool used by a programmer to visually describe the
connections between the processes in a distributed program and to execute
the distributed program. With UPCONN, the implementation of processes in a
program is separated from the description of the connections between them.
The programmer is provided with a screen editor which enables processes to
be described and allows the connections between these processes to be
specified. When a distributed program is described, UPCONN allows the user
to execute the program or save the program for later use.
UPCONN has a modular design which makes it easily extendible and allows its
parts to be used independently. A library of processes and procedures is
included to perform specialized tasks, such as message monitoring and file
access. The library provides a method for adding new features to UPCONN.
Several existing Unix utilities can be linked in with UPCONN to provide a
variety of functions. UPCONN is implemented under 4.3BSD Unix and runs on
a network of homogeneous computers.

%A Ze-Nian Li
%A Leonard Uhr
%T Pyramid vision using key features to integrate image-driven bottom-up and
model-driven top-down processes
%R Technical Report #678
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1986
%X Abstract: Pyramid-like parallel hierarchical structures have been shown to
be suitable for many computer vision tasks, and to have the potential for
achieving the speeds needed for the real-time processing of real-world
images. We are developing algorithms to explore the pyramid's massively
parallel and shallowly serial-hierarchical computing ability in an
integrated system that combines both low level and higher level vision
tasks.
Micro-modular transforms are used to embody the program's knowledge of the
different objects it must recognize. This paper describes pyramid vision
programs that, starting with the image, use transforms that assess key
features to dynamically imply other feature-detecting and characterizing
transforms, and additional top-down, model-driven processes to apply.
Program performance is presented for four real-world images of buildings.
The use of key features in pyramid vision programs and the related search
and control issues are discussed. To expedite the detection of various key
features, feature-adaptable windows are developed. In addition to
image-driven bottom-up and model-driven top-down processing, lateral search
is used, and is shown to be helpful, efficient, and feasible. The results
indicate that, with the use of key features and the combination of a
variety of powerful search patterns, the pyramid-like structure is
effective and efficient for supporting parallel and hierarchical object
recognition algorithms.

%A Charles R. Dyer
%T Multiscale image understanding
%R Technical Report #679
%I Computer Sciences Department, University of Wisconsin at Madison
%D December 1986
%X Abstract: This paper reviews methods for understanding multiscale (also
called multiresolution) image descriptions. We include work related to the
construction and analysis of image representations which make explicit
properties of edges, shape and texture at multiple scales. In addition we
present two applications of multiscale techniques for model-based object
recognition and texture segmentation.

%A Yannis E. Ioannidis
%A Eugene Wong
%T Query optimization by simulated annealing
%R Technical Report #693
%I Computer Sciences Department, University of Wisconsin at Madison
%D April 1987

%A Will Winsborough
%T A minimal function graph semantics for logic programs
%R Technical Report #711
%I Computer Sciences Department, University of Wisconsin at Madison
%D August 1987
%P 19