%A J.T. Hsieh %A A.R. Pleszkun %A J.R. Goodman %T Performance evaluation of the PIPE computer architecture %R Technical Report #? %I Computer Sciences Department, University of Wisconsin at Madison %D November 1984 %A R. Agrawal %A D.J. DeWitt %T Whither hundreds of processors in a database machine? %R Technical Report #? %I Computer Sciences Department, University of Wisconsin at Madison %D January 1984 %A H. Boral %A D.J. DeWitt %T A methodology for database system performance evaluation %R Technical Report #? %I Computer Sciences Department, University of Wisconsin at Madison %D January 1984 %A R. Agrawal %A D.J. DeWitt %T Recovery architectures for multiprocessor database machines %R Technical Report #573 %I Computer Sciences Department, University of Wisconsin at Madison %D January 1985 %A B.P. Miller %T Parallelism in distributed programs: measurement and prediction %R Technical Report #574 %I Computer Sciences Department, University of Wisconsin at Madison %D January 1985 %A J.R. Goodman %A H.C. Young %T Code scheduling methods for some architectural features in PIPE %R Technical Report #579 %I Computer Sciences Department, University of Wisconsin at Madison %D February 1985 %A J.R. Goodman %T Cache memory optimization to reduce processor/memory traffic %R Technical Report #580 %I Computer Sciences Department, University of Wisconsin at Madison %D February 1985 %A H-T. Chou %A D.J. DeWitt %T An evaluation of buffer management strategies for relational database systems %R Technical Report #584 %I Computer Sciences Department, University of Wisconsin at Madison %D February 1985 %A M.J. Carey %A D.J. DeWitt %T Extensible database systems %R Technical Report #585 %I Computer Sciences Department, University of Wisconsin at Madison %D February 1985 %A M.J. Carey %A H. Lu %T Some experimental results on distributed join algorithms in a local network %R Technical Report #587 %I Computer Sciences Department, University of Wisconsin at Madison %D March 1985 %A J.T. Hsieh %A A.R. Pleszkun %A M.K. Vernon %T Performance evaluation of a pipelined VLSI architecture using the graph model of behavior (GMB) %R Technical Report #589 %I Computer Sciences Department, University of Wisconsin at Madison %D March 1985 %A R. Agrawal %A M.J. Carey %A L.W. McVoy %T The performance of alternative strategies for dealing with deadlocks in database management systems %R Technical Report #590 %I Computer Sciences Department, University of Wisconsin at Madison %D March 1985 %A B.P. Miller %T A measurement system for distributed programs %R Technical Report #592 %I Computer Sciences Department, University of Wisconsin at Madison %D April 1985 %A M.A. Holliday %A M.K. Vernon %T A generalized timed Petri net model for performance analysis %R Technical Report #593 %I Computer Sciences Department, University of Wisconsin at Madison %D May 1985 %A M.A. Holliday %A M.K. Vernon %T Exact performance estimates for multiprocessor memory and bus interference %R Technical Report #594 %I Computer Sciences Department, University of Wisconsin at Madison %D May 1985 %A R.R. Meyer %T Network optimization %R Technical Report #595 %I Computer Sciences Department, University of Wisconsin at Madison %D May 1985 %A M.L. Scott %T Design and implementation of a distributed systems language %R Technical Report #596 %I Computer Sciences Department, University of Wisconsin at Madison %D May 1985 %A H-T.
Chou %T Buffer management of database systems %R Technical Report #597 %I Computer Sciences Department, University of Wisconsin at Madison %D May 1985 %A Honesty Cheng Young %T Evaluation of a decoupled computer architecture and the design of a vector extension %R Technical Report #603 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1985 %X Abstract: To meet increasing demands for computing power, systems must exploit parallelism at all levels. A balance between the processing speeds of a machine's subsystems is critical for overall system performance. A high performance system must have a fast scalar mode as well as a vector mode which reduces the fraction of work that the scalar mode must process. PIPE (Parallel Instructions and Pipelined Execution) is a very high performance computer architecture intended for a heavily pipelined VLSI implementation. The primary goal of the PIPE architecture is fast execution of scalar programs. PIPE has several novel features in order to meet this goal. These features include architectural queues, which reduce the effective memory accessing delay; a prepare-to-branch instruction, which decreases the penalty incurred by branches; and a decoupled execution mode, which increases the instruction initiation rate. In this study, a compiler (for a subset of Pascal) has been developed to evaluate the effectiveness of these architectural features. A code scheduler takes advantage of the architectural queues and the prepare-to-branch instruction. Software pipelining (which can prefetch operands across loop boundaries) also takes advantage of these features. A code generator generates two instruction streams (of sequential tasks) that run on the two processors required for PIPE's decoupled mode. A queue-based vector extension is proposed. This extension has the following characteristics: (1) two-level control used to support out-of-order instruction initiation, (2) multiple classes of registers, (3) a very short vector start-up time (one clock period), (4) branch instructions used only for implementing high-level language control structures, (5) elements of a queue may be read repeatedly, and (6) the property of being easily vectorizable. We demonstrate that the original PIPE architecture supports fast execution of scalar programs, and that the vector extension facilitates vectorization. %A Carl de Boor %A Klaus Hollig %A Sherman Riemenschneider %T Convergence of cardinal series %R Technical Report #604 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1985 %X Abstract: The result of this paper is a generalization of our characterization of the limits of multivariate cardinal splines. Let $~~M sub n~~$ denote the $n$-fold convolution of a compactly supported function $~~M~ epsilon ~L sub 2 (R sup d )~~$ and denote by .EQ S sub n~:=~"{"sum from {j epsilon Z sup d}~c(j)M sub n ( cdot ~-~j)~:~c~ epsilon ~l sub 2 (Z sup d )"}" .EN the span of the translates of $~~M sub n$. We prove that there exists a set $~~OMEGA~~$ with $~~vol sub d ( OMEGA )~=~(2 pi ) sup d~~$ such that for any $~~f epsilon ~L sub 2 (R sup d ),$ .EQ dist(f, S sub n )~->~0~~as~~n~->~ inf , .EN if and only if the support of the Fourier transform of $f$ is contained in $~~OMEGA bar$. %A Tobin J. Lehman %A Michael J.
Carey %T A study of index structures for main memory database management systems %R Technical Report #605 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1985 %X Abstract: One approach to achieving high performance in a database management system is to store the database in main memory rather than on disk. One can then design new data structures and algorithms oriented towards making efficient use of CPU cycles and memory space rather than minimizing disk accesses and using disk space efficiently. In this paper we present some results on index structures from an ongoing study of main memory database management systems. We propose a new index structure, the T-tree, and we compare it to existing index structures in a main memory database environment. Our results indicate that the T-tree provides good overall performance in main memory. %A O. L. Mangasarian %A T.-H. Shiau %T Error bounds for monotone linear complementarity problems %R Technical Report #606 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1985 %X Abstract: We give a bound on the distance between an arbitrary point and the solution set of a monotone linear complementarity problem in terms of a condition constant which depends on the problem data only and a residual function of the violations of the complementarity problem conditions by the point considered. When the point satisfies the linear inequalities of the complementarity problem, the residual consists of the complementarity condition $~~x(Mx~+~q)~~$ plus its square root: $~~(x(Mx~+~q)) sup half $. This latter term is essential; without it the error bound cannot hold. We also show that another natural residual that has been employed to bound errors for strictly monotone linear complementarity problems fails to bound errors for the monotone case considered here. %A Pudukkottai K. Subramanian %T Iterative methods of solution for complementarity problems %R Technical Report #607 %I Computer Sciences Department, University of Wisconsin at Madison %D June 1985 %X Abstract: Many problems in optimization theory such as linear programming, quadratic programming and problems arising in diverse areas such as economic equilibria, electronic circuit simulation and free boundary problems can be formulated as complementarity problems. It is well known that where the matrices involved are large sparse matrices, the usual pivoting methods are not very efficient and sometimes even fail. This thesis is a contribution to the ongoing research in the area of iterative methods for the solution of linear and nonlinear complementarity problems. We begin by considering complementarity problems where the operators are monotone and consider their Tihonov regularizations. We obtain bounds for the solutions of the perturbed problems and, in particular, estimates for the rate of growth of these solutions. In the case of linear complementarity problems with positive semidefinite matrices, these results reduce the solution of the LCP to the solution of a sequence of positive definite symmetric quadratic programs. We propose SOR (successive overrelaxation) algorithms to solve these subproblems. In the case of complementarity problems with nonempty interior, we show that given any preassigned feasibility tolerance our algorithm solves the problem by solving a Tihonov regularization problem for a single value of the parameter. We consider monotone complementarity problems as fixed point problems.
We prove convergence of iterative algorithms which find fixed points of carefully constructed projection maps using summability methods. Since the solution of the nonlinear complementarity problem is equivalent to the solution of a system of nonlinear equations, we develop damped Gauss-Newton methods which under appropriate hypotheses solve this system with a local quadratic convergence rate. We present computational results which show that the use of SOR methods in conjunction with the Gauss-Newton methods is computationally effective. %A David P. Anderson %A Lawrence H. Landweber %T A grammar-based methodology for protocol specification and implementation %R Technical Report #608 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1985 %X Abstract: A new methodology for specifying and implementing communication protocols is presented. This methodology is based on a formalism called "Real-Time Asynchronous Grammars" (RTAG), which uses a syntax similar to that of attribute grammars to specify allowable message sequences. In addition RTAG provides mechanisms for specifying data-dependent protocol activities, real-time constraints, and concurrent activities within a protocol entity. RTAG encourages a top-down approach to protocol design that can be of significant benefit in expressing and reasoning about highly complex protocols. As an example, an RTAG specification is given for part of the Class 4 ISO Transport Protocol (TP-4). Because RTAG allows protocols to be specified at a highly detailed level, major parts of an implementation can be automatically generated from a specification. An RTAG parser can be written which, when combined with an RTAG specification of a protocol and a set of interface and utility routines, constitutes an implementation of the protocol. To demonstrate the viability of RTAG for implementation generation, an RTAG parser has been integrated into the kernel of the 4.2 BSD Unix operating system, and has been used in conjunction with the RTAG TP-4 specification to obtain an RTAG-based TP-4 implementation in the DoD Internet domain. %A Seymour V. Parter %T On the distribution of the singular values of Toeplitz matrices %R Technical Report #609 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1985 %X Abstract: In 1920, G. Szego proved a basic result concerning the distribution of the eigenvalues $"{"lambda sub j sup (n) "}"~~$ of the Toeplitz sections $~~T sub n [f]~~$ where $f( THETA )~ epsilon ~L sub inf (- pi ,~ pi )~~$ is a real-valued function. Simple examples show that this result cannot hold in the case where $~~f( THETA )~~$ is not real-valued. In this note, we give an extension of this theorem for the singular values of $T sub n [f]~~$ when $~~f( THETA )~=~f sub 0 ( THETA ) R sub 0 ( THETA )~~$ with $~~f sub 0 ( THETA )~~$ real-valued and $R sub 0 ( THETA )~~$ continuous, periodic (with period $2 pi )~$ and $~~|R sub 0 ( THETA )|~=~1$. In addition, we apply the basic theorem of Szego to resolve a question of C. Moler. %A Seymour V. Parter %T On an estimate for the three-grid MGR multigrid method %R Technical Report #610 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1985 %X Abstract: The MGR $[ nu ]$ algorithm of Ries, Trottenberg and Winter, Algorithm 2.1 of Braess and Algorithm 4.1 of Verfurth are all algorithms for the numerical solution of the discrete Poisson equation based on red-black Gauss-Seidel smoothing iterations.
In this work we consider the extension of the MGR[0] method to the general diffusion equation $~~-\(gr cdot p \(gr u~=~f$. In particular, for the three-grid scheme we extend an interesting and important result of Ries, Trottenberg and Winter, whose results are based on Fourier analysis and hence intrinsically limited to the case where $~~OMEGA~~$ is a rectangle. Let $~~OMEGA~~$ be a general polygonal domain whose sides have slope $~~+- 1~,~~0~~$ and $~~inf .~~$ Let $~~epsilon sup 0~~$ be the error before a single multigrid cycle and let $~~epsilon sup 1~~$ be the error after this cycle. Then $|| epsilon sup 1 || sub L sub h ~<= ~ 1 over 2 (1+Kh)|| epsilon sup 0 || sub L sub h~~$ where $||~~|| sub L sub h~~$ denotes the energy or operator norm. When $~~p(x,y)~==~$ constant, then $~~K~==~0$. %A Aaron Jonathan Gordon %T Ordering errors in distributed programs %R Technical Report #611 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1985 %X Abstract: With processors becoming small and inexpensive, many researchers attempt to decrease program runtime by combining processors into a multicomputer and running programs distributed over these processors. Debugging in a distributed environment is different from debugging on a uniprocessor. On a uniprocessor, the order in which a process's events occur is deterministic. In a distributed environment, events occur concurrently on different processors. The order in which events occur cannot be easily determined; a program that works correctly one time may fail subsequently if the timing between processors changes. Traditional methods of debugging (such as putting in print statements and recompiling the program or recompiling the program with a debug flag on) are inadequate since they change the program and therefore change the timing. For this research, I have investigated distributed program bugs that depend on the relative order between events. These ordering errors include events which always occur in the wrong order and events whose order of occurrence is time-dependent. In this research, I characterize these timing errors and show necessary conditions for their occurrence. Using my model of a distributed system, I prove results that can serve as suggestions to those writing distributed programs, developing distributed programming languages, and designing distributed operating systems. I then explain drawbacks to preventing ordering errors and show ways to detect them as they occur. Finally, I describe a tool (called TAP) to aid the programmer in discovering the causes of ordering errors in running programs. I also show that TAP is useful in finding other types of distributed program bugs. %A David P. Anderson %T A grammar-based methodology for protocol specification and implementation %R Technical Report #612 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1985 %X Abstract: A new methodology for specifying and implementing communication protocols is presented. This methodology is based on a formalism called "Real-Time Asynchronous Grammars" (RTAG), which uses a syntax similar to that of attribute grammars to specify allowable message sequences. In addition RTAG provides mechanisms for specifying data-dependent protocol activities, real-time constraints, and concurrent activities within a protocol entity. RTAG encourages a top-down approach to protocol design that can be of significant benefit in expressing and reasoning about highly complex protocols.
As an example, an RTAG specification is given for part of the Class 4 NBS Transport Protocol (TP-4). Because RTAG allows protocols to be specified at a highly detailed level, major parts of an implementation can be automatically generated from a specification. An RTAG parser can be written which, when combined with an RTAG specification of a protocol and a set of interface and utility routines, constitutes an implementation of the protocol. To demonstrate the viability of RTAG for implementation generation, an RTAG parser has been integrated into the kernel of the 4.2 BSD Unix operating system, and has been used in conjunction with the RTAG TP-4 specification to obtain a working TP-4 implementation in the DoD Internet environment. %A Barton P. Miller %A Cui-qing Yang %T Measuring distributed programs: a hierarchical and interactive approach %R Technical Report #613 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1985 %X Abstract: We have designed an interactive tool, called IPS, for performance measurement and analysis of distributed programs. IPS is based on a hierarchical model of distributed computation which maps the program's behavior to different levels of abstraction. The hierarchical model unifies performance data from the whole program level down to procedure and statement level. Users are able to maneuver through the hierarchy and analyze the program measurement results at various levels of detail. IPS allows the programmer to interactively evaluate the performance history of a distributed program. Because of the regular structure of the hierarchy, IPS can provide information to guide the programmer to the cause of performance bottlenecks. Critical path analysis is used, in conjunction with simple performance metrics, to direct the programmer in identifying bottlenecks. %A Yaoshuang Qu %A Lawrence H. Landweber %A Miron Livny %T Performance modeling and optimization of PaRing %R Technical Report #614 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1985 %X Abstract: This paper introduces the PaRing, a new token ring architecture, and studies analytical approaches for modeling and optimizing its performance. The PaRing is distinguished from earlier ring architectures by its ability to support concurrent multiple communication paths on a single loop in an efficient and fair manner. A multiple server queueing network model, in which a communication path is viewed as being served by a server, is used to determine packet waiting time for symmetric traffic. The key to the modeling is to convert the multiple server system into multiple independent and identical single server systems. The model is verified by simulation. The performance of the PaRing depends on the number of concurrent paths. An optimization method is developed to maximize the number of concurrent paths. %A Klaus Hollig %A Charles A. Micchelli %T Divided differences, hyperbolic equations and lifting distributions %R Technical Report #615 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1985 %X Abstract: We discuss the relationship between divided differences, fundamental functions of hyperbolic equations, multivariate interpolation and polyhedral splines. %A Yaoshuang Qu %A Lawrence H.
Landweber %A Miron Livny %T PaRing: a token ring local area network with concurrency %R Technical Report #616 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1985 %X Abstract: In this paper we introduce the PaRing, a token ring architecture. The PaRing is distinguished from earlier ring architectures by its ability to concurrently support multiple communication paths on a single loop in an efficient and fair manner. This is accomplished by allowing more than one station to transmit a packet at a time. The result is higher throughput and shorter response time than is the case for the standard token ring or the register insertion ring. %A Koujuch Liou %T Design of pipelined memory systems for decoupled architectures %R Technical Report #617 %I Computer Sciences Department, University of Wisconsin at Madison %D October 1985 %X Abstract: Design of memory systems for decoupled architectures is the theme of this dissertation. The decoupled architecture uses hardware queues to architecturally decouple memory request generation from algorithmic computation. This results in an implementation that has two separate instruction streams that communicate via hardware queues. Thus, performance is improved through parallelism and efficient memory referencing. Techniques for increasing memory bandwidth and algorithms for servicing memory requests are incorporated into the memory system designs within these two constraints: (1) the operands placed in the hardware queue must be in the correct order, and (2) the needed operands are the only operands that can be placed in the hardware queue. Techniques such as pipelining, interleaving, servicing requests out of arrival order, and cache memory are investigated. Two strategies for servicing memory requests are studied: (1) to service requests according to their priorities, and (2) to minimize the total request service time. For the first strategy, the priority of each request type is derived from the characteristics of memory reference and possible bottleneck during decoupled computations. The second strategy results in a request scheduling policy, Free-Module-Request-First, that is proven to be able to minimize the total request service time. A sequence control scheme must be used with the Free-Module-Request-First scheduling policy in order to deliver the memory outputs to the hardware queue in the correct order. This sequence control scheme is also used to track cache hits and misses, so that a data cache can be implemented in the memory system without difficulty. The designed data cache can not only support flexible fetch and replacement cache algorithms, but it can also detect memory access hazards and short-circuit the Read-After-Write requests. Therefore, the penalty of memory access hazards can be greatly reduced. The combination of the designed data cache and the pipelined interleaved memory system using the Free-Module-Request-First scheduling policy results in a high-performance memory system that is capable of servicing memory requests with nearly no conflict delay under the particular workload defined in the trace files. %A Mary K. Vernon %A Mark A.
Holliday %T Performance analysis of multiprocessor cache consistency protocols using generalized timed Petri nets %R Technical Report #618 %I Computer Sciences Department, University of Wisconsin at Madison %D November 1985 %X Abstract: We use an exact analytical technique, based on Generalized Timed Petri Nets (GTPNs), to study the performance of shared bus cache consistency protocols for multiprocessors. We develop a general framework within which the key characteristics of the Write-Once protocol and four enhancements that have been combined in various ways in the literature can be identified and evaluated. We then quantitatively assess the performance gains for each of the four enhancements. We consider three levels of data sharing in our workload models. One of the enhancements substantially improves system performance in all cases. Two enhancements are shown to have negligible effect over the range of workloads analyzed. The fourth enhancement shows a small improvement for low levels of sharing, but shows more substantial improvement as sharing is increased, if we assume a "good access pattern". The effects of two architectural parameters, the blocksize and the main memory cycle time, are also considered. %A Wei-Chung Hsu %T Register allocation for VLSI processors %R Technical Report #619 %I Computer Sciences Department, University of Wisconsin at Madison %D November 1985 %X Abstract: The performance of a single-chip CPU is often restricted by limited off-chip memory bandwidth. In this report, we study how to effectively use registers - one kind of on-chip memory - to reduce off-chip memory access. To use a limited number of registers efficiently, a good register allocation should handle spilling effectively. We study the minimization of Load/Store instructions in local register allocation. Both an optimal and an efficient heuristic algorithm are proposed for local register allocation. We also suggest the use of a trace optimization technique for global register allocation. %A Klaus Hollig %T Multivariate splines %R Technical Report #620 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1985 %X Abstract: Multivariate B-splines are defined as volume densities of convex polyhedra. Two special cases, simplex splines and box splines, give rise to natural generalizations of univariate spline functions. While simplex splines yield smooth piecewise polynomial spaces on fairly general triangular meshes, box splines correspond to regular triangulations and share many of the computationally attractive features of tensor products. In this paper, the basic properties of these new classes of spline functions are discussed as well as their application to surface approximation. %A Noga Alon %A Amnon Barak %A Udi Manber %T On disseminating information reliably without broadcasting %R Technical Report #621 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1985 %X Abstract: A general scheme for collecting information in a multicomputer system is presented. The scheme allows data to be exchanged efficiently and quickly among many machines without broadcasting. In addition, certain operations can be performed on the data while it is being collected. Several applications to distributed computing are discussed. As a byproduct of the combinatorial part of the paper, a problem of Babai and Erdos in combinatorial group theory is solved.
%A Carl de Boor %A Klaus Hollig %T B-splines without divided differences %R Technical Report #622 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1985 %X Abstract: This note develops the basic B-spline theory without using divided differences. Instead, the starting point is the definition of B-splines via recurrence relations. This approach yields very simple derivations of basic properties of spline functions and algorithms. %A David Kamowitz %T SOR and MGR[$nu$] experiments on the Crystal multicomputer %R Technical Report #623 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1985 %X Abstract: This report describes implementations of the red/black SOR algorithm and of the MGR[$nu$] multigrid algorithm on the Crystal multicomputer. Rates of convergence and observed efficiencies for both algorithms are compared. %A Hongjun Lu %T Distributed query processing with load balancing in local networks %R Technical Report #624 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1985 %X Abstract: This thesis presents a new approach to distributed query processing in locally distributed database systems, load-balanced query processing (LBQP), which integrates distributed query processing and load balancing. Several observations about previous research work in distributed query processing motivated this study. First, only a few query processing algorithms have been developed specifically for distributed databases based on local networks. Second, the use of multiple copies of data to improve performance in a distributed database system has not been addressed by most existing algorithms. Finally, and perhaps most importantly, existing query optimization algorithms have considered only the static characteristics of a distributed database. The algorithms reported here consider the dynamic load status of the system and the existence of multiple copies of data to provide better performance than is achievable with purely static planning techniques. The dynamic query allocation problem for a distributed database system with fully-replicated data was studied first using simulation. Two new heuristic algorithms were proposed for dynamically choosing a processing site for a newly arrived query in a local network environment. Both of these heuristics use knowledge about queries obtained during query optimization, such as their estimated CPU and I/O requirements. A simulation model was developed and used to study these heuristics and compare them to an algorithm that simply balances the number of jobs at each site. The results of this study indicate that knowledge of query processing requirements can be used effectively to improve overall system performance through dynamic query allocation. In order to obtain empirical results regarding distributed query processing in local area networks, a testbed was built using an experimental distributed system, the Crystal multicomputer, to conduct experiments on the performance of distributed join algorithms. Eight different distributed join methods were implemented using the testbed. Join queries with a variety of relation sizes, join selectivities, and join column value distributions were experimentally studied. The performance results obtained indicate that pipelined join methods outperform sequential methods over a wide range of join queries. 
It was also found that the communications costs in a local network environment are certainly not a dominant factor with respect to performance. A three-phase load-balanced query processing algorithm, Algorithm LBQP, was developed based on these experimental results and the results of the study of dynamic query allocation. This algorithm first statically generates a processing plan for a query in a locally distributed database system, ignoring the physical storage sites of the relations referenced by the query. A dynamic query unit allocation algorithm is then applied to the plan to determine the processing sites for each relation. Finally, specific processing methods for the distributed joins in the resulting plan are selected. A model of distributed database systems with partially-replicated data was used to investigate the performance of the dynamic query unit allocation algorithm of LBQP. The results show significant improvements in performance, including improvements in both the mean waiting time for queries and the overall system throughput. %A Michael J. Carey %A Hongjun Lu %T Load balancing in a locally distributed database system %R Technical Report #625 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1985 %X Abstract: Most previous work on query optimization in distributed database systems has focused on finding optimal or near-optimal processing plans based solely on static system characteristics, and few researchers have addressed the problem of copy selection when data is replicated. This paper describes a new approach to query processing for locally distributed database systems. Our approach uses load information to select the processing site(s) for a query, dynamically choosing from among those sites that have copies of relations referenced by the query. Query compilation is used to produce a statically-optimized logical plan for the query, and then a dynamic optimization phase converts this logical plan into an executable physical plan at run-time. This paper motivates the separation of static and dynamic optimization, presents algorithms for the various phases of the optimization process, and describes a simulation study that was undertaken to investigate the performance of this approach. Our simulation results indicate that load-balanced query processing can provide improvements in both query response times and overall system throughput as compared to schemes where execution sites are either statically or randomly selected. %A R. R. Meyer %T Trust region methods for piecewise-linear approximation %R Technical Report #626 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1985 %X Abstract: Trust region methods are analyzed for piecewise-linear approximation algorithms for linearly-constrained nonlinear optimization. Convergence to the optimal value is demonstrated for continuously differentiable convex objectives and for certain classes of nondifferentiable convex objectives. Computationally, this approach has the nice property that the approximation is generally more accurate than a linear approximation, yet the subproblems to be solved at each iteration remain linear programs. The method is also well-suited to convex network optimization, since it preserves the network structure of the subproblems, thereby allowing the use of the very fast techniques available for linear network optimization.
For problem classes such as traffic assignment in which the critical coupling between variables occurs in the objective function, the separability of the approximation makes possible a decomposition into independent subproblems that may be solved in parallel in a multiprocessing environment. %A W. Harry Plantinga %A Charles R. Dyer %T An algorithm for constructing the aspect graph %R Technical Report #627 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1985 %X Abstract: In this paper we consider the size of aspect graphs, first in the convex case, and then in the general case. In particular, we give upper and lower bounds on the maximum size (including vertex labels) of $~ THETA (n sup 3 )~$ and $~~ THETA (n sup 5 )~~$ respectively, and algorithms for constructing the aspect graph which run in time $~~O (n sup 4 )~~$ and $~~O (n sup 6 )~~$. The algorithm for the general case makes use of a new object representation called the aspect representation or asp. We also show a different way to label the aspect graph in order to save a factor of n in the asymptotic size and construction time (at the expense of retrieval time) in both the convex and general cases, and we suggest alternatives to the aspect graph which require less space and store more information. %A Sheldon Klein %T The invention of computationally plausible knowledge systems in the upper paleolithic %R Technical Report #628 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1985 %X Abstract: The problem of computing human behavior by rules can become intractable with large scale knowledge systems if the human brain, like a computer, is a finite state automaton. The problem of making such computations at a pace fast enough for ordinary social interaction can be solved if appropriate constraints apply to the structure of those rules. There is evidence that systems of such constraints were invented in the Upper Paleolithic, and were of sufficient power to guarantee that the time necessary for computation of behavior would increase only linearly with increases in the size and heterogeneity of world knowledge systems. Fundamentally, there was just one type of computational invention, capable of unifying the full range of human sensory domains, and consisting of an analogical reasoning method in combination with a global classification scheme. The invention may have been responsible for the elaboration of language and culture structures in a process of co-evolution. The encoding of the analogical mechanism in iconic visual imagery and myth structures may have given rise to the phenomenon of Shamanism. The theory is testable, and one of its implications is that the structuralism of Levi-Strauss has an empirical foundation. %A Jiawei Han %T Pattern-based and knowledge-directed query compilation for recursive data bases %R Technical Report #629 %I Computer Sciences Department, University of Wisconsin at Madison %D January 1986 %X Abstract: Expert database systems (EDS's) comprise an interesting class of computer systems which represent a confluence of research in artificial intelligence, logic, and database management systems. They involve knowledge-directed processing of large volumes of shared information and constitute a new generation of knowledge management systems. Our research is on the deductive augmentation of relational database systems, especially on the efficient realization of recursion. 
We study the compilation and processing of recursive rules in relational database systems, investigating two related approaches: pattern-based recursive rule compilation and knowledge-directed recursive rule compilation and planning. Pattern-based recursive rule compilation is a method of compiling and processing recursive rules based on their recursion patterns. We classify recursive rules according to their processing complexity and develop three kinds of algorithms for compiling and processing different classes of recursive rules: transitive closure algorithms, SLSR wavefront algorithms, and stack-directed compilation algorithms. These algorithms, though distinct, are closely related. The more complex algorithms are generalizations of the simpler ones, and all apply the heuristics of performing selection first and utilizing previous processing results (wavefronts) in reducing query processing costs. The algorithms are formally described and verified, and important aspects of their behavior are analyzed and experimentally tested. To further improve search efficiency, a knowledge-directed recursive rule compilation and planning technique is introduced. We analyze the issues raised for the compilation of recursive rules and propose to deal with them by incorporating functional definitions, domain-specific knowledge, query constants, and a planning technique. A prototype knowledge-directed relational planner, RELPLAN, which maintains a high-level user view and query interface, has been designed and implemented, and experiments with the prototype are reported and illustrated. %Y Cost: $5.70 %A Raphael Finkel %A A. P. Anantharaman %A Sandip Dasgupta %A Tarak S. Goradia %A Prasanna Kaikini %A Chun-Pui Ng %A Murali Subbarao %A G. A. Venkatesh %A Sudhanshu Verma %A Kumar A. Vora %T Experience with Crystal, Charlotte and Lynx %R Technical Report #630 %I Computer Sciences Department, University of Wisconsin at Madison %D February 1986 %X Abstract: This paper describes the most recent implementations of distributed algorithms at Wisconsin that use the Crystal multicomputer, the Charlotte operating system, and the Lynx language. This environment is an experimental testbed for design of such algorithms. Our report is meant to show the range of applications that we have found reasonable in such an environment and to give some of the flavor of the algorithms that have been developed. We do not claim that the algorithms are the best possible for these problems, although they have been designed with some care. In several cases they are completely new or represent significant modifications of existing algorithms. We present distributed implementations of B trees, systolic arrays, Prolog tree search, the travelling salesman problem, incremental spanning trees, nearest-neighbor search in k-d trees, and the Waltz constraint-propagation algorithm. Our conclusion is that the environment, although only recently available, is already a valuable resource and will continue to grow in importance in developing new algorithms. %Y Cost: $1.44 %A O. L. Mangasarian %A R.
De Leone %T Error bounds for strongly convex programs and (super)linearly convergent iterative schemes for the least 2-norm solution of linear programs %R Technical Report #631 %I Computer Sciences Department, University of Wisconsin at Madison %D February 1986 %X Abstract: Given an arbitrary point (x,u) in $R sup n ~times~ R sub + sup m$, we give bounds on the Euclidean distance between x and the unique solution $x bar$ of a strongly convex program in terms of the violations of the Karush-Kuhn-Tucker conditions by the arbitrary point (x,u). These bounds are then used to derive linearly and superlinearly convergent iterative schemes for obtaining the unique least 2-norm solution of a linear program. These schemes can be used effectively in conjunction with the successive overrelaxation (SOR) methods for solving very large sparse linear programs. %Y Cost: $0.00 %A Yeshayahu Artsy %A Hung-Yang Chang %A Raphael Finkel %T Interprocess communication in Charlotte %R Technical Report #632 %I Computer Sciences Department, University of Wisconsin at Madison %D February 1986 %X Abstract: Charlotte is a distributed operating system that provides a powerful interprocess communication mechanism. This mechanism differs from most others in that it employs duplex links, does not buffer messages in the kernel, and does not block on communication requests. A link is a bound communication channel between two processes upon which messages can be sent, received, awaited, or canceled. Processes may acquire new links, destroy their end of the link, or enclose their end as part of an outgoing message. A link is thus a dynamic capability-like entity that permits only the holder access to its communication resource. We first introduce the Charlotte interprocess communication structure and discuss our motivation for choosing the specific operations. We then describe distributed computing environments in which Charlotte is applicable. Next, we discuss how we implemented the interprocess-communication facility. We close with a comparison between Charlotte and related research and justify our design and its implementation cost. %Y Cost: $0.00 %A Hongjun Lu %A Michael J. Carey %T Load-balanced task allocation in locally distributed computer systems %R Technical Report #633 %I Computer Sciences Department, University of Wisconsin at Madison %D February 1986 %X Abstract: A task force is a distributed program which consists of a set of communicating tasks. This paper presents an algorithm for allocating the tasks in a task force to a set of execution sites in a locally distributed computer system. The main objective of this dynamic task allocation algorithm is to balance the load of the system, and the algorithm takes the current load status of the system into account when making its allocation decisions. In addition to balancing the system's load, the algorithm also tries to minimize the communications costs between the tasks to the extent possible within the constraints imposed by load balancing. The paper begins by quantitatively defining the load unbalance factor, and the problem of load-balanced task allocation is then defined. A heuristic algorithm for finding solutions to the problem is presented, and allocations generated by the heuristic algorithm are compared to those obtained via exhaustive search. The results of this comparison indicate that the heuristic algorithm finds the optimal allocation in most cases. The execution time and the complexity of the heuristic task allocation algorithm are also addressed.
%Y Cost: $0.00 %A Seymour V. Parter %T Remarks on multigrid convergence theorems %R Technical Report #634 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1986 %X Abstract: Multigrid has become an important iterative method for the solution of discrete elliptic equations. However, there is much to be done in the theory of convergence proofs. At the present time there are two general two-level approaches to convergence proofs: an algebraic approach and a duality approach. While these theories do not give sharp estimates, they provide good, general, rigorous convergence theorems. In this note we study the relationship between these theories. While the approach and thought-process leading to these theories are different, the results are essentially the same. Indeed, the basic estimates required by these theories are the same. %A David J. DeWitt %A Robert H. Gerber %A Goetz Graefe %A Michael L. Heytens %A Krishna B. Kumar %A M. Muralikrishna %T GAMMA: a high performance dataflow database machine %R Technical Report #635 %I Computer Sciences Department, University of Wisconsin at Madison %D March 1986 %X Abstract: In this paper, we present the design, implementation techniques, and initial performance evaluation of Gamma. Gamma is a new relational database machine that exploits dataflow query processing techniques. Gamma is a fully operational prototype consisting of 20 VAX 11/750 computers. The design of Gamma is based on what we learned from building our earlier multiprocessor database machine prototype (DIRECT) and several years of subsequent research on the problems raised by the DIRECT prototype. In addition to demonstrating that parallelism can really be made to work in a database machine context, the Gamma prototype shows how parallelism can be controlled with minimal control overhead through a combination of the use of algorithms based on hashing and the pipelining of data between processes. Except for 2 messages to initiate each operator of a query tree and 1 message when the operator terminates, the execution of a query is entirely self-scheduling. %Y Cost: $1.80 %A Eric Norman %T Tracking the elusive eureka %R Technical Report #636 %I Computer Sciences Department, University of Wisconsin at Madison %D March 1986 %X Abstract: The algebraic approach to computation of FP is used to transform recursive definitions of the Fibonacci and factorial functions into iterative code. The approach is similar to the one used to solve many differential equations and also similar to what many call "higher order" or "semantic" unification. An algorithmic procedure is not obtained; the intent is to demonstrate the usefulness of the algebraic approach. %Y Cost: $0.00 %A Tobin J. Lehman %A Michael J. Carey %T Query processing in main memory database management systems %R Technical Report #637 %I Computer Sciences Department, University of Wisconsin at Madison %D March 1986 %X Abstract: Most previous work in the area of main memory database systems has focused on the problem of developing query processing techniques that work well with a very large buffer pool. In this paper, we address query processing issues for memory resident relational databases, an environment with a very different set of costs and priorities. We present an architecture for a main memory DBMS, discussing the ways in which a memory resident database differs from a disk-based database.
We then address the problem of processing relational queries in this architecture, considering alternative algorithms for selection, projection, and join operations and studying their performance. We show that a new index structure, the T Tree, works well for selection and join processing in memory resident databases. We also show that hashing methods work well for processing projections and joins, and that an old join method, sort-merge, still has a place in main memory. %Y Cost: $0.00 %A Michael J. Carey %A David J. DeWitt %A Joel E. Richardson %A Eugene J. Shekita %T Object and file management in the EXODUS extensible database system %R Technical Report #638 %I Computer Sciences Department, University of Wisconsin at Madison %D March 1986 %X Abstract: This paper describes the design of the object-oriented storage component of EXODUS, an extensible database management system currently under development at the University of Wisconsin. The basic abstraction in the EXODUS storage system is the storage object, an uninterpreted variable-length record of arbitrary size; higher level abstractions such as records and indices are supported via the storage object abstraction. One of the key design features described here is a scheme for managing large dynamic objects, as storage objects can occupy many disk pages and can grow or shrink at arbitrary points. The data structure and algorithms used to support such objects are described, and performance results from a preliminary prototype of the EXODUS large-object management scheme are presented. A scheme for maintaining versions of large objects is also described. We then describe the file structure used in the EXODUS storage system, which provides a mechanism for grouping and sequencing through a set of related storage objects. In addition to object and file management, we discuss the EXODUS approach to buffer management, concurrency control, and recovery, both for small and large objects. %Y Cost: $0.00 %A Mark A. Holliday %A Mary K. Vernon %T The GTPN analyzer: numerical methods and user interface %R Technical Report #639 %I Computer Sciences Department, University of Wisconsin at Madison %D April 1986 %X Abstract: The GTPN (Generalized Timed Petri Net) is a performance model based on Petri Nets. The state space for a model of the system is automatically constructed and analyzed using results from Markov chain theory. We address some of the key computational issues involved in the Markov chain theory approach. In particular, we discuss two types of algorithms. The first type computes the absorption probabilities and mean time to absorption. The second type computes the steady state probability distribution for a single, possibly periodic, recurrent Markov chain class. We also describe the GTPN's user interface for input of the model description, execution of the tool, and the output of the performance results. %Y Cost: $2.28 %A Klaus Hollig %T Box splines %R Technical Report #640 %I Computer Sciences Department, University of Wisconsin at Madison %D April 1986 %X Abstract: This report gives an introduction to multivariate cardinal spline theory. It is based on joint work with Carl de Boor and Sherman Riemenschneider, and the particular topics discussed are: recurrence relations for box splines, approximation order, interpolation, convergence to functions of exponential type and subdivision algorithms. %Y Cost: $0.00 %A Richard A.
Brualdi %A Rachel Manber %T On strong digraphs with a unique minimally strong subdigraph %R Technical Report #641 %I Computer Sciences Department, University of Wisconsin at Madison %D April 1986 %X Abstract: In this paper we determine the maximum number of edges that a strong digraph can have if it has a unique minimally strong subdigraph. We show that this number equals n (n + 1)/2 + 1, a surprisingly large number. Furthermore, we show that there is, up to an isomorphism, a unique strong digraph which attains this maximum. %Y Cost: $0.00 %A Michael D. Chang %A Michael Engquist %A Raphael Finkel %A Robert Meyer %T A parallel algorithm for generalized networks %R Technical Report #642 %I Computer Sciences Department, University of Wisconsin at Madison %D June 1986 %X Abstract: A parallel algorithm for the solution of linear generalized network optimization problems is presented. This method decomposes the original network into initial collections of quasi-trees that are distributed among a set of processors that optimize in parallel their corresponding "local" problems. Periodically, the processors exchange information on their local problems, and one or more quasi-trees may be moved between processors as desirable links between quasi-trees on different processors are determined, or for load-balancing purposes. This procedure has been implemented on the CRYSTAL multicomputer, and computational results have been obtained on problems with up to 28,000 variables. For problems whose optimal solutions contain numerous quasi-trees, the efficiency of this parallel implementation is about 50%. %Y Cost: $0.00 %A Charles V. Stewart %A Charles R. Dyer %T Convolution algorithms on the pipelined image-processing engine %R Technical Report #643 %I Computer Sciences Department, University of Wisconsin at Madison %D May 1986 %X Abstract: In this paper we present two algorithms for computing an N by N convolution on the Pipelined Image-Processing Engine (PIPE). PIPE is a special-purpose machine for low-level vision consisting of eight processing stages connected in a pipelined fashion. Each stage can compute a number of different basic image functions. Each of the algorithms decomposes the solution into a sequence of 3 by 3 neighborhood operations, shifts and additions. The first algorithm divides the results of the 3 by 3 partial convolutions into groups of concentric rings of images and successively collapses the images on the outer ring onto the next ring, merging the results until a single result image is computed. The second algorithm divides the partial convolution images into eight sectors and computes each sector independently. For a 27 by 27 convolution of a 256 by 256 image, the first algorithm requires 0.633 seconds, while the second requires 0.683 seconds. These algorithms for PIPE are also shown to compare favorably with algorithms for arbitrary-sized kernels on fast general-purpose hardware, the MPP and Warp. %Y Cost: $0.00 %A Michael J. Carey %A David J. DeWitt %A Daniel Frank %A Goetz Graefe %A Joel E. Richardson %A Eugene J. Shekita %A M.
Muralikrishna %T The architecture of the EXODUS extensible DBMS: a preliminary report %R Technical Report #644 %I Computer Sciences Department, University of Wisconsin at Madison %D May 1986 %X Abstract: With non-traditional application areas such as engineering design, image/voice data management, scientific/statistical applications, and artificial intelligence systems all clamoring for ways to store and efficiently process larger and larger volumes of data, it is clear that traditional database technology has been pushed to its limits. It also seems clear that no single database system will be capable of simultaneously meeting the functionality and performance requirements of such a diverse set of applications. In this paper we describe the preliminary design of EXODUS, an extensible database system that will facilitate the fast development of high-performance, application-specific database systems. EXODUS provides certain kernel facilities, including a versatile storage manager and a type manager. In addition, it provides an architectural framework for building application-specific database systems, tools to partially automate the generation of such systems, and libraries of software components (e.g., access methods) that are likely to be useful for many application domains. %Y Cost: $0.00 %A Klaus Hollig %T Geometric continuity of spline curves and surfaces %R Technical Report #645 %I Computer Sciences Department, University of Wisconsin at Madison %D June 1986 %X Abstract: We review $beta$-spline theory for curves and show how some of the concepts can be extended to surfaces. Our approach is based on the Bezier form for piecewise polynomials which yields simple geometric characterizations of smoothness constraints and shape parameters. For curves, most of the standard "spline calculus" has been developed. We discuss in particular the construction of B-splines, the conversion from B-spline to Bezier representation, and interpolation algorithms. A comparable theory for spline surfaces for general meshes does not at present exist. We merely describe how to join triangular and rectangular patches and discuss the corresponding $beta$-spline constraints in terms of the Bezier representation. %Y Cost: $0.00 %A Leonard Uhr %T Parallel, hierarchical software/hardware pyramid architectures %R Technical Report #646 %I Computer Sciences Department, University of Wisconsin at Madison %D June 1986 %X Abstract: This paper examines pyramid-structured software/hardware systems, both programs and multi-computers. It explores how efficient parallel computers can be designed. It poses the extremely difficult problem of perception, and briefly surveys the major alternative approaches. It examines the basic structures with which pyramids can be built, and describes pyramid multi-computers. It sketches out how pyramid hardware can be used to execute programs, with special emphasis on the "recognition cone" structures being developed to model living visual systems. It briefly suggests how pyramids can be augmented and embedded into larger software/hardware structures. In addition, it gives a short history of massively parallel hierarchical pyramid structures. %Y Cost: $0.00 %A O.L. Mangasarian %A R.
De Leone %T Parallel successive overrelaxation methods for symmetric linear complementarity problems and linear programs %R Technical Report #647 %I Computer Sciences Department, University of Wisconsin at Madison %D June 1986 %X Abstract: A parallel successive overrelaxation (SOR) method is proposed for the solution of the fundamental symmetric linear complementarity problem. Convergence is established under a relaxation factor which approaches the classical value of 2 for a loosely coupled problem. The parallel SOR approach is then applied to solve the symmetric linear complementarity problem associated with the least norm solution of a linear program. (A sequential sketch of the underlying iteration appears below.) %A Barton P. Miller %A Jong-Deok Choi %T Breakpoints and halting in distributed programs %R Technical Report #648 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1986 %X Abstract: Interactive debugging requires that the programmer be able to halt a program at interesting points in its execution. This paper presents an algorithm for halting a distributed program in a consistent state, and presents a definition of distributed breakpoints with an algorithm for implementing the detection of these breakpoints. The Halting Algorithm extends Chandy and Lamport's algorithm for recording global state and solves the problem of processes that are not fully connected or frequently communicating. The definition of distributed breakpoints is based on those events that can be detected in a distributed system. Events that can be partially ordered are detectable and form the basis for the breakpoint predicates, and from the breakpoint definition comes the description of an algorithm that can be used in a distributed debugger to detect these breakpoints. %A Raphael Finkel %A Bahman Barzideh %A Chandreshekhar W. Bhide %A Man-On Lam %A Donald Nelson %A Ramesh Polisetty %A Sriram Rajaraman %A Igor Steinberg %A G. A. Venkatesh %T Experience with Crystal, Charlotte and Lynx: second report %R Technical Report #649 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1986 %X Abstract: This paper describes several recent implementations of distributed algorithms at Wisconsin that use the Crystal multicomputer, the Charlotte operating system, and the Lynx language. This environment is an experimental testbed for design of such algorithms. Our report is meant to show the range of applications that we have found reasonable in such an environment and to give some of the flavor of the algorithms that have been developed. We do not claim that the algorithms are the best possible for these problems, although they have been designed with some care. In several cases they are completely new or represent significant modifications of existing algorithms. We present distributed implementations of PLA folding, a heuristic for the travelling-salesman problem, incremental update of spanning trees, ray tracing, the simplex method, and the Linda programming language. Together with our previous report, this paper leads us to conclude that the environment is a valuable resource and will continue to grow in importance in developing new algorithms. %A Barton P. Miller %A David L. Presotto %A Michael L. Powell %T DEMOS/MP: the development of a distributed operating system %R Technical Report #650 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1986 %X Abstract: The DEMOS/MP operating system has moved from a supercomputer with a simple addressing structure to a network of microcomputers.
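
The basic iteration that Technical Report #647 above parallelizes is the classical projected SOR sweep for the symmetric linear complementarity problem: find z >= 0 with Mz + q >= 0 and z'(Mz + q) = 0. A minimal sequential sketch in C, with all names chosen here for illustration, might read:

    #include <math.h>

    /* One projected SOR sweep: M is n x n, symmetric, with positive
       diagonal, stored row-major; 0 < omega < 2 is the relaxation
       factor.  Returns the largest change made to any component of z. */
    double psor_sweep(int n, const double *M, const double *q,
                      double *z, double omega)
    {
        double delta = 0.0;
        for (int i = 0; i < n; i++) {
            double r = q[i];
            for (int j = 0; j < n; j++)
                r += M[i * n + j] * z[j];            /* r = (Mz + q)_i */
            double znew = z[i] - omega * r / M[i * n + i];
            if (znew < 0.0)
                znew = 0.0;                          /* project onto z >= 0 */
            if (fabs(znew - z[i]) > delta)
                delta = fabs(znew - z[i]);
            z[i] = znew;
        }
        return delta;
    }

Sweeps are repeated until the returned change falls below a tolerance; the report's contribution lies in distributing such sweeps across processors and in the convergence theory for relaxation factors approaching 2, neither of which this sketch attempts.
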
This transformation was done without significant changes to the semantics of the original DEMOS, i.e., existing DEMOS programs should run on DEMOS/MP. The changes to DEMOS were simplified by the structure of its primitive objects and the functions over those objects. The structure of DEMOS links and processes was the major contributor to this simplicity. The changes made to produce DEMOS/MP involved the internal structure of links, modifications to parts of the kernel, and limited changes to the various system processes. %A Leonard Uhr %T Toward a computational information-processing model of object perception %R Technical Report #651 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1986 %X Abstract: Integrating What Is Known Into a Running Model. This paper describes the first version of an information-processing model (to be programmed for an appropriately structured parallel-serial multi-computer network) for the visual recognition of complex objects. The model will be run to exhibit its behavior, and tested, evaluated, and compared with living visual systems, and with other models. It should also serve, where the brain's mechanisms are not yet known, as a test-bed to evaluate alternate possible structures. Such an enterprise now appears to be feasible. Consider these highlights of pertinent facts: (1) The retina and primary visual cortex, with their massively parallel and to some extent serial structure of processes, detect spots, colors, families of oriented edges, and other features. (2) Much is now known about the cortex's columnar structure, and the topology of links within and between columns, hypercolumns, modules, and areas. (3) Roughly 20 cortical visual areas have been discovered. A great deal is known about the many links between them, the way they map information, and the processes each effects. (4) The recent firm evidence for neurons in the temporal lobe that respond selectively, in 70 to 200 msec, to faces, different particular faces, complex parts of faces, and other complex objects strongly suggests that these neurons are involved at a culminating stage in the complex structure of processes that perceives patterned objects. Temporal lobe lesions make complex objects like faces unrecognizable, while leaving other visual processes largely undisturbed. (5) The brain's massive parallelism makes this enormous speed possible. The relatively slow msec response times of its basic computing elements, the neurons, mean that the maximum possible serial depth of processes is a few dozen to a few hundred at most. This paper first organizes key known facts about the visual system and describes the model. (Later sections give more detail, and pinpoint important facts still unknown.) It also briefly examines this model's relation to parallel computer architectures, to other models for visual perception, and to computer vision programs, emphasizing those from which this model grew. The goal is to develop a model/theory that exhibits the brain/mind's behavior on difficult recognition tasks, and suggests plausible mechanisms where facts are not yet uncovered, or firm. The running program should show that the model is precise and consistent, and how well the separate parts work together. It should suggest new questions and experiments. This first version incorporates only what appears essential to achieve a simple yet realistic working model. The final discussion examines possible improvements. %A Mark A.
Holliday %T Deterministic time and analytical models of parallel architectures %R Technical Report #652 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1986 %O Ph.D. thesis %X Abstract: In parallel computer architectures many events are of constant duration. For example, a fetch of a cache block, ignoring resource contention, takes a fixed number of clock cycles. Analytical performance modeling techniques that include deterministic time have previously been proposed. However, serious restrictions on the set of allowed models are imposed in these previous techniques. In this dissertation we extend these previous modeling techniques by removing those restrictions. Those restrictions fall into two classes: those involving the construction of the state space and those involving the analysis of the state space. We propose interpretations for performance measures when those restrictions are removed. In this general case, the state space represents an arbitrary, finite state, discrete parameter Markov Chain. We also present algorithms that efficiently construct and analyze the state space in the general case. Our technique is called Generalized Timed Petri Nets (GTPN). It has been implemented in a tool and has been used to develop models for several interesting architectures. The two most important studies examine bus-based multiprocessors. Performance degradation due to memory and bus interference in multiprocessors with a single-stage interconnection network has been frequently examined. Using the GTPN we are able to derive exact performance estimates in the important case when memory access times are constant and interrequest times are non-zero. Previously only approximate estimates and simulation results existed. Caches are an important means for reducing memory contention in bus-based multiprocessors. Our second study is the first analytical performance comparison of the key features of protocols that maintain cache consistency when a single shared bus is assumed. %A Raphael Finkel %A Michael L. Scott %A William K. Kalsow %A Yeshayahu Artsy %A Hung-Yang Chang %A Prasun Dewan %A Aaron J. Gordon %A Bryan Rosenburg %A Marvin H. Solomon %A Cui-Qing Yang %T Experience with Charlotte: simplicity versus function in a distributed operating system %R Technical Report #653 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1986 %X Abstract: This paper presents a retrospective view of the Charlotte distributed operating system, which is intended as a testbed for developing techniques and tools for exploiting large-grain parallelism to solve computation-intensive problems. Charlotte was constructed over the course of approximately 5 years, going through several distinct versions as the underlying hardware and our ideas for implementation changed. Charlotte rests on several underlying design decisions: (1) it is a software layer on the Crystal multicomputer, (2) processes do not share memory, (3) communication is on reliable, symmetric, bi-directional paths named by capabilities, and (4) absolute information is stored at each end of a communication path. Our implementation taught us that our dual goals of simplicity and function were not easily reached. In particular, the issue of simplicity is quite complex; quests for simplicity in various areas often conflict with each other. 
This paper explores how the design decisions we made to satisfy our goals incurred implementation cost and required extra levels of software, but resulted in a high-quality testbed for experimentation in distributed algorithm design. %A Seymour V. Parter %T Estimates for multigrid methods based on red-black Gauss-Seidel smoothings %R Technical Report #654 %I Computer Sciences Department, University of Wisconsin at Madison %D July 1986 %X Abstract: The MGR[v] algorithms of Ries, Trottenberg and Winter, the algorithms 2.1 and 6.1 of Braess and the Algorithm 4.1 of Verfurth are all multigrid algorithms for the solution of the discrete Poisson equation (with Dirichlet boundary conditions) based on red-black Gauss-Seidel smoothing. Both Braess and Verfurth give explicit numerical upper bounds on the rate of convergence of their methods in convex polygonal domains. In this work we reconsider these problems and obtain improved estimates for the h-2h Algorithm 4.1 as well as W-cycle estimates for both schemes in non-convex polygonal domains. The proofs do not depend on the strengthened Cauchy inequality. %A Yeshayahu Artsy %A Hung-Yang Chang %A Raphael Finkel %T Processes migrate in Charlotte %R Technical Report #655 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1986 %X Abstract: Process migration in a distributed operating system is a facility to dynamically relocate processes among component computers. In recent years, several studies have been conducted concerning the need for process migration and the algorithms to perform it efficiently. Only a few successful implementations of process migration have been reported. A process-migration facility has been implemented in Charlotte, a message-based distributed operating system. Process migration policy is decided by a user-level utility called the Starter, while the mechanism is handled by the kernel. A distributed squad of Starters can enforce regional and global process migration policy. Several migration efforts can proceed in the system concurrently. Migration can be aborted due to a change in the load; the migrating process can be rescued in many cases of machine failure. Migration is transparent to the migrating process and to other processes communicating with it. Once a process is migrated out of a machine, no trail of stubs or dangling links is left to interfere with future communication. This paper gives an overview of Charlotte, discusses the design of the process migration facility, details its implementation, and gives some preliminary cost results. Process migration was implemented in mid-1985 and has been used experimentally since then. %A Tobin Jon Lehman %T Design and performance evaluation of a main memory relational database system %R Technical Report #656 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1986 %O Ph.D. thesis %X Abstract: Most previous work in the area of main memory database systems has focused on the problem of developing techniques that work well with a very large buffer pool. This dissertation addresses the problems of database architecture design, query processing, concurrency control, and recovery for a memory resident relational database, an environment with a very different set of costs and priorities. An architecture for a memory-resident database system is presented, along with a discussion of the differences between memory-resident database systems and conventional disk-based database systems. 
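
The smoothing step at the heart of Technical Report #654 above is a red-black Gauss-Seidel sweep for the five-point discrete Poisson equation. A minimal sketch in C, assuming an (n+2) x (n+2) grid with mesh width h and Dirichlet boundary values held fixed in u (the array layout is chosen here for illustration):

    #define IDX(i, j, s) ((i) * (s) + (j))

    /* One red-black Gauss-Seidel sweep for -laplacian(u) = f: red
       points (i+j even) are relaxed first, then black points (i+j
       odd), so each half-sweep is fully parallel. */
    void rb_gauss_seidel(int n, double h, double *u, const double *f)
    {
        int s = n + 2;
        for (int color = 0; color < 2; color++)
            for (int i = 1; i <= n; i++)
                for (int j = 1; j <= n; j++)
                    if ((i + j) % 2 == color)
                        u[IDX(i, j, s)] = 0.25 *
                            (u[IDX(i - 1, j, s)] + u[IDX(i + 1, j, s)] +
                             u[IDX(i, j - 1, s)] + u[IDX(i, j + 1, s)] +
                             h * h * f[IDX(i, j, s)]);
    }

In an h-2h or W-cycle, sweeps of this kind are applied around the transfer of the residual between grids; the report's convergence estimates concern exactly such cycles.
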
Index structures are then studied for a memory-resident database environment. The T Tree, a new index structure designed for use in this environment, is introduced and compared with several existing index structures: Arrays, AVL Trees, B Trees, Extendible Hashing, Linear Hashing, Modified Linear Hashing, and Chained Bucket Hashing. The T Tree is shown to perform well in a memory-resident environment. Several of the index structures are then used to examine relational join and projection algorithms for a main memory database environment. Experimental results show that a Merge Join algorithm that uses a T Tree index is usually the best method, and that a simple Hash Join algorithm is usually second best. Recovering a memory-resident database is different from recovering a disk-oriented database, so a different approach is taken in this dissertation. Existing proposals for memory-resident database recovery treat the database as a single entity, so recovery and checkpoint operations are applied to the entire database. A new design is proposed that allows logging, checkpointing and recovery to be done at the relation or index level, providing a form of demand recovery after a crash, with undemanded relations being recovered by a background task. Finally, the cost issues for concurrency control are different for a memory-resident database system. Locking is more costly on a per-database-access basis, so it must be made more efficient. Multiple granularity locking is desirable, but it would be too expensive if several levels of locks needed checking for every database reference. An algorithm is presented that uses locks with a dynamic level of granularity, with locks being escalated or de-escalated in size to meet the system's concurrency requirements. %A Allan Bricker %A Tad Lebeck %A Barton P. Miller %T DREGS: a distributed runtime environment for game support %R Technical Report #657 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1986 %X Abstract: DREGS, a distributed environment for game support, simplifies the task of implementing multi-player games. DREGS solves the problems of concurrency control, synchronization, and communication as they apply to distributed games. DREGS provides support for the update and control of replicated objects and uses a central arbitration scheme to enforce a strict ordering of events. DREGS has been implemented to run under any 4.3BSD-compatible Unix system and operates across a network of heterogeneous architectures. A game description language, GDL, provides a framework for programming multi-player distributed games. GDL is a simple language that generates complex distributed programs. It is an object-based language, where objects are defined in terms of their input events and their corresponding actions. The programmer writes the game as if it were a serial program, without concern for concurrent activities. GDL also frees the programmer from the details of device and network interfaces. The combination of the DREGS runtime and GDL frees a game designer from the distributed aspects of multi-player games. This freedom allows designers to concentrate their efforts on better, more interesting games. DREGS has been used to implement an N-way talk program, a tank game, and a flying saucer space game. %A Matthew T. Thazhuthaveetil %T A structured memory access architecture for Lisp %R Technical Report #658 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1986 %O Ph.D.
thesis %X Abstract: Lisp has been a popular programming language for well over 20 years. The power and popularity of Lisp are derived from its extensibility and flexibility. These two features also contribute to the large semantic gap that separates Lisp from the conventional von Neumann machine, typically leading to the inefficient execution of Lisp programs. This dissertation investigates how the semantic gap can be bridged. We identify function calling, environment maintenance, list access, and heap maintenance as the four key run-time demands of Lisp programs, and survey the techniques that have been developed to meet them in current Lisp machines. Previous studies have revealed that Lisp list access streams show spatial locality as well as temporal locality of access. While the presence of temporal locality suggests the use of fast buffer memories, the spatial locality displayed by a Lisp program is implementation-dependent and hence difficult for a computer architect to exploit. We introduce the concept of structural locality as a generalization of spatial locality, and describe techniques that were used to analyze the structural locality shown by the list access streams generated from a suite of benchmark Lisp programs. This analysis suggests architectural features for improved Lisp execution. The SMALL Lisp machine architecture incorporates these features. It partitions functionality across two specialized processing elements whose overlapped execution leads to efficient Lisp program evaluation. Trace-driven simulations of the SMALL architecture reveal the advantages of this partition. In addition, SMALL appears to be a suitable basis for the development of a multi-processing Lisp system. %A Udi Manber %T Using mathematical induction to design computer algorithms %R Technical Report #660 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1986 %X Abstract: An analogy between proving mathematical theorems and designing computer algorithms is explored in this paper. This analogy provides an elegant methodology for designing algorithms, explaining their behavior, and understanding their key ideas. The paper identifies several mathematical proof techniques, mostly based on induction, and presents analogous algorithm design techniques. Each of these techniques is illustrated by several examples of algorithms. %A M. A. Sridhar %T Efficient algorithms for multiple pattern matching %R Technical Report #661 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1986 %O Ph.D. thesis %X Abstract: We address the problem of finding an occurrence of one of several given patterns in a given text string. The Aho-Corasick algorithm solves this problem by constructing a modified finite automaton and using this to scan the text string left to right. This yields an algorithm that runs in time linear in the text length. The Boyer-Moore algorithm scans the text right to left, looking for an occurrence of one pattern. It has a sublinear average running time, and can be modified to be linear-time in the worst case. We extend the Boyer-Moore algorithm to handle multiple patterns. Two new algorithms are developed and analyzed. Both operate by remembering previous matches. Given a text string of length N and patterns with maximum length D, the first algorithm remembers no more than 1 + log D previous matches, and consults O(N log D) text characters. Algorithms for performing the necessary preprocessing are also developed.
The second algorithm is designed for a different time-space tradeoff. For any given k, it consults O(kN log D) text characters, and remembers no more than t/k nonperiodic matches at any time, where t is the number of patterns. The dominating feature of a sublinear average-case running time is retained by both algorithms. %A Dennis Draheim %A Bart Miller %A Steven Snyder %T A reliable and secure Unix connection service %R Technical Report #662 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1986 %X Abstract: Distributed programs require a method for processes residing on different machines to identify each other and establish communication. One method is to provide a special connection service to perform this task. A good connection service should be easy to use. It should allow arbitrary processes to connect to each other as well as help client processes connect to server processes. It should provide location transparency; that is, the programmer should not have to know the network address of a process to connect to it. The connection service should be reliable. It should provide a way for a process to establish the identity of the user associated with the process to which it has connected, and to communicate securely with that process. We have implemented a connection service for Berkeley Unix that is reliable, available, secure, and easy to use. The connection service achieves ease of use through a simple interface based on the library routine meet. Meet allows one process to connect to another by specifying arbitrary names for itself and the other process. (A hypothetical use of meet is sketched below.) The connection service imposes no naming conventions of its own, so it can be used with most name spaces and naming services. The service is location-transparent. It also provides a routine for posting services. Reliable and available service is provided by replicating connection servers. Each server knows about all pending connection requests. The connection service provides continuous service as long as at least one server is running. Connections can be authenticated by an authentication server that works in cooperation with the connection server. Secure communication is achieved via the RSA public-key encryption algorithm. The connection server was put in regular use in June 1986. Our limited experience indicates that it satisfies an important need of Unix users. %A David Kamowitz %T Experimental results for multigrid and transport problems %R Technical Report #663 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1986 %X Abstract: An experimental study of the applicability of a multigrid algorithm to the solution of the neutron transport problem in a slab is described. Only the simplest choices are made for the components of the algorithm. Experimental results indicate that the coarse grid operator obtained from finite differences works better and is cheaper than the Galerkin choice. %A Charles V. Stewart %A Charles R. Dyer %T A scheduling algorithm for the pipelined image-processing engine %R Technical Report #664 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1986 %X Abstract: In this paper we present a heuristic scheduling algorithm for the National Bureau of Standards' Pipelined Image-Processing Engine (PIPE). PIPE is a special-purpose machine for low-level image processing consisting of a linearly connected array of processing stages.
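
Technical Report #662 above describes the routine meet only at the level of its abstract; a hypothetical use, with the signature below assumed purely for illustration, might look like:

    #include <stdio.h>

    /* Assumed declaration; the abstract of TR #662 does not give the
       actual interface.  Each caller names itself and its intended
       peer, and a connected descriptor is returned once both sides
       have called meet. */
    extern int meet(const char *myname, const char *peername);

    int main(void)
    {
        int fd = meet("editor", "spooler");   /* connect "editor" to "spooler" */
        if (fd < 0) {
            perror("meet");
            return 1;
        }
        /* ... exchange data on fd ... */
        return 0;
    }

Because the service imposes no naming conventions of its own, the two strings can be drawn from whatever name space the application already uses.
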
A program is specified as a planar directed acyclic graph where each node represents an operation on an image, and each arc represents an image output by one operation and input to another. The algorithm uses the greedy approach to schedule operations on a stage. It uses several heuristics to control the movement of images between stages. The worst-case time for the schedule generated by the algorithm is O(N) times the optimal schedule, where N is the maximum width of the graph. Several examples are given where the algorithm generates a nearly optimal schedule. %A Ze-Nian Li %A Leonard Uhr %T Comparative timings for a neuron recognition program on serial and pyramid computers %R Technical Report #665 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1986 %X Abstract: This paper examines the time needed by a program that recognizes neurons in photomicrographs when it is executed on a conventional serial computer (a VAX 11/750), and compares this with the time that would be needed if it were executed on an appropriate parallel-serial pyramid multi-computer. As the size of the image array increases from 64x64 to 4,096x4,096, the estimates of the pyramid's increases in speed grow from 350 times as fast as the VAX to 1,276,300 times as fast. %A Prasun Dewan %T Automatic generation of user interfaces %R Technical Report #666 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1986 %O Ph.D. thesis %X Abstract: In traditional interactive programming environments, each application individually manages its interaction with the human user. The result is duplication of effort in implementing user interface code and non-uniform - hence confusing - input conventions. It would be useful if the user interface of an application could be automatically generated. This idea requires an application-independent model of user interaction together with a programming environment that supports the model. Recent work in user interface design has suggested that editing can be used as a general model of interaction. This dissertation presents an approach that supports this model. This approach allows applications to be created as objects. An object is an instance of a class that describes the data encapsulated by the object and the methods to manipulate them. Between each object and the user is a dialogue manager. The dialogue manager receives messages from the object, which name variables that can be edited by the user. It displays the variables using the data definition in the class of the object, offers the user a structure-editing interface to modify them, and sends new values back in messages to the object. The object can then execute methods to make its internal data consistent with the displayed data. Thus, from the point of view of the objects, the user appears to be another object that can send and receive messages. From the point of view of the user, the objects appear to be data that can be edited. The dialogue manager acts as an intermediary between the object and the user, translating between the languages of object interaction and user interaction. A dialogue manager is provided automatically by the environment. The utility of our approach is demonstrated through discussion, examples, and implementation of its major components. %A Umakishore Ramachandran %T Hardware support for interprocess communication %R Technical Report #667 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1986 %O Ph.D.
thesis %X Abstract: Access to system services in traditional uniprocessor operating systems is requested via protected procedure calls, whereas in message-based operating systems it is requested via message passing. Since message exchange is the basic kernel mechanism in message-based operating systems, the performance of the system depends crucially on the rate of message exchange. The advent of local area networking has sparked interest in message-based operating systems. One of the main problems in existing multicomputer message-based operating systems has been the slowness of interprocess communication. This speed limitation is often due to the processing overhead associated with message passing. Previous studies implicitly assume that only communication between processes on different nodes in the network is expensive. However, we show that there is considerable processing overhead for local communication as well. Therefore, we propose hardware support in the form of a message coprocessor. Our solution to the message-passing problem has two components: a software partition and a hardware organization. To determine an effective solution, we followed a three-step process: First, we profiled the kernels of four operating systems to identify the major components of system overhead in message passing. The results suggested a partitioning of the software between the host and the message coprocessor. Second, we implemented such a partition on a multiprocessor and measured its performance. Based on these results, we proposed bus primitives for supporting the interactions between the host, the message coprocessor, and the network devices. We designed the bus in detail to show the feasibility of the primitives from the point of view of hardware implementation. Through the simplicity of the design, we show that our bus proposal is cost-effective in this environment. Finally, we used Timed Petri nets to model and analyze several system architectures and show the effectiveness of our software partition and hardware organization for solving the message-passing problem. %A Gilbert Verghese %A Shekhar Mehta %A Charles R. Dyer %T Image processing algorithms for the pipelined image-processing engine %R Technical Report #668 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1986 %X Abstract: In this paper we describe nine basic vision algorithms for the National Bureau of Standards' Pipelined Image-Processing Engine (PIPE). The algorithms presented are: local peak detection, median filtering, thinning, the Hough transform for line detection, photometric stereo, n-bit point operations, detecting edges at multiple resolutions, stereo vision, and multiresolution, model-based object recognition. (A serial version of the median filter is sketched below.) %A Mary Vernon %A John Zahorjan %A Edward D. Lazowska %T A comparison of performance Petri nets and queueing network models %R Technical Report #669 %I Computer Sciences Department, University of Wisconsin at Madison %D September 1986 %X Abstract: Queueing networks (QNs) and performance Petri nets (PPNs) are two approaches to answering performance questions about computer systems. While QNs were originally designed to model resource contention among independent jobs and PPNs were designed to model parallel systems containing synchronization, there is clearly some overlap in the systems that can be modeled using each.
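
Of the nine PIPE algorithms listed in Technical Report #668 above, median filtering is the easiest to state precisely. A plain serial version in C for an h x w 8-bit image follows; the PIPE versions realize the same operation with the machine's pipelined neighborhood stages, which this sketch makes no attempt to model:

    #include <string.h>

    static void sort9(unsigned char v[9])
    {
        /* insertion sort of the nine window values */
        for (int i = 1; i < 9; i++)
            for (int j = i; j > 0 && v[j - 1] > v[j]; j--) {
                unsigned char t = v[j]; v[j] = v[j - 1]; v[j - 1] = t;
            }
    }

    /* 3 x 3 median filter; border pixels are copied unchanged. */
    void median3x3(int h, int w, const unsigned char *in, unsigned char *out)
    {
        memcpy(out, in, (size_t)h * w);
        for (int i = 1; i < h - 1; i++)
            for (int j = 1; j < w - 1; j++) {
                unsigned char win[9];
                int k = 0;
                for (int di = -1; di <= 1; di++)
                    for (int dj = -1; dj <= 1; dj++)
                        win[k++] = in[(i + di) * w + (j + dj)];
                sort9(win);
                out[i * w + j] = win[4];   /* median of nine values */
            }
    }
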
In this paper we address this overlap with the goal of indicating whether the two models are fundamentally the same, or whether there are intrinsic differences that make one approach more powerful than the other for particular application domains. There seem to be two characteristics of a modeling approach that are most important in determining how it may be profitably employed. The first is notation: what models can be expressed, and perhaps more importantly, how convenient is it to specify a particular class of models? The second feature is the evaluation technique: what performance measures can be computed from the model, and what computational effort is required to do so? Our comparison of QNs and PPNs therefore concentrates on these two aspects. It is conceivable that the shortcomings of either approach in a particular environment might be addressed by adopting features of the other better suited to that environment. We consider this possibility for making improvements to the notations and evaluation efficiencies of the two modeling approaches. %A Bryan S. Rosenburg %T Automatic generation of communication protocols %R Technical Report #670 %I Computer Sciences Department, University of Wisconsin at Madison %D October 1986 %O Ph.D. thesis %X Abstract: This dissertation describes an effort to improve application-level communication efficiency by using information provided by, and derived from, the application itself. We define a simple language for describing the static form of conversations between application processes. A conversation description serves as a form of service specification for the conversation's underlying communication protocol. It specifies the types of messages required by the conversation and provides a regular expression that defines the set of legal sequences of messages between the conversation's participants. We also define structures we call plans that application processes can use dynamically to participate in conversations. A plan is a regular expression a process can construct to describe its desire to send or receive messages allowed by its active conversations. The plan mechanism is a generalization of the CSP guarded alternative construct. It allows processes to describe future as well as immediate intentions. Conversation descriptions and plans contain application-specific information that can be used to enhance the efficiency of the application's communication. Other useful information can be derived from measurements made while the application is running. We present strategies for collecting and using information from these sources. These strategies attempt to use application-specific information to reduce the number of low-level messages needed to accomplish the application's communication. We have implemented a version of the protocol generation system that supports application processes executed on the Crystal multicomputer. We describe several typical applications and evaluate their performance. We show that application-specific information from several sources can be used to significantly improve the efficiency of the application's communication. %A Thomas Reps %T Incremental evaluation for attribute grammars with unrestricted movement between tree modifications %R Technical Report #671 %I Computer Sciences Department, University of Wisconsin at Madison %D October 1986 %X Abstract: This paper concerns the design of editors that perform checks on a language's context-dependent constraints.
Our particular concern is the design of an efficient, incremental analysis algorithm for systems based on the attribute-grammar model of editing. With previous incremental evaluation algorithms for arbitrary noncircular attribute grammars, the editing model required there to be a restriction on the operation that moves the editing cursor: moving the cursor was limited to just a single step in the tree - either to the parent node or to one of the child nodes of the current cursor location. This paper describes a new updating algorithm that can be used when an arbitrary movement of the cursor in the tree is permitted. After an operation that restructures the tree, the tree's attributes can be updated with a cost of O((1 + |AFFECTED|) log m), where m is the size of the tree and AFFECTED is the subset of the tree's attributes that require new values, when the cost is amortized over a sequence of tree modifications. The editing cursor may be moved from its current location to any other node of the tree in a single, unit-cost operation. %A Robert Howard Gerber %T Dataflow query processing using multiprocessor hash-partitioned algorithms %R Technical Report #672 %I Computer Sciences Department, University of Wisconsin at Madison %D October 1986 %O Ph.D. thesis %X Abstract: In this thesis, we demonstrate that hash-partitioned query processing algorithms can serve as a basis for a highly parallel, high-performance relational database machine. In addition to demonstrating that parallelism can really be made to work in a database machine context, we will show that such parallelism can be controlled with minimal overhead using dataflow query processing techniques that pipeline data between highly autonomous, distributed processes. For this purpose, we present the design, implementation techniques, and initial performance evaluation of Gamma, a new relational database machine. Gamma is a fully operational prototype consisting of 20 VAX 11/750 computers. The Gamma architecture illustrates that a high-performance database machine can be constructed without the assistance of special-purpose hardware components. Finally, a simulation model of Gamma is presented that accurately reflects the measured performance of the actual Gamma prototype. Using this simulation model, we explore the performance of Gamma for large multiprocessor systems with varying hardware capabilities. (The tuple-routing step on which these algorithms rest is sketched below.) %A Raphael Finkel %A Gautam Das %A Dhruva Ghoshal %A Kamal Gupta %A Ganesh Jayaraman %A Mukesh Kacker %A Jaspal Kohli %A Viswanathan Mani %A Ananth Raghaven %A Michael Tsang %A Sriram Vajapeyam %T Experience with Crystal, Charlotte and Lynx - third report %R Technical Report #673 %I Computer Sciences Department, University of Wisconsin at Madison %D November 1986 %X Abstract: This paper describes several recent implementations of distributed algorithms at Wisconsin that use the Crystal multicomputer, the Charlotte operating system, and the Lynx language. This environment is an experimental testbed for design of such algorithms. Our report is meant to show the range of applications that we have found reasonable in such an environment and to give some of the flavor of the algorithms that have been developed. We do not claim that the algorithms are the best possible for these problems, although they have been designed with some care. In several cases they are completely new or represent significant modifications of existing algorithms.
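
The tuple-routing step on which the hash-partitioned algorithms of Technical Report #672 above rest can be stated in a few lines. In this sketch the tuple layout, the hash function, and send_to are all stand-ins chosen for illustration:

    /* Route each tuple to the processor selected by hashing its join
       attribute; matching tuples of both relations then always arrive
       at the same site, so each site can join its partitions locally
       and in parallel with the others. */
    struct tuple { long join_attr; char payload[64]; };

    extern void send_to(int processor, const struct tuple *t);  /* assumed */

    void partition(const struct tuple *rel, int ntuples, int nprocessors)
    {
        for (int i = 0; i < ntuples; i++) {
            unsigned long hsh = (unsigned long)rel[i].join_attr * 2654435761UL;
            send_to((int)(hsh % (unsigned long)nprocessors), &rel[i]);
        }
    }
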
We present distributed implementations of the stable marriage problem, finding roots of an equation, Gaussian elimination, finding minimal dominating sets, PLA folding, the Hough transform, the Banker's algorithm, the n-queens problem, and quick-sort. Together with our previous two reports, this paper leads us to conclude that the environment is a valuable resource and will continue to grow in importance in developing new algorithms. %A Susan Horwitz %T Adding relational databases to existing software systems: implicit relations and a new relational query evaluation method %R Technical Report #674 %I Computer Sciences Department, University of Wisconsin at Madison %D November 1986 %X Abstract: Interactive software systems should include query handlers. Query handlers based on the relational database model are attractive because the model provides a uniform, non-procedural approach to query writing. Standard relational database systems require that all information be stored in relations; however, the data structures used by existing software systems are generally non-relational, and it is impractical to replace them with relations. A new kind of relation, implicit relations, and a new approach to query evaluation based on the use of access functions allow software systems to include relational query facilities without giving up existing non-relational data structures. The new query-evaluation method can also be used in traditional relational databases, and may be more efficient than traditional evaluation methods when applied to queries that use set operations. %A R. J. Chen %A R. R. Meyer %T A scaled trust region method for a class of convex optimization problems %R Technical Report #675 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1986 %X Abstract: Piecewise-linear approximation of nonlinear convex objective functions in linearly constrained optimization produces subproblems that may be solved as linear programs. This approach to approximation may be used for nonseparable as well as separable functions, and for the former class (the focus of this paper) it lies between linear and quadratic approximation. We consider two devices: rectangular trust regions and dynamic scaling. The use of rectangular trust regions in conjunction with the type of piecewise-linear approximation considered here actually serves to simplify rather than complicate the approximating problems. This is a result of the equivalence of the trust region and the use of a limited number of segments in the approximation. The approach to dynamic scaling considered here may be applied to problems in which each objective function term is a convex function of a linear function of the variables. This scaling device allows the algorithm to adjust the approximation between an underestimating function (corresponding to a linear approximation) and an overestimating function (the nonseparable analog of the overestimate associated with separable approximation of convex functions). The scaling factor is adjusted in accordance with the acceptance criteria associated with the trust region method. Computational experience is cited for some large-scale problems arising from traffic assignment applications. The algorithm considered here also has the property that it allows such problems to be decomposed into a set of smaller optimization problems at each major iteration. These smaller problems correspond to linear single-commodity networks, and may be solved in parallel.
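
In one dimension, the overestimating approximation just described is the chordal interpolant of a convex term. A minimal sketch in C, with the segment count k standing in for the rectangular trust region of Technical Report #675:

    /* Piecewise-linear (chordal) approximation of a convex f on
       [lo, hi] with k equal segments, evaluated at x.  Convexity makes
       this an overestimate; tangent lines would give the corresponding
       underestimate. */
    double chord_approx(double (*f)(double), double lo, double hi,
                        int k, double x)
    {
        double step = (hi - lo) / k;
        int s = (int)((x - lo) / step);        /* segment containing x */
        if (s < 0) s = 0;
        if (s > k - 1) s = k - 1;
        double x0 = lo + s * step, x1 = x0 + step;
        double t = (x - x0) / step;
        return (1.0 - t) * f(x0) + t * f(x1);  /* interpolate the chord */
    }
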
Results are given for the distributed solution of these problems on the CRYSTAL multicomputer. %A Anil Allan Pal %T Generating execution facilities for integrated programming environments %R Technical Report #676 %I Computer Sciences Department, University of Wisconsin at Madison %D November 1986 %O Ph.D. thesis %X Abstract: This thesis presents an approach to the problem of generating execution facilities for integrated programming environments from specifications of the dynamic semantics of programming languages. The approach is based on techniques used in semantics-directed compiler generators, using a denotational semantic description of the language. These techniques are adapted to the special nature of an integrated programming environment, in particular the need to provide incremental translation and interactive execution. In interpreters generated using our system, programs are translated into denotations that are represented as linked structures containing pointers to the compiled code of denotation functions. This representation is compact, provides reasonable execution efficiency, and is easy to maintain incrementally during program modification. The correspondence between the executable representation and the parse tree of the program can be exploited to permit the user to interact with the program at runtime in terms of source-language constructs, thus providing support for interactive execution. We show how many of the features of previous hand-coded integrated programming environments can be incorporated naturally into the generated execution facilities. %A Mitali Bhattacharyya %A David Cohrs %A Barton Miller %T Implementation of a visual Unix process connector %R Technical Report #677 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1986 %X Abstract: UPCONN is a tool used by a programmer to visually describe the connections between the processes in a distributed program and to execute the distributed program. With UPCONN, the implementation of processes in a program is separated from the description of the connections between them. The programmer is provided with a screen editor which enables processes to be described and allows the connections between these processes to be specified. When a distributed program is described, UPCONN allows the user to execute the program or save the program for later use. UPCONN has a modular design which makes it easily extendible and allows its parts to be used independently. A library of processes and procedures is included to perform specialized tasks, such as message monitoring and file access. The library provides a method for adding new features to UPCONN. Several existing Unix utilities can be linked in with UPCONN to provide a variety of functions. UPCONN is implemented under 4.3BSD Unix and runs on a network of homogeneous computers. %A Ze-Nian Li %A Leonard Uhr %T Pyramid vision using key features to integrate image-driven bottom-up and model-driven top-down processes %R Technical Report #678 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1986 %X Abstract: Pyramid-like parallel hierarchical structures have been shown to be suitable for many computer vision tasks, and to have the potential for achieving the speeds needed for the real-time processing of real-world images. 
We are developing algorithms to explore the pyramid's massively parallel and shallowly serial-hierarchical computing ability in an integrated system that combines both low level and higher level vision tasks. Micro-modular transforms are used to embody the program's knowledge of the different objects it must recognize. This paper describes pyramid vision programs that, starting with the image, use transforms that assess key features to dynamically imply other feature-detecting and characterizing transforms, and additional top-down, model-driven processes to apply. Program performance is presented for four real-world images of buildings. The use of key features in pyramid vision programs and the related search and control issues are discussed. To expedite the detection of various key features, feature-adaptable windows are developed. In addition to image-driven bottom-up and model-driven top-down processing, lateral search is used, and is shown to be helpful, efficient, and feasible. The results indicate that, with the use of key features and the combination of a variety of powerful search patterns, the pyramid-like structure is effective and efficient for supporting parallel and hierarchical object recognition algorithms. %A Charles R. Dyer %T Multiscale image understanding %R Technical Report #679 %I Computer Sciences Department, University of Wisconsin at Madison %D December 1986 %X Abstract: This paper reviews methods for understanding multiscale (also called multiresolution) image descriptions. We include work related to the construction and analysis of image representations which make explicit properties of edges, shape and texture at multiple scales. In addition we present two applications of multiscale techniques for model-based object recognition and texture segmentation. %A Yannis E. Ioannidis %A Eugene Wong %T Query optimization by simulated annealing %R Technical Report #693 %I Computer Sciences Department, University of Wisconsin at Madison %D April 1987 %A Will Winsborough %T A minimal function graph semantics for logic programs %R Technical Report #711 %I Computer Sciences Department, University of Wisconsin at Madison %D August 1987 %P 19