%A L. Bukys %A T.J. LeBlanc %T Getting started with the BBN Butterfly parallel processor %R BPR 1 (second edition) %I Computer Science Department, University of Rochester %D October 1986 %X This report is an introductory user's manual for the BBN Butterfly multiprocessor at the University of Rochester. It is intended to help novice users learn to access and use the various Butterfly machines and the associated software environment. Its primary purpose is to illustrate how to compile and execute simple programs, and to describe what other documentation is available and where to find it. %A M. Fanty %T A connectionist simulator for the BBN Butterfly multiprocessor %R TR 164, BPR 2 %I Computer Science Department, University of Rochester %D January 1986 %X This report describes the implementation of a connectionist simulator on the BBN Butterfly Multiprocessor. It presents the model of connectionist networks used by the simulator and gives a detailed account of the simulator's implementation. Performance data for some sample networks is included. Some plans for future improvements are discussed. %A T.J. LeBlanc %T Shared memory versus message-passing in a tightly-coupled multiprocessor: a case study %R BPR 3 %I Computer Science Department, University of Rochester %D January 1986 %X Conceptually, the BBN Butterfly Parallel Processor can support a model of computation based on either shared memory or message passing. Recently, much of the work on the Butterfly, including finite element analysis and computer vision algorithms, has assumed the shared-memory model. In this paper we describe the results of our experiments with the message-passing model. The goal of the experiments was to analyze the tradeoffs between the shared-memory model, as exemplified by the BBN Uniform System package, and a simple message-passing model. Important factors to be considered were performance, scalability, and ease of programming. 
We compare the two models with respect to these criteria and conclude that the particular model of computation used is less important than how well it is matched to the application. %A T.J. Olson %T Modula-2 on the BBN Butterfly parallel processor %R BPR 4 %I Computer Science Department, University of Rochester %D January 1986 %X This report describes the current state of the Rochester Butterfly Modula-2 compiler and runtime environment. It covers use of the compiler, the standard runtime environment, and the (experimental) Butterfly environment. It also describes methods of extending the experimental environment and obtaining access to operating system functions. %A E. Hinkelman %T NET: a utility for building regular process networks on the BBN Butterfly parallel processor %R BPR 5 %I Computer Science Department, University of Rochester %D February 1986 %X Coarse-grained parallel architectures like the BBN Butterfly are difficult to use without tools for building networks of communicating processes. Net is a tool for creating a useful class of process networks. This report describes the Net utility, an example application, and some design decisions and experiences. %A M.L. Scott %T The interface between distributed operating system and high-level programming language %R TR 182 and BPR 6 %I Computer Science Department, University of Rochester %D September 1986 (revised) %O Also published in the \f2University of Rochester 1986-87 Computer Science and Computer Engineering Research Review\fP %X A distributed operating system provides a process abstraction and primitives for communication between processes. A distributed programming language regularizes the use of the primitives, making them both safer and more convenient. The level of abstraction of the primitives dictates the division of labor between the operating system and the language support routines, with serious ramifications for both efficiency and flexibility. 
%X LYNX, now available on the Butterfly, is a high-level language for parallel programming. Previous implementations of LYNX were designed for the Crystal multicomputer at the University of Wisconsin and for Jon Kepecs's SODA. Comparison of the three implementations provides important insights into the nature of the language/operating system interface. Among other things, the comparison suggests that a lower-level interface is best; the Butterfly implementation is the simplest and the fastest of the three. %A M.L. Scott %T LYNX reference manual %R BPR 7 %I Computer Science Department, University of Rochester %D August 1986 (revised) %X LYNX is a message-based distributed programming language with novel facilities for communication between processes and for management of context within processes. LYNX was first implemented on the Crystal multicomputer at the University of Wisconsin. It has subsequently been ported to the Butterfly Parallel Processor at the University of Rochester. %X This manual is intended for serious users of the Butterfly implementation. At the time of its writing it constitutes the \f2de facto\fP standard on the syntax and semantics of LYNX. It also describes ways in which the Butterfly implementation differs from the standard, and in which that implementation resolves issues that the standard leaves undefined. %A T.J. LeBlanc %A N.M. Gafter %A T. Ohkami %T SMP: a message-based programming environment for the BBN Butterfly %R BPR 8 %I Computer Science Department, University of Rochester %D July 1986 %X SMP is a message-based programming environment for the BBN Butterfly similar in flavor and scope to the BBN Uniform System package. As such, SMP provides an alternative to the shared-memory model of the Uniform System. SMP supports the construction of \f2process families\fP, a fixed set of asynchronous processes that communicate, using messages, according to a given interconnection pattern. 
A dynamic hierarchy of such process families is possible. In this report we describe the SMP user interface and an implementation on the Butterfly. %A T.J. Olson %T An image processing package for the BBN Butterfly parallel processor %R BPR 9 %I Computer Science Department, University of Rochester %D September 1986 %X The University of Rochester computer vision group uses IFF (Image File Format) as its internal standard for representing and manipulating image data. This report describes an adaptation of the IFF libraries and programming methodology to the BBN Butterfly Multiprocessor. It offers guidelines for programmers using the library and describes early experience writing utilities with the system. %A T.J. Olson %T Finding lines with the Hough transform on the BBN Butterfly parallel processor %R BPR 10 %I Computer Science Department, University of Rochester %D September 1986 %X The Hough transform is a popular method of finding parametric curves in noisy data. It is a key primitive in many current vision systems. In order to evaluate the suitability of the Butterfly for this type of computation, we have implemented two line-finding Hough transform algorithms under the Uniform System software package. We conclude that for these particular algorithms the Butterfly/Uniform System combination works very well, although memory contention may be a problem for the more general case. %A L. Bukys %T Connected component labeling and border following on the BBN Butterfly parallel processor %R BPR 11 %I Computer Science Department, University of Rochester %D October 1986 %X Multiprocessor architectures present the problem of efficiently merging the results of separate (parallel) subcomputations. This report discusses an approach to this problem based on the UNION-FIND algorithm, and presents experimental results obtained on the BBN Butterfly Parallel Processor running an assortment of benchmarks. %A T.J. LeBlanc %A J.M. 
Mellor-Crummey %T Debugging parallel programs with instant replay %R BPR 12 %I Computer Science Department, University of Rochester %D September 1986 %X The debugging cycle is the most common methodology for finding and correcting errors in sequential programs. Cyclic debugging is effective because sequential programs are usually deterministic. Debugging parallel programs is considerably more difficult because successive executions of the same program often do not produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel programs, termed \f2Instant Replay\fP. During program execution we save the relative order of significant events as they occur, not the data associated with such events. As a result, our approach requires less time and space to save the information needed for program replay than other methods. Our technique is not dependent on any particular form of interprocess communication. It provides for replay of an entire program, rather than individual processes in isolation. No centralized bottlenecks are introduced and there is no need for synchronized clocks or a globally-consistent logical time. We describe a prototype implementation of Instant Replay on the BBN Butterfly Parallel Processor, and discuss how it can be incorporated into the debugging cycle for parallel programs. %A C.M. Brown %A R.J. Fowler %A T.J. LeBlanc %A M.L. Scott %A M. Srinivas %A et\0al. %T DARPA parallel architecture benchmark study %R BPR 13 %I Computer Science Department, University of Rochester %D October 1986 %X In intensive work over a four-week period in the summer of 1986, seven problems were studied and implemented on the Butterfly. The problems were inspired by various capabilities in computer vision, and were proposed as benchmarks for a DARPA workshop on parallel architectures. 
They were: convolution and zero-crossing detection for edges, edge tracking, connected component labeling, Hough transform, three computational geometry problems (convex hull, Voronoi diagram, and minimum spanning tree), three-dimensional visibility calculations, subgraph isomorphism and minimum cost path calculation. BPRs 10, 11, and 14 are detailed reports of three of the problems. BPR 13 contains the conclusions of the study and writeups of the work not covered in other BPRs. %A J. Costanzo %A L. Crowl %A L. Sanchis %A M. Srinivas %T Subgraph isomorphism on the BBN Butterfly multiprocessor %R BPR 14 %I Computer Science Department, University of Rochester %D October 1986 %X This report describes an algorithm for finding subgraph isomorphisms for a restricted class of graphs and a parallel implementation of the algorithm on the BBN Butterfly Multiprocessor. This effort was part of a larger project to assess the suitability of the Butterfly architecture for a variety of machine vision tasks. Our algorithm searches a tree in which each node represents a partial assignment of vertices in the smaller graph to vertices in the larger graph. The algorithm prunes the search tree using properties of the two graphs as constrained by the partial mapping. These properties are vertex connectivity, distance between vertices and the local topology of vertex clusters. By carefully balancing the computational load and the contention for shared resources, our algorithm achieves almost linear speedup in the processing rate of search tree nodes. However, the speedup of isomorphism detection rate is poor when looking for a few isomorphisms, and good only when looking for many isomorphisms. We present an analysis of why we believe this property is intrinsic to algorithms that parallelize the tree search without parallelizing the node computations. We also discuss the effectiveness of the Butterfly architecture and programming environment in implementing such parallel algorithms. %A L. 
Crowl %T Chrysalis++ %R BPR 15 %I Computer Science Department, University of Rochester %D December 1986 %X The standard programming environment for the Butterfly Parallel Processor uses the C programming language and the Chrysalis operating system. This report describes Chrysalis++, a C++ programming language interface to Chrysalis. %X The binding strategy for the design of Chrysalis++ was to re-cast the error-prone, explicit Chrysalis object management in C into the implicit object management that occurs routinely in C++. Chrysalis++ merges creation and access of Chrysalis objects into the declaration of the variables used to represent them. This strategy provides the entire Chrysalis++ programming environment with a single object management strategy. %X The development of Chrysalis++ highlighted some of the difficulties in mixing a single-process, high-level language with a low-level multiprocessor operating system. Bringing C++ and Chrysalis together demonstrated weaknesses in the C++ class mechanism and showed inconsistencies in the treatment of Chrysalis objects. %X This report is composed of three parts. The first part gives the background for this project and describes the resulting Chrysalis++. The second part describes the experience of porting C++ to the Butterfly and developing Chrysalis++, and some of the weaknesses discovered in C++. Finally, the third part provides the details of Chrysalis++. This last part is intended only for actual users of Chrysalis++. %A J. Low %T Experiments with remote procedure call on the Butterfly %R BPR 16 %I Computer Science Department, University of Rochester %D December 1986 %X Several remote procedure call protocols were implemented and timed on the Butterfly. These implementations used either Chrysalis synchronization facilities (Dual Queues and Events) or shared memory. Analysis of some of the costs, benefits, and deficiencies of each of the techniques is presented. %A M.L. Scott %A A.L. 
Cox %T An empirical study of message-passing overhead %R BPR 17 %I Computer Science Department, University of Rochester %D December 1986 %O Submitted for publication %X Conventional wisdom holds that message-passing is orders of magnitude more expensive than shared memory for communication between parallel processes. Differences in the speed of underlying hardware mechanisms fail to account for a substantial portion of the performance gap. The remainder is generally attributed to the ``inevitable cost'' of higher-level semantics, but a deeper understanding of the factors that contribute to message-passing overhead has not been forthcoming. %X In this paper we provide a detailed performance analysis of one message-passing system: the implementation for the BBN Butterfly Parallel Processor of the LYNX distributed programming language. The case study includes a description of the implementation, an explanation of optimizations employed to improve its performance, and a detailed breakdown of remaining costs. The data provide a direct measure of the expense of individual features in LYNX. They also provide insight into the likely costs of other message-passing systems, both present and future. Lessons gained from our experience should be of use to other researchers in performing similar studies. %A P. Dibble %T Benchmark results for Chrysalis functions %R BPR 18 %I Computer Science Department, University of Rochester %D December 1986 %X This paper presents the results of benchmarks for the important Chrysalis functions and macros. In most cases, the Chrysalis code that implements the function or the assembly code generated by the macro is analyzed as well. It is intended to help Butterfly programmers choose efficient ways to use Chrysalis, and to guide system programmers in enhancing Chrysalis.