%A L. Bukys %A T.J. LeBlanc %T Getting started with the BBN Butterfly parallel processor %R BPR 1 (second edition) %I Computer Science Department, University of Rochester %D October 1986 %X This report is an introductory user's manual for the BBN Butterfly multiprocessor at the University of Rochester. It is intended to help novice users learn to access and use the various Butterfly machines and the associated software environment. Its primary purpose is to illustrate how to compile and execute simple programs, and to describe what other documentation is available and where to find it. %A M. Fanty %T A connectionist simulator for the BBN Butterfly multiprocessor %R TR 164, BPR 2 %I Computer Science Department, University of Rochester %D January 1986 %X This report describes the implementation of a connectionist simulator on the BBN Butterfly Multiprocessor. It presents the model of connectionist networks used by the simulator and gives a detailed account of the simulator's implementation. Performance data for some sample networks is included. Some plans for future improvements are discussed. %A T.J. LeBlanc %T Shared memory versus message-passing in a tightly-coupled multiprocessor: a case study %R BPR 3 %I Computer Science Department, University of Rochester %D January 1986 %X Conceptually, the BBN Butterfly Parallel Processor can support a model of computation based on either shared memory or message passing. Recently, much of the work on the Butterfly, including finite element analysis and computer vision algorithms, has assumed the shared-memory model. In this paper we describe the results of our experiments with the message-passing model. The goal of the experiments was to analyze the tradeoffs between the shared-memory model, as exemplified by the BBN Uniform System package, and a simple message-passing model. Important factors to be considered were performance, scalability, and ease of programming. 
We compare the two models with respect to these criteria and conclude that the particular model of computation used is less important than how well it is matched to the application. %A T.J. Olson %T Modula-2 on the BBN Butterfly parallel processor %R BPR 4 %I Computer Science Department, University of Rochester %D January 1986 %X This report describes the current state of the Rochester Butterfly Modula-2 compiler and runtime environment. It covers use of the compiler, the standard runtime environment, and the (experimental) Butterfly environment. It also describes methods of extending the experimental environment and obtaining access to operating system functions. %A E. Hinkelman %T NET: a utility for building regular process networks on the BBN Butterfly parallel processor %R BPR 5 %I Computer Science Department, University of Rochester %D February 1986 %X Coarse-grained parallel architectures like the BBN Butterfly are difficult to use without tools for building networks of communicating processes. Net is a tool for creating a useful class of process networks. This report describes the Net utility, an example application, and some design decisions and experiences. %A M.L. Scott %T The interface between distributed operating system and high-level programming language %R TR 182 and BPR 6 %I Computer Science Department, University of Rochester %D September 1986 (revised) %O Also published in the \f2University of Rochester 1986-87 Computer Science and Computer Engineering Research Review\fP %X A distributed operating system provides a process abstraction and primitives for communication between processes. A distributed programming language regularizes the use of the primitives, making them both safer and more convenient. The level of abstraction of the primitives dictates the division of labor between the operating system and the language support routines, with serious ramifications for both efficiency and flexibility. 
%X LYNX, now available on the Butterfly, is a high-level language for parallel programming. Previous implementations of LYNX were designed for the Crystal multicomputer at the University of Wisconsin and for Jon Kepecs's SODA. Comparison of the three implementations provides important insights into the nature of the language/operating system interface. Among other things, the comparison suggests that a lower-level interface is best; the Butterfly implementation is the simplest and the fastest of the three. %A M.L. Scott %T LYNX reference manual %R BPR 7 %I Computer Science Department, University of Rochester %D August 1986 (revised) %X LYNX is a message-based distributed programming language with novel facilities for communication between processes and for management of context within processes. LYNX was first implemented on the Crystal multicomputer at the University of Wisconsin. It has subsequently been ported to the Butterfly Parallel Processor at the University of Rochester. %X This manual is intended for serious users of the Butterfly implementation. At the time of its writing it constitutes the \f2de facto\fP standard on the syntax and semantics of LYNX. It also describes ways in which the Butterfly implementation differs from the standard, and in which that implementation resolves issues that the standard leaves undefined. %A T.J. LeBlanc %A N.M. Gafter %A T. Ohkami %T SMP: a message-based programming environment for the BBN Butterfly %R BPR 8 %I Computer Science Department, University of Rochester %D July 1986 %X SMP is a message-based programming environment for the BBN Butterfly similar in flavor and scope to the BBN Uniform System package. As such, SMP provides an alternative to the shared-memory model of the Uniform System. SMP supports the construction of \f2process families\fP, a fixed set of asynchronous processes that communicate, using messages, according to a given interconnection pattern. 
A dynamic hierarchy of such process families is possible. In this report we describe the SMP user interface and an implementation on the Butterfly. %A T.J. Olson %T An image processing package for the BBN Butterfly parallel processor %R BPR 9 %I Computer Science Department, University of Rochester %D September 1986 %X The University of Rochester computer vision group uses IFF (Image File Format) as its internal standard for representing and manipulating image data. This report describes an adaptation of the IFF libraries and programming methodology to the BBN Butterfly Multiprocessor. It offers guidelines for programmers using the library and describes early experience writing utilities with the system. %A T.J. Olson %T Finding lines with the Hough transform on the BBN Butterfly parallel processor %R BPR 10 %I Computer Science Department, University of Rochester %D September 1986 %X The Hough transform is a popular method of finding parametric curves in noisy data. It is a key primitive in many current vision systems. In order to evaluate the suitability of the Butterfly for this type of computation, we have implemented two line-finding Hough transform algorithms under the Uniform System software package. We conclude that for these particular algorithms the Butterfly/Uniform System combination works very well, although memory contention may be a problem for the more general case. %A L. Bukys %T Connected component labeling and border following on the BBN Butterfly parallel processor %R BPR 11 %I Computer Science Department, University of Rochester %D October 1986 %X Multiprocessor architectures present the problem of efficiently merging the results of separate (parallel) subcomputations. This report discusses an approach to this problem based on the UNION-FIND algorithm, and presents experimental results obtained on the BBN Butterfly Parallel Processor running an assortment of benchmarks. %A T.J. LeBlanc %A J.M. 
Mellor-Crummey %T Debugging parallel programs with instant replay %R BPR 12 %I Computer Science Department, University of Rochester %D September 1986 %X The debugging cycle is the most common methodology for finding and correcting errors in sequential programs. Cyclic debugging is effective because sequential programs are usually deterministic. Debugging parallel programs is considerably more difficult because successive executions of the same program often do not produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel programs, termed \f2Instant Replay\fP. During program execution we save the relative order of significant events as they occur, not the data associated with such events. As a result, our approach requires less time and space to save the information needed for program replay than other methods. Our technique is not dependent on any particular form of interprocess communication. It provides for replay of an entire program, rather than individual processes in isolation. No centralized bottlenecks are introduced and there is no need for synchronized clocks or a globally-consistent logical time. We describe a prototype implementation of Instant Replay on the BBN Butterfly Parallel Processor, and discuss how it can be incorporated into the debugging cycle for parallel programs. %A C.M. Brown %A R.J. Fowler %A T.J. LeBlanc %A M.L. Scott %A M. Srinivas %A et\0al. %T DARPA parallel architecture benchmark study %R BPR 13 %I Computer Science Department, University of Rochester %D October 1986 %X In intensive work over a four-week period in the summer of 1986, seven problems were studied and implemented on the Butterfly. The problems were inspired by various capabilities in computer vision, and were proposed as benchmarks for a DARPA workshop on parallel architectures. 
They were: convolution and zero-crossing detection for edges, edge tracking, connected component labeling, Hough transform, three computational geometry problems (convex hull, Voronoi diagram, and minimum spanning tree), three-dimensional visibility calculations, subgraph isomorphism and minimum cost path calculation. BPRs 10, 11, and 14 are detailed reports of three of the problems. BPR 13 contains the conclusions of the study and writeups of the work not covered in other BPRs. %A J. Costanzo %A L. Crowl %A L. Sanchis %A M. Srinivas %T Subgraph isomorphism on the BBN Butterfly multiprocessor %R BPR 14 %I Computer Science Department, University of Rochester %D October 1986 %X This report describes an algorithm for finding subgraph isomorphisms for a restricted class of graphs and a parallel implementation of the algorithm on the BBN Butterfly Multiprocessor. This effort was part of a larger project to assess the suitability of the Butterfly architecture for a variety of machine vision tasks. Our algorithm searches a tree in which each node represents a partial assignment of vertices in the smaller graph to vertices in the larger graph. The algorithm prunes the search tree using properties of the two graphs as constrained by the partial mapping. These properties are vertex connectivity, distance between vertices and the local topology of vertex clusters. By carefully balancing the computational load and the contention for shared resources, our algorithm achieves almost linear speedup in the processing rate of search tree nodes. However, the speedup of isomorphism detection rate is poor when looking for a few isomorphisms, and good only when looking for many isomorphisms. We present an analysis of why we believe this property is intrinsic to algorithms that parallelize the tree search without parallelizing the node computations. We also discuss the effectiveness of the Butterfly architecture and programming environment in implementing such parallel algorithms. %A L. 
Crowl %T Chrysalis++ %R BPR 15 %I Computer Science Department, University of Rochester %D December 1986 %X The standard programming environment for the Butterfly Parallel Processor uses the C programming language and the Chrysalis operating system. This report describes Chrysalis++, a C++ programming language interface to Chrysalis. %X The binding strategy for the design of Chrysalis++ was to re-cast the error-prone, explicit Chrysalis object management in C into the implicit object management that occurs routinely in C++. Chrysalis++ merges creation and access of Chrysalis objects into the declaration of the variables used to represent them. This strategy provides the entire Chrysalis++ programming environment with a single object management strategy. %X The development of Chrysalis++ highlighted some of the difficulties in mixing a single-process, high-level language with a low-level multiprocessor operating system. Bringing C++ and Chrysalis together demonstrated weaknesses in the C++ class mechanism and showed inconsistencies in the treatment of Chrysalis objects. %X This report is composed of three parts. The first part gives the background for this project and describes the resulting Chrysalis++. The second part describes the experience of porting C++ to the Butterfly and developing Chrysalis++, and some of the weaknesses discovered in C++. Finally, the third part provides the details of Chrysalis++. This last part is intended only for actual users of Chrysalis++. %A J. Low %T Experiments with remote procedure call on the Butterfly %R BPR 16 %I Computer Science Department, University of Rochester %D December 1986 %X Several remote procedure call protocols were implemented and timed on the Butterfly. These implementations used either Chrysalis synchronization facilities (Dual Queues and Events) or shared memory. Analysis of some of the costs, benefits, and deficiencies of each of the techniques is presented. %A M.L. Scott %A A.L. 
Cox %T An empirical study of message-passing overhead %R BPR 17 %I Computer Science Department, University of Rochester %D December 1986 %O Submitted for publication %X Conventional wisdom holds that message-passing is orders of magnitude more expensive than shared memory for communication between parallel processes. Differences in the speed of underlying hardware mechanisms fail to account for a substantial portion of the performance gap. The remainder is generally attributed to the ``inevitable cost'' of higher-level semantics, but a deeper understanding of the factors that contribute to message-passing overhead has not been forthcoming. %X In this paper we provide a detailed performance analysis of one message-passing system: the implementation for the BBN Butterfly Parallel Processor of the LYNX distributed programming language. The case study includes a description of the implementation, an explanation of optimizations employed to improve its performance, and a detailed breakdown of remaining costs. The data provide a direct measure of the expense of individual features in LYNX. They also provide insight into the likely costs of other message-passing systems, both present and future. Lessons gained from our experience should be of use to other researchers in performing similar studies. %A P. Dibble %T Benchmark results for Chrysalis functions %R BPR 18 %I Computer Science Department, University of Rochester %D December 1986 %X This paper presents the results of benchmarks for the important Chrysalis functions and macros. In most cases, the Chrysalis code that implements the function or the assembly code generated by the macro is analyzed as well. It is intended to help Butterfly programmers choose efficient ways to use Chrysalis, and to guide system programmers in enhancing Chrysalis.