About MVD Image Browser 0.0

Berkeley Digital Library Project

Multivalent documents are a new model of digital documents. Here we describe the initial, proof-of-concept implementation of multivalent documents, which we call MVD 0.0. This implementation is still available for use on our scanned image document collection. However, we have recently released MVD 0.9, which more fully embodies the ideas of our model.

MVD 0.0 should work on any reasonable Java-compliant web client. (MVD 0.9, while much more functional, relies on Java features that seem poorly implemented on many platforms, as the MVD 0.9 documentation describes.)

We suggest you read the short "Multivalent" papers here for the general idea; online help, available via the browser's help button, provides more practical instructions. Also available are brief technical descriptions of layer data formats.

Some features are available for all documents in our scanned page collection. Other features require additional markup and are found only on a subset. Documents that exhibit a wide range of features include ELIB:17 (Water Conditions in California Report 3) and ELIB:620 (A Model of Sales).


Multivalent Documents in Brief

The multivalent document paradigm stands in contrast to the conventional monolithic document structures. Such documents generally are composed in a single mark-up language or document representation that is intended to anticipate all possible contents an author may wish to express. Such a specification is complex, and interaction with such a document is provided via concomitantly complicated viewers, generally of limited extensibility.

In contrast, the multivalent approach presumes that a single document comprises multiple layers of different but intimately related material. Each layer is of homogeneous content, but is of a relatively limited scope and functionality. The multivalent framework allows the composition of simpler layers to produce useful behaviors. In addition, new layers can be added at later stages. Layers have associated with them dynamically loaded program objects, called behaviors, that manipulate the content, often communicating with other layers and other behaviors to achieve a desired effect.

By representing the conceptual document as multiple components, others who are not the original document's author can add additional content and behavior at a later time, even introducing new technology that was unknown during the preparation of the initial materials. By keeping the pieces of content simple, the behaviors that manipulate them are more straightforward to define; the components are also more easily reusable. By storing the content in distinct pieces, only those pieces necessary to support the requested interaction need be shipped over the network, thus conserving bandwidth. Although narrowly tailored software could be written to support any given document functionality, the general framework provided by multivalent documents leverages the work done in other areas, while fully supporting specialization to particular application domains.

The multivalent document paradigm allows the construction of rich varieties of online digital documents, documents which do not merely imitate the capabilities of other media. Such a true digital document provides an interface to potentially complex content. Since this content is infinitely varied and specialized, the framework is aimed at providing a means to interact with it in arbitrarily specialized ways. Furthermore, since relevant content may be found in distinct documents, the framework draws from multiple sources, yet provides a coherent presentation to the user. Finally, the framework is meant to be conducive to the convenient authoring of new content, the definition of new means of manipulation, and the seamless meshing of both with existing materials.


MVD 0.0

MVD 0.0 is the initial proof-of-concept implementation of a multivalent document browser. This prototype is available via our web page to Java-compliant web clients, and implements several kinds of functionality related to scanned pages. The prototype runs on our entire document collection.

The prototype makes use of several layers. One layer is the scanned page image; another comprises ASCII text derived from an optical character recognition process; a third layer relates the ASCII characters to their pixel positions on the image. In terms of functionality, when a user views a page via the multivalent document browser, the user sees the page image, as this is the form of the document the author intended the user to see. In most image viewers, a page image has no functional capability other than to let users read it; it is, after all, just a picture of a page. The multivalent document viewer, in contrast, gives users a number of novel functions:

Searching
The user can type search terms in a separate search window. Regions of the page containing images of the matching words will be highlighted on the page (each word group in a different color).

OCR select-and-paste
Another capability is ``select-and-paste''. In this behavior, the user indicates a region of the screen with a mouse click and drag. The region corresponding to the selected text will be highlighted (by coloring the background). The corresponding ASCII text can then be copied into the cut buffer, from which the user can paste the text into any other application. In effect, these behaviors allow the user to treat a formerly inert scanned page image as if it were a document in a word processor.

The figure above displays the image of a scanned page with selected text and matching search terms. In particular, the search terms supplied are ``water'' and ``association''. In addition, the text from ``Tri-Dam Project'' in column 1 to ``U.S. Department'' in column 3 has been selected. Note that the yellow highlight region conforms to the column structure of the page layout. The corresponding text is now available for pasting.

To implement these functions, the linking or ``wordbox'' layer, i.e., the layer that specifies the relationship of the ASCII text to pixel positions, is produced dynamically via a server set up at Xerox PARC. This server is capable of translating the (rather inscrutable) representation produced by the OCR software into a more accessible form. While it is not logically necessary to run the translation service at a separate installation, doing so illustrates the dynamic nature of the model, which allows layers to be created on the fly.
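As a rough illustration of how a wordbox layer can drive search highlighting, the following Java sketch pairs each OCR word with a bounding box and returns the boxes of the words that match a search term. The names WordboxSketch, WordBox, and findMatches, and the sample coordinates, are hypothetical; the layer format produced by the translation server and the classes used in MVD 0.0 are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the MVD 0.0 classes or its wordbox format.
class WordboxSketch {
    // One OCR word and the pixel rectangle it occupies on the page image.
    static final class WordBox {
        final String word;
        final int x, y, width, height;

        WordBox(String word, int x, int y, int width, int height) {
            this.word = word;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    // Return the boxes whose word equals the search term, ignoring case.
    static List<WordBox> findMatches(List<WordBox> layer, String term) {
        List<WordBox> hits = new ArrayList<>();
        for (WordBox box : layer) {
            if (box.word.equalsIgnoreCase(term)) {
                hits.add(box);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a real wordbox layer.
        List<WordBox> layer = new ArrayList<>();
        layer.add(new WordBox("Water", 40, 100, 62, 18));
        layer.add(new WordBox("Users", 108, 100, 58, 18));
        layer.add(new WordBox("Association", 172, 100, 120, 18));
        layer.add(new WordBox("water", 40, 124, 60, 18));

        // A viewer would paint a colored rectangle over each returned box.
        for (WordBox hit : findMatches(layer, "water")) {
            System.out.println("highlight " + hit.word + " at (" + hit.x + "," + hit.y + ")");
        }
    }
}
```

Because the search works only on the word strings and returns rectangles, the image layer itself never needs to be inspected.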

The functionality just described, plus some additional features, is available for the entire corpus of scanned images. Additional functionality exists in the form of additional layers for certain images:

Table sorting
In this case, a layer is supplied comprising information about a table located on a page. The associated functionality is as follows: A mouse click on the table header causes the lines of the image to sort themselves in accordance with the actual values in the column; shift-clicking produces a sort in the reverse order. That is, the actual pixels are shifted in the image to produce a sort corresponding to the user's preference. This functionality is quite general, but is currently limited to some examples for which we have produced by hand the appropriate table data (e.g., the table on logical page 8 of Water Conditions in California Report 3, California Department of Water Resources bulletin #: 120-90, April 1, 1990, Elib ID: ELIB:17). Wait until ``table'' appears in the list of layers loaded before clicking on a column heading. For example, the first figure below shows this table in its original version, and the second shows the same region of the screen after clicking on the third column from the right, labeled ``inches of water equivalent, percent of Apr 1''.

``Alternative-select-and-paste''
Above we described how the user can select a screen region and paste the corresponding OCR. However, users may want to paste some interesting function of the corresponding text, rather than the text itself. For example, perhaps the user would want the text transcribed in a particular mark-up language, or in a form suitable for a particular purpose, such as a citation. We implemented this general functionality. Some good examples of this functionality are in ELIB:620 (A Model of Sales). In particular, p. 4 and p. 8 each have ``alt-text'' layers. The layer on p. 4 contains information about the equations on that page. (This layer was created by the Project's OCR programs for mathematical formulas.) Dragging the cursor across an equation will cause it to be highlighted. Clicking the mouse will cause the entire region to be selected. However, what is copied into the cut buffer, and hence made available for pasting, is the LaTeX that typesets this equation. Similarly, the layer on p. 8 contains alternative text for each of the references. A select-and-paste here will cause the reference contents to be pasted in BibTeX format.

Dictionary Lookup
Another function we implemented in the prototype is dictionary lookup. Position the cursor at a word in the image, and perform a Control-mouse click. The associated behavior looks up the word under the cursor in a networked dictionary resource, and formats the returned text for display.

Hypertext
We have also implemented hypertext as a layer. Visually, this layer appears as underscores of the areas of the image that are links. As with other systems, passing the cursor over the active hypertext region will reveal a description of the link, and clicking on the active region transports the user to the specified destination. An example of hyperlinks, and of the dictionary lookup, is given in the figure below.

In this example, the word ``precipitation'' was selected for definition. The terms underscored in blue have hyperlinks. Note that these include some regions of the map in which the OCR process was able to identify usable words.

This example demonstrates one of the advantages of the multivalent approach. The link layer in the example was made up by someone other than the author. In principle, we could have multiple sets of hyperlinks for different purposes, created at different times. It would be difficult to achieve this functionality in a model in which hyperlinks are part of the mark-up language layer.
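To make the idea of several independent link sets concrete, here is a minimal Java sketch in which each hyperlink layer is a separate object holding active regions and destinations, and the viewer consults whichever layers are currently active. All names (LinkLayersSketch, Link, LinkLayer, targetsAt), the layer names, and the placeholder targets are assumptions made for illustration, not part of the prototype.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: independent hyperlink layers over one page image.
class LinkLayersSketch {
    // A rectangular active region on the page and the destination it points to.
    static final class Link {
        final int x, y, width, height;
        final String target;

        Link(int x, int y, int width, int height, String target) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
            this.target = target;
        }

        boolean contains(int px, int py) {
            return px >= x && px < x + width && py >= y && py < y + height;
        }
    }

    // A named set of links, possibly written by someone other than the page's author.
    static final class LinkLayer {
        final String name;
        final List<Link> links = new ArrayList<>();

        LinkLayer(String name) {
            this.name = name;
        }
    }

    // Collect the destinations offered at a point by every active layer.
    static List<String> targetsAt(List<LinkLayer> activeLayers, int px, int py) {
        List<String> targets = new ArrayList<>();
        for (LinkLayer layer : activeLayers) {
            for (Link link : layer.links) {
                if (link.contains(px, py)) {
                    targets.add(layer.name + " -> " + link.target);
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Two layers cover the same word but were authored separately.
        LinkLayer glossary = new LinkLayer("glossary");
        glossary.links.add(new Link(40, 100, 110, 18, "definition-of-precipitation"));
        LinkLayer related = new LinkLayer("related-reports");
        related.links.add(new Link(40, 100, 110, 18, "another-report"));

        List<LinkLayer> active = new ArrayList<>();
        active.add(glossary);
        active.add(related);
        System.out.println(targetsAt(active, 50, 105));
    }
}
```

Adding or removing a link set is then a matter of changing the list of active layers; the page image and the other layers are untouched.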


The Multivalent Document Architecture

This prototype is built upon a general architecture that we have developed over the past year, not all of which has yet been implemented. Here we describe the general vision, relating it in particular to our prototype.

In the general model, a layer has three kinds of components: data or information content, ``behaviors'' that supply functional interfaces, and behaviors that supply one or more user-level interfaces. Behaviors resemble classes in object-oriented programming languages, encapsulating each layer as a set of methods that operate on private data. (Indeed, they are implemented as such a set of distinguished classes.) That is, data in a behavior is not directly accessible by other behaviors, which are therefore forced to operate at a higher level than bit-tweaking a volatile format like a byte stream. Instead, behaviors communicate through program-level interfaces.

For example, the multivalent documents implemented in our prototype consist, in part, of a scanned page image with geometrically positioned OCR. The character layer stores the mapping between character positions in the text stream and their location on the image. These locations may be stored directly as bounding boxes, or they may be given as character origins along with a single global pointer to the corresponding font metrics from which the bounding boxes may be calculated. The exact method is hidden from other behaviors, since all access is provided through higher-level function interfaces. The select-and-paste behavior needs to map a mouse click to a character position, and take it and subsequent mouse drags and highlight the intervening characters. It does so by taking the two positions given by the initial mouse click and the current mouse position and mapping these to character positions in the text stream; these text positions are then transformed into the region which is drawn on the image. This region is not usually the rectangular region whose corners are the two points given by the mouse, as it obeys line boundaries; therefore, it is useful to utilize a behavior that is intimately familiar with the data.
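The mapping from two mouse positions to a selection can be sketched in Java as follows. This is a simplified illustration under stated assumptions: it works on word boxes rather than individual characters, it stores bounding boxes directly, and the interface CharacterLayer with its methods indexAt and boxesBetween is hypothetical rather than the prototype's actual interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of selection through a program-level interface.
class SelectionSketch {
    static final class Box {
        final String text;
        final int x, y, width, height;

        Box(String text, int x, int y, int width, int height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        boolean contains(int px, int py) {
            return px >= x && px < x + width && py >= y && py < y + height;
        }
    }

    // Other behaviors see only these methods, never the stored representation.
    interface CharacterLayer {
        int indexAt(int px, int py);               // position in the text stream, or -1
        List<Box> boxesBetween(int from, int to);  // inclusive range, in reading order
    }

    // One possible implementation: boxes stored directly, in reading order.
    static final class BoxListLayer implements CharacterLayer {
        private final List<Box> boxes;

        BoxListLayer(List<Box> boxesInReadingOrder) {
            this.boxes = new ArrayList<>(boxesInReadingOrder);
        }

        public int indexAt(int px, int py) {
            for (int i = 0; i < boxes.size(); i++) {
                if (boxes.get(i).contains(px, py)) {
                    return i;
                }
            }
            return -1;
        }

        public List<Box> boxesBetween(int from, int to) {
            return new ArrayList<>(boxes.subList(from, to + 1));
        }
    }

    // Map the click and the current mouse position to stream positions,
    // then return every box between them in reading order.
    static List<Box> select(CharacterLayer layer, int x1, int y1, int x2, int y2) {
        int a = layer.indexAt(x1, y1);
        int b = layer.indexAt(x2, y2);
        if (a < 0 || b < 0) {
            return new ArrayList<>();
        }
        return layer.boxesBetween(Math.min(a, b), Math.max(a, b));
    }

    // Made-up two-line page used for the demonstration.
    static CharacterLayer samplePage() {
        List<Box> boxes = new ArrayList<>();
        boxes.add(new Box("Tri-Dam", 0, 0, 60, 10));
        boxes.add(new Box("Project", 70, 0, 60, 10));
        boxes.add(new Box("office", 140, 0, 50, 10));
        boxes.add(new Box("U.S.", 0, 20, 40, 10));
        boxes.add(new Box("Department", 50, 20, 90, 10));
        return new BoxListLayer(boxes);
    }

    public static void main(String[] args) {
        // Click on "Project" in line 1, drag to "U.S." in line 2.
        for (Box box : select(samplePage(), 75, 5, 10, 25)) {
            System.out.println(box.text);
        }
    }
}
```

In the sample, the selection includes ``office'' at the end of the first line even though it lies outside the rectangle spanned by the two mouse points, which is the line-obeying behavior described above. A layer that stored character origins plus font metrics could implement the same interface without any change to select.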

A variety of behaviors may be active at any point in time, on various regions of the document. To aid the user in determining what is active where, most behaviors can outline their area of control as one, and perhaps their only, user-level interface.

Not all behaviors have all three components. Some behaviors are data-centric, serving primarily as information repositories, with enough program-level interfaces to provide access to the data. Other behaviors are program or functionality-centric. For instance, a general searching behavior may not store any data in itself, except perhaps a list of ``stop words'' that should be ignored during the search. The searching function calls upon other behaviors to provide the text to search. At the user-level, it may have the ability to present a simple type-in box for the search term, or it may rely solely on other behaviors to invoke it. Still other behaviors primarily provide a user interface to functionality and content available elsewhere. Our view is that most customization by the average user will take sophisticated functionality developed by experts and mold the interaction with them to personal taste.
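A functionality-centric behavior of this kind might look like the following Java sketch, which holds nothing but a stop-word list and asks another behavior for the text to search. The names SearchBehaviorSketch, TextSource, and SearchBehavior are hypothetical, and exact case-insensitive word matching is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a functionality-centric searching behavior.
class SearchBehaviorSketch {
    // Program-level interface offered by some data-centric behavior.
    interface TextSource {
        List<String> words();
    }

    static final class SearchBehavior {
        // The only data this behavior keeps for itself.
        private final Set<String> stopWords;

        SearchBehavior(Set<String> stopWords) {
            this.stopWords = new HashSet<>(stopWords);
        }

        // Return the positions in the source's word list that match the term.
        List<Integer> search(TextSource source, String term) {
            List<Integer> positions = new ArrayList<>();
            String needle = term.toLowerCase(Locale.ROOT);
            if (stopWords.contains(needle)) {
                return positions; // stop words are ignored entirely
            }
            List<String> words = source.words();
            for (int i = 0; i < words.size(); i++) {
                if (words.get(i).toLowerCase(Locale.ROOT).equals(needle)) {
                    positions.add(i);
                }
            }
            return positions;
        }
    }

    public static void main(String[] args) {
        // A lambda stands in for the behavior that owns the OCR text.
        TextSource ocrText = () -> Arrays.asList("The", "water", "of", "the", "Water", "Association");
        SearchBehavior search = new SearchBehavior(new HashSet<>(Arrays.asList("the", "of")));
        System.out.println(search.search(ocrText, "water"));
        System.out.println(search.search(ocrText, "the"));
    }
}
```

The returned positions could then be handed to a layer that knows where each word sits on the image, so the searching behavior itself stays independent of any particular data format.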

MVD Architecture -- Behaviors in Operation

Often the layers of a multivalent document are geographically dispersed on various repositories. The more cooperative servers are fronted by a database that can respond to queries by ``name'' for a specific piece of information (as opposed to a URI that maps more or less directly into a file system), and to queries for entities matching a description of their attributes, like ``semantic layers associated with document 28329'' or ``behaviors that can search Unicode''. In our model, no particular cooperation from a server is required beyond delivering the raw data.

The user controls a client that can communicate with various servers. The query first goes to a ``handle server'' (similar to that described in [Kahn94]) that takes the name and returns a list of servers that have relevant information. The client queries this list of servers for the essential behaviors for minimal interaction with the document. Other behaviors are loaded as needed; this on-demand loading conserves network bandwidth, which is important considering that a multivalent decomposition is most appropriate for complex documents with a great deal of content. Typically the main components of the document are fetched from a remote source. In all cases the server returns a series of ``type descriptors'', that is, concise characterizations of the behaviors at that server. This information is used in constructing what we call ``the type graph''.
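The resolve-then-fetch-on-demand flow can be sketched in Java with in-memory stand-ins for the network. Everything here is an assumption for illustration: the class names, the server identifiers ``server-a'' and ``server-b'', the behavior type names, and the use of plain strings in place of loaded behaviors. No real protocol or handle-server API is implied.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of on-demand loading; maps stand in for remote servers.
class OnDemandLoadingSketch {
    static final class Client {
        // Stand-in for the handle server: document name -> server identifiers.
        private final Function<String, List<String>> handleServer;
        // Stand-in for the servers: server identifier -> (behavior type -> behavior).
        private final Map<String, Map<String, String>> servers;
        // Behaviors already fetched, so repeated requests cost no bandwidth.
        private final Map<String, String> cache = new HashMap<>();
        int fetchCount = 0;

        Client(Function<String, List<String>> handleServer,
               Map<String, Map<String, String>> servers) {
            this.handleServer = handleServer;
            this.servers = servers;
        }

        // Fetch a behavior only when it is first needed; return null if no server has it.
        String load(String documentName, String behaviorType) {
            String key = documentName + "/" + behaviorType;
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
            for (String server : handleServer.apply(documentName)) {
                Map<String, String> offered = servers.get(server);
                if (offered != null && offered.containsKey(behaviorType)) {
                    fetchCount++; // a real client would transfer data here
                    String behavior = offered.get(behaviorType);
                    cache.put(key, behavior);
                    return behavior;
                }
            }
            return null;
        }
    }

    static Client sampleClient() {
        Map<String, String> imageServer = new HashMap<>();
        imageServer.put("page-image", "image behavior");
        Map<String, String> ocrServer = new HashMap<>();
        ocrServer.put("wordbox", "wordbox behavior");
        Map<String, Map<String, String>> servers = new HashMap<>();
        servers.put("server-a", imageServer);
        servers.put("server-b", ocrServer);
        return new Client(name -> Arrays.asList("server-a", "server-b"), servers);
    }

    public static void main(String[] args) {
        Client client = sampleClient();
        System.out.println(client.load("ELIB:17", "page-image"));
        System.out.println(client.load("ELIB:17", "wordbox"));
        System.out.println(client.load("ELIB:17", "wordbox"));
        System.out.println("fetches: " + client.fetchCount);
    }
}
```

The second request for the wordbox behavior is answered from the cache, so only behaviors the user actually invokes are ever transferred, and each at most once.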

MVD Architecture -- The Type Graph

All servers for which the client has access permission return a type descriptor. Each layer and behavior is typed, and the type graph is a construction of the relationships among pieces. When a behavior needs a particular information layer or action, it consults the type graph for an object that can satisfy it. If the behavior is not available locally (and not cached), it is fetched at this time over the network. The type graph is key to managing interactions among behaviors. Because the type graph is locally managed, it can be massaged and rearranged. One such use of the type graph is to override a behavior or piece of a behavior, say from the official repository for a particular document, with a local customization.

Another use of the type graph is to introduce new technology in a first class way into documents that were initially prepared before its development or in ignorance of it, but at any rate without special accommodations made for it.
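The two uses of the type graph described above, lookup by type and local override, can be illustrated with a small Java sketch. It is deliberately flattened to a pair of lookup tables and omits the relationships among types; the names TypeGraphSketch, TypeGraph, register, overrideLocally, and resolve are hypothetical, as are the type and provider strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified sketch of type-graph lookup with local override.
class TypeGraphSketch {
    static final class TypeGraph {
        // Providers announced by servers through their type descriptors.
        private final Map<String, String> repositoryProviders = new HashMap<>();
        // Providers installed locally by the user; these take precedence.
        private final Map<String, String> localOverrides = new HashMap<>();

        void register(String type, String provider) {
            repositoryProviders.put(type, provider);
        }

        void overrideLocally(String type, String provider) {
            localOverrides.put(type, provider);
        }

        // Return the provider that satisfies the type, or null if none is known.
        String resolve(String type) {
            String local = localOverrides.get(type);
            return local != null ? local : repositoryProviders.get(type);
        }
    }

    public static void main(String[] args) {
        TypeGraph graph = new TypeGraph();
        graph.register("search", "repository search behavior");
        System.out.println(graph.resolve("search"));

        // A local customization replaces the official behavior.
        graph.overrideLocally("search", "my search behavior");
        System.out.println(graph.resolve("search"));

        // A type unknown when the document was prepared can be added later.
        graph.register("annotation", "annotation behavior");
        System.out.println(graph.resolve("annotation"));
    }
}
```

Because a requesting behavior names only the type it needs, neither the override nor the later addition requires any change to the document's existing layers.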


Recent Developments

As mentioned above, we have released MVD 0.9, which more fully embodies the ideas of our model. In particular, MVD 0.9 demonstrates some collaborative annotation capabilities one can build on top of MVD.

We have developed a geographic data viewer, called GIS Viewer, based on the MVD model. We have applied GIS Viewer to several collections of layer data. See the Geographic Data for further information about, and access to, this prototype.


Berkeley Digital Library Project / www@elib.cs.berkeley.edu / Last Modified 12 February 1996