Multivalent documents are a new model of digital documents. Here we describe the initial, proof-of-concept implementation of multivalent documents, which we call MVD 0.0. This implementation is still available for use on our scanned image document collection. However, we have recently released MVD 0.9, which more fully embodies the ideas of our model.
MVD 0.0 should work on any reasonable Java-compliant web client. (MVD 0.9, while much more functional, relies on Java features that seem poorly implemented on many platforms, as the MVD 0.9 documentation describes.)
We suggest you read the short "Multivalent" papers here for the general idea; online help, available via the browser's help button, provides more practical instructions. Also available are brief technical descriptions of layer data formats.
Some features are available for all documents in our scanned page collection. Other features require additional markup and are found only on a subset. Documents that exhibit a wide range of features include ELIB:17 (Water Conditions in California Report 3) and ELIB:620 (A Model of Sales).
The multivalent document paradigm stands in contrast to conventional monolithic document structures. Such documents are generally composed in a single mark-up language or document representation that is intended to anticipate all possible contents an author may wish to express. Such a specification is complex, and interaction with such a document is provided via correspondingly complicated viewers, generally of limited extensibility.
In contrast, the multivalent approach presumes that a single document comprises multiple layers of different but intimately related material. Each layer is of homogeneous content, but is of relatively limited scope and functionality. The multivalent framework allows the composition of simpler layers to produce useful behaviors. In addition, new layers can be added at later stages. Layers have associated with them dynamically loaded program objects, called behaviors, that manipulate the content, often communicating with other layers and other behaviors to achieve a desired effect.
By representing the conceptual document as multiple components, others who are not the original document's author can add additional content and behavior at a later time, even introducing new technology that was unknown during the preparation of the initial materials. By keeping the pieces of content simple, the behaviors that manipulate them are more straightforward to define; the components are also more easily reusable. By storing the content in distinct pieces, only those pieces necessary to support the requested interaction need be shipped over the network, thus conserving bandwidth. Although narrowly tailored software could be written to support any given document functionality, the general framework provided by multivalent documents leverages the work done in other areas, while fully supporting specialization to particular application domains.
The multivalent document paradigm allows the construction of rich varieties of online digital documents, documents which do not merely imitate the capabilities of other media. Such a true digital document provides an interface to potentially complex content. Since this content is infinitely varied and specialized, the framework is aimed at providing a means to interact with it in arbitrarily specialized ways. Furthermore, since relevant content may be found in distinct documents, the framework draws from multiple sources, yet provides a coherent presentation to the user. Finally, the framework is meant to be conducive to the convenient authoring of new content, the definition of new means of manipulation, and the seamless meshing of both with existing materials.
MVD 0.0 is the initial proof-of-concept implementation of a multivalent document browser. This prototype is available via our web page to Java-compliant web clients, and implements several kinds of functionality related to scanned pages. The prototype runs on our entire document collection.
The prototype makes use of several layers. One layer is the scanned page image; another comprises ASCII text derived from an optical character recognition process; a third layer relates the ASCII characters to their pixel positions on the image. In terms of functionality, when a user views a page via the multivalent document browser, the user sees the page image, as this is the form of the document the author intended the user to see. In most image viewers, a page image has no functional capabilities other than to enable users to read it; it is, after all, just a picture of a page. The multivalent document viewer, in contrast, gives users a number of novel functions:
The figure above displays the image of a scanned page with selected text and matching search terms. In particular, the search terms supplied are ``water'' and ``association''. In addition, the text from ``Tri-Dam Project'' in column 1 to ``U.S. Department'' in column 3 has been selected. Note that the yellow highlight region conforms to the column structure of the page layout. The corresponding text is now available for pasting.
To implement these functions, the linking or ``wordbox'' layer, i.e., the layer that specifies the relationship of the ASCII text to pixel positions, is produced dynamically via a server set up at Xerox PARC. This server is capable of translating the (rather inscrutable) representation produced by the OCR software into a more accessible form. While it is not logically necessary to run the translation service at a separate installation, doing so illustrates the dynamic nature of the model, which allows layers to be created on the fly.
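The relationship between the text layer and the wordbox layer can be illustrated with a small Python sketch. It is not the prototype's data format: the `WordBox` record, its field names, the pixel values, and the `boxes_matching` helper are all assumptions made for illustration. The sketch shows how search terms could be turned into regions to highlight on the page image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """One OCR word and its pixel bounding box on the page image (hypothetical layout)."""
    word: str
    x: int
    y: int
    width: int
    height: int


def boxes_matching(wordboxes, terms):
    """Return the boxes whose word matches any search term, ignoring case
    and trailing or leading punctuation."""
    wanted = {t.lower() for t in terms}
    return [b for b in wordboxes if b.word.lower().strip(".,;:") in wanted]


# Invented sample data standing in for one line of a scanned page.
page = [
    WordBox("Water", 40, 100, 52, 14),
    WordBox("Users", 96, 100, 48, 14),
    WordBox("Association,", 148, 100, 96, 14),
]

hits = boxes_matching(page, ["water", "association"])
for box in hits:
    print(box.word, (box.x, box.y, box.width, box.height))
```

A viewer would draw a highlight over each returned rectangle; the image layer itself is never modified.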
The functionality just described, plus some additional features, is available for the entire corpus of scanned images. Additional functionality exists in the form of additional layers for certain images:
In this example, the word ``precipitation'' was selected for definition. The terms underscored in blue have hyperlinks. Note that these include some regions of the map in which the OCR process was able to identify usable words.
This example demonstrates one of the advantages of the multivalent approach. The link layer in the example was made up by someone other than the author. In principle, we could have multiple sets of hyperlinks for different purposes, created at different times. It would be difficult to achieve this functionality in a model in which hyperlinks are part of the mark-up language layer.
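As a rough illustration of why separate link layers are convenient, the following Python sketch keeps each set of hyperlinks in its own layer, keyed by word position in the text stream. The `LinkLayer` class, the author labels, and the link targets are hypothetical and do not reflect the prototype's actual link format.

```python
from dataclasses import dataclass, field


@dataclass
class LinkLayer:
    """A set of hyperlinks contributed by one party (hypothetical structure)."""
    author: str
    purpose: str
    links: dict = field(default_factory=dict)  # word index -> link target


def links_at(word_index, layers):
    """Collect the links that every active layer attaches to one word."""
    return [
        (layer.purpose, layer.links[word_index])
        for layer in layers
        if word_index in layer.links
    ]


# Two independent layers over the same page; neither touches the base text.
glossary = LinkLayer("librarian", "definitions", {12: "glossary:precipitation"})
related = LinkLayer("hydrologist", "related reports", {12: "report:rainfall-tables"})

print(links_at(12, [glossary, related]))
```

Because each layer is a separate object, a reader can enable one, both, or neither without any change to the underlying document.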
This prototype is built upon a general architecture which we have developed over the past year, not all of which has yet been implemented. Here we describe the general vision, relating it in particular to our prototype.
In the general model, a layer has three kinds of components: data or information content, ``behaviors'' that supply functional interfaces, and behaviors that supply one or more user-level interfaces. Behaviors resemble classes in object-oriented programming languages, encapsulating each layer as a set of methods that operate on private data. (Indeed, they are implemented as such a set of distinguished classes.) That is, data in a behavior is not directly accessible by other behaviors, which are therefore forced to operate at a higher level than bit-tweaking a volatile format like a byte stream. Instead, behaviors communicate through program-level interfaces.
For example, the multivalent documents implemented in our prototype consist, in part, of a scanned page image with geometrically positioned OCR. The character layer stores the mapping between character positions in the text stream and their locations on the image. These locations may be stored directly as bounding boxes, or they may be given as character origins along with a single global pointer to the corresponding font metrics from which the bounding boxes may be calculated. The exact method is hidden from other behaviors, since all access is provided through higher-level function interfaces. The select-and-paste behavior needs to map a mouse click to a character position, and take it and subsequent mouse drags and highlight the intervening characters. It does so by taking the two positions given by the initial mouse click and the current mouse position and mapping these to character positions in the text stream; these text positions are then transformed into the region which is drawn on the image. This region is not usually the rectangular region whose corners are the two points given by the mouse, as it obeys line boundaries; therefore, it is useful to utilize a behavior that is intimately familiar with the data.
A variety of behaviors may be active at any point in time, on various regions of the document. To aid the user in determining what is active where, most behaviors can outline their area of control as one, and perhaps their only, user-level interface.
Not all behaviors have all three components. Some behaviors are data-centric, serving primarily as information repositories, with enough program-level interfaces to provide access to the data. Other behaviors are program- or functionality-centric. For instance, a general searching behavior may not store any data in itself, except perhaps a list of ``stop words'' that should be ignored during the search. The searching function calls upon other behaviors to provide the text to search. At the user level, it may have the ability to present a simple type-in box for the search term, or it may rely solely on other behaviors to invoke it. Still other behaviors primarily provide a user interface to functionality and content available elsewhere. Our view is that most customization by the average user will take sophisticated functionality developed by experts and mold the interaction with it to personal taste.
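The select-and-paste mapping described above can be sketched in Python as follows. This is a simplified illustration rather than the prototype's code: it works on word boxes instead of characters, assumes each box already carries a line number, and the `Box`, `index_at`, and `selection_region` names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    index: int  # position in the text stream
    line: int   # line number in reading order
    x: int
    y: int
    w: int
    h: int


def index_at(boxes, px, py):
    """Return the stream index of the box nearest to a pixel position."""
    def squared_distance(box):
        dx = max(box.x - px, 0, px - (box.x + box.w))
        dy = max(box.y - py, 0, py - (box.y + box.h))
        return dx * dx + dy * dy
    return min(boxes, key=squared_distance).index


def selection_region(boxes, start_point, end_point):
    """Return one rectangle (x, y, w, h) per line covered by the selection."""
    first = index_at(boxes, *start_point)
    last = index_at(boxes, *end_point)
    lo, hi = min(first, last), max(first, last)
    by_line = {}
    for box in boxes:
        if lo <= box.index <= hi:
            by_line.setdefault(box.line, []).append(box)
    region = []
    for line in sorted(by_line):
        row = by_line[line]
        left = min(box.x for box in row)
        top = min(box.y for box in row)
        right = max(box.x + box.w for box in row)
        bottom = max(box.y + box.h for box in row)
        region.append((left, top, right - left, bottom - top))
    return region


# Two lines of three words each, with invented pixel positions.
boxes = [
    Box(0, 0, 10, 10, 30, 10), Box(1, 0, 45, 10, 30, 10), Box(2, 0, 80, 10, 30, 10),
    Box(3, 1, 10, 25, 30, 10), Box(4, 1, 45, 25, 30, 10), Box(5, 1, 80, 25, 30, 10),
]

# Press the mouse on word 1, drag down and left to word 3.
print(selection_region(boxes, (50, 15), (20, 30)))
```

The result covers words 1 and 2 on the first line and word 3 on the second, which differs from the plain rectangle spanned by the two mouse points. Callers never see whether positions are stored as boxes or derived from font metrics; they only call `selection_region`.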
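The division between data-centric and functionality-centric behaviors might look like the following Python sketch. The class names, the default stop-word list, and the `words` and `search` methods are assumptions for illustration only. The search behavior holds nothing but its stop words and obtains text from whatever text-providing behaviors it is handed.

```python
class TextLayer:
    """Data-centric behavior: holds text and exposes it only through a method."""

    def __init__(self, name, text):
        self.name = name
        self._text = text  # private; other behaviors go through words()

    def words(self):
        return self._text.split()


class SearchBehavior:
    """Functionality-centric behavior: stores only a stop-word list."""

    def __init__(self, stop_words=("the", "of", "a", "in")):
        self._stop_words = frozenset(stop_words)

    def search(self, query, providers):
        """Return (layer name, word position) for each match of a non-stop term."""
        terms = {t for t in query.lower().split() if t not in self._stop_words}
        hits = []
        for provider in providers:
            for position, word in enumerate(provider.words()):
                if word.lower() in terms:
                    hits.append((provider.name, position))
        return hits


ocr = TextLayer("ocr", "Water Conditions in California")
searcher = SearchBehavior()
print(searcher.search("water in", [ocr]))
```

Here ``in'' is dropped as a stop word, so only the position of ``Water'' is reported.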
Often the layers of a multivalent document are geographically dispersed across various repositories. The more cooperative servers are fronted by a database that can respond to queries by ``name'' for a specific piece of information (as opposed to a URI that maps more or less directly into a file system) and to queries for entities matching a description of their attributes, like ``semantic layers associated with document 28329'' or ``behaviors that can search Unicode''. In our model, no particular cooperation from a server is required beyond delivering the raw data.
The user controls a client that can communicate with various servers. A query first goes to a ``handle server'' (similar to that described in [Kahn94]) that takes the name and returns a list of servers that have relevant information. The client queries this list of servers for the essential behaviors for minimal interaction with the document. Other behaviors are loaded as needed; this on-demand loading conserves network bandwidth, which is important considering that a multivalent decomposition is most appropriate for complex documents with a great deal of content. Typically the main components of the document are fetched from a remote source. In all cases the server returns a series of ``type descriptors'', that is, concise characterizations of the behaviors at that server. This information is used in constructing what we call ``the type graph''.
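The lookup sequence described above (name to servers, servers to type descriptors, behaviors fetched only when needed) can be sketched in Python. Everything here is an in-memory stand-in: the `HandleServer`, `Server`, and `Client` classes, their method names, and the use of plain strings as type descriptors are assumptions, not the prototype's protocol.

```python
class HandleServer:
    """Maps a document name to the servers holding relevant material."""

    def __init__(self, table):
        self._table = table

    def resolve(self, name):
        return list(self._table.get(name, []))


class Server:
    """Holds behaviors and reports concise descriptors of what it offers."""

    def __init__(self, behaviors):
        self._behaviors = behaviors  # type name -> payload
        self.fetch_count = 0

    def type_descriptors(self):
        return sorted(self._behaviors)

    def fetch(self, type_name):
        self.fetch_count += 1  # stands in for a network transfer
        return self._behaviors[type_name]


class Client:
    def __init__(self, handle_server):
        self._handles = handle_server
        self._where = {}  # type name -> server offering it
        self._cache = {}  # type name -> behavior already fetched

    def open(self, name):
        """Collect descriptors only; no behavior is transferred yet."""
        for server in self._handles.resolve(name):
            for type_name in server.type_descriptors():
                self._where.setdefault(type_name, server)

    def behavior(self, type_name):
        """Fetch a behavior on first use and reuse the cached copy afterwards."""
        if type_name not in self._cache:
            self._cache[type_name] = self._where[type_name].fetch(type_name)
        return self._cache[type_name]


repository = Server({"image": "page image viewer", "search": "text searcher"})
client = Client(HandleServer({"ELIB:17": [repository]}))
client.open("ELIB:17")
print(repository.fetch_count)          # nothing fetched after open()
client.behavior("image")
client.behavior("image")
print(repository.fetch_count)          # one fetch despite two requests
```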
All servers for which the client has access permission return a type descriptor. Each layer and behavior is typed, and the type graph is a construction of the relationships among pieces. When a behavior needs a particular information layer or action, it consults the type graph for an object that can satisfy it. If the behavior is not available locally (and not cached), it is fetched at this time over the network. The type graph is key to managing interactions among behaviors. Because the type graph is locally managed, it can be massaged and rearranged. One such use of the type graph is to override a behavior or piece of a behavior, say from the official repository for a particular document, with a local customization.
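One way to picture local override through the type graph is the minimal Python sketch below. It reduces the graph to a table from type name to an ordered list of loaders, which is a deliberate simplification; the `TypeGraph` class, its `register`, `override`, and `resolve` methods, and the string payloads are hypothetical.

```python
class TypeGraph:
    """Locally managed table of which loader satisfies which type (simplified)."""

    def __init__(self):
        self._providers = {}  # type name -> loaders, highest priority first
        self._loaded = {}     # type name -> behavior already loaded

    def register(self, type_name, loader):
        """Record a provider, for example one advertised by a remote repository."""
        self._providers.setdefault(type_name, []).append(loader)

    def override(self, type_name, loader):
        """Put a local customization ahead of every other provider."""
        self._providers.setdefault(type_name, []).insert(0, loader)
        self._loaded.pop(type_name, None)  # drop any previously loaded copy

    def resolve(self, type_name):
        """Return a behavior of the requested type, loading it on first use."""
        if type_name not in self._loaded:
            loaders = self._providers.get(type_name)
            if not loaders:
                raise LookupError(f"no behavior provides {type_name!r}")
            self._loaded[type_name] = loaders[0]()
        return self._loaded[type_name]


graph = TypeGraph()
graph.register("highlighter", lambda: "repository highlighter")
print(graph.resolve("highlighter"))

graph.override("highlighter", lambda: "local highlighter")
print(graph.resolve("highlighter"))
```

Behaviors that ask the graph for a ``highlighter'' need no changes; they simply receive the local version after the override.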
Another use of the type graph is to introduce new technology in a first-class way into documents that were initially prepared before its development or in ignorance of it, but at any rate without special accommodations made for it.
As mentioned above, we have released MVD 0.9, which more fully embodies the ideas of our model. In particular, MVD 0.9 demonstrates some collaborative annotation capabilities one can build on top of MVD.
We have developed a geographic data viewer, called GIS Viewer, based on the MVD model. We have applied GIS Viewer to several collections of layer data. See the Geographic Data page for further information about, and access to, this prototype.