IR Tools Workshop

March 19-21, 1998, Pittsburgh
"Effects on Education, and a Proposal for Collection of Tools"
Edward A. Fox
VPI&SU Dept. of Computer Science

ABSTRACT: This paper argues that IR tools are essential to support learning in the IR field. Such tools can be collected if care and coordination are backed by suitable support from both professional societies and funding agencies. The results will include faster progress and knowledge transfer in the field, training of a larger number of effective practitioners and researchers, and a better understanding of the broad field of information by a larger segment of the scholarly community.

1. IR Tools Needed for Learning about IR

Since the earliest days of IR, researchers have developed a variety of tools. In many cases, groups at educational institutions have used those tools in both graduate and undergraduate courses that cover IR.

1.1 Kinds of Tools and their Distribution

IR tools developed and used by learners have come in a variety of shapes and sizes and have been distributed through a number of mechanisms. For example, large systems such as SMART, INQUERY, ZPrise, and MG have been widely distributed and used at a variety of institutions. Small routines, such as stemming algorithms (e.g., Porter's routine), have been made available in written and electronic forms. IR test collections, available informally, on CD-ROM (e.g., on Virginia Disc 1), or over the Internet, have been used repeatedly in investigations. Books, such as the volume edited by Frakes and Baeza-Yates, accompanied by algorithms in online and CD-ROM forms, have been carefully studied. Screen dumps and visualizations, as well as a small number of online interactive demonstrations (e.g., Scorpion by OCLC), have been used in classroom settings and for individual exploration, whether gathered in collections such as the one developed by E. Efthimiadis or made available individually over the WWW.

1.2 Learners' Use of Tools

Some institutions, like Virginia Tech, have directly used a number of tools in related courses. In particular, laboratory exercises that use online tools are freely shared across a number of course modules.

In some cases students use a tool for a homework exercise, while in other cases they use tools to undertake term projects. Some students simply apply a tool, while others study it to understand how it works or even to enhance it. Some tools are small and simply illustrate key concepts, while others are complex and may be viewed as black-box systems whose overall function is under investigation. Some tools are manipulated directly by students, while others are huge online systems that students interact with remotely. Some students implement small tools from algorithm descriptions, while others serve as "subjects" in experiments that test a tool. Clearly, tools come in all shapes and sizes, and so learners can use them in almost any way that serves their objectives.

1.3 Problems with Availability of Tools

Unfortunately, good IR tools are unevenly available to learners. Some tools are not "politically correct" in a particular lab, especially when different theories or models are locally popular. Most tools are available at only a very small number of sites, since tool developers often do not package them or distribute them widely. All tools suffer from poor documentation, which makes them difficult to adopt except where faculty at the institutions preparing and receiving a tool collaborate or have extensive personal contact.

Very few tools are described in depth in the open literature, and those that are (for example, in SIGIR Forum) are often missed by those who do not read or receive such publications. Metadata descriptions of IR tools are seldom prepared, and there are no well-known catalogs of such tools. Although the WWW has made it easier to access the tools that are made available, there is no well-known forum for discussing the details of tools or for obtaining advice on which tools to select for which purposes.

2. How to Collect High Quality IR Tools

Collecting IR tools is a key problem, one that must be solved if our overall enterprise is to succeed. There are a number of ways to collect tools; we give two examples.

2.1 Research & Development Groups Making Tools Available

First, R&D groups can be encouraged to make their tools available to others. Funding agencies can promote this, either when grants are awarded or by offering supplements to grants that are nearly complete and have yielded useful tools as deliverables.

Centers and laboratories developing tools can be urged to provide versions of their systems or other tools that learners at other sites can use. This has been practiced to some extent at places like Cornell and U. Mass Amherst, with very positive results.

2.2 Demonstrations: from Conference to Classroom

Second, associations like SIGIR can have a dramatic effect on transferring new research into the classroom. One simple method is to follow the demonstration track of each sponsored conference with a follow-on effort by an education committee. For example, all suitable demonstrations at the annual SIGIR conference could be "packaged" as a set of screen dumps, a set of online WWW pages, or even an interactive application, which SIGIR could then publicize widely for use by learners worldwide.

2.3 Need for Support

Making tools available for use by others requires work. Some of that work must be done by the authors of the tools. Other work could be done by those interested in "cataloging" the tools. Still other work could be done by reviewers who critique tools, test them for portability, and help maximize their usefulness and usability.

3. Effects of Collection of IR Tools on Science and Technology

Having IR tools widely available could have a dramatic impact on science and technology. At NFAIS'98 (the 40th anniversary meeting of the National Federation of Abstracting and Information Services, Philadelphia, PA, Feb. 1998) it was clearly shown that key tools and services developed in the information industry (citation services, bibliographic services, indexing services, search systems, CD-ROM databases, abstracting services, ...) have played a crucial role in supporting progress in science, engineering, technology, medicine, and other areas. With the advent of the WWW, many more end users work with these systems, and with others newly available on the WWW, often with little training in or understanding of basic concepts. Now that many colleges and universities are launching courses about the WWW, online information, searching, and similar topics, easily accessible materials that help learners understand this field more thoroughly could have a tremendous impact on the scholarly community at all levels.

3.1 Situated Understanding

While courses in the IR field are important and are offered at a growing number of institutions, taking one is still unusual for the hundreds of thousands of people each year who could benefit from learning about this topic. Hence it is important that educational materials, and tools in particular, be made available over the WWW in situations conducive to learning. For example, the Networked CS Technical Report Library (NCSTRL), which covers technical reports from about a hundred computing departments, could be connected to a set of tools that illustrate advanced search methods (i.e., situated in the context of searching for reports). Analogous offerings, in which tools serve as a kind of extended "help system," might help users of practical systems understand "advanced features" more thoroughly instead of simply accepting the defaults, as is common.

3.2 IR as a Basic Need in Science and Technology

Since information is a basic necessity for modern life, and since those involved in science and technology must use information to carry out their work, it is crucial that IR concepts and tools be more widely understood and used. Concepts are best understood when applied in a real context, and are learned better through active use, such as working with tools. With a vast and growing scientific literature, it is increasingly important to know how to formalize an information need, discover collections, formulate queries, manage results, develop search strategies, expand and reformulate queries, work with citations, prepare and manipulate documents, construct profiles, classify information, evaluate results and systems, and use and critique interfaces.

Not only will understanding concepts and tools directly help the larger community, but it will also have indirect effects. Many widely used services, such as the current generation of WWW search services, arose from small tools that were adapted to new contexts and collections. Thus, if more tools are available, future work to package and apply them further is likely to give rise to many new services that aid the community.

3.3 Effects on De-Duping and Replication

If the tools that are developed are made more widely available, duplicate development of very similar tools is likely to decline. This saving of effort, together with access to the tools themselves, may make it easier to replicate studies undertaken by others, which is difficult now because tools are often developed and used locally but not made available to "competing" groups. Real "knowledge transfer" often requires others to replicate results, which can happen more quickly when tools are shared; sharing also makes it easier to build further advances on prior work.

4. Proposal for Action

To move the IR community forward in the area of IR tools, two focused proposals are offered:

4.1 Liaison with CSTC, CRIM, ToCECS, NDLTD

First, the IR community is invited to participate in four new efforts that apply digital library technology to improve education in the computing field.

  1. Computer Science Teaching Center:
    In March 1998 a two-year grant from NSF's DUE was awarded to Deborah Knox (TCNJ), with co-PIs Scott Grissom (U. Illinois Springfield) and Edward Fox, to develop a digital library of computer science teaching resources. Professor Knox is focusing on laboratory materials, Professor Grissom on visualizations and visualization tools, and Professor Fox on providing infrastructure as well as collecting tools and other resources in the areas of multimedia, hypertext, and information access (including IR). Everyone is encouraged to contribute IR tools to CSTC, which can then help with reviewing and documenting those tools and making them accessible.
  2. Curriculum Resources in Interactive Multimedia:
    In February 1998 a two-year grant from NSF's DUE was awarded to Edward Fox and Rachelle Heller (GWU) to develop and collect curriculum and courseware resources that help with learning about interactive multimedia information and systems. This effort complements CSTC and will be integrated with it. IR tools can be handled through this framework, especially with regard to integrating them into curriculum recommendations.
  3. Transactions on Courseware and Education in Computer Science:
    In connection with the CSTC effort, a draft proposal has been prepared and is under discussion with SIGCSE to start a transactions (or another suitable type of journal), ToCECS, in connection with the ACM Digital Library, so that submitted IR tools can be handled like scholarly papers: reviewed, refined, published, and given recognition. If something like this is ultimately approved, faculty and students may be strongly motivated to spend time and effort developing and submitting IR tools.
  4. Networked Digital Library of Theses and Dissertations:
    As universities join the NDLTD, electronic forms of theses and dissertations will become more widely accessible. Those in the IR field that include IR tools, test collections, data sets, or other resources that others could usefully study and apply can easily be made accessible. As part of the NDLTD, we would be happy to supplement our work with another project in which special metadata and other facilities would be added to support the goals and objectives of the IR Tools Workshop. Thus, a special category, along with special metadata and search fields, might be added for those particularly interested in IR tools.

Given suitable support and assistance, the teams involved in the four efforts above, as well as similar projects, could help with the collection and distribution of IR tools.

4.2 Support by Funding Agencies and Professional Associations

Second, for IR tools to become more central in the IR community, as well as to help those in the broader world of science, engineering, and technology, support from funding agencies and professional associations is needed. Those willing to build important tools, to extend prototypes into nicely packaged tools, or to help with reviewing and testing tools need support and/or incentives. For some, a little work in this realm (e.g., reviewing for ToCECS) may be done out of loyalty. For others, a publication, such as in ToCECS or in a SIGIR conference proceedings (perhaps in a separate track), may suffice. However, large tools will require serious support, such as grant funding. Finally, associations may encourage such work with awards (e.g., a SIGIR "tool award," similar to the annual ACM Software System Award, or a "best demo award" at the SIGIR conference) that recognize this type of effort and signal its importance for promotion, tenure, and merit decisions. In short, strong support at all levels is needed to significantly strengthen the community's commitment to IR tools.

Appendix - Pointers from Dragomir R. Radev

As an example of the power of collecting contributions from a wide variety of researchers, please see the list below of pointers provided by Dragomir R. Radev of Columbia University, in response to my invitation to contribute to this workshop.

  1. PROFILE demonstration, usable for query expansion
  2. PROFILE explanation
  3. Query expansion for "president"