ABSTRACT: This paper argues that IR tools are essential to support learning in the IR field. Such tools can be collected, if care and coordination are backed with suitable support from both professional societies and funding agencies. The results will include: speeding up of progress and knowledge transfer in the field, training of a larger number of effective practitioners and researchers, and better understanding of the broad field of information by a larger segment of the scholarly community.
Since the earliest days of IR, researchers have developed a variety of tools. In many cases, groups at educational institutions have used those tools to help in both graduate and undergraduate courses in which IR is covered.
IR tools developed and used by learners have come in a variety of shapes and sizes, and have been distributed through a number of mechanisms. For example, large systems - e.g., SMART, INQUERY, ZPrise, and MG - have been widely distributed and used in a variety of institutions. Small routines, like stemming algorithms (e.g., Porter's routine), have been made available in written and electronic forms. IR test collections, available on an informal basis, through CD-ROM (e.g., on Virginia Disc 1) or over the Internet, have been repeatedly used in investigations. Books, like the Frakes and Baeza-Yates edited volume, accompanied by algorithms in on-line and CD-ROM forms, have been carefully studied. Screen dumps and visualizations, as well as a small number of online interactive demonstrations (e.g., Scorpion by OCLC), whether in collections such as that developed by E. Efthimiadis, or individually made available over the WWW, have been used in classroom settings and for individual exploration.
Some institutions, like Virginia Tech, have directly used a number of tools in related courses. In particular, there are laboratory exercises using online tools, freely shared in each of a number of course modules. See for example:
In some cases students use a tool for a homework exercise, while in other cases students use tools to undertake term projects. Some students simply apply a tool, while others study the tool to understand it or even to add enhancements. Some tools are small and simply illustrate key concepts, while other tools are complex and may be viewed as black-box systems whose overall function is under investigation. Some tools are manipulated by students, while others are huge online systems that students interact with from afar. Some students implement small tools from algorithm descriptions, while other students serve as "subjects" in experiments that involve testing a tool. Clearly, tools come in all shapes and sizes, and so can be used by learners in almost any imaginable way that serves their objectives.
Very few tools are described in depth in the open literature; those that are, such as in SIGIR Forum, are often lost to those who don't read or receive such publications. Metadata descriptions of IR tools are seldom prepared, and there are no well-known catalogs of such tools. Though the WWW has made it easier to access tools that are made available, there is no well-known forum in which to discuss details of tools or to obtain comments on which tools to select for which purposes.
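To make the idea of a metadata description and catalog concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the record fields, the ToolRecord class, and the find_tools function are hypothetical and do not correspond to any existing catalog or standard. The two sample entries use tools named earlier in this paper, and their locations are deliberately left unset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolRecord:
    """Hypothetical metadata record for one IR tool (field names are assumptions)."""
    name: str
    kind: str                        # e.g. "system", "routine", "test collection"
    summary: str
    location: Optional[str] = None   # where to obtain the tool; left unset here
    keywords: List[str] = field(default_factory=list)
    reviews: List[str] = field(default_factory=list)  # comments from reviewers


def find_tools(catalog, term):
    """Return records whose kind, summary, or keywords mention the term."""
    term = term.lower()
    return [
        rec for rec in catalog
        if term in rec.kind.lower()
        or term in rec.summary.lower()
        or any(term in kw.lower() for kw in rec.keywords)
    ]


# Two illustrative entries; descriptions are kept as generic as in the text.
catalog = [
    ToolRecord("SMART", "system", "large retrieval system", keywords=["retrieval"]),
    ToolRecord("Porter's routine", "routine", "stemming algorithm", keywords=["stemming"]),
]

matches = [rec.name for rec in find_tools(catalog, "stemming")]
```

Even a record this small would let a learner ask which tools address a given topic, and the reviews field gives reviewers a place to record comments on what a tool is suited for.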
Collecting IR tools is a key problem, one that must be solved if our overall enterprise is to work. There are a number of ways to collect tools. We give two examples:
First, R&D groups can be encouraged to make their tools available to others. This can be urged by funding agencies, either when grants are made or by offering supplements to grants that are almost complete and have yielded useful tools as deliverables.
Centers and laboratories developing tools can be urged to provide versions of their systems or other tools that can be used by learners at other sites. This has been partially practiced at locations like Cornell and U. Mass Amherst, for example, with very positive results.
Second, associations like SIGIR can have a dramatic effect on transferring new research into the classroom. A simple method is to couple the demonstration track of each sponsored conference with a follow-on effort of some education committee. Thus, all suitable demonstrations at the annual SIGIR conference could be "packaged" as a set of screen dumps, an online set of WWW pages, or even an interactive application, that then could be widely publicized by SIGIR for use by learners worldwide.
Making tools available for use by others requires work. Some of that must be done by the authors of the tools. Other work could be done by those interested in "cataloging" the tools. Yet other work could be done by reviewers who critique tools, test them for portability, and help make sure that usefulness and usability are maximized.
Having IR tools widely available could have a dramatic impact on the fields of science and technology. At NFAIS'98 (40th Anniversary of the National Federation of Abstracting and Information Services, Philadelphia, PA, Feb. 1998) it was clearly shown that key tools and services (citation services, bibliographic services, indexing services, search systems, CD-ROM databases, abstracting services, ...) developed in the information industry have played a crucial role in supporting progress in science, engineering, technology, medicine, and other areas. With the advent of the WWW, many more end-users work with these systems and others newly arrived on WWW, often with little training or understanding of basic concepts. Now that many colleges and universities are launching courses about WWW, online information, searching, and similar topics, having easily accessible materials for learners to understand this field more thoroughly could have a tremendous impact on the scholarly community at all levels.
While taking courses in the IR field is important, and is becoming possible at a growing number of institutions, it is still unusual among the hundreds of thousands each year who could benefit from learning about this topic. Hence it is important that educational materials, and tools in particular, be made available over the WWW in situations conducive to learning. For example, the Networked CS Technical Report Library, NCSTRL, http://www.ncstrl.org, which covers technical reports at about a hundred computing departments, could be connected to a set of tools that might illustrate advanced search methods (i.e., situated in a context found when searching for reports). Other analogous offerings, where tools are made available as a type of extended "help system," might assist users of practical systems to understand "advanced features" more thoroughly, instead of adopting defaults, as is common.
Not only will understanding concepts and tools directly help the larger community, but it also will have indirect effects. Many widely used services, such as the current generation of WWW search services, arose from small tools that were adapted to new contexts and collections. Thus, by having more tools available, there are likely to be many new services to aid the community that arise as the result of future work to further package and apply those tools.
If tools that are developed are made more widely available, it is likely that duplicate development of very similar tools will decrease. Sharing may also lead to easier replication of studies undertaken by others, which is difficult now because tools are often locally developed and used, but not made available to "competing" groups. Real "knowledge transfer" often requires replication of results by others, which can occur more quickly when tools are shared; further advances that build on prior work are also facilitated in such situations.
To move the IR community forward in the IR tools area, two focussed proposals are offered:
First, the IR Community is invited to participate in four new efforts to apply digital library technology to improve education in the computing field.
Given suitable support and assistance, the teams involved in the four efforts above, as well as similar projects, could help with the collection and distribution of IR tools.
Finally, for IR tools to become more central in the IR community as well as to help those in the broader world of science, engineering and technology, there is a need for support by funding agencies and professional associations. Those willing to build important tools, or to extend prototypes into nicely packaged tools, or to help with the reviewing and testing of tools, need support and/or incentives. For some, a little work in this realm, e.g., reviewing for ToCECS, may occur out of loyalty. For others, a publication, such as in ToCECS, or in the SIGIR conference proceedings (maybe in a separate track), may suffice. However, large tools will require serious support, such as through grant funding. In addition, associations may encourage such work with awards (e.g., a SIGIR "tool award", similar to the annual ACM award for systems, or a "best demo award" for the SIGIR conference) that encourage this type of effort and indicate its importance for promotion, tenure, and merit decisions. In short, strong support at all levels is called for to bring about a significant strengthening of the community's IR tools.
As an example of the power of collecting contributions from a wide variety of researchers, please see the list below of pointers provided by Dragomir R. Radev of Columbia University, in response to my invitation to contribute to this workshop.