From foxea@vtvax3 Wed Sep 24 18:45:18 1986
Date: Wed, 24 Sep 86 18:45:11 edt
From: foxea@vtvax3
To: @ir.dis
Subject: IRList Digest V2 #49
Status: RO

IRList Digest           Wednesday, 24 September 1986      Volume 2 : Issue 49

Today's Topics:
   Announcement - Correction for book on software for content analysis
   Article - Software Reuse Through Information Retrieval - Part 3 of 3

News addresses are ARPANET: fox%vt@csnet-relay.arpa  BITNET: foxea@vtvax3.bitnet
   CSNET: fox@vt   UUCPNET: seismo!vtisr1!irlistrq
----------------------------------------------------------------------

Date: 1986 Sep 14   20:44 EST
From: Bob Weber   <WEBER3@HARVARDA>
     
[Extracted from CRTNET #58, 9/15/86 - Ed]

ERROR IN CONTENT ANALYSIS BOOK AND SOFTWARE FOR CONTENT ANALYSIS
     
     
In the second printing of my book, Basic Content Analysis (1985),
published by Sage in their series on quantitative methodology,
they have not corrected an important factual error which I had
asked them to fix.
     
The first paragraph on page 80 indicates that a computer
program for key-word-in-context listings is available
from the Harvard Laboratory for Computer Graphics; this
is no longer so.
     
That paragraph will be replaced with the following information
on the availability of the General Inquirer computer system for
automated content analysis (the references are given in the
bibliography of the Sage book).
The replacement for the first paragraph on page 80 will
state:
     
       Beginning early in 1987, the current version of the
    General Inquirer system (Kelly and Stone, 1975), the latest
    Harvard dictionary (Dunphy, et al., 1974), and the Lasswell
    Value Dictionary (Namenwirth and Weber, 1987)* will be
    distributed by ZUMA.  Distribution of software,
    dictionaries, and documentation will be on an "as is" basis
    and only for non-commercial use.  A very small fee for
    handling will be charged. Note: the General Inquirer is
    written in PL1 for IBM mainframe computers only!
     
    Interested readers should contact ZUMA, the Center For Surveys,
    Methods, and Analysis, in Mannheim, FRG:
     
     
    Computer Department
    ZUMA
    B2,1
    Postfach 5969
    D-6800 Mannheim 1
    Federal Republic of Germany
     
*Namenwirth, J. Zvi and Robert Philip Weber. 1987. Dynamics of Culture.
Winchester, MA: Allen & Unwin.
     
 Robert Philip (Bob) Weber
 Harvard University
     
 BITNET: WEBER3@HARVARDA
 ARPA:   Weber3%Harvarda.Bitnet@Wiscvm.Wisc.Edu
     
------------------------------
     
Date: Fri, 12 Sep 86 22:31:44 EDT
From: seismo!allegra!hoqam!wbf
Subject: Software Reuse through IR [ Part 3 of 3 - Ed]

Software Reuse Through Information Retrieval

          W. B. Frakes
          B. A. Nejmeh

     AT&T Bell Laboratories
   Holmdel, New Jersey 07733

[Note: sections 1-4, 5-6 appeared in the last two IRList issues - Ed]

7.  A Software Template Design to Promote Reuse

The extent to which IR technology will promote software
reuse is directly related to the quality and accuracy of the
information in its software database. That is, poor
descriptions of code capability and functionality will
decrease the probability that the code will be located for
potential reuse during the search process. Likewise, lack of
information about how to call a function, the side effects
of the function, and the environmental requirements of the
function increases the overhead associated with its
reuse.  We now propose a template for the descriptive
information that should be maintained for each module and
function in the code base to increase the ease with which it
can be reused.

Throughout this section we will use the terms module and
function. For our purposes, a module is a file consisting of
one or more functions. A function is as defined in the C
programming language.  We now describe the contents of
module and function prologues which we believe will increase
the probability that the code appearing in the module is
located as a candidate for reuse whenever possible.
Likewise, we believe that the information contained in each
template will reduce the time required to interface with an
existing function and to confirm that it performs the
necessary operations without harmful side effects.

Our basic premise is that every module and function must
begin with a prologue.  The contents of the prologue in each
case will now be described.

7.1  Module Prologue

We endorse the following format for a module prologue.


<Top of Page>

/*
 * Module :       the name of the module.
 *
 * Description:       a concise description of what the
 *        functions contained in the file do. This
 *        description should be written with an understanding
 *        that generic inquiries into the source data base
 *        will be matched on the prose appearing in this
 *        section of the file.
 *
 * Supporting Docs:    References to supporting requirements or design
 *        documents should be given here.
 *
 * Contents:       List the functions appearing in the file in
 *        the order in which they appear, with a brief
 *        description of each function.
 *
 * Data:        List all of the global data defined in the file with
 *        a brief description of each data item.
 *
 * Environmental Requirements: List all of the hardware and software
 *         that the module requires (e.g., certain
 *         kinds of hardware or specific software
 *         libraries) to function properly.
 *
 */
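
As an illustration, a module prologue filled in for a small module
might look like the sketch below. The module name wordutil.c, its two
functions, and every field value are hypothetical and were made up for
this example; they are not taken from the CATALOG system or any other
existing code base.

```c
/*
 * Module :       wordutil.c (hypothetical name, for illustration only)
 *
 * Description:       Word-level text utilities: counting the words
 *        in a string and converting a string to lower case.
 *        Useful for preparing text before indexing or searching.
 *
 * Supporting Docs:    None (a production module would cite its
 *        requirements or design documents here).
 *
 * Contents:       count_words - count whitespace-separated words.
 *        lower_case  - convert a string to lower case in place.
 *
 * Data:        None. The module defines no global data.
 *
 * Environmental Requirements: Standard C library only
 *         (<assert.h>, <ctype.h>, <stddef.h>).
 *
 */
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the whitespace-separated words in the string s. */
int count_words(const char *s)
{
    int count = 0;
    int in_word = 0;   /* nonzero while scanning inside a word */

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            count++;
        }
    }
    return count;
}

/* Convert the string s to lower case in place. */
void lower_case(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}
```

Note that the Description field uses the vocabulary a searcher would
plausibly type ("words", "lower case", "indexing"), since that prose
is what a query against the source data base would be matched on.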


7.2  Function Prologue

We endorse the following format for a function prologue.

<Top of Page>

/*
 * Function :       the name of the function.
 *
 * Author:       name, location, and phone number of developer
 *        who wrote the function.
 *
 * Date:        date the function was written
 *
 * Description:       a concise overview of the function
 *        in terms of the processing it performs. In
 *        addition, the input, output, and transformational
 *        processing performed by the function should be
 *        described.
 *
 * Usage:       List the #include files necessary to call the
 *        function.
 *
 * Parameters:       The parameters passed to the function with a
 *        description of each parameter should appear
 *        here. For pointer parameters, the object
 *        pointed to should be discussed. Finally, if
 *        the value of any parameter is changed by the
 *        function, the modification should be described.
 *
 * Externals:       All of the global variables referenced in the
 *        function, along with how their values are
 *        modified should be described here.
 *
 * Macros:       List the macros used by the function.
 *
 * Returns:       The value returned by the function should be
 *        described here. The function should be declared
 *        "void" if it does not return a value.
 *
 * Calls:       List the functions called by this function along
 *        with the modules in which the called functions
 *        appear.
 *
 * Called By:       List the functions and their corresponding files
 *        which call this function.
 *
 * Modifications:      For each change to the function, list the following
 *        information: Date, Author of Change, Description
 *        of Change, Reason for Change.
 *
 */
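
A function prologue filled in according to this format might look
like the following sketch. The function trim_trailing and all of its
field values are hypothetical, chosen only because the function
modifies a parameter and returns a value, so that the Parameters and
Returns fields have something to describe. The Author and Date fields
are left as placeholders.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Function :       trim_trailing (hypothetical, for illustration only)
 *
 * Author:       (name, location, and phone number of developer)
 *
 * Date:        (date the function was written)
 *
 * Description:       Removes trailing white space from a string.
 *        Input is a null-terminated string; output is the
 *        same string, shortened in place so that its last
 *        character is not white space.
 *
 * Usage:       #include <stddef.h> (defines size_t). A production
 *        module would also supply a header declaring the function.
 *
 * Parameters:       s - pointer to a modifiable, null-terminated
 *        character array. The array is changed: a null
 *        character overwrites the first character of the
 *        trailing white space.
 *
 * Externals:       None.
 *
 * Macros:       assert (from <assert.h>), NULL.
 *
 * Returns:       The length of the string after trimming.
 *
 * Calls:       strlen, isspace (standard C library).
 *
 * Called By:       None yet.
 *
 * Modifications:      None yet.
 *
 */
size_t trim_trailing(char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    /* Step back over white space at the end of the string. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';
    return len;
}
```

A caller passes a writable buffer, for example a char array holding
"reuse" followed by blanks, and receives the trimmed length back; a
string literal must not be passed, because the function writes to its
argument.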



8.  Future Directions

Certain areas of IR research are likely to improve IR
systems as tools for managing software reuse. Despite
extensive research on IR systems, improvements have been
slow in coming, and the systems in practical use today are
quite similar to those in use in the 1960's. Such
improvements as have been observed have in general been more
due to improvements in general computing environments than
to advances in IR research per se. However, the use of user
feedback [16] has yielded experimental improvements in
retrieval performance, as has the use of extended Boolean
models [17].

A major practical problem in IR is the management of very
large databases.  Databases in existence today have already
pushed the limits of magnetic disk storage, and these
databases are growing exponentially. Storage of the source
code and documentation for projects in large corporations
will also result in very large databases. Optical storage
technology offers the ability to store gigabytes of
information on a single optical disk, thus offering a
solution to this problem. Current optical disk technology is
write-once; however, multiwrite technology will probably be
available within the next two years.

As IR databases become larger and larger, it becomes
difficult to search and retrieve records quickly. To address
this problem, specialized hardware to perform IR operations
has been built [18] [19].  Such hardware promises full-text
searching speeds of millions of characters per
second. Specialized hardware is also needed to speed up
certain IR operations such as stemming and set processing
that are bottlenecks in current systems.
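
To make "set processing" concrete: a Boolean AND query is commonly
answered by intersecting the sets of record numbers associated with
each query term. The sketch below is a software version of that one
operation, assuming each set is held as an array of record numbers
sorted in ascending order with no duplicates. The function name and
the array representation are assumptions made for this example and
do not describe any particular IR system or hardware design.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Intersect two sets of record numbers.
 *
 * a, b  - arrays sorted in ascending order, without duplicates.
 * na,nb - number of elements in a and b.
 * out   - receives the common elements, in ascending order; it must
 *         have room for the smaller of na and nb elements.
 *
 * Returns the number of elements written to out.
 */
size_t intersect_sorted(const int *a, size_t na,
                        const int *b, size_t nb,
                        int *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(a != NULL && b != NULL && out != NULL);
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;               /* a[i] cannot occur in b */
        } else if (a[i] > b[j]) {
            j++;               /* b[j] cannot occur in a */
        } else {
            out[n++] = a[i];   /* record contains both terms */
            i++;
            j++;
        }
    }
    return n;
}
```

Each pass through the loop advances at least one index, so the work
is proportional to na + nb. For queries over very large sets, this
linear scan is the kind of cost that dedicated hardware would aim to
reduce.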

A central problem of IR has been how to represent the
meaning of text or other records in a way comprehensible to
a computer. The knowledge representation techniques used in
AI systems [20] offer promise in this direction. Oddy [21]
has used a semantic net approach to document representation,
production rules have been used to create an intelligent
thesaurus [22], and natural language systems have been used
to extract and formalize the information in medical
documents [23].

Taking these newer technologies together, it appears
probable that future IR systems for software reuse will have
capabilities for massive storage in the gigabyte range and
specialized hardware for text searching and set
combination. Such systems will allow better semantic
representation of records, and will provide intelligent
interfaces that will guide users in system use. Other
probable developments in IR technology can be found in Fox
[24].


9.  Conclusion

We have argued that reuse is crucial if we are to deliver
efficient, reliable, and maintainable software in a timely
manner. The lack of adequate tools to organize, search, and
retrieve reusable modules has impeded reuse. We have
proposed IR systems as the technology of choice for managing
code reuse, using the CATALOG system to demonstrate the
feasibility of this approach. We have concluded by
discussing important trends in IR research and development
likely to impact the reuse problem.


    REFERENCES

 1. DeMarco, T., Lister, T. Controlling Software Projects:
    Management, Measurement, and Evaluation, Seminar Notes,
    New York, Atlantic Systems Guild Inc., 1984.

 2. Frakes, W.B. "Term Conflation for Information
    Retrieval", in VanRijsbergen C.J. Ed. Research and
    Development in Information Retrieval Cambridge:
    Cambridge University Press, 1984.

 3. Frakes, W.B., Leighton, W.J., "The Catalog Information
    Management System", Proceedings of Symposium on
    Workstations in the Future Computing Environment, AT&T
    Bell Laboratories, Naperville, Ill., 1985.

 4. Standish, T., "An Essay on Software Reuse", IEEE
    Transactions on Software Engineering, Vol. SE-10, Sept.
    1984.

 5. Boehm, Barry, Software Engineering Economics, Prentice-
    Hall, Englewood Cliffs N.J., 1981.

 6. Horowitz, E. and Munson, J. "An Expansive View of
    Software Reuse", IEEE Transactions on Software
    Engineering, Vol. SE-10, Sept. 1984.

 7. Frank, W.L., "What Limits to Software Gains",
    Computerworld, pp. 65-70, May 4, 1984.

 8. Grabow, P., "Software Reuse, Where Are We Going?", IEEE
    COMPSAC85, Oct. 9-11, 1985, p. 202.

 9. McNamara, D. "Japanese Software Factories", presentation
    at Computer Science Colloquium, University of
    California, Irvine, May 1983.

10. Huang, C., "Reusable Software Implementation Technology:
    A Review of the Current Practice", IEEE COMPSAC85,
    Oct. 9-11, 1985, p. 207.

11. Date, C. J., An Introduction to Database Systems, 3rd
    Ed. Reading, Mass., Addison Wesley, 1981.

12. Lancaster, F. W. and Fayen, E. G. Information Retrieval
    On-Line, Los Angeles, Melville Publishing Co., 1973.

13. Salton, G. and McGill, M. Introduction to Modern
    Information Retrieval, New York, McGraw-Hill, 1983.

14. Crocker, S.L., Frakes, W.B., Leon, R.V., Tortorella, M.,
    "SUPER:  System Used for Prediction and Evaluation of
    Reliability", Paper read at IEEE Conference on
    Reliability of Computer Controlled Telecommunications
    Systems, 1985, at Val David, Canada.

15. Frakes, W.B., "LATTIS: A Corporate Library and
    Information System for the UNIX Environment", To appear
    in the Proceedings of the National Online Conference,
    1986.

16. Rocchio, J. J., "Relevance Feedback in Information
    Retrieval" in The SMART Retrieval System - Experiments
    in Automatic Document Processing, G. Salton Editor,
    Prentice-Hall Inc., Englewood Cliffs N.J., 1971, Chapter
    14.

17. Salton, G., Fox, E., Wu, H., "Extended Boolean
    Information Retrieval", Communications of the ACM,
    26(11): pp. 1022-1036, Nov, 1983.

18. Proceedings of the Fourth Workshop on Computer
    Architecture for Nonnumeric Processing, Syracuse, N.Y.
    1979.

19. Hollaar, L.A., "The Utah Text Retrieval Project -- A
    Status Report", in VanRijsbergen C.J. Ed. Research and
    Development in Information Retrieval Cambridge:
    Cambridge University Press, 1984.

20. Winston, Patrick Henry, Artificial Intelligence 2nd Ed.,
    Reading Mass., 1984.

21. Oddy, R. N., "Information Retrieval Through Man-Machine
    Dialogue", Journal of Documentation, 33: 1-14, 1977.

22. McCune, B. et al. "RUBRIC: A System for Rule Based
    Information Retrieval", IEEE Transactions on Software
    Engineering, 1985.

23. Sager, Naomi, "Information Structures in Texts of a
    Sublanguage", Proceedings of 44th ASIS Annual Meeting,
    Washington D.C., October 1981.

24. Fox, Christopher and Zappert, F., "Future Generation
    Information Systems", To appear in the Journal of the
    American Society for Information Science.
     
------------------------------
     
END OF IRList Digest
********************