From foxea@vtvax3 Wed Sep 24 18:45:18 1986
Date: Wed, 24 Sep 86 18:45:11 edt
From: foxea@vtvax3
To: @ir.dis
Subject: IRList Digest V2 #49
Status: RO

IRList Digest           Wednesday, 24 September 1986      Volume 2 : Issue 49

Today's Topics:
   Announcement - Correction for book on software for content analysis
   Article - Software Reuse Through Information Retrieval - Part 3 of 3

News addresses are
   ARPANET: fox%vt@csnet-relay.arpa   BITNET: foxea@vtvax3.bitnet
   CSNET: fox@vt                      UUCPNET: seismo!vtisr1!irlistrq

----------------------------------------------------------------------

Date: 1986 Sep 14 20:44 EST
From: Bob Weber
[Extracted from CRTNET #58, 9/15/86 - Ed]

ERROR IN CONTENT ANALYSIS BOOK AND SOFTWARE FOR CONTENT ANALYSIS

In the second printing of my book, Basic Content Analysis (1985), published by Sage in their series on quantitative methodology, they have not corrected an important factual error which I had asked them to fix. The first paragraph on page 80 indicates that a computer program for key-word-in-context listings is available from the Harvard Laboratory for Computer Graphics; this is no longer so. That paragraph will be replaced with the following information on the availability of the General Inquirer computer system for automated content analysis (the references are given in the bibliography of the Sage book).

The replacement paragraph for the first paragraph on page 80 will state:

Beginning early in 1987, the current version of the General Inquirer system (Kelly and Stone, 1975), the latest Harvard dictionary (Dunphy, et al., 1974), and the Lasswell Value Dictionary (Namenwirth and Weber, 1987)* will be distributed by ZUMA. Distribution of software, dictionaries, and documentation will be on an "as is" basis and only for non-commercial use. A very small fee for handling will be charged. Note: the General Inquirer is written in PL1 for IBM mainframe computers only!
Interested readers should contact ZUMA, the Center For Surveys, Methods, and Analysis, in Mannheim, FRG:

     Computer Department
     ZUMA
     B2,1
     Postfach 5969
     D-6800 Mannheim 1
     Federal Republic of Germany

*Namenwirth, J. Zvi and Robert Philip Weber. 1987. Dynamics of Culture. Winchester, MA: Allen & Unwin.

Robert Philip (Bob) Weber
Harvard University
BITNET: WEBER3@HARVARDA
ARPA: Weber3%Harvarda.Bitnet@Wiscvm.Wisc.Edu

------------------------------

Date: Fri, 12 Sep 86 22:31:44 EDT
From: seismo!allegra!hoqam!wbf
Subject: Software Reuse through IR [ Part 3 of 3 - Ed]

Software Reuse Through Information Retrieval

W. B. Frakes
B. A. Nejmeh
AT&T Bell Laboratories
Holmdel, New Jersey 07733

[Note: sections 1-4, 5-6 appeared in the last two IRList issues - Ed]

7. A Software Template Design to Promote Reuse

The extent to which IR technology will promote software reuse is directly related to the quality and accuracy of the information in its software database. That is, poor descriptions of code capability and functionality will decrease the probability that the code will be located for potential reuse during the search process. Likewise, a lack of information about how to call a function, the side-effects of the function, and the environmental requirements of the function increases the overhead associated with its reuse. We now propose a template for the descriptive information that should be maintained for each module and function in the code base to increase the ease with which it can be reused.

Throughout this section we will use the terms module and function. For our purposes, a module is a file consisting of one or more functions. A function is as defined in the C programming language.

We now describe the contents of module and function prologues which we believe will increase the probability that the code appearing in the module is located as a candidate for reuse whenever possible.
Likewise, we believe that the information contained in each template will reduce the amount of time required to interface to an existing function and to assure that it performs the necessary operations without harmful side-effects. Our basic premise is that every module and function must begin with a prologue. The contents of the prologue in each case will now be described.

7.1 Module Prologue

We endorse the following format for a module prologue.

/*
 * Module : the name of the module.
 *
 * Description: a concise description of what the
 *      functions contained in the file do.  This
 *      description should be written with an understanding
 *      that generic inquiries into the source data base
 *      will be matched on the prose appearing in this
 *      section of the file.
 *
 * Supporting Docs: References to supporting requirements or design
 *      documents should be given here.
 *
 * Contents: List the functions appearing in the file in
 *      the order in which they appear, with a brief
 *      description of each function.
 *
 * Data: List all of the global data defined in the file with
 *      a brief description of each data item.
 *
 * Environmental Requirements: List all of the hardware and software
 *      that the module requires (e.g. certain
 *      kinds of hardware, specific software
 *      libraries, etc.) to function properly.
 *
 */

7.2 Function Prologue

We endorse the following format for a function prologue.

/*
 * Function : the name of the function.
 *
 * Author: name, location, and phone number of developer
 *      who wrote the function.
 *
 * Date: date the function was written
 *
 * Description: a concise overview of the function
 *      in terms of the processing it performs.  In
 *      addition, the input, output, and transformational
 *      processing performed by the function should be
 *      described.
 *
 * Usage: List the #include files necessary to call the
 *      function.
 *
 * Parameters: The parameters passed to the function with a
 *      description of each parameter should appear
 *      here.  For pointer parameters, the object
 *      pointed to should be discussed.  Finally, if
 *      the value of any parameter is changed by the
 *      function, the modification should be described.
 *
 * Externals: All of the global variables referenced in the
 *      function, along with how their values are
 *      modified should be described here.
 *
 * Macros: List the macros used by the function.
 *
 * Returns: The value returned by the function should be
 *      described here.  The function should be declared
 *      "void" if it does not return a value.
 *
 * Calls: List the functions called by this function along
 *      with the modules in which the called functions
 *      appear.
 *
 * Called By: List the functions and their corresponding files
 *      which call this function.
 *
 * Modifications: For each change to the file, list the following
 *      information: Date, Author of Change, Description
 *      of Change, Reason for Change.
 *
 */

8. Future Directions

Certain areas of IR research are likely to improve IR systems as tools for managing software reuse. Despite extensive research on IR systems, improvements have been slow in coming, and the systems in practical use today are quite similar to those in use in the 1960s. Such improvements as have been observed have in general been due more to improvements in general computing environments than to advances in IR research per se. However, the use of user feedback [16] has given experimental improvements in retrieval performance, as has the use of extended boolean models [17].

A major practical problem in IR is the management of very large databases. Databases in existence today have already pushed the limits of magnetic disk storage, and these databases are growing exponentially. Storage of the source code and documentation for projects in large corporations will also result in very large databases. Optical storage technology offers the ability to store gigabytes of information on a single optical disk, thus offering a solution to this problem.
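
As an illustration of the function prologue format of section 7.2, the following is a minimal sketch that fills in the template for one small C function. The function count_char, its description, and the placeholder author and date fields are assumptions made for this example only; they do not come from the paper or from the CATALOG system.

```c
#include <stddef.h>

/*
 * Function : count_char  (hypothetical example)
 *
 * Author: <name, location, and phone number of developer>
 *
 * Date: <date the function was written>
 *
 * Description: counts the occurrences of a character in a
 *      null-terminated string.  Input: a string and a
 *      character.  Output: the number of matching characters.
 *      The string is scanned once, left to right.
 *
 * Usage: #include <stddef.h>
 *
 * Parameters: s - pointer to the null-terminated string to
 *                 scan.  The characters pointed to are read
 *                 but never modified.  A null pointer is
 *                 treated as an empty string.
 *             c - the character to count.
 *
 * Externals: none.
 *
 * Macros: NULL (from <stddef.h>).
 *
 * Returns: the number of characters in s equal to c.
 *
 * Calls: none.
 *
 * Called By: none yet.
 *
 * Modifications: none.
 *
 */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    if (s == NULL)
        return 0;               /* null pointer: nothing to count */

    for (; *s != '\0'; s++) {
        if (*s == c)
            n++;
    }
    return n;
}
```

For example, count_char("software reuse", 'e') returns 3. In a system of the kind described above, a free-text query such as "count occurrences of a character in a string" would be matched against the Description field, which is why that field is written as plain prose rather than as abbreviations.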
Current optical disk technology is write-once; however, multiwrite technology will probably be available within the next two years.

As IR databases become larger and larger, it becomes difficult to search and retrieve records quickly. To address this problem, specialized hardware to perform IR operations has been built [18] [19]. Such hardware promises to provide full-text searching speeds of millions of characters per second. Specialized hardware is also needed to speed up certain IR operations, such as stemming and set processing, that are bottlenecks in current systems.

A central problem of IR has been how to represent the meaning of text or other records in a way comprehensible to a computer. The knowledge representation techniques used in AI systems [20] offer promise in this direction. Oddy [21] has used a semantic net approach to document representation, production rules have been used to create an intelligent thesaurus [22], and natural language systems have been used to extract and formalize the information in medical documents [23].

Taking these newer technologies together, it appears probable that future IR systems for software reuse will have capabilities for massive storage in the gigabyte range and specialized hardware for text searching and set combination. Such systems will allow better semantic representation of records, and will provide intelligent interfaces that will guide users in system use. Other probable developments in IR technology can be found in Fox [24].

9. Conclusion

We have argued that reuse is crucial if we are to deliver efficient, reliable, and maintainable software in a timely manner. The lack of adequate tools to organize, search, and retrieve reusable modules has impeded reuse. We have proposed IR systems as the technology of choice for managing code reuse, using the CATALOG system to demonstrate the feasibility of this approach.
We have concluded by discussing important trends in IR research and development likely to impact the reuse problem.

REFERENCES

 1. DeMarco, T., Lister, T., Controlling Software Projects: Management, Measurement, and Evaluation, Seminar Notes, New York, Atlantic Systems Guild Inc., 1984.

 2. Frakes, W.B., "Term Conflation for Information Retrieval", in Van Rijsbergen, C.J., Ed., Research and Development in Information Retrieval, Cambridge: Cambridge University Press, 1984.

 3. Frakes, W.B., Leighton, W.J., "The Catalog Information Management System", Proceedings of Symposium on Workstations in the Future Computing Environment, AT&T Bell Laboratories, Naperville Il., 1985.

 4. Standish, T., "An Essay on Software Reuse", IEEE Transactions on Software Engineering, Vol. SE-10, Sept. 1984.

 5. Boehm, Barry, Software Engineering Economics, Prentice-Hall, Englewood Cliffs N.J., 1981.

 6. Horowitz, E. and Munson, J., "An Expansive View of Software Reuse", IEEE Transactions on Software Engineering, Vol. SE-10, Sept. 1984.

 7. Frank, W.L., "What Limits to Software Gains", Computerworld, pp. 65-70, May 4, 1984.

 8. Grabow, P., "Software Reuse, Where Are We Going?", IEEE COMPSAC85, Oct. 9-11, 1985, p. 202.

 9. McNamara, D., "Japanese Software Factories", presentation at Computer Science Colloquium, University of California, Irvine, May 1983.

10. Huang, C., "Reusable Software Implementation Technology: A Review of the Current Practice", IEEE COMPSAC85, Oct. 9-11, 1985, p. 207.

11. Date, C. J., An Introduction to Database Systems, 3rd Ed., Reading, Mass., Addison Wesley, 1981.

12. Lancaster, F. W. and Fayen, E. G., Information Retrieval On-Line, Los Angeles, Melville Publishing Co., 1973.

13. Salton, G. and McGill, M., Introduction to Modern Information Retrieval, New York, McGraw-Hill, 1983.

14. Crocker, S.L., Frakes, W.B., Leon, R.V., Tortorella, M., "SUPER: System Used for Prediction and Evaluation of Reliability", Paper read at IEEE Conference on Reliability of Computer Controlled Telecommunications Systems, 1985, at Val David, Canada.

15. Frakes, W.B., "LATTIS: A Corporate Library and Information System for the UNIX Environment", To appear in the Proceedings of the National Online Conference, 1986.

16. Rocchio, J. J., "Relevance Feedback in Information Retrieval", in The SMART Retrieval System - Experiments in Automatic Document Processing, G. Salton, Editor, Prentice-Hall Inc., Englewood Cliffs N.J., 1971, Chapter 14.

17. Salton, G., Fox, E., Wu, H., "Extended Boolean Information Retrieval", Communications of the ACM, 26(11): pp. 1022-1036, Nov. 1983.

18. Proceedings of the Fourth Workshop on Computer Architecture for Nonnumeric Processing, Syracuse, N.Y., 1979.

19. Hollaar, L.A., "The Utah Text Retrieval Project -- A Status Report", in Van Rijsbergen, C.J., Ed., Research and Development in Information Retrieval, Cambridge: Cambridge University Press, 1984.

20. Winston, Patrick Henry, Artificial Intelligence, 2nd Ed., Reading Mass., 1984.

21. Oddy, R. N., "Information Retrieval Through Man-Machine Dialogue", Journal of Documentation, 33, 1-14 (1977).

22. McCune, B. et al., "RUBRIC: A System for Rule Based Information Retrieval", IEEE Transactions on Software Engineering, 1985.

23. Sager, Naomi, "Information Structures in Texts of a Sublanguage", Proceedings of 44th ASIS Annual Meeting, Washington D.C., October 1981.

24. Fox, Christopher and Zappert, F., "Future Generation Information Systems", To appear in the Journal of the American Society for Information Science.

------------------------------

END OF IRList Digest
********************