CS6604 - Digital Libraries
Why take CS6604?
- To prepare you for teaching courses or doing research on any type
of "information" topic in Computer, Library, or Information Science
departments, programs, or schools.
- To prepare you for working at Google, Microsoft, or any company
involved in searching, WWW, text, and/or machine learning.
- To prepare you for research involving search engines, social
media, natural language processing, text mining, classification,
clustering, indexing, and/or information seeking/exploration.
- To gain proficiency with parallel processing on clusters.
- To extend what you studied in CS5604 (Information Retrieval).
Resources
- 20+ node Hadoop Cluster with 10Gbit network connection
- Cloudera software including HBase, HDFS, Hive, Mahout,
MapReduce, Nutch, Pig, Solr, Spark, Sqoop
- MeTA, NLTK, and Python toolkits
References
- Free books from Morgan & Claypool (through Library download), required:
- Theoretical Foundations for Digital Libraries (Fox/Goncalves/Shen)
- Key Issues Regarding Digital Libraries: Evaluation and Integration
(Shen/Goncalves/Fox)
- Digital Library Technologies:
Complex Objects, Annotation, Ontologies, Classification,
Extraction, and Security (Fox/Torres, editors)
- Digital Library Applications:
CBIR, Education, Social Networks, eScience/Simulation, and GIS
(Fox/Leidig, editors)
Supplemental website
- Free (through Library download), recommended:
ChengXiang Zhai and Sean Massung. 2016. Text Data Management and
Analysis: A Practical Introduction to Information Retrieval and Text
Mining. Association for Computing Machinery and Morgan & Claypool, New
York, NY, USA.
- VTechWorks reports from CS6604 projects
- VTechWorks reports from CS5604 projects
Course Organization
- CS6604 Spring 2017 class: CRN 19209, TR, 12:30-1:45pm, McB 307, 12T -
Final 1:05-3:05pm, May 6
- Approach: seminar and team term project
- Seminar discussions will focus on key readings in the field
- Term projects, done in teams of 2-5, will be on topics identified by
students or chosen from suggested topics, e.g.,
- Preparing a proposal, possibly for funding by NSF and usable as a
prelim, such as for ETDseer, a digital library like CiteSeerX that
works with a very large collection of electronic theses or
dissertations
- Designing and prototyping a digital library to support research in
the behavioral sciences that includes participants working in a social
network
- Designing an integration of digital library and archive methods
that considers the temporal aspects of webpages that appear in
different versions over time, and of ongoing discussions in social
media, so that trends and changes in language are characterized
About the Instructor
- Professor Edward A. Fox, fox@vt.edu, 540-231-5113, Torg. 2160G.
Office hours are TBD or by appointment.
- Dr. Fox's 1983 Ph.D. at Cornell University was supervised by
Prof. Gerard Salton, often called "the father of information
retrieval".
- Some of the GRAs doing research in the digital libraries area, working in Torg. 2030:
- Prashant Chandrasekar, peecee@vt.edu
- Andrej Galad, agalad@vt.edu
- Islam Harb, iharb@vt.edu
- Liuqing Li, liuqing@vt.edu
Author: Edward A. Fox (CV, directions, hours, photo)
Curator: Virginia Tech; Dept. of Computer Science
Last Updated: November 2, 2016
Email: fox@vt.edu
© Edward A. Fox 2016