CS6604 - Digital Libraries - Fall 2019
Why take CS6604?
- To allow you to work on your digital library related thesis or dissertation.
- To prepare you for research involving search engines, social
media, natural language processing, text mining, classification,
clustering, indexing, deep learning,
Web archiving, and/or information seeking/exploration.
- To prepare you for teaching any type of "information" course, or
doing related research, in Computer, Library, or
Information Science departments/programs/schools.
- To prepare you for working at Google, Microsoft, or any company
involved in search, the Web, text, AI, and/or machine learning.
- To gain proficiency with parallel processing on clusters.
- To extend what you studied in CS5604 (Information Retrieval).
Resources
- Campus clusters with deep learning support, such as Cascades and Huckleberry,
and software such as TensorFlow or PyTorch
- 20+ node Hadoop Cluster
with 10Gbit network connection
- Cloudera software including HBase, HDFS, Hive, Mahout,
MapReduce, Nutch, Pig, Solr, Spark, Sqoop
- MeTA, NLTK, and Python toolkits
References
- Free Books with Morgan and Claypool (through Library download), required:
- Theoretical Foundations for Digital Libraries (Fox/Goncalves/Shen)
- Key Issues Regarding Digital Libraries: Evaluation and Integration (Shen/Goncalves/Fox)
- Digital Library Technologies:
Complex Objects, Annotation, Ontologies, Classification,
Extraction, and Security (Fox/Torres, editors)
- Digital Library Applications:
CBIR, Education, Social Networks, eScience/Simulation, and GIS
(Fox/Leidig, editors)
Supplemental website
- Free (through Library download), recommended:
ChengXiang Zhai and Sean Massung. 2016. Text Data Management and
Analysis: a Practical Introduction to Information Retrieval and Text
Mining. Association for Computing Machinery and Morgan & Claypool, New
York, NY, USA.
- VTechWorks reports from CS6604 projects
- VTechWorks reports from CS5604 projects
Course Organization
- CS6604 Fall 2019 class: CRN 90122, TR, 11:00am-12:15pm, McB 232
- Approach: seminar and team term project
- Goal: to help students progress toward their thesis or dissertation
- Seminar discussions: focused on key readings in the field selected by students
based on relevance to their interests
- Term projects: in teams of 2-5, on topics identified by students
- See the page on the most recent (Spring 2017) prior
version of this course. Note that this course is only rarely offered.
About the Instructor
- Professor Edward A. Fox, fox@vt.edu, 540-231-5113, Torg. 2160G.
Office hours are TBD or by appointment.
- Dr. Fox's 1983 Ph.D. at Cornell University was supervised by
Prof. Gerard Salton, often called "the father of information
retrieval".
- He is a Fellow of both ACM and IEEE, cited for contributions to digital
libraries and information retrieval, and he helped found the digital library research field.
- He directs the Digital Library Research Laboratory
(see its page for members, research, etc.), whose resources can be used in this course.
Author: Edward A. Fox
Curator: Virginia Tech; Dept. of Computer Science
Last Updated: March 12, 2019
Email: fox@vt.edu
© Edward A. Fox 2019