 
 
   CS5604 - Information Retrieval - Fall 2020
Alternate title: Search Engines and Text Mining with Big Data
Alternate title: Applied Machine Learning (Underlying Google)
Why take CS5604?  
- To prepare you for work at Google, Microsoft, or any company
involved in machine learning, text analytics, search, and/or the WWW.
- To prepare you for research involving search engines, 
natural language processing, text mining, classification,
clustering, indexing, recommendation/personalization, 
information extraction/seeking/exploration,
social media, and/or web archiving.
- To gain proficiency with the latest software engineering practices,
including containers, Docker, Kubernetes, and CI/CD.
Resources  
References
- Textbook:
  Introduction to Information Retrieval by Christopher D.
  Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008, 496 pages,
  Cambridge University Press, ISBN-10: 0521865719, ISBN-13:
  978-0521865715. See also online versions, slides, etc.
- Free (through Library download), recommended:
  ChengXiang Zhai and Sean Massung. 2016. Text Data Management and
  Analysis: a Practical Introduction to Information Retrieval and Text
  Mining. Association for Computing Machinery and Morgan & Claypool, New
  York, NY, USA.
- VTechWorks reports from CS5604 projects
Course Organization  
- CS5604 Fall 2020 class: CRN 83127, MW, 2:30-3:45pm
- Prerequisite: a course on data structures,
  or permission of the instructor
- Approach: problem/project based learning, teams, online, flipped classroom
- Goal: answer the following question: How can we best build
a state-of-the-art information retrieval and analysis system in
support of the communities interested in each of the following?
  - All the nation's electronic theses/dissertations (ETDs),
    related to an IMLS grant to VT and ODU for 8/1/2019 - 7/31/2022
  - Tweets and webpages about important events, trends, and topics,
    helping University Libraries make our growing collection useful
 
- The students in the class will confront this driving question,
working in teams, with the teams cooperating, as they co-design
a working system that can handle the two collections.
- There will be teams for ingesting content, indexing and
searching (with Elasticsearch), clustering, topic analysis,
and UX/interface development.
- The instructor and several GRAs working on related research
will provide guidance and assistance.
- This is one of the courses leading to an
  XCaliber Award "for making extraordinary contributions to technology
  enriched active learning".
  
About the Instructor  
- Professor Edward A. Fox, fox@vt.edu, 540-231-5113, Torg. 2160G.
Office hours are TBD, or by appointment.
- Dr. Fox is an ACM Fellow as well as an IEEE Fellow, recognized for
  contributions and leadership in information retrieval
  and digital libraries.
- Dr. Fox's 1983 Ph.D. at Cornell University was supervised by
  Prof. Gerard Salton, often called "the father of information
  retrieval".
- GRA: TBD
Author: Edward A. Fox (CV, directions, hours, photo)
Curator: Virginia Tech; Dept. of Computer Science
Last Updated: June 12, 2020
Email: fox@vt.edu
©  Edward A. Fox 2020