CS5604 - Information Retrieval - Fall 2020
Alternate title: Search Engines and Text Mining with Big Data
Alternate title: Applied Machine Learning (Underlying Google)
Why take CS5604?
- To prepare you for working at Google, Microsoft, or any company
involved in machine learning, text analytics, search, and/or the World Wide Web.
- To prepare you for research involving search engines,
natural language processing, text mining, classification,
clustering, indexing, recommendation/personalization,
information extraction/seeking/exploration, social media,
and/or web archiving.
- To gain proficiency with the latest software engineering practices,
including containers, Docker, Kubernetes, and CI/CD.
Resources
References
- Textbook:
Introduction to Information Retrieval by Christopher D.
Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008, 496 pages,
Cambridge University Press, ISBN-10: 0521865719, ISBN-13:
978-0521865715. See also online versions, slides, etc.
- Free (through Library download), recommended:
ChengXiang Zhai and Sean Massung. 2016. Text Data Management and
Analysis: a Practical Introduction to Information Retrieval and Text
Mining. Association for Computing Machinery and Morgan & Claypool, New
York, NY, USA.
- VTechWorks reports from CS5604 projects
Course Organization
- CS5604 Fall 2020 class: CRN 83127, MW, 2:30-3:45pm
- Prerequisite: a course on data structures,
or permission of the instructor
- Approach: problem/project based learning, teams, online, flipped classroom
- Goal: answer the following question: How can we best build
a state-of-the-art information retrieval and analysis system in
support of the communities interested in each of the following collections?
- All the nation's electronic theses and dissertations (ETDs),
related to an IMLS grant to VT and ODU for 8/1/2019 - 7/31/2022
- Tweets and webpages about important events, trends, and topics,
to help University Libraries make our growing collection useful
- The students in the class will confront this driving question,
working in teams, with the teams cooperating, as they co-design
a working system that can handle the two collections.
- There will be teams for ingesting content, indexing and
searching (with Elasticsearch), clustering, topic analysis,
and UX/interface development.
- The instructor and several GRAs working on related research
will provide guidance and assistance.
- This is one of the courses leading to an
XCaliber Award "for making extraordinary contributions to technology
enriched active learning".
About the Instructor
- Professor Edward A. Fox, fox@vt.edu, 540-231-5113, Torg. 2160G.
Office hours are TBD, or by appointment.
- Dr. Fox is an ACM Fellow as well as an IEEE Fellow, for
contributions and leadership in information retrieval
and digital libraries.
- Dr. Fox's 1983 Ph.D. at Cornell University was supervised by
Prof. Gerard Salton, often called "the father of information
retrieval".
- GRA: TBD
Author: Edward A. Fox
Curator: Virginia Tech, Dept. of Computer Science
Last Updated: June 12, 2020
Email: fox@vt.edu
© Edward A. Fox 2020