Notes by E. A. Fox for 7/27/2016

2016 NEH Summer Institute for College and University Teachers
Veterans in Society: Ambiguities & Representations
10-29 July 2016, Blacksburg, VA
http://www.veteransinsociety.org/
Welcome to 25 Participants! https://drive.google.com/file/d/0BxK4IVb3dnJHd2JJUk5kTHdzWDA/view

Workshop on Integrated Information Collection, Archiving, and Digital Library Services
Edward A. Fox, Ph.D.
Professor, Computer Science; Director, Digital Library Research Laboratory

Attendees: Those interested in managing information related to veterans
Organization: Hands-on problem-based learning session, with teams collaborating to explore events/topics of particular interest
Outline:
• Internet Archive, Wayback Machine, Heritrix crawler
• Collecting tweets using YourTwapperKeeper
• Assembling webpage collection from tweets or focused crawling
• Searching/browsing tweets and webpages collected (e.g., with Solr)
• Natural language processing (NLP) and machine learning enhancements: text extraction (of various types of named entities), topic identification, clustering, classification, and summarization
• Information visualization
• Support with Hadoop cluster, databases, and other software
Acknowledgment: This will be supported by the team working on the NSF-funded project (grant IIS-1319578): Integrated Digital Event Archiving and Library (IDEAL); see http://www.eventsarchive.org .

Fox (fox@vt.edu)
http://www.veteransinsociety.org/#!blank-3/fspdx
Father was in WWII but rarely discussed it until relatively soon after seeing “Saving Private Ryan” (after 1998)
http://fox.cs.vt.edu/
Links to 4 DL Books with Morgan and Claypool (through Library)
https://www.lib.vt.edu/; Databases . . .
Exercise: download the books
http://www.dlib.org/ Magazine about Digital Libraries

Team today:
Andrea Kavanaugh http://www.hci.vt.edu/profile.php?id=kavan@vt.edu
Donald Shoemaker http://liberalarts.vt.edu/faculty-directory/sociology-faculty/donald-shoemaker.html
Digital Library Research Laboratory (DLRL) http://www.dlib.vt.edu/
Mohamed Magdy Gharib Farag: focused crawling
Matthew Bock: extending our information retrieval system
Liuqing Li: taking over support of our Hadoop cluster
(Sunshin Lee - slides and data: Hadoop cluster, geo-locating tweets)
Exercise: contact a person on the team who seems interesting

Team connected with Events Archiving http://www.eventsarchive.org/
DL416: NSF IIS-0736055
CTRnet: NSF IIS-0916733
IDEAL: NSF IIS-1319578
GETAR: NSF IIS-1619028 and 1619371
http://www.eventsarchive.org/sites/default/files/GETARsummaryWeb.pdf
Invitation to work with us to 2020 to be aided in research with global events and trends
Exercise: prepare a proposal for collaboration with GETAR

Andrea Kavanaugh: her work, thoughts, and approaches
Mohamed Farag: his work and demos and exercises
Finished indexing two collections about veterans (one with keyword: veteran and the other with hashtag #veterans)
Indexed 1000 tweets from each
Extracted the URLs and downloaded the corresponding webpages
Indexed their text
http://nick.dlib.vt.edu:3000/
Matthew Bock: his work and thoughts
Liuqing Li: his work and thoughts
Donald Shoemaker: his work, thoughts, and approaches

http://hadoop.dlib.vt.edu/
http://eventsarchive.org/node/12 Cluster
http://hadoop.dlib.vt.edu:8088/cluster Cluster operations
http://hadoop.dlib.vt.edu:82/twitter/ Big archive
http://hadoop.dlib.vt.edu:81/twitter/ Recent additions
http://jingluo.dlib.vt.edu/twitter/ GETAR

WADL http://fox.cs.vt.edu/wadl2016.html

DL curriculum - VT & UNCCH (interesting to Betty Ann Koelsch)
http://curric.dlib.vt.edu/
https://en.wikiversity.org/wiki/Digital_Libraries

NDLTD http://www.ndltd.org/
http://union.ndltd.org/portal/
http://search.ndltd.org/
“veteran” (2444)
Choice, transition, engagement, and persistence : the experiences of female student veterans at the University of Texas at Austin
Heitzman, Amy Claire 2014 http://hdl.handle.net/2152/30991
L'indemnisation du traumatisme psychique chez les vétérans : un parcours difficile
Paillart Anne 2015 http://hdl.handle.net/11143/6023

Internet Archive https://archive.org/
Check out collections
TV News Archive https://archive.org/details/tvnews
“veteran” (957) movies / TV programs
Canadian Libraries https://archive.org/details/toronto
“veteran” (56)
“A Peninsular Veteran” https://archive.org/details/storyofpeninsula00peni
Use “Search” box there with “veteran”: https://archive.org/search.php?query=veteran
Sort by Date Archived, Creator, . . .
Use “Search” with “oral history” https://archive.org/search.php?query=oral%20history
Then Community Audio https://archive.org/details/opensource_audio?and[]=oral%20history
Exercise: find all IA content of personal interest

Wayback Machine https://archive.org/web/
Try with http://www.va.gov/
Exercise: try other sites of interest, see how content changed

Archive-It https://archive-it.org/
“veteran” https://archive-it.org/explore?q=veteran
https://archive-it.org/explore?q=veteran&show=Collections
“oral history” https://archive-it.org/collections/4383
VT collections https://archive-it.org/explore?q=virginia+tech
Exercise: identify collections of interest, and particular content

VTechWorks http://vtechworks.lib.vt.edu/
“veteran”
Understanding and Building Effective Narrative on Veteran Experiences to Compel Program and Policy Action
Dunkenberger, Mary Beth; Lo, Suzanne (2014-04) http://hdl.handle.net/10919/56361
Military Experience and the Arts: Bridging the Gap Between Military and Civilian Cultures Through Creative Expression Scholarship
Martin, Travis L.
(Virginia Tech; Veterans in Society: Changing the Discourse, 2013-04-15) http://hdl.handle.net/10919/25213

Multimedia, Hypertext, and Information Access
http://fox.cs.vt.edu/VTCS4624S14syllabus-extended.pdf
http://vtechworks.lib.vt.edu/handle/10919/18655
Team term projects related to any course theme
Exercise: propose a team project to help you

Computational Linguistics
http://fox.cs.vt.edu/CS4984CL.htm
http://vtechworks.lib.vt.edu/handle/10919/50956
PBL around summarizing collections

Information Retrieval
http://vtechworks.lib.vt.edu/handle/10919/19081
PBL about building next generation search engine

Digital Libraries
http://vtechworks.lib.vt.edu/handle/10919/47780
Team term projects about tailored DLs
Exercise: propose a team project to help you

Writing
Joe Moxley http://english.usf.edu/faculty/jmoxley/
MyReviewers http://myreviewers.com/

Broad areas of computer science related:
Text (data, analytics, analysis, mining, processing)
Documents (text, multimedia, hypertext, hypermedia)
Information retrieval (searching, browsing, exploring, visualizing, indexing, summarizing, clustering, classifying, cataloging, ontologies, WWW, ranking, matching, deduping, recommending, . . .)
Artificial intelligence
Machine learning
Exercise: pick one that you find interesting and pose questions

ChengXiang Zhai and Sean Massung. 2016. Text Data Management and Analysis: a Practical Introduction to Information Retrieval and Text Mining. Association for Computing Machinery and Morgan & Claypool, New York, NY ISBN: 978-1-97000-117-4.
doi:10.1145/2915031, http://dx.doi.org.ezproxy.lib.vt.edu/10.1145/2915031
Exercise: download a copy for your use
Go to: https://www.lib.vt.edu/
Select Databases
Search for ACM Digital Library, then ACM Books
Get to this work and download PDF or ePub version
Exercise: get the software that goes along with it
https://meta-toolkit.org/
MeTA: ModErn Text Analysis
A Modern C++ Data Sciences Toolkit

Learning activity: Take one or both of the related Coursera courses:
Text Retrieval and Search Engines https://www.coursera.org/learn/text-retrieval
Text Mining and Analytics https://www.coursera.org/learn/text-mining

For more information, see
http://fox.cs.vt.edu/cv.htm including links to theses near the end
http://fox.cs.vt.edu/talks/
For example, see entries in http://fox.cs.vt.edu/talks/2016/ e.g.,
http://fox.cs.vt.edu/talks/2016/20160211GFUURseminar.pptx
http://fox.cs.vt.edu/talks/2016/20160619JCDLFoxTutorialSlides.pptx
http://fox.cs.vt.edu/talks/2016/ViSlinks.txt (this file)
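
Sketch for the exercises: the notes on the veterans collections mention extracting the URLs from collected tweets before downloading and indexing the corresponding webpages. The Python below is one minimal way that extraction step might look; it is not the DLRL team's actual code. The function name extract_urls, the regular expression, and the sample tweets are all assumptions made up for illustration.

```python
import re

# Assumed pattern: an http(s) scheme followed by characters that are not
# whitespace, quotes, or angle brackets. Real tweet data may need more care.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

# Punctuation that often trails a URL in running text but is not part of it.
TRAILING_PUNCTUATION = ".,;:!?)]}"


def extract_urls(tweets):
    """Return the distinct URLs found in a list of tweet texts, in first-seen order."""
    seen = set()
    urls = []
    for text in tweets:
        for match in URL_PATTERN.findall(text):
            url = match.rstrip(TRAILING_PUNCTUATION)
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Hypothetical sample tweets, invented for this sketch.
sample_tweets = [
    "Honoring every #veteran today: http://www.va.gov/ .",
    "Oral history search: https://archive.org/search.php?query=oral%20history",
    "See http://www.va.gov/, and thank a veteran!",
]

if __name__ == "__main__":
    for url in extract_urls(sample_tweets):
        print(url)
```

The resulting list could then be handed to a downloader or crawler. Duplicates are dropped so that each webpage is fetched only once, even when many tweets share the same link.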