FAQs (frequently asked questions)
How can I participate?
Register for a login account so that we know who you are. Then contact us, as explained on the About Us page, to tell us about your data, interests, and plans for collaboration.
Who should view this site?
- Those who would like our help archiving events that interest them and that fit within our scope.
- Those who would like to collaborate with us on research, development, or educational activities related to our project.
- Those who would like to use our search, analysis, or visualization services. Some are accessible from this site; others are available to those who collaborate with us.
What kinds of events are of interest in this project?
- Events connected with crises, tragedies, and disasters, both natural and man-made.
- Events of community or governmental interest, including elections.
What kinds of data are you collecting and analyzing?
- Sources: WWW (including online news, as well as government and organizational sites), social media (e.g., Twitter), RSS feeds, scholarly publications, etc.
- Data types: HTML documents, text files, PDFs, tweet texts, videos, and images.
- As of September 2015, we have over 1 billion tweets and over 12 terabytes of Web content.
How are you collecting all this data?
- Internet Archive crawls using Heritrix, typically listed at https://archive-it.org/explore?show=Collection&fc=meta_Subject:spontaneo...
- Crawls at Virginia Tech, using Heritrix or our own crawling tools (including a novel focused crawler)
- Tweet collection with DMI-TCAT and Yourtwapperkeeper, driven by many different event descriptions of varying scope
What data analysis techniques are you using?
Data and text mining, information visualization, social network analysis, information retrieval, computational linguistics, machine learning, etc.
How do you manage Big Data?
- We run a local Hadoop cluster with Cloudera software and tools such as Solr, Mahout, HBase, Hive, Pig, ...
- We are partnering with the Internet Archive.
Several projects analyze data about crises and tragedies. What is unique about this project?
- Scope and methods:
  - We focus on the many different kinds of crises and tragedies, and also on events related to communities or governments.
  - We integrate digital library and archiving approaches, researching improved methods and demonstrating them with our collections and services.
What should I do if there is an emergency at Virginia Tech?
Use the Threat Assessment (http://www.threatassessment.vt.edu) and Emergency Management (http://www.emergency.vt.edu) sites.