Web Archiving and Digital Libraries -- a Virtual Workshop in conjunction with
Date: June 24, 2022
We welcome broad attendance; please contact the co-chairs for any questions you may have.
Please also refer to past WADL homepages:
and 2016. Past
workshop proceedings can be found from:
Prior workshops have led in part to a special issue of the International Journal on Digital Libraries.
Registration
Since WADL is hosted by JCDL, at least one author per paper must register at least for the workshop at:
Schedule (using EDT)
== Opening Session (Moderator Martin Klein) ==
9:00 Welcome, Introductions, Tech Ironing (everyone speaks!)

== Talks 1 (Moderator Martin Klein) ==
9:30 Invited Talk 1: Karolina Holub (see below for details)
10:10 Discussion

== Talks 2 (Moderator Martin Klein) ==
10:30 Where are the Datasets? A case study on the German Academic Web Archive
      Yousef Younes, Sebastian Tiesler, Robert Jäschke and Brigitte Mathiak
10:45 Comparison of Access Patterns of Robots and Humans in Web Archives
      Himarsha Jayanetti, Kritika Garg, Sawood Alam, Michael Nelson and Michele Weigle
10:50 Wayback Machine Video Archiving Insights
      Sawood Alam, Bill O'Connor and Mark Graham
10:55 Optimizing Archival Replay by Eliminating Unnecessary Traffic to Web Archives
      Kritika Garg, Himarsha Jayanetti, Sawood Alam, Michele Weigle and Michael Nelson
11:00 Discussion

== Talks 3 (Moderator Zhiwu Xie) ==
11:20 Emulation-based long-term Access to Complex Web-sites
      Marcel Tschöpe, Rafael Gieschke and Klaus Rechert
11:35 Web Archiving as Entertainment
      Travis Reid, Michael Nelson and Michele Weigle
11:40 First steps in Identifying Academic Migration using Memento and Quasi-Canonicalization
      Mat Kelly, Deanna Zarrillo, Christopher Jackson and Erjia Yan
11:45 Discussion
12:05 (Lunch) Break

== Talks 4 (Moderator Mat Kelly) ==
13:00 Invited Talk 2: Carrie Pirmann and Erica Peaslee (see below for details)
13:40 Discussion

== Talks 5 (Moderator Mat Kelly) ==
14:00 CDX Summary for Web Archival Collection Insights
      Sawood Alam and Mark Graham
14:15 Russia-Ukraine News on the Dark Web
      Grant Atkins, Aaron Buehne, Abby Mabe, Zak Zebrowski and Justin Brunelle
14:20 Archiving Source Code in Scholarly Content: One in Five Articles References GitHub
      Emily Escamilla, Talya Cooper, Vicky Rampin, Martin Klein, Michele Weigle and Michael Nelson
14:25 Discussion

== Talks 6 (Moderator Ed Fox) ==
14:45 15m Arch-It
      Helge Holzmann, Nick Ruest, Jefferson Bailey, Alex Dempsey, Samantha Fritz, Ian Milligan and Kody Willis
15:00 WACZ
      Ed Summers, Ilya Kreymer and Cade Diehm
15:05 Moving the End of Term Web Archive to the Cloud to Encourage Research Use and Reuse
      Mark Phillips and Sawood Alam
15:20 Discussion

== Closing Session (Moderator Ed Fox) ==
15:40 Closing Discussion (publication and other collaboration opportunities, next event planning)
16:30 end
1. Karolina Holub, Library Adviser, Croatian Digital Library Development Centre, Croatian Institute for Librarianship
Title: A history of web archiving at the National and University Library in Zagreb
Abstract: The National and University Library in Zagreb (NSK), as a memory institution responsible for collecting all types of resources, recognized early on the significance of collecting and preserving web resources as part of its core activities. In 2004, the NSK developed the Croatian Web Archive (HAW) in collaboration with the University of Zagreb University Computing Centre (Srce). The NSK uses three different approaches and tools to archive the Croatian web. At the beginning, only selective archiving of web resources was conducted. To build a more comprehensive national collection, crawls of the whole national domain (.hr), as well as thematic and event crawls, followed a few years later. This talk will present the chronology of working processes and the diverse ways in which the NSK attempts to preserve the Croatian web as a contemporary part of the cultural and scientific heritage.
Bio: Karolina Holub is a coordinator of the Croatian Digital Library Development Centre at the Croatian Institute for Librarianship in the National and University Library in Zagreb. Her field of work includes developing, implementing and maintaining digital library systems (Croatian Web Archive, Digital Collections of the National and University Library in Zagreb, Croatian electronic theses and dissertations repositories etc.) as well as taking care of metadata harmonization and interoperability with other systems for all types of resources. She is involved in managing and participating in the development of the Library’s digitization projects and thematic portals, and is involved in several national and international projects.
2. Carrie Pirmann (Bucknell University) and Erica Peaslee (Centurion Solutions LLC)
Title: Building a Community of Web Archivers: The Race to Save Ukrainian Cultural Heritage Online
Abstract: In response to Russia’s invasion of Ukraine on 24 February 2022, over 1300 cultural heritage professionals (librarians, archivists, researchers, programmers) came together to archive the web presence of Ukraine’s cultural heritage. In the four months since, SUCHO (Saving Ukrainian Cultural Heritage Online) has digitally preserved over 40 TB of websites, databases, and other digitized cultural property to hold in trust for Ukrainian colleagues while they are working to preserve their heritage on the ground. This talk will cover the basics of coming together in a distributed grassroots response, the evolution to collaborating with heritage responders and using open-source information to guide efforts, implementing a workflow across 14+ time zones, and utilizing the Webrecorder suite of tools developed by Ilya Kreymer. We hope that the processes and lessons learned from this path-breaking project can be used to assist with responses to similar archiving emergencies and help institutions preemptively establish similar methods for future use.
Bio: Carrie Pirmann is the Social Sciences Librarian at Bucknell University (USA), working at the intersections of information literacy instruction, research support, and digital scholarship in the social sciences. She holds a master’s degree in library science from the University of Illinois, and has put her years of experience as a librarian to use for SUCHO by conducting extensive research to locate cultural heritage sites online that need to be archived, and working with the Situation Monitoring team to keep abreast of situations in critical areas of Ukraine.
Bio: Erica Peaslee is the Administrative Operations Coordinator at Centurion Solutions LLC, a Disaster and Emergency Management consultancy in Texas (USA), where she also provides subject matter expertise regarding cultural heritage. Using her background in museum collections and her graduate education in Museum Studies (Harvard), she is particularly interested in centering cultural property in emergency planning and resilience, and promoting communication between the two communities. Erica currently serves as Situation Monitoring Coordinator for SUCHO, leading the observation of real-time information from Ukraine and coordinating its use to direct efforts to the most at-risk areas. In addition, she works with other professionals at the intersection of cultural heritage, crime, and emergency response to coordinate and facilitate working towards similar goals.
Due to the current state of the world, WADL 2022 will be held entirely online. Please note, though, that JCDL 2022 is currently planned as a hybrid event, and we encourage all WADL attendees to also register for and attend JCDL.
WADL 2022 will continue the WADL tradition of providing a forum and collaboration platform for international leaders from academia, industry, and government to discuss challenges and share insights in designing and implementing concepts, tools, and standards in the realm of web archiving. Together, we will explore the integration of web archiving and digital libraries over the complete digital resource life cycle: creation/authoring, uploading, publishing on the web, crawling/collecting, compressing, formatting, storing, preserving, analyzing, indexing, supporting access, etc.
WADL 2022 will cover all topics of interest and specifically invites contributions from practitioners. Topics include but are not limited to: