Web Archiving and Digital Libraries -- a Virtual Workshop in conjunction with
JCDL 2022
Date: June 24, 2022
We welcome broad attendance; please contact the co-chairs for any questions you may have.
Please see the approved WADL 2022 workshop description from the JCDL proceedings as well as the workshop page hosted by the conference.
Please also refer to past WADL homepages: 2020, 2019, 2018, 2017, and 2016.
Past workshop proceedings are available at: WADL 2017-19, Pre 2016.
Prior workshops have led in part to a special issue of the
International Journal on Digital Libraries.
Registration
Since WADL is hosted by JCDL, at least one author per paper must register (at least for the workshop) at:
Schedule (using EDT)
== Opening Session (Moderator: Martin Klein) ==
9:00 Welcome, Introductions, Tech Ironing (everyone speaks!)
== Talks 1 (Moderator: Martin Klein) ==
9:30 Invited Talk 1: Karolina Holub (see below for details)
10:10 Discussion
== Talks 2 (Moderator: Martin Klein) ==
10:30 Where are the Datasets? A case study on the German Academic Web Archive
  Yousef Younes, Sebastian Tiesler, Robert Jäschke and Brigitte Mathiak
10:45 Comparison of Access Patterns of Robots and Humans in Web Archives
  Himarsha Jayanetti, Kritika Garg, Sawood Alam, Michael Nelson and Michele Weigle
10:50 Wayback Machine Video Archiving Insights
  Sawood Alam, Bill O'Connor and Mark Graham
10:55 Optimizing Archival Replay by Eliminating Unnecessary Traffic to Web Archives
  Kritika Garg, Himarsha Jayanetti, Sawood Alam, Michele Weigle and Michael Nelson
11:00 Discussion
== Talks 3 (Moderator: Zhiwu Xie) ==
11:20 Emulation-based long-term Access to Complex Web-sites
  Marcel Tschöpe, Rafael Gieschke and Klaus Rechert
11:35 Web Archiving as Entertainment
  Travis Reid, Michael Nelson and Michele Weigle
11:40 First steps in Identifying Academic Migration using Memento and Quasi-Canonicalization
  Mat Kelly, Deanna Zarrillo, Christopher Jackson and Erjia Yan
11:45 Discussion
12:05 Lunch Break
== Talks 4 (Moderator: Mat Kelly) ==
13:00 Invited Talk 2: Carrie Pirmann and Erica Peaslee (see below for details)
13:40 Discussion
== Talks 5 (Moderator: Mat Kelly) ==
14:00 CDX Summary for Web Archival Collection Insights
  Sawood Alam and Mark Graham
14:15 Russia-Ukraine News on the Dark Web
  Grant Atkins, Aaron Buehne, Abby Mabe, Zak Zebrowski and Justin Brunelle
14:20 Archiving Source Code in Scholarly Content: One in Five Articles References GitHub
  Emily Escamilla, Talya Cooper, Vicky Rampin, Martin Klein, Michele Weigle and Michael Nelson
14:25 Discussion
== Talks 6 (Moderator: Ed Fox) ==
14:45 Arch-It
  Helge Holzmann, Nick Ruest, Jefferson Bailey, Alex Dempsey, Samantha Fritz, Ian Milligan and Kody Willis
15:00 WACZ
  Ed Summers, Ilya Kreymer and Cade Diehm
15:05 Moving the End of Term Web Archive to the Cloud to Encourage Research Use and Reuse
  Mark Phillips and Sawood Alam
15:20 Discussion
== Closing Session (Moderator: Ed Fox) ==
15:40 Closing Discussion (publication and other collaboration opportunities, next event planning)
16:30 End
Invited Talks
1. Karolina Holub, Library Adviser, Croatian Digital Library Development Centre, Croatian Institute for Librarianship
Title: A history of web archiving at the National and University
Library in Zagreb
Abstract: The National and University Library in Zagreb (NSK), as
a memory institution responsible for collecting all types of
resources, recognized early the significance of collecting and
preserving web resources as part of its core activities. In 2004, the
NSK developed the Croatian Web Archive (HAW) in collaboration with the
University of Zagreb University Computing Centre (Srce).
The NSK uses three different approaches and tools to archive the
Croatian web. Initially, only selective archiving of web resources
was conducted. To build a more comprehensive national collection,
crawls of the entire national domain (.hr), along with thematic and
event crawls, followed a few years later.
This talk will present the chronology of working processes and the
diverse ways in which the NSK attempts to preserve the Croatian web as
a contemporary part of the cultural and scientific heritage.
Bio:
Karolina Holub is a coordinator of the Croatian Digital Library
Development Centre at the Croatian Institute for Librarianship in the
National and University Library in Zagreb. Her field of work includes
developing, implementing and maintaining digital library systems
(Croatian Web Archive, Digital Collections of the National and
University Library in Zagreb, Croatian electronic theses and
dissertations repositories, etc.), as well as metadata harmonization
and interoperability with other systems for all types of resources.
She manages and participates in the development of the Library’s
digitization projects and thematic portals, and is involved in
several national and international projects.
2. Carrie Pirmann (Bucknell University)
and Erica Peaslee (Centurion Solutions LLC)
Title:
Building a Community of Web Archivers: The Race to Save Ukrainian
Cultural Heritage Online
Abstract:
In response to Russia’s invasion of Ukraine on 24 February 2022, over
1,300 cultural heritage professionals (librarians, archivists,
researchers, and programmers) came together to archive the web
presence of Ukraine’s cultural heritage. In the four months since,
SUCHO (Saving Ukrainian Cultural Heritage Online) has digitally
preserved over 40 TB of websites, databases, and other digitized
cultural property, held in trust for Ukrainian colleagues while they
work to preserve their heritage on the ground. This talk will cover
the basics of coming together in a distributed grassroots response,
the evolution toward collaborating with heritage responders and using
open-source information to guide efforts, implementing a workflow
across more than 14 time zones, and using the Webrecorder suite of
tools developed by Ilya Kreymer. We hope that the processes and
lessons learned from this path-breaking project can help with
responses to similar archiving emergencies and help institutions
establish similar methods preemptively for future use.
Bio:
Carrie Pirmann is the Social Sciences Librarian at Bucknell University
(USA), working at the intersections of information literacy
instruction, research support, and digital scholarship in the social
sciences. She holds a master’s degree in library science from the
University of Illinois, and has put her years of experience as a
librarian to use for SUCHO by researching extensively to locate online
cultural heritage sites that need to be archived, and by working with
the Situation Monitoring team to keep abreast of developments in
critical areas of Ukraine.
Bio:
Erica Peaslee is the Administrative Operations Coordinator at
Centurion Solutions LLC, a Disaster and Emergency Management
consultancy in Texas (USA) where she also provides subject matter
expertise regarding cultural heritage. Using her background in museum
collections and her graduate education in Museum Studies (Harvard),
she is particularly interested in centering cultural property in
emergency planning and resilience, and promoting communication between
the two communities. Erica currently serves as Situation Monitoring
Coordinator for SUCHO, leading the use of real-time information from
Ukraine to direct efforts to the most at-risk areas. She also works
with other professionals at the intersection of cultural heritage,
crime, and emergency response to coordinate and facilitate efforts
toward shared goals.
Submissions:
Due to the current state of the world, WADL 2022 will be held entirely online.
Please note, though, that JCDL 2022 is currently planned as a hybrid event, and we encourage all WADL attendees to also register for and attend JCDL.
WADL 2022 will continue the WADL tradition of providing a forum and collaboration platform for international leaders from academia, industry, and government to discuss challenges and share insights on designing and implementing concepts, tools, and standards in the realm of web archiving. Together, we will explore the integration of web archiving and digital libraries over the complete digital resource life cycle: creation/authoring, uploading, publishing on the web, crawling/collecting, compressing, formatting, storing, preserving, analyzing, indexing, supporting access, etc.
WADL 2022 will cover all topics of interest and specifically invite contributions from practitioners. Topics include but are not limited to:
Objectives:
Workshop Co-chairs:
Program Committee: