Archive Team and Crowdsourced Digital Preservation

Posted by Joshua in Online History

Archive Team Logo

In Digital History, Daniel Cohen and Roy Rosenzweig wrote that the growth of “the history web” has been driven as much by “grassroots historians” as by formal history institutions or credentialed professionals. Archive Team serves as an organizational website for some of this grassroots history, exposing both its promises and flaws. Describing itself as “a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage,” the site provides advice, software tools, and coordination for crowd-based campaigns to archive born-digital history such as websites and social media communities that are threatened with deletion.

Archive Team was founded in January 2009 by Jason Scott in response to the closure of a number of commercial online communities like AOL Hometown that held irreplaceable user-contributed material. Scott later described creating the project “out of anger and a feeling of powerlessness, this feeling that we were letting companies decide for us what was going to survive and what was going to die.” Rather than host digital archives itself, Archive Team offers, in Scott’s words, “a really profane, rough and tumble version of a library sciences convention” comprising a wiki where volunteers can keep track of websites threatened with closure and coordinate efforts to download backups of these sites and their content before they disappear. The resulting backups are generally uploaded to the Internet Archive or other third-party repositories that make them available to the public.

Soon after Archive Team launched, Yahoo! announced in 2009 that it would be closing GeoCities and deleting the 38 million websites its users had created there since 1994. In response, Archive Team’s volunteers set out to download as many GeoCities sites as possible and released them publicly via BitTorrent, gaining significant publicity for the project. Since then, the growing team has also preserved parts of other shuttered or threatened sites including Friendster, Tabblo, Posterous, Google Reader, Genealogy.com, and many others.

Although Archive Team’s early projects preserved data with little standardization and demanded a fair amount of technical expertise and communication between volunteers, the group has since made major improvements on both counts. Volunteers today can contribute their internet bandwidth to preserve threatened websites just by downloading VirtualBox and the Archive Team Warrior, a virtual machine that automatically coordinates tasks with other users and downloads and processes data for Archive Team’s ongoing preservation projects. Archive Team also now uses the WARC format for its archival files, ensuring that metadata and provenance are recorded along with preserved digital files.
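For readers curious what "metadata and provenance recorded along with preserved digital files" looks like in practice, here is a minimal sketch of a single WARC/1.0 record built with Python's standard library. It is an illustration only, not Archive Team's code: the function name `build_warc_record`, the example URL, and the capture date are all made up for this post, and real projects rely on dedicated crawling tools rather than a hand-rolled writer like this one.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes, content_type: str,
                      captured_at: datetime) -> bytes:
    """Serialize one WARC/1.0 'resource' record (hypothetical helper).

    captured_at should be timezone-aware; it is converted to UTC.
    """
    # Checksum of the record block, Base32-encoded, so later readers can
    # verify that the preserved bytes have not changed.
    digest = base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date",
         captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),  # where the content originally lived
        ("WARC-Block-Digest", f"sha1:{digest}"),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # A record is its header block, a blank line, the payload, then two CRLFs.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical page and an arbitrary capture date, for illustration only.
    record = build_warc_record(
        "http://example.com/",
        b"<html>hi</html>",
        "text/html",
        datetime(2009, 1, 1, tzinfo=timezone.utc),
    )
    print(record.decode("utf-8"))
```

The point of the sketch is that the original address, the capture time, and a checksum travel in the same file as the saved page, so the archive stays interpretable even after the source website is gone.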

Despite its improved usability and data standardization, Archive Team remains a visibly grassroots project driven by what Scott called its “three virtues: rage, paranoia, and kleptomania” toward commercial web providers who have sometimes been irresponsible stewards of their user-contributed content. The Archive Team wiki’s irreverent tone and somewhat disorganized structure may not be inviting to all volunteers. Furthermore, the group’s aggressive stance toward downloading copyrighted or private material without permission has sometimes caused backlash. On the other hand, the site’s willingness to work outside institutional norms has given it the agility to preserve material quickly in the rapidly shifting online landscape. The group’s success in preserving digital memories has garnered awards and praise from web users who would have otherwise lost their own family photos and online scrapbooks to website closures. As Scott explained, “this is why we do it: because these people had their history taken away.” Hopefully, as Archive Team matures, it can better address issues of privacy and civility while continuing to empower communities to preserve their online heritage for current and future generations.