2019.12.20; Friday December 20th, 2019: This Page is Designed to Last: A Manifesto for Preserving Content on the Web
Published by DB,
A Manifesto for Preserving Content on the Web
This Page is Designed to Last
By Jeff Huang on 2019-12-19
For a professor, the end of the year is an opportunity to clean up and reset for the upcoming new semester. I found myself clearing out old bookmarks—yes, bookmarks: that formerly beloved browser feature that seems to have lost the battle to 'address bar autocomplete'. But this nostalgic act of tidying led me to despair.
Bookmark after bookmark led to dead link after dead link. Vanished are amazing pieces of writing on kuro5hin about tech culture, and a collection of mathematical puzzles and their associated discussion by academics that my father introduced me to; gone are Woodman's Reverse Engineering tutorials from my high school years, where I first tasted the feeling of dominance over software; even my most recent bookmark, a series of posts on Google+ exposing usb-c chargers' non-compliance with the specification, disappeared.
This is more than just link rot, it's the increasing complexity of keeping alive indie content on the web, leading to a reliance on platforms and time-sorted publication formats (blogs, feeds, tweets).
Of course, I have also contributed to the problem. A paper I published 7 years ago has an abstract that includes a demo link, which has been taken over by a spammy page with a pumpkin picture on it. Part of that lapse was laziness to avoid having to renew and keep a functioning web application up year after year.
I've recommended my students to push websites to Heroku, and publish portfolios on Wix. Yet every platform with irreplaceable content dies off some day. Geocities, LiveJournal, what.cd, now Yahoo Groups. One day, Medium, Twitter, and even hosting services like GitHub Pages will be plundered then discarded when they can no longer grow or cannot find a working business model.
The problem is multi-faceted. First, content takes effort to maintain. The content may need updating to remain relevant, and will eventually have to be rehosted. A lot of content, what used to be the vast majority of content, was put up by individuals. But individuals (maybe you?) lose interest, so one day maybe you just don't want to deal with migrating a website to a new hosting provider.
Second, a growing set of libraries and frameworks are making the web more sophisticated but also more complex. First came jquery, then bootstrap, npm, angular, grunt, webpack, and more. If you are a web developer who is keeping up with the latest, then that's not a problem.
But if not, maybe you are an embedded systems programmer or startup CTO or enterprise Java developer or chemistry PhD student, sure you could probably figure out how to set up some web server and toolchain, but will you keep this up year after year, decade after decade? Probably not, and when the next year when you encounter a package dependency problem or figure out how to regenerate your html files, you might just throw your hands up and zip up the files to deal with "later". Even simple technology stacks like static site generators (e.g., Jekyll) require a workflow and will stop working at some point. You fall into npm dependency hell, and forget the command to package a release. And having a website with multiple html pages is complex; how would you know how each page links to each other? index.html.old, Copy of about.html, index.html (1), nav.html?
Third, and this has been touted by others already (and even rebutted), the disappearance of the public web in favor of mobile and web apps, walled gardens (Facebook pages), just-in-time WebSockets loading, and AMP decreases the proportion of the web on the world wide web, which now seems more like a continental web than a "world wide web".
So for these problems, what can we do about it? It's not such a simple problem that can be solved in this one article. The Wayback Machine and archive.org helps keep some content around for longer. And sometimes an altruistic individual rehosts the content elsewhere.
But the solution needs to be multi-faceted. How do we make web content that can last and be maintained for at least 10 years? As someone studying human-computer interaction, I naturally think of the stakeholders we aren't supporting. Right now putting up web content is optimized for either the professional web developer (who use the latest frameworks and workflows) or the non-tech savvy user (who use a platform).
But I think we should consider both 1) the casual web content "maintainer", someone who doesn't constantly stay up to date with the latest web technologies, which means the website needs to have low maintenance needs; 2) and the crawlers who preserve the content and personal archivers, the "archiver", which means the website should be easy to save and interpret.
So my proposal is seven unconventional guidelines in how we handle websites designed to be informative, to make them easy to maintain and preserve. The guiding intention is that the maintainer will try to keep the website up for at least 10 years, maybe even 20 or 30 years. These are not controversial views necessarily, but are aspirations that are not mainstream—a manifesto for a long-lasting website.
After doing these things, go ahead and place a bit of text in the footer, "The page was designed to last", linking to this page explaining what that means. The words promise that the maintainer will do their best to follow the ideas in this manifesto.
Before you protest, this is obviously not for web applications. If you are making an application, then make your web or mobile app with the workflow you need. I don't even know any web applications that have remained similarly functioning over 10 years (except Philip Guo's python tutor, due to his minimalist strategy for maintaining it) so it seems like a lost cause anyway. It's also not for websites maintained by an organization like Wikipedia or Twitter. You do your thing, and the salary for an IT team is probably enough to keep something alive for a while.
In fact, it's not even that important you strictly follow the 7 "rules", as they're more of a provocation than strict rules.
But let's say some small part of the web starts designing websites to last for content that is meant to last. What happens then? Well, people may prefer to link to them since they have a promise of working in the future. People more generally may be more mindful of making their pages more permanent. And users and archivers both save bandwidth when visiting and storing these pages.
The effects are long term, but the achievements are incremental and can be implemented by website owners without being dependent on anyone else or waiting for a network effect. You can do this now for your website, and that already would be a positive outcome. Like using a recycled shopping bag instead of a taking a plastic one, it's a small individual action.
This article is meant to provoke and lead to individual action, not propose a complete solution to the decaying web. It's a small simple step for a complex sociotechnical system. So I'd love to see this happen. I intend to keep this page up for at least 10 years.
Thanks to my Ph.D. students Shaun Wallace, Nediyana Daskalova, Talie Massachi, Alexandra Papoutsaki, my colleagues James Tompkin, Stephen Bach, my teaching assistant Kathleen Chai, and my research assistant Yusuf Karim for feedback on earlier drafts.