Wednesday, November 26, 2025

Internet Archive Wayback Machine Link Fixer

Internet Archive (via Matt Mullenweg):

Internet Archive Wayback Machine Link Fixer is a WordPress plugin designed to combat link rot—the gradual decay of web links as pages are moved, changed, or taken down. It automatically scans your post content—on save and across existing posts—to detect outbound links. For each one, it checks the Internet Archive’s Wayback Machine for an archived version and creates a snapshot if one isn’t available.

When a linked page disappears, the plugin helps preserve your user experience by redirecting visitors to a reliable archived version. It also works proactively by archiving your own posts every time they’re updated, creating a consistent backup of your content’s history.

This is such a great idea. I’ve had it installed for a few weeks now but have mixed thoughts on the execution. The initial version had a bunch of significant bugs, and they seem to be doing a good job of fixing them. It seems to be thoughtfully designed to process a large number of old posts without overloading your server. The queueing functionality is also important because the Internet Archive’s own servers frequently go down.

The part where it submits your own posts, and the pages your post links to, to the archive seems to work well. I think this is the most important part because you can always go back and fix broken links, but you can’t go back and archive pages that weren’t archived. However, some of my posts since installing the plug-in (e.g. this one) don’t seem to have made it into the archive. This may be because the archive was down at the time of the post. Presumably, the Auto Archiver will eventually come back around and submit them again.
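To check by hand whether a given post made it into the archive, one option is the Wayback Machine's public Availability API. The following is a minimal sketch, not the plug-in's own code: the function names are made up for illustration, and the response shape (`archived_snapshots.closest`) is what the API is understood to return.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the request URL that asks whether `url` has a snapshot."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the live API; raises on network errors, e.g. when the archive is down."""
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup()` with a post's permalink returns a snapshot URL if one exists and `None` otherwise. A network error most likely means the archive itself is unreachable, which says nothing about whether the post was archived.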

The part where it replaces broken links with archive links is implemented in JavaScript. I like that it doesn’t modify the post content in your database, so it seems safe to install the plug-in without worrying about it messing anything up. However, I had hoped that it would fix the links as part of the PHP rendering process. Doing it in JavaScript means that the fixed links are not present in the page’s actual HTML. And the data that the JavaScript uses is stored in the data-iawmlf-post-links attribute of an invisible <div>, which makes the page fail validation.
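To make the render-time alternative concrete, here is a rough sketch of swapping in archive URLs while the HTML is being generated, so that the fixed links end up in the served markup and the stored post is left untouched. It is written in Python for illustration rather than as a WordPress filter. The `archived` mapping and the function name are hypothetical, and the regular expression is a simplification that a real implementation would probably replace with a proper HTML parser.

```python
import html
import re
from typing import Dict

# Simplified pattern: an <a ...> tag's href attribute in single or double quotes.
HREF_PATTERN = re.compile(
    r"""(<a\b[^>]*?\bhref=)(["'])(.*?)\2""", re.IGNORECASE | re.DOTALL
)


def rewrite_links(markup: str, archived: Dict[str, str]) -> str:
    """Return `markup` with known-broken hrefs replaced by archive URLs.

    `archived` maps an original URL to its snapshot URL (hypothetical format).
    Links that are not in the mapping are left exactly as they were.
    """

    def substitute(match: "re.Match[str]") -> str:
        prefix, quote, raw = match.groups()
        replacement = archived.get(html.unescape(raw))
        if replacement is None:
            return match.group(0)
        return f"{prefix}{quote}{html.escape(replacement, quote=True)}{quote}"

    return HREF_PATTERN.sub(substitute, markup)
```

Because the function only transforms the output string, it keeps the property I like about the JavaScript approach: nothing in the database changes.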

I have in the past manually inserted Internet Archive links when I came across broken links, and I thought I might use the plug-in to help with that instead of relying on the JavaScript fix-ups. However, when I set it to show broken links that are archived, it doesn’t list any. It’s currently showing me 188 pages of links where the Archive Status is “Link is excluded from being archived.” I tried sorting by Archive Status, but it still doesn’t show any that are both broken and archived.

The part where it finds broken links that are not archived is also not very useful because there are a huge number of links where it shows a 403 error even though the link still works. There doesn’t seem to be a way to separate the URLs that are genuinely gone from the ones that the Internet Archive doesn’t have permission to access.
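One way such a separation could work is to bucket check results by HTTP status code, treating "not found" responses differently from "access refused" ones. This is only a sketch of a heuristic: the category names and the choice of which codes land in which bucket are assumptions, not anything the plug-in is known to do.

```python
from typing import Dict, List, Optional

OK = "ok"
GONE = "gone"        # the server says the page no longer exists
BLOCKED = "blocked"  # the server refused the checker; the page may still work
UNKNOWN = "unknown"  # anything else, including no response at all


def classify_check(status: Optional[int]) -> str:
    """Map an HTTP status (or None for no response) to a coarse category."""
    if status is None:
        return UNKNOWN
    if 200 <= status < 400:
        return OK
    if status in (404, 410):
        return GONE
    if status in (401, 403, 429):
        return BLOCKED
    return UNKNOWN


def group_by_category(results: Dict[str, Optional[int]]) -> Dict[str, List[str]]:
    """Group URLs by category so the 'gone' ones can be reviewed separately."""
    grouped: Dict[str, List[str]] = {}
    for url, status in results.items():
        grouped.setdefault(classify_check(status), []).append(url)
    return grouped
```

With a split like this, the 403 results would land in their own bucket instead of being mixed in with links that are really dead.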

Ashley Belanger:

Last month, the Internet Archive’s Wayback Machine archived its trillionth webpage, and the nonprofit invited its more than 1,200 library partners and 800,000 daily users to join a celebration of the moment. To honor “three decades of safeguarding the world’s online heritage,” the city of San Francisco declared October 22 to be “Internet Archive Day.”

[…]

An Internet Archive spokesperson confirmed to Ars that the archive currently faces no major lawsuits and no active threats to its collections.

