Automatically Unshortening Links in WordPress Posts

On this site, I have the Broken Link Checker plugin chugging away in the background. He tirelessly checks and rechecks every link in every post to find URLs that no longer work; pages sometimes just disappear.

In most cases, I’m able to use the Internet Archive Wayback Machine to find archived snapshots of the long-gone links so that the context of my writing archive remains preserved.

I also recently imported all of my old Twitter posts from past years into my Microblog. Quite a few of those tweets contain links I shared.

At some point, Twitter started automatically shortening links so that they go through its own service. Link shortening (https://en.wikipedia.org/wiki/URL_shortening) has become somewhat commonplace. Lots of companies exist to provide link shortening services (e.g., bit.ly); one of their value propositions is that they provide interesting analytics about the kinds of sites people visit.

Others have written about the problems with link shorteners.

A primary concern is that link shortening creates a single point of failure on the web; this is the antithesis of the way the Internet is supposed to work. If any one of these shortening services goes down, then suddenly those short links point to nothing, effectively breaking the web. This is a real issue; it actually happens.

Furthermore, if the page behind a short link goes away, the short link hides the original URL, which makes finding an archived copy nearly impossible.

Brett Terpstra’s StretchLink is an invaluable tool that watches your clipboard for shortened links and expands them in the background. However, manually going through the thousands of back posts on my blog to unshorten links by copying and pasting seems a bit obsessive and not really worth my time. New tweets are also cross-posted automatically using IFTTT, and I don’t want to have to “fix” every post that comes in from Twitter.

So I quickly hacked some code to automatically unshorten links in my posts. It uses a code snippet I found by Jonathon Hill and Gruber’s URL matching regex.
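To illustrate the general idea (this is a Python sketch, not the PHP that goes in functions.php), unshortening comes down to finding each URL in the text and following its redirects until they stop. The names `unshorten`, `unshorten_text`, and `head_location` are made up for this example, and `URL_RE` is a deliberately simplified stand-in, not Gruber's regex.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Optional
from urllib.parse import urljoin

# Simplified stand-in for a URL-matching regex (NOT Gruber's pattern).
URL_RE = re.compile(r"https?://[^\s<>\"']+")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def head_location(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the Location header of a redirect, or None if there is none."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout):
            return None  # 2xx response: the URL does not redirect
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
    except (OSError, ValueError):
        pass  # network failure or malformed URL: leave the link alone
    return None


def unshorten(
    url: str,
    resolve: Callable[[str], Optional[str]] = head_location,
    max_hops: int = 5,
) -> str:
    """Follow up to max_hops redirects and return the last URL reached."""
    seen = {url}
    for _ in range(max_hops):
        target = resolve(url)
        if not target:
            break
        target = urljoin(url, target)  # Location may be relative
        if target in seen:  # redirect loop
            break
        seen.add(target)
        url = target
    return url


def unshorten_text(
    text: str, resolve: Callable[[str], Optional[str]] = head_location
) -> str:
    """Replace every URL in text with its unshortened form."""

    def _replace(match: "re.Match[str]") -> str:
        raw = match.group(0)
        url = raw.rstrip(".,;:!?)")  # keep trailing punctuation out of the URL
        return unshorten(url, resolve) + raw[len(url):]

    return URL_RE.sub(_replace, text)
```

Passing the resolver in as a parameter keeps the redirect-following logic testable without touching the network. Note that this sketch expands any URL that redirects, not only ones from known shortening services.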

I noticed that the unshortened links tended to have analytics-enabling “UTM” parameters, so I strip those out as well.
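Stripping those parameters can be sketched like this, again in Python purely for illustration. The function name `strip_utm` is hypothetical, and the sketch assumes that every query parameter whose name starts with `utm_` is tracking-only and safe to drop.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url: str) -> str:
    """Return url with all utm_* query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    # Rebuild the URL; path, fragment, and remaining parameters are kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_utm("https://example.com/post?utm_source=twitter&id=7&utm_medium=social#top"))
# prints https://example.com/post?id=7#top
```

Parsing the query string rather than using a regex keeps the other parameters and the fragment intact. One side effect is that `urlencode` may normalize the percent-encoding of the parameters that remain.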

A next step would be to somehow “bake” the older links using the Wayback Machine or via downloading snapshots so that they remain in an unchanged format.
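One possible starting point for that, sketched in Python, is the Wayback Machine availability API, which reports the closest archived snapshot for a URL. The endpoint and the response shape below reflect my understanding of that API and should be checked against the Internet Archive's documentation; the function names are hypothetical, and the sample response is invented for the example.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; timestamp (YYYYMMDD) asks for the nearest snapshot."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the snapshot URL from a JSON response, or None if none is available."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Invented sample response, shaped the way the API is assumed to answer.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20140101000000/http://example.com/",
            "timestamp": "20140101000000",
            "status": "200",
        }
    }
})
print(closest_snapshot(sample))
```

Passing the post's publication date as the timestamp would favor a snapshot from around the time the link was originally shared.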

Just add the PHP code to the functions.php of your WordPress theme and you’re on your way to abandoning shortened links whenever you save or update a post.