dan forys

Digital conversations preserved

Posted on 2011-02-21

The web is not static. By its very nature, it changes all the time – anything you look at today might have changed, or been removed entirely, by tomorrow. The BBC’s recent announcements that they are shutting down several of their sites, rather than simply mothballing them, got me thinking about my own online activities and their lack of persistence.

That’s when I decided to make a journal.

Tweets and Time

Twitter seemed like the perfect source for my journal. I’ve used it since 2008 and tend to update it very frequently. So I started prototyping a tool to import my tweets using the Twitter API. This was quite simple to get up and running: a Cron job runs every five minutes and fetches a new page of tweets.

Currently, the Twitter API stops you from fetching more than your 3,200 most recent tweets. This means that my archive only goes back to July 2009. Twitter say that this is for performance reasons, but they hope to make older tweets available some time in the future.

Of course, I’m not the only person talking on Twitter, so I also examine the tweets to check if I have replied to anyone else. If I have, I also fetch the original tweet to ensure the context is preserved.
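As a rough illustration of that reply check (a sketch, not the journal's actual import code), the function below collects the IDs of original tweets that still need fetching. It assumes each tweet object carries `id_str` and `in_reply_to_status_id_str` fields, as Twitter's JSON responses do; `missingReplyParents` and `storedIds` are made-up names.

```javascript
// Hypothetical helper: list the IDs of tweets that were replied to
// but are not in the archive yet.
// `storedIds` is a Set of tweet IDs that have already been imported.
function missingReplyParents(tweets, storedIds) {
  const missing = new Set();
  for (const tweet of tweets) {
    const parentId = tweet.in_reply_to_status_id_str;
    // Skip tweets that are not replies, and parents we already hold.
    if (parentId && !storedIds.has(parentId)) {
      missing.add(parentId);
    }
  }
  return [...missing];
}

const sampleTweets = [
  { id_str: "101", text: "Just a thought", in_reply_to_status_id_str: null },
  { id_str: "102", text: "@someone agreed!", in_reply_to_status_id_str: "90" },
  { id_str: "103", text: "@someone one more thing", in_reply_to_status_id_str: "90" },
  { id_str: "104", text: "Replying to myself", in_reply_to_status_id_str: "101" },
];
const alreadyStored = new Set(sampleTweets.map((t) => t.id_str));

console.log(missingReplyParents(sampleTweets, alreadyStored)); // [ '90' ]
```

Using a Set means a tweet that several replies point at is only fetched once.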

Once the tweet import was working, it was time to do something with the data...

Snapshots of the web

I also wanted to archive the links mentioned in the tweets. Simply preserving the link wasn’t enough – what if the website shuts down, has a redesign, changes the content or removes it entirely?

I needed a system to take screenshots of the web pages mentioned in tweets. It turns out there aren’t many free systems for taking screenshots of entire web pages. Plenty for taking thumbnails of web pages, sure – but very few meeting my needs. After spending a day playing with a standalone browsershots server and not being happy with the results, I settled on a Firefox extension that can take screenshots from the command line.

Using a nifty virtual frame buffer called Xvfb, it’s easy to pop open Firefox and take a screenshot in a couple of lines of code:

Xvfb :2 -screen 0 1024x768x24&
DISPLAY=:2 firefox -saveimage http://website.com -saveas /path/to/screenshot.png

Eventually, I came up with a queuing system where my remote web server imports the tweets from Twitter and sends any embedded links to my home Linux server “Fozzabox”. Every minute, Fozzabox looks for a new URL in the queue – takes a screenshot – then uploads it to the remote server hosting the journal. If my wife isn’t watching too much YouTube, it can happily crunch through one screenshot every two minutes or so.
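The queue-and-poll idea can be sketched in a few lines. This is only an in-memory stand-in under my own assumptions: the real queue lives on the remote server, the class and function names are invented, skipping duplicate URLs is a guess at sensible behaviour, and the screenshot and upload steps are passed in as placeholder functions.

```javascript
// In-memory stand-in for the URL queue (hypothetical names throughout).
class ScreenshotQueue {
  constructor() {
    this.pending = [];
    this.seen = new Set();
  }

  // Add a URL unless it has been queued before. Returns true if it was added.
  enqueue(url) {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.pending.push(url);
    return true;
  }

  // Take the oldest waiting URL, or null when the queue is empty.
  next() {
    return this.pending.length > 0 ? this.pending.shift() : null;
  }
}

// One polling pass: take a URL, screenshot it, upload the result.
// `takeScreenshot` and `upload` are injected placeholders for the real work.
function processOne(queue, takeScreenshot, upload) {
  const url = queue.next();
  if (url === null) return false;
  upload(url, takeScreenshot(url));
  return true;
}

const linkQueue = new ScreenshotQueue();
linkQueue.enqueue("http://example.com/");
linkQueue.enqueue("http://example.com/"); // duplicate, ignored
linkQueue.enqueue("http://example.org/page");

const uploaded = [];
const fakeShot = (url) => "screenshot-of-" + url;
const fakeUpload = (url, file) => uploaded.push({ url, file });
while (processOne(linkQueue, fakeShot, fakeUpload)) {
  // keep going until the queue is empty
}
console.log(uploaded.length); // 2
```

In the setup described above, a scheduled job would run a single pass like `processOne` once a minute rather than looping until the queue is empty.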

Photo of the Fozzabox server

I also built in a little extra intelligence – before taking a screenshot, a PHP script first requests the URL. It examines the content type of the response and, if it is already an image (rather than a web page), imports the image directly – no screenshot required. Likewise, if the URL points to Flickr, TwitPic or YFrog, it will also look for and import an image directly.
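The decision itself is small enough to sketch. The real check is a PHP script; this JavaScript version only illustrates the logic, and the function name, the strategy labels and the exact host names are my assumptions.

```javascript
// Hosts where the page wraps a single photo (host names are assumptions).
const IMAGE_HOSTS = ["flickr.com", "twitpic.com", "yfrog.com"];

// Decide how to archive a link, given the Content-Type header of its response.
function chooseImportStrategy(url, contentType) {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const type = (contentType || "").split(";")[0].trim().toLowerCase();
  if (type.startsWith("image/")) {
    return "import-image"; // the URL is already an image
  }
  const host = new URL(url).hostname.toLowerCase();
  const isImageHost = IMAGE_HOSTS.some(
    (h) => host === h || host.endsWith("." + h)
  );
  // Known photo hosts: look for the image inside the page instead.
  return isImageHost ? "find-hosted-image" : "screenshot";
}

console.log(chooseImportStrategy("http://example.com/cat.png", "image/png"));
console.log(chooseImportStrategy("http://twitpic.com/abc", "text/html; charset=utf-8"));
console.log(chooseImportStrategy("http://example.com/", "text/html"));
```

Anything that is neither an image nor on a known photo host falls through to the screenshot queue.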

With the dual-server import and screenshot system configured, I enabled the Cron jobs and sat back. Approximately 48 hours later, I had circa 4,400 tweets and 1,500 links in the archive.

HTML5 goodness (sort of)

Finally, I needed to build the web application itself. For this, I decided to try and make something really responsive, where you could browse a timeline of tweets quickly and easily.

Tweets are organised into individual days. Visiting a day will load all the tweets and replies by hitting an AJAX endpoint on my remote server. Once a day is loaded, all the associated screenshots appear as thumbnails next to the tweet.

The timeline at the top is built as an unordered list, where each list item corresponds to an individual day. Inside each li is a div tag which has its height dynamically set according to how many tweets occurred on that day. The more tweets on that day, the taller the bar.
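One way to turn tweet counts into bar heights is to scale every day against the busiest one. That scaling rule, the 60px maximum and the function name are assumptions for illustration; the post only says the height depends on the number of tweets.

```javascript
// Map each day's tweet count to a bar height in pixels,
// scaled so the busiest day gets the full `maxHeightPx`.
function barHeights(countsByDay, maxHeightPx = 60) {
  const busiest = Math.max(0, ...Object.values(countsByDay));
  const heights = {};
  for (const [day, count] of Object.entries(countsByDay)) {
    heights[day] = busiest === 0 ? 0 : Math.round((count / busiest) * maxHeightPx);
  }
  return heights;
}

const dayCounts = { "2011-02-19": 5, "2011-02-20": 10, "2011-02-21": 0 };
const heights = barHeights(dayCounts);
console.log(heights["2011-02-19"], heights["2011-02-20"], heights["2011-02-21"]); // 30 60 0
```

In the page, each value would become the `style.height` of the div inside the matching list item.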

Journal timeline screenshot

The timeline

A mousemove handler on the timeline lets you drag it around to see more dates. I also added a bit of iPhone-esque momentum thanks to a suggestion by Mark Stickley.
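Momentum of this kind usually boils down to carrying on at the release speed and shrinking that speed every frame. The sketch below shows the general idea, not the journal's code; the friction factor, the stopping threshold and the function name are all assumptions.

```javascript
// Simulate the timeline coasting after the mouse is released.
// `velocity` is pixels per frame at release; it is multiplied by `friction`
// on every frame until it drops below `minSpeed`.
function momentumOffsets(startOffset, velocity, friction = 0.95, minSpeed = 1) {
  if (friction < 0 || friction >= 1) {
    throw new RangeError("friction must be at least 0 and below 1");
  }
  const offsets = [];
  let offset = startOffset;
  let speed = velocity;
  while (Math.abs(speed) >= minSpeed) {
    offset += speed;
    offsets.push(offset);
    speed *= friction;
  }
  return offsets;
}

console.log(momentumOffsets(0, 8, 0.5, 1)); // [ 8, 12, 14, 15 ]
```

In a browser, each offset would be applied on a timer tick, for example by setting the list's left position, so the timeline glides to a stop rather than halting dead.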

I added a little animation to the page in the form of the date display. When you click on a new date, it starts spinning and gradually stops at the right date when the tweets have loaded.

Screenshot of the journal date animation

The date during animation (left) and after tweets have loaded (right)

For this I use a series of unordered lists, one each for the weekday, day, month and year. Each list has the final item at the start and the end of the list – this is so that you don’t notice when the list wraps back to the start, giving the impression of a cylindrical background. The blur effect is simply a grey text-shadow applied behind the text. The lists sit inside a div with overflow:hidden set, which hides the rest of each list. You can see what the list looks like when its container is made bigger:

Screenshot of the journal dates outside their container
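The wrapping trick can be shown without any DOM at all. In this sketch (function names and the 20px item height are assumptions), the last value is copied to the front of the list, and the scroll offset is wrapped so that the jump from the end back to the start lands on an identical-looking item.

```javascript
// Build the list for one wheel: the final value is repeated at the start,
// so the first and last entries look the same.
function wheelItems(values) {
  if (values.length === 0) return [];
  return [values[values.length - 1], ...values];
}

// Wrap a scroll offset into one full turn of the wheel.
// `count` is the number of distinct values (not counting the duplicate).
function wrapOffset(offsetPx, itemHeightPx, count) {
  const cycle = itemHeightPx * count;
  return ((offsetPx % cycle) + cycle) % cycle;
}

const weekdays = ["Mon", "Tue", "Wed"];
const wheel = wheelItems(weekdays);
console.log(wheel); // [ 'Wed', 'Mon', 'Tue', 'Wed' ]

// Scrolling 60px with 20px items reaches the trailing "Wed";
// the wrapped offset is 0, which shows the leading "Wed" instead.
console.log(wrapOffset(60, 20, weekdays.length)); // 0
```

Because both ends show the same text, resetting the offset mid-spin is invisible.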

I have also added support for HTML5 history, albeit only functional in Chrome and Firefox 4 (at the time of writing). Each time you click on a new date, the page calls history.pushState(), adding the current state to a stack and updating the page URL. This means I can update the page using pure JavaScript (for speed) whilst maintaining the ability to publish and share URLs for specific days. For example, a day on my holiday to China in 2010.
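Here is a minimal sketch of that pattern, assuming a made-up URL scheme of /journal/YYYY/MM/DD and invented function names. The history object is passed in so the example runs outside a browser; `pushState(state, title, url)` is the standard History API call.

```javascript
// Hypothetical URL scheme: "2011-02-21" -> "/journal/2011/02/21".
function urlForDay(date) {
  return "/journal/" + date.split("-").join("/");
}

// Called when a date is clicked: record the state, update the URL, load tweets.
function showDay(date, historyApi, loadTweets) {
  historyApi.pushState({ date }, "", urlForDay(date));
  loadTweets(date);
}

// Called on back/forward navigation: reload the day stored in the state.
function handlePopState(event, loadTweets) {
  if (event.state && event.state.date) {
    loadTweets(event.state.date);
  }
}

// Stand-ins so the sketch runs without a browser.
const fakeHistory = {
  entries: [],
  pushState(state, title, url) {
    this.entries.push({ state, url });
  },
};
const loadedDays = [];
const recordLoad = (date) => loadedDays.push(date);

showDay("2011-02-21", fakeHistory, recordLoad);
handlePopState({ state: { date: "2011-02-20" } }, recordLoad);
console.log(fakeHistory.entries[0].url); // /journal/2011/02/21
```

In a real page you would pass `window.history` and register the second function with `window.addEventListener("popstate", ...)`, so the back button restores the previous day.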

Final thoughts

On and off, it’s taken me about three weeks of tinkering in my spare time to get this up and running. I hope, over time, it’ll become more and more interesting to browse my conversations and thoughts – it would be a shame to lose them.