Digital conversations preserved
Posted on 2011-02-21
The web is not static. By its very nature, it changes all the time – anything you look at today might have changed, or been removed entirely, by tomorrow. The BBC’s recent announcements that they are shutting down several of their sites, rather than simply mothballing them, got me thinking about my online activities and their lack of persistence.
That’s when I decided to make a journal.
Tweets and Time
Twitter seemed like the perfect source for my journal. I’ve used it since 2008 and tend to update it very frequently. So I started prototyping a tool to import my tweets using the Twitter API. This was quite simple to get up and running with a Cron job running every five minutes to fetch a new page of tweets.
Currently, the Twitter API won’t let you fetch more than your 3,200 most recent tweets. This means that my archive only goes back to July 2009. Twitter say that this is for performance reasons, but they hope to make older tweets available some time in the future.
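To give a feel for how long a page-at-a-time backfill takes, here is a rough sketch in Python (not the language the importer is written in). The page size of 200 tweets is an assumption on my part, as the post doesn't say how many tweets each request returns; the 3,200 cap and the five-minute interval come from the text above.

```python
import math

TWEET_LIMIT = 3200       # the API cap mentioned above
PAGE_SIZE = 200          # ASSUMPTION: tweets per request, not stated in the post
CRON_INTERVAL_MIN = 5    # one page per cron run, every five minutes


def pages_needed(limit=TWEET_LIMIT, page_size=PAGE_SIZE):
    """Number of requests needed to walk back through `limit` tweets."""
    return math.ceil(limit / page_size)


def backfill_minutes(limit=TWEET_LIMIT, page_size=PAGE_SIZE,
                     interval=CRON_INTERVAL_MIN):
    """Minutes a full backfill takes when each cron run fetches one page."""
    return pages_needed(limit, page_size) * interval
```

Under those assumptions, the tweet import itself is the quick part: a handful of pages, each a few minutes apart.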
Of course, I’m not the only person talking on Twitter, so I also examine the tweets to check if I have replied to anyone else. If I have, I also fetch the original tweet to ensure the context is preserved.
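The reply check can be sketched roughly like this, again in Python for illustration. It assumes each tweet is a dict carrying the `in_reply_to_status_id` field that Twitter includes on replies; the function name and the shape of `archived_ids` are hypothetical.

```python
def missing_parent_ids(tweets, archived_ids):
    """Return the IDs of replied-to tweets that are not in the archive yet.

    `tweets` is a list of dicts as decoded from the API response;
    `archived_ids` is any iterable of tweet IDs already stored.
    """
    seen = set(archived_ids)
    wanted = []
    for tweet in tweets:
        parent_id = tweet.get("in_reply_to_status_id")
        # Not a reply, or the original is already stored (or already queued).
        if parent_id is None or parent_id in seen:
            continue
        seen.add(parent_id)
        wanted.append(parent_id)
    return wanted
```

Each ID returned would then be fetched individually and stored alongside the reply, so the conversation still makes sense later.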
Once the tweet import was working, it was time to do something with the data...
Snapshots of the web
I also wanted to archive the links mentioned in the tweets. Simply preserving the link wasn’t enough – what if the website shuts down, has a redesign, changes the content or removes it entirely?
I needed a system to take screenshots of the web pages mentioned in tweets. It turns out there aren’t many free systems for taking screenshots of entire web pages. Plenty for taking thumbnails of web pages, sure – but very few meeting my needs. After spending a day playing with a standalone browsershots server and not being happy with the results, I settled on a Firefox extension that can take screenshots from the command line.
Using a nifty virtual frame buffer called Xvfb, it’s easy to pop open Firefox and take a screenshot in a couple of lines of code:
Xvfb :2 -screen 0 1024x768x24 &
DISPLAY=:2 firefox -saveimage http://website.com -saveas /path/to/screenshot.png
Eventually, I came up with a queuing system where my remote web server imports the tweets from Twitter and sends any embedded links to my home Linux server “Fozzabox”. Every minute, Fozzabox looks for a new URL in the queue – takes a screenshot – then uploads it to the remote server hosting the journal. If my wife isn’t watching too much YouTube, it can happily crunch through one screenshot every two minutes or so.
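One pass of the screenshot worker might look something like the sketch below. It is Python rather than whatever actually runs on Fozzabox, and it assumes Xvfb is already listening on display `:2`. The queue is a plain list and the file naming scheme is made up for the example; the `-saveimage` and `-saveas` flags are the ones shown earlier, provided by the Firefox extension. Uploading the result to the remote server is left out.

```python
import hashlib
import os
import subprocess


def screenshot_command(url, out_path):
    """Build the Firefox command line used for one screenshot."""
    return ["firefox", "-saveimage", url, "-saveas", out_path]


def process_next(queue, out_dir, run=subprocess.run, display=":2"):
    """Take one URL off the front of the queue and screenshot it.

    Returns the path of the screenshot, or None if the queue was empty.
    `run` is injectable so the function can be exercised without Firefox.
    """
    if not queue:
        return None
    url = queue.pop(0)
    # Hypothetical naming scheme: a hash of the URL keeps filenames safe.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".png"
    out_path = os.path.join(out_dir, name)
    # Point Firefox at the virtual frame buffer instead of a real screen.
    env = dict(os.environ, DISPLAY=display)
    run(screenshot_command(url, out_path), env=env, check=True)
    return out_path
```

Calling `process_next` once a minute from cron matches the one-URL-per-run rhythm described above.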
I built in a little extra intelligence too – before taking a screenshot, a PHP script first requests the URL and examines the content type of the response. If it is already an image (rather than a web page), it imports the image directly – no screenshot required. Likewise, if the URL points to Flickr, TwitPic or YFrog, it looks for an image on the page and imports that directly.
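The decision boils down to something like this sketch. It is in Python rather than the PHP used for the real check, and the function name, the strategy labels and the exact host matching are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Photo sites named in the post; the domain matching is an assumption.
IMAGE_HOSTS = ("flickr.com", "twitpic.com", "yfrog.com")


def choose_strategy(url, content_type):
    """Decide how to archive a link, given its Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type.startswith("image/"):
        return "import_image"
    host = (urlparse(url).hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in IMAGE_HOSTS):
        return "find_image_on_page"
    return "screenshot"
```

Only links that fall through to `"screenshot"` need to go anywhere near Firefox, which keeps the slow path for real web pages.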
With the dual-server import and screenshot system configured, I enabled the Cron jobs and sat back. Approximately 48 hours later, I had circa 4,400 tweets and 1,500 links in the archive.
HTML5 goodness (sort of)
Finally, I needed to build the web application itself. For this, I decided to try and make something really responsive, where you could browse a timeline of tweets quickly and easily.
Tweets are organised into individual days. Visiting a day will load all the tweets and replies by hitting an AJAX endpoint on my remote server. Once a day is loaded, all the associated screenshots appear as thumbnails next to the tweet.
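Server-side, the per-day lookup amounts to bucketing tweets by calendar date. Here is a minimal sketch in Python, assuming each tweet is a dict with a `created_at` value that has already been parsed into a `datetime`; the real endpoint and its storage are not shown in this post.

```python
from collections import defaultdict
from datetime import datetime


def group_by_day(tweets):
    """Map 'YYYY-MM-DD' to that day's tweets, oldest first within each day."""
    days = defaultdict(list)
    for tweet in tweets:
        day = tweet["created_at"].date().isoformat()
        days[day].append(tweet)
    for day_tweets in days.values():
        day_tweets.sort(key=lambda t: t["created_at"])
    return dict(days)
```

An AJAX request for a given day then only has to return one bucket, which keeps each response small.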
The timeline at the top is built as an unordered list, where each list item corresponds to an individual day. Inside each li is a div whose height is set dynamically according to how many tweets were posted on that day. The more tweets on a day, the taller its bar.
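The bar heights could be worked out along these lines. This is a Python sketch of the arithmetic only; the 40-pixel maximum and the 1-pixel minimum for non-empty days are invented values, and the post doesn't say whether the scaling is linear.

```python
def bar_heights(counts_by_day, max_height_px=40, min_height_px=1):
    """Scale each day's tweet count to a bar height in pixels.

    The busiest day gets `max_height_px`; any day with at least one
    tweet gets at least `min_height_px` so it stays visible.
    """
    if not counts_by_day:
        return {}
    busiest = max(counts_by_day.values())
    heights = {}
    for day, count in counts_by_day.items():
        if count == 0 or busiest == 0:
            heights[day] = 0
        else:
            scaled = round(max_height_px * count / busiest)
            heights[day] = max(min_height_px, scaled)
    return heights
```

Each value would then be written into the corresponding div's height style when the timeline is rendered.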
A mousemove handler on the timeline lets you drag it around to see more dates. I also added a bit of iPhone-esque momentum thanks to a suggestion by Mark Stickley.
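Momentum of this kind is usually done by letting the timeline keep moving at its release speed and shaving a little off that speed every frame. The sketch below shows the idea in Python; the friction factor and the stopping threshold are guesses, and the real thing lives in the page's JavaScript.

```python
def momentum_offsets(start_offset, velocity, friction=0.9, min_speed=0.5):
    """Return the timeline offset for each frame after the mouse is released.

    `velocity` is pixels per frame at release. `friction` must be
    between 0 and 1 so the speed decays and the loop terminates.
    """
    offsets = []
    offset = start_offset
    while abs(velocity) >= min_speed:
        offset += velocity
        offsets.append(offset)
        velocity *= friction  # lose a fixed fraction of speed each frame
    return offsets
```

A slow release produces no frames at all, so a careful drag stops exactly where you let go.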
I added a little animation to the page in the form of the date display. When you click on a new date, it starts spinning and gradually stops at the right date when the tweets have loaded.
The date during animation (left) and after tweets have loaded (right)
For this I use a series of unordered lists, one each for the weekday, day, month and year. Each list has its final item at both the start and the end – this is so that you don’t notice when the list wraps back to the start, giving the impression of a cylindrical background. The blur effect is simply a grey text-shadow applied behind the text. The date sits inside a div with overflow:hidden set, which hides the rest of the list. Make the container bigger and you can see the whole list.
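The wrap-around trick described here can be sketched as follows, in Python for illustration. The helper names and the pixel values are hypothetical; the point is that the padded list starts and ends on the same item, so jumping from the bottom copy to the top copy is invisible.

```python
def padded_items(items):
    """Put a copy of the final item in front, so it sits at both ends."""
    if not items:
        return []
    return [items[-1]] + list(items)


def wrap_offset(offset_px, item_height_px, item_count):
    """Fold a scroll offset back into one lap of the original list.

    One lap is `item_count` rows tall. Offset 0 shows the leading copy
    of the final item; one full lap later the real final item is in view.
    """
    lap_px = item_height_px * item_count
    return offset_px % lap_px
```

With three 20-pixel rows, an offset of 60 folds back to 0, and both positions show the same text, so the spin can run for as long as the tweets take to load.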
On and off, it’s taken me about three weeks of tinkering in my spare time to get this up and running. I hope, over time, it’ll become more and more interesting to browse my conversations and thoughts – it would be a shame to lose them.