Rambling about Twitter archives

Saturday, December 22, 2012

— by manton

I'm trying out LongPosts as a place to put some rough ideas that aren't quite right for my blog. Here are some random thoughts on working with Twitter archives this week.


One of the main goals of my web app Watermark is to archive and search tweets and ADN posts, so it was natural for me to implement support for Twitter's new archive export format. I finished it last night and linked it from the Watermark account page this evening for all customers.

Before I saw the actual files, I had heard that Twitter's export included a CSV version, so I started coding an importer for that format, assuming I could tweak it later. Once I saw a real tweets.zip, I had to throw out most of that initial work. The CSV files have two problems:

I switched to the JSON files instead, and that's working well. They're actually JavaScript rather than strict JSON, but the only non-JSON part is the first line, so you can just skip it and parse the rest.
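Here's a minimal Python sketch of the skip-the-first-line trick. It assumes each monthly file is a single line of JavaScript assignment (something like `Grailbird.data.tweets_2012_12 =`) followed by a plain JSON array; the function name `parse_archive_js` is hypothetical, not part of any Twitter tooling.

```python
import json


def parse_archive_js(text: str) -> list:
    """Parse one monthly tweet file from a Twitter archive.

    Assumption: line 1 is a JavaScript variable assignment and
    everything after it is a JSON array of tweet objects.
    """
    # Split off the first line and discard it; parse the remainder as JSON.
    _, _, rest = text.partition("\n")
    return json.loads(rest)


sample = 'Grailbird.data.tweets_2012_12 = \n[ {"id_str": "1", "text": "hello"} ]\n'
tweets = parse_archive_js(sample)
print(tweets[0]["text"])  # hello
```

Dropping one line is cruder than stripping everything up to the first `=`, but it matches how the files look and avoids guessing at the variable names.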

Since the ZIP archive can be fairly big, instead of having the user upload it through the web browser, I let them choose the file via Dropbox. This was a nice opportunity to try out the Dropbox Chooser. Then on the server I extract the files and load the data.
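The server-side step might look roughly like this in Python. It's a sketch under two assumptions: the monthly tweet files live under a `data/js/tweets/` folder inside the ZIP, and each one has the one-line JavaScript prefix described earlier. `TWEETS_PREFIX` and `load_tweets_from_zip` are hypothetical names.

```python
import io
import json
import zipfile

# Hypothetical location of the monthly tweet files inside tweets.zip;
# adjust if the real archive layout differs.
TWEETS_PREFIX = "data/js/tweets/"


def load_tweets_from_zip(zip_bytes: bytes) -> list[dict]:
    """Extract every monthly tweet file from an archive and return all tweets."""
    tweets: list[dict] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Sorted names keep the monthly files in order (e.g. 2012_11 before 2012_12).
        for name in sorted(archive.namelist()):
            if not (name.startswith(TWEETS_PREFIX) and name.endswith(".js")):
                continue  # skip the HTML viewer, CSVs, images, etc.
            text = archive.read(name).decode("utf-8")
            # Skip the JavaScript assignment on line 1; the rest is JSON.
            _, _, body = text.partition("\n")
            tweets.extend(json.loads(body))
    return tweets
```

On the server, `zip_bytes` would be whatever you download from the link the Dropbox Chooser hands back, for example with `urllib.request.urlopen(link).read()`. Reading the archive in memory keeps the sketch short; for very large archives, writing to a temporary file first would be gentler on memory.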

Dave Winer is doing something interesting with archives too. He's started linking up other people's archives on S3 — both the HTML view and the .zip file. I have a test Watermark account that I've loaded one of these into. It's interesting to import multiple archives and have them all merged together and searchable.
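Merging several archives mostly comes down to de-duplicating by tweet ID. Below is a rough Python sketch, not how Watermark actually does it. It assumes each tweet is a dict with an `id_str` and a `text` field, and it uses a naive substring match where a real app would use a proper search index. `merge_archives` and `search` are hypothetical names.

```python
from typing import Iterable


def merge_archives(archives: Iterable[list[dict]]) -> list[dict]:
    """Merge tweet lists from several archives, dropping duplicates by id."""
    merged: dict[str, dict] = {}
    for tweets in archives:
        for tweet in tweets:
            # Assumes each tweet carries a string "id_str"; later copies win.
            merged[tweet["id_str"]] = tweet
    # Tweet ids increase over time (roughly), so sorting by numeric id
    # approximates chronological order across all merged archives.
    return sorted(merged.values(), key=lambda t: int(t["id_str"]))


def search(tweets: list[dict], query: str) -> list[dict]:
    """Naive case-insensitive substring search over tweet text."""
    q = query.lower()
    return [t for t in tweets if q in t.get("text", "").lower()]
```

Keying on the ID means importing the same archive twice, or two archives that overlap, doesn't produce duplicate results.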

For so long we've waited for access to our old tweets. In the meantime I've shipped two products around fixing this limitation, so it's especially funny that Twitter finally rolls out archives after I've stopped posting there. (And of course I love that ADN has allowed access to your full post history from the very beginning.) Not entirely sure where all this is going to lead, but I agree with Dave Winer that new apps should be possible now.
