Saturday, November 29, 2008

Using SiteSucker to back up my Blogger blogs - and my extended memory

For several years I've used Teleport Pro to create local searchable and browsable copies of my Blogger blogs.

That way, if Google falls to The Dapocalypse I'll at least have my own copy of my extended cybernetic memory. More recently Google has added the ability to export one's blog in a Google-readable format, so I do that as well.

Recently Teleport Pro ran out of gas: I hit a 65K limit in its URL database. Teleport Pro has great support, and the author referred me to a $165 upgrade to their professional web spider. I've been very pleased with Teleport Pro, so if I weren't (with occasional regrets) primarily an OS X shop these days I'd pay for the upgrade.

Instead I decided to re-evaluate an OS X spider I'd tested years ago: SiteSucker for OS X. It's donationware (PayPal, sigh) and a quick download with no nasty system side-effects. When I last tried it, even my then much smaller blogs broke it, and I had to set it aside.

This time I used it to download the blog that broke Teleport Pro. It's not nearly as fast as Teleport Pro, and it wasn't able to handle Blogger's tag links (I need to contact the author), but overnight it completed the download of over 15,000 separate files related to about 4,000 posts, occupying 560MB of disk space (clearly the actual text is the least of the content). The download doesn't include any images; they're included by reference, since I constrained the spider to my Blogger path.

The first time I did the download I forgot to localize my links, so I couldn't navigate internally. The localization seems to work for some links, but not, as mentioned earlier, for the tag links.
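For anyone wondering what "constraining the spider to a path" and "localizing links" amount to, here is a rough Python sketch of the general idea. It is not SiteSucker's or Teleport Pro's actual logic, which I haven't seen. The blog address and the function names (`in_scope`, `local_path`, `localize`) are made up for illustration, and I'm assuming Blogger tag links look like `/search/label/NAME`.

```python
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical blog root; substitute your own Blogger address.
BLOG_ROOT = "http://example.blogspot.com/"

def in_scope(url, root=BLOG_ROOT):
    """True when url is on the same host as root and under root's path."""
    u, r = urlparse(url), urlparse(root)
    return u.netloc == r.netloc and u.path.startswith(r.path)

def local_path(url):
    """Map a URL to a relative file path inside the local mirror."""
    path = urlparse(url).path.lstrip("/")
    if path == "" or path.endswith("/"):
        path += "index.html"  # directory-style URLs need a file name
    return path

def localize(href, page_url, root=BLOG_ROOT):
    """Rewrite an href found on page_url so it points into the local mirror.

    Out-of-scope links (for example images hosted elsewhere) are returned
    unchanged, so they stay "included by reference".
    """
    target = urljoin(page_url, href)
    if not in_scope(target, root):
        return href
    start = posixpath.dirname(local_path(page_url)) or "."
    return posixpath.relpath(local_path(target), start)
```

With this sketch, a link from one post to another becomes a relative path such as `../11/post.html`, while a tag link maps to an extensionless local path like `../../search/label/backup`. That is only a guess at why tag links might be awkward for a spider to localize, not a diagnosis of SiteSucker's behavior.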

I suspect Teleport Pro is a more robust solution -- but it's XP only and it can no longer handle my blog. SiteSucker looks very promising. I'm going to try tuning it and corresponding with the author about the tag links. If it passes my further tests I'll add configuration notes to this post and I'll be making a donation (much as I dislike using PayPal!).

1 comment:

  1. Did you actually succeed in configuring SiteSucker to download your Blogger blog to your satisfaction? I am trying the same right now, but getting mixed results.