Monday, September 06, 2010

Archiving email

In the 90s Slashdot was hot. There were no blogs, no feeds, just Slashdot and their commenting system.

Even at its peak, however, you could see the problems. There were hordes of comments on stories, but most were worthless. Good comments often arrived late, and were never ranked so never seen. Realtime before its time, and flawed in the same way realtime is now.

Slashdot is still around, but I rarely find anything novel there. Today was different. Someone asked a question I've wondered about for years:
Ask Slashdot | Best Way To Archive Emails For Later Searching? (Anonymous)
... I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well). I have migrated my emails faithfully from Unix mail, to Eudora, to Outlook, to Thunderbird and Entourage, though I have left much of the older stuff in Outlook PST files. To make my life easier I would now like to merge all the emails back into a single searchable archive — just because I can. 
But there are a few problems: a) Moving them between email systems is SLOW; while the data is only a few GB, it is hundred of thousands of emails and all of the email systems I have tried take forever to process the data. b) Some email systems (i.e. Outlook) become very sluggish when their database goes over a certain size. c) I don't want to leave them in a proprietary database, as within a few years the format becomes unsupported by the current generation of the software. d) I would like to be able to search the full text, keep the attachments, view HTML emails correctly and follow email chains. e) Because I use multiple operating systems, I would prefer platform independence. f) Since I hope to maintain and add emails for the foreseeable future, I would like to use some form of open standard. So, what would you recommend?'... 

I think I might still have my NCMail (Norton Commander Mail) archives, from before the public internet, when MCIMail was a great service. That was, by the way, one of the best email clients ever written.

Here are some of the suggestions, with my comments:
  • Run an IMAP server and host them there
  • Notmuch (Linux)
  • Gmail
  • MailSteward for OS X: Uses SQLite or MySQL and processes mbox files from Eudora and Entourage. I'm going to see if this can process my PC Eudora files.
  • Maildir storage format: uses ordinary file system directories for mail folders and is indexable. It's used by the Dovecot IMAP server.
  • mairix - email index and search tool (unix)
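
Several of these suggestions come down to the same move: get the mail out of a client's own store and into an open, one-file-per-message format that any indexer can read. As a rough sketch of that idea (not anything proposed in the Slashdot thread), Python's standard mailbox module can copy an mbox export into a Maildir. The function names and the paths in the usage note are my own placeholders, and the subject search is only a stand-in for a real indexer such as notmuch or mairix.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count.

    Both paths are supplied by the caller (hypothetical examples below).
    """
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in src:
            # add() writes the message as its own file inside the Maildir
            dest.add(message)
            copied += 1
        dest.flush()
    finally:
        src.close()
        dest.close()
    return copied


def search_subjects(maildir_path, needle):
    """Return the subjects that contain `needle`, ignoring case.

    This is a linear scan, fine for a sanity check but far too slow for
    hundreds of thousands of messages; a real archive needs an index.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    needle = needle.lower()
    hits = []
    for message in box:
        subject = str(message.get("Subject", ""))
        if needle in subject.lower():
            hits.append(subject)
    box.close()
    return hits
```

Something like mbox_to_maildir("eudora-export.mbox", "archive") would do the conversion, assuming the old client can export mbox in the first place. Outlook PST files would need a separate conversion step before this applies.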
Sadly, most of the comments are as worthless as I remember, except they degenerate to mod disputes faster than ever.

Incidentally, Sarbanes-Oxley means CEOs can go to jail for corporate malfeasance. This is inspiring corporate rules around email retention and especially email deletion. So the email archive management industry is spinning up.

Update: MailSteward failed Gordon's Law of Software Acquisition #4:
Inspect the uninstaller. The best apps don't need one - just delete the app. After that look for something built into the app. Then look for something that downloads with the app. If there's no uninstaller stop immediately.
MailSteward has an Apple installer, but neither the FAQ nor the Manual seems to discuss uninstallation.

That ended my MailSteward evaluation.

1 comment:

Alec said...

FYI: Sonian offers a platform agnostic email archive that is easily searchable.