Tuesday, September 07, 2010

Operators in Windows Search and Spotlight - Common and Similar

This is a narrowcast post. It's of interest to someone who ...
  1. Is a serious geek.
  2. Has to routinely find things in very large document and email collections.
  3. Uses both Windows Search (built into Vista/7, add-on for XP) and Spotlight for OS X.
If you're still reading we need to go out for a beer the next time you're in MSP. There are only 2-3 like us on earth.

In an earlier post I discussed operators in Spotlight. When I first posted I complained about the difficulty of reconciling Windows Search operators and Spotlight operators. It's tough enough to learn one set, but learning two is kinda painful.

My first impression was wrong though. It turns out that several operators work in both Spotlight and Windows Search. Below is a list of common operators, followed by a list of differing operators and conventions. I'll update both lists over time. I'm only including the ones I use; there are many more.

Common operators (work in both Windows Search and Spotlight)
  • author:
  • kind:folder
  • kind:contacts
  • kind:email
  • kind:music
  • date:>7/4/1776
  • Boolean rules with parens (AND, OR, NOT)
Differing operators (W|S)
  • Windows uses () to contain phrases, Spotlight uses quotes
  • kind:docs | kind:document
  • not available | kind:application
  • modified:3/7/08..3/10/08 | modified:3/7/08-3/10/08 (hyphen might work in Win)
I think Windows Search accepts a number of variations, so I'm going to try more OS X Spotlight operators and syntax with Windows Search and document what works. Even now, however, it's impressive how much commonality there is.
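Here is a rough sketch of how the short "differing" list above could be papered over with a few lines of Python. It only handles the two rewrites listed in this post (kind:docs vs kind:document, and the ".." vs "-" date range); the function name and the mapping table are made up for illustration, and it deliberately leaves the parentheses-versus-quotes phrase difference alone because parens are also used for Boolean grouping.

```python
import re

# Windows Search kind -> Spotlight kind, taken from the list above.
# Kinds not listed here are assumed to be the same on both sides.
KIND_MAP = {"docs": "document"}


def windows_to_spotlight(query):
    """Rewrite a Windows Search query using Spotlight's spelling of the same operators."""

    def swap_kind(match):
        kind = match.group(1)
        return "kind:" + KIND_MAP.get(kind.lower(), kind)

    # kind:docs -> kind:document (other kinds pass through untouched)
    query = re.sub(r"\bkind:(\w+)", swap_kind, query)
    # modified:3/7/08..3/10/08 -> modified:3/7/08-3/10/08
    query = re.sub(r"\b(modified:\S+?)\.\.(\S+)", r"\1-\2", query)
    return query
```

For example, windows_to_spotlight("kind:docs modified:3/7/08..3/10/08") returns "kind:document modified:3/7/08-3/10/08", while a query that only uses the common operators, such as "kind:folder date:>7/4/1776", comes back unchanged.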
--
My Google Reader Shared items (feed)

Monday, September 06, 2010

Archiving email

In the 90s Slashdot was hot. There were no blogs, no feeds, just Slashdot and their commenting system.

Even at its peak, however, you could see the problems. There were hordes of comments on stories, but most were worthless. Good comments often arrived late, and were never ranked so never seen. Realtime before its time, and flawed in the same way realtime is now.

Slashdot is still around, but I rarely find anything novel there. Today was different. Someone asked a question I've wondered about for years ..
Ask Slashdot | Best Way To Archive Emails For Later Searching? (Anonymous)
... I have kept every email I have ever sent or received since 1990, with the exception of junk mail (though I kept a lot of that as well). I have migrated my emails faithfully from Unix mail, to Eudora, to Outlook, to Thunderbird and Entourage, though I have left much of the older stuff in Outlook PST files. To make my life easier I would now like to merge all the emails back into a single searchable archive — just because I can. 
But there are a few problems: a) Moving them between email systems is SLOW; while the data is only a few GB, it is hundred of thousands of emails and all of the email systems I have tried take forever to process the data. b) Some email systems (i.e. Outlook) become very sluggish when their database goes over a certain size. c) I don't want to leave them in a proprietary database, as within a few years the format becomes unsupported by the current generation of the software. d) I would like to be able to search the full text, keep the attachments, view HTML emails correctly and follow email chains. e) Because I use multiple operating systems, I would prefer platform independence. f) Since I hope to maintain and add emails for the foreseeable future, I would like to use some form of open standard. So, what would you recommend?'... 

I think I might still have my NCMail (Norton Commander Mail) archives, back before the public internet, when MCIMail was a great service. That was, by the way, one of the best email clients ever written.

Here are some of the suggestions, with my comments:
  • Run an IMAP server and host them there
  • Notmuch (Linux)
  • Gmail
  • MailSteward for OS X: Uses SQLite or MySQL and processes mbox files from Eudora and Entourage. Works with Mail.app. I'm going to see if this can process my PC Eudora files.
  • Maildir storage format uses system directories for mail folders and is indexable. It's used by the Dovecot IMAP server.
  • mairix - email index and search tool (unix)
Sadly, most of the comments are as worthless as I remember, except they degenerate to mod disputes faster than ever.
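
For anyone who wants to try the Maildir suggestion, here is a minimal sketch using Python's standard mailbox module to copy an mbox file into a Maildir folder. It assumes the old mail has already been exported as mbox (Eudora and Thunderbird store something close to it; Outlook PST files would need a separate conversion step that isn't covered here). The function name and the paths are placeholders.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return how many were copied."""
    source = mailbox.mbox(mbox_path, create=False)      # fail if the mbox is missing
    dest = mailbox.Maildir(maildir_path, create=True)   # creates cur/, new/, tmp/
    copied = 0
    try:
        for message in source:
            dest.add(message)  # each message becomes one file under new/
            copied += 1
        dest.flush()
    finally:
        source.close()
        dest.close()
    return copied
```

Something like mbox_to_maildir("Eudora/In.mbx", "archive/In") would be the call; both paths are hypothetical. One file per message is what makes the result easy for desktop search tools and IMAP servers to index.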

Incidentally, Sarbanes-Oxley means CEOs can go to jail for corporate malfeasance. This is inspiring corporate rules around email retention and especially email deletion. So the email archive management industry is spinning up.

Update: MailSteward failed Gordon's Law of Software Acquisition #4:
Inspect the uninstaller. The best apps don't need one - just delete the app. After that look for something built into the app. Then look for something that downloads with the app. If there's no uninstaller stop immediately.
MailSteward has an Apple installer, but neither the FAQ nor the Manual seems to discuss uninstallation.

That ended my MailSteward evaluation.

Google Apps aliases can stop working

I don't think this is related to recent password changes, but I just learned today that the email address for this blog wasn't working (jgordon@kateva.org).

It was configured as an alias on a Google Apps account at kateva.org. I removed the alias and then restored it, and now it's working again. So Google Apps aliases can stop working.

Sunday, September 05, 2010

Better Spotlight in 10.6: search current Finder folder and more

This is new in 10.6. I just read of it. For me it's one of the very best things about Snowie ...
TidBITS Problem Solving: Find Files More Easily in Mac OS X
... you can restrict Spotlight to search the current Finder folder by default, instead of This Mac. To do this, choose Finder -> Preferences, click the Advanced button, and choose Search the Current Folder from the pop-up menu. From then on, when you invoke the Finder's Find command by choosing File > Find (Command-F), searches will be limited to the current folder showing in the frontmost Finder window....
The search window in the Finder menu will also default to searching the currently selected folder.

Drives me crazy that the best features of 10.6 are bloody secrets.

Now if I could only search by file name instead of all contents...
... you can make sure the Search bar at the top of the Finder window is set to File Name without requiring an additional click. Hold down the Shift key, and choose File > Find by Name (Command-Shift-F). This command is available in both Mac OS X 10.5 Leopard and 10.6 Snow Leopard...
Auggghhhahaha! Now they tell me. I just did it. It works.

Wow. Search scoped by folder context, default to file name.... It doesn't get better than this. If only this were the default behavior ...

Yeah, search by file name can be mapped to Cmd-F -- but it requires a logout and login to work. You can also tweak the search results window layout and add a "Last Modified" column to the list view. Please read the original article and send TidBITS some love.

Oh, one last thing. Suppose you've done all of the above but you want to restrict your search to only folder names and modified after 1/1/2010. That looks like this:
kind:folder date:>1/1/2010
Yes, you can use the same sort of operators with Spotlight that you can use with Windows Search. Alas, they aren't identical, so if you do both you are more or less doomed. (I was wrong, they do overlap.)
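
The same folder-scoped search can probably be scripted through mdfind, the command-line front end to Spotlight. The sketch below only builds the argument list. It assumes that mdfind's -interpret flag treats the query the way the Spotlight search field does (I haven't verified that every operator behaves identically), and the helper name and folder path are made up.

```python
import subprocess


def mdfind_command(query, folder=None):
    """Build an mdfind argument list for a Spotlight-style query, optionally scoped to a folder."""
    args = ["mdfind", "-interpret"]   # -interpret: parse the query as if typed into Spotlight
    if folder is not None:
        args += ["-onlyin", folder]   # -onlyin: limit the search to this directory
    args.append(query)
    return args


def run_mdfind(query, folder=None):
    """Run the search (OS X only) and return the matching paths as a list."""
    result = subprocess.run(mdfind_command(query, folder),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

On a Mac, run_mdfind("kind:folder date:>1/1/2010", "/Users/me/Documents") would be the scripted version of the search above, with a hypothetical folder standing in for the current Finder folder.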

These operators are usually described as "undocumented", including in this excellent 7/10 CNET article. That article gives us examples like:
"Apple Computer" kind:pdf OR "Apple Computer" kind:text NOT (Google OR Yahoo OR "Microsoft Corporation")
In fact in 10.6 these features are documented in the little known OS X feature known only as "Help". (It's still not as good as Windows Help, but it no longer sucks.) Search Help on "Spotlight" and look for these Help articles:
  • Performing a Boolean or metadata search
  • Searching for specific types of items

Saturday, September 04, 2010

My Google (gmail) account is hacked - by ductus.com

9/20/10: I've updated this post to fix some errors. For example, I originally misread whois and thought tucows owned the hacked domain, they are the registrar. My longer term evaluation and responses are in a separate post.

My Gmail/Google account has a robust password. So this notice surprised me:

It showed up when I connected to Gmail. I was told my account had been accessed from an atypical location 1 day ago. The next thing I saw was that it was accessed from ductus.com (WA, IP 63.83.70.14), a domain that belonged to a software company in the 1990s. [1]

I followed the advice and changed my password. I looked into my Google store account but didn't see any new transactions or sent email.

After my password change things got a little odd. My new password wasn't recognized. I had to do a password reset (fortunately I'd followed Google's password reset advice). That worked, but it's like going to the reserve parachute. It's a very bad thing. Not to mention that I now need to change my stored Gmail/Google password in about 30 places.

Clearly something bad is going down.

The best answer is that this is a false alarm. That's bad enough.

The less best option is that either my Google password has leaked or Google has a global security issue. A dictionary attack wouldn't work on my prior password; I don't change my Google password very often (like most security professionals), but it's a robust non-word five letter four number sequence. (Now, of course, every string in my 58,000 + emails is potentially part of a dictionary attack. I will eventually need to change every password I and my family use.)

Assuming my Google password leaked, how did that happen?

I don't store my Google password with online services, but I can't rule out a leak from an old forgotten online account or a wifi intercept. I very rarely log in on public sites, but I do log in from work. My employer could certainly be logging my keystrokes, but it is very unlikely that my large corporate employer would take the risk of hacking my Google account via an abandoned domain (though HP did do something like that to its board members). On the other hand, we do get virus infections every few months, and I don't think we catch them all.

I do store my Google pw in several iPhone apps. Any of those could steal that password but they are all pretty high profile apps.

For now I'm redoing all my passwords everywhere. This will take weeks, but I'll start with the highest security sites. I discuss the implications and possible attacker profile in a later post.


footnotes

[1] Ductus was a company in 1998:  "Ductus, Inc. is a Mountain View, California-based company that develops and markets 2D graphics software and hardware http://www.ductus.com". So this domain was abandoned.

Update: If Google doesn't limit the number of login attempts, then my old password would be vulnerable simply because it was only 10 characters. That will fall to a brute force attack. Interestingly I can't locate any documentation on this. From my own testing I think the first time you access Google from a new location you have to enter a CAPTCHA as well as a password. If the password fails you keep getting a CAPTCHA.
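
To put rough numbers on the brute force worry, here is a back-of-the-envelope calculation. The alphabet size (26 lowercase letters plus 10 digits) and both guess rates are assumptions chosen for illustration, not anything Google has published; the point is only how much the answer depends on whether login attempts are throttled.

```python
def keyspace(alphabet_size, length):
    """Number of possible passwords of exactly this length."""
    return alphabet_size ** length


def years_to_exhaust(total, guesses_per_second):
    """Years needed to try every candidate at a constant guess rate."""
    seconds_per_year = 365 * 24 * 60 * 60
    return total / (guesses_per_second * seconds_per_year)


total = keyspace(36, 10)                      # assumed: lowercase letters + digits, 10 characters
throttled = years_to_exhaust(total, 10)       # assumed: 10 guesses/second against a live login form
unthrottled = years_to_exhaust(total, 1e9)    # assumed: a billion guesses/second with no limit
```

Under those assumptions the throttled case works out to millions of years, while the unthrottled case is roughly 0.12 years, or about six weeks. That gap is why the presence or absence of a login attempt limit (or a CAPTCHA) matters more than the exact password length.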

Update 9/14/10 - useful links

CrashPlan Fail - you still can't remove an account

Is it obvious how to delete your account and all data and services?
This rule is always important, but it's critically important for a Cloud backup service. Do you really want to forget or lose control of a complete backup of all your unencrypted data?

CrashPlan failed this test, and others, seven months ago. Back then, during a trial period, I had my backup data on their servers. Their FAQ then didn't describe how to delete it.

I was later told you could delete it by logging in to the CrashPlan account web site, then choosing CrashPlan Central -> Destinations -> Online, then selecting "Remove Backup Destination". There was no way to remove an account however.

Today I checked. My account still remains, but now that the trial period ended I don't see an option to remove my data from their servers. Perhaps the data is gone, or maybe if I paid up I'd see it reappear. I wouldn't be surprised by either option.

There's still no way to remove an account from the web site. I also noticed that "My Profile" includes "Receive promotional emails from CrashPlan.com". Hell freezes over the day I opt-in to promotional emails, so that was a sneak play.

CrashPlan is on course for a crash landing. When they go into receivership their creditors will own your data. Creditors who need to recover some of their losses.

Update: See comments from CrashPlan. They tell me the data is likely overwritten, and is not recoverable after the promotional period ends. They have no plans at this time for an account removal feature; removing an account requires email (seems a risky practice). Maybe that will change. Comments underscored the importance of client-side encryption.


VMware Virtual Machines - the backup problem

It's times like this that I really miss Byte (or BYTE?) magazine. They would have had great coverage of VMware VMs - how they work, and what the risks are. Now that's specialist knowledge. Knowledge that, when I use Google, is obscured by a haze of marketing material.

The best we non-specialists can do is share our limited experience in blog posts, like this one sharing my experience with VM backup. That's been a problem for me.

First - my experience. I've used VMware Fusion on my Macs for a few years. I need it less than once a month, typically to launch XPSP4 and run Access or (yech) Quicken. On the other hand, I configured and use VMware Workstation on a 64-bit Win7 machine at work. That VM is running a Windows 2003 Server environment with terminal server and I use it very frequently.

Both my Fusion and Workstation VMs are configured to store the VM data as many files rather than a single monolithic file. Both are about 80-100 GB in size and store as little of my data as possible; on the Mac the individual .vmdk files vary in size from about 200 to 500 MB. I don't have the Workstation VM at hand but I think its files are all a fixed size.

The host OS X machine is backed up using Time Capsule (sigh) and SuperDuper! (sigh). Neither gives me the warm fuzzies of Retrospect at its best. The Windows 7 machine is backed up using (Dantz -> EMC -> Roxio) Retrospect Professional.

I configured both VMs to use multiple files because of the VM backup problem I knew about.

The obvious backup problem for these machines is that if you configure a VM as one monolithic file, then every time you touch it the host system backup software has to back up the whole 100GB file again. That will overload Time Machine (Capsule) or Retrospect pretty quickly. (More sophisticated backup software can manage this differently, but I don't think TM or Retrospect can.)

That's why I went with separate files. Backups would only have to manage the files that changed. (Ahh, but how does the backup software know what's changed - esp. if the files are a fixed size?)
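
On the "how does it know what's changed" question: file size alone clearly can't work for fixed-size slices, so a tool has to rely on modification times or on the contents. The sketch below shows the content-hash approach. It is my own illustration, not a description of how Time Machine or Retrospect actually decide, and the slice file names in the usage note are invented.

```python
import hashlib
from pathlib import Path


def snapshot(folder, chunk_size=1 << 20):
    """Map each file name in folder to (size in bytes, SHA-256 hex digest)."""
    result = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in chunks so large disk slices don't have to fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        result[path.name] = (path.stat().st_size, digest.hexdigest())
    return result


def changed_files(before, after):
    """Names that are new, or whose size or digest differs from the earlier snapshot."""
    return sorted(name for name, info in after.items() if before.get(name) != info)
```

Take a snapshot of the VM folder, run the VM for a while, take another, and changed_files(before, after) lists the slices (say disk-s002.vmdk) that need backing up even though every slice is still the same size. Note that this only answers "what changed"; it does nothing about slices being copied at different moments while the VM is running.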

I think that approach does work when the VM is shut down. I think it works on my Mac. It doesn't work with Retrospect Professional on the Windows 7 machine where our VM is always running.

I learned that the hard way when we tried to do a restore. The restored VM seemed good at first, but it was soon clear that we'd somehow ended up with different time slices. We had to kill the VM. Fortunately, because I'm justifiably paranoid about backup, we also had a file system backup that was only a few weeks old. Since we don't keep data on the VM we lost very little.

This is a nasty problem. As best I can tell, at least on Windows, Retrospect Professional can't do a reliable backup of a running multi-file VMware VM. The limited VMware marketing material I could find suggests this isn't just a Retrospect problem. The solution is, of course, to buy their costly backup software. You can also do backup from within the client OS, but that adds a new level of cost and complexity to overall backup. Retrospect Professional, for example, won't install on Windows 2003 server. For that you need their much more costly server backup.

Now you know what I know. If you know any more, or can point me to anything that's not marketing material, I'd be grateful.

I do miss Byte.
