Monday, July 12, 2004

Full text search in 10.4 and elsewhere

Daring Fireball: Spotlight on Spotlight
... Both metadata collection and full-text indexing depend on cooperating per-file-format Importers, either written by Apple or by third parties. Like Google, no matter how much text an Importer provides, Spotlight only cares about the first 100K of raw text.

Importers are fired on every file the moment it is created, saved, changed, or moved, including when files are made available through a newly mounted drive. Performance is said to be excellent in every case except network-mounted home directories, which are bedeviling on several levels and on which they’re still working.

An interesting limitation: the full-text indexing in both Google and OS X Tiger ignores anything much past 100K. That's bigger than the raw text content of most documents, but it leaves books out of the picture. For my taste, it's the right choice. I hope I can choose which folders NOT to index.
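
To make the two ideas above concrete, here is a toy sketch of an indexer that keeps only the first 100K of extracted text and honors a per-folder opt-out. It is not Spotlight's importer API. The names `text_to_index`, `should_index`, and `TEXT_LIMIT` are hypothetical, and reading "100K" as 100 * 1024 characters is an assumption, since the quote doesn't say whether it means bytes or characters.

```python
import os
from typing import Iterable

# Assumption: "100K" is treated as 100 * 1024 characters; the source is ambiguous.
TEXT_LIMIT = 100 * 1024

def text_to_index(raw_text: str, limit: int = TEXT_LIMIT) -> str:
    """Keep only the prefix of the extracted text; the rest is never indexed."""
    return raw_text[:limit]

def should_index(path: str, excluded_dirs: Iterable[str]) -> bool:
    """Hypothetical opt-out: skip any file that lives under an excluded folder."""
    path = os.path.abspath(path)
    for d in excluded_dirs:
        d = os.path.abspath(d)
        # commonpath compares whole path components, so "/Books" does not
        # accidentally exclude "/BooksArchive".
        if os.path.commonpath([path, d]) == d:
            return False
    return True
```

With this cap, a 500,000-character book contributes only its first 102,400 characters, which is exactly the "leaves books out" trade-off.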

I imagine I'll stay with 10.3 on my iBook; I just don't have a big enough drive on that machine. I'll get Tiger with my next machine.

