15 Nov 2005

osx spotlight metadata not working reliably enough

a while back i talked about a little app i've been hacking on, tentatively called spotstamp, that exposes a file's metadata so that the user can tag on extra keywords or other information about the file.

there hasn't been much action on that front lately, because i've been busy with real work, and also because i've hit a major problem with the metadata on mac osx tiger.

the problem is that the metadata store is unreliable. metadata associated with a file is not preserved over a file copy. moreover, because the metadata is kept in the metadata store on the volume rather than in the resource fork of the file, you cannot back it up along with your files! i could lose all the annotations on my file system to a hard disk crash, even if i have perfect backups (eg. some sort of zip file that preserves resource forks).

hence i've been considering some solutions to this problem.

one, for file types that allow metadata to be stored, like jpeg (exif), mp3 (id3) and pdf, we put the data directly in the file rather than in spotlight. this means there are fewer possible fields, unless we abuse a free-form comment field with some structured data, but at least that data is persistent.
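to make the "abuse a free-form comment field" idea concrete, here is a rough python sketch of how structured tags could ride along inside a single comment string. the `spotstamp:` marker and both function names are made up for illustration, and actually writing the string into an exif, id3 or pdf comment field is left to whatever library handles that format.

```python
import json

# hypothetical marker, not an existing convention
MARKER = "spotstamp:"


def pack_comment(text, tags):
    """Append the tags as one JSON line after the human-readable comment."""
    payload = json.dumps(tags, sort_keys=True)
    return f"{text}\n{MARKER}{payload}"


def unpack_comment(comment):
    """Split a comment into (text, tags); tags is {} when no marker line exists."""
    text, sep, payload = comment.rpartition("\n" + MARKER)
    if not sep:
        return comment, {}
    try:
        return text, json.loads(payload)
    except ValueError:
        # the trailing line looked like ours but is not valid JSON; leave it alone
        return comment, {}


packed = pack_comment("holiday photo", {"keywords": ["beach", "2005"]})
text, tags = unpack_comment(packed)
```

the trade-off is the one mentioned above: the data travels with the file, but anything that displays the comment will also show the raw json line.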

the second solution would be to do some heavy lifting in the background and keep a mirrored data store, so we could restore a file's metadata if it does get lost. in that case, you might as well give up on poking the metadata server directly to add extra information; instead, store it in a separate database and write an importer for spotlight, so we can regenerate the data per file if spotlight ever wants to reimport the file into its own database. you could also create aliases to the files, tag those instead, and keep them safe.
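a minimal python sketch of what such a mirror could look like. everything here is an assumption for illustration: the sqlite table, the function names, and especially the choice to key annotations by a hash of the file contents (so that a plain copy of the file still finds its annotations). it is not how spotstamp or spotlight works.

```python
import hashlib
import json
import sqlite3


def file_digest(path):
    """Hash the file contents so that a copy of the file maps to the same key."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def open_mirror(db_path):
    """Open (or create) the side database that mirrors the annotations."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS mirror ("
        "digest TEXT PRIMARY KEY, meta TEXT NOT NULL)"
    )
    db.commit()
    return db


def remember(db, path, meta):
    """Store (or overwrite) the annotations for the file at path."""
    db.execute(
        "INSERT OR REPLACE INTO mirror (digest, meta) VALUES (?, ?)",
        (file_digest(path), json.dumps(meta)),
    )
    db.commit()


def recall(db, path):
    """Return the stored annotations for the file at path, or None if unknown."""
    row = db.execute(
        "SELECT meta FROM mirror WHERE digest = ?", (file_digest(path),)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

an importer could call something like `recall` whenever a file is reindexed. the obvious weakness of hashing contents is that editing the file changes the key and orphans its annotations, which is part of why this approach feels like a kludge.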

but the idea is to keep the metadata with the file, so the second solution would be a real big kludge. what i wish is that osx actually preserved metadata in the resource fork, or in some other way that wasn't so volatile. i suppose that is why the osx developers hid the function call for updating the metadata manually outside of the importers: they couldn't guarantee that the data would be preserved. they'd rather all the metadata be generated from the file contents itself.

until i solve that problem, i probably won't release what i have, as it would just cause more frustration than good.

other interesting observations

1. in preview, if you press cmd+i you can add keywords to a document. however, that re-saves the pdf in a format where you lose the table of contents and other pdf metadata, and the keywords don't seem to be indexed by spotlight at all. other pdf files that already have keywords embedded when downloaded do have their keywords indexed by spotlight.

2. a 104KB pdf can bloat to 4 times its size if you re-save it in preview after adding the keywords.

3. finder copies the kMDItemFinderComment attribute if you copy a file inside finder, but it doesn't do so for any other attributes.
