Closed Bug 718910 Opened 13 years ago Closed 12 years ago

Consider disabling Spotlight on profile dirs

Product/Component: Core :: General
Type: defect
Hardware: x86
OS: macOS
Priority: Not set
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla18
Reporter: Dolske
Assignee: spectre
Keywords: perf
Attachments: 1 file

I noticed my system acting kind of sluggish. Lots of disk I/O, according to iosnoop, all coming from some "mds" framework/daemon. A bit of googling shows this seems to be a common problem for OS X users (not specific to Firefox).

I was curious about exactly what was being indexed; some of the Google results made vague statements about Spotlight excluding system dirs and such. I'm not sure how to dump a list of everything indexed by Spotlight, but you can search with "mdfind"... A broad search with "mdfind e" or "mdfind firefox" indicates that it's indexing a lot of stuff that seems pointless.

E.g.

   /Users/dolske/Library/Application Support/Firefox/Profiles/...
   /Users/dolske/Library/Caches/Firefox/Profiles/

This certainly isn't great for general system perf, and isn't really good for privacy either. Presumably Private Browsing won't be writing things to disk, but if you're just clearing stuff after the fact you can't really clear things that Spotlight happened to index.

The stuff I found suggests using "sudo mdutil -i off /path/to/exclude/"; the need for root privileges is kind of a bummer, though. :( Maybe there's some FS metadata we can set to disable indexing? (And/or ask Apple to add this to their exclusions?)

I wonder if a similar issue exists on Windows?
> Maybe there's some FS metadata we can set to disable indexing?

Not that I know of.  But I'll look into this when I get the chance (sometime in the next few weeks).

By the way, I assume the Spotlight indexes (wherever they are) require root privileges to access, and if so they themselves can't be considered a privacy/security issue.  But it'd be interesting to know if Spotlight lets one user search files in another user's home directory.
I think that doing this could potentially hurt applications such as Thunderbird.  I don't use Spotlight myself, but I remember reading something about Thunderbird supporting Spotlight integration?  (It definitely integrates with the Windows indexing service).

Do you know of any application which disables indexing like that on some of its folders?  In general this seems like a bad idea to me.  The privacy argument doesn't sound compelling to me since you already have that problem with *anything else* that gets indexed on your hard drive.  It's just the same story with you having a backup solution of some sort.
Integration is much better than raw file indexing: it means we can give Spotlight specific answers to searches. I don't see how this would really hurt Thunderbird or Firefox: even if Spotlight found something interesting in our profile directory, how would it know to open the correct application and navigate to the useful result (bookmark/email message/whatever) if it's tucked away inside a SQLite file?
I think we can obviously do it on our cache dir. That's going to hurt spotlight more than anything else. Given that on new computers firefox cache is the largest set of files, it should be a decent perf boost for users.
(In reply to Taras Glek (:taras) from comment #4)
> I think we can obviously do it on our cache dir. That's going to hurt
> spotlight more than anything else. Given that on new computers firefox cache
> is the largest set of files, it should be a decent perf boost for users.

Sounds like a good idea.  Also, if we can do it per file, we should do it for our databases as well.  It's not like an indexing service can get anything useful out of them!
http://www.thexlab.com/faqs/stopspotlightindex.html says: """Furthermore, Spotlight will neither index nor search:

    Hidden files: Files whose names begin with a period (.).
    Invisible files: Files whose invisible file-system attribute has been enabled.
    Files within hidden or invisible folders. 
"""

If this is true, what permissions are required to mark the invisible filesystem attribute?
Brian, is there something similar we can do on Windows?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #6)
> http://www.thexlab.com/faqs/stopspotlightindex.html says: """Furthermore,
> Spotlight will neither index nor search:
> 
>     Hidden files: Files whose names begin with a period (.).
>     Invisible files: Files whose invisible file-system attribute has been
> enabled.
>     Files within hidden or invisible folders. 
> """
> 
> If this is true, what permissions are required to mark the invisible
> filesystem attribute?

For cache, we can just modify the cache filename. This will require a bump in the cache version, but we'll be doing that soon anyway. In theory that's cheaper for cache files since that will get rid of an extra syscall.
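A minimal sketch of the renaming idea, assuming the claim quoted above (that Spotlight skips names beginning with a period) holds. The helper name `dotHiddenLeafName` is hypothetical and is not taken from the cache code; it only shows the string change, not the cache version bump.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not from the Firefox cache code: returns a leaf name
// that starts with '.', the convention quoted above for files that Spotlight
// is said to neither index nor search.
std::string dotHiddenLeafName(const std::string& leafName) {
    assert(!leafName.empty());  // an empty leaf name is a caller bug
    if (leafName.front() == '.') {
        return leafName;  // already dot-hidden; leave it unchanged
    }
    return "." + leafName;
}
```

Because the name is chosen when the file is created, this approach needs no extra call afterwards, which is the saving mentioned above.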
To make a directory/folder "hidden" on OS X, do the following at the command line:

chflags hidden [/path/to/dir]

Someone should try doing this on (say) the cache dir, and see if it causes problems.

We'd also need to test whether this stops Spotlight from indexing/searching that directory ... but I'm not entirely sure how to do that.
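For reference, the programmatic equivalent of the command above would probably look something like the sketch below: read the current flags with stat(2), then call chflags(2) with UF_HIDDEN added. The function names `withHiddenFlag` and `hideDirectory` are made up for illustration, and whether this actually keeps Spotlight out of the directory is exactly what still needs testing.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__APPLE__)
#include <sys/stat.h>
#include <unistd.h>
#endif

// Value of UF_HIDDEN in the macOS <sys/stat.h>. Repeated here so the flag
// arithmetic also compiles on platforms that do not define it.
constexpr std::uint32_t kHiddenFlag = 0x00008000;

#if defined(__APPLE__) && defined(UF_HIDDEN)
static_assert(kHiddenFlag == UF_HIDDEN, "fallback constant must match UF_HIDDEN");
#endif

// Adds the hidden bit while preserving whatever flags are already set.
std::uint32_t withHiddenFlag(std::uint32_t flags) {
    return flags | kHiddenFlag;
}

// Hypothetical wrapper: marks `path` hidden. Returns true on success and
// false on failure or on platforms other than macOS.
bool hideDirectory(const char* path) {
    assert(path != nullptr);
#if defined(__APPLE__)
    struct stat st;
    if (stat(path, &st) != 0) {
        return false;  // path missing or not accessible
    }
    return chflags(path, withHiddenFlag(st.st_flags)) == 0;
#else
    (void)path;
    return false;
#endif
}
```

OR-ing into the existing flags mirrors what "chflags hidden" does on the command line; passing UF_HIDDEN alone would clear any other flags already set on the directory.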
> Brian, is there something similar we can do on Windows?

There is an indexing service (Service name: CiSvc) on Windows 2000 - Windows 2003 Server that is stopped by default.
But as of Windows Vista that was removed and it was replaced with Windows Search (WSearch service) which is started by default.

Regarding Windows Search (Vista and Win7):
- Started by default on both Windows 7 and Vista
- Windows 7 explicitly excludes AppData for each user automatically.
- Windows Vista does not exclude AppData for each user (so it is included)

Regarding Indexing Service:
- Disabled and stopped on Windows 2000 by default
- Not disabled but is stopped on Windows 2000 by default (it is a manually started service)
I verified the Indexing Service (the pre Win7 one) on my Windows XP, Windows 2000 and Windows Vista, and on all of them it is stopped.

It wouldn't hurt for us to set FILE_ATTRIBUTE_NOT_CONTENT_INDEXED on the folder with SetFileAttributes.
New files in the folder would automatically inherit this attribute.
Existing files/folders could be cleaned up as a one-time operation.

I think we can tie into the Windows Search list to exclude our application data.
I'm not sure about the Indexing Service, but it would be worth looking into for the people who use it.

I'll post some new bugs for this.
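A rough sketch of the SetFileAttributes suggestion above, assuming the folder path is already known. The names `withNotContentIndexed` and `excludeFromContentIndex` are hypothetical; the one-time cleanup of existing files is not shown.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#endif

// Documented value of FILE_ATTRIBUTE_NOT_CONTENT_INDEXED, repeated here so
// the bit arithmetic also compiles on other platforms.
constexpr std::uint32_t kNotContentIndexed = 0x00002000;

// Adds the "not content indexed" bit while keeping the existing attributes.
std::uint32_t withNotContentIndexed(std::uint32_t attrs) {
    return attrs | kNotContentIndexed;
}

// Hypothetical wrapper: sets the attribute on `folder`. Returns true on
// success and false on failure or on platforms other than Windows.
bool excludeFromContentIndex(const wchar_t* folder) {
    assert(folder != nullptr);
#if defined(_WIN32)
    const DWORD attrs = GetFileAttributesW(folder);
    if (attrs == INVALID_FILE_ATTRIBUTES) {
        return false;  // folder missing or not accessible
    }
    return SetFileAttributesW(folder, withNotContentIndexed(attrs)) != 0;
#else
    (void)folder;
    return false;
#endif
}
```

Reading the attributes first matters because SetFileAttributes replaces the whole attribute set rather than adding a single bit.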
(In reply to Ehsan Akhgari [:ehsan] from comment #2)
> I think that doing this could potentially hurt applications such as
> Thunderbird.  I don't use Spotlight myself, but I remember reading something
> about Thunderbird supporting Spotlight integration?  (It definitely
> integrates with the Windows indexing service).

We ship an mdimport file that knows how to read the mbox file so that Spotlight can find things in it.
 
> Do you know of any application which disables indexing like that on some of
> its folders?  In general this seems like a bad idea to me.  The privacy

No - but when you see how badly the machine reacts when Spotlight has issues indexing the folder while Firefox is running - you start thinking that it's a good idea.

(In reply to Taras Glek (:taras) from comment #4)
> I think we can obviously do it on our cache dir. That's going to hurt
> spotlight more than anything else. Given that on new computers firefox cache
> is the largest set of files, it should be a decent perf boost for users.

It is.

(In reply to Benjamin Smedberg  [:bsmedberg] [away 27-July until 7-Aug] from comment #3)
> Integration is much better than raw file indexing: it means we can give
> Spotlight specific answers to searches. I don't see how this would really
> hurt Thunderbird or Firefox: even if Spotlight found something interesting in
> our profile directory, how would it know to open the correct application and
> navigate to the useful result (bookmark/email message/whatever) if it's
> tucked away inside a SQLite file?

Well, Spotlight tries to index it but fails.


(In reply to Steven Michaud from comment #9)

> We'd also need to test whether this stops Spotlight from indexing/searching
> that directory ... but I'm not entirely sure how to do that.

Did you try ?
Keywords: perf
>> We'd also need to test whether this stops Spotlight from
>> indexing/searching that directory ... but I'm not entirely sure how
>> to do that.
>
> Did you try ?

No.  And I'm not sure when I'll have the time to do so.

I never use Spotlight, so I don't know much about it.  For example I
don't know if it's possible to reset its index.  So I'm not sure my
advice is worth much.

But here's what I'd suggest if someone else wants to try this:

1) "chflags hidden [cachedirectory]"

2) Create a text file in the cache directory with a unique string in it
   (possibly using uuidgen).

3) If possible, reset Spotlight's index (make it restart indexing from
   scratch).

4) Wait a few hours for the reindexing to finish.

5) Search in Spotlight for the unique string.
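Step 2 above could be scripted along these lines. This is only a sketch: the token is random hex rather than real uuidgen output, and the file name "spotlight-marker.txt" and both function names are made up for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iomanip>
#include <random>
#include <sstream>
#include <string>

// Builds a random 32-hex-digit token, standing in for uuidgen output.
std::string makeUniqueToken() {
    std::random_device rd;
    std::ostringstream out;
    for (int i = 0; i < 4; ++i) {
        // Keep only the low 32 bits so each chunk is exactly 8 hex digits.
        out << std::hex << std::setw(8) << std::setfill('0')
            << (rd() & 0xffffffffu);
    }
    return out.str();
}

// Writes the token into a marker file inside `dir` and returns its path.
std::filesystem::path writeMarkerFile(const std::filesystem::path& dir,
                                      const std::string& token) {
    assert(std::filesystem::is_directory(dir));
    const std::filesystem::path marker = dir / "spotlight-marker.txt";
    std::ofstream file(marker);
    file << token << '\n';
    return marker;  // the stream is flushed and closed when `file` goes out of scope
}
```

For step 5, searching for the token with "mdfind" from a terminal should work as well as the Spotlight UI. Repeating the run on a directory that is not hidden gives a baseline to compare against.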
osxdaily.com/2011/12/30/exclude-drives-or-folders-from-spotlight-index-mac-os-x/

I have no idea if there is an official API or CLI command for that.
On the CLI, there is |mdutil(1)|, but it only seems to work per volume.
(In reply to Steven Michaud from comment #13)
> >> We'd also need to test whether this stops Spotlight from
> >> indexing/searching that directory ... but I'm not entirely sure how
> >> to do that.
> >
> > Did you try ?
> 
> No.  And I'm not sure when I'll have the time to do so.
> 
> I never use Spotlight, so I don't know much about it.  For example I
> don't know if it's possible to reset its index.  So I'm not sure my
> advice is worth much.
> 
> But here's what I'd suggest if someone else wants to try this:
> 
> 1) "chflags hidden [cachedirectory]"

done
 
> 2) Create a text file in the cache directory with a unique string in it
>    (possibly using uuidgen).

done
 
> 3) If possible, reset Spotlight's index (make it restart indexing from
>    scratch).

http://geekology.co.za/article/2009/02/how-to-reset-mac-osx-spotlight-data-cache-and-reindex-your-hard-drive

> 4) Wait a few hours for the reindexing to finish.

done

> 5) Search in Spotlight for the unique string.

Done, and the string is not found. Firefox works properly, since it was running the whole time the indexing was running.
Seems like this should be as simple as a chflags(2) when the cache directory is initialized.
This is just a proof-of-concept. I haven't tested this much more than making sure it compiles.
Attachment #653905 - Flags: feedback?(smichaud)
Comment on attachment 653905 [details] [diff] [review]
Proof of concept: chflags(UF_HIDDEN) the cache parent directory

This looks fine to me.

I likewise haven't tested it.  But someone should ... someone like you :-)

Josh Aas should probably be the one to review the patch.

It's hard to imagine that there'd be any downside to this patch:  The "hidden" flag only affects whether or not a directory or file is visible from the UI.  But someone should try running with this patch for a few days, just in case.

Also, someone who knows more about Spotlight than I do should run my test from comment #13 without step 1 (i.e. on a cache directory that's not hidden).  I tried this, but still couldn't find the "unique string" in Spotlight.
Attachment #653905 - Flags: feedback?(smichaud) → feedback+
I need to confirm that the flag works on 10.4 (I bet it does secretly, but it's not defined) since that is my primary operating system. In the meantime, if someone wants to give this a go, it should "just work" the same as the above. If someone is a netwerk/ peer, is that the right place to hook into the cache service?
Comment on attachment 653905 [details] [diff] [review]
Proof of concept: chflags(UF_HIDDEN) the cache parent directory

Josh can suggest a proper place to add this code. Probably makes sense to only do this on Cache directory creation.
Attachment #653905 - Flags: feedback?(joshmoz)
Yes, but my rationale for putting it there was in case the directory already existed, so it would get tagged there too. But I don't know the vagaries of cache.
Comment on attachment 653905 [details] [diff] [review]
Proof of concept: chflags(UF_HIDDEN) the cache parent directory

This looks good to me, though I'm not as clear on where exactly this code should execute. We need a necko/cache person to verify that.
Attachment #653905 - Flags: review?(michal.novotny)
Attachment #653905 - Flags: feedback?(joshmoz)
Attachment #653905 - Flags: feedback+
Comment on attachment 653905 [details] [diff] [review]
Proof of concept: chflags(UF_HIDDEN) the cache parent directory

Review of attachment 653905 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache/nsCacheService.cpp
@@ +705,5 @@
>      if (mDiskCacheParentDirectory) {
> +#ifdef XP_MACOSX
> +        // ensure that this directory is not indexed by Spotlight
> +        // (bug 718910). it may already exist, so we "just do it."
> +        nsCAutoString cachePD;

I think we might be trying to convert this to "nsAutoCString".
Attachment #653905 - Flags: review?(michal.novotny) → review+
So can we set checkin-needed?
Keywords: checkin-needed
(In reply to Josh Aas (Mozilla Corporation) from comment #24)
> Comment on attachment 653905 [details] [diff] [review]
> Proof of concept: chflags(UF_HIDDEN) the cache parent directory
> 
> Review of attachment 653905 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/cache/nsCacheService.cpp
> @@ +705,5 @@
> >      if (mDiskCacheParentDirectory) {
> > +#ifdef XP_MACOSX
> > +        // ensure that this directory is not indexed by Spotlight
> > +        // (bug 718910). it may already exist, so we "just do it."
> > +        nsCAutoString cachePD;
> 
> I think we might be trying to convert this to "nsAutoCString".

Please address this comment before landing this patch!
Keywords: checkin-needed
Taras, if you removed checkin-needed because of comment 26, I can do that and land this.  I only meant that comment to be a reminder!
(In reply to Ehsan Akhgari [:ehsan] from comment #27)
> Taras, if you removed checkin-needed because of comment 26, I can do that
> and land this.  I only meant that comment to be a reminder!

Please do.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a742bf08f125
Assignee: nobody → spectre
Target Milestone: --- → mozilla18
https://hg.mozilla.org/mozilla-central/rev/a742bf08f125
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 801883
Looks like the strategy adopted here (hiding the profile directory) has problems, after all.  See bug 801883.

I recommend we back this out until we've figured out how to deal with at least the "about:support" problem mentioned in bug 801883 comment #0.

If hiding the profile directory creates insurmountable problems, we'll need to choose a different strategy.
(Following up comment #31)

I jumped the gun a bit -- I'm not able to reproduce the "about:support" problem reported at bug 801883.  So let's wait to back out this bug's patch until bug 801883 is clarified.