Closed Bug 918612 Opened 10 years ago Closed 10 years ago

Mac: Firefox 24 doesn't start, or very slow startup, when home directory is on AFP network share

Categories

(Toolkit :: Storage, defect)

24 Branch
x86
macOS
defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla27
Tracking Status
firefox24 --- wontfix
firefox25 - wontfix
firefox26 - verified
firefox27 - fixed
firefox-esr24 27+ fixed
b2g-v1.2 --- fixed

People

(Reporter: msachs, Assigned: mak)

References

Details

(Keywords: perf, regression)

Attachments

(5 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:23.0) Gecko/20100101 Firefox/23.0 (Beta/Release)
Build ID: 20130814063812

Steps to reproduce:

Update from Firefox 23.0.1 to Firefox 24 (on OS X 10.6.8; home directory on an AFP server)


Actual results:

After the update, Firefox 24 hung and would not start up.  When I downgraded back to 23.0.1, I found that my preferences were corrupted.  I was able to restore my preferences from backup.  I tried running Firefox 24 from a plain vanilla user account (also located on the server), and the same thing happened (including the corruption of the preferences).


Expected results:

Firefox 24 should have started up normally after the update.
Component: Untriaged → Profile: Roaming
Product: Firefox → Core
I'm wondering if this could have something to do with where your profile is stored. Maybe the update was creating a profile locally while you also have a profile on the server.
Core: Roaming is *not* about Windows roaming
Component: Profile: Roaming → Profile: BackEnd
Thanks Ben, I had the feeling I was getting it wrong there...
To answer Liz's question.  Yes, it clearly does have something to do with where my profile is stored (Firefox 24 appears to work fine for users who have their account on their local startup drive).  No, the update doesn't appear to have created anything on my local drive.  It did however corrupt my actual profile on the server; rendering it unusable when I downgraded (fortunately I had a recent backup).
Component: Profile: BackEnd → General
Summary: Firefox 24 doesn't start when home directory is on a server → Mac: Firefox 24 doesn't start when home directory is on AFP network share
> home directory on an AFP server

How have you set this up?  Please be as detailed as possible.  Especially provide as much information as you can about your server (what version of which OS does it run? what services does the server need to run?).
We have an Xserve2,1 (Quad-Core Intel Xeon) running Mac OS X Server 10.5.8.  Our User directories are on this server and accessed via AFP.  The computer that I mainly use is an iMac11,2 (Intel Core i3); running Mac OS X 10.6.8.  

Neither Safari nor Chrome appears to have any difficulty with this setup.  However, aside from the major issue I found with Firefox 24, I've also observed memory issues with earlier versions of Firefox (see https://bugzilla.mozilla.org/show_bug.cgi?id=648042, which apparently is a duplicate of issue 604710).  I've found that these memory issues have become progressively worse in recent versions of Firefox.
> We have an Xserve2,1 (Quad-Core Intel Xeon) running Mac OS X Server
> 10.5.8.  Our User directories are on this server and accessed via
> AFP.

This isn't enough information.

How, precisely, is access to your User directory set up on your
(client) computer?  I need detailed, technical information -- which
you may need to get from the people who manage your
company's/university's computers.

Does your (client) computer simply do a net boot?  Or is it just the
User directory that's mounted from the server?  If the latter, how
(very precisely) is it mounted?

I also need to know how the server is set up.  For example, what
services is it running?  And is it running any special software that
doesn't come with OS X Server?

I have OS X 10.7 Server running on my home network, and can (probably)
set it up to support net booting without too much trouble.  I'm
unlikely to be able to replicate anything much more complicated than
that.
Liz, Ben and Robert:

Does this kind of setup (Mac home directories on an AFP network share) exist in any of the Mozilla offices, that you know of?
Flags: needinfo?(robert.bugzilla)
Flags: needinfo?(lhenry)
Flags: needinfo?(benjamin)
I haven't heard of it.
Me neither.
Flags: needinfo?(robert.bugzilla)
Flags: needinfo?(benjamin)
This is from our IT person:

====
Server is a 10.5.8 machine with OpenDirectory-hosted user accounts and
account homes (/Users) exported via AFP. Clients authenticate against OD
(set up through Open Directory Utility or the equivalent on various
different versions of OS X) and mount their home directories from the
Xserve via AFP. No net booting or anything of the sort.

It is a box-stock AFP/OD configuration as (badly) documented by Apple for
centralized authentication and storage. Apache, AFP, OD, some mail
services are running. The specific Apple-ized service names are AFP,
Firewall, Mail, MySQL, NFS, Open Directory and Web as listed in Server
Admin. No special software installed. We have not deviated from the Apple
directions for running this type of setup.
=======

Basically there is no net boot.  I boot normally from my client machine and then login to my user account (located on the Server).

I'm sending Steven a pdf copy of the pertinent page from our Administration Guide, which has additional details.
Thanks Marty.  I'll look at the PDF file once I get it.

The bottom line is that I need to find out how your AFP-mounted Home directories are implemented, so I can replicate that.  I don't have Active Directory, or the wherewithal to set it up (OpenDirectory is usually just a front end for Active Directory).  But with luck all I'll need is an account on the server (which I already have, since I maintain it), plus perhaps some hand-waving in the Open Directory Utility.
It was extremely painful, but I finally managed to replicate AFP-mounted home directories, and (I think) to reproduce this bug.  But FF 23.0.1 doesn't work properly, either -- it appears to be unable to access its profile.  So I may need to tweak my setup to match yours.

At least on my setup, FF 24 doesn't actually hang.  It just runs *extremely* slowly -- taking several minutes to display the first browser window.  Then it, too, seems unable to access its profile.  The error is the same in both FF 23 and 24:

"The bookmarks and history system will not be functional because one of Firefox's files is in use by another application. Some security software can cause this problem."

These problems happen regardless of whether you run Firefox from your server-mounted home directory (e.g. from your Desktop) or from the local machine (e.g. from the /Applications directory).

I created a new account in Workgroup Manager, and have been testing with that.  I did *not* also create a corresponding local account.  As best I can tell this is a "standard" account, or its equivalent -- it doesn't have administrator privileges.

Did you (or the people who run your server) need to play any special tricks to get even FF 23 to work properly on your setup?  Please check and let me know.

(By the way, I know my test account can write to its remotely-mounted home directory:  My changes to Terminal's settings are preserved between logins, for example.  So are its Desktop copies of FF 23 and FF 24.)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(lhenry)
I should mention that the client machine from which I tested runs OS X 10.7.5.
Flags: needinfo?(msachs)
If you started with FF24, it may have corrupted the profile (as I've experienced) making them inaccessible if you then downgrade to FF23.0.1.  You may need to make a fresh profile for FF23.0.1 to work.

As I mentioned earlier, there have been issues that I've experienced from FF 4.0 on, getting worse with the more recent versions (with FF23 I would get into a spinning beach-ball mode a few times a day and need to quit and restart FF).  However, prior to FF24, it more or less worked.
I actually started with FF 23.0.1, and have seen the "nonfunctional bookmarks and history" every time I run it.  Are you telling me you've never seen this error?  Just to be sure, please test with FF 23.0.1 and a clean profile.
> Just to be sure, please test with FF 23.0.1 and a clean profile.

On second thought, don't bother.

I deleted all of Firefox's profiles (in Application Support and Caches) and then started and quit FF 23 a bunch of times -- I never saw the "nonfunctional bookmarks and history" error.  I also made some settings changes which "took".

So FF 24 does seem to be messing with the profile.

Though it's odd that FF 23 never offers to import settings (which always happens the first time you run FF in "normal" circumstances).
Flags: needinfo?(msachs)
I started this before your last comment:

I did see this "nonfunctional bookmarks and history" error after updating to FF24 and then downgrading back to FF23.0.1, but never before that.  I must say that my profile (which I have now restored from a backup) was originally created years ago from a very early version of FF (it may have been derived from my old Netscape preferences).

That being said, on the plain vanilla user AFP account we have, I just deleted the existing profile and opened Firefox 23.0.1. It created a new profile and seems to be working fine.

This is from a MacBook-Air running OS X 10.7.5.

It did ask to import my Safari settings.
> It did ask to import my Safari settings.

Turns out I'd never run Safari in my AFP-mounted home directory.

After I'd done that and once again deleted all of Firefox's profile directories, FF 23.0.1 *did* ask me to import Safari's settings.  So I'm now reasonably confident my setup is functionally equivalent to yours.
Wonderful.  I hope you're able to find a fix.  Thanks!!
I've found a (I hope *the*) regression range for this bug in mozilla-central nightlies:

firefox-2013-06-06-03-11-28-mozilla-central
firefox-2013-06-07-03-10-55-mozilla-central

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=204de5b7e0a6&tochange=dc8e78ed8c44

But no patch in that range stands out as a likely cause/trigger for this bug.

Marty, please test with these two m-c nightlies, and tell us if you get the same result.  In other words, let us know if your bug happens with the 2013-06-07 nightly, but not with the 2013-06-06 nightly.
Confirmed:  the 2013-06-06 nightly opened fine.  The 2013-06-07 nightly failed in the same manner as FF 24.  When the 2013-06-06 nightly was tried after this, it gave the "nonfunctional bookmarks and history" error.

I just noticed a new bug report 919943 that may be a duplicate of this one.
The only vague possibility I see in there is
73ef965a7b66	Joey Armstrong — bug 870407: move CMMSRCS to moz.build (logic). r=ted
But that seems pretty far-fetched.
My tests (with non-opt non-debug builds) show that the following patch triggered this bug:

http://hg.mozilla.org/mozilla-central/rev/ad781d8dcdc6

 Bug 878411 - define xFetch and xUnfetch methods in TelemetryVFS file objects; r=mak
author	Nathan Froyd <froydnj@mozilla.com>
	Sat Jun 01 12:26:17 2013 -0400 (at Sat Jun 01 12:26:17 2013 -0400)

Which is frankly a bit surprising.  I'll do some opt builds and see if I come up with the same result.
Yeah that actually sounds pretty likely, now that I'm reading the patch. Nathan any ideas?
Flags: needinfo?(nfroyd)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #25)
> Yeah that actually sounds pretty likely, now that I'm reading the patch.
> Nathan any ideas?

This same issue came up in bug 902486, but that was cross-OS remote profiles.

My guess is that xFetch and xUnfetch don't work properly across remote shares.  I think sqlite should be fixed, but we'd need a short-term solution.  Maybe there's an SQLite VFS for Unix that DTRT with remote drives...does setting the boolean pref storage.nfs_filesystem to true help out at all?
Flags: needinfo?(nfroyd) → needinfo?(smichaud)
> My guess is that xFetch and xUnfetch don't work properly across remote shares.

That'd be my guess, too.  But how, exactly, should I go about testing it?

Tomorrow I'll write up a detailed description of how I set up AFP-mounted home directories -- for those who want to try it themselves, and for my own future reference.

> does setting the boolean pref storage.nfs_filesystem to true help out at all?

I'll try this, tomorrow.
Flags: needinfo?(smichaud)
(Following up comment #24)

> I'll do some opt builds and see if I come up with the same result.

I get the same result with opt builds.
Depends on: 878411
> does setting the boolean pref storage.nfs_filesystem to true help out at all?

Yes!  It cleared the problem right up.

You probably want to try this, Marty.  Note that the setting doesn't already exist -- you need to create it in about:config.
Steven, you'll have to give me instructions on this.  While I've changed settings in about:config, I've never previously created a new preference and it's not 100% obvious to me as to how to do this.
The following should work:

1) On the about:config page, type "storage" in the search box.  This should limit the number of items displayed, so that it doesn't fill the entire page.
2) Right-click in the blank part of the page and choose New : Boolean.
3) Enter "storage.nfs_filesystem" as the preference name.
4) Choose "true" from the list of possible values.

The new setting should now have been added to the list of settings displayed.
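
For anyone who would rather not click through about:config, the same preference can also be set from a user.js file in the profile directory, which Firefox reads at startup. What follows is only a minimal sketch in Python: the helper name add_user_pref is made up for illustration, and a temporary directory stands in for the real profile folder, whose path has to be substituted.

```python
import json
import tempfile
from pathlib import Path


def add_user_pref(profile_dir, name, value):
    """Append a user_pref() line to <profile_dir>/user.js and return that line.

    Hypothetical helper: Firefox itself only cares that user.js contains
    lines of the form user_pref("name", value);
    """
    # json.dumps renders True as `true` and quotes strings, which matches
    # the JavaScript literal syntax that user.js expects.
    line = "user_pref({}, {});".format(json.dumps(name), json.dumps(value))
    user_js = Path(profile_dir) / "user.js"
    with user_js.open("a", encoding="utf-8") as f:
        f.write(line + "\n")
    return line


if __name__ == "__main__":
    # A temporary directory stands in for the real profile folder here.
    with tempfile.TemporaryDirectory() as profile:
        print(add_user_pref(profile, "storage.nfs_filesystem", True))
```

Removing that line from user.js again (and resetting the pref in about:config) undoes the change.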
Ok, it seemed to work with my plain vanilla account.  FF 24 opened immediately.  

However, when I tried it with my personal account, I updated the preferences and then did the Firefox update.  It seemed to go better than previously and I got as far as the add-on compatibility check.  It then seemed to go into an infinite loop.  After several minutes, I pressed cancel.  It opened a window and there were no error messages.  However, my bookmarks were missing and it was behaving weirdly.  I'm back to 23.0.1 with my backed-up profile.
> I updated the preferences and then did the Firefox update

I assume you updated Firefox from inside Firefox.

I'll try this with FF 23.0.1 (with vanilla settings), and see what happens.  But, whatever I find, I suspect these problems are another bug.
Yes, in both cases I updated Firefox from within Firefox (i.e., selecting 'About Firefox' and letting the update happen).
There *might* be code that uses sqlite before the profile loads *after* an update though I don't know of any. It is likely that if there is a problem after an update it is after the update has completed and during startup.
I played a little more with FF 24 on the plain vanilla AFP user.  Yes, it opens and I can connect to various web sites.  However, if I attempt to bookmark a page; nothing happens. If I look at history, there is nothing listed.  So, the 'boolean pref storage.nfs_filesystem to true' setting only partially fixes this problem.
> Tomorrow I'll write up a detailed description of how I set up AFP-mounted
> home directories

Here they are.
Marty, the weirdness you saw (no history, can't create bookmarks) seems to be a side effect of setting storage.nfs_filesystem to true.  I see it when I

1) Delete all Firefox/Mozilla/org.mozilla.xxx directories in ~/Library/Application Support and ~/Library/Caches
2) Run FF 23.0.1 and set storage.nfs_filesystem to true in about:config
3) Quit FF 23.0.1 and restart

So it's not part of this bug.  But it *is* a side effect of what's so far the only known cure.
Steve,  Yes, I see that it does affect FF 23.0.1 too.  So, it was three steps forward and two steps back.

So, for now, I'll be staying with FF 23.0.1 with the about:config settings back to the way they were prior to adding the 'set storage.nfs_filesystem to true' setting.
I think there are various issues here.

First, bug 719952 shows that we have a long history of problems on remote shares. Basically, most remote FS are bogus and Sqlite does its best, but it won't be able to work flawlessly because the FS lies to it.

Second, I think something regressed the storage.nfs_filesystem functionality. What this pref does is enable the unix-excl VFS in Sqlite, which means only one process at a time can access a given database. Could some other process be accessing the database before the main process? I was wondering about background thumbnailing, but it's just a random guess.

Third, I think it's the second time I see confirmation of issues with xFetch xUnfetch. we don't use the mmap feature of Sqlite but regardless it seems to do some of those calls on startup.
Sqlite uses xFetch and xUnfetch only if the vfs sqlite3_io_methods version is 3 or greater, so a simple workaround may be to set our version to 2 (and change the assert) so we tell Sqlite we don't support them.
It may also be interesting to introduce storage.afp_filesystem that will use the "unix-afp" VFS, that is a specific locking system for AFP.
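
To experiment with a specific VFS outside Firefox, a database can be opened through a named VFS using a file: URI with a vfs= parameter. The sketch below uses Python's bundled sqlite3 module and assumes a Unix build of SQLite, where "unix-excl" is one of the built-in VFS names; the helper name open_with_vfs and the file name are made up for illustration. Whether "unix-afp" is available depends on how the SQLite library was compiled, so a failure there says nothing about Firefox's own copy.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_with_vfs(db_path, vfs_name):
    """Open db_path through the named SQLite VFS using a file: URI.

    Raises an sqlite3 error if the linked SQLite library does not
    provide a VFS with that name.
    """
    # as_uri() needs an absolute path; resolve() also works for files
    # that do not exist yet.
    uri = Path(db_path).resolve().as_uri() + "?vfs=" + vfs_name
    return sqlite3.connect(uri, uri=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "places-test.sqlite")
        conn = open_with_vfs(db, "unix-excl")
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        print(conn.execute("SELECT count(*) FROM t").fetchone()[0])
        conn.close()
```

Pointing db_path at a file on the AFP-mounted home directory and swapping the VFS name would be one way to compare locking behaviour without involving the browser.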
Component: General → Storage
Product: Core → Toolkit
(In reply to Marco Bonardo [:mak] from comment #40)
> Third, I think it's the second time I see confirmation of issues with xFetch
> xUnfetch. we don't use the mmap feature of Sqlite but regardless it seems to
> do some of those calls on startup.
> Sqlite uses xFetch and xUnfetch only if the vfs sqlite3_io_methods version
> is 3 or greater, so a simple workaround may be to set our version to 2 (and
> change the assert) so we tell Sqlite we don't support them.

ISTR that our version of sqlite doesn't do proper checking before invoking xFetch and xUnfetch and that was why we needed bug 878411.  Upgrading SQLite would help here.

> It may also be interesting to introduce storage.afp_filesystem that will use
> the "unix-afp" VFS, that is a specific locking system for AFP.

If this unix-afp VFS already exists, I can write a patch for that.
Marco, if you want to talk history of problems with remote shares, see Bug 417037 and Bug 497792
(In reply to Nathan Froyd (:froydnj) from comment #41)
> ISTR that our version of sqlite doesn't do proper checking before invoking
> xFetch and xUnfetch and that was why we needed bug 878411.  Upgrading SQLite
> would help here.

I'm going to review the upgrade to Sqlite 3.8.0.2 now. Though what I see in https://bugzilla.mozilla.org/show_bug.cgi?id=878411#c1 is that Sqlite won't invoke them when they are null pointers (our crash!), doesn't look like the case here. If the underlying vfs defines them, they will be invoked, since we just forward.

> 
> > It may also be interesting to introduce storage.afp_filesystem that will use
> > the "unix-afp" VFS, that is a specific locking system for AFP.
> 
> If this unix-afp VFS already exists, I can write a patch for that.

unix-afp and unix-nfs do both exist, but Sqlite should automatically select the best one on Mac. The last thing I found online though, is that code has been contributed by Apple, but it's currently somehow unmaintained, so it's hard to tell whether it works as expected.
Would be interesting to check if it is selecting the right VFS (unix-afp) when accessing a database on AFP with a debugger, so if there's a bug in the detection or a bug in the AFP vfs.

Probably we could also introduce a more generic storage.locking option accepting "auto" (default), "none", "dotfile", "excl" options. The dotfile locking should be the most compatible one (even if slower) and in the worst case one may just disable locking (at his own risk) if he's sure only one process will access the database at any time. The old nfs_filesystem would survive only for backwards compatibility as an alias to locking = "excl". What do you think?
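
As a rough illustration of the proposal above, the option values could map onto SQLite's built-in Unix VFS names as follows. This is a sketch, not a patch: storage.locking does not exist, the function name choose_vfs is made up, and the rule that an explicit storage.locking value wins over the old boolean pref is an assumption.

```python
# Hypothetical values for the proposed storage.locking pref, mapped to the
# names of SQLite's built-in Unix VFSes. None means "let SQLite pick".
LOCKING_TO_VFS = {
    "auto": None,
    "none": "unix-none",
    "dotfile": "unix-dotfile",
    "excl": "unix-excl",
}


def choose_vfs(locking="auto", nfs_filesystem=False):
    """Return the VFS name to request, or None for SQLite's default."""
    if locking not in LOCKING_TO_VFS:
        raise ValueError("unknown storage.locking value: %r" % (locking,))
    # Backwards compatibility: the old boolean pref acts as an alias for
    # locking = "excl", but only when the new pref is left at its default
    # (that precedence is an assumption of this sketch).
    if locking == "auto" and nfs_filesystem:
        return LOCKING_TO_VFS["excl"]
    return LOCKING_TO_VFS[locking]
```

With this shape, existing profiles that set storage.nfs_filesystem keep their current behaviour, while "dotfile" and "none" become reachable without adding another boolean pref.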
(In reply to Marty Sachs from comment #42)
> Marco, if you want to talk history of problems with remote shares, see Bug
> 417037 and Bug 497792

yes, one of those patches added SQLITE_ENABLE_LOCKING_STYLE=1 that is exactly the Apple contributed code I was talking about. So, it should select unix-afp automatically and try to use it.
(In reply to Marco Bonardo [:mak] from comment #43)
> > If this unix-afp VFS already exists, I can write a patch for that.
> 
> unix-afp and unix-nfs do both exist, but Sqlite should automatically select
> the best one on Mac. The last thing I found online though, is that code has
> been contributed by Apple, but it's currently somehow unmaintained, so it's
> hard to tell whether it works as expected.
> Would be interesting to check if it is selecting the right VFS (unix-afp)
> when accessing a database on AFP with a debugger, so if there's a bug in the
> detection or a bug in the AFP vfs.

That would be interesting.  We have checks in TelemetryVFS to make sure we're using the "excl" filesystem on Unix and failing if we're not...

> Probably we could also introduce a more generic storage.locking option
> accepting "auto" (default), "none", "dotfile", "excl" options. The dotfile
> locking should be the most compatible one (even if slower) and in the worst
> case one may just disable locking (at his own risk) if he's sure only one
> process will access the database at any time. The old nfs_filesystem would
> survive only for backwards compatibility as an alias to locking = "excl".
> What do you think?

What is this buying us over the current option?  Seems like if we're going to have non-default options like this, we should keep them as simple as possible.
(In reply to Nathan Froyd (:froydnj) from comment #45)
> (In reply to Marco Bonardo [:mak] from comment #43)
> > > If this unix-afp VFS already exists, I can write a patch for that.
> That would be interesting.  We have checks in TelemetryVFS to make sure
> we're using the "excl" filesystem on Unix and failing if we're not...

only if nfs_filesystem is set AIUI, that preference overwrites the default.
By default "unix" uses autolockIoFinder, which tries to guess the best VFS for the underlying FS (posix, afp or nfs).
The code is here (warning, slooow) http://mxr.mozilla.org/mozilla-central/source/db/sqlite3/src/sqlite3.c#28105 maybe Steven may look if it's doing the right system calls to detect the FS and support it.

> What is this buying us over the current option?  Seems like if we're going
> to have non-default options like this, we should keep them as simple as
> possible.

well, it's exposing more possibilities, like dotfile or none, currently we only have excl, that doesn't seem to really help. Though, I now must notice that dotfile would be problematic in case of crashes, we should indeed remove all of the lock files on unclean shutdown.

Let's start by figuring if unix-afp is properly selected and used, we can keep other options for later.
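
For readers who don't want to wade through sqlite3.c, the automatic selection discussed above can be pictured roughly like this. It is a simplified model: the filesystem-to-locking table is recalled from the Apple-contributed code in SQLite's Unix backend and has not been checked against the copy shipped in Firefox, so treat it as an assumption, and the function name guess_locking_style is made up.

```python
# Filesystem type name -> locking style. Recalled from SQLite's Unix
# backend and NOT verified against the shipped sqlite3.c; an assumption.
FS_TO_LOCKING = {
    "hfs": "posix",
    "ufs": "posix",
    "afpfs": "afp",
    "smbfs": "afp",
    "webdav": "none",
}


def guess_locking_style(fs_type, fcntl_locks_work):
    """Simplified model of how the default "unix" VFS might pick a locking style.

    fs_type:          filesystem type name as reported by the OS
    fcntl_locks_work: result of probing the file with a POSIX advisory lock
    """
    if fs_type in FS_TO_LOCKING:
        return FS_TO_LOCKING[fs_type]
    # Unknown filesystem: fall back on the result of the lock probe.
    if fcntl_locks_work:
        return "nfs" if fs_type == "nfs" else "posix"
    return "dotfile"
```

If the model is right, an AFP-mounted home directory should end up with the "afp" style, which is the thing worth confirming in a debugger.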
interestingly enough, afpIoMethods is a version 1 IO methods, xFetch/xUnfetch should not even be invoked, though we define them, and forward them to the underlying VFS. I wonder if the fact we publish ourselves as V3 IO methods, but then wrap a V1 IO methods is confusing the caller (as "I just did an xFetch so I can now do this" when xFetch has been a no-op).

We may have to revert to the old method of copying the sub iVersion and just assert if that version is greater than the last version we were aware of.
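
The idea of copying the sub iVersion can be shown with a toy model. This is not the TelemetryVFS code: the class and function names are made up, and the only behaviour taken from this bug is that the caller considers xFetch only when the io_methods version is 3 or greater.

```python
class IoMethods:
    """Toy stand-in for sqlite3_io_methods: a version plus an optional xFetch."""

    def __init__(self, version, fetch=None):
        self.iVersion = version
        self.xFetch = fetch


LAST_KNOWN_VERSION = 3  # newest io_methods version the wrapper understands


def wrap(sub):
    """Build wrapper methods that advertise the same version as the wrapped VFS."""
    # Fail loudly if the underlying VFS is newer than this wrapper knows about.
    assert sub.iVersion <= LAST_KNOWN_VERSION, "unknown io_methods version"
    fetch = None
    if sub.iVersion >= 3:
        # Only forward xFetch when the underlying methods really provide it.
        def fetch(offset, amount):
            return sub.xFetch(offset, amount)
    return IoMethods(sub.iVersion, fetch)


def caller_uses_fetch(methods):
    """Model of the caller's check: xFetch is only considered for version >= 3."""
    return methods.iVersion >= 3 and methods.xFetch is not None
```

Wrapping a version 1 object (as afpIoMethods is said to be) yields a wrapper that also reports version 1, so the caller never takes the xFetch path; a wrapper hard-coded to version 3 would not have that property.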
I corrected one mistake and added information about enabling SSL on the Open Directory LDAP server.
Attachment #810085 - Attachment is obsolete: true
I finally tried updating FF 23.0.1 from "Check for Updates" in the About Firefox dialog.  It gets stuck forever "Checking for updates".

I did this after deleting all the Firefox profile directories, then setting storage.nfs_filesystem to true in about:config.  So this seems to be another bad side effect of that setting.
(In reply to Steven Michaud from comment #49)
> I did this after deleting all the Firefox profile directories, then setting
> storage.nfs_filesystem to true in about:config.  So this seems to be
> another bad side effect of that setting.

is it possible that in that situation we start 2 processes and both try to access the database? May you log that somehow?
Another report on French support board Geckozone: http://forums.mozfr.org/viewtopic.php?f=5&t=115047 (master Xserve server sharing Open Directory service and registering users and their profiles)
(In reply to comment #50)

> is it possible that in that situation we start 2 processes and both
> try to access the database? May you log that somehow?

I've no idea how to do this.

But if you tell me where in the code to add printf statements, I can
do that and report back.
It's probably too much to ask ordinary developers to acquire a copy of OS X Server, install it on a dedicated machine (or in a VM), then follow my instructions to set up AFP-mounted home directories.  So I think Mozilla needs to set up a server and client, both on the same network, that Mozilla employees (at least) can connect to remotely (via VPN and screen sharing).

I could open a bug for this.  But I've never done it before, and so have no idea how.
Just tested with storage.nfs_filesystem == true in FF 23.0.1 on OS X 10.7.5 in a non-remotely-mounted (i.e. local) home directory.  The problems I reported above with that setting (in an AFP-mounted home directory) don't happen in this case.

History : Show All History doesn't fail to display any history, it's possible to bookmark the current page, and "Check for updates" (in the About dialog) doesn't get stuck forever "Checking for updates".
(In reply to Steven Michaud from comment #53)
> I could open a bug for this.  But I've never done it before, and so have no
> idea how.

mschifer - how would you like this handled?
Flags: needinfo?(mschifer)
I want to hook up here too. We have nearly the same configuration, our server runs OS X Server 10.6.8, Clients run 10.6 to 10.8.
We experienced the same things on our network clients.

The so-called Home Network Folder links the home folder ("~/" or "/Users/username") to a network share (like "/Network/192.168.1.1/Users/username"). This is where ~/Library/Application Support/Firefox will reside.
I don't know if this happens transparently to the application.

All roaming clients (e.g. those using MacBooks) do not have any problems here, because their home folder is synchronised only on logon and logoff.

If you need any further information or assistance feel free to contact me. We would be VERY happy to see this problem solved as soon as possible.

Thanks in advance.
Keywords: qawanted
from comment 53, I don't think there is anything QA can do here without a server to test against.  leaving needinfo from marc but removing qawanted for now. please add it back, if/when a testing environment is made available.
Keywords: qawanted
"is made available" is pretty passive. Isn't it part of QA responsibility to find or build the necessary testing environment, or work with outside partners such as riechie from comment 56 to get the necessary testing?
Keywords: qawanted
marc is needinfo'd on the testing infrastructure.  I can certainly help coordinate testing efforts with partners.
I seem to remember that we do (or did) make testing machines available to "developers" over VPN.  If we still do that, maybe those people could help out here.

I've already given partial setup instructions above.  If need be I could write up a fuller set of instructions -- one that covers everything from installing OS X Server, through minimal setup instructions tailored to this bug.
Steven, yes, instructions would be much appreciated.
This affects such a tiny portion of users, and we're so late in beta 25 that it's not really something we'd block future releases on. QA can get this test env set up and add this to their test plans and a low risk uplift will be considered when ready, but no tracking needed.
We're seeing the same bug here. I've been following this thread and would very much appreciate a fix in future versions of FF. We have several hundred users with network homes here.
(In reply to mnapier from comment #63)
> We're seeing the same bug here. I've been following this thread and would
> very much appreciate a fix in future versions of FF. We have several hundred
> users with network homes here.

Same problem Here

Config : 
Server : XServe under MAC OS X 10.7.5
Clients : iMac 21 " (year 2011) under MAC OS X 10.7.5

I tried adding storage.nfs_filesystem under version 23.01, but it hangs as soon as I enter a URL

I also tried Firefox 25.0b7 that does not fix the problem.

Hope future version will solve the problem.
Please, no more me-too comments.

We realize that, even if this bug only affects several thousand (or tens of thousands) of users, it's a show-stopper for them.  Furthermore this bug should be relatively easy to fix, since it's 100% reproducible.

Where we're stuck now is setting up a testbed to which developers can be given access who know the code where this bug exists.  Even there we don't need advice -- I (for one) already know more or less what needs to be done.  We just need Mozilla's right and left and gripping hands to work better together :-)
> Steven, yes, instructions would be much appreciated.

I'm working on it.  With luck I should have them later today.

I'd hoped to have the work done Friday.  But I made a mistake and had to wipe my test partition.  I'll start over today.
> Please, no more me-too comments.

If you urgently want this bug fixed, the best way to get our attention is to "vote" for it.
SERVER : OSX SERVER 10.5.8 
CLIENT : OSX 10.8.5 + Firefox 24


Reverting Firefox to 17.06 ESR resolves the hang problem. (I have not tried other versions of Firefox.)

I have voted ;)


Regards
Please vote *without* commenting :-)

Otherwise this bug will get filled to the brim with me-too comments, and will become much harder to follow.
Here, as promised, are soup-to-nuts instructions for setting up AFP-mounted home directories, starting with the initial configuration of OS X Server.

It turns out the current version (OS X Server 2.2 aka OS X 10.8.5 Server) makes it easy to add SSL support to the Open Directory LDAP server.  So these instructions are specifically for that version (on the server side), and no others.  If you're going to set up support for AFP-mounted home directories on a dedicated partition/VM (which seems to be the best approach), there's no reason to use any other version of OS X Server.

Tracy, are you going to be doing the work on this?  If so, please let me know if you have any questions (on IRC or in email).

I expect you'll want to set up both a client and a server somewhere on the Mozilla internal network.  Mozilla employees will then be able to connect to them over VPN.  Make sure both are set up to allow SSH logins and screen sharing.  Both client and server need to be fairly beefy machines.  If they're to be separate machines, I think recent Mac Minis would do.  But if you have a *very* beefy desktop (like a recent Mac Pro with at least 16GB of RAM), I think you could run both as VMs on that single machine.

Theoretically it'd be possible to only set up a server on the Mozilla internal network, and have the client be on the developer's desk, connecting over VPN to its remote home directory.  But I expect that'd be extremely slow (and painful).
Attachment #810614 - Attachment is obsolete: true
Steven, thank you very much for the instructions.  I do not have access to the hardware required to set this up locally.  

Marc, what do you think here?
What hardware do we need to set this up?
Flags: needinfo?(mschifer)
> What hardware do we need to set this up?

See comment #70 for my recommendations.

You'll also need software (OS X 10.8 client and OS X Server 2.2 for the server; OS X 10.8 or 10.7 for the client).  But Mozilla has at least one official subscription to Apple's Mac Developer Center, and everything we need is available from that.
Ah yes, I will see about acquiring some Mac minis and put them down in the QA Lab.
(In reply to Marc Schifer [mschifer] from comment #74)
> Ah yes, I will see about acquiring some mac mini's and put them down in the
> QA Lab.

Thanks Marc, setting you as QA Contact for the time being.
QA Contact: mschifer
I am seeing issues here which are at least peripherally related if not intricately overlapping.

https://bugzilla.mozilla.org/show_bug.cgi?id=926987
Attached patch vfs.diff
Steven, would you mind trying this patch, please? We briefly discussed this during the summit; this is what I had in mind.
Attachment #818119 - Flags: feedback?(smichaud)
Comment on attachment 818119 [details] [diff] [review]
vfs.diff

I'll try it tomorrow.

And if it works for me I'll post a tryserver build for others to try.
Comment on attachment 818119 [details] [diff] [review]
vfs.diff

The results of my brief tests were very good!  With this patch (on current trunk) the browser starts up reasonably quickly (with no more delay, as best I can tell, than FF 23.0.1).  I have no problems setting bookmarks, and I can view history.

I'll start an all-platform tryserver run.  It will probably take all day for the automated tests to finish, but builds for testing should be available in a few hours.

Tracy, if you haven't already started setting up this bug's testbed, I'd put the work aside for a while to see what results we get from other testers (once the tryserver builds are available).
Attachment #818119 - Flags: feedback?(smichaud) → feedback+
Tryserver builds and test results will eventually be available here:
https://tbpl.mozilla.org/?tree=Try&rev=be4650747659
I've noticed a problem that's unrelated to Marco's patch (since it also happens with FF 23.0.1):

When you run either of these across the network (e.g. from your remotely-mounted Desktop), they quickly start using 100% of a CPU (as measured by 'top').  This doesn't happen if you run them locally (e.g. from the /Applications/ directory), or if you run Safari across the network.  So Firefox must be continuously (and wastefully) accessing things in its own bundle, in a way that Safari doesn't.

I don't know what to do about this, or if we even care about it.  In any case it's a different bug.

I mention it because testers may notice this happening with tryserver builds containing Marco's patch, and blame it on the patch -- which would be wrong.
Yes, as mentioned early on, I've been complaining about this bug since FF 4; see bug 648042 (which apparently is a duplicate of bug 604710).

It's gotten progressively worse with each release since then.  We certainly would very much appreciate this being fixed too.
(In reply to comment #82)

So, Marty, you always run FF from across the network (e.g. from your Desktop directory), and not locally (from the /Applications/ directory)?
Does the storage.nfs_filesystem pref make a difference to the CPU problem? Does it now function properly (with the patch), or does it still cause a lockup?
(In further reply to comment #82)

I just found that FF (the nightly with Marco's patch) can start using 100% of a CPU even when run locally -- if you let it stay running for a while.  This does sound like what you report at bug 648042.  (I see from bug 648042 comment #0 that even then you were using a remotely-mounted home directory.)

Let's stop further discussion of this here -- since it's not related.  Once I've had a chance to do more investigation I'll open a new bug (or bugs) on these issues.  I'll be sure to CC you, and I'll mention the bug here.
(In reply to comment #84)

I'll try once again setting storage.nfs_filesystem to true, for testing with your patch, and see what happens.
(In reply to comment #83)

At work, yes.  Our group has their home directories on the Server.

(In reply to comment #85)

While you're stopping further discussion here, I hope that this can be picked up and worked on under bug 648042 or bug 604710.  Thanks!
> Our group has their home directories on the Server.

I may be restating the obvious here, but it's not clear you understand it:

Even when your home directory is remotely-mounted, your machine's /Applications/ directory is *not* remotely mounted.  Your ~/Desktop directory is under/inside your home directory; your machine's /Applications/ directory isn't.
(In reply to Steven Michaud from comment #88)

> I may be restating the obvious here, but it's not clear you understand it:
> 
> Even when your home directory is remotely-mounted, your machine's
> /Applications/ directory is *not* remotely mounted.  Your ~/Desktop
> directory is under/inside your home directory; your machine's /Applications/
> directory isn't.


That is correct.  My /Applications/ directory is *not* remotely mounted. It is on whichever client Mac I'm using.  My ~/Desktop directory is under/inside my home directory which is located on the remote Server.  So, sorry for the confusion, the Firefox application is run from my client Mac; my profile is in my home directory, which is accessed remotely from the server.
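For testers who want to check which of these two cases applies to a given path on a client, here is a minimal sketch. It assumes the text output of the OS X `mount` command is passed in as a string, and that each line has the form `<device> on <mount point> (<fstype>, ...)`. The function names and the set of file system types treated as "network" are illustrative assumptions, not anything taken from Firefox.

```python
import re

# Assumption: file system type names that OS X reports for network mounts.
NETWORK_FS_TYPES = {"afpfs", "smbfs", "nfs", "webdav"}

# Matches lines such as "/dev/disk0s2 on / (hfs, local, journaled)".
_MOUNT_LINE = re.compile(r"^(?P<dev>.+?) on (?P<point>/.*?) \((?P<fstype>[^,)]+)")


def parse_mounts(mount_output):
    """Return a list of (mount_point, fstype) pairs from `mount` output."""
    mounts = []
    for line in mount_output.splitlines():
        match = _MOUNT_LINE.match(line.strip())
        if match:
            mounts.append((match.group("point"), match.group("fstype")))
    return mounts


def fs_type_for(path, mount_output):
    """Return the fstype of the longest mount point that contains `path`."""
    best = None
    for point, fstype in parse_mounts(mount_output):
        prefix = point.rstrip("/") + "/"
        if path == point or (path + "/").startswith(prefix):
            if best is None or len(point) > len(best[0]):
                best = (point, fstype)
    return best[1] if best else None


def is_network_path(path, mount_output):
    """True if `path` lives on one of the assumed network file systems."""
    return fs_type_for(path, mount_output) in NETWORK_FS_TYPES
```

In practice the `mount_output` string could come from `subprocess.check_output(["mount"], text=True)`. With a remotely mounted home directory, a path under the home directory should be reported as a network path, while /Applications/ should not.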
(In further reply to comment #84)

Yes, Marco, setting storage.nfs_filesystem to true reduces CPU usage to "normal" values when running either FF 23.0.1 or the build with your patch over the network (i.e. from the remotely-mounted home directory's Desktop directory).  But then I once again can't view history or set a bookmark.

I'll open a new bug on this issue (as I promised above), and try running Benoit Girard's profiler without the storage.nfs_filesystem pref.  That should reveal what's doing all the bundle accesses.
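For anyone repeating the storage.nfs_filesystem test, here is a minimal sketch of one way to set the pref reproducibly, by writing it into the profile's user.js file. The helper names are hypothetical, and the profile directory is whatever path the tester supplies; only the `user_pref("name", value);` line format and the pref name come from Firefox.

```python
import json
from pathlib import Path


def format_user_pref(name, value):
    """Render one user.js line; json.dumps turns True into `true`."""
    return "user_pref({}, {});".format(json.dumps(name), json.dumps(value))


def set_user_pref(profile_dir, name, value):
    """Add or replace a pref in <profile_dir>/user.js and return its path."""
    user_js = Path(profile_dir) / "user.js"
    lines = user_js.read_text().splitlines() if user_js.exists() else []
    # Drop any earlier setting of the same pref so the new one is unambiguous.
    prefix = "user_pref({},".format(json.dumps(name))
    kept = [line for line in lines if not line.strip().startswith(prefix)]
    kept.append(format_user_pref(name, value))
    user_js.write_text("\n".join(kept) + "\n")
    return user_js
```

Calling `set_user_pref(profile_dir, "storage.nfs_filesystem", True)` before launch enables the pref. Since user.js is re-applied on every startup, the line has to be removed from the file (and the pref reset in about:config) to return to the default.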
Now I can no longer repro the 100%-CPU-when-running-FF-over-the-network problem, even without the storage.nfs_filesystem setting (and a reboot).  Sigh :-(

Let's put the matter aside until I have more time to deal with it -- next week or the week after.
> Tryserver builds and test results will eventually be available here:
> https://tbpl.mozilla.org/?tree=Try&rev=be4650747659

The tests are incomplete, but looking very good so far.

And here's a Mac test build containing Marco's patch from comment #77:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-be4650747659/try-macosx64/firefox-27.0a1.en-US.mac.dmg

Please test with it, Marty Sachs and others, and let us know your results.
(In reply to Steven Michaud from comment #92)
> Please test with it, Marty Sachs and others, and let us know your results.

Steven, it's looking very good.  The preexisting memory problems are still there, but this build seems to work at least as well as FF 23.0.1.  I tried it on my plain vanilla user and from my user account.
Test environment:
OS X server, home folder served over AFP
OS X client

- I deleted all Firefox settings (in Application Support/Firefox and Caches/Firefox).
- I downloaded Firefox 24 to the desktop and launched it from the Desktop folder => Firefox hangs.
- I deleted Firefox 24 and all settings (in Application Support/Firefox and Caches/Firefox).
- I downloaded http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-be4650747659/try-macosx64/firefox-27.0a1.en-US.mac.dmg and put it on the desktop.
- I launched firefox-27.0a1.en-US from the Desktop folder and Firefox works well.

I have not tried launching Firefox from the Applications folder.

Good job :-)
I concur. Tested on 10.6.8 with home directory served over AFP by OSX Server 10.6.8.
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-be4650747659/try-macosx64/firefox-27.0a1.en-US.mac.dmg
Booted and ran fine, with existing user Firefox library folders.
(In reply to hturkoz from comment #94) and  (In further reply to comment #88)

> - I launched firefox-27.0a1.en-US from the Desktop folder and Firefox works
> well.
> 
> I have not tried launching Firefox from the Applications folder.

I have tried running firefox-27.0a1.en-US from the client's application folder and directly from the .dmg disk image on the desktop.  I see no difference; both work when home directory (and FF profile) is accessed remotely from the Server.
Comment on attachment 818119 [details] [diff] [review]
vfs.diff

Marco, I think you should seek a review for this patch :-)
Sweet!

I think QA should still shoot for getting a test environment set up so that we can periodically, at some point in the Fx QA cycle, make regression test passes over this.
Keywords: qawanted
> I think QA should still shoot for getting a test environment set up
> so that we can periodically, at some point in the Fx QA cycle,
> make regression test passes over this.

I agree.

It might also come in handy if I ever manage to open a coherent bug
report about the issue I raised in comment #81.
(In reply to Ray Koltys from comment #95)
> I concur. Tested on 10.6.8 with home directory served over AFP by OSX Server
> 10.6.8.
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-
> be4650747659/try-macosx64/firefox-27.0a1.en-US.mac.dmg
> Booted and ran fine, with existing user Firefox library folders.

I confirm. Tested on 10.7.5 with home directory served over AFP by OSX Server 10.7.5.

Version 27.0a1.en-US runs OK

I also noticed 100% CPU load (100% of 1 core out of 4), but only from time to time, not always.

Thanks for your efforts; will this patch be included in a minor update to 24.0, or only in later versions?
This bug, though seemingly intimately related, is NOT solved by this patch:

https://bugzilla.mozilla.org/show_bug.cgi?id=926987
Hello,
Just an idea:
a server/client setup on Mac/Linux can be done without complexity or investment:
just symlink some folder to a remote share (AFP/SMB/NFS).

PS: Steven, I have a (very) small bug to report concerning add-ons. Can you tell me where and how I can report this strange result of installing add-ons such as eID Belgium?

Regards
(In reply to hturkoz from comment #102)
> Just an idea:
> a server/client setup on Mac/Linux can be done without complexity or investment:
> just symlink some folder to a remote share (AFP/SMB/NFS).

If I understand this correctly, it would only work if one uses the same client Mac all the time.  One advantage of having your home directory on a server is that you don't have this restriction.
Comment on attachment 818119 [details] [diff] [review]
vfs.diff

Asking for review to Ben, considered the positive testing from Steven.
This is actually better than the original patch since it doesn't mask the underlying vfs version.
Attachment #818119 - Flags: review?(bent.mozilla)
Comment on attachment 818119 [details] [diff] [review]
vfs.diff

Review of attachment 818119 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me!
Attachment #818119 - Flags: review?(bent.mozilla) → review+
https://hg.mozilla.org/integration/fx-team/rev/ff5a19df4bed
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla27
Comment on attachment 818119 [details] [diff] [review]
vfs.diff

I wonder if this may be considered for a ridealong on the ESR branch.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: profiles on AFP,NFS remote shares are broken. Companies using this setup have to revert to ESR 17.
User impact if declined: Firefox hangs on startup.
Fix Landed on Version: 27
Risk to taking this patch (and alternatives if risky): the changes are simple enough to not be scary
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #818119 - Flags: approval-mozilla-esr24?
https://hg.mozilla.org/mozilla-central/rev/ff5a19df4bed
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
(In reply to Carsten Book [:Tomcat] from comment #108)
> https://hg.mozilla.org/mozilla-central/rev/ff5a19df4bed

OK, Thanks to all for your efforts.

I apologize, but perhaps I didn't understand.

"Fix Landed on Version: 27" : does it mean that the bug will be corrected in version 27 ?
> does it mean that the bug will be corrected in version 27?

If we don't do anything more here, the first release in which this bug is fixed will be Firefox 27.

But I strongly suspect that, after we've tested this patch for a while on the 27 and 28 branches, we'll want to "uplift" it to the 26 branch.  That would mean this bug will get fixed in Firefox 26.  It's almost certainly too late to fix this in Firefox 25, whose release is due in a few days.
How about the memory issue discussed in comment #85, comment #87 and comment #90?  Is a new bug report needed?  Or should bug 648042 (which apparently is a duplicate of bug 604710) be updated?  Bug 700397 also appears to be related.
> How about the memory issue discussed in comment #85 comment #87 and comment #90?

I'll get to the 100% CPU issue eventually.  It's very unlikely to be a memory issue, and much more likely to be some repetitive (and likely unnecessary) action that (for some reason) does no visible harm except for people who have remote-mounted home directories.

As for the rest, it's unlikely I'll ever have time for them :-(
I haven't seen this reported anywhere yet: the current Thunderbird 24 has the exact same issue with its SQLite databases on AFP home directories. No surprise as the storage stuff is common code.
Still it should be tested if the patch resolves the issue on Thunderbird and which release it might be incorporated into.
Comment on attachment 818119 [details] [diff] [review]
vfs.diff

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 878411
User impact if declined: profiles on AFP,NFS remote shares are broken. Firefox hangs on startup.
Testing completed (on m-c, etc.): m-c, aurora
Risk to taking this patch (and alternatives if risky): simple changes
String or IDL/UUID changes made by this patch: none
Attachment #818119 - Flags: approval-mozilla-beta?
Comment on attachment 818119 [details] [diff] [review]
vfs.diff

This doesn't meet ESR landing criteria and I am not aware of any request from the ESR community for this fix.  We could look at a potential uplift if there was a pressing need for a large deployment but without that, we can leave this until the next ESR.
Attachment #818119 - Flags: approval-mozilla-esr24?
Attachment #818119 - Flags: approval-mozilla-esr24-
Attachment #818119 - Flags: approval-mozilla-beta?
Attachment #818119 - Flags: approval-mozilla-beta+
(In reply to Steven Michaud from comment #110)
> > does it mean that the bug will be corrected in version 27?
> 
> If we don't do anything more here, the first release in which this bug is
> fixed will be Firefox 27.
> 
> But I strongly suspect that, after we've tested this patch for a while on
> the 27 and 28 branches, we'll want to "uplift" it to the 26 branch.  That
> would mean this bug will get fixed in Firefox 26.  It's almost certainly too
> late to fix this in Firefox 25, whose release is due in a few days.

I can confirm fix on 27 in both my environments: user-library stored on smb and afs shares. 25 and 26 crashing on both.
(In reply to Steven Michaud from comment #119) 
> Try testing with FF26b2, which was "released" today.

Confirmed; FF26b2 works.  The preexisting 100% cpu problem is still there, but this build seems to work at least as well as FF 23.0.1.
The release version of FF 26 works.  However, I wish that the preexisting 100% cpu problem (discussed in this bug's comments; that I've been complaining about since FF 4) would also be fixed (hint hint).  Thanks!
CPU problems are unrelated to this bug, and commenting about them really isn't welcome here. If you want to file a bug specifically about that with steps to reproduce, please do that separately.
Status: RESOLVED → VERIFIED
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #122)
> CPU problems are unrelated to this bug, and commenting about them really
> isn't welcome here. If you want to file a bug specifically about that with
> steps to reproduce, please do that separately.

Sorry, I just want to verify that this bug's fix works in the release version, but also to mention Comment 112 (I was told there that another bug report wouldn't be necessary).
So it's now fixed in the released Firefox. However, it's still unfixed in the current Thunderbird release (24.2.0) because Thunderbird won't be getting a major revision before the next ESR.
Severity: normal → major
Keywords: perf
Summary: Mac: Firefox 24 doesn't start when home directory is on AFP network share → Mac: Firefox 24 doesn't start, or very slow startup, when home directory is on AFP network share
Home directories on servers are incredibly common on Macs, especially in schools.

The reason you're probably not seeing a lot of reports yet is because the ESR was officially released right before a holiday.

This really should be fixed in the ESR because it impacts corporate environments.

It also breaks Thunderbird.
We have several hundred OS X machines with networked home directories at our school. If the fix isn't pushed to the ESR branch we will end up having to remove Firefox from our machines once version 17 becomes irrelevant.
lsblakk, I'm going to renom this for ESR24 based on evidence that it's blocking ESR adoption.
Attachment #818119 - Flags: approval-mozilla-esr24- → approval-mozilla-esr24?
Blocks: 926181
Comment on attachment 818119 [details] [diff] [review]
vfs.diff

In the interest of ensuring ESR users on Mac can stay on the latest, secure version we can make an exception to the landing criteria for this fix.
Attachment #818119 - Flags: approval-mozilla-esr24? → approval-mozilla-esr24+
Attached file Crash log, ESR24
I don't yet have an AFP share to test on, so I tried an SMB share with a clean remote profile in the meantime.

Launch takes a while, but not terrible. All is fine until I tried to bookmark a page. No bookmark created. Tried a few more times and eventually got a crash.

This is ESR24 using nightly build from 2014-01-19. Crash log attached. I also see a more immediate crash on my own build, built today.

This does not happen on FF27.

Still looking into testing via AFP, but in the interest of time, I wanted to raise this now in case this means our fix in ESR24 is not right.
> I don't yet have an AFP share to test on, so I tried an SMB share

If you have control over your OS X server, you can change whether the Users share point uses AFP or SMB to mount home directories on remote computers.  Run the Server app, under File Sharing select Users, then select "Edit share point".  In the resulting dialog you can make this share point "available for home directories" over either AFP or SMB.
I just tried the 2014-01-26 esr24 nightly over both an AFP and an SMB connection to my test server (OS X Server 2.2), and had no problems creating bookmarks with either kind of connection.

But changing the server to use one or the other type of connection is more complicated than I realized (not surprising for anything to do with OS X Server).  In addition to the first step above, you also have to:

2) In Workgroup Manager, change your test user's Home setting appropriately.
3) Delete or rename the old home directory, then click on the Create Home Now button in Workgroup Manager.
Attached file esr_crash_log_2.txt
Thanks for the info, Steven. Based on your advice, I tried something different.

I now have another Mac to set up for file sharing. I configured it to default to AFP. Using both Fx27 and ESR24, I created new (unique) temp folders on the remote machine and new remote profiles. 

Both builds crash for me. I see a crash on shutdown for Fx27, and general slowness. I see another crash on ESR after quitting and relaunching the browser (and subsequent times thereafter). I'm attaching these crash logs.

What I'd like to know is if the scenario I'm creating here is valid. I don't use remote profiles, and I'm sure there are many various network configs out there.
Also, to be clear, I don't currently have OSX Server to test with. This is just using two Macs with built-in AFP file sharing, which I'm not sure is valid or not.
> What I'd like to know is if the scenario I'm creating here is valid.

It's hard to know.  You need to write a *detailed* description of how you set up your test environment -- as detailed as my attachment 817358 [details] above.

> I don't currently have OSX Server to test with

Mozilla has at least one company-wide license for Apple's Mac Developer Program (https://developer.apple.com/programs/mac/).  With that you can download OS X Server (pretty much any version you choose) at no charge.  You can even buy your own subscription, if that's more convenient (that's what I've done) -- it's only $99/year.  Once you have a copy, follow my instructions in attachment 817358 [details] to set it up.  You don't need hardware to install it on -- I installed it to a VMWare VM.
Thanks for the detailed info, Steven. First off, I should have tried this with an old build to see if this was a regression. I just tried the same thing with Fx 16 and I get the same crash there, so whatever I'm seeing is not new. That lowers the priority, and I apologize for not trying this sooner.

Regarding getting OSX Server up and running... thanks for the tip. I had hoped to do this faster but we'll do what we can to get this running. Unfortunately, I'm out of town and have to hand this off to someone else today.

Regarding steps to reproduce:

0. Create an admin account on Mac A.
1. Have Mac A and Mac B on the same network. 
 - In my case, both are wireless, which is not a likely scenario for remote profile use.
2. On Mac A, enable sharing via control panel Sharing > File Sharing.
3. On Mac B, cmd-K, enter "afp://" plus the computer name of Mac A on the network.
4. Enter login credentials for an admin account.
5. Confirm that Mac B has access to Mac A's home folder.
6. In Finder, find Mac A's mounted file system and create a new folder named "temp"
7. Launch Fx 27 with option to choose user profile.
8. Create new profile and choose folder you created from step 6 as the home.
9. Launch.

Again, if this is too non-standard, sorry for the noise. Just trying to get this nailed down before release.
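Steps 7 to 9 can also be scripted, which makes the launch easier to repeat across builds. This is a minimal sketch under a few assumptions: Firefox is installed in the default /Applications/ location, the folder from step 6 is passed in by the tester, and the `-profile <path>` and `-no-remote` command-line options are used in place of the Profile Manager dialog. The function names are hypothetical.

```python
import subprocess

# Assumption: default install location of the Firefox binary on OS X.
DEFAULT_FIREFOX = "/Applications/Firefox.app/Contents/MacOS/firefox"


def build_launch_command(profile_dir, firefox_binary=DEFAULT_FIREFOX):
    """Build the argument list that starts Firefox on a specific profile folder."""
    # -no-remote keeps this launch from attaching to an already running instance.
    return [firefox_binary, "-no-remote", "-profile", profile_dir]


def launch(profile_dir, firefox_binary=DEFAULT_FIREFOX):
    """Start Firefox without waiting for it to exit; returns the Popen handle."""
    return subprocess.Popen(build_launch_command(profile_dir, firefox_binary))
```

Here `profile_dir` would be the "temp" folder on Mac A's mounted file system as seen from Mac B. Pointing `firefox_binary` at a different build (for example an ESR24 nightly) allows the same remote folder layout to be tested against several versions.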
It certainly is non-standard (at least from Apple's presumed point of view).  So it's probably not relevant to this bug.  But if it doesn't work, and especially if it crashes, that's probably a real bug in Firefox (or more than one bug).

Matt, how about you open a new bug on this?  Also put a link to this bug in its See Also field.

I'll try your STR myself, and see what happens.
> I'll try your STR myself, and see what happens.

I tried it over both AFP and SMB connections, and I didn't see any crashes or major problems.  I *did* see some weirdness trying to bookmark http://www.apple.com, but I suspect that was caused by the slow speed of the network connection (over wireless on a home network).  (My Apple bookmark wouldn't "stick" -- it disappeared if I quit and restarted.)

I tested with the 2014-01-26 esr24 nightly, FF 26 and yesterday's m-c nightly, on OS X 10.7.5.
The Apple bookmark weirdness doesn't happen with a "properly" set up remotely mounted home directory.
Matt, you really need to do a test of "correctly" remotely mounted home directories on your network, when you get back from your trip.  I suspect your network is flakier than mine, and that you will have problems even with a standard setup.

That said, though, I suspect it's safe to release esr24 as is.

I know it isn't my job to make that decision.  But I suspect I'm the only Mozillian who's done proper testing, and my test results have been pretty good.
> Can you try Fx27RC build, please?

I tried this with an AFP remote-mounted home directory, and had no problems.  I set a few bookmarks, then logged off and back on again, then started FF again and looked at my bookmarks and history.  I didn't see any problems.

(I did see some problems with an SMB-mounted home directory, but I'm quite sure they were Apple's fault and/or mine.  I haven't seen these problems before, and I think OS X Server gets upset if you switch back and forth between having it support AFP-mounted home directories and SMB-mounted home directories.)
No longer blocks: 926181