Closed Bug 196487 Opened 21 years ago Closed 21 years ago

Doesn't run if home directory is on NFS

Categories

(Core Graveyard :: Profile: BackEnd, defect)

PowerPC
macOS
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.4beta

People

(Reporter: krister, Assigned: ccarlen)

References

Details

(Keywords: fixed1.4, regression)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.3b) Gecko/20030212
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.3b) Gecko/20030212

After deleting ~/Library/Mozilla, the first time Mozilla starts it exits almost
immediately.  It has, however, created the directory ~/Library/Mozilla with a
profile called "default".  Next time Mozilla starts it reports that I have to
choose a user profile.  The dialog lists "default" as one of the available
profiles.  If I choose "default" (or create a new profile and choose that),
Mozilla pops up an error:

Component returned failure code: 0x80004005 (NS_ERROR_FAILURE)
[nslProfileInternal.currentProfile]


Reproducible: Always

Steps to Reproduce:
1. Home directory on NFS
2. Start Mozilla (twice if ~/Library/Mozilla doesn't exist)

Actual Results:  
Mozilla fails to get further than the Profile Manager.


Expected Results:  
Normal startup.
I can confirm and have reported the exact same behaviour under bug 152287:

In version 1.2.1, Mozilla would launch and load a user profile if that profile
resided on an NFS-mounted home directory.  It would not, however, save
downloaded files to any NFS-mount (including the home directory), making it
useless for Mail or all but the most basic browsing.

In 1.3a, 1.3b, and the current (9 March) 1.4a nightly, Mozilla will NOT load a
user profile from an NFS-mounted home directory.  Since this was an issue that
was fixed previously, this appears to be a regression.  I WAS able to save a
downloaded PDF file to an NFS-mounted share when running from a local (HFS+)
home directory with the 1.4a build.

The error message on failing to load the profile is:
Component returned failure code: 0x80004005 (NS_ERROR_FAILURE)
[nslProfileInternal.currentProfile]

Please elevate the priority of this bug if at all possible.  Thank you.
Confirming. This is a result of using fcntl profile locking instead of symlink
profile locking. fcntl is more failsafe on local volumes but doesn't work on
some NFS volumes. On the Camino (Chimera) branch, this specific failure was
allowed. Now that Camino is on the trunk, and for Mozilla, this needs to be
revisited on the trunk.
Status: UNCONFIRMED → NEW
Ever confirmed: true
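For readers unfamiliar with the fcntl approach named in the confirming comment, here is a minimal sketch of what an fcntl-based lock attempt generally looks like on a POSIX system. It is not the Mozilla code; the helper name TryFcntlLock and the lock-file path are hypothetical. The point it illustrates is that the call can fail for two different reasons: the lock is held by another process (typically EACCES or EAGAIN), or the filesystem cannot do locking at all (for example ENOLCK on an NFS mount without a working lock daemon; the exact errno varies by platform).

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>

// Hypothetical helper, not taken from the Mozilla tree.
// Tries to take an exclusive advisory lock on the whole file at `path`.
// Returns an open descriptor on success (the lock is held until it is
// closed or the process exits), or -1 with errno describing the failure.
int TryFcntlLock(const char* path) {
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        return -1;
    }

    struct flock lock = {};
    lock.l_type = F_WRLCK;    // exclusive (write) lock
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;           // 0 means "to end of file", i.e. the whole file

    // F_SETLK does not block: it fails immediately if the lock cannot be
    // taken, including when the filesystem does not support locking.
    if (fcntl(fd, F_SETLK, &lock) == -1) {
        int saved = errno;    // close() may overwrite errno
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Because the kernel drops the lock when the process dies, there is nothing left behind after a crash or reboot, which is why this method is described above as more failsafe on local volumes.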
*** Bug 199557 has been marked as a duplicate of this bug. ***
*** Bug 199870 has been marked as a duplicate of this bug. ***
It doesn't appear that this bug is targeted to a milestone.  If this worked in
1.2, shouldn't it block release of 1.4?  I can see not fixing it in the 1.3 tree
because that's not supposed to be stable, but 1.4 is an even numbered release. 
It appears that this bug may also be affecting the ability to run recent
versions of evolution (so far all that is clear is that evolution has a similar
problem with NFS home directories and that there are Mozilla dependencies in
evolution).
Flags: blocking1.4?
*** Bug 204643 has been marked as a duplicate of this bug. ***
This is a pretty big one for adoption of software in a corporate environment. It
could definitely block deployment.
Flags: blocking1.4b?
As promised, I've done some investigation.

From the patch in bug 76431:

>+    // First, try the 4.x-compatible symlink technique, which works with NFS
>+    // without depending on (broken or missing, too often) lockd.

So, Mac OS/X NFS is _not_ supporting symlink().  If it doesn't support
symlink(), does it run lockd by default?  If so does lockd work?

If Camino switched from the "symlink then fcntl" algorithm, that's a problem.

Looks like this checkin is the root cause:

http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsProfileAccess.cpp&branch=&root=/cvsroot&subdir=mozilla/profile/src&command=DIFF_FRAMESET&rev1=1.71&rev2=1.72

The comment was:
Use only fcntl-based profile locking on Mac OS X (disable symlink-based
locking). Fixes bug 176608. r=ccarlen, sr=jag, a=roc.

Here are two relevant comments from that bug:
---------------------
------- Additional Comment #5 From Conrad Carlen  2002-10-24 19:27 -------

The #ifdef I'd like to see is one which causes us to do locking via fcntl
instead of the symlink method which requires signal handlers to be set in the
first place. If we use fcntl:

1. We're absolutely guaranteed not to have a stuck lock if the machine is rebooted.
2. We're not going to suffer from PID rollover after reboot or changing IP
addresses after the machine goes to sleep and is woken up in a new net environment.
3. No need for signal handlers so we don't have this problem.

Brendan, I know that locking via fcntl is not supported by all NFS servers but
which servers are those exactly? Considering that we're getting more than a few
bugs on stuck locks with Chimera, I think it may be better to relnote the
problem of not all NFS servers supporting locking and use fcntl. I think that
far fewer people would run into the problem of having their profile on such a
server than are running into stuck locks. I'm in favor of this #ifdef only for
XP_MACOSX - not for XP_UNIX in general.


------- Additional Comment #6 From Brendan Eich 2002-10-24 19:44 -------

I think I said in another bug that the broken NFS servers don't care whether
your client Mozilla was compiled XP_UNIX or XP_MACOSX, but sure, you can use
fcntl and see whether there are fewer bugs filed against the locking code.  I'm
not stopping that from happening.  Try it and if it works over a few milestones
("works" meaning you get 0 or fewer bugs about stuck/broken locks due to
NFS-mounted profile dirs), maybe we should consider using fcntl for XP_UNIX.

/be

---------------------

Looks to me like the number of servers without working fcntl locking is
definitely non-0, unless it's simply a bug in the non-symlink code.

I would assume there are other solutions to the signal() issue.  The "we had our
IP address change" issue is real and was not really considered in the design for
the locking.  It's not as large as you'd think since it only bites you if you
crash mozilla (leaves symlink w/ wrong IP).  To some extent that could be
avoided by checking that our IP hasn't changed periodically and re-writing the
link.  Ugly, but would work in most cases.  If we could be notified of IP
changes the hole would be very small (you'd think there'd be a way; it is rather
important to a number of apps such as things that do H.323/SIP, etc).
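To make the symlink discussion concrete, here is a rough sketch of a symlink-style lock whose target records "host:pid", plus the kind of staleness check that the IP-change problem defeats. This is an illustration under assumptions, not the code from nsProfileAccess.cpp: the helper names, the "host:pid" target format, and passing the host identifier in as a string are all hypothetical simplifications.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper. symlink() is atomic and fails with EEXIST if the
// name already exists, so it works as a lock without needing lockd.
bool TrySymlinkLock(const std::string& lockPath, const std::string& hostId) {
    char target[256];
    std::snprintf(target, sizeof target, "%s:%ld", hostId.c_str(),
                  static_cast<long>(getpid()));
    return symlink(target, lockPath.c_str()) == 0;
}

// Returns the "host:pid" string stored in the lock, or "" on error.
std::string ReadLockHolder(const std::string& lockPath) {
    char buf[256];
    ssize_t n = readlink(lockPath.c_str(), buf, sizeof buf);
    if (n < 0) {
        return std::string();
    }
    return std::string(buf, static_cast<size_t>(n));
}

// A lock is only provably stale when it names THIS host and the recorded
// pid no longer exists. If the host part differs we cannot tell, which is
// exactly the hole left when our own IP changed after a crash.
bool IsLockStale(const std::string& holder, const std::string& myHostId) {
    std::string::size_type colon = holder.rfind(':');
    if (colon == std::string::npos) {
        return false;
    }
    if (holder.substr(0, colon) != myHostId) {
        return false;  // looks like another machine; must assume it is live
    }
    long pid = std::strtol(holder.c_str() + colon + 1, nullptr, 10);
    if (pid <= 0) {
        return false;
    }
    // Signal 0 performs only the existence/permission check.
    return kill(static_cast<pid_t>(pid), 0) == -1 && errno == ESRCH;
}
```

Note that the pid check also shows the PID rollover concern from comment #5: after a reboot an unrelated process may have the recorded pid, and the lock then looks live.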
> If Camino switched from the "symlink then fcntl" algorithm, that's a problem.

That depends on for whom (the minority vs. the majority). The number of stuck
lock bugs on OS X has been zero since we went to fcntl locking. It's more
reliable on the local machine, and that's not going to change.
Clearly, though, we need a solution that works for profiles on NFS home dirs as
well as local home dirs. I am working on a patch now for OS X which should do
that. Somebody suffering from the NFS home dir problem has offered to help me test.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.4beta
If you need more testers, then feel free to contact me at t.bubeck@reinform.de. 
I'm using Mac OS 10.2.5 together with RHAT 7.1 running 2.4.19 as NFS Server.
Ok.  If possible, I'd prefer a solution that (on NFS from OS/X) uses the
symlink.  That would allow an NFS-mounted profile to be used from Linux/BSD/etc
as well as OS/X.

Hmm.  Question: what if the profile is on a local OS/X volume - but the volume
is exported via NFS and may be used from elsewhere as well?  Does/will the code
you're working on handle that case?
Flags: blocking1.4b?
Apparently this bug is different from bug 90682? A regression?
Keywords: regression
*** Bug 90682 has been marked as a duplicate of this bug. ***
I too am happy to test fixes for this bug.
I am also willing to test (can't test anything else until it is fixed!) : (viv <at> ic.ac.uk)

I have a related problem that I would love to see fixed that earlier versions exhibit (latest Netscape 
for example):  When it starts up, it takes ages to load, and it causes the nfs file system containing 
my home directory to be mounted a second time under /Volumes/mathew (the server name is 
mathew, the original mount point is /Users/viv).  If I close and reopen Netscape, the volume is 
mounted a third time, and so on.  It isn't simple to unmount these unwanted mounts, 
unpredictable things happen.  Usually the system reports they are "busy" but once when it didn't I 
then found it thought my Trash folder had become my ~/Library folder.  Nothing else I run causes 
anything like this except the mozilla engine.
Blocks: 101953
*** Bug 206632 has been marked as a duplicate of this bug. ***
*** Bug 207259 has been marked as a duplicate of this bug. ***
*** Bug 207265 has been marked as a duplicate of this bug. ***
*** Bug 207412 has been marked as a duplicate of this bug. ***
Attached patch: patch
Patch allows OS X to work over NFS by using fcntl first and, if that fails, it
uses the symlink approach. There's the caveat that running the same profile
locally on the NFS server and remotely from an NFS client won't work - the lock
will go undetected. I think we'll have to live with that.

diff came up with the wackiest interpretation of the changes, so I'll describe
them a bit and then attach the whole new file. Both the fcntl locking code and
the symlink locking code were factored out of nsProfileLock::Lock and into
separate routines. Pretty much straight copy and paste. Since we're no longer
making Mac CFM builds, all the glop I put in to allow CFM code to call through
to Mach-O code (fcntl) could be removed.
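The control flow described above (fcntl first, symlink only if fcntl fails) can be sketched roughly as follows. This is not the attached patch; the function names, the LockResult enum, and the symlink target format are hypothetical. It also assumes that the fallback is skipped when fcntl reports the lock as held by another process, since falling back in that case would defeat the lock; the comment above only says "if that fails", so treat that detail as an assumption.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <string>

enum class LockResult { AcquiredFcntl, AcquiredSymlink, HeldByOther, Error };

// Step 1: non-blocking exclusive fcntl lock on a lock file.
static LockResult LockWithFcntl(const std::string& path, int* outFd) {
    int fd = open(path.c_str(), O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        return LockResult::Error;
    }
    struct flock lock = {};
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;  // l_start = 0, l_len = 0: whole file
    if (fcntl(fd, F_SETLK, &lock) == 0) {
        *outFd = fd;           // keep open: closing releases the lock
        return LockResult::AcquiredFcntl;
    }
    int saved = errno;
    close(fd);
    // EACCES/EAGAIN: someone else holds it. Anything else (e.g. ENOLCK
    // on NFS without working lockd) is treated as "locking unavailable".
    return (saved == EACCES || saved == EAGAIN) ? LockResult::HeldByOther
                                                : LockResult::Error;
}

// Step 2: atomic symlink creation as the lock.
static LockResult LockWithSymlink(const std::string& path) {
    char target[64];
    std::snprintf(target, sizeof target, "pid:%ld",
                  static_cast<long>(getpid()));
    if (symlink(target, path.c_str()) == 0) {
        return LockResult::AcquiredSymlink;
    }
    return errno == EEXIST ? LockResult::HeldByOther : LockResult::Error;
}

// fcntl first; fall back to the symlink only when fcntl itself is unusable.
LockResult LockProfile(const std::string& fcntlPath,
                       const std::string& symlinkPath, int* outFd) {
    *outFd = -1;
    LockResult r = LockWithFcntl(fcntlPath, outFd);
    if (r != LockResult::Error) {
        return r;
    }
    return LockWithSymlink(symlinkPath);
}
```

The caveat in the comment above falls out of this shape: a process on the NFS server succeeds at step 1 and never creates the symlink, while a remote client that cannot use fcntl only looks at the symlink, so neither sees the other's lock.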
I rebuilt Mozilla with sources from May 31 and the patch applied.
I can confirm the patch works for me and Mozilla starts without a problem.
The NFS server is running FreeBSD 4.8.
Comment on attachment 124460
patch

reviewers - see comment 20
Attachment #124460 - Flags: superreview?(brendan)
Attachment #124460 - Flags: review?(bryner)
Comment on attachment 124460
patch

Looks good to me. r=bryner.
Attachment #124460 - Flags: review?(bryner) → review+
Comment on attachment 124460
patch

sr=brendan@mozilla.org

/be
Attachment #124460 - Flags: superreview?(brendan) → superreview+
Checked into trunk, will ask for approval for 1.4.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment on attachment 124460
patch

Seeking approval for 1.4. 

Code is only different after fcntl lock fails on Mach-O. At that point, without
the patch, we're  doomed anyway (forced to quit).
Attachment #124460 - Flags: approval1.4?
reporter, commenters,

will you try this now and verify fix?
Everything OKAY on this nightly build

Mozilla 1.5a
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.5a) Gecko/20030604

on Mac OS X 10.2.5 using a NFS home mounted from Linux 2.4.18.

Thanks!
thanks!

verified
Status: RESOLVED → VERIFIED
Comment on attachment 124460
patch

a=asa (on behalf of drivers) for checkin to the 1.4 branch.
Attachment #124460 - Flags: approval1.4? → approval1.4+
a=adt Please land this fix on the 1.4 Branch and add the keyword fixed1.4
Checked into branch.
Keywords: fixed1.4
*** Bug 192737 has been marked as a duplicate of this bug. ***
This may have caused regression bug 209048 ... ;-((
mozilla1.4 shipped. unsetting blocking1.4 request.
Flags: blocking1.4?
There is a report suggesting that this bug has recurred.
Can someone check?
   bug 234395
(In reply to comment #37)
> There is a report suggesting that this bug has recurred.
> Can someone check?
>    bug 234395
> 

That bug, while the end result is similar, is a different problem.
Product: Core → Core Graveyard