Closed Bug 303633 Opened 19 years ago Closed 19 years ago

Profile manager reports all profiles as locked

Categories

(Core Graveyard :: Profile: BackEnd, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: MatsPalmgren_bugz, Assigned: MatsPalmgren_bugz)

References

Details

(Keywords: fixed1.8, regression)

Attachments

(2 files, 1 obsolete file)

STEPS TO REPRODUCE
1. have your home directory NFS-mounted
2. start SeaMonkey (Linux/GTK2/i386)

ACTUAL RESULTS
Profile manager says all profiles are locked, even creating a new one and
choosing it is reported as locked.
If I start it using "HOME=/tmp seamonkey" then it works, so I guess this
has something to do with my HOME being NFS-mounted...

The regression occurred between 2005-08-02-03 and 2005-08-03-04.
Looking at the checkins, I'm guessing it was caused by bug 151188.
Not sure if this was intentional or not.
Depends on: 151188
This was not intended.  It sounds as though file locking over NFS is still
busted for some scenarios.  Mats, can you strace and post a log of system calls?

/be
Attached file strace (obsolete)
My home directory is /home/mats
Attached file strace 2
This includes sub-processes as well... I had to delete a large chunk at the
beginning and end to make it fit within the maximum attachment size 300kb.
Hopefully I kept the bit you were interested in.
Attachment #191797 - Attachment is obsolete: true
With a debug build I get:

fcntl(F_SETLK) failed. errno = 13
###!!! ASSERTION: Could not get profile directory lock.: 'Error', file
nsProfile.cpp, line 1175
(In reply to comment #5)
> With a debug build I get:
> 
> fcntl(F_SETLK) failed. errno = 13
> ###!!! ASSERTION: Could not get profile directory lock.: 'Error', file
> nsProfile.cpp, line 1175

Great, EACCES ("Permission denied").

I ported Sun's original vnode-based NFS to SGI Irix almost 20 years ago.  The
design decision to keep NFS stateless, which was in the end hopeless, meant that
the "lock manager" would have to solve the obvious hard problem, later (unobvious
hard problems NFS ended up having to solve involved avoiding replay of mutating
operations due to its crappy UDP-based error control, via an "idempotency cache"
indexed by RPC transaction id).

Being stateless at one layer pushes problems up the stack; it can be a good
trade-off for the whole system, or it can be a shortcut for implementors working
at only the lower layer.  With NFS I think the latter was more the case.  The
gain of a simpler (nominally; Sun RPC in full is not simple) protocol was offset
by the complexity of recovering Unix-ish stateful semantics, which costs were
borne by many others, not by the NFS designers.

When lockd first appeared several years later, it was a bug-ridden hunk of junk.
 The situation did not improve while I was paying attention, wherefore the whole
pid-signed symlink profile locking scheme that I created first in 1995 for
Netscape 2.

Anyway, is locking over NFS still just unreliable?  Or is it possible that your
home directory is exported with some option preventing you from locking your own
files?  If you can write to files in your NFS-mounted profile directory, you
ought to be able to lock them.  Perhaps the problem is that you get EACCES
always for any attempt to lock a symlink?

We should be able to work around this, whatever it is, but it would be best to
understand the problem.

/be
(In reply to comment #6)
> When lockd first appeared several years later, it was a bug-ridden hunk of junk.

After some testing, it appears it still is.

My NFS server is a FreeBSD 5.3/i386.
My NFS client is a Linux 2.6/i386 (SuSE 9.3).

I can create files but not lock them with fcntl().

It turns out 'lockd' does not come for free when you enable the NFS
server and export filesystems on FreeBSD; you have to explicitly say
(in /etc/rc.conf):
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
in addition to:
nfs_server_flags="-u -t -n 2 -h 192.168.1.1"
nfs_server_enable="YES"
which I already had.

So after adding that and rebooting both the client and server my test
program reports that fcntl(F_SETLK) succeeds.
The problem is that now my KDE desktop (on the client) is not very happy,
I can log in, but no new windows appear and it eventually just freezes.

I have a second NFS server that runs Linux 2.4/i386 and
if I set my HOME to a directory there I get:
fcntl(F_SETLK) failed. errno = 37
(ENOLCK          37      /* No record locks available */)

This system seems to have started 'lockd' automatically so I don't know
why it gives an error, but since the error isn't EAGAIN or EACCES
SeaMonkey falls back to using "LockWithSymlink()" which works.
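
The fallback behaviour described above can be sketched in C roughly as follows. This is only an illustration, not the code in nsProfile.cpp: the enum lock_result and the functions classify_setlk_errno() and try_fcntl_lock() are hypothetical names. It assumes, as reported above, that EAGAIN or EACCES from F_SETLK is read as "locked by someone else", while any other errno (such as ENOLCK) means the caller should try the symlink lock instead.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not the names used in nsProfile.cpp. */
enum lock_result { LOCK_OK, LOCK_HELD_BY_OTHER, LOCK_TRY_SYMLINK };

/* Map the errno left by a failed fcntl(F_SETLK) to an action. */
static enum lock_result classify_setlk_errno(int err)
{
    if (err == EAGAIN || err == EACCES)
        return LOCK_HELD_BY_OTHER;  /* documented meaning: a conflicting lock exists */
    return LOCK_TRY_SYMLINK;        /* e.g. ENOLCK: fcntl locking is unusable here */
}

/* Try to take a whole-file write lock on an already open descriptor. */
static enum lock_result try_fcntl_lock(int fd)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                   /* 0 means "to end of file" */

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return LOCK_OK;
    return classify_setlk_errno(errno);
}
```

With this mapping, errno 13 (EACCES) from a server without a working lockd is indistinguishable from a genuinely locked profile, which matches the symptom in this bug, while errno 37 (ENOLCK) takes the symlink path.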
Attached patch fix?
So after disabling my flakey 'lockd' again this is a workaround that works
for me. It does a "fcntl(F_GETLK)" before doing the real lock operation
and if it fails we return NS_ERROR_FAILURE instead.
Normally "fcntl(F_GETLK)" should succeed whether the file is locked or not,
so a failure would indicate that fcntl() isn't working as expected for this
file. The Linux man-page fcntl(2) says:

F_GETLK
       On  input to this call, lock describes a lock we would like to place
       on the file.  If the lock could be placed, fcntl() does not actually
       place it, but returns F_UNLCK in the l_type field of lock and leaves
       the other fields of the structure unchanged.  If one or more  incom
       patible	locks  would  prevent  this lock being placed, then fcntl()
       returns details about one of these locks in  the  l_type,  l_whence,
       l_start,  and  l_len  fields of lock and sets l_pid to be the PID of
       the process holding that lock.

F_SETLK
       Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a lock
       (when l_type is F_UNLCK) on the bytes  specified  by  the  l_whence,
       l_start, and l_len fields of lock.  If a conflicting lock is held by
       another process, this call returns -1 and sets errno  to  EACCES  or
       EAGAIN.

I think it resolves the ambiguity of EACCES for F_SETLK, what do you think?
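
For readers without the attachment, the idea can be sketched in C roughly like this. It is a minimal sketch under stated assumptions, not the attached patch itself: the enum probe_result and the function lock_with_probe() are hypothetical names, and the real code returns NS_ERROR_FAILURE rather than an enum value when the probe fails.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum probe_result { PROBE_LOCKED_OK, PROBE_HELD_BY_OTHER, PROBE_UNUSABLE };

/* Probe with F_GETLK first, then attempt the real whole-file write lock. */
static enum probe_result lock_with_probe(int fd)
{
    struct flock fl = {0};
    struct flock probe;

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                   /* 0 means "to end of file" */

    /* F_GETLK only queries, so it should succeed whether or not the file
       is locked. A failure here suggests fcntl() locking does not work
       for this file at all (for example, no usable lockd on the server). */
    probe = fl;
    if (fcntl(fd, F_GETLK, &probe) == -1)
        return PROBE_UNUSABLE;

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        /* After a successful probe, EACCES/EAGAIN can be read as
           "another process holds the lock". */
        if (errno == EACCES || errno == EAGAIN)
            return PROBE_HELD_BY_OTHER;
        return PROBE_UNUSABLE;
    }
    return PROBE_LOCKED_OK;
}
```

A caller would treat PROBE_UNUSABLE as the cue to fall back to the symlink lock and report "profile locked" only for PROBE_HELD_BY_OTHER.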
Comment on attachment 191833
fix?

The second hunk fixes an unrelated compilation warning.
Comment on attachment 191833
fix?

Thanks, this looks good to me.  I can't believe lockd hasn't been fixed in all
these years and Unix/Linux rewrites.

/be
Attachment #191833 - Flags: superreview?(roc)
Attachment #191833 - Flags: review+
Attachment #191833 - Flags: approval1.8b4?
Flags: blocking1.8b4+
Comment on attachment 191833
fix?

looks good. thanks for handling this
Attachment #191833 - Flags: superreview?(roc) → superreview+
Attachment #191833 - Flags: approval1.8b4? → approval1.8b4+
Assignee: roc → mats.palmgren
Checked in to trunk at 2005-08-08 19:40 PDT.

-> FIXED
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Keywords: fixed1.8
Product: Core → Core Graveyard