fcntl file locking needs to set other fields in struct flock.

RESOLVED FIXED in 4.1.1

Status

Type: defect
Priority: P3
Severity: normal
Status: RESOLVED FIXED
Opened: 19 years ago
Updated: 18 years ago

People

(Reporter: wtc, Assigned: wtc)

Tracking

Target Milestone: 4.1.1
Platform: x86, Neutrino

One implementation of NSPR file locking functions
is based on fcntl().  (Currently only Neutrino uses
this implementation.)  It only sets the l_type field
of struct flock before passing it to fcntl().  Other
fields, such as l_whence, l_start, and l_len, should
also be set.
Posted patch: Proposed fix.
See W. Richard Stevens, Advanced Programming in
the Unix Environment, Program 12.2 on p. 370 for
sample code.
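
For readers without the book at hand, here is a minimal sketch of what setting every relevant field looks like. It is not the attached patch and not the NSPR source; the helper name lock_whole_file is hypothetical and assumes the goal is to lock or unlock the entire file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the NSPR patch): lock or unlock a whole file.
 * cmd  is F_SETLK (fail if busy) or F_SETLKW (block until available).
 * type is F_RDLCK, F_WRLCK, or F_UNLCK.
 * Returns whatever fcntl() returns: 0 on success, -1 with errno set.
 */
static int lock_whole_file(int fd, int cmd, short type)
{
    struct flock arg;

    assert(fd >= 0);
    memset(&arg, 0, sizeof(arg)); /* no field is left uninitialized */
    arg.l_type = type;            /* kind of lock, or F_UNLCK to release */
    arg.l_whence = SEEK_SET;      /* l_start is measured from start of file */
    arg.l_start = 0;              /* region begins at byte 0 */
    arg.l_len = 0;                /* 0 means "through end of file" */
    return fcntl(fd, cmd, &arg);
}
```

A caller would use lock_whole_file(fd, F_SETLKW, F_WRLCK) to acquire and lock_whole_file(fd, F_SETLK, F_UNLCK) to release. If only l_type is set, the other fields hold whatever happened to be on the stack, so the locked region is unpredictable.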

Jerry, could you test the proposed fix?
mozilla/nsprpub/pr/tests/lockfile.c is a test
program for PR_LockFile.  Note that PR_LockFile
is not used by Mozilla.
Status: NEW → ASSIGNED
Checked in the updated patch on the main trunk
and NSPRPUB_RELEASE_4_1_BRANCH.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.1.1
There are two problems with fcntl file locking in multi-threaded applications.
I'll describe them below.  Because of these problems, fcntl file and record 
locking using F_SETLKW is unsuitable for use in multi-threaded applications, 
IMO.  So, I wonder if it is wise to implement PR_LockFile with fcntl.

fcntl locks are associated with a process, not with individual threads.  
This creates two problems.

1. If one thread holds a lock on a file (or record in a file) and
a second thread in the same process attempts to grab the lock, the 
second thread does not block.  Rather, the fcntl call succeeds, 
returning an indication that the lock has been acquired.  Thereafter
the two threads both think they have exclusive access when they don't.
This can be worked around by guarding the fcntl lock with a mutex
inside the process.
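
Problem 1 can be observed without threads at all, because the kernel only sees the owning process. The sketch below is an illustration under that assumption, with hypothetical helper names: it takes a write lock through one descriptor and then asks for the same lock through a second descriptor in the same process. With traditional POSIX fcntl locks the second request is expected to succeed rather than fail or block.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to write-lock the whole file without blocking. */
static int try_write_lock(int fd)
{
    struct flock arg;

    memset(&arg, 0, sizeof(arg));
    arg.l_type = F_WRLCK;
    arg.l_whence = SEEK_SET;
    arg.l_start = 0;
    arg.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &arg);
}

/*
 * Returns 1 if a second lock request from the same process succeeds while
 * the first lock is still held, 0 if it is refused, -1 on a setup error.
 */
static int second_lock_in_same_process_succeeds(const char *path)
{
    int fd1, fd2;
    int result = -1;

    assert(path != NULL);
    fd1 = open(path, O_RDWR | O_CREAT, 0600);
    fd2 = open(path, O_RDWR);
    if (fd1 >= 0 && fd2 >= 0 && try_write_lock(fd1) == 0) {
        /* Same process, different descriptor: fcntl sees no conflict. */
        result = (try_write_lock(fd2) == 0) ? 1 : 0;
    }
    if (fd2 >= 0) close(fd2);    /* closing also drops the process's locks */
    if (fd1 >= 0) close(fd1);
    return result;
}
```

Two threads calling fcntl on the same file are in the same position as the two descriptors here, which is why a separate in-process mutex is needed to keep them apart.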

2. A more serious problem is the "false deadlock" problem.  fcntl
will sometimes return -1 with errno EDEADLK instead of blocking,
even when there is not really a deadlock.  An example will illustrate
this.

Imagine two libraries, libA and libB, each of which uses fcntl to lock 
a file, fileA and fileB respectively.  Each library uses fcntl to lock 
just its one file, and never returns from the library function while 
holding its lock, so it cannot deadlock.  

Now imagine a multi-threaded multi-process program.  Each process has
one thread that uses libA and a second thread that uses libB.  The 
following sequence occurs.

Process 1, thread A, libA, gets lock on file A.
Process 2, thread B, libB, gets lock on file B.
Process 1, thread B, libB, uses F_SETLKW to get lock on file B and blocks.
Process 2, thread A, libA, uses F_SETLKW to get lock on file A, and fails,
                           getting EDEADLK.  

There is no real deadlock here.  There is no lock ordering problem here. 
The two threads that hold the locks on files A and B are not blocked 
waiting on the other locks.  But fcntl doesn't see the threads.  It 
sees only the processes.  To fcntl, it appears that
Process 1 holds the lock on A and wants the lock on B, and
Process 2 holds the lock on B and wants the lock on A, a classic deadlock.
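
The process-level view can be made concrete with a toy model. This is a sketch for illustration only, not kernel code and not how any particular system implements its check: it assumes locks and waits are recorded per process, with no notion of threads, and replays the four steps above.

```c
#include <assert.h>

#define NPROC 2
#define NFILE 2
#define NONE (-1)

/* Toy model: everything is tracked per process; threads are invisible. */
struct lock_table {
    int owner[NFILE];     /* process holding the lock on each file, or NONE */
    int waits_for[NPROC]; /* file each process is blocked on, or NONE */
};

/*
 * Would blocking process `pid` on `file` close a cycle of waiting processes?
 * Intended for the case where `file` is currently locked by another process.
 */
static int would_deadlock(const struct lock_table *t, int pid, int file)
{
    int steps;
    int f = file;

    assert(pid >= 0 && pid < NPROC);
    for (steps = 0; steps < NPROC; steps++) {
        int holder = t->owner[f];
        if (holder == NONE) return 0;  /* lock is free: no cycle */
        if (holder == pid) return 1;   /* chain leads back to the requester */
        f = t->waits_for[holder];
        if (f == NONE) return 0;       /* holder is not blocked: no cycle */
    }
    return 0;
}

/* Replays the sequence above; files A=0, B=1; processes 1 and 2 are 0 and 1. */
static int false_deadlock_scenario(void)
{
    struct lock_table t = { { NONE, NONE }, { NONE, NONE } };

    t.owner[0] = 0;                    /* process 1, thread A locks file A */
    t.owner[1] = 1;                    /* process 2, thread B locks file B */
    if (would_deadlock(&t, 0, 1)) return -1;
    t.waits_for[0] = 1;                /* process 1, thread B blocks on B */
    return would_deadlock(&t, 1, 0);   /* process 2, thread A requests A */
}
```

In this model false_deadlock_scenario() returns 1, meaning the last request is reported as a deadlock even though the threads holding the two locks are both free to run and release them.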

IMO, this problematic behavior of fcntl F_SETLKW in multi-threaded apps
is a serious shortcoming of fcntl record locking.  
This is why PR_LockFile is not in the public NSPR API.