Closed Bug 307147 Opened 19 years ago Closed 3 years ago

Firefox should be a lot smarter about detecting stale lock files

Categories

(Toolkit :: Startup and Profile System, defect)

defect
Not set
normal

RESOLVED INCOMPLETE

People

(Reporter: gerv, Unassigned)

Details

If Firefox (or Thunderbird) crashes, sometimes a lock file can be left behind in
the profile. At least on Linux, you have to delete this manually before you can
use the profile again. For those who don't know how to do this, they've
basically lost all their profile info (bookmarks etc.). I get complaining emails
about this regularly, and it happened to a friend of mine. And helping people
delete stale lockfiles over the phone is really hard.

Would it be possible for the lockfile to contain info which let a new process
determine whether the creator of the lockfile was actually still running?

Gerv
This has already been addressed by patches in bug 253950 and bug 302039, as well
as in the still-open bug 286355. Is there something here that those bugs don't
cover?
Is this not covered by bug 151188? In the current m1.8branch 2005-09-01-14Z fx on
Linux, I get

lrwxrwxrwx /home/tt/.mozilla/firefox/atv3emgs.test1/lock -> 127.0.0.1:+6454

which is left behind after a kill -KILL 6454 of the fx running on that profile.
Then, on restarting with that profile, fx starts normally: the lock is silently
ignored and replaced, as expected.
Ah, reading this again I see that my comments are somewhat orthogonal to this
bug as filed. Tuukka found the right bug, though :).
Tuukka: so that's OK for Linux, but what about Windows? Does it have a similar
system? (That's where the complaints I've had have been coming from.)

Gerv
no idea, hence didn't dup :)
It also needs to be smarter about determining whether lockfiles even exist.  This may be worthy of a separate bug, but there seems to be resistance to more bugs on the lockfile behaviour...

I mount my home directory over NFS, and if the NFS server is under heavy load, the local Firefox reports 'not responding, blah blah' rather than retrying with a slightly longer timeout, which IMHO would be the correct behaviour. Moving the profile directory off the NFS share solves the problem. Note that the NFS server _is_ reachable, as Firefox can read the profiles.ini file to track down the new profile location.

Firefox is the only program I see failing to open files in this situation. Does it do something funky during the file-exists check?

I'm open to testing patches in my environment, btw.

/home/dave is NFS-mounted with default mount options, and exported with (rw,sync,no_root_squash)
----
/home/dave/.mozilla/firefox/profiles.ini

WORKING: 
[Profile0] 
Name=default
IsRelative=0
Path=/home/dave_local_storage/firefox_profile

BUSTED:
[Profile0]
Name=default
IsRelative=1
Path=wtt3nmfx.default
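To make the difference between the two configurations concrete, here is a small Python sketch (the helper name is hypothetical) that resolves the `Path` entry of a profiles.ini section. With `IsRelative=1` the path is taken relative to the directory containing profiles.ini, so in the BUSTED case the profile, and the lock inside it, sits on the NFS-mounted home directory, while the WORKING case points to local storage.

```python
import configparser
import os


def resolve_profile_path(ini_text: str, ini_dir: str, section: str = "Profile0") -> str:
    """Return the profile directory described by one profiles.ini section."""
    # interpolation=None so a literal '%' in a path is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    entry = parser[section]
    path = entry["Path"]
    if entry.get("IsRelative", fallback="0") == "1":
        # Relative paths are resolved against the directory that holds profiles.ini.
        path = os.path.join(ini_dir, path)
    return os.path.normpath(path)
```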
(In reply to comment #6)
> It also needs to be smarter about determining whether lockfiles even exist.

That sounds rather more like bug 278860, no?
*** Bug 327432 has been marked as a duplicate of this bug. ***
(In reply to comment #1)
> This has already been addressed by patches in bug 253950 and bug 302039, as well
> as in the still-open bug 286355. Is there something here that those bugs don't
> cover?

Yes. Please see bug 327432, which has just been marked a duplicate of this bug.
*** Bug 337348 has been marked as a duplicate of this bug. ***
I have this issue when using several hosts with the homedir mounted via NFS.
Mozilla can't detect a previously crashed instance when that instance ran on a different host. Okay, this is quite tricky, I admit.

If pipes or Unix sockets were passed properly over NFS (my tests showed they aren't), we could use them. But there may be other ways:

+ maybe ident could be used to check whether the process is still running? (I don't know much about it.)
+ use flock(), but this also doesn't work reliably w/ NFS (at least linux-2.4 vs. linux-2.6 causes problems)
+ we could regularly touch the lockfile's mtime and consider lockfiles older than some age (e.g. >30 secs) as stale. Since it only touches metadata, the additional load shouldn't be noticeable on common systems, and as long as the clocks are synchronized it should work.
+ extend the remote control capability to TCP (i.e. use a key stored within the lockfile for auth) and use it for checking.
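The mtime-heartbeat idea in the third bullet can be sketched in Python as follows. This illustrates the proposal only, not anything Firefox does: the 30-second threshold is the comment's example value, the function names are hypothetical, and it assumes a regular lock file (a dangling symlink lock like the one shown in comment 2 cannot be touched with `os.utime`, which follows symlinks).

```python
import os
import time
from typing import Optional

# Example threshold from the proposal above; not a value Firefox actually uses.
STALE_AFTER_SECONDS = 30.0


def heartbeat(lock_path: str) -> None:
    """Called periodically by the running instance: set the lock's mtime to now."""
    os.utime(lock_path, None)


def age_exceeds(mtime: float, now: float, max_age: float = STALE_AFTER_SECONDS) -> bool:
    """Staleness rule: the lock is stale once it has gone unrefreshed for max_age seconds."""
    return now - mtime > max_age


def lock_is_stale(lock_path: str, now: Optional[float] = None) -> bool:
    """Compare the lock's mtime with the local clock (on NFS the mtime may come
    from the server's clock, hence the need for synchronized clocks)."""
    mtime = os.stat(lock_path).st_mtime
    return age_exceeds(mtime, time.time() if now is None else now)
```

The running instance would call `heartbeat()` well inside the threshold (say every 10 seconds, an arbitrary choice); a starting instance calls `lock_is_stale()` and may take over the profile if it returns True.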



Please don't forget that people might share profiles w/ older versions, so you can't go randomly changing the protocol; this has to work w/ older versions :).
Of course. 

What do you think about adding switches for it to the command line or environment?
(In reply to comment #0)
> If Firefox (or Thunderbird) crashes, sometimes a lock file can be left behind in
> the profile. At least on Linux, you have to delete this manually before you can
> use the profile again. For those who don't know how to do this, they've
> basically lost all their profile info (bookmarks etc.). I get complaining emails
> about this regularly, and it happened to a friend of mine. And helping people
> delete stale lockfiles over the phone is really hard.
> 
> Would it be possible for the lockfile to contain info which let a new process
> determine whether the creator of the lockfile was actually still running?
> 
> Gerv
> 

I have some 40 teachers on a Linux terminal server, and I constantly have to go in and clear locks for them. It is very easy for locks to be left behind when they power off the thin client without logging out. Firefox then tells them that it is already running and refuses to start a new copy (Thunderbird has the same problem). This happens whether or not the home directories are NFS-mounted. This is extremely user-unfriendly for people who are not Linux gurus. If I can write a script to clear these locks, why can't Firefox clear its own?
*** Bug 362381 has been marked as a duplicate of this bug. ***
Can we get any progress on this?  This is really becoming a major pain.
Looks like an NFS file locking issue:

2918  open("/home/orion/.thunderbird/jrzobt68.default/.parentlock", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 6
2918  fcntl64(6, F_GETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1, pid=0}) = 0
2918  fcntl64(6, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = -1 EAGAIN (Resource temporarily unavailable)
2918  close(6)                          = 0
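For readers unfamiliar with the trace, the following Python sketch performs roughly the same sequence: open `.parentlock` with the same flags, then request a non-blocking whole-file write lock. `fcntl.lockf` is a wrapper around the `fcntl()` locking calls; with `LOCK_EX | LOCK_NB` it raises `OSError` with errno `EAGAIN` or `EACCES` when the lock is held, which corresponds to the `EAGAIN` returned above. The sketch omits the preliminary `F_GETLK` probe, and the helper name is hypothetical. The trace alone does not show who holds the conflicting lock; on NFS the lock state is kept by the server's lock manager, so a conflict may be reported even when no live local process holds the lock.

```python
import errno
import fcntl
import os
from typing import Optional


def try_lock_profile(lock_path: str) -> Optional[int]:
    """Try to take an exclusive POSIX record lock on lock_path without blocking.

    Returns the open file descriptor on success (keep it open for the whole
    session; closing it or exiting releases the lock), or None if a
    conflicting lock is reported.
    """
    # Same open flags and mode as in the trace: O_WRONLY|O_CREAT|O_TRUNC, 0666.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    try:
        # Non-blocking exclusive lock; len defaults to 0, i.e. the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return None  # conflicting lock, like the EAGAIN in the trace
        raise
    return fd
```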

See also https://bugzilla.redhat.com/show_bug.cgi?id=435480

Product: Firefox → Toolkit

> See also https://bugzilla.redhat.com/show_bug.cgi?id=435480

That bug report has since been closed.

(In reply to Orion Poplawski from comment #19)

> Can we get any progress on this? This is really becoming a major pain.

Orion, can you still reproduce this issue?

Flags: needinfo?(orion)
Whiteboard: [closeme 2021-07-10]

Resolved per whiteboard

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(orion)
Resolution: --- → INCOMPLETE
Whiteboard: [closeme 2021-07-10]