Open Bug 1780004 Opened 2 years ago Updated 2 years ago

isOtherInstanceRunning() is unexpectedly false when /tmp/MozillaUpdateLock* is removed (e.g. by OS)

Categories

(Toolkit :: Application Update, defect, P3)

defect

Tracking

()

Tracking Status
firefox-esr91 --- wontfix
firefox-esr102 --- affected
firefox102 --- wontfix
firefox103 --- wontfix
firefox104 --- wontfix
firefox105 --- wontfix
firefox106 --- wontfix

People

(Reporter: robwu, Unassigned)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [fidedi-ope])

The update manager relies on a file in a temporary location (i.e. MozillaUpdateLock) to detect whether other instances are active. This is not reliable, because the file can freely be deleted by the OS. Consequently, Firefox binaries can be replaced even with existing Firefox instances still running.

For example, on macOS temporary files are deleted after three days (according to atime/mtime of the file).

The user impact is that after ~3-4 days, a new Firefox instance triggers an application update (replacement) and breaks existing Firefox instances. A Firefox launch or quit does not touch atime/mtime of the MozillaUpdateLock, so the clock starts ticking since the first Firefox launch (you can verify this by using stat -x /private/tmp/MozillaUpdateLock-31210A081F86E80E, which shows that all timestamps are not changed).

STR:

  1. Start Firefox
  2. Open another instance of Firefox, e.g.
    '/Applications/Firefox Nightly.app/Contents/MacOS/firefox' --no-remote -profile profdir
    (or web-ext run -f nightly, to load Firefox with the web-ext tool)
  3. Visit a privileged page and open the console there (or just use the Browser Console if you've enabled browser chrome debugging), and run the following snippet:
    Cc["@mozilla.org/updates/update-sync-manager;1"].getService(Ci.nsIUpdateSyncManager).isOtherInstanceRunning();
  4. Look in /tmp/ for the MozillaUpdateLock file. E.g. /private/tmp/MozillaUpdateLock-31210A081F86E80E
  5. Wait 4 days (on macOS), without closing Firefox.
  6. Repeat step 4 (to observe that the file has been removed by macOS)
  7. Open a new instance of Firefox (with a new profile directory).
  8. Repeat step 3 to see whether thew Firefox instance is aware of the other instances.
  9. Run lsof | grep MozillaUpdateLock to see the open file handles.

Expected:

  • At step 3, the result of isOtherInstanceRunning() is true.
  • At step 4, the MozillaUpdateLock file exists.
  • At step 6, the MozillaUpdateLock file has been removed by the OS.
  • At step 8, the result of isOtherInstanceRunning() should be true.
  • At step 9, all three Firefox instances should have a reference to the same MozillaUpdateLock file AND the same inode.

Actual:

  • Step 3, 4, 6 are as expected (confirming that the test setup is valid).
  • At step 8, isOtherInstanceRunning() is false. because of this, Firefox will auto-update itself and cause bugs like https://github.com/mozilla/web-ext/issues/2383
  • At step 9, while the first two Firefox instances share the same MozillaUpdateLock file, the third one has a new file, as seen by the different inode.

Example of lsof output. Note that 61581958 is the inode of the first two launches, and 61650972 is the inode of the last launch (firefox-b is firefox-bin here via web-ext).

firefox   63019  user  14r      REG                1,4          0            61581958 /private/tmp/MozillaUpdateLock-31210A081F86E80E
firefox-b 63300  user  14r      REG                1,4          0            61581958 /private/tmp/MozillaUpdateLock-31210A081F86E80E
firefox-b 69254  user  14r      REG                1,4          0            61650972 /private/tmp/MozillaUpdateLock-31210A081F86E80E

Set release status flags based on info from the regressing bug 1553982

This feels like it's quite edge case-y given how long an instance needs to be running for to trigger it in the example given. Nonetheless, it seems like we ought to renew this lock from time to time (perhaps tying it in with update checks could make sense, given that those are somewhat related, and already on a fairly long timer). We even have code for doing so (that's currently only used by tests AFAICT).

Severity: -- → S3
Priority: -- → P3

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #2)

This feels like it's quite edge case-y given how long an instance needs to be running for to trigger it in the example given.

3-4 days is not an edge case. A device can be suspended without quitting Firefox, to easily end up with a multi-day browsing session.
If my SQL query is correctly written (I am no expert), then CDF of longest session lengths of clients of Firefox Desktop in the past 28 days (1% sample) shows that 10% of the Nightly users have 5+ day sessions. So despite the daily nags to download the update, 10% don't update (or at least they don't restart after an update).

Nonetheless, it seems like we ought to renew this lock from time to time (perhaps tying it in with update checks could make sense, given that those are somewhat related, and already on a fairly long timer). We even have code for doing so (that's currently only used by tests AFAICT).

That sounds reasonable.

Another option is to write to a location where the lock file is not automatically nuked by a periodic task. Or is that not an option for some reason?

:mhowell, since you are the author of the regressor, bug 1553982, could you take a look?
For more information, please visit auto_nag documentation.

Flags: needinfo?(mhowell)

(In reply to Rob Wu [:robwu] from comment #3)

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #2)

This feels like it's quite edge case-y given how long an instance needs to be running for to trigger it in the example given.

3-4 days is not an edge case. A device can be suspended without quitting Firefox, to easily end up with a multi-day browsing session.
If my SQL query is correctly written (I am no expert), then CDF of longest session lengths of clients of Firefox Desktop in the past 28 days (1% sample) shows that 10% of the Nightly users have 5+ day sessions. So despite the daily nags to download the update, 10% don't update (or at least they don't restart after an update).

Nonetheless, it seems like we ought to renew this lock from time to time (perhaps tying it in with update checks could make sense, given that those are somewhat related, and already on a fairly long timer). We even have code for doing so (that's currently only used by tests AFAICT).

That sounds reasonable.

Another option is to write to a location where the lock file is not automatically nuked by a periodic task. Or is that not an option for some reason?

I talked to Molly (the original author of this) for some context. She said it was chosen because there was no real equivalent of ProgramData (where we put these on Windows) and wanting to have some level of confidence that the file would eventually get deleted. She also suggested that we could bump atime whenever we check the lock state - which seems like a good idea to me.

Flags: needinfo?(mhowell)
Whiteboard: [fidedi-ope]

Set release status flags based on info from the regressing bug 1553982

Should we bump the priority here given comment 3?

Flags: needinfo?(bhearsum)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #7)

Should we bump the priority here given comment 3?

Going to defer this to Kirk, who has much more context/history.

Flags: needinfo?(bhearsum) → needinfo?(bytesized)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #7)

Should we bump the priority here given comment 3?

I'm inclined to say no. I would like to fix this soonish. But AFAIK this is only used by the update system. And it times out waiting for other instances to close after 6 hours. If the user's session has already been running for 3 days, it doesn't feel urgent that we make sure that we wait another 6 hours for other instances to close.

That being said, I will see about bumping this a bit further up my to do list.

Flags: needinfo?(bytesized)

(In reply to Kirk Steuber (he/him) [:bytesized] from comment #9)

But AFAIK this is only used by the update system. And it times out waiting for other instances to close after 6 hours. If the user's session has already been running for 3 days, it doesn't feel urgent that we make sure that we wait another 6 hours for other instances to close.

I filed this bug because I was fed up with my browser becoming unusable after having launched an independent new instance. Your comment suggests that even if the identified issue were to be fixed, that the updater would still update after 6 more hours.

If this is indeed the case, then I'd like to know what I can do to start a new instance of Firefox without replacing/updating the binary. I'd like to use whatever supported mechanism you recommend in the web-ext CLI tool, to launch Firefox without side effects.

(In reply to Rob Wu [:robwu] from comment #10)

I filed this bug because I was fed up with my browser becoming unusable after having launched an independent new instance. Your comment suggests that even if the identified issue were to be fixed, that the updater would still update after 6 more hours.

Actually, it might not even get you the additional 6 hours. We check if there is currently another instance running when we download and stage a new update. If there is only one instance running at that point, we will do that without waiting the additional 6 hours. When Firefox is launched with a new profile and there is an update staged, it will immediately be installed.

If this is indeed the case, then I'd like to know what I can do to start a new instance of Firefox without replacing/updating the binary. I'd like to use whatever supported mechanism you recommend in the web-ext CLI tool, to launch Firefox without side effects.

Sadly, until we are able to fix Bug 1480452, this is pretty much the expected behavior. Really the only way to get around it would be to use an Enterprise Policy like DisableAppUpdate or ManualAppUpdateOnly to prevent Firefox from ever updating automatically, putting the burden on the user to make sure it stays up to date.

You need to log in before you can comment on or make changes to this bug.