Closed Bug 1553982 Opened 4 years ago Closed 3 years ago

Use a multi-instance lock/semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it

Categories

(Toolkit :: Application Update, enhancement, P1)

68 Branch
enhancement

Tracking

()

RESOLVED FIXED
86 Branch
Tracking Status
firefox86 --- fixed

People

(Reporter: robert.strong.bugs, Assigned: mhowell)

References

(Blocks 4 open bugs, Regressed 4 open bugs)

Details

(Whiteboard: [iu_tracking])

Attachments

(7 files)

Some time in the past mhowell suggested using a semaphore to determine whether there are other instances running. This would allow us to lessen the issue where launching an additional instance of the same installation will perform an update and thereby require the other instance to restart. Below is a very simple flow of how I think this could work.

  1. Create or open the semaphore on startup.
  2. Use the Install hash for the name.
  3. Prefix with Global\ on Windows so it is available to all users
  4. The security of the semaphore will need to allow other users to work with it (need to figure out which users should be able to). This would allow other processes to DOS the update.
  5. Permissions will need to be set on the Linux and OS X semaphores files so other users can use it. This would allow other processes to DOS the update.
  6. For exits and crashes it will need to release the semaphore and it should close the handle when the instance count is 0. Does anyone know if this will be an issue in our crash handling code?
  7. After X amount of time has passed (channel specific) based on a user pref (prevents other user processes from preventing this) without updating due to the semaphore force the update. This will likely be done with a new update status and this will mitigate the ability to DOS the update.
  8. UI will most likely be shown when it is going to force the update.
  9. Change the update restart UI or possibly display a warning so the user is informed that clicking the button will force the update.

Note: Some people have had issues with users creating semaphores on Windows Terminal server so it should ignore the semaphore when it can't be created or opened.

I think this method should also be sufficient for the Update Agent to determine if there is a running instance for an update.

Summary: Use a semaphore to track the number of instances of an installation are running so we can prevent updating a running instance out from under it → Use a semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it
Blocks: 1464441
See Also: → 1366808
See Also: → 1624992
Whiteboard: [iu_tracking]

I did a few minutes digging into what we have in tree that's relevant to this ticket. For context, we want:

  1. to interact with the new semaphore across all of our platforms (Windows, macOS, Linux);
  2. from at least two different processes -- Firefox and the background update agent -- that do not have any relationship with each other;
  3. with one of the processes, the background update agent, being written in Rust and not having the Firefox-y infrastructure to hand.

There are many semaphore implementations in the tree. Most aren't intended to address point 2: they're sem_init (under POSIX). In particular ipc/glue/CrossProcessSemaphore.cpp expects to pass shared memory between processes, which isn't pleasant here, since there's no parent/child relationship. NSPR has an implementation but it's unused in-tree (and NSPR is large for this small use case). There are a many Rust implementations but most aren't intended for global sharing (i.e., with sem_open (under POSIX) semantics).

I conclude that the most sensible thing is to wrap POSIX and Windows semaphores directly. It would be nice to share the code but it's a little awkward to have a little C++ RAII class that can also be used from Rust.

Assignee: nobody → mhowell
Severity: normal → N/A
Status: NEW → ASSIGNED
Type: defect → enhancement
Priority: P2 → P1
Attachment #9185329 - Attachment description: Bug 1553982 Part 1 - Management of the update semaphore itself. r=bytesized → Bug 1553982 Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized
Attachment #9185330 - Attachment description: Bug 1553982 Part 2 - Update sync manager XPCOM component. r=bytesized → Bug 1553982 Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized
See Also: → 1678371
Attached file Data review request
Attachment #9189061 - Flags: data-review?(chutten)

Comment on attachment 9189061 [details]
Data review request

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Molly Howell is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.


Result: datareview+

Attachment #9189061 - Flags: data-review?(chutten) → data-review+
Pushed by mhowell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/91bc72bf7a1b
Part 0 - Fix a few unified build issues that crop up when adding files in this directory. r=bytesized
https://hg.mozilla.org/integration/autoland/rev/63f834f25945
Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/e9e14538b7ff
Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/1299e017328e
Part 3 - Wait for the semaphore before performing update operations. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/bfea27666adf
Part 4 - Display a prompt when waiting for the update semaphore expires. r=bytesized,fluent-reviewers,flod
https://hg.mozilla.org/integration/autoland/rev/a207ff8ae135
Part 5 - Add a test for the update semaphore and make existing tests support it. r=bytesized,nalexander

The failures are on Android in the global semaphore (part 1); the global semaphore is not used on Android in this stack, and it probably never will be used there, so I'm going to try disabling building it. Will reland if that works.

Flags: needinfo?(mhowell)
Pushed by mhowell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5040354e75c2
Part 0 - Fix a few unified build issues that crop up when adding files in this directory. r=bytesized
https://hg.mozilla.org/integration/autoland/rev/01d41760db29
Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/9dcf78cd576f
Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/71742fced1ba
Part 3 - Wait for the semaphore before performing update operations. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/40d67c6dfdf3
Part 4 - Display a prompt when waiting for the update semaphore expires. r=bytesized,fluent-reviewers,flod
https://hg.mozilla.org/integration/autoland/rev/78dce99516dd
Part 5 - Add a test for the update semaphore and make existing tests support it. r=bytesized,nalexander
See Also: → 1680935

This may have caused bug 1680935.
Seeing multiple reports including one on the #nightly:mozilla.org channel.

Regressions: 1680935
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 85 Branch → ---
Attachment #9185329 - Attachment description: Bug 1553982 Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized → Bug 1553982 Part 1 - Manage "multi-instance locks" that can be created by any component. r=bytesized
Attachment #9185330 - Attachment description: Bug 1553982 Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized → Bug 1553982 Part 2 - Use a multi-instance lock to implement an update sync manager XPCOM component. r=bytesized
Attachment #9185331 - Attachment description: Bug 1553982 Part 3 - Wait for the semaphore before performing update operations. r=bytesized → Bug 1553982 Part 3 - Wait for the update lock before performing update operations. r=bytesized
Attachment #9185332 - Attachment description: Bug 1553982 Part 4 - Display a prompt when waiting for the update semaphore expires. r=bytesized → Bug 1553982 Part 4 - Display a prompt when waiting for the update lock expires. r=bytesized
Attachment #9185333 - Attachment description: Bug 1553982 Part 5 - Add a test for the update semaphore and make existing tests support it. r=bytesized → Bug 1553982 Part 5 - Add a test for the update lock and make existing tests support it. r=bytesized
Pushed by mhowell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f5234182f295
Part 0 - Fix a few unified build issues that crop up when adding files in this directory. r=bytesized
https://hg.mozilla.org/integration/autoland/rev/a65058184c20
Part 1 - Manage "multi-instance locks" that can be created by any component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/b616a1a15f94
Part 2 - Use a multi-instance lock to implement an update sync manager XPCOM component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/7fe2fcf9dd6b
Part 3 - Wait for the update lock before performing update operations. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/cecc2311e348
Part 4 - Display a prompt when waiting for the update lock expires. r=bytesized,fluent-reviewers,flod
https://hg.mozilla.org/integration/autoland/rev/54a9f18d1680
Part 5 - Add a test for the update lock and make existing tests support it. r=bytesized,nalexander
Regressions: 1683186

I'm on Manjaro KDE and this fix does not seem to be doing anything.

I have nightly unpacked in home subfolder and I often have my "main" profile running and then start clean copy for testing: /home/rdk/Apps/firefox-nightly/firefox -profile /tmp/nightly. When I close this tmp profile, go back to "main" and try open new tab, most of the time I get blank page and sometimes "one small thing" message https://i.imgur.com/KfavQIp.png

Summary: Use a semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it → Use a multi-instance lock/semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it
Regressions: 1700870

Can someone respond? This still does not work for me.

I'm on Manjaro KDE and this fix does not seem to be doing anything.

I have nightly unpacked in home subfolder and I often have my "main" profile running and then start clean copy for testing: /home/rdk/Apps/firefox-nightly/firefox -profile /tmp/nightly. When I close this tmp profile, go back to "main" and try open new tab, most of the time I get blank page and sometimes "one small thing" message https://i.imgur.com/KfavQIp.png

gwarser, what you're describing sounds like bug 1480452, which this will not fix. If Nightly had downloaded an update, then starting a tmp profile will apply that update, please see the other bug for more discussion.

See Also: → 1480452

Can someone tell me what is the difference between this bug and bug 1480452?

I thought fixing this bug was supposed to prevent opening up another profile from applying updates thereby breaking the prior instance.

(In reply to Caspy7 from comment #24)

Can someone tell me what is the difference between this bug and bug 1480452?

I thought fixing this bug was supposed to prevent opening up another profile from applying updates thereby breaking the prior instance.

It's a partial solution: it should prevent two running Firefox instances from updating out from each other. What it doesn't prevent is a newly launched Firefox applying a pending update with a running Firefox instance. There's a great deal of context, but basically we don't have a good way to message early in startup that we're not updating because of this lock, and that opens the possibility for malicious actors to prevent Firefox updating. That's a risk we're not willing to take. I think this is covered in the discussion of the linked ticket.

I have spent a fair amount of time looking into this on multiple occasions because I can never seem to remember exactly how it works or why it does what it does. This question prompted me to do this again today. :nalexander beat me to actually answering the question, but I want to leave a record of what this does and why, mostly to make it easier on myself the next time I need to look into this.

Firefox has two related things going on with regard to checking for other instances. The update mutex and nsIUpdateSyncManager.

The update mutex:

  • Has existed for a long time.
  • Works similarly to a normal mutex, but is automatically closed if Firefox crashes.
  • Is locked when an installation of Firefox tries to update.
  • If it cannot be acquired, it means that another instance of Firefox is handling update, so the update system pretty much lets the other instance handle it.
  • Currently this exists only on Windows. We may eventually add it to macOS as part of enabling Background Update.

Whereas nsIUpdateSyncManager:

  • Was created more recently (in this bug).
  • Is implemented via a read/write lock system. Each Firefox instance takes a read lock. The write lock can only be taken if there are no other instances with read locks. (Note that I only checked the Windows implementation when I wrote this. I seem to recall that we do slightly different things on other OS's, but I believe they work in similar ways)
  • A read lock is taken when Firefox starts. This allows nsIUpdateSyncManager to be able to say pretty definitively if any other instances are running, not just if any are updating.
  • nsIUpdateSyncManager reporting another instance running has a couple of effects.
    • It causes the update interfaces to indicate that update is being handled by another instance. This message isn't quite correct, but the intention is basically correct since if you have multiple instances open, it's not the best time to update.
    • If/when we try check for updates in the background, we wait for the other instances to exit. There is a timeout on this process (measured across application restarts). If the timeout expires, we will check for updates anyways.
    • If there are still multiple instances running when we go to show the update notification, we don't show the update badge until the doorhanger (popup) notification is shown. When it is shown, we show different message text.
  • Technically there is a small race condition in the implementation of nsIUpdateSyncManager. We have to release our read lock to try to take a write lock or else taking the write lock will always fail.

IIRC, we originally wanted nsIUpdateSyncManager to wait for other instances to close between downloading the update and setting the necessary flags that allow it to be installed on startup. But this would have been much more difficult because that would probably have required adding another state to the update state machine. Which would mean that we would probably have to look at basically every time that we do anything with that state and make sure that it still makes sense in the context of the new state machine. So we went with a different implementation that didn't involve having to do that.

Regressions: 1780004

Hi,

in relation with the rationale here explained for not updating when there are several instances running, I need to mention that some users, like me, do systematically run multiple instances of Firefox (one "basic" instance and another one which is launched with -no-remote).

I have witnessed that each of them says "Firefox is being updated by another instance", which I now understand is an imprecise formulation of an intended behaviour.

However this behaviour also means that Firefox is never going to get updated, ever.

I was not able to find an about:config setting to tell one if the instances not to bother with updates and let the other manage them. I think you might want to have this. This would at least let one instance check for and even download an update, and prompt the user to close the others to apply the update.

(In reply to YS1 from comment #27)

However this behaviour also means that Firefox is never going to get updated, ever.

That is inaccurate. As stated in Comment 26:

There is a timeout on this process (measured across application restarts). If the timeout expires, we will check for updates anyways.

I cannot currently find the code that causes that to persist across application restarts. Perhaps I was mistaken about that detail. But once your installation is sufficiently old, you would hit the logic that causes it to check for updates much earlier in startup. Which would mean that either (a) the two instances would run for over 6 hours and then we would check for updates anyways, or (b) you would eventually start up the two instances and in the brief window where only one of them is running, an update should be initiated.

I was not able to find an about:config setting to tell one if the instances not to bother with updates and let the other manage them. I think you might want to have this. This would at least let one instance check for and even download an update, and prompt the user to close the others to apply the update.

Long story short, this would be very difficult to implement in a way that is both useful and safe and, frankly, a solution that requires the user properly configure their two separate profiles in order to update is not ideal. We are working on a way around this problem entirely in Bug 1705217 and Bug 1480452. The short version is that we want to allow Firefox continue running properly even if the files on the disk have been updated. This would allow us to eliminate an entire class of update problems and remove the safeguards that avoid having Firefox update when multiple instances are running simultaneously.

In the meantime, you are welcome to adjust app.update.checkOnlyInstance.timeout to control how long we wait for other instances to close before proceeding with an update anyways. It should be set to an integer value which will be interpreted as a number of milliseconds. It doesn't have a value by default, but it falls back to 6 hours. But I do feel like I should warn you that it is likely that you are going to hit Bug 1480452 if you do this. As I touched on earlier, Firefox currently does like having its files updated while it is running. It becomes unable to start new processes and eventually you will get a message telling you that you need to restart in order for the browser to keep working.

Oh, one additional warning. Until Bug 1787511 is fixed, you should not attempt to set app.update.checkOnlyInstance.timeout to 0. But you can set it to 1 if you want.

Thank you for this very enlightening comment.

However, I added an integer app.update.checkOnlyInstance.timeout pref with value 1 yesterday to one of my instances, and it still says "Firefox is being updated by another instance" today (after a whole system restart). This is Firefox 104.0 for information.

Yeah, at the moment the preference basically affects Firefox's internal update mechanism, not the manual update UI. I added a note to Bug 1787511 to fix that in such a way that it would allow the update UI to work in that case. I think it should be a pretty easy fix. I'm just not sure when I will have time to get around to it. Hopefully soon.

See Also: → 1818484
You need to log in before you can comment on or make changes to this bug.