Use a multi-instance lock/semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it
Categories
(Toolkit :: Application Update, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox86 | --- | fixed |
People
(Reporter: robert.strong.bugs, Assigned: mhowell)
References
(Blocks 4 open bugs, Regressed 4 open bugs)
Details
(Whiteboard: [iu_tracking])
Attachments
(7 files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
Bug 1553982 Part 1 - Manage "multi-instance locks" that can be created by any component. r=bytesized
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
3.49 KB,
text/plain
|
chutten
:
data-review+
|
Details |
Some time in the past mhowell suggested using a semaphore to determine whether there are other instances running. This would allow us to lessen the issue where launching an additional instance of the same installation will perform an update and thereby require the other instance to restart. Below is a very simple flow of how I think this could work.
- Create or open the semaphore on startup.
- Use the Install hash for the name.
- Prefix with Global\ on Windows so it is available to all users
- The security of the semaphore will need to allow other users to work with it (need to figure out which users should be able to). This would allow other processes to DOS the update.
- Permissions will need to be set on the Linux and OS X semaphores files so other users can use it. This would allow other processes to DOS the update.
- For exits and crashes it will need to release the semaphore and it should close the handle when the instance count is 0. Does anyone know if this will be an issue in our crash handling code?
- After X amount of time has passed (channel specific) based on a user pref (prevents other user processes from preventing this) without updating due to the semaphore force the update. This will likely be done with a new update status and this will mitigate the ability to DOS the update.
- UI will most likely be shown when it is going to force the update.
- Change the update restart UI or possibly display a warning so the user is informed that clicking the button will force the update.
Note: Some people have had issues with users creating semaphores on Windows Terminal server so it should ignore the semaphore when it can't be created or opened.
I think this method should also be sufficient for the Update Agent to determine if there is a running instance for an update.
![]() |
Reporter | |
Updated•3 years ago
|
![]() |
Reporter | |
Updated•3 years ago
|
Updated•2 years ago
|
I did a few minutes digging into what we have in tree that's relevant to this ticket. For context, we want:
- to interact with the new semaphore across all of our platforms (Windows, macOS, Linux);
- from at least two different processes -- Firefox and the background update agent -- that do not have any relationship with each other;
- with one of the processes, the background update agent, being written in Rust and not having the Firefox-y infrastructure to hand.
There are many semaphore implementations in the tree. Most aren't intended to address point 2: they're sem_init
(under POSIX). In particular ipc/glue/CrossProcessSemaphore.cpp
expects to pass shared memory between processes, which isn't pleasant here, since there's no parent/child relationship. NSPR has an implementation but it's unused in-tree (and NSPR is large for this small use case). There are a many Rust implementations but most aren't intended for global sharing (i.e., with sem_open
(under POSIX) semantics).
I conclude that the most sensible thing is to wrap POSIX and Windows semaphores directly. It would be nice to share the code but it's a little awkward to have a little C++ RAII class that can also be used from Rust.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 3•2 years ago
|
||
Assignee | ||
Comment 4•2 years ago
|
||
Depends on D95623
Assignee | ||
Comment 5•2 years ago
|
||
Depends on D95626
Assignee | ||
Comment 6•2 years ago
|
||
Depends on D95627
Assignee | ||
Comment 7•2 years ago
|
||
Depends on D95628
Assignee | ||
Comment 8•2 years ago
|
||
Depends on D95629
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 9•2 years ago
|
||
Comment 10•2 years ago
|
||
Comment on attachment 9189061 [details]
Data review request
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
Yes, Molly Howell is responsible.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 2, Interaction.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does there need to be a check-in in the future to determine whether to renew the data?
No. This collection is permanent.
Result: datareview+
Comment 11•2 years ago
|
||
Pushed by mhowell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/91bc72bf7a1b Part 0 - Fix a few unified build issues that crop up when adding files in this directory. r=bytesized https://hg.mozilla.org/integration/autoland/rev/63f834f25945 Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized,nalexander https://hg.mozilla.org/integration/autoland/rev/e9e14538b7ff Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized,nalexander https://hg.mozilla.org/integration/autoland/rev/1299e017328e Part 3 - Wait for the semaphore before performing update operations. r=bytesized,nalexander https://hg.mozilla.org/integration/autoland/rev/bfea27666adf Part 4 - Display a prompt when waiting for the update semaphore expires. r=bytesized,fluent-reviewers,flod https://hg.mozilla.org/integration/autoland/rev/a207ff8ae135 Part 5 - Add a test for the update semaphore and make existing tests support it. r=bytesized,nalexander
Comment 12•2 years ago
|
||
Backed out for build bustages.
Log: https://treeherder.mozilla.org/logviewer?job_id=323479356&repo=autoland&lineNumber=32308
Backout: https://hg.mozilla.org/integration/autoland/rev/b6232a49dc3921b1f8bb1956e30b312359fdc46b
Assignee | ||
Comment 13•2 years ago
|
||
The failures are on Android in the global semaphore (part 1); the global semaphore is not used on Android in this stack, and it probably never will be used there, so I'm going to try disabling building it. Will reland if that works.
Assignee | ||
Comment 14•2 years ago
|
||
According to https://treeherder.mozilla.org/jobs?repo=try&revision=d4f0ca4009826d0ef1f398ea7b68c54e070da9eb that attempt has succeeded.
Comment 15•2 years ago
|
||
Pushed by mhowell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5040354e75c2 Part 0 - Fix a few unified build issues that crop up when adding files in this directory. r=bytesized https://hg.mozilla.org/integration/autoland/rev/01d41760db29 Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized,nalexander https://hg.mozilla.org/integration/autoland/rev/9dcf78cd576f Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized,nalexander https://hg.mozilla.org/integration/autoland/rev/71742fced1ba Part 3 - Wait for the semaphore before performing update operations. r=bytesized,nalexander https://hg.mozilla.org/integration/autoland/rev/40d67c6dfdf3 Part 4 - Display a prompt when waiting for the update semaphore expires. r=bytesized,fluent-reviewers,flod https://hg.mozilla.org/integration/autoland/rev/78dce99516dd Part 5 - Add a test for the update semaphore and make existing tests support it. r=bytesized,nalexander
Comment 16•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5040354e75c2
https://hg.mozilla.org/mozilla-central/rev/01d41760db29
https://hg.mozilla.org/mozilla-central/rev/9dcf78cd576f
https://hg.mozilla.org/mozilla-central/rev/71742fced1ba
https://hg.mozilla.org/mozilla-central/rev/40d67c6dfdf3
https://hg.mozilla.org/mozilla-central/rev/78dce99516dd
Comment 17•2 years ago
|
||
This may have caused bug 1680935.
Seeing multiple reports including one on the #nightly:mozilla.org channel.
Comment 18•2 years ago
•
|
||
Backed out from central for causing bug 1680935: https://hg.mozilla.org/mozilla-central/rev/7167ab3f6e8d7b77439ac1cc7d9356bbaff02510
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 19•2 years ago
|
||
Pushed by mhowell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f5234182f295 Part 0 - Fix a few unified build issues that crop up when adding files in this directory. r=bytesized https://hg.mozilla.org/integration/autoland/rev/a65058184c20 Part 1 - Manage "multi-instance locks" that can be created by any component. r=bytesized,nalexander https://hg.mozilla.org/integration/autoland/rev/b616a1a15f94 Part 2 - Use a multi-instance lock to implement an update sync manager XPCOM component. r=bytesized,nalexander https://hg.mozilla.org/integration/autoland/rev/7fe2fcf9dd6b Part 3 - Wait for the update lock before performing update operations. r=bytesized,nalexander https://hg.mozilla.org/integration/autoland/rev/cecc2311e348 Part 4 - Display a prompt when waiting for the update lock expires. r=bytesized,fluent-reviewers,flod https://hg.mozilla.org/integration/autoland/rev/54a9f18d1680 Part 5 - Add a test for the update lock and make existing tests support it. r=bytesized,nalexander
Comment 20•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f5234182f295
https://hg.mozilla.org/mozilla-central/rev/a65058184c20
https://hg.mozilla.org/mozilla-central/rev/b616a1a15f94
https://hg.mozilla.org/mozilla-central/rev/7fe2fcf9dd6b
https://hg.mozilla.org/mozilla-central/rev/cecc2311e348
https://hg.mozilla.org/mozilla-central/rev/54a9f18d1680
Comment 21•2 years ago
|
||
I'm on Manjaro KDE and this fix does not seem to be doing anything.
I have nightly unpacked in home subfolder and I often have my "main" profile running and then start clean copy for testing: /home/rdk/Apps/firefox-nightly/firefox -profile /tmp/nightly
. When I close this tmp profile, go back to "main" and try open new tab, most of the time I get blank page and sometimes "one small thing" message https://i.imgur.com/KfavQIp.png
Comment 22•1 year ago
|
||
Can someone respond? This still does not work for me.
I'm on Manjaro KDE and this fix does not seem to be doing anything.
I have nightly unpacked in home subfolder and I often have my "main" profile running and then start clean copy for testing:
/home/rdk/Apps/firefox-nightly/firefox -profile /tmp/nightly
. When I close this tmp profile, go back to "main" and try open new tab, most of the time I get blank page and sometimes "one small thing" message https://i.imgur.com/KfavQIp.png
Comment 23•1 year ago
|
||
gwarser, what you're describing sounds like bug 1480452, which this will not fix. If Nightly had downloaded an update, then starting a tmp profile will apply that update, please see the other bug for more discussion.
Comment 24•4 months ago
|
||
Can someone tell me what is the difference between this bug and bug 1480452?
I thought fixing this bug was supposed to prevent opening up another profile from applying updates thereby breaking the prior instance.
(In reply to Caspy7 from comment #24)
Can someone tell me what is the difference between this bug and bug 1480452?
I thought fixing this bug was supposed to prevent opening up another profile from applying updates thereby breaking the prior instance.
It's a partial solution: it should prevent two running Firefox instances from updating out from each other. What it doesn't prevent is a newly launched Firefox applying a pending update with a running Firefox instance. There's a great deal of context, but basically we don't have a good way to message early in startup that we're not updating because of this lock, and that opens the possibility for malicious actors to prevent Firefox updating. That's a risk we're not willing to take. I think this is covered in the discussion of the linked ticket.
Comment 26•4 months ago
•
|
||
I have spent a fair amount of time looking into this on multiple occasions because I can never seem to remember exactly how it works or why it does what it does. This question prompted me to do this again today. :nalexander beat me to actually answering the question, but I want to leave a record of what this does and why, mostly to make it easier on myself the next time I need to look into this.
Firefox has two related things going on with regard to checking for other instances. The update mutex and nsIUpdateSyncManager
.
The update mutex:
- Has existed for a long time.
- Works similarly to a normal mutex, but is automatically closed if Firefox crashes.
- Is locked when an installation of Firefox tries to update.
- If it cannot be acquired, it means that another instance of Firefox is handling update, so the update system pretty much lets the other instance handle it.
- Currently this exists only on Windows. We may eventually add it to macOS as part of enabling Background Update.
Whereas nsIUpdateSyncManager
:
- Was created more recently (in this bug).
- Is implemented via a read/write lock system. Each Firefox instance takes a read lock. The write lock can only be taken if there are no other instances with read locks. (Note that I only checked the Windows implementation when I wrote this. I seem to recall that we do slightly different things on other OS's, but I believe they work in similar ways)
- A read lock is taken when Firefox starts. This allows
nsIUpdateSyncManager
to be able to say pretty definitively if any other instances are running, not just if any are updating. nsIUpdateSyncManager
reporting another instance running has a couple of effects.- It causes the update interfaces to indicate that update is being handled by another instance. This message isn't quite correct, but the intention is basically correct since if you have multiple instances open, it's not the best time to update.
- If/when we try check for updates in the background, we wait for the other instances to exit. There is a timeout on this process (measured across application restarts). If the timeout expires, we will check for updates anyways.
- If there are still multiple instances running when we go to show the update notification, we don't show the update badge until the doorhanger (popup) notification is shown. When it is shown, we show different message text.
- Technically there is a small race condition in the implementation of
nsIUpdateSyncManager
. We have to release our read lock to try to take a write lock or else taking the write lock will always fail.
IIRC, we originally wanted nsIUpdateSyncManager
to wait for other instances to close between downloading the update and setting the necessary flags that allow it to be installed on startup. But this would have been much more difficult because that would probably have required adding another state to the update state machine. Which would mean that we would probably have to look at basically every time that we do anything with that state and make sure that it still makes sense in the context of the new state machine. So we went with a different implementation that didn't involve having to do that.
Description
•