Closed Bug 1553982 Opened 3 years ago Closed 2 years ago

Use a multi-instance lock/semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it

Categories

(Toolkit :: Application Update, enhancement, P1)

68 Branch
enhancement

Tracking

()

RESOLVED FIXED
86 Branch
Tracking Status
firefox86 --- fixed

People

(Reporter: robert.strong.bugs, Assigned: mhowell)

References

(Blocks 4 open bugs, Regressed 4 open bugs)

Details

(Whiteboard: [iu_tracking])

Attachments

(7 files)

Some time in the past mhowell suggested using a semaphore to determine whether there are other instances running. This would allow us to lessen the issue where launching an additional instance of the same installation will perform an update and thereby require the other instance to restart. Below is a very simple flow of how I think this could work.

  1. Create or open the semaphore on startup.
  2. Use the Install hash for the name.
  3. Prefix with Global\ on Windows so it is available to all users
  4. The security of the semaphore will need to allow other users to work with it (need to figure out which users should be able to). This would allow other processes to DOS the update.
  5. Permissions will need to be set on the Linux and OS X semaphores files so other users can use it. This would allow other processes to DOS the update.
  6. For exits and crashes it will need to release the semaphore and it should close the handle when the instance count is 0. Does anyone know if this will be an issue in our crash handling code?
  7. After X amount of time has passed (channel specific) based on a user pref (prevents other user processes from preventing this) without updating due to the semaphore force the update. This will likely be done with a new update status and this will mitigate the ability to DOS the update.
  8. UI will most likely be shown when it is going to force the update.
  9. Change the update restart UI or possibly display a warning so the user is informed that clicking the button will force the update.

Note: Some people have had issues with users creating semaphores on Windows Terminal server so it should ignore the semaphore when it can't be created or opened.

I think this method should also be sufficient for the Update Agent to determine if there is a running instance for an update.

Summary: Use a semaphore to track the number of instances of an installation are running so we can prevent updating a running instance out from under it → Use a semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it
Blocks: 1464441
See Also: → 1366808
Duplicate of this bug: 1628452
See Also: → 1624992
Whiteboard: [iu_tracking]

I did a few minutes digging into what we have in tree that's relevant to this ticket. For context, we want:

  1. to interact with the new semaphore across all of our platforms (Windows, macOS, Linux);
  2. from at least two different processes -- Firefox and the background update agent -- that do not have any relationship with each other;
  3. with one of the processes, the background update agent, being written in Rust and not having the Firefox-y infrastructure to hand.

There are many semaphore implementations in the tree. Most aren't intended to address point 2: they're sem_init (under POSIX). In particular ipc/glue/CrossProcessSemaphore.cpp expects to pass shared memory between processes, which isn't pleasant here, since there's no parent/child relationship. NSPR has an implementation but it's unused in-tree (and NSPR is large for this small use case). There are a many Rust implementations but most aren't intended for global sharing (i.e., with sem_open (under POSIX) semantics).

I conclude that the most sensible thing is to wrap POSIX and Windows semaphores directly. It would be nice to share the code but it's a little awkward to have a little C++ RAII class that can also be used from Rust.

Assignee: nobody → mhowell
Severity: normal → N/A
Status: NEW → ASSIGNED
Type: defect → enhancement
Priority: P2 → P1
Attachment #9185329 - Attachment description: Bug 1553982 Part 1 - Management of the update semaphore itself. r=bytesized → Bug 1553982 Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized
Attachment #9185330 - Attachment description: Bug 1553982 Part 2 - Update sync manager XPCOM component. r=bytesized → Bug 1553982 Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized
See Also: → 1678371
Attached file Data review request
Attachment #9189061 - Flags: data-review?(chutten)

Comment on attachment 9189061 [details]
Data review request

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Molly Howell is responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.


Result: datareview+

Attachment #9189061 - Flags: data-review?(chutten) → data-review+
Pushed by mhowell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/91bc72bf7a1b
Part 0 - Fix a few unified build issues that crop up when adding files in this directory. r=bytesized
https://hg.mozilla.org/integration/autoland/rev/63f834f25945
Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/e9e14538b7ff
Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/1299e017328e
Part 3 - Wait for the semaphore before performing update operations. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/bfea27666adf
Part 4 - Display a prompt when waiting for the update semaphore expires. r=bytesized,fluent-reviewers,flod
https://hg.mozilla.org/integration/autoland/rev/a207ff8ae135
Part 5 - Add a test for the update semaphore and make existing tests support it. r=bytesized,nalexander

The failures are on Android in the global semaphore (part 1); the global semaphore is not used on Android in this stack, and it probably never will be used there, so I'm going to try disabling building it. Will reland if that works.

Flags: needinfo?(mhowell)
Pushed by mhowell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5040354e75c2
Part 0 - Fix a few unified build issues that crop up when adding files in this directory. r=bytesized
https://hg.mozilla.org/integration/autoland/rev/01d41760db29
Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/9dcf78cd576f
Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/71742fced1ba
Part 3 - Wait for the semaphore before performing update operations. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/40d67c6dfdf3
Part 4 - Display a prompt when waiting for the update semaphore expires. r=bytesized,fluent-reviewers,flod
https://hg.mozilla.org/integration/autoland/rev/78dce99516dd
Part 5 - Add a test for the update semaphore and make existing tests support it. r=bytesized,nalexander
See Also: → 1680935

This may have caused bug 1680935.
Seeing multiple reports including one on the #nightly:mozilla.org channel.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 85 Branch → ---
Attachment #9185329 - Attachment description: Bug 1553982 Part 1 - Manage "global semaphores" that can be created by any component. r=bytesized → Bug 1553982 Part 1 - Manage "multi-instance locks" that can be created by any component. r=bytesized
Attachment #9185330 - Attachment description: Bug 1553982 Part 2 - Use a global semaphore to implement an update sync manager XPCOM component. r=bytesized → Bug 1553982 Part 2 - Use a multi-instance lock to implement an update sync manager XPCOM component. r=bytesized
Attachment #9185331 - Attachment description: Bug 1553982 Part 3 - Wait for the semaphore before performing update operations. r=bytesized → Bug 1553982 Part 3 - Wait for the update lock before performing update operations. r=bytesized
Attachment #9185332 - Attachment description: Bug 1553982 Part 4 - Display a prompt when waiting for the update semaphore expires. r=bytesized → Bug 1553982 Part 4 - Display a prompt when waiting for the update lock expires. r=bytesized
Attachment #9185333 - Attachment description: Bug 1553982 Part 5 - Add a test for the update semaphore and make existing tests support it. r=bytesized → Bug 1553982 Part 5 - Add a test for the update lock and make existing tests support it. r=bytesized
Pushed by mhowell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f5234182f295
Part 0 - Fix a few unified build issues that crop up when adding files in this directory. r=bytesized
https://hg.mozilla.org/integration/autoland/rev/a65058184c20
Part 1 - Manage "multi-instance locks" that can be created by any component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/b616a1a15f94
Part 2 - Use a multi-instance lock to implement an update sync manager XPCOM component. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/7fe2fcf9dd6b
Part 3 - Wait for the update lock before performing update operations. r=bytesized,nalexander
https://hg.mozilla.org/integration/autoland/rev/cecc2311e348
Part 4 - Display a prompt when waiting for the update lock expires. r=bytesized,fluent-reviewers,flod
https://hg.mozilla.org/integration/autoland/rev/54a9f18d1680
Part 5 - Add a test for the update lock and make existing tests support it. r=bytesized,nalexander
Regressions: 1683186

I'm on Manjaro KDE and this fix does not seem to be doing anything.

I have nightly unpacked in home subfolder and I often have my "main" profile running and then start clean copy for testing: /home/rdk/Apps/firefox-nightly/firefox -profile /tmp/nightly. When I close this tmp profile, go back to "main" and try open new tab, most of the time I get blank page and sometimes "one small thing" message https://i.imgur.com/KfavQIp.png

Summary: Use a semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it → Use a multi-instance lock/semaphore to track the number of running instances of an installation so we can prevent updating a running instance out from under it
Regressions: 1700870

Can someone respond? This still does not work for me.

I'm on Manjaro KDE and this fix does not seem to be doing anything.

I have nightly unpacked in home subfolder and I often have my "main" profile running and then start clean copy for testing: /home/rdk/Apps/firefox-nightly/firefox -profile /tmp/nightly. When I close this tmp profile, go back to "main" and try open new tab, most of the time I get blank page and sometimes "one small thing" message https://i.imgur.com/KfavQIp.png

gwarser, what you're describing sounds like bug 1480452, which this will not fix. If Nightly had downloaded an update, then starting a tmp profile will apply that update, please see the other bug for more discussion.

See Also: → 1480452

Can someone tell me what is the difference between this bug and bug 1480452?

I thought fixing this bug was supposed to prevent opening up another profile from applying updates thereby breaking the prior instance.

(In reply to Caspy7 from comment #24)

Can someone tell me what is the difference between this bug and bug 1480452?

I thought fixing this bug was supposed to prevent opening up another profile from applying updates thereby breaking the prior instance.

It's a partial solution: it should prevent two running Firefox instances from updating out from each other. What it doesn't prevent is a newly launched Firefox applying a pending update with a running Firefox instance. There's a great deal of context, but basically we don't have a good way to message early in startup that we're not updating because of this lock, and that opens the possibility for malicious actors to prevent Firefox updating. That's a risk we're not willing to take. I think this is covered in the discussion of the linked ticket.

I have spent a fair amount of time looking into this on multiple occasions because I can never seem to remember exactly how it works or why it does what it does. This question prompted me to do this again today. :nalexander beat me to actually answering the question, but I want to leave a record of what this does and why, mostly to make it easier on myself the next time I need to look into this.

Firefox has two related things going on with regard to checking for other instances. The update mutex and nsIUpdateSyncManager.

The update mutex:

  • Has existed for a long time.
  • Works similarly to a normal mutex, but is automatically closed if Firefox crashes.
  • Is locked when an installation of Firefox tries to update.
  • If it cannot be acquired, it means that another instance of Firefox is handling update, so the update system pretty much lets the other instance handle it.
  • Currently this exists only on Windows. We may eventually add it to macOS as part of enabling Background Update.

Whereas nsIUpdateSyncManager:

  • Was created more recently (in this bug).
  • Is implemented via a read/write lock system. Each Firefox instance takes a read lock. The write lock can only be taken if there are no other instances with read locks. (Note that I only checked the Windows implementation when I wrote this. I seem to recall that we do slightly different things on other OS's, but I believe they work in similar ways)
  • A read lock is taken when Firefox starts. This allows nsIUpdateSyncManager to be able to say pretty definitively if any other instances are running, not just if any are updating.
  • nsIUpdateSyncManager reporting another instance running has a couple of effects.
    • It causes the update interfaces to indicate that update is being handled by another instance. This message isn't quite correct, but the intention is basically correct since if you have multiple instances open, it's not the best time to update.
    • If/when we try check for updates in the background, we wait for the other instances to exit. There is a timeout on this process (measured across application restarts). If the timeout expires, we will check for updates anyways.
    • If there are still multiple instances running when we go to show the update notification, we don't show the update badge until the doorhanger (popup) notification is shown. When it is shown, we show different message text.
  • Technically there is a small race condition in the implementation of nsIUpdateSyncManager. We have to release our read lock to try to take a write lock or else taking the write lock will always fail.

IIRC, we originally wanted nsIUpdateSyncManager to wait for other instances to close between downloading the update and setting the necessary flags that allow it to be installed on startup. But this would have been much more difficult because that would probably have required adding another state to the update state machine. Which would mean that we would probably have to look at basically every time that we do anything with that state and make sure that it still makes sense in the context of the new state machine. So we went with a different implementation that didn't involve having to do that.

Regressions: 1780004
You need to log in before you can comment on or make changes to this bug.