1900726 - macOS and Linux need an update mutex

Description

1 year ago

We have an update mutex that prevents multiple Firefox instances from updating simultaneously and interfering with one another. Interestingly, this mutex was actually added for a different reason and it was only incidental that it also fixed this issue. This probably explains why it was only ever implemented on Windows.

But this functionality is necessary across platforms. I was originally going to repurpose Bug 1278252 for this, but I decided to file a new bug instead because that one is not quite right. It correctly identifies that multi-user updating is a problem, but fails to recognize that multi-profile updating causes all the same problems (and sometimes more).

To be a bit more clear, this is what the mutex is needed for: The files in the update directory represent the update state. We should never make changes to the update state from multiple Firefox instances at once. One instance gets to take the mutex successfully and all the others should refuse to update and allow the instance with the mutex to run the update process uninterrupted.

I recall that I looked into adding an update mutex for macOS once in the past and had some trouble finding a mechanism that fulfilled all the requirements. IIRC, the requirements were:

Concurrency safety, of course. Exactly one instance should be able to hold it at once, regardless of the timing of when each instance attempts to take it.
It should be file-based since what it is protecting is file-based. Ideally we would place it on the same filesystem as what we are protecting.
If the Firefox instance holding the mutex crashes, the mutex must be released. Ideally it should be released immediately.
Broad compatibility. We don't want this to fail just because someone is using a non-standard filesystem.
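
To illustrate the first three requirements, here is a minimal sketch that assumes a POSIX system and uses `flock()` through Python's `fcntl` module. It is not what Firefox does: the `UpdateMutex` class and the lock file name are hypothetical. The fourth requirement would still need checking, since `flock()` support varies on network filesystems.

```python
import fcntl
import os
import tempfile


class UpdateMutex:
    """Hypothetical file-backed mutex (illustration only, not Firefox code)."""

    def __init__(self, path):
        self.path = path
        self._fd = None

    def try_acquire(self):
        """Return True if this holder owns the mutex, False if someone else does."""
        if self._fd is not None:
            return True
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # Non-blocking exclusive lock: fails immediately if already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            return False
        except OSError:
            # Any other failure is a real error, not contention.
            os.close(fd)
            raise
        self._fd = fd
        return True

    def release(self):
        """Drop the lock by closing the descriptor that holds it."""
        if self._fd is not None:
            os.close(self._fd)
            self._fd = None


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "update.lock")
    first, second = UpdateMutex(lock_path), UpdateMutex(lock_path)
    print(first.try_acquire())   # this holder may drive the update
    print(second.try_acquire())  # this one should refuse to update
    first.release()
```

Because the kernel drops the lock when the holding process exits, a crashed holder does not leave the mutex stuck.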

Someone raised the idea of using SQLite to do this, but I never looked into how reasonable that is. Another option that was raised is nsProfileLock. Reportedly it has some pretty rare issues on certain filesystems, but I think this might still be our best option. I'm still looking into whether those issues are documented somewhere.

There is also a macOS-specific issue here. macOS is the one platform that still uses per-user update directories. Migrating to a per-installation update directory is something that we want to do, but I know from experience that it's difficult and I'd rather not make it a prerequisite of this work. But if we put the macOS update mutex in the update directory (where it is on Windows), it won't actually solve Bug 1278252. I'm not aware of any directory that Firefox already uses that is installation-specific. We might want to do something like /Library/Application Support/<bundle-id>/installHash/mutex? I'm not totally sure.

But once we have answered the relevant questions, I believe the work should be pretty straightforward: Implement createMutex for the remaining platforms, remove this check, and add some testing.

I'd like to quickly mention one related thing: nsIUpdateSyncManager. This is also known as the Multi Instance Lock (MIL) and is very commonly confused with the update mutex. The MIL does exist on all platforms, but it does not work the same way or solve the same problem.

Confusingly, it exists to address a closely related problem, which sometimes makes it appear to fix this one. When it detects other instances of Firefox running, it introduces very long delays into the update system. This mitigates Bug 1480452, but it also makes it look a bit like this bug is fixed. Those delays are only temporary, though. Eventually Firefox will stop showing the "Another instance is updating" message in the update UI and allow manual or automatic updates to proceed, potentially causing this bug.

The MIL works using GNU file locking. Paraphrased from here:
A write lock gives a process exclusive access. While a write lock is held, no other process can lock the file at all.
A read lock prohibits any other process from requesting a write lock. However, other processes can request read locks.

This allows the MIL to do this:

Very early in Firefox launch, take a read lock
If we want to know if other instances are running, query the lock to find out if we could take a write lock (but don't actually take one).
If we could take the lock, we are the only Firefox instance. If we could not, there are other instances running.
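
The three steps above can be sketched as follows, assuming the locks in question are `fcntl` record locks. This is an illustration rather than the nsIUpdateSyncManager code: the function names are hypothetical, and the `struct flock` byte layout is an assumption that holds on 64-bit Linux only (macOS orders the fields differently). A child process stands in for a second Firefox instance.

```python
import fcntl
import os
import struct
import subprocess
import sys
import tempfile

# Assumption: 64-bit Linux layout of `struct flock`
# (short l_type, short l_whence, off_t l_start, off_t l_len, pid_t l_pid).
FLOCK_FORMAT = "hhqqi4x"

# Stand-in for another instance: take a read lock, then wait for stdin to close.
CHILD = (
    "import fcntl, os, sys\n"
    "fd = os.open(sys.argv[1], os.O_RDWR)\n"
    "fcntl.lockf(fd, fcntl.LOCK_SH)\n"
    "print('locked', flush=True)\n"
    "sys.stdin.read()\n"
)


def take_read_lock(path):
    """Step 1: early in startup, take a shared (read) lock and keep the fd open."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    return fd


def other_instances_running(fd):
    """Steps 2-3: ask whether a write lock could be taken, without taking it."""
    probe = struct.pack(FLOCK_FORMAT, fcntl.F_WRLCK, os.SEEK_SET, 0, 0, 0)
    result = fcntl.fcntl(fd, fcntl.F_GETLK, probe)
    l_type = struct.unpack(FLOCK_FORMAT, result)[0]
    # F_UNLCK means no other process holds a lock that would block a write lock.
    return l_type != fcntl.F_UNLCK


def demo():
    path = os.path.join(tempfile.mkdtemp(), "instance.lock")
    fd = take_read_lock(path)
    alone_before = not other_instances_running(fd)
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    child.stdout.readline()  # wait until the child holds its read lock
    seen_with_child = other_instances_running(fd)
    child.communicate()  # closes stdin, so the child exits and drops its lock
    alone_after = not other_instances_running(fd)
    os.close(fd)
    return alone_before, seen_with_child, alone_after


if __name__ == "__main__":
    print(demo())
```

The query only reports locks held by other processes, so an instance's own read lock never makes it look as if another instance is running.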

But this does not help determine which instance should drive update.

An obvious followup question is "why not use GNU file write locks for this?"
It's possible that it is reasonable to do so. We would need to look into how well that fulfills the requirements above. I'm not sure off the top of my head.
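
To make the question concrete, here is a rough sketch of what a write-lock-based mutex could look like with `fcntl` locks. The helper names are hypothetical, nothing here is taken from the Firefox tree, and a child process stands in for a second Firefox instance.

```python
import errno
import fcntl
import os
import subprocess
import sys
import tempfile

# Stand-in for a second instance trying to take the same write lock.
CHILD = (
    "import fcntl, os, sys\n"
    "fd = os.open(sys.argv[1], os.O_RDWR)\n"
    "try:\n"
    "    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)\n"
    "    print('acquired')\n"
    "except OSError:\n"
    "    print('refused')\n"
)


def try_write_lock(path):
    """Return an fd holding an exclusive lock, or None if another process has it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # contention: someone else holds the lock
        raise
    return fd


def second_instance_outcome(path):
    """Run the stand-in instance and report whether it got the lock."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.strip()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "update.lock")
    fd = try_write_lock(path)
    while_held = second_instance_outcome(path)
    os.close(fd)  # releasing the lock; a crash would have the same effect
    after_release = second_instance_outcome(path)
    return while_held, after_release


if __name__ == "__main__":
    print(demo())
```

One caveat to weigh against the requirements: `fcntl` locks belong to the process rather than to the descriptor, so closing any descriptor for the lock file releases the lock.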

Bug 1900726 - Move update mutex logic to a nsIUpdateMutex XPCOM interface. r=bytesized (1 year ago, Yannis Juglaret [:yannis])
Bug 1900726 - Make UpdateMutex cross-platform by using nsProfileLock. r=bytesized (1 year ago, Yannis Juglaret [:yannis])
Bug 1900726 - Take the update mutex through a child process in the canCheckForAndApplyUpdates.js test. r=bytesized (1 year ago, Yannis Juglaret [:yannis])