Open Bug 1705217 Opened 3 years ago Updated 1 month ago

When Firefox is updated by a package manager while it is running, new processes cannot be started

Categories

(Toolkit :: Application Update, defect, P2)

Desktop
Linux
defect


People

(Reporter: bytesized, Assigned: mhughes)

References

(Depends on 5 open bugs)

Details

(Whiteboard: [fidedi-toolkit])

There is a long-standing bug that when Firefox is running and is updated by a package manager, it can no longer start new processes. This eventually results in a page saying "Sorry. We just need to do one small thing to keep going".

This is closely related to Bug 1480452, which describes the same problem caused by another instance of Firefox rather than by a package manager. Because the problems are so similar, we hope to solve them both in the same way. Therefore, Bug 1480452 will be the tracker for this work, while this bug will track occurrences of this issue that are specifically caused by package managers.

Severity: -- → S3
See Also: → 1685387
See Also: → 1711143
Whiteboard: [fidedi-toolkit]

hi,

This is closely related to Bug 1480452, which describes the same problem caused by another instance of Firefox rather than by a package manager. Because the problems are so similar, we hope to solve them both in the same way.

although the failure mode is the same, these are completely different issues. this bug happens when the normal update mechanisms of some OSes, say package managers on unices, kick in. firefox strives to be unix-compatible and thus is expected to behave the way un*x programs do and play nice in those ecosystems. bug 1480452 is a bug in the staging updater. the resolutions are completely independent: fix firefox vs fix the updater.

using a staging updater in unices is reinventing the wheel, and a much worse wheel at that. it could be made to work, at the cost of efficiency and added complexity, and maybe correctness, but that is not the way a un*x app should behave anyway. and the updater is just not there in package manager-based installs.

avoiding the package manager as a recommendation is a big no. if every app author strove to distribute their own updates, unices would turn into the hell windows was years ago, and nobody would back you there. even microsoft has been working to eliminate direct distribution over the last few years and make the platform more manageable.

so note: no fix to the updater will ever fix this issue on un*x. chrome does it right, and you can too if you want. again, the solution is fixing firefox; the updater does not enter into it. IMHO, this is not a job for the distribution/updater team.

regarding bug 1480452, the staging updater is broken too. it can be fixed. but should you? i do not know. chrome has an updater, but does firefox need one? could updates to firefox for windows be distributed through the MS Store? i do not have the slightest idea, but maybe it is no longer needed.

i do not know about windows, but if an updater is shipped with firefox for un*x, it should not stage. staging is unneeded complexity. launching firefox should just run it, not perform deferred installs. if an install fails, it should fail at install time, not at some later time. on unices, staging is unneeded if firefox is fixed.

fixing firefox

assumption: file updates are atomic.

  • i landed here because of the misleading "Sorry. We just need to..." message. it made me look for a setting to disable forced updates on users, and i found out about this issue. fixes to this bug, like it or not, will be platform specific. some platforms may be fixed while others might not. this stop-gap, this message, is already implemented and should stay because it will save users of unfixed platforms from a worse evil. but please, fix the message text. "Firefox cannot continue and needs to restart. While Firefox was running, its program files were modified by an unknown actor (possibly the automatic update mechanism of your platform)." anything less than that is condescending and misleading.

  • on un*x kernels, file descriptors reference inodes. if files are renamed, moved, or unlinked, FDs remain valid and usable. permissions and other metadata are part of the inode, not the directory entries. a file's storage will only be reclaimed when it is fully unlinked and all its FDs are closed. a unix program should obtain FDs for all its resources when it starts, and use them as needed. if the files are later moved, replaced, or removed, it will be of no consequence. (a minimal sketch follows this list.)

  • there is a race condition for the short time while the FDs are being obtained, but this is a different bug that is not related to the "Sorry. We just need to..." bug. it already affects firefox in a much harder way, since the race window is currently gigantic compared to the FDs-at-launch strategy. ultimately fixing this involves versioning all files and checking their versions at launch. pop a dialog: "Firefox is being updated and cannot launch. [Cancel] [Retry]". retry would exec the on-disk binary (not retry in-process nor fork), as it might have been updated after the other files. for the time being, this bug is way too improbable to devote resources to it IMHO.

  • dynamically linked libs should also be loaded eagerly as they suffer from the same race condition at startup.

  • you must use fork/vfork/clone for child processes. by default, the children will inherit the parent's FDs, so they can safely use the parent's resources.
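
a minimal sketch of the FDs-at-launch idea, assuming a hypothetical resource file "resource.dat" and a replacement "resource.dat.new" standing in for what a package manager would install (not firefox code):

    /* minimal sketch, not firefox code: an FD obtained at launch keeps
     * referencing the original inode after the file on disk is replaced,
     * and a forked child inherits it.  "resource.dat" and
     * "resource.dat.new" are made-up names. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("resource.dat", O_RDONLY);        /* obtain the FD at launch */
        if (fd < 0) { perror("open"); return 1; }

        /* simulate a package manager replacing the file in place */
        rename("resource.dat.new", "resource.dat");

        char buf[64];
        ssize_t n = pread(fd, buf, sizeof buf, 0);      /* still reads the old inode */
        printf("parent read %zd bytes from the original file\n", n);

        pid_t pid = fork();
        if (pid == 0) {
            /* the child inherits fd and keeps using the same inode */
            n = pread(fd, buf, sizeof buf, 0);
            printf("child read %zd bytes\n", n);
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        close(fd);
        return 0;
    }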

on linux:

  • on linux you can obtain an FD without actually opening a file by using the O_PATH open option.

  • firefox's codebase may open its resources by filename all over the place, and switching to FDs-at-launch may look like too much work. although that is the POSIX way, linux provides an alternative: you can switch from FDs-at-launch to filenames-at-launch. on unsupported platforms, you just define strings with the filenames and hope for the best. on linux, you obtain FDs at launch and generate /proc/self/fd/{myFD} filenames. those pseudo-files look like links but they are not: they are handles to the FDs' inodes. when it is time to use resources, you just open them using the precomputed filenames. this involves close to no change to the general codebase. (see the sketch after this list.)

  • in children, you might be tempted to use /proc/{parentPID}/fd/{myFD}, but you should not: the parent may have died or closed its FD. FDs are inherited by default, so use /proc/self.

  • on linux you can obtain an FD to a file that you can execute (even if you cannot read it) using O_PATH, and later pass it to fexecve. do not do this to spawn children! it would be unnecessary because in that case you could just pass /proc/self/exe to execve. but do not do this either! children should be forked/cloned.

  • i think execve would reload possibly updated dynamically linked libs from disk and fail. execve could be used for statically linked programs, yet it would be slower, use more memory, and you would need to pass the FD numbers to the child process manually. (by default, FDs remain open across an execve.)

  • if fixing firefox to fork is too much work, and you accept the performance loss of not doing it, the quick and dirty way would be to statically link all non-backwards-compatible libs (possibly your own, but not the platform libs) and then execve /proc/self/exe. you would pass FD numbers or /proc/... filenames of resources somehow: command line, environment, idk.
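
a minimal sketch of the O_PATH + /proc/self/fd trick (linux-only, not firefox code; the omni.ja path is just a made-up example):

    /* minimal sketch, linux-only, not firefox code: pin a resource with an
     * O_PATH FD at startup and precompute a /proc/self/fd/N name that the
     * rest of the code can open() like any other path, even after the
     * on-disk file has been replaced or unlinked. */
    #define _GNU_SOURCE                       /* for O_PATH */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static char omni_path[64];                /* name used by the rest of the code */

    static int init_resource_paths(void) {
        int fd = open("/usr/lib/firefox/omni.ja", O_PATH);   /* hypothetical path */
        if (fd < 0) return -1;
        /* note: O_CLOEXEC would make this name unusable after an execve;
         * leave it off if exec is part of the plan. */
        snprintf(omni_path, sizeof omni_path, "/proc/self/fd/%d", fd);
        return 0;
    }

    int main(void) {
        if (init_resource_paths() != 0) { perror("open"); return 1; }
        /* much later, possibly after the installed files were swapped: */
        int data_fd = open(omni_path, O_RDONLY);   /* reopens the pinned inode */
        printf("reopened %s as fd %d\n", omni_path, data_fd);
        if (data_fd >= 0) close(data_fd);
        return 0;
    }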

fixing firefox for linux in a hacky way is simple. fixing firefox for POSIX takes more work. but after so many years, hacky might be better than buggy.

fixing the updater

...assuming you want to.

  • the updater grabs the update lock, grabs the main lock, checks if an update is needed (the version-named subfolder does not yet exist), releases the main lock, downloads and installs the update to a temp subfolder, grabs the main lock, renames the temp subfolder to a name based on the version, and releases both locks. cleanup is run. (a rough sketch follows this list.)

  • versions are never promoted to the main directory. the firefox in the main folder is a stub. it grabs the main lock, locates the newest version subfolder (it can clean up while doing this), grabs a shared lock for that version, releases the main lock, and execs that newest version passing the shared lock. (the shared lock is released when firefox later dies and its FDs are freed. the stub would check if firefox is already open for this user and possibly use that version, but i am simplifying things. they are still not simple; staging should be avoided if possible.)

  • cleanup just grabs the main lock, and for each version that is not the most recent it tries grabbing an exclusive lock for that version and removes the folder if successful, then releases the main lock.
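
a rough sketch of this flow with advisory flock() locks and made-up names (update.lock, main.lock, tmp-incoming); just my reading of the idea, not an existing design:

    /* rough sketch; closing a lock fd releases its flock() lock. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    static int lock_file(const char *path, int op) {
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0 || flock(fd, op) != 0) { if (fd >= 0) close(fd); return -1; }
        return fd;
    }

    int run_updater(const char *version_dir) {               /* e.g. "102.0.1" */
        int update_lk = lock_file("update.lock", LOCK_EX);    /* one updater at a time */
        if (update_lk < 0) return -1;

        int main_lk = lock_file("main.lock", LOCK_EX);
        int needed = access(version_dir, F_OK) != 0;          /* version subfolder missing? */
        close(main_lk);                                       /* release while downloading */

        if (needed) {
            /* ... download and unpack into "tmp-incoming" (elided) ... */
            main_lk = lock_file("main.lock", LOCK_EX);
            rename("tmp-incoming", version_dir);              /* atomic promotion */
            close(main_lk);
        }
        close(update_lk);                                     /* cleanup would run here */
        return 0;
    }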

on unices:

staging should not be done on unices. but if done anyway, it can work with package managers. just treat the system-installed version as the version on mozilla's servers, and install that version to the user's home dir on launch if needed. the launcher would be a stub again, doing version-named folder installs as described above.

There might be work towards the "Fork Server" method discussed in bug 1480452 comment 8; setting the dom.ipc.forkserver.enable pref to true will enable this if Firefox is built with it, which is the default on Linux, FreeBSD and OpenBSD. I don't know how stable this is, since it's never been enabled by default (except on KaiOS, but there it's for saving RAM), and I don't know how well it deals with other files (e.g. omni.ja) changing.
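
For reference, a one-line way to try it in a profile's user.js (the pref name is the one above; it only has an effect in builds with fork server support compiled in):

    // user.js in the profile directory (or flip it in about:config)
    user_pref("dom.ipc.forkserver.enable", true);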

I did a quick test with a Nightly on Debian; I was able to untar a new version directly over an old one and still open tabs with new domains. There's probably a good reason why it isn't yet enabled in general on Linux, desirable as it would be in the face of package managers, so I'd be careful before enabling this pref on a system you can't afford to break. But it may not be impossibly far off.

There might be work towards the "Fork Server" method

cool! i imagine a fork server is needed to emulate a clean process exec because firefox was designed around exec. it also needs FDs for all resources and eager lib loading before that service runs.

on linux the fork server can be avoided with the /proc/self/exe hack, and plumbing FDs through the codebase can be avoided with the /proc/self/fd/... hack. but it is good to know they are working towards a general POSIX solution.
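
to illustrate what a fork server roughly looks like (a toy sketch, not mozilla's actual IPC; the "tab" request and process roles are made up): a helper process created before any threads exist sits on a socketpair and forks a child per request, so every child inherits whatever FDs were opened at launch.

    /* toy fork-server sketch, not mozilla code */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int sv[2];
        /* SOCK_SEQPACKET keeps one read() per request */
        if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0) { perror("socketpair"); return 1; }

        pid_t server = fork();             /* start the server before any threads exist */
        if (server == 0) {
            close(sv[0]);
            char req[32];
            while (read(sv[1], req, sizeof req) > 0) {
                if (fork() == 0) {         /* a new "content process" */
                    printf("child %d handles request %s\n", (int)getpid(), req);
                    _exit(0);
                }
                while (waitpid(-1, NULL, WNOHANG) > 0) {}   /* reap finished children */
            }
            _exit(0);
        }

        /* browser side: ask the fork server for new processes */
        close(sv[1]);
        write(sv[0], "tab", 4);
        write(sv[0], "tab", 4);
        close(sv[0]);                      /* server's read() returns 0 and it exits */
        waitpid(server, NULL, 0);
        return 0;
    }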

I guess this is as good a time as any to give a status update on this project.

I am indeed looking into fixing this issue on Linux. As Adam mentioned, Firefox already has a fork server, which can be enabled via the dom.ipc.forkserver.enable pref. The fork server contains some unknowns. It seems to be working, but it's not entirely clear that it is ready for production. And, arguably, the bigger issue is that Firefox has existed for a long time in a situation where we have these expectations that it will not be updated while it's running. Thus, we open files and load libraries when we need them, expecting that they will be available and at the correct version. Since this situation has existed for decades, changing this is going to take time and effort.

(In reply to Lanchon from comment #10)

the resolutions are completely independent: fix firefox vs fix the updater.

I'm guessing that by "fix the updater", you mean the "Don't have Firefox self-update if another instance is running" solution mentioned in Bug 1480452 Comment 8. For the reasons already given in that comment, I don't really think that that is an appropriate solution to this problem. Thus, the solution will wind up being "fix Firefox" in both cases. Granted, "fixing Firefox" will mean fairly different things on different operating systems.

using a staging updater in unices is reinventing the wheel, and a much worse wheel at that

I don't think that staging really enters into this problem. If you really want to turn it off, there is a pref to do that: app.update.staging.enabled.
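
For example, as a user.js line (or flip it in about:config):

    // user.js: turn off staged updates
    user_pref("app.update.staging.enabled", false);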

avoiding the package manager as a recommendation is a big no.

As far as I know, our recommendation is that users use the package manager. If you know of any place where we recommend our self-updater over the package manager, let me know so I can address it.

regarding bug 1480452, the staging updater is broken too. it can be fixed. but should you? i do not know. chrome has an updater, but does firefox need one? could updates to firefox for windows be distributed through the MS Store? i do not have the slightest idea, but maybe it is no longer needed.

We do distribute Firefox via the Microsoft Store. See Bug 1709696 for details. But we still don't have a way of getting into the macOS App Store. And even on platforms that can use package managers, many users still use the in-app updater. It's probably not going away any time soon.

i landed here because of the misleading "Sorry. We just need to..." message. it made me look for a setting to disable forced updates on users

There are numerous ways of disabling in-app updates. I can help you with this if you would like.

please, fix the message text.

This bug isn't really the right place to discuss this. If you would like to discuss it further, please file a separate bug.

a unix program should obtain FDs for all its resources when it starts, and use them as needed.

dynamically linked libs should also be loaded eagerly as they suffer from the same race condition at startup.

This is exactly what I'm working on, but it's not exactly simple. Since you feel so strongly about this, perhaps you would consider lending a hand? Firefox is an open source project and fixing this would go faster with more help.

fixing firefox for linux in a hacky way is simple. fixing firefox for POSIX takes more work. but after so many years, hacky might be better than buggy.

I'm not keen on this. A hacky solution usually becomes a permanent solution. And hacky solutions result in difficult problems down the road, kind of like the one that we have here.

I guess this is as good a time as any to give a status update on this project.

thanks!!

I'm guessing that by "fix the updater", you mean the "Don't have Firefox self-update if another instance is running" solution

no, i mean the "fixing the updater" section in https://bugzilla.mozilla.org/show_bug.cgi?id=1705217#c10 which basically is a take on your "Versioned Installations".

As far as I know, our recommendation is that users use the package manager. If you know of any place where we recommend...

i've just seen this suggestion as a fix for this issue

This is exactly what I'm working on, but it's not exactly simple. Since you feel so strongly about this, perhaps you would consider lending a hand?

i wrote comment 10 not because i feel strongly, but because i saw in this issue that the fork and updater issues were supposedly the same and would be fixed together. IMHO they are very different. i talked about how fork (or exec if no libraries are involved) and file descriptors should be used to support in-place upgrades because i hadn't seen your posts on other issues yet. so i was just scribbling some ideas to spark discussion but i didn't know you were already working on this.

A hacky solution usually becomes a permanent solution.

but the "hacky" solution (which is perfectly good linux only solution, just not a posix solution) is much easier because it deals with this:

we open files and load libraries when we need them, expecting that they will be available and at the correct version. Since this situation has existed for decades, changing this is going to take time and effort.

it involves minimal touching of the existing code: changing the involved filenames from constants to variables set up at startup. at startup, on non-linux platforms the variables get initialized to the previous constants, while on linux the files are all opened and the variables set up to reference them using linux-specific FD references. the FDs survive forks and execs (but execs cannot be used if you are linking your own libraries dynamically (those that get updated in sync with the firefox binary)).

if you are using code or libraries that take filenames as arguments, you don't need to mess with them to have them accept FDs. the FD references can be passed instead.

if you do this and statically link your own libraries, you don't need a fork server. just exec /proc/self/exe (a sketch follows below).

(but fork is still better for performance reasons.)
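
a linux-only toy sketch of that last point (made-up names; assumes the relevant libs are statically linked so the re-exec does not pick up mismatched shared objects): spawn a helper by re-executing the image that is already mapped, passing a pinned resource as a /proc/self/fd name on the command line.

    /* toy sketch, linux-only, not firefox code.  resource_fd is assumed to
     * have been opened at launch without O_CLOEXEC so it survives the exec. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    static pid_t spawn_helper(int resource_fd) {
        char res_path[64];
        snprintf(res_path, sizeof res_path, "/proc/self/fd/%d", resource_fd);

        pid_t pid = fork();
        if (pid == 0) {
            /* /proc/self/exe is the image already mapped into this process,
             * even if the installed binary has since been replaced on disk */
            execl("/proc/self/exe", "firefox-helper", "--resource", res_path, (char *)NULL);
            _exit(127);                    /* exec failed */
        }
        return pid;                        /* caller tracks/waits for the child */
    }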

See Also: → 1751366

This project now has a roadmap. It's probably going to be slow going so I don't expect this to be fixed particularly soon. But I plan to chip away at this during any time that I have available for it.

Roadmap Link

The conversation that I had with the security team was on Slack, so I'm going to reproduce that conversation here


bytesized Jan 18th at 10:17 PM
I'm looking into eventually turning on the Fork Server. Maybe even at some point using it for process types other than Content processes. This would give us less ASLR benefits, so I thought I should run it by y'all.

cpeterson
@jesup worked on the pre-allocated content process cache for Fission, which IIUC, might have some functionality overlap or interactions with the fork server.

cpeterson
Is that fork server only relevant for Linux?

bytesized
Yeah, this is for the fork server enabled by dom.ipc.forkserver.enable, which I do believe is Linux-only.

freddy
This thread could really need a link to a bug. :) Do I correctly understand that we're considering spinning off new content processes through a fork server? And does this imply that all processes have the same memory layout for... "most" stuff? Mhh...

mccr8
Here's a bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1609882

mccr8
I also saw a recent related thread on Chromium. Apparently they have two different zygote processes, because one is sandboxed and one isn't. https://groups.google.com/a/chromium.org/g/chromium-dev/c/SCy6ysANp6g

jesup
This would mostly underlie the (preallocated) process creation done in dom/ipc for content/etc processes; from that point of view it shouldn't matter much to the upper layers. IIRC linux zygotes need to be forked from before any threads are started, so pretty early (though one could preload some data (fonts, etc.) on the main thread, probably). Really annoying that fork doesn't include threads.....

freddy
Tom and I had a chat and I have a better understanding of the relevant threats here. I was overly worried :)

jld
Chrome's situation with sandboxing is a little different: originally there was no seccomp-bpf and no unprivileged namespaces, so they had to have a setuid root wrapper run the zygote and all of its children in the same sandbox (namespaces / empty chroot). We haven't done that, so it should be possible for one fork server to start any process type with any sandbox policy (or none). Also, this reminds me that there's at least one bug I need to file to block shipping the fork server, because I don't want to lose the ability to see process exit statuses. (I have several security bugs to deal with, and some reviews, and some graphics-vs-sandboxing stuff, and I am so tired….)

bytesized
@jld CC me on it. It might be possible for me to work on it at some point if you can tell me where to start :)

jld
As for the ASLR thing, to put it in context, it's basically the situation we already have on Windows: all child processes have the same offsets, so if your exploit attempt crashes the process and it's restarted, you can try again. (With the fork server, restarting it (or the browser) will re-randomize; on Windows, it might need a reboot but I'm not sure.)

bytesized
So, I'm not sure that I totally understand everything that's been said in this thread. But it sounds to me like turning on the fork server would be acceptable, despite the loss of ASLR benefits?

tritter
Yes

thanks again for sharing.

my 0.02 re ASLR and fork:

the natural way of doing things would be to load and link the program and all its version-matched private libs (if any), plus as many other libs as possible, and then fork as needed. this would translate into enormous savings in memory usage, as code pages are shared between all processes. i look forward to this state of affairs.

giving away these memory savings for the sake of increased ASLR is IMHO a priori questionable. is this really the best security-wise use of such a big increase in the working footprint of the browser? you cannot assume there is no memory footprint/performance trade-off. on most systems there is. there might be smaller sacrifices of performance for more important mitigations. the increased ASLR "feature" of the current status quo exists not by design but by accident. the burden of proof of value to warrant the existence of this feature seems to not have been met.

it should be possible for one fork server to start any process type with any sandbox policy (or none)

but loading version-matched libs should be done beforehand or else you would lose in-place updating, and lazy loading of any other lib would increase the memory footprint and should be justified.

thanks!

See Also: → 1366808
See Also: → 1724360

The severity field for this bug is relatively low, S3. However, the bug has 9 duplicates and 6 See Also bugs.
:nrishel, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(nrishel)
Severity: S3 → S2
Flags: needinfo?(nrishel)
Depends on: 1779313

:bytesized should this be assigned to you?

Flags: needinfo?(bytesized)

I don't think so.

Flags: needinfo?(bytesized)
Duplicate of this bug: 1797529
Depends on: 1805785
Depends on: 1805787
Depends on: 1805788
Duplicate of this bug: 1827826
Depends on: forkserver
No longer depends on: 1805788
Duplicate of this bug: 1784726
Duplicate of this bug: 1853512

This is being worked on for Linux through other bugs and is in testing in the experimental Nightly .debs. (Normal Nightly doesn't have this issue to begin with because it uses our own updater, which won't overwrite the in-use files...)

OS: Unspecified → Linux
Priority: -- → P2
Hardware: Unspecified → Desktop
Version: unspecified → Trunk
Assignee: nobody → mhughes

@gcp Is this resolved? Can we close it?

Flags: needinfo?(gpascutto)

The status of the bug is still correct, work is ongoing in the dependencies like https://bugzilla.mozilla.org/show_bug.cgi?id=forkserver.

Flags: needinfo?(gpascutto)