Reproducible crash when running in memory-constrained Docker container
Categories: Core :: IPC, defect, P3
People: Reporter: hlovdal; Assignee: Unassigned
Details: Keywords: crash, csectype-oom; Whiteboard: [platform-rel-Wikipedia]
Attachments: 7 files
User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Build ID: 20180523134233

Steps to reproduce:

This is a follow-up to https://crash-stats.mozilla.com/report/index/843f44bd-134c-469e-bda8-289570180527. For some time I have tried to create a Docker container to run Firefox in, but unfortunately it crashes too often to be of much value. I assume this is because the run-time environment inside the container is missing something that Firefox implicitly depends on. I am filing this bug report with details on how to create the container, since you should be able to reproduce it quite easily.
Comment 1 (Reporter) • 6 years ago
Comment 2 (Reporter) • 6 years ago
Comment 3 (Reporter) • 6 years ago
Comment 4 (Reporter) • 6 years ago
Comment 5 (Reporter) • 6 years ago
The instance that resulted in the linked crash report was started with "./run.sh /bin/bash". At the bash prompt I then ran "/usr/local/bin/su-exec firefox:firefox xterm", and in the xterm window:

[firefox@anton /]$ firefox --no-remote &
[1] 51
[firefox@anton /]$
(firefox:51): Gtk-WARNING **: Could not load a pixbuf from /org/gtk/libgtk/theme/Adwaita/assets/check-symbolic.svg.
This may indicate that pixbuf loaders or the mime database could not be found.
ExceptionHandler::GenerateDump cloned child 311
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
2018-05-27 16:08:55: minidump.cc:4808: ERROR: ReadBytes: read 0/32
2018-05-27 16:08:55: minidump.cc:4453: ERROR: Minidump cannot read header
(crashreporter:312): Gtk-WARNING **: Could not load a pixbuf from /org/gtk/libgtk/theme/Adwaita/assets/check-symbolic.svg.
This may indicate that pixbuf loaders or the mime database could not be found.
[Child 183, Chrome_ChildThread] WARNING: pipe error (3): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
Failed to open curl lib from binary, use libcurl.so instead
[1]+ Bus error (core dumped) firefox --no-remote
[firefox@anton /]$

The reason for the intermediate xterm window is that if firefox is started directly, it will die and terminate the crash reporter before the crash reporter has successfully reported anything. This bug report was submitted from a second firefox instance started from that xterm window, but I am sure it will crash again if I try to open reddit again.
Comment 6 (Reporter) • 6 years ago
about:crashes also lists https://crash-stats.mozilla.com/report/index/2eea3113-4446-44be-bf0e-ed7b00180326 and https://crash-stats.mozilla.com/report/index/e31527ce-76f4-4001-974a-d63a90180326, which I assume were accidentally included in the ~/.mozilla directory that I embedded to bootstrap a few plugins. They are, however, related to crashes from earlier attempts to create a usable Docker container. The contents of ~/.mozilla are not mapped to a volume, since my goal is to create a volatile container that can be started in an isolated, throwaway fashion.
Comment 7 (Reporter) • 6 years ago
(In reply to Håkon Løvdal from comment #6)
> ... to bootstrap a few plugins.

That should be "...to bootstrap a few add-ons."
Comment 8 (Reporter) • 6 years ago
Trying to reload the reddit tab shows "Gah. Your tab just crashed." and the following is printed in the xterm window:

[Parent 322, Gecko_IOThread] WARNING: pipe error (69): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[Parent 322, Gecko_IOThread] WARNING: pipe error (58): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[Parent 322, Gecko_IOThread] WARNING: pipe error (57): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
###!!! [Parent][MessageChannel] Error: (msgtype=0x15007F,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv

With enough reload attempts, I assume the firefox process itself will crash instead of just the tab.
Comment 9 (Reporter) • 6 years ago
Comment 10 (Reporter) • 6 years ago
And sure enough, after 12 reloads (including the one in comment #8) firefox crashed: https://crash-stats.mozilla.com/report/index/4de712da-c73c-41bb-abd3-c31460180527. The xterm output is attached in comment #9.
Comment 11 (Reporter) • 6 years ago
Disabling all add-ons did not help, https://crash-stats.mozilla.com/report/index/30ba763a-01b8-4bdd-ac57-430090180527.
Comment 12 (Reporter) • 6 years ago
Running "firefox --save-mode --jsconsole --devtools --no-remote" also crashed: https://crash-stats.mozilla.com/report/index/05ea6d8f-dc82-490e-9721-b43a20180527. Attaching the jsconsole output and an xterm screen capture.
Comment 13 (Reporter) • 6 years ago
Comment 14 (Reporter) • 6 years ago
Comment 15 (Reporter) • 6 years ago
And it is not just reddit that triggers crashes. It also happened once when reloading this very page (in order to cancel adding a comment):

[firefox@anton /]$ [GFX1-]: Receive IPC close with reason=AbnormalShutdown
[Child 1569, Chrome_ChildThread] WARNING: pipe error (3): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[1]+ Bus error (core dumped) firefox --save-mode --jsconsole --devtools --no-remote
Comment 16 (Reporter) • 6 years ago
Crash when clicking a link (all JavaScript disabled by NoScript): https://crash-stats.mozilla.com/report/index/d5b038bc-46f3-4cb3-b864-87d480180527.

[firefox@anton /]$ ExceptionHandler::GenerateDump cloned child 2132
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
2018-05-27 18:19:23: minidump.cc:4808: ERROR: ReadBytes: read 0/32
2018-05-27 18:19:23: minidump.cc:4453: ERROR: Minidump cannot read header
(crashreporter:2133): Gtk-WARNING **: Could not load a pixbuf from /org/gtk/libgtk/theme/Adwaita/assets/check-symbolic.svg.
This may indicate that pixbuf loaders or the mime database could not be found.
[Child 1922, Chrome_ChildThread] WARNING: pipe error (3): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
Failed to open curl lib from binary, use libcurl.so instead
[1]+ Bus error (core dumped) firefox --save-mode --jsconsole --devtools --no-remote
Comment 17 • 6 years ago
Unfortunately those crashes aren't very informative -- there's no "mini-dump" which means no information about what the program was doing at the time of the crash, just some general information about the runtime environment. My first guess was that your Docker containers are too small, but the crashes seem to think they have 32GB of memory so it's not that. Maybe not enough disk space? The fact that it can't capture a full crash dump is very odd. There doesn't seem to be any immediate security threat to other users from this. I'm unhiding the bug and maybe it'll get more love than if it's just locked away with the security team.
Comment 18 • 6 years ago
(In reply to Håkon Løvdal from comment #15)
> [1]+ Bus error (core dumped) firefox --save-mode --jsconsole
> --devtools --no-remote

This is probably because the memory limit for /dev/shm inside the container is too small: SIGBUS usually indicates an I/O error on a memory-mapped file (in this case, probably ENOSPC). We've seen this happen with Docker before. The --shm-size flag should help with this: https://docs.docker.com/engine/reference/run/#runtime-constraints-on-resources
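The /dev/shm diagnosis above can be checked from inside a running container. The following is a minimal Python sketch, not part of Firefox or Docker: the helper names `fs_space` and `shm_looks_too_small` and the 1 GiB default threshold are illustrative assumptions (the threshold only mirrors the `--shm-size "1g"` value reported to work in comment 19, not a measured requirement). It uses `os.statvfs`, so it reports the size of whatever filesystem holds the given path, which inside a Docker container is normally the tmpfs that `--shm-size` controls.

```python
import os
from typing import Tuple

GIB = 1024 ** 3


def fs_space(path: str = "/dev/shm") -> Tuple[int, int]:
    """Return (total_bytes, available_bytes) of the filesystem holding `path`."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    available = st.f_bavail * st.f_frsize  # space usable by unprivileged processes
    return total, available


def shm_looks_too_small(path: str = "/dev/shm", minimum: int = 1 * GIB) -> bool:
    """Hypothetical heuristic: flag a filesystem whose total size is below `minimum`.

    Docker mounts a 64 MiB /dev/shm unless --shm-size is given.
    """
    total, _ = fs_space(path)
    return total < minimum


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        total, available = fs_space("/dev/shm")
        print(f"/dev/shm: {total / 2**20:.0f} MiB total, {available / 2**20:.0f} MiB available")
        print("likely too small" if shm_looks_too_small() else "size looks fine")
```

Running it inside the container before and after adding `--shm-size` should show the tmpfs growing from Docker's 64 MiB default to the requested size.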
Comment 19 (Reporter) • 6 years ago
I can confirm that adding --memory "16g" --shm-size "1g" to the docker run command made a huge difference. I have been running this instance now for more than one day, opening hundreds of tabs on reddit and elsewhere without any crash. I have not tried with just one of them or with lower values.
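For scripted setups, the two flags from this comment can be kept in one place. This is a minimal Python sketch that only builds the `docker run` argument list; the function name `docker_run_args` and the image name `firefox-image` are hypothetical, and the defaults simply repeat the values reported here rather than tuned minimums (the reporter did not test lower values).

```python
import shlex
from typing import List, Sequence


def docker_run_args(image: str, command: Sequence[str] = (),
                    memory: str = "16g", shm_size: str = "1g") -> List[str]:
    """Build argv for `docker run` with the resource limits from comment 19.

    --memory caps the container's RAM; --shm-size sets the size of the
    tmpfs mounted at /dev/shm (Docker uses 64m when it is omitted).
    """
    return ["docker", "run",
            "--memory", memory,
            "--shm-size", shm_size,
            image, *command]


if __name__ == "__main__":
    # Only print the command; pass the list to subprocess.run(...) to start it.
    print(shlex.join(docker_run_args("firefox-image", ["firefox", "--no-remote"])))
```

Keeping the flags as separate list elements avoids shell-quoting issues when the list is handed to `subprocess.run`.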
Comment 20 • 6 years ago
Seems to be fixed by adjusting container resource limits.
Comment 21 (Reporter) • 6 years ago
No, this bug is not resolved! While proper resource allocation avoids the problem altogether, a crash is still a crash, and the problem has not been fixed. Do you know for certain that JavaScript cannot trigger excessive shared memory usage in some way that then crashes the browser and might be exploitable? If the severity and risk of this bug are determined to be low (which I do not have the knowledge to judge), classifying it as low priority (or possibly wontfix) is fair, but resolved is wrong under all circumstances.
Comment 22 • 6 years ago
I stand corrected. This is effectively an out-of-memory crash, and we do seem to consider those valid bugs and even have a keyword for them, although they're not in the criteria for any of the security severity levels. I'll hand this back to dveditz for re-triage. As for possible fixes: bug 1440203 will make shared memory work like regular memory allocation for accounting purposes, but only on Linux 3.17 and up (currently ~85% of the Linux Firefox userbase); bug 1245239 was going to make shared memory pre-commit space (so it would fail immediately instead of SIGBUSing later), but caused test failures.
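The two approaches mentioned in this comment can be illustrated outside Firefox. The Python sketch below is an assumption-laden illustration, not Firefox's implementation. It creates an anonymous shared-memory file with `os.memfd_create` (Linux 3.17 is the kernel release that added the memfd_create(2) system call, which is presumably what the 3.17 cutoff for bug 1440203 refers to) and falls back to an unlinked file in /dev/shm. It then reserves the space up front with `os.posix_fallocate`, so that running out of room raises `OSError` (typically ENOSPC) at allocation time instead of a SIGBUS when a page is first touched, which is the behavior described for bug 1245239. The names `_anonymous_shm_fd` and `create_shared_buffer` are hypothetical.

```python
import errno
import mmap
import os
import tempfile
from typing import Tuple


def _anonymous_shm_fd() -> int:
    """Open an unnamed shared-memory file descriptor (hypothetical helper)."""
    memfd_create = getattr(os, "memfd_create", None)  # Linux, Python 3.8+
    if memfd_create is not None:
        try:
            # A memfd is not a file on the /dev/shm mount, so --shm-size does not cap it.
            return memfd_create("shm-sketch")
        except OSError:
            pass  # e.g. a kernel older than 3.17
    shm_dir = "/dev/shm" if os.path.isdir("/dev/shm") else None
    fd, path = tempfile.mkstemp(dir=shm_dir)
    os.unlink(path)  # keep only the descriptor, like an anonymous segment
    return fd


def create_shared_buffer(size: int) -> Tuple[int, mmap.mmap]:
    """Create and map `size` bytes of shared memory, committing space up front."""
    fd = _anonymous_shm_fd()
    try:
        if hasattr(os, "posix_fallocate"):
            # Reserve the blocks now: a full filesystem fails here with
            # OSError(ENOSPC) instead of SIGBUS on first write to the mapping.
            os.posix_fallocate(fd, 0, size)
        else:
            os.ftruncate(fd, size)  # no pre-commit available on this platform
        return fd, mmap.mmap(fd, size)  # MAP_SHARED, read/write by default on Unix
    except Exception:
        os.close(fd)
        raise


if __name__ == "__main__":
    try:
        fd, buf = create_shared_buffer(16 * 1024 * 1024)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            print("not enough shared memory for the buffer")
        raise
    buf[:5] = b"hello"
    print(buf[:5])
    buf.close()
    os.close(fd)
```

The trade-off is the one the comment hints at: pre-committing turns a late, hard-to-diagnose SIGBUS into an early, catchable error, but it also makes allocation fail eagerly in situations where lazy allocation would have succeeded.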
Comment 23 • 5 years ago
A number of folks have been reporting issues running Firefox in Docker since Firefox 68 was released. They were previously working around this issue by disabling e10s, but Firefox 68 dropped the browser.tabs.remote.autostart pref in bug 1548941.
e.g.
Comment 24 • 4 years ago
Wikimedia's CI environment is also affected (as it uses Docker and Headless Firefox). We're currently pinned to Firefox 60esr.
Downstream issue - https://phabricator.wikimedia.org/T240955
Comment 25 • 4 years ago
Bug 1440203 has landed in Firefox 84 (currently Nightly), which will likely significantly reduce the number of times we run into this.
Comment 26 • 4 years ago
I'm going to mark this fixed as Firefox 84 ships with bug 1440203 which should work around broken Docker configurations. If you're still seeing this, or a similar issue, please file a new bug as the underlying cause must be different.