Closed Bug 1464690 Opened 6 years ago Closed 4 years ago

Reproducible crash when running in memory-constrained Docker container

Categories

(Core :: IPC, defect, P3)

60 Branch
defect

RESOLVED FIXED
Tracking Status
thunderbird_esr78 --- wontfix
firefox-esr78 --- wontfix
firefox84 --- fixed

People

(Reporter: hlovdal, Unassigned)

References

Details

(Keywords: crash, csectype-oom, Whiteboard: [platform-rel-Wikipedia])

Attachments

(7 files)

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
Build ID: 20180523134233

Steps to reproduce:

This is a follow-up to https://crash-stats.mozilla.com/report/index/843f44bd-134c-469e-bda8-289570180527.

I have for some time tried to create a Docker container for running Firefox in, but unfortunately it crashes too often for it to be of much value. I assume this is because the run-time environment inside the container is missing something that Firefox implicitly depends on.

I am creating this bug report with details on how to create the container, so you should be able to reproduce the crash quite easily.
Attached file Dockerfile
Attached file entrypoint.sh
Attached file Makefile
Attached file run.sh
The instance that resulted in the linked crash report was started with "./run.sh /bin/bash" and then at the bash prompt "/usr/local/bin/su-exec firefox:firefox xterm" was run, and then in the xterm window:

[firefox@anton /]$ firefox --no-remote &
[1] 51
[firefox@anton /]$ 
(firefox:51): Gtk-WARNING **: Could not load a pixbuf from /org/gtk/libgtk/theme/Adwaita/assets/check-symbolic.svg.
This may indicate that pixbuf loaders or the mime database could not be found.
ExceptionHandler::GenerateDump cloned child 311
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
2018-05-27 16:08:55: minidump.cc:4808: ERROR: ReadBytes: read 0/32
2018-05-27 16:08:55: minidump.cc:4453: ERROR: Minidump cannot read header

(crashreporter:312): Gtk-WARNING **: Could not load a pixbuf from /org/gtk/libgtk/theme/Adwaita/assets/check-symbolic.svg.
This may indicate that pixbuf loaders or the mime database could not be found.
[Child 183, Chrome_ChildThread] WARNING: pipe error (3): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
Failed to open curl lib from binary, use libcurl.so instead

[1]+  Bus error               (core dumped) firefox --no-remote
[firefox@anton /]$


The reason for the intermediate xterm window is that if firefox is started directly it will die and terminate the crash reporter before it successfully reports anything.

This bug report was submitted from a second firefox instance started from that xterm window, but I am sure it will crash again if I try to open reddit again.
about:crashes also lists

https://crash-stats.mozilla.com/report/index/2eea3113-4446-44be-bf0e-ed7b00180326
https://crash-stats.mozilla.com/report/index/e31527ce-76f4-4001-974a-d63a90180326

which I assume were accidentally included in the ~/.mozilla directory that I embedded to bootstrap a few plugins. They are, however, related to crashes from earlier attempts to create a usable Docker container.

The contents of ~/.mozilla are not mapped to a volume, since my goal is to create a volatile container that can be started in an isolated, throwaway fashion.
(In reply to Håkon Løvdal from comment #6)
> ... to bootstrap a few plugins.

That should be "...to bootstrap a few add-ons."
Trying to reload the reddit tab shows "Gah. Your tab just crashed." and the following is printed in the xterm window:


[Parent 322, Gecko_IOThread] WARNING: pipe error (69): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[Parent 322, Gecko_IOThread] WARNING: pipe error (58): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
[Parent 322, Gecko_IOThread] WARNING: pipe error (57): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353

###!!! [Parent][MessageChannel] Error: (msgtype=0x15007F,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv



With enough reload attempts I assume the firefox process itself will crash instead of just the tab.
Attached file Screen capture
And sure enough, after 12 reloads (including the one in comment #8) firefox crashed, https://crash-stats.mozilla.com/report/index/4de712da-c73c-41bb-abd3-c31460180527.

Output in xterm window attached in comment #9.
And also running "firefox --save-mode --jsconsole --devtools --no-remote" crashed, https://crash-stats.mozilla.com/report/index/05ea6d8f-dc82-490e-9721-b43a20180527.

Attaching jsconsole output and xterm screen capture.
Attached file jsconsole.txt
And it is not just reddit that triggers crashes. It also happened once when reloading this very page (in order to cancel adding a comment):

[firefox@anton /]$ [GFX1-]: Receive IPC close with reason=AbnormalShutdown
[Child 1569, Chrome_ChildThread] WARNING: pipe error (3): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353

[1]+  Bus error               (core dumped) firefox --save-mode --jsconsole --devtools --no-remote
Crash when clicking a link (all javascript disabled by noscript), https://crash-stats.mozilla.com/report/index/d5b038bc-46f3-4cb3-b864-87d480180527.

[firefox@anton /]$ ExceptionHandler::GenerateDump cloned child 2132
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
2018-05-27 18:19:23: minidump.cc:4808: ERROR: ReadBytes: read 0/32
2018-05-27 18:19:23: minidump.cc:4453: ERROR: Minidump cannot read header

(crashreporter:2133): Gtk-WARNING **: Could not load a pixbuf from /org/gtk/libgtk/theme/Adwaita/assets/check-symbolic.svg.
This may indicate that pixbuf loaders or the mime database could not be found.
[Child 1922, Chrome_ChildThread] WARNING: pipe error (3): Connection reset by peer: file /builddir/build/BUILD/firefox-60.0.1/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 353
Failed to open curl lib from binary, use libcurl.so instead

[1]+  Bus error               (core dumped) firefox --save-mode --jsconsole --devtools --no-remote
Group: firefox-core-security
Unfortunately those crashes aren't very informative -- there's no "mini-dump", which means no information about what the program was doing at the time of the crash, just some general information about the runtime environment. My first guess was that your Docker containers are too small, but the crash reports seem to think the containers have 32GB of memory, so it's not that. Maybe not enough disk space? The fact that it can't capture a full crash dump is very odd.

There doesn't seem to be any immediate security threat to other users from this. I'm unhiding the bug and maybe it'll get more love than if it's just locked away with the security team.
(In reply to Håkon Løvdal from comment #15)
> [1]+  Bus error               (core dumped) firefox --save-mode --jsconsole
> --devtools --no-remote

This is probably because the memory limit for /dev/shm inside the container is too small — SIGBUS usually indicates an I/O error on a memory-mapped file (in this case, probably ENOSPC).  We've seen this happen with Docker before.

The --shm-size flag should help with this: https://docs.docker.com/engine/reference/run/#runtime-constraints-on-resources
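
To see whether a container is in the situation described above, it may help to check how large /dev/shm actually is from inside it. The following is only a minimal sketch and not part of Firefox or Docker: the helper name `check_shm` and the 1 GiB default threshold are assumptions made for illustration.

```python
import shutil


def check_shm(path="/dev/shm", min_bytes=1 << 30):
    """Return (total_bytes, ok) for the filesystem mounted at `path`.

    `ok` is False when the filesystem is smaller than `min_bytes`.
    The 1 GiB default is an arbitrary illustrative threshold.
    """
    total = shutil.disk_usage(path).total
    return total, total >= min_bytes
```

Calling `check_shm()` inside the container returns the size of /dev/shm in bytes and whether it meets the threshold. As far as I know, Docker gives containers a 64 MB /dev/shm unless --shm-size is passed, which would be reported as too small by this check.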
See Also: → 1245239
Status: UNCONFIRMED → NEW
Component: Untriaged → IPC
Ever confirmed: true
Keywords: crash
Product: Firefox → Core
I can confirm that adding
    --memory "16g" --shm-size "1g"
to the docker run command made a huge difference. I have been running this instance now for more than one day, opening hundreds of tabs on reddit and elsewhere without any crash. I have not tried with just one of them or with lower values.
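
For anyone scripting the container start, the same limits can be assembled programmatically. This is a hedged sketch only: the function name `docker_run_args` and the image name used in the usage note are hypothetical, and the run.sh attached to this bug presumably passes further options (for example for X11 access) that are not reproduced here. Only the --memory and --shm-size flags come from the comment above.

```python
def docker_run_args(image, command=(), memory="16g", shm_size="1g"):
    """Build an argv list for `docker run` with explicit resource limits."""
    return [
        "docker", "run",
        "--memory", memory,      # overall memory limit for the container
        "--shm-size", shm_size,  # size of /dev/shm inside the container
        image,
        *command,                # optional command to run in the container
    ]
```

The resulting list can be handed to `subprocess.run`, for example `subprocess.run(docker_run_args("my-firefox-image", ["/bin/bash"]))`, where `my-firefox-image` stands in for whatever image the Dockerfile builds.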
Seems to be fixed by adjusting container resource limits.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
No, this bug is not resolved! While proper resource allocation avoids the problem altogether, a crash is still a crash and the problem has not been fixed. Do you know that javascript is not able to trigger excessive shared memory usage in some way which then crashes the browser and might be exploitable?

If the severity and risk of this bug are determined to be low (which I do not have the knowledge to judge), classifying it as low priority (or possibly wontfix) is fair, but resolved is wrong under all circumstances.
Flags: needinfo?(jld)
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
I stand corrected.  This is effectively an out-of-memory crash, and we do seem to consider those valid bugs and even have a keyword for them, although they're not in the criteria for any of the security severity levels.  I'll hand this back to dveditz for re-triage.

As far as possible fixes: bug 1440203 will make shared memory work like regular memory allocation for accounting purposes, but only on Linux 3.17 and up (currently ~85% of the Linux Firefox userbase); bug 1245239 was going to make shared memory pre-commit space (so it would fail immediately instead of SIGBUSing later), but caused test failures.
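
The "pre-commit" idea can be illustrated in general terms as follows. This is a sketch under stated assumptions, not the code from either bug: it assumes that reserving space with `posix_fallocate` before mapping is one way to turn a later SIGBUS into an immediate error, and the helper name `precommitted_mapping` is hypothetical.

```python
import mmap
import os
import tempfile


def precommitted_mapping(size, directory=None):
    """Return a writable mmap of `size` bytes with its space reserved up front.

    Assumption for illustration: reserving the blocks before mapping means a
    full filesystem is reported here as OSError (e.g. ENOSPC) instead of as
    SIGBUS when a page is first touched.
    """
    with tempfile.TemporaryFile(dir=directory) as f:
        # Reserve the backing space now; this raises OSError if it is unavailable.
        os.posix_fallocate(f.fileno(), 0, size)
        # The mapping remains valid after the file object is closed.
        return mmap.mmap(f.fileno(), size)
```

Passing `directory="/dev/shm"` would exercise the same filesystem that is undersized in the container; with the default, the system temporary directory is used.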
Flags: needinfo?(jld) → needinfo?(dveditz)
Keywords: csectype-oom
Flags: needinfo?(dveditz)
Summary: Reproducible crash when running in Docker container → Reproducible crash when running in memory-constrained Docker container
Depends on: 1440203
Priority: -- → P3
Depends on: 1245239

A number of folks have been reporting issues running Firefox in Docker since 68 was released. They were previously working around this issue by disabling e10s, but Firefox 68 dropped the browser.tabs.remote.autostart pref in bug 1548941.

e.g.

Whiteboard: [platform-rel-Wikipedia]

Wikimedia's CI environment is also affected (as it uses Docker and Headless Firefox). We're currently pinned to Firefox 60esr.

Downstream issue - https://phabricator.wikimedia.org/T240955

Bug 1440203 has landed in Firefox 84 (Nightly right now), which likely significantly reduces the number of times we're going to run into this.

I'm going to mark this fixed as Firefox 84 ships with bug 1440203 which should work around broken Docker configurations. If you're still seeing this, or a similar issue, please file a new bug as the underlying cause must be different.

Status: REOPENED → RESOLVED
Closed: 6 years ago → 4 years ago
Resolution: --- → FIXED