Closed
Bug 1338771
Opened 8 years ago
Closed 3 years ago
[e10s] Crash in libyuv::ARGBSetRow_X86
Categories
(Core :: Graphics, defect, P3)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: kristian, Unassigned)
References
Details
(Keywords: crash, Whiteboard: [gfx-noted])
Crash Data
This bug was filed from the Socorro interface and is
report bp-1369f601-b6f1-46a9-abb5-3bc272170210.
=============================================================
Firefox running in Docker with help from Xvfb and llvmpipe.
The tab crashes when visiting http://version2.dk. I haven't been able to find other sites that also crash the tab, but if I disable e10s (browser.tabs.remote.autostart.2 set to false) it stops crashing.
Reporter
Comment 1•8 years ago
Changing the Docker IPC namespace to host seems to solve this, but it costs some security.
Could it be a shared memory usage issue?
Reporter
Comment 2•8 years ago
Switching on XRender also solves the issue.
Comment 3•8 years ago
(In reply to Kristian Klausen from comment #1)
> Changing the Docker ipc namespace to host seems to solve this, but cost some
> security.
> Could it be a shared memory usage issue?
Yes, it seems to be related to a shmem issue. Shmem is not used when e10s is disabled. The shmem allocation seems to have failed, but ShmemTextureData::Create thought it succeeded.
> https://hg.mozilla.org/releases/mozilla-release/file/327e081221b0/gfx/layers/BufferTexture.cpp#l558
Comment 5•8 years ago
Bill, it seems that running in e10s mode inside Docker breaks unless you use Docker's --ipc=host flag. Is there something we could do about this?
Flags: needinfo?(wmccloskey)
See Also: 1350721 →
Sorry, I have no idea. Maybe Jed can think of something.
Flags: needinfo?(wmccloskey) → needinfo?(jld)
Comment 7•8 years ago
The IPC namespace controls SysV IPC. I could understand needing to use --ipc=host if Firefox were running inside a container and connecting to an X11 server on the host, because of the MIT-SHM extension, but if the X client and server are in the same container (which is what I'd expect for a test setup) then that shouldn't matter. Possibly there's something besides X that's trying to use SysV IPC; I don't know if there's a good way to find out what that might be other than using strace.
But this gets weirder. The crash stacks mention "shared memory", but that's Gecko IPC's shared memory (ipc::Shmem); as far as I can tell from searchfox, that doesn't ever use SysV shm: it opens a file and uses mmap(). The crashes are also SIGBUS, which is unusual on x86 Linux; one possible cause, which seems to be the most likely in this context, is accessing past the end of a memory-mapped file.
That shouldn't be possible, because ShmemTextureData::Create appears to be passing the same size to AllocUnsafeShmem and InitBuffer, but apparently it is. Maybe AllocUnsafeShmem (or whatever it eventually winds up calling) returns a shared memory area that isn't big enough?
Flags: needinfo?(jld)
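The out-of-bounds hypothesis in comment 7 can be illustrated with a small guard. This is a minimal sketch, not Gecko code: `map_checked` is a hypothetical helper that compares the backing file's size with the requested mapping length before mapping, which is the mismatch that would otherwise surface as SIGBUS when the extra pages are touched.

```python
import mmap
import os
import tempfile


def map_checked(fd, length):
    """Map `length` bytes of `fd`, refusing if the file is shorter than that.

    Hypothetical helper for illustration only.
    """
    actual = os.fstat(fd).st_size
    if actual < length:
        raise ValueError(f"file is {actual} bytes, cannot map {length}")
    return mmap.mmap(fd, length)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as backing:
        backing.truncate(4096)                      # file is exactly one 4 KiB page
        view = map_checked(backing.fileno(), 4096)  # mapping fits the file
        view.close()
        try:
            map_checked(backing.fileno(), 8192)     # would extend past end of file
        except ValueError as exc:
            print("refused:", exc)
```

A check like this only covers the size mismatch described in comment 7; it says nothing about whether the filesystem can actually back the pages.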
Comment 8•8 years ago
I have a reproducible example of this crash in Bug 1323701, in case it is helpful to anyone. This is blocking us from updating our test suite to Selenium 3 / Marionette, so I am happy to provide info if I can.
Comment 10•8 years ago
It turns out I'm wrong about Docker: --ipc *does* affect /dev/shm as well as SysV IPC; see https://github.com/moby/moby/pull/12159. (The documentation does have the word POSIX wedged in in one place, I now see, but the rest of it seemed to be talking about SysV IPC so I didn't realize it would also affect the filesystem.)
What I think is going on is that /dev/shm runs out of space — Docker's default is 64M — and we're not actually allocating space when the file is created, so ENOSPC happens in the page fault handler and we get SIGBUS.
(The CrossProcessSemaphore SIGBUS crashes were the big clue here — that's a small fixed-size allocation, not a potentially large array, so out-of-bounds access didn't make sense as the cause.)
So, one part of this (if I'm right) is to raise the --shm-size in the test containers.
The other thing that could happen is to allocate space (with posix_fallocate) when creating shared memory items and handle failure somehow, even if that's just by immediately crashing with appropriate metadata — at least then we'd see the allocation site, not something else later on, and we could include these in any statistics on OOM crashes. I don't know if anything in graphics that uses shared memory is actually expecting fallible allocation, but that could also be done.
Flags: needinfo?(jld)
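The pre-allocation idea from comment 10 can be sketched as follows. This is an illustration under stated assumptions, not the patch from any Mozilla bug: `create_preallocated_shm` is a hypothetical helper, it assumes Linux (where Python exposes `os.posix_fallocate`), and it turns ENOSPC into an immediate error at the allocation site instead of a later SIGBUS in the page fault handler.

```python
import errno
import mmap
import os
import tempfile


def create_preallocated_shm(size, directory="/dev/shm"):
    """Create an unlinked file of `size` bytes with its space reserved up front.

    Hypothetical helper for illustration; it is not Gecko's implementation.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.unlink(path)  # keep only the descriptor, like an anonymous segment
        try:
            # Unlike ftruncate(), this reserves the space now, so running out
            # of room is reported here rather than on first touch of a page.
            os.posix_fallocate(fd, 0, size)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                raise MemoryError(f"{directory} cannot hold {size} bytes") from exc
            raise
        return fd
    except BaseException:
        os.close(fd)
        raise


if __name__ == "__main__":
    # The temp directory is used here only so the demo runs outside a container.
    fd = create_preallocated_shm(1 << 20, directory=tempfile.gettempdir())
    with mmap.mmap(fd, 1 << 20) as view:
        view[:5] = b"hello"  # the pages were reserved by posix_fallocate above
        print(view[:5])
    os.close(fd)
```

Raising at creation time matches the goal stated in the comment: the failure is attributed to the allocation site, and the caller can choose to treat it as an out-of-memory condition.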
Comment 11•8 years ago
It turns out there's already a bug about pre-allocating shared memory, for exactly this reason: bug 1245239, which has a patch, which I r+ed, but it caused breakage on Try and didn't land.
See Also: → 1245239
Comment 12•8 years ago
There is also an issue for geckodriver, where people see crashes with Docker and Selenium:
https://github.com/mozilla/geckodriver/issues/285
One of our affected users mentioned that attaching the /dev/shm volume to the Docker container fixed it for him.
Comment 13•8 years ago
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #11)
> It turns out there's already a bug about pre-allocating shared memory, for
> exactly this reason: bug 1245239, which has a patch, which I r+ed, but it
> caused breakage on Try and didn't land.
It looks like this patch was submitted over a year ago, and the bug has not seen much movement since. Is this likely to get any movement?
Comment 14•8 years ago
I think I might have accidentally found out what the problem with bug 1245239 was.
Keep in mind that fixing bug 1245239 just means that things will crash (or otherwise fail) in a more friendly way. The real fix is to use a larger /dev/shm.
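A test harness could check the size of /dev/shm before launching the browser, so that an undersized container (Docker's default is 64M, per comment 10) is reported up front. The sketch below is only an assumption about how such a check might look: `shm_headroom` is a hypothetical helper, and the 256 MiB threshold is an arbitrary illustrative value, not a documented Firefox requirement.

```python
import shutil


def shm_headroom(required_bytes, path="/dev/shm"):
    """Return (free_bytes, fits) for the filesystem that holds `path`.

    Hypothetical helper for illustration only.
    """
    usage = shutil.disk_usage(path)
    return usage.free, usage.free >= required_bytes


if __name__ == "__main__":
    # 256 MiB is an arbitrary illustrative threshold.
    free, fits = shm_headroom(256 * 1024 * 1024)
    print(f"/dev/shm free: {free / 2**20:.0f} MiB, enough: {fits}")
```

If the check fails inside a container, the remedies already mentioned in this bug apply: raise --shm-size, or share the host's /dev/shm.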
Comment 15•8 years ago
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #14)
> I think I might have accidentally found out what the problem with bug
> 1245239 was.
>
> Keep in mind that fixing bug 1245239 just means that things will crash (or
> otherwise fail) in a more friendly way. The real fix is to use a larger
> /dev/shm.
I see, understood - thanks for the follow up :)
Updated•8 years ago
Whiteboard: [gfx-noted]
Updated•8 years ago
Priority: -- → P3
Comment 18•7 years ago
Just experienced it on GitHub ( https://github.com/La0/mozilla-static-analysis/issues/new )
bp-d73f88ef-f121-42f8-9902-791900180718#tab-details
Updated•3 years ago
Comment 19•3 years ago
This doesn't happen in any recent version of Firefox.
Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Updated•3 years ago
Resolution: FIXED → WORKSFORME