Closed Bug 1509813 Opened 1 year ago Closed 1 year ago

Crash in libc-2.27.so@0x8f97a

Categories

(Core :: IPC, defect, P2, critical)

Unspecified
Linux
defect

RESOLVED DUPLICATE of bug 1477037

People

(Reporter: gsvelto, Unassigned)

Details

(Keywords: crash)

This bug was filed from the Socorro interface and is
report bp-e16b1cc0-c493-4bc1-a08e-987ae0181125.
=============================================================

Top 10 frames of crashing thread:

0 libc-2.27.so libc-2.27.so@0x8f97a 
1 libxul.so XRE_InitChildProcess toolkit/xre/nsEmbedFunctions.cpp:761
2 firefox-bin content_process_main ipc/contentproc/plugin-container.cpp:50
3 firefox-bin main browser/app/nsBrowserApp.cpp:287
4 libc-2.27.so libc-2.27.so@0x22b16 
5 libstdc++.so.6.0.25 libstdc++.so.6.0.25@0x174277 
6 firefox-bin firefox-bin@0x76ff 
7 firefox-bin double_conversion::Bignum::Bignum mfbt/double-conversion/double-conversion/bignum.cc:38
8  @0x7ffcdd730f0f 
9 ld-2.27.so ld-2.27.so@0xf1d5 

=============================================================

We're crashing on a null-pointer dereference inside a glibc call while initializing a new content process. Looking at the various implementations of ProcessChild::Init(), this is possible, though it's hard to tell which one is involved (and why it's happening).
(In reply to Gabriele Svelto [:gsvelto] from comment #0)
> Top 10 frames of crashing thread:
> 
> 0 libc-2.27.so libc-2.27.so@0x8f97a 

$ addr2line -Cfie /usr/lib/debug/.build-id/e9/38fe6706abe362f6c3c7474373ccc626cf4805.debug 0x8f97a
__strcmp_sse2_unaligned
/build/glibc-aYuVJl/glibc-2.27/string/../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:31

There's also a frame missing between the strcmp and XRE_InitChildProcess because of bug 1310314; that frame would have told us which strcmp call is causing the problem. (It could be recovered from a minidump file by manually scanning the stack memory for the return address.)
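The stack scan mentioned in the parenthetical can be sketched in a few lines. This is a minimal Python sketch, assuming the raw stack bytes and the list of loaded modules (name, load address, end address) have already been pulled out of the minidump; `Module`, `Candidate`, and `scan_stack` are hypothetical names, and no real minidump parser is used. Every aligned 64-bit word that points into a loaded module is reported as a candidate return address together with its module-relative offset, which could then be symbolized the same way as the addr2line call above.

```python
import struct
from typing import Iterator, List, NamedTuple


class Module(NamedTuple):
    name: str
    start: int  # load address (inclusive)
    end: int    # end of the mapping (exclusive)


class Candidate(NamedTuple):
    stack_addr: int  # address of the stack slot holding the value
    value: int       # the value itself: a possible return address
    module: str      # module the value points into
    offset: int      # value relative to the module's load address


def scan_stack(stack: bytes, stack_base: int, modules: List[Module]) -> Iterator[Candidate]:
    """Yield each aligned little-endian 64-bit stack word that points into a module.

    Assumes stack_base is the (8-byte aligned) address of stack[0].
    """
    usable = len(stack) - len(stack) % 8  # ignore a trailing partial word
    for pos in range(0, usable, 8):
        (value,) = struct.unpack_from("<Q", stack, pos)
        for m in modules:
            if m.start <= value < m.end:
                yield Candidate(stack_base + pos, value, m.name, value - m.start)
                break
```

The results are only candidates: stale return addresses and function pointers left on the stack also point into modules, so each hit would still need checking (for instance, that the instruction just before it is a call).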

We can see that the problem is the first argument (rdi) being null, which almost makes sense given that the ProcessChild::Init implementations contain many calls of the form strcmp(aArgv[i], "some string literal"). The second argument (rsi) is 0x00007ff3ad033af3, which falls inside the libxul.so mapping at offset 0x574caf3. So, given a copy of this libxul.so, it should be enough to load it into gdb and run `x/s 0x574caf3` to get the string, which should narrow it down.

Our release catalog can't find build 20181124220147 (or anything newer than 11-23) for some reason, but I can try this on bp-568f03d3-a3b5-4a16-a9c4-3bc140181124:

(gdb) x/s 0x00007fa36775b903 - 0x7fa362010000
0x574b903:      "-parentBuildID"
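The same lookup can be approximated without gdb. This is a minimal Python sketch, assuming a little-endian ELF64 file and assuming the module's load address is where its link-time virtual address 0 was mapped (typical for shared objects whose first PT_LOAD segment starts at vaddr 0). It subtracts the load address from the register value, finds the PT_LOAD segment that covers the resulting address, and reads the NUL-terminated string at the matching file offset. `module_offset` and `read_c_string_at_vaddr` are hypothetical helper names.

```python
import struct

PT_LOAD = 1  # ELF program header type for a loadable segment


def module_offset(addr: int, load_address: int) -> int:
    """Translate a runtime address into an address relative to the module's load address."""
    return addr - load_address


def read_c_string_at_vaddr(elf: bytes, vaddr: int, limit: int = 4096) -> bytes:
    """Return the NUL-terminated string at a link-time virtual address, like gdb's x/s."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # ELF64 header: e_phoff is at 0x20; e_phentsize and e_phnum are at 0x36 and 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        # Elf64_Phdr starts with p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", elf, e_phoff + i * e_phentsize)
        # Only bytes actually present in the file (p_filesz, not p_memsz) can be read.
        if p_type == PT_LOAD and p_vaddr <= vaddr < p_vaddr + p_filesz:
            start = p_offset + (vaddr - p_vaddr)
            end = elf.find(b"\0", start, start + limit)
            if end == -1:
                raise ValueError("no NUL terminator within the limit")
            return elf[start:end]
    raise ValueError(f"{vaddr:#x} is not file-backed in any PT_LOAD segment")
```

With the matching libxul.so build read into memory, passing it together with `module_offset(0x00007fa36775b903, 0x7fa362010000)` (that is, 0x574b903) would be expected to yield the same string gdb printed.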

I think this can't happen for content processes, because (assuming I didn't miss something in all the else-if branches) the value is explicitly checked against null and then has to pass through many other strcmps to get there. The other possibilities according to searchfox are RDD, GPU, and VR. Those all have a simple loop over aArgv/aArgc, so it's not strictly impossible for a null to get in there, but at least there shouldn't be any nulls in the argc/argv we get from the C runtime.

Breaking this down a little more: this doesn't quite make sense for RDD, because the patch that added it landed on 11-08, got backed out on 11-09, and relanded on 11-14, but we see crashes as far back as 11-05. (It's also preffed off.) The GPU process isn't enabled yet on Linux, but it's possible some people tried flipping the pref. I have no idea about VR.
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #1)
> Our release catalog can't find 20181124220147 (or anything newer than 11-23) for some reason, but I can try this on
> bp-568f03d3-a3b5-4a16-a9c4-3bc140181124:

Hello, that is my crash. I have no clue yet what causes it, but it happens every day. Should I run ASan Nightly, or is there anything else I could do?
I often use two profiles at once, and yes, WebRender, the GPU process, the VR process, and the RDD process are all enabled, simply because I'm hoping to run into bugs.
Priority: -- → P2
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #1)
> 0x574b903:      "-parentBuildID"

In the past there was a bug related to -parentBuildID: bug 1460127 comment 9
The GPU and RDD processes don't work at all with ASan Nightly on Linux: bug 1477037
(If the GPU process is enabled but fails to start, WebRender won't be used.)
Bug 1477037 looks like the same issue as this.

(Also, for reference: the “crash signature” on this bug is an offset in one particular build of glibc, so there could be other instances of this crash that weren't being counted. Conversely, any crash in that glibc's strcmp would match, and I did see a few reports that were clearly unrelated (SIGBUS).)
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1477037