Gecko profiler crashes when running under rr, when the "Native stacks" feature is enabled
Categories
(Core :: Gecko Profiler, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox93 | --- | fixed |
People
(Reporter: julienw, Assigned: mozbugz)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
Prerequisite: have rr installed locally.
STR:
- run firefox under rr, for example from mozilla-central:
./mach run --debugger=rr --debugger-args=record
- Start the profiler. Just make sure that "Native stacks" is enabled (which is the case in all presets).
=> Crash with this message:
Exiting due to channel error.
Sandbox: Unexpected EOF, op 0 flags 00 path
Exiting due to channel error.
Exiting due to channel error.
I noticed issues with a few tests:
This crashes:
./mach gtest --debugger=rr --debugger-args="record" GeckoProfiler.Pause
This doesn't crash but fails:
./mach test --debugger=rr --debugger-args="record" tools/profiler/tests/xpcshell/test_feature_stackwalking.js
(of course it passes without rr)
I couldn't find any problems with other tests.
I'm making a debug build to see if I get more logs, and will push a new pernosco recording.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 1•3 years ago
|
||
Pernosco's khuey said "I don't think you can fix this on your side", so it looks like you'll have to choose between rr and native stacks.
Reporter | ||
Comment 2•3 years ago
|
||
A debug build doesn't bring new interesting logs.
However, I could capture a pernosco session from the gtest crash:
https://pernos.co/debug/choiTdzb6D0p9oMB7J3r8Q/index.html#f{m[GiQ,AA_,t[Dw,Bd+3_,f{e[Gh0,Aic_,s{af72QlKAA,bAYk,uKiLhaw,oKjBBsw___
We clearly see the crashing memcpy.
(In reply to Gerald Squelart [:gerald] (he/him) from comment #1)
Pernosco's khuey said "I don't think you can fix this on your side", so it looks like you'll have to choose between rr and native stacks.
Do we have a way to detect that we'll have these mismatched adresses before crashing? And maybe disable native stack automatically in this case?
Assignee | ||
Comment 3•3 years ago
|
||
Looking at the code in DoLULBacktrace: https://searchfox.org/mozilla-central/rev/912ff8d38996365e8b843cd82ddce348d8b7f78b/tools/profiler/core/platform.cpp#2375-2400
end
is the root-side end of the stack: 0x7FFDDAFEC770.
start
is the leaf-side end of the stack, taken from the SP register: 0x681FFCD0
It's quite clear already that something is fishy!
nCopy
is the number of bytes to copy, computed from end - start
, it's more than 100 trillion! But then it's just clamped to lul::N_STACK_BYTES
(160 KiB).
Instead we should see that end - start
is impossibly big, and gracefully give up. There is already a check that start < end
, otherwise nCopy
is kept at zero; we could do the same if end - start > lul::N_STACK_BYTES
.
Julien, could you please try something?
Change the following line in tools/profiler/core/platform.cpp:
if (nToCopy > lul::N_STACK_BYTES) nToCopy = lul::N_STACK_BYTES;
to:
if (nToCopy > lul::N_STACK_BYTES) nToCopy = 0;
If that still fails, try:
if (nToCopy > lul::N_STACK_BYTES) return;
(I'm not familiar with LUL, that's why I'm not 100% confident that this will "just" work.)
Good luck!
Reporter | ||
Comment 4•3 years ago
|
||
Both changes fix the crash, and I can capture a profile with them.
What should I look especially in these profiles?
Assignee | ||
Comment 5•3 years ago
|
||
(In reply to Julien Wajsberg [:julienw] from comment #4)
Both changes fix the crash, and I can capture a profile with them.
What should I look especially in these profiles?
Not crashing is a good outcome already. 😀
I'm wondering if you still see stacks in the profile? Are there empty samples?
I'll prepare a proper patch soon...
Assignee | ||
Comment 6•3 years ago
|
||
Running the profiler while Firefox runs in rr sometimes crashes, and it seems the SP register is far from the known stack area (ending at StackTop), which then proceeds to copy some protected memory.
Note that there was already a start < end
test, this patch adds an equivalent test on the other side, allowing for a 1MB stack size. Observed stacks during quick testing were up to around 520KB, and the crashing rr case was in the trillions(!), so 1MB should be an appropriate number that should catch "wrong" SPs while allowing real stacks.
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2efa540983a1 On Linux, detect if SP is likely to be outside the known stack area - r=mstange
Comment 8•3 years ago
|
||
bugherder |
Description
•