Crash in js::jit::BaselineScript::nativeCodeForPC
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
Tracking
()
People
(Reporter: 6lobe, Assigned: djvj)
Details
(Keywords: nightly-community)
Crash Data
Attachments
(3 files)
This bug was filed from the Socorro interface and is report bp-ea445ca1-3f30-45b3-9ddd-be58b0170527. ============================================================= I can consistently reproduce this crash on two different computers. Go to https://www.reittiopas.fi/ and just use the site for a while and you'll get this crash sooner or later. I also got the following crash signature once: js::jit::BailoutIonToBaseline https://crash-stats.mozilla.com/report/index/b974e836-cbf5-4268-ab10-75c130170527
Comment 2•6 years ago
|
||
77% of the crashes on recent versions[0] of Firefox are happening on MOZ_CRASH(No native code for this pc) at https://hg.mozilla.org/releases/mozilla-release/annotate/afa87f9be3a8/js/src/jit/BaselineJIT.cpp#l895 Kannan, any idea if this assertion is actionable? [0] https://crash-stats.mozilla.com/signature/?version=60.0a1&version=59.0b&version=59.0b9&version=59.0b10&version=59.0b7&version=59.0b8&version=59.0b6&version=59.0b5&version=59.0b4&version=59.0b3&version=59.0a1&version=58.0.2&version=58.0.1&version=58.0b&version=58.0&version=58.0b99&version=58.0b16&version=58.0b15&version=58.0b14&version=58.0b13&version=58.0b12&version=58.0b11&version=58.0b10&version=58.0b9&version=58.0b8&version=58.0b7&version=58.0b6&version=58.0b5&version=58.0b4&version=58.0b3&version=57.0.4&version=57.0.3&version=57.0.2&version=57.0.1&version=57.0&signature=js%3A%3Ajit%3A%3ABaselineScript%3A%3AnativeCodeForPC&date=%3E%3D2017-08-15T17%3A22%3A18.000Z&date=%3C2018-02-15T17%3A22%3A18.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=moz_crash_reason&_sort=version&_sort=uptime&_sort=-date&page=1
Assignee | ||
Comment 3•6 years ago
|
||
This should be actionable. How urgent is this, and how high of a crasher is this? It seems pretty low-frequency. If we can get a repro for this it should be very actionable.
Comment 4•6 years ago
|
||
(In reply to Kannan Vijayan [:djvj] from comment #3) > This should be actionable. Can you detail what would be the conditions needed to reach this assertion, and how fixing this issue should be approached?
Comment 5•6 years ago
|
||
Could be related to running the gecko-profiler? https://crash-stats.mozilla.com/report/index/d0349b47-46f6-4a07-b6ef-c65470180627
Comment 6•6 years ago
|
||
(In reply to Paul Bone [:pbone] from comment #5) > Could be related to running the gecko-profiler? Maybe! I just ran into this crash, just after capturing a profile with perf-html.io. My crash report is bp-d16a027c-d908-49e8-a55f-1b5b10180913 . The crash happened just after the profile was displayed -- possibly while resolving symbols.
Assignee | ||
Comment 7•6 years ago
|
||
The crash is almost definitely related to jit-coach, and native stack walking. These crashes are basically impossible to reproduce and happen randomly. The proper resolution is to implement bug 1426134 and get rid of the ad-hoc, extremely fragile stack-walking that explicitly keeps itself aware of the stack structure of every type of frame. This is just extremely prone to bitrot, and crashes are very hard to diagnose.
Comment 8•5 years ago
|
||
FYI, this crashes within a few pages of browsing typically for me with the profiler on in a Debug build, on MOZ_ASSERT(native).
Is there anything else that would help? attaching info from gdb. This is easy to make happen, if you can live with how slow a Debug build is.
Comment 9•5 years ago
|
||
Assignee | ||
Comment 10•5 years ago
|
||
So the thing that gets us here is that there is a some PC being passed in that is not mapped to a baseline native address. The PC being looked up is generated here:
https://searchfox.org/mozilla-central/source/js/src/jit/JSJitFrameIter.cpp#633
The comment above that line:
// Certain exception handling cases such as debug OSR or resuming a generator
// with .throw() will use BaselineFrame::setOverridePc() to indicate the
// effective |pc|. We translate the effective-pc into a Baseline code
// address.
I suspect that some new situatoins have come up where we are setting overridePC here to something that is not being properly mapped by PCMap in the bytecode. My suspicions lean in the direction of generators, potentially - since those are the main new feature to be provided support in the jits after the stack-walking functionality landed.
Comment 11•5 years ago
|
||
In Debug opt builds it still crashes, just less often, and in non-debug Opt builds (though on a null deref). The nice thing about trapping it in a Debug non-opt build is (a) it happens much faster, and b) it's easier to examine vars in the stack.
So what's the next step here?
Comment 12•5 years ago
|
||
Kannan, do you have thoughts on next steps?
Comment 13•5 years ago
|
||
Can you catch it in rr? If you can, then we can figure out who set the bad address by setting a watchpoint on overrideOffset_ in the baseline frame and running in reverse. That should give us a pretty good idea who's to blame.
Comment 14•5 years ago
|
||
We should land bug 1511370 or, maybe better, fix bug 1437246. Then we will get these crashes and assertion failures in the JS shell.
Assignee | ||
Comment 15•5 years ago
|
||
:jesup - I've been trying to repro this on the original site mentioned, as well as others. Do you have any particular trigger actions to recommend?
Assignee | ||
Comment 16•5 years ago
|
||
I did some digging into this today with jesup's help. Found some tempting clues. I couldn't replicate it locally but jesup caught one in gdb and dumped some info for me. The nativeCodeForPC() override address that's failing falls within the following script bytecode:
(gdb) x/54bx script->code()
0x7fe7c158b730: 0x3e 0x88 0x00 0x03 0x00 0x00 0xec 0x0c
0x7fe7c158b738: 0xeb 0x89 0x00 0x03 0x00 0x00 0x51 0x14
0x7fe7c158b740: 0x07 0x1a 0x00 0x00 0x00 0xe6 0x05 0x00
0x7fe7c158b748: 0x00 0x00 0x88 0x00 0x02 0x00 0x00 0x0c
0x7fe7c158b750: 0xb8 0x00 0x00 0x00 0x00 0x0a 0x3a 0x00
0x7fe7c158b758: 0x00 0x05 0xe6 0x08 0x00 0x00 0x00 0x88
0x7fe7c158b760: 0x01 0x05 0x00 0x00 0x70 0x99
The override pointer is: 0x7fe7c158b765
(the 0x99 on the last line).
The table below is a bytecode disassembly of the above:
BYTECODE:
0 - @0 - 0x3e - ZERO
1 - @1 - 0x88 - GETALIASEDVAR
2 - @6 - 0xec - TONUMERIC
3 - @7 - 0x0c - DUP
4 - @8 - 0xeb - DEC
5 - @9 - 0x89 - SETALIASEDVAR
6 - @14 - 0x51 - POP
7 - @15 - 0x14 - LT
8 - @16 - 0x07 - IFEQ
9 - @21 - 0xe6 - JSOP_JUMPTARGET
10 - @26 - 0x88 - GETALIASEDVAR
11 - @31 - 0x0c - DUP
12 - @32 - 0xb8 - CALLPROP
13 - @37 - 0x0a - SWAP
14 - @38 - 0x3a - CALL
15 - @41 - 0x05 - RETURN
16 - @42 - 0xe6 - JSOP_JUMPTARGET
17 - @47 - 0x88 - GETALIASEDVAR
18 - @52 - 0x70 - THROW
19 - @53 - 0x99 - RETRVAL
The method pseucode would be something along the lines of:
function () {
if (A-- != 0) {
B.method();
}
throw D;
}
Specifically, the override PC points to offset 53: the RETRVAL opcode immediately after the THROW.
Looking at the source code, the following logic seemed relevant:
https://searchfox.org/mozilla-central/source/js/src/jit/BaselineJIT.cpp#905
// Code was not generated for this PC because BaselineCompiler believes it
// is unreachable.
return nullptr;
Looking at BaselineCompiler - we will not emit any mapping entries for opcodes which follow other opcodes which are not Fallthrough (of which JSOP_THROW is one).
Jesup dumped the pcmapping table for the baselineScript:
(gdb) x/38bx ((char *) script->baseline) + script->baseline->pcMappingOffset_
0x7fe7be334fd0: 0x00 0x8d 0x1c 0x81 0x46 0x81 0x01 0x0a
0x7fe7be334fd8: 0x92 0x06 0x81 0x20 0x81 0x51 0x02 0x00
0x7fe7be334fe0: 0x81 0x22 0x80 0x10 0x80 0x1c 0x81 0x2e
0x7fe7be334fe8: 0x92 0x06 0x81 0x20 0x92 0x08 0x81 0x2e
0x7fe7be334ff0: 0x80 0x0a 0x80 0x1c 0x81 0x50
And the dumps for the indexTable that indexes into that:
(gdb) x/6xw ((char *) script->baseline) + script->baseline->pcMappingIndexOffset_
0x7fe7be334fc0: 0x00000000 0x00000807 0x00000000
I tried to parse this manually by eye several times, but my reading seems to indicate that there are more than 19 instructions being mapped? Posting temporary status for now before I work more on this.
Assignee | ||
Comment 17•5 years ago
|
||
Spoke with Ted about this today and he pointed out this bit of code: https://searchfox.org/mozilla-central/source/js/src/jit/VMFunctions.cpp#930
DebugEpilogue sets the PC to the last bytecode instruction. It's called from:
OnLeaveBaselineFrame: https://searchfox.org/mozilla-central/source/js/src/jit/JitFrames.cpp#254
called from HandleExceptionBaseline: https://searchfox.org/mozilla-central/source/js/src/jit/JitFrames.cpp#513
called from HandleException: https://searchfox.org/mozilla-central/source/js/src/jit/JitFrames.cpp#689
which is called in the exceptionTail trampoline code for baseline.
This seems like an obvious mechanism whereby we would skip generating pcmappings for the last bytecode instruction, and then have that be set as the overridePc anyway, and fail to map if the profiler manages to hit between the time the overridePC is set, and the time that the frame is popped and the proper resumePC on the caller frame set.
I have a question I can't answer easily: why are we setting the overridePC
to the last instruction here?
It's temporarily being set on the frame, immediately before the frame gets popped for a return (so the assumption is that the overridePc is never observed). However, the profiler is breaking this assumption and observing the overridePc.
If the overridePc is never expected to be observed during this time-period, then can't we just set it to the original pc passed in? Or the first bytecode? Is there a specific reason why lastPC is used here?
nbp, jandem - either of you have an idea about the reasoning behind that 'lastPC' choice?
Assignee | ||
Comment 18•5 years ago
|
||
This is an educated stab in the dark. Imagine a man in a monocle and tophat, stabbing with a fancy walking stick of some sort.
Jesup: Since you're getting these crashes reliably and repeatedly - would you mind running with this patch applied?
Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=846756468695993d1f15be26a944a98c462bf6f8
Comment 19•5 years ago
|
||
(In reply to Kannan Vijayan [:djvj] from comment #17)
I have a question I can't answer easily: why are we setting the
overridePC
to the last instruction here?
The overridePC is set for the exception handling trampoline (when we |return false| from DebugEpilogue). After unwinding environments we want to prevent the exception handler from getting confused by the now-unwound environment chain not matching the pc. Setting it to the first instruction, script->code(), would likely work.
However does the profiler even use the override pc? I thought it only cared about return addresses and the lastCallFrame stuff.
Assignee | ||
Comment 20•5 years ago
•
|
||
It saves the PC and uses it later (when dumping profiles) to generate frame display-name info (e.g. filename:lineno info for the script):
https://searchfox.org/mozilla-central/source/js/src/jit/JSJitFrameIter.cpp#635
The relevant stack trace (from jesup):
#0 0x00007fe860f84538 in js::jit::BaselineScript::nativeCodeForPC(JSScript*, unsigned char*, js::jit::PCMappingSlotInfo*)
(this=0x7fe7be334ee0, script=0x25d31274e5b0, pc=0x7fe7c158b765 "\231\230\200", slotInfo=0x7fe834cee010)
at ../../../../js/src/jit/BaselineJIT.h:446
native = 0x0
#1 0x00007fe8613360fe in js::jit::JSJitProfilingFrameIterator::fixBaselineReturnAddress() (this=0x7fe834cee2b0)
at /home/jesup/src/mozilla/inbound/js/src/jit/JSJitFrameIter.cpp:639
slotInfo = {
slotInfo_ = 0 '\000'
}
script = 0x25d31274e5b0
override = 0x7fe7c158b765 "\231\230\200"
bl = 0x7fff4e59f3b0
#2 0x00007fe861335d89 in js::jit::JSJitProfilingFrameIterator::moveToNextFrame(js::jit::CommonFrameLayout*)
(this=0x7fe834cee2b0, frame=0x7fff4e59f350) at /home/jesup/src/mozilla/inbound/js/src/jit/JSJitFrameIter.cpp:731
stubFrame = 0x7fff4e59f390
prevType = js::jit::FrameType::BaselineStub
#3 0x00007fe86133615d in js::jit::JSJitProfilingFrameIterator::operator++() (this=0x7fe834cee2b0)
at /home/jesup/src/mozilla/inbound/js/src/jit/JSJitFrameIter.cpp:649
frame = 0x7fff4e59f350
#4 0x00007fe8609827ac in JS::ProfilingFrameIterator::operator++() (this=0x7fe834cee288)
at /home/jesup/src/mozilla/inbound/js/src/vm/Stack.cpp:1807
Comment 21•5 years ago
|
||
(In reply to Kannan Vijayan [:djvj] from comment #17)
nbp, jandem - either of you have an idea about the reasoning behind that 'lastPC' choice?
I do not. I was not even aware of this overridePC
mechanism.
Assignee | ||
Comment 22•5 years ago
|
||
Comment 23•5 years ago
|
||
Pushed by kvijayan@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1f4aa15478f1 Set overridePc to firstPc instead of lastPc when unwinding frames. r=jandem
Comment 24•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Updated•5 years ago
|
Description
•