Intermittent [tier 2] Android Jit tests/jit-test/jit-test/* | Segmentation fault (code 139, args "--no-asmjs") [0.3 s]
Categories
(Core :: JavaScript Engine, defect, P5)
Tracking
()
Version | Tracking | Status |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | unaffected |
firefox69 | --- | unaffected |
firefox70 | --- | unaffected |
firefox71 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: jandem)
References
(Regression)
Details
(Keywords: crash, intermittent-failure, regression)
Attachments
(2 files)
Filed by: ccoroiu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=268383886&repo=mozilla-inbound
Full log: https://queue.taskcluster.net/v1/task/UivmY5VxQyyXjEok9aPBEA/runs/0/artifacts/public/logs/live_backing.log
[task 2019-09-25T16:06:07.953Z] 16:06:02 INFO - TEST-PASS | tests/jit-test/jit-test/tests/asm.js/testBug1437534.js | Success (code 0, args "--blinterp-eager") [0.3 s]
[task 2019-09-25T16:06:07.953Z] 16:06:03 INFO - Segmentation faultSegmentation faultExit code: 139
[task 2019-09-25T16:06:07.953Z] 16:06:03 INFO - FAIL - asm.js/testBug1437534.js
[task 2019-09-25T16:06:07.953Z] 16:06:03 WARNING - TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/asm.js/testBug1437534.js | Segmentation fault (code 139, args "--no-asmjs") [0.3 s]
[task 2019-09-25T16:06:07.953Z] 16:06:03 INFO - INFO exit-status : 139
[task 2019-09-25T16:06:07.953Z] 16:06:03 INFO - INFO timed-out : False
[task 2019-09-25T16:06:07.953Z] 16:06:03 INFO - INFO stdout > Segmentation fault
[task 2019-09-25T16:06:07.953Z] 16:06:03 INFO - INFO stderr 2> Segmentation fault
Comment 1•5 years ago
The "intermittent" in the title is not quite right, this is a perma fail, but I'm too scared to touch it in case I break the sheriff team's workflow.
Random needinfo victims picked from bug 1555479: how can I debug these "Android 8.0 Pixel2 pgo" failures caused by a compiler upgrade? Is it hopeless without an actual device? How long can I let this tier2 failure sit before upsetting you?
Comment 4•5 years ago
Why did you pick bug 1555479? It looks like the jit tests are failing, and the wrench tests are a separate job.
I didn't read closely enough.
Comment 6•5 years ago
There is a bit more information in the logcat artifact, and some failures also have a tombstone artifact for the crash; they don't look very helpful to me, but maybe there's something useful there for you.
You can run an android arm emulator locally if you have the android sdk installed, with 'mach android-emulator --version 4.3'; I don't know if you can reproduce the failure that way, but it might be worth checking.
:aerickson and :bc know all about the "Android 8.0 Pixel2" environment and might have additional advice?
At this rate, I would expect this to be on the intermittent sheriff's radar within a few days.
Comment 7•5 years ago
The tombstone and the logcat both say "Cause: null pointer dereference".
Looking at the WARNINGs from the clang 9.0 build and the previous, the only thing in js land is
+WARNING - [style 0.0.1] /builds/worker/workspace/build/src/obj-firefox/dist/include/js/Proxy.h:222:43: warning: offset of on non-standard-layout type 'js::BaseProxyHandler' [-Winvalid-offsetof], err: false
Comment 9•5 years ago (Assignee)
(In reply to Geoff Brown [:gbrown] from comment #6)
> You can run an android arm emulator locally if you have the android sdk installed, with 'mach android-emulator --version 4.3'; I don't know if you can reproduce the failure that way, but it might be worth checking.
I tried this but it doesn't work - the shell crashes and logcat shows it's a SIGILL in libmozglue.so. What's the simplest way to start the JS shell in the emulator? I tried this zip.
Comment 10•5 years ago (Assignee)
The browser crashes (J1 suite) hit the release assert here: https://searchfox.org/mozilla-central/rev/f43ae7e1c43a4a940b658381157a6ea6c5a185c1/js/src/ds/LifoAlloc.h#403
Comment 11•5 years ago
(In reply to Jan de Mooij [:jandem] from comment #9)
> (In reply to Geoff Brown [:gbrown] from comment #6)
> > You can run an android arm emulator locally if you have the android sdk installed, with 'mach android-emulator --version 4.3'; I don't know if you can reproduce the failure that way, but it might be worth checking.
> I tried this but it doesn't work - the shell crashes and logcat shows it's a SIGILL in libmozglue.so. What's the simplest way to start the JS shell in the emulator? I tried this zip.
That reminds me of https://bugzilla.mozilla.org/show_bug.cgi?id=1582838#c3; I don't know what's happening.
Comment 12•5 years ago (Assignee)
(In reply to Geoff Brown [:gbrown] from comment #11)
> That reminds me of https://bugzilla.mozilla.org/show_bug.cgi?id=1582838#c3; I don't know what's happening.
Hm I hit that issue too when I tried to run the GeckoView Example APK in the emulator.
I'm sorry but I can't do anything here if we can't even get things to run in the ARM emulator.
Comment 13•5 years ago
Sorry, it looks like local runs of the arm emulator are not very usable at this time. I've updated bug 1582838 and will continue to try to move that forward.
Comment 14•5 years ago (Assignee)
(In reply to Geoff Brown [:gbrown] from comment #13)
> Sorry, it looks like local runs of the arm emulator are not very usable at this time. I've updated bug 1582838 and will continue to try to move that forward.
Thanks!
For what it's worth, all tests that fail (the browser jsreftest + shell jit-tests) use Function or eval to create/call a function with a ton of arguments. Combined with the jsreftest InterpreterStack LifoAlloc::release stack, I wonder if it's related to the LifoAlloc's oversized chunk handling (pushing many arguments => large expression stack => large stack frame).
I'm doing some Try debugging but it takes time.
Comment 15•5 years ago (Assignee)
(In reply to Jan de Mooij [:jandem] from comment #14)
> I'm doing some Try debugging but it takes time.
We fail the range check assertion in BumpChunk::release(Mark) because we have a BumpChunk::Mark that has:
- chunk_: 0xd03e6000
- bump_: 0xd03e6000
They're identical. This shouldn't happen because the chunk has a fixed-size BumpChunkReservedSpace header. I wonder if we're miscompiling BumpChunk::begin() somewhere.
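To make the failing check concrete, here is a minimal sketch of a bump chunk with a reserved header and a mark/release range check. It is a toy model, not the real js::detail::BumpChunk: the names ToyChunk, kReservedSpace, markIsValid and the 16-byte header size are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only. Every name and size here is hypothetical and does not
// mirror the actual layout used in js/src/ds/LifoAlloc.h.
struct ToyChunk {
    static constexpr std::size_t kReservedSpace = 16;  // assumed header size

    std::uintptr_t base;      // address of the chunk itself
    std::uintptr_t bump;      // next free byte
    std::uintptr_t capacity;  // one past the last usable byte

    struct Mark {
        std::uintptr_t chunk;  // chunk the mark was taken from
        std::uintptr_t bump;   // bump position when the mark was taken
    };

    ToyChunk(std::uintptr_t addr, std::size_t size)
        : base(addr), bump(addr + kReservedSpace), capacity(addr + size) {}

    // First usable byte: always past the reserved header.
    std::uintptr_t begin() const { return base + kReservedSpace; }

    Mark mark() const { return Mark{base, bump}; }

    void allocate(std::size_t n) {
        assert(bump + n <= capacity);
        bump += n;
    }

    // Range check: a valid mark points into [begin(), bump].
    bool markIsValid(const Mark& m) const {
        return m.chunk == base && begin() <= m.bump && m.bump <= bump;
    }

    void release(const Mark& m) {
        assert(markIsValid(m));  // stands in for the release assertion
        bump = m.bump;
    }
};
```

In this model a mark whose bump equals the chunk address can never pass markIsValid, because begin() is always kReservedSpace bytes past the chunk start. That matches the reasoning above: a correctly taken mark cannot point at the header.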
The BumpChunk itself appears to be valid: the bump_ pointer when we crash is 0xd03f98f0, that's 80112 bytes in - the test pushes 10,000 JS Values of 8 bytes each so considering BumpChunkReservedSpace and InterpreterFrame that looks about right.
I'll see if I can figure out why bump_ and chunk_ are equal.
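The offset arithmetic can be checked at compile time. This is only a sketch: the two addresses and the value count are taken from this comment, while the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Values quoted in the comment; the names are hypothetical.
constexpr std::uint32_t kChunkAddr    = 0xd03e6000u;  // chunk_ in the Mark
constexpr std::uint32_t kBumpAtCrash  = 0xd03f98f0u;  // BumpChunk::bump_ at crash
constexpr std::uint32_t kPushedValues = 10000;        // JS Values pushed by the test
constexpr std::uint32_t kValueSize    = 8;            // bytes per JS Value

// Bytes between the start of the chunk and the bump pointer.
constexpr std::uint32_t bytesUsed() { return kBumpAtCrash - kChunkAddr; }

// Bytes accounted for by the pushed values alone.
constexpr std::uint32_t bytesForValues() { return kPushedValues * kValueSize; }

// What is left over for the chunk header and the interpreter frame.
constexpr std::uint32_t overheadBytes() { return bytesUsed() - bytesForValues(); }

static_assert(bytesUsed() == 80112, "0xd03f98f0 - 0xd03e6000");
static_assert(bytesForValues() == 80000, "10,000 values * 8 bytes");
static_assert(overheadBytes() == 112, "remainder for header + frame");
```

The 112 leftover bytes are the part attributed to BumpChunkReservedSpace and InterpreterFrame; the sketch does not try to split that figure between the two.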
Comment 16•5 years ago (Assignee)
(In reply to Jan de Mooij [:jandem] from comment #15)
> The BumpChunk itself appears to be valid: the bump_ pointer when we crash is 0xd03f98f0, that's 80112 bytes in - the test pushes 10,000 JS Values of 8 bytes each so considering BumpChunkReservedSpace and InterpreterFrame that looks about right.
For what it's worth, when I run this test locally in a 32-bit opt JS shell, I get the same 80112 number so that's all correct. The only difference is that my BumpChunk::Mark struct has a bump_ pointer that matches BumpChunk::bump_ instead of the BumpChunk itself.
Comment 17•5 years ago (Assignee)
LLVM bug. It ends up inlining pushInlineFrame => LifoAlloc::mark into Interpret but messes up codegen for it. This shows where/how it goes wrong.
Comment 18•5 years ago (Assignee)
Comment 19•5 years ago (Assignee)
I verified this workaround fixes the jsreftest + jit-test crashes on Try.
Comment 20•5 years ago
Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5bd04359efb6 Add MOZ_NEVER_INLINE to LifoAlloc::mark to work around Clang 9 miscompilation on Android. r=nbp
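For readers unfamiliar with this kind of workaround, the sketch below shows the general shape of marking a small accessor as never-inline so the compiler has to emit a real call instead of folding it into a large caller. TOY_NEVER_INLINE and ToyAlloc are hypothetical stand-ins; the actual patch applies Mozilla's MOZ_NEVER_INLINE macro to LifoAlloc::mark, and this is not that code.

```cpp
#include <cstddef>

// Hypothetical stand-in for a never-inline macro, built from the
// standard compiler-specific attributes.
#if defined(_MSC_VER)
#  define TOY_NEVER_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define TOY_NEVER_INLINE __attribute__((noinline))
#else
#  define TOY_NEVER_INLINE
#endif

// Toy allocator that only tracks how many bytes are in use.
struct ToyAlloc {
    struct Mark {
        std::size_t used;  // bytes in use when the mark was taken
    };

    std::size_t used = 0;

    void push(std::size_t n) { used += n; }

    // Kept out of line so the optimizer cannot merge it into its caller.
    TOY_NEVER_INLINE Mark mark() const { return Mark{used}; }

    // Roll back to a previously taken mark.
    void release(Mark m) { used = m.used; }
};
```

The attribute does not change what mark() returns; it only constrains code generation, which is why it can sidestep a miscompilation without fixing the underlying compiler bug.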
Comment 23•5 years ago
Thanks very much Jan for the investigation and patch!
Comment 24•5 years ago
I should mention, we'll still file this upstream, but the current form of the repro is not a great thing to attach to a bug report. I'd like to reduce and/or bisect it first, then I'll report back.
Comment 25•5 years ago
bugherder
Comment 28•5 years ago
I filed https://bugs.llvm.org/show_bug.cgi?id=43526 for this.