Assertion failure: bool(resultScript) == (decoder.resultCode() == JS::TranscodeResult_Ok), at js/src/vm/HelperThreads.cpp:477 with OOM

RESOLVED FIXED in Firefox 58

Status

()

defect
P2
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: decoder, Assigned: tcampbell)

Tracking

(Blocks 2 bugs, {assertion, jsbugmon, testcase})

Trunk
mozilla59
x86_64
Linux
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox57 wontfix, firefox58 fixed, firefox59 fixed)

Details

(Whiteboard: [jsbugmon:update])

Attachments

(1 attachment)

The following testcase crashes on mozilla-central revision 564e82f0f289 (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --enable-stdcxx-compat --disable-profiling --enable-debug --enable-optimize, run with --fuzzing-safe --thread-count=2 --ion-eager):

var lfLogBuffer = `
function evalWithCacheLoadOffThread(code, ctx) {
  code = code instanceof Object ? code : cacheEntry(code);
  var incremental = ctx.incremental || false;
    ctx.global = newGlobal({ cloneSingletons: !incremental });
  var ctx_save;
  if (incremental)
    oomAfterAllocations(50);
  else
    ctx_save = Object.create(ctx, {saveBytecode: { value: true } });
  evaluate(code, ctx_save);
  offThreadDecodeScript(code, ctx);
  ctx.global.eval(\`runOffThreadDecodedScript()\`);
}
test = \`
\`;
evalWithCacheLoadOffThread(test, {});
evalWithCacheLoadOffThread(test, { incremental: true });
`;
loadFile(lfLogBuffer);
function loadFile(lfVarx) {
  try {
    oomTest(function() {
        eval(lfVarx);
    });
  } catch (lfVare) {}
}



Backtrace:

 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5efd700 (LWP 601)]
0x0000000000b872d8 in js::ScriptDecodeTask::parse (this=0x7ffff3a45400, cx=<optimized out>) at js/src/vm/HelperThreads.cpp:477
#0  0x0000000000b872d8 in js::ScriptDecodeTask::parse (this=0x7ffff3a45400, cx=<optimized out>) at js/src/vm/HelperThreads.cpp:477
#1  0x0000000000b9107d in js::HelperThread::handleParseWorkload (this=this@entry=0x7ffff69661c0, locked=...) at js/src/vm/HelperThreads.cpp:1961
#2  0x0000000000b93e2e in js::HelperThread::threadLoop (this=this@entry=0x7ffff69661c0) at js/src/vm/HelperThreads.cpp:2243
#3  0x0000000000b93eb0 in js::HelperThread::ThreadMain (arg=0x7ffff69661c0) at js/src/vm/HelperThreads.cpp:1723
#4  0x0000000000b93f12 in js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::callMain<0ul> (this=0x7ffff691e070) at js/src/threading/Thread.h:234
#5  js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start (aPack=0x7ffff691e070) at js/src/threading/Thread.h:227
#6  0x00007ffff7bc16fa in start_thread (arg=0x7ffff5efd700) at pthread_create.c:333
#7  0x00007ffff6c38b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
rax	0x0	0
rbx	0x7ffff3a45400	140737281020928
rcx	0x7ffff6c28a2d	140737333332525
rdx	0x0	0
rsi	0x7ffff6ef7770	140737336276848
rdi	0x7ffff6ef6540	140737336272192
rbp	0x7ffff5efbf80	140737319518080
rsp	0x7ffff5efbeb0	140737319517872
r8	0x7ffff6ef7770	140737336276848
r9	0x7ffff5efd700	140737319524096
r10	0x58	88
r11	0x7ffff6b9f750	140737332770640
r12	0x7ffff5efbed0	140737319517904
r13	0x7ffff5efbeb0	140737319517872
r14	0x7ffff5efd6f8	140737319524088
r15	0x7ffff5efbfb8	140737319518136
rip	0xb872d8 <js::ScriptDecodeTask::parse(JSContext*)+616>
=> 0xb872d8 <js::ScriptDecodeTask::parse(JSContext*)+616>:	movl   $0x0,0x0
   0xb872e3 <js::ScriptDecodeTask::parse(JSContext*)+627>:	ud2
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/48078fb0fcc2
user:        André Bargull
date:        Fri Jun 02 12:04:31 2017 +0200
summary:     Bug 1368963 - Avoid extra calls to GetPropertyKeys() in Object.freeze/seal/preventExtensions. r=jandem

This iteration took 277.511 seconds to run.
Andre, is bug 1368963 a likely regressor?
Blocks: 1368963
Flags: needinfo?(andrebargull)
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #2)
> Andre, is bug 1368963 a likely regressor?

No, I don't think so. It's more likely that due to changed GC allocation patterns bug 1368963 was picked as the regressor (cf. bug 1394523, comment #4).

Maybe allocation failures in js::XDRScript [1] need to be checked instead? I've only read, but didn't debug the code, so the following explanation could be erroneous!

If js::XDRScript() returns |false| because of allocation failures, XDRState<mode>::codeScript() calls postProcessContextErrors() [2], but postProcessContextErrors() doesn't update XDRState::resultCode_ when called on helper threads [3], |scriptp| is still set to nullptr in [4]. So when we return to ScriptDecodeTask::parse() [5], |resultScript| is nullptr, but |decoder.resultCode()| is still at its initial value JS::TranscodeResult_Ok [6]. That could explain why the assertion in [5] fails.


[1] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/js/src/jsscript.cpp#311
[2] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/js/src/vm/Xdr.cpp#178
[3] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/js/src/vm/Xdr.cpp#34
[4] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/js/src/vm/Xdr.cpp#179
[5] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/js/src/vm/HelperThreads.cpp#549
[6] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/js/src/vm/Xdr.h#187
Flags: needinfo?(andrebargull)
Priority: -- → P2
I should look at this in relation to XDR crash investigation.
Flags: needinfo?(tcampbell)
When we OOM during XDR in a helper thread, we still return TranscodeResult_Ok. This will eventually end up in a nullptr crash or debug assert. Putting together a patch now for this and potential related problems.
Assignee: nobody → tcampbell
Status: NEW → ASSIGNED
Flags: needinfo?(tcampbell)
There are a number (35 code locations) of places where when we fail, we bubble up 'return false' without setting codeResult_. When decoding happens on a helper thread, we end up returning TranscodeResult_Ok with data not fully initialized. I think under these cases we would expect memory corruption. There are also related reasons why this would be a problem on main thread encoding (particularly incremental mode). Based on telemetry, we see these errors in the wild https://mzl.la/2yXuN6N .

I don't think this explains all of the XDR crashing we see, but I think it does address real problems, so we should get it uplifted to Beta and see how the crash rates look.
(In reply to Ted Campbell [:tcampbell] from comment #7)
> There are also related reasons why this would be a problem on
> main thread encoding (particularly incremental mode). Based on telemetry, we
> see these errors in the wild https://mzl.la/2yXuN6N .

Currently we do not distinguish between asm.js failures and other failures.
Comment on attachment 8929895 [details]
Bug 1390856 - Fix XDR helper thread error handling.

https://reviewboard.mozilla.org/r/201066/#review206392

Nice catch!
Attachment #8929895 - Flags: review?(nicolas.b.pierron) → review+
Fixed test case to check for oomTest + Helper Threads.
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3e75d4b94737
Fix XDR helper thread error handling. r=nbp
Whiteboard: [jsbugmon:update] → [jsbugmon:update,ignore]
JSBugMon: The testcase found in this bug no longer reproduces (tried revision 0bbed2e6a4cc).
https://hg.mozilla.org/mozilla-central/rev/3e75d4b94737
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Whiteboard: [jsbugmon:update,ignore] → [jsbugmon:update]
Comment on attachment 8929895 [details]
Bug 1390856 - Fix XDR helper thread error handling.

Approval Request Comment
[Feature/Bug causing the regression]:
Unknown
[User impact if declined]:
This bug fixes problems that could lead to serious stability bugs and crashes.  It is unclear how often we hit this state, but it may help reduce a number of difficult crash signatures.
[Is this code covered by automated tests?]:
Yes
[Has the fix been verified in Nightly?]:
Automated tests pass in nightly.
[Needs manual test from QE? If yes, steps to reproduce]: 
No
[List of other uplifts needed for the feature/fix]:
This patch applies cleanly to beta tree.
[Is the change risky?]:
Low
[Why is the change risky/not risky?]:
This catches a number of cases where an XDR operation fails but we still think it succeeded. These operations already handle an error code so I expect risk is low.
[String changes made/needed]:
None
Attachment #8929895 - Flags: approval-mozilla-beta?
Comment on attachment 8929895 [details]
Bug 1390856 - Fix XDR helper thread error handling.

Fix an assertion error in javascript. Beta58+.
Attachment #8929895 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Just to clarify, in release without assertions Bad Things will happen.
(In reply to Ted Campbell [:tcampbell] from comment #15)
> [Is this code covered by automated tests?]:
> Yes
> [Needs manual test from QE? If yes, steps to reproduce]: 
> No

Given that this issue is already cover by automation and manual testing is not required, marking as qe-verify-.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.