Crash [@ JSScript::functionNonDelazifying] involving saveIncrementalBytecode

RESOLVED FIXED in Firefox 64

Status

()

defect
P3
critical
RESOLVED FIXED
2 years ago
7 months ago

People

(Reporter: gkw, Assigned: tcampbell)

Tracking

(Blocks 2 bugs, 4 keywords)

Trunk
mozilla65
x86_64
Linux
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +
qe-verify -

Firefox Tracking Flags

(firefox-esr52 wontfix, firefox-esr60 wontfix, firefox59 wontfix, firefox60 wontfix, firefox61 wontfix, firefox62 wontfix, firefox63 wontfix, firefox64 fixed, firefox65 fixed)

Details

(Whiteboard: [jsbugmon:update,testComment=9,origRev=d7e0e17065ca], crash signature)

Attachments

(2 attachments)

Reporter

Description

2 years ago
The following testcase crashes on mozilla-central revision ac93fdadf102 (build with --enable-debug, run with --fuzzing-safe --ion-offthread-compile=off --ion-eager):

// jsfunfuzz-generated
var x = []
for (var i = 0; i < 9; ++i) {
    x[i] = 0;
}
// Adapted from randomly chosen test: js/src/jit-test/tests/xdr/bug1390856.js
var t = cacheEntry("function f(){};");
evaluate(t, {
    sourceIsLazy: true,
    saveIncrementalBytecode: true
});
oomAtAllocation(9, 4);
offThreadDecodeScript(t);
try {
    runOffThreadDecodedScript();
} catch (e) {}
getLcovInfo();

Backtrace:

#0  JSScript::functionNonDelazifying (this=0x7f58dfcbe128) at js/src/jsscript.h:1619
#1  0x0000000000a76fe5 in JSScript::isTopLevel (this=<optimized out>) at js/src/jsscript.h:1695
#2  GenerateLcovInfo (cx=cx@entry=0x7f58e1616000, comp=0x7f58e162d800, out=...) at js/src/jsopcode.cpp:2903
#3  0x0000000000a85507 in js::GetCodeCoverageSummary (cx=cx@entry=0x7f58e1616000, length=length@entry=0x7ffd0f9ecdd8) at js/src/jsopcode.cpp:2978
#4  0x00000000008bae30 in GetLcovInfo (cx=0x7f58e1616000, argc=<optimized out>, vp=<optimized out>) at js/src/builtin/TestingFunctions.cpp:4263
/snip

For detailed crash information, see attachment.
Reporter

Comment 2

2 years ago
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset: https://hg.mozilla.org/mozilla-central/rev/36aebe4f719d
user:        Nicolas B. Pierron
date:        Tue Jan 31 20:03:58 2017 +0000
summary:     Bug 1316078 part 5 - Add XDR off-thread decoder test cases. r=bhackett

Nicolas, is bug 1316078 a likely regressor?
Blocks: 1316078
Flags: needinfo?(nicolas.b.pierron)
So far I cannot explain the sudden spike in crashes, especially since this seems to be affecting every channel starting on the 27 of december.

I will investigate this issue tomorrow.
Priority: -- → P2

Updated

Last year
Whiteboard: [jsbugmon:update] → [jsbugmon:update,ignore]

Comment 4

Last year
JSBugMon: The testcase found in this bug no longer reproduces (tried revision 3ee38289dac8).
Reporter

Updated

Last year
Whiteboard: [jsbugmon:update,ignore] → [jsbugmon:bisectfix]

Updated

Last year
Whiteboard: [jsbugmon:bisectfix] → [jsbugmon:]

Comment 5

Last year
JSBugMon: Fix Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first good revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/1b4d5be72031
user:        Jon Coppeard
date:        Tue Jan 30 17:57:40 2018 +0000
summary:     Bug 1432794 - Skip prototype and constructor intialization for off-thread parsing r=jandem

This iteration took 269.721 seconds to run.
Jon/Nicolas, is bug 1432794 a likely fix?
Flags: needinfo?(jcoppeard)
No, I thnk that just made the testcase not reproduce by changing allocation patterns.
Flags: needinfo?(jcoppeard)
Well, it's possible I guess if the problem turned out to be OOM handling while setting up off-thread parsing.  I think we still need to find out what the original problem was.
Reporter

Comment 9

9 months ago
// Adapted from randomly chosen test: js/src/jit-test/tests/xdr/bug1390856.js
let x = cacheEntry("");
evaluate(x, {
    saveIncrementalBytecode: true
});
for (let i = 1; i < 19; ++i) {
    try {
        oomAtAllocation(i, 4);
        offThreadDecodeScript(x);
        runOffThreadDecodedScript();
    } catch (e) {}
}
// jsfunfuzz-generated
getLcovInfo();

crashes 64-bit debug shell on m-c rev d7e0e17065ca at JSScript::functionNonDelazifying with --fuzzing-safe --ion-offthread-compile=off --ion-eager

Please look at this testcase before the situation changes again.
Summary: Crash [@ JSScript::functionNonDelazifying] → Crash [@ JSScript::functionNonDelazifying] involving saveIncrementalBytecode
Whiteboard: [jsbugmon:] → [jsbugmon:update,testComment=9,origRev=d7e0e17065ca]
Priority: P2 → P1
The problem here is that GenerateLcovInfo is iterating over a partially initialized JSScript.

Because of oomAtAllocation(i, 4), we fail to allocate in NewEmptyScopeData. MOZ_TRY fails all the way up: NewEmptyScopeData -> Scope::XDRSizedBindingNames -> GlobalScope::XDR -> js::XDRScript -> js::XDRState::codeScript -> ScriptDecodeTask::parse. In parse(), we see that res.isOk() is false, so we don't append the new script to the scripts GCVector that ScriptDecodeTask wants to return. However, the script allocation still exists. Because we bailed halfway through the decoding process, some of the script data is just garbage.

In GenerateLcovInfo, for each zone, we iterate over the JSScript cells in that zone:

    for (ZonesIter zone(rt, SkipAtoms); !zone.done(); zone.next()) {
        for (auto script = zone->cellIter<JSScript>(); !script.done(); script.next()) {
            ...

The half-initialized script is still sitting in an arena, so the iterator finds it and checks to see if we need to collect its code coverage information. The call to isTopLevel() dereferences a null scope pointer, and we segfault.

I'm not sure if the solution is to fix the iterator in GenerateLcovInfo, to be more aggressive about cleaning up the failed script decode, or a combination of both.
Assignee: nobody → iireland
QA Contact: sdetar
QA Contact: sdetar
Assignee

Comment 11

8 months ago
Thanks for analyzing this! Taking this bug as part of my JSScript::data / XDRScript cleanup. (Bug 1471062 and its follow-ups).
Flags: needinfo?(nicolas.b.pierron)
Assignee

Updated

8 months ago
Assignee: iireland → tcampbell
Crash-stat no longer report the signature on Firefox 63.
As this is only a JS Shell / code-coverage harness issue, and not a fuzzblocker I will downgrade it to P3.
Assignee

Comment 13

8 months ago
Failures during XDR may leave the script partially initialized in a way
that confuses coverage collection. This ensures the shared script data
is removed from a script if there are any XDR failures in it.

Comment 14

8 months ago
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c54c26b40d59
XDR failures should leave script isUncompleted(). r=nbp

Comment 15

8 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c54c26b40d59
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Assignee

Comment 16

8 months ago
Comment on attachment 9020466 [details]
Bug 1427860 - XDR failures should leave script isUncompleted(). r?nbp

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: None

User impact if declined: This might explain/fix some startup / XDR crashes that still occur. We are early in Beta cycle and this patch is a low-risk change.

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): Low risk change and only impacting low-mem / profile-corruption.

String changes made/needed:
Attachment #9020466 - Flags: approval-mozilla-beta?
Assignee

Comment 17

8 months ago
I'd like to try this on beta and see if it has any impacts on long-standing crashes. If there is any improvement, it might be a good candidate for ESR60.
Flags: qe-verify-
Comment on attachment 9020466 [details]
Bug 1427860 - XDR failures should leave script isUncompleted(). r?nbp

[Triage Comment]
Fix which may help some of the remaining XDR crashes we see. Approved for 64.0b6.
Attachment #9020466 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Do any of the crashes on ESR60 with this signature look related to this bug? If so, should we consider uplifting this fix?
Flags: needinfo?(tcampbell)
Assignee

Comment 21

7 months ago
Don't see anything that suggests this fixes anything but lcov under OOM.
Flags: needinfo?(tcampbell)
You need to log in before you can comment on or make changes to this bug.