Closed
Bug 820186
Opened 12 years ago
Closed 12 years ago
Various crashes/assertions with gczeal(10) and random recursion
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
mozilla20
Tracking | Status | |
---|---|---|
firefox17 | --- | unaffected |
firefox18 | --- | unaffected |
firefox19 | --- | unaffected |
firefox20 | + | fixed |
firefox-esr10 | --- | unaffected |
firefox-esr17 | --- | unaffected |
People
(Reporter: gkw, Assigned: jonco)
References
Details
(5 keywords, Whiteboard: [adv-main20-])
Crash Data
Attachments
(5 files, 1 obsolete file)
The upcoming attached testcase asserts js debug shell on m-c changeset 40324fd7e5db without any CLI arguments at Assertion failure: hasAllFlags(OBJECT_FLAG_DYNAMIC_MASK), and crashes js opt shell at js::gc::ChunkBitmap::markIfUnmarked
s-s and sec-critical because this involves gc and seems to access weird memory addresses.
autoBisect shows this is probably related to the following changeset:
The first bad revision is:
changeset: 114220:f55177f70719
user: Jon Coppeard
date: Fri Nov 16 15:34:22 2012 +0000
summary: Bug 790338 - Sweep compartments in groups r=billm
This bug only landed on m-c on Nov 27, see bug 790338 comment 62, making it only affecting Firefox 20.
![]() |
Reporter | |
Comment 2•12 years ago
|
||
Comment 3•12 years ago
|
||
The test in bug 820349 initially had a similar signature and then changed to a different one. Could be related/duplicate.
Assigning to Jon since the regressing patch is his. Reassign if it makes sense.
Assignee: general → jcoppeard
Updated•12 years ago
|
Assignee: jcoppeard → terrence
Comment 5•12 years ago
|
||
Assertion failure: hasAllFlags(OBJECT_FLAG_DYNAMIC_MASK), at jsinfer.h:984
The first bad revision is:
changeset: 2140f915a307
user: Jason Orendorff
date: Fri Nov 30 16:46:24 2012 -0600
summary: Bug 816638 - Add nondeterministicGetWeakMapKeys to the JS shell. r=jcoppeard.
Updated•12 years ago
|
Summary: Crash [@ js::gc::ChunkBitmap::markIfUnmarked] or "Assertion failure: hasAllFlags(OBJECT_FLAG_DYNAMIC_MASK)," → Various crashes/assertions with gczeal(10) and random recursion
Comment 9•12 years ago
|
||
The regression in comment 5 seems bogus, given that it just added a shell-only function.
Yeah, I expect this was probably a regression from incremental sweeping. I don't think ASAN would help since it doesn't have any special powers related to GC-allocated memory.
Assignee: terrence → jcoppeard
Comment 11•12 years ago
|
||
Thanks Bill, was just about to make that comment. Not sure why it got assigned to me.
![]() |
Reporter | |
Comment 12•12 years ago
|
||
try {
g1 = this
} catch (e) {}
try {
a1 = new Array
} catch (e) {}
try {
a2 = g1.a1.slice(4, 5)
} catch (e) {}
try {
print([])
function x() {}("")
} catch (e) {}
try {
with({
d: gczeal(10, 2)
}) var r = p("", "")
t(s.e(r, e, ""))
r
} catch (e) {}
try {
for (let x in [f, y, f, y, y, y, y, function() {}, function() {}, function() {}, function() {},
f]) {
t('')
}
} catch (e) {}
try {
t((''.e(a, n)))
} catch (e) {}
try {
g = evalcx('')
function y() {
/j/
}
g
} catch (e) {}
try {
t0 = Uint8Array(a2)
} catch (e) {}
try {
c(u)()
} catch (e) {}
try {
v = t0.h
} catch (e) {}
try {
L: with((7)) {
r
}
} catch (e) {}
try {
t0 = Float32Array(5)
} catch (e) {}
try {
p("", "")
t(n(r.c(s)))
} catch (e) {}
try {
var r = p("", "")
var s = ""
t(s.h(r))
i
} catch (e) {}
try {
e.s(s)
} catch (e) {}
try {
e.e(a)
} catch (e) {}
try {
s.a[3] = b
} catch (e) {}
try {
for (var ruqais = 0; ruqais < 4 && (x); ++s) {
for (t = 0; h < 1; ++h) {
i.t()
}
}
} catch (e) {}
crashes js opt shell on m-c changeset 87f8165c5a0b on Mac 10.8 64-bit opt shell with -a and -d at js::gc::Cell::compartment
autoBisect shows this is probably related to the following changeset:
The first bad revision is:
changeset: 114222:481687026c08
user: Jon Coppeard
date: Tue Oct 16 12:28:32 2012 +0100
summary: Bug 790338 - Sweep debugger objects in the same group as their debugees r=billm
Assignee | ||
Comment 13•12 years ago
|
||
The problem seems to be a TypeObject is being finalized, but the weak reference to it in TypeCompartment::allocationSiteTable is not being removed.
I think this is related to the fact that the logic for marking and sweeping types depends on JSCompartment::activeAnalysis, and this can have a different value in different GC slices.
Assignee | ||
Comment 14•12 years ago
|
||
Previous comment relates to the original test case, I currently can't get either of the reduced or opt-only testcase to reproduce.
![]() |
Reporter | |
Comment 15•12 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #14)
> Previous comment relates to the original test case, I currently can't get
> either of the reduced or opt-only testcase to reproduce.
These gc-related testcases can be quite tricky to reproduce, reduced versions sometimes rely on the specific hardware / OS configuration used on the reduction machine.
Brian, can you give some help here?
(In reply to Jon Coppeard (:jonco) from comment #13)
> The problem seems to be a TypeObject is being finalized, but the weak
> reference to it in TypeCompartment::allocationSiteTable is not being removed.
>
> I think this is related to the fact that the logic for marking and sweeping
> types depends on JSCompartment::activeAnalysis, and this can have a
> different value in different GC slices.
Jon and I discussed this a little more, and it seems like the following happens:
1. Start GC with activeAnalysis = false.
2. Start sweeping.
3. JS code does something that sets activeAnalysis.
4. This causes us to reset the current GC. Since we're sweeping, that means we try to finish the current one. Since activeAnalysis is now true, we don't sweep the allocationSiteTable. However, we haven't marked enough stuff to guarantee that it doesn't contain objects that will be finalized.
One idea would be to use the old value of activeAnalysis in step 4. That would cause us to sweep dead stuff out of allocationSiteTable. However, we still could sweep type objects or singleton objects. I'm not sure if that's a problem, given that activeAnalysis is true. Could TI have witnessed (via a weak pointer) some GC thing that's about to be swept? Brian, can you answer that?
An alternative idea would be to have a more elaborate ResetIncrementalGC routine. It would finish sweeping the current compartment group but it would not sweep any further compartment groups. I think this should be safe, and it's also sort of a nice optimization.
I sort of like the second idea better.
Comment 17•12 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #16)
> One idea would be to use the old value of activeAnalysis in step 4. That
> would cause us to sweep dead stuff out of allocationSiteTable. However, we
> still could sweep type objects or singleton objects. I'm not sure if that's
> a problem, given that activeAnalysis is true. Could TI have witnessed (via a
> weak pointer) some GC thing that's about to be swept? Brian, can you answer
> that?
>
> An alternative idea would be to have a more elaborate ResetIncrementalGC
> routine. It would finish sweeping the current compartment group but it would
> not sweep any further compartment groups. I think this should be safe, and
> it's also sort of a nice optimization.
The main point of activeAnalysis is to prevent the destruction of analysis information when we are trying to analyze or compile a script. If we sweep the analysis information when activeAnalysis is set because it wasn't set when triggering the GC, then the analysis/compilation will break when it uses the destroyed information. If we instead keep the analysis information around, but sweep type objects / scripts / etc. as normal, that can also be problematic as the type constraints will freely refer to type objects, singleton JS objects and scripts without holding references on them.
I think the solution here is to do bug 772820. Just prohibit any GC activity while activeAnalysis is set --- allocating GC things while analyzing or compiling a script cannot trigger a last ditch GC or any incremental behavior. There is such a puny amount of GC thing allocation going on in these phases, especially compared to LifoAlloc allocation, that I don't see any value in trying to allow GCs here.
This would be a big simplification, and you should always be able to sweep the allocationSiteTable and other weak pointers in the same way --- the only time we will want to keep this information around is if the compartment isPreservingCode(), and that is information set at the beginning of the IGC and invariant throughout it (and which will ensure all type objects are marked and that sweeping the allocationSiteTable will not actually collect anything).
Assignee | ||
Comment 18•12 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #16)
(I wrote this before I refreshed and saw Brian's comment above)
I've experimented with both approaches, and they both gave the same results on the testcases given in this bug - there are no more crashes, but the testcases from comments 1, 6 and 7 exit with error "TypeError: rndElt(...) is not a function". I don't know whether this is expected, or because we have messed up something to do with type inference. The case from comment 5 exits cleanly.
On balance I agree that the second approach is better even if it is a bit more complication because this is an improvement anyway when we have to reset.
Assignee | ||
Comment 19•12 years ago
|
||
I think Brian's idea is the way to go to solve this bug, but Bill's suggested changes to ResetIncrementalGC would also be nice, perhaps done as a separate bug. My initial patch dies horribly on try though.
BTW, after today I'm away on PTO until Tuesday.
![]() |
Reporter | |
Comment 20•12 years ago
|
||
> on the testcases given in this bug - there are no more crashes, but the
> testcases from comments 1, 6 and 7 exit with error "TypeError: rndElt(...)
> is not a function". I don't know whether this is expected, or because we
> have messed up something to do with type inference. The case from comment 5
> exits cleanly.
This TypeError should be expected. The bug that is not expected is the crash/assert that is reported here.
> BTW, after today I'm away on PTO until Tuesday.
This means we can ignore topcrashes involving GC for these few days, I see a lot of related crashes at different signatures involving gczeal(10) from fuzzing results and they are highly likely related to the spike involving gc on topcrashes these few days.
cc'ing CrashKill folks.
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #20)
> This means we can ignore topcrashes involving GC for these few days, I see a
> lot of related crashes at different signatures involving gczeal(10) from
> fuzzing results and they are highly likely related to the spike involving gc
> on topcrashes these few days.
>
> cc'ing CrashKill folks.
This bug is pretty unlikely to cause crashes in practice. It requires a GC slice to run while TI is running, which should almost never happen.
Gary, can you elaborate about a GC crash spike? I haven't heard anything about that.
![]() |
Reporter | |
Comment 22•12 years ago
|
||
> This bug is pretty unlikely to cause crashes in practice. It requires a GC
> slice to run while TI is running, which should almost never happen.
>
> Gary, can you elaborate about a GC crash spike? I haven't heard anything
> about that.
https://wiki.mozilla.org/Platform/2012-12-11#Stability_Report_.5BWeekly.5D
I did get crashes related to gczeal(10), when unreduced show up as js::GCMarker::processMarkStackTop. This may be one of the bugs that contribute to that signature.
![]() |
Reporter | |
Comment 23•12 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #14)
> Previous comment relates to the original test case, I currently can't get
> either of the reduced or opt-only testcase to reproduce.
I can definitely reproduce the testcases on 64-bit opt shells on Mac 10.8 in comment 1, comment 5, comment 6, comment 7 and comment 12 (with -a and -d), on m-c changeset 020555b69f72.
(tested with --enable-more-deterministic builds, but I think it might also occur with builds that don't have --enable-more-deterministic.)
Assignee | ||
Comment 25•12 years ago
|
||
Patch to abort sweeping on reset, after finishing the current compartment group.
Fixes the crash on the supplied testcases (and that for bug 817444), but still testing on try: https://tbpl.mozilla.org/?tree=Try&rev=bd7b619721b7
Assignee | ||
Comment 26•12 years ago
|
||
Alternate approach that simply marks the types on reset if analysis has become active.
This fixes the problem, and unlike the previous patch it gets good results on the try server: https://tbpl.mozilla.org/?tree=Try&rev=a1df8d401124
Attachment #693484 -
Attachment is obsolete: true
Assignee | ||
Updated•12 years ago
|
Attachment #693921 -
Flags: review?(wmccloskey)
Attachment #693921 -
Flags: review?(wmccloskey) → review+
![]() |
Reporter | |
Updated•12 years ago
|
Keywords: checkin-needed
![]() |
Reporter | |
Comment 27•12 years ago
|
||
I could have helped land this myself, but mozilla-inbound is closed due to bustage. :(
![]() |
Reporter | |
Comment 28•12 years ago
|
||
Keywords: checkin-needed
![]() |
Reporter | |
Comment 29•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
![]() |
Reporter | |
Updated•12 years ago
|
Flags: in-testsuite?
Updated•12 years ago
|
Status: RESOLVED → VERIFIED
Comment 30•12 years ago
|
||
JSBugMon: This bug has been automatically verified fixed.
Updated•12 years ago
|
Updated•12 years ago
|
Whiteboard: [adv-main20-]
Updated•11 years ago
|
Group: core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•