Closed Bug 1456512 Opened Last year Closed Last year

Assertion failure: !JS::CurrentThreadIsHeapCollecting(), at js/src/gc/Marking.cpp:3625 with grayRoots function

Categories

(Core :: JavaScript Engine, defect, P1, critical)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- fixed
firefox62 --- fixed

People

(Reporter: decoder, Assigned: sfink)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [jsbugmon:update])

Attachments

(3 files, 1 obsolete file)

The following testcase crashes on mozilla-central revision 26e53729a109 (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --enable-debug --enable-optimize, run with --fuzzing-safe):

var wm = new WeakMap();
grayRoot().map = wm;
wm = null;
gczeal(13, 7);
var lfOffThreadGlobal = newGlobal();


Backtrace:

received signal SIGSEGV, Segmentation fault.
0x0000000000f24468 in JS::UnmarkGrayGCThingRecursively (thing=thing@entry=...) at js/src/gc/Marking.cpp:3625
#0  0x0000000000f24468 in JS::UnmarkGrayGCThingRecursively (thing=thing@entry=...) at js/src/gc/Marking.cpp:3625
#1  0x00000000004eb7bb in js::gc::TenuredCell::readBarrier (thing=0x7ffff4c8a4c0) at js/src/gc/Cell.h:397
#2  0x0000000000c8d604 in js::InternalBarrierMethods<js::UnownedBaseShape*>::readBarrier (v=<optimized out>) at js/src/gc/Barrier.h:269
#3  js::ReadBarrieredBase<js::UnownedBaseShape*>::read (this=0x7ffff4930b88) at js/src/gc/Barrier.h:593
#4  js::ReadBarriered<js::UnownedBaseShape*>::get (this=0x7ffff4930b88) at js/src/gc/Barrier.h:649
#5  js::ReadBarriered<js::UnownedBaseShape*>::operator js::UnownedBaseShape* const& (this=0x7ffff4930b88) at js/src/gc/Barrier.h:661
#6  JS::WeakCache<JS::GCHashSet<js::ReadBarriered<js::UnownedBaseShape*>, js::StackBaseShape, js::SystemAllocPolicy> >::entryNeedsSweep (prior=...) at /srv/jenkins/jobs/mozilla-central-build-jsshell/workspace/arch/64/compiler/gcc/instrumentation/none/type/debug/dist/include/js/GCHashTable.h:641
#7  JS::WeakCache<JS::GCHashSet<js::ReadBarriered<js::UnownedBaseShape*>, js::StackBaseShape, js::SystemAllocPolicy> >::Range::settle (this=this@entry=0x7fffffffce40) at dist/include/js/GCHashTable.h:671
#8  0x0000000000c8233d in JS::WeakCache<JS::GCHashSet<js::ReadBarriered<js::UnownedBaseShape*>, js::StackBaseShape, js::SystemAllocPolicy> >::Range::popFront (this=0x7fffffffce40) at dist/include/js/GCHashTable.h:664
#9  JS::Zone::checkBaseShapeTableAfterMovingGC (this=0x7ffff49ec000) at js/src/vm/Shape.cpp:1516
#10 0x0000000000ee88df in js::gc::CheckHashTablesAfterMovingGC (rt=rt@entry=0x7ffff5f19000) at js/src/gc/GC.cpp:8462
#11 0x0000000000f30a90 in js::Nursery::doCollection (this=this@entry=0x7ffff5f1c070, reason=reason@entry=JS::gcreason::DESTROY_RUNTIME, tenureCounts=...) at js/src/gc/Nursery.cpp:937
#12 0x0000000000f30e0f in js::Nursery::collect (this=0x7ffff5f1c070, reason=reason@entry=JS::gcreason::DESTROY_RUNTIME) at js/src/gc/Nursery.cpp:729
#13 0x0000000000ed83c4 in js::gc::GCRuntime::minorGC (this=0x7ffff5f19700, reason=JS::gcreason::DESTROY_RUNTIME, phase=<optimized out>) at js/src/gc/GC.cpp:7826
#14 0x0000000000eea802 in js::gc::GCRuntime::gcCycle (this=this@entry=0x7ffff5f19700, nonincrementalByAPI=nonincrementalByAPI@entry=true, budget=..., reason=reason@entry=JS::gcreason::DESTROY_RUNTIME) at js/src/gc/GC.cpp:7431
#15 0x0000000000eeaff5 in js::gc::GCRuntime::collect (this=this@entry=0x7ffff5f19700, nonincrementalByAPI=nonincrementalByAPI@entry=true, budget=..., reason=reason@entry=JS::gcreason::DESTROY_RUNTIME) at js/src/gc/GC.cpp:7622
#16 0x0000000000eeb399 in js::gc::GCRuntime::gc (this=this@entry=0x7ffff5f19700, gckind=gckind@entry=GC_NORMAL, reason=reason@entry=JS::gcreason::DESTROY_RUNTIME) at js/src/gc/GC.cpp:7692
#17 0x0000000000c3a904 in JSRuntime::destroyRuntime (this=0x7ffff5f19000) at js/src/vm/Runtime.cpp:315
#18 0x0000000000b8bc71 in js::DestroyContext (cx=0x7ffff5f15000) at js/src/vm/JSContext.cpp:201
#19 0x0000000000a0ab8a in JS_DestroyContext (cx=<optimized out>) at js/src/jsapi.cpp:496
#20 0x00000000004442b1 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at js/src/shell/js.cpp:9254
rax	0x0	0
rbx	0x7ffff4c8a4c0	140737300178112
rcx	0x7ffff6c282ad	140737333330605
rdx	0x0	0
rsi	0x7ffff6ef7770	140737336276848
rdi	0x7ffff6ef6540	140737336272192
rbp	0x7fffffffcd50	140737488342352
rsp	0x7fffffffcd30	140737488342320
r8	0x7ffff6ef7770	140737336276848
r9	0x7ffff7fe4780	140737354024832
r10	0x58	88
r11	0x7ffff6b9e7a0	140737332766624
r12	0x7ffff4c8a4c7	140737300178119
r13	0x7fffffffcda0	140737488342432
r14	0x7ffff4930b88	140737296665480
r15	0x0	0
rip	0xf24468 <JS::UnmarkGrayGCThingRecursively(JS::GCCellPtr)+168>
=> 0xf24468 <JS::UnmarkGrayGCThingRecursively(JS::GCCellPtr)+168>:	movl   $0x0,0x0
   0xf24473 <JS::UnmarkGrayGCThingRecursively(JS::GCCellPtr)+179>:	ud2


Also likely shell-only.
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/95f01f3075fe
user:        Steve Fink
date:        Wed Mar 15 17:03:42 2017 -0700
summary:     Bug 1337209 - Add JS shell test mechanism for gray marking, r=jonco

This iteration took 1.873 seconds to run.
Flags: needinfo?(sphink)
Priority: -- → P1
I will look at this (and bug 1456518) Real Soon Now, but my immediate feeling is that this is likely to be a test-only problem.
I'll confess I didn't think too hard about this one.
Attachment #8976740 - Flags: review?(jcoppeard)
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Comment on attachment 8976740 [details] [diff] [review]
Do not unmark gray during minor collections

Review of attachment 8976740 [details] [diff] [review]:
-----------------------------------------------------------------

This is fine.  We shouldn't trigger barriers when we're inside the collector.

The thing that's puzzling me is why this happens... we have overloaded opertator== for ReadBarriered<T> so that it doesn't trigger the barriers.  But this assertion doesn't call our overloaded version:

https://searchfox.org/mozilla-central/source/js/public/GCHashTable.h#641

If I change this line to call ::operator== directly (I have to create a parallel copy of the class because it doesn't work for all types) then it works.

This needs further thought.
Attachment #8976740 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) from comment #4)
> Comment on attachment 8976740 [details] [diff] [review]
> Do not unmark gray during minor collections
>
> Review of attachment 8976740 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> This is fine.  We shouldn't trigger barriers when we're inside the collector.
>
> The thing that's puzzling me is why this happens... we have overloaded
> opertator== for ReadBarriered<T> so that it doesn't trigger the barriers.
> But this assertion doesn't call our overloaded version:
>
> https://searchfox.org/mozilla-central/source/js/public/GCHashTable.h#641
>
> If I change this line to call ::operator== directly (I have to create a
> parallel copy of the class because it doesn't work for all types) then it
> works.
>
> This needs further thought.

I got intrigued by this question and had to figure it out.

What's going on is that the outer ::operator== is getting shadowed by a closer namespace. Operator overloads are searched for twice: first, by starting at the current scope and moving outwards until you reach the global scope, and then through Koenig lookup (aka Argument Dependent Lookup) that looks in the namespace of any arguments to a function, but only in those exact namespaces. We can ignore the Koenig lookup here since there is no operator== defined in the js::ReadBarriered scope.

The key with the first search is that it terminates as soon as any operator== is found, not just when an applicable overload is found. In our case, we unfortunately have a JS::operator== for GCCellPtr that shadows ::operator==. Removing that fixes the problem.

A more robust alternative for this particular case might be to define an operator== within the ReadBarriered class, since that would make Koenig lookup work. But now that I know that JS::operator== is a footgun, I want to get rid of it. And once that's done, there technically isn't a problem anymore...
Attachment #8977113 - Flags: review?(jcoppeard)
(In reply to Steve Fink [:sfink] [:s:] (PTO June 31) from comment #5)
> Operator overloads are searched for twice: first, by
> starting at the current scope and moving outwards until you reach the global
> scope ... The key with the first search is that it terminates as soon as any
> operator== is found, not just when an applicable overload is found.

Aha, I did not know that!  Thanks for figuring this out.
Attachment #8977113 - Flags: review?(jcoppeard) → review+
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d8bf331bc54
Do not unmark gray during minor collections, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/2698e931d8b8
Move JS::operator==(GCCellPtr,GCCellPtr) to the global scope to avoid shadowing, r=jonco
https://hg.mozilla.org/mozilla-central/rev/5d8bf331bc54
https://hg.mozilla.org/mozilla-central/rev/2698e931d8b8
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Flags: needinfo?(sphink)
Is this something that can ride the trains or should we be considering it for backport?
Flags: needinfo?(sphink)
Duplicate of this bug: 1463421
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
> Is this something that can ride the trains or should we be considering it
> for backport?

Given that multiple fuzzers (see bug 1463421) seem to be encountering this, I think I'll request backport.
Flags: needinfo?(sphink)
Attachment #8979771 - Attachment is obsolete: true
Comment on attachment 8979811 [details] [diff] [review]
Fixes for unwanted read barriers during minor GC: move JS::operator==(GCCellPtr,GCCellPtr) to the global scope to avoid shadowing, and add a dynamic check

Approval Request Comment
[Feature/Bug causing the regression]: it has been around for a while
[User impact if declined]: possibility of heap corruption
[Is this code covered by automated tests?]: tests included in this patch
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: this is just causing some stuff to not happen in a situation where it isn't needed, both by removing the call to it, and also dynamically checking whether we're in that situation. It really really ain't gonna happen after this.
[String changes made/needed]: none
Attachment #8979811 - Flags: approval-mozilla-beta?
Comment on attachment 8979811 [details] [diff] [review]
Fixes for unwanted read barriers during minor GC: move JS::operator==(GCCellPtr,GCCellPtr) to the global scope to avoid shadowing, and add a dynamic check

Avoid possible heap corruption and easily-hit bugs encountered by fuzzers. Approved for 61.0b8.
Attachment #8979811 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Do we need this on esr60?
Flags: needinfo?(sphink)
Comment on attachment 8979811 [details] [diff] [review]
Fixes for unwanted read barriers during minor GC: move JS::operator==(GCCellPtr,GCCellPtr) to the global scope to avoid shadowing, and add a dynamic check

I'm not sure, since I don't know how important eg fuzzing is. So I'll just request approval and let you decide.

[Approval Request Comment]
> If this is not a sec:{high,crit} bug, please state case for ESR consideration:

This interferes with fuzzing, as it's fairly easy to hit. It shouldn't affect the released binary if I'm not mistaken; if you run into this case, you just won't find anything gray.

> User impact if declined: 

None

> Fix Landed on Version:

61

> Risk to taking this patch (and alternatives if risky): 

None to speak of. General code churn? It's a small amount of code, and the end result definitely feels less risky to me.
Flags: needinfo?(sphink)
Attachment #8979811 - Flags: approval-mozilla-esr60?
Comment on attachment 8979811 [details] [diff] [review]
Fixes for unwanted read barriers during minor GC: move JS::operator==(GCCellPtr,GCCellPtr) to the global scope to avoid shadowing, and add a dynamic check

Christian doesn't think we need this, and the issue wouldn't affect end users, so let's leave esr60 as-is.
Attachment #8979811 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60-
You need to log in before you can comment on or make changes to this bug.