Closed
Bug 844769
Opened 12 years ago
Closed 8 years ago
TSan: Thread data race in AutoTraceSession::AutoTraceSession(JSRuntime *rt, js::HeapState heapState) vs. JSRuntime::isHeapBusy()
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
RESOLVED
FIXED
mozilla51
Tracking | Status | |
---|---|---|
firefox51 | --- | fixed |
People
(Reporter: posidron, Assigned: terrence)
References
(Blocks 1 open bug)
Details
(Whiteboard: [tsan][tsan-test-blocker])
Attachments
(6 files, 3 obsolete files)
4.91 KB,
text/plain
|
Details | |
19.21 KB,
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
10.45 KB,
text/plain
|
Details | |
11.28 KB,
text/plain
|
Details | |
8.78 KB,
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
3.87 KB,
patch
|
terrence
:
review+
|
Details | Diff | Splinter Review |
During Firefox start-up with ThreadSanitizer (LLVM version), we get a data race reported as described in the attached log. Trace was created on mozilla-central with changeset 122820:c233837cce08.
According to the TSan devs, most of the reported traces should be real data races, even though they can be "benign". We need to determine if the race can/should be fixed, or put on the ignore list. Even for benign races, TSan devs suggest to fix them (second priority), as they can also cause problems [1].
[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Comment 1•8 years ago
|
||
This one still shows up on jit-tests. I'll attach a new trace.
Component: General → JavaScript: GC
Comment 2•8 years ago
|
||
The background parse thread is here:
bool isHeapBusy() const { return heapState_ != JS::HeapState::Idle; }
And the main thread writes to heapState_ in AutoTraceSession::AutoTraceSession.
Attachment #717812 -
Attachment is obsolete: true
Flags: needinfo?(terrence)
Assignee | ||
Comment 3•8 years ago
|
||
According to the GC analysis, the one path from defaultNewGroup to isHeapBusy is via:
#739477 = js::ObjectGroup* js::ObjectGroup::defaultNewGroup(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*)
#758165 = js::ObjectGroup* js::ObjectGroupCompartment::makeGroup(js::ExclusiveContext*, js::Class*, class JS::Handle<js::TaggedProto>, uint32)
#753457 = js::ObjectGroup* js::Allocate(js::ExclusiveContext*) [with T = js::ObjectGroup; js::AllowGC allowGC = (js::AllowGC)1u]
#752868 = uint8 js::gc::GCRuntime::checkAllocatorState(JSContext*, int32) [with js::AllowGC allowGC = (js::AllowGC)1u]
#5293 = uint8 JS::shadow::Runtime::isHeapBusy() const
Flags: needinfo?(terrence)
Assignee | ||
Comment 4•8 years ago
|
||
Note that that should not actually be possible: the background thread uses an ExclusiveContext, which should cause us to skip the checkAllocatorState [1]. This is enforced by the type system, so I'm not sure what's going on here. Maybe a debug build would give us a cleaner stack?
1- https://dxr.mozilla.org/mozilla-central/source/js/src/gc/Allocator.cpp#135-139
Comment 5•8 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #4)
> Maybe a debug build would give us a cleaner stack?
Good point. It's in TenuredCell::readBarrier (did the analysis miss this path?):
#0 JS::shadow::Runtime::isHeapBusy() const jspubtd.h:164 (js+0x000100d43722)
#1 js::InternalBarrierMethods<js::ObjectGroup*>::readBarrier(js::ObjectGroup*) Heap.h:1279 (js+0x000100dd5e08)
#2 js::ReadBarrieredBase<js::ObjectGroup*>::read() const Barrier.h:553 (js+0x000100dd5dbe)
#3 js::ReadBarriered<js::ObjectGroup*>::get() const Barrier.h:608 (js+0x000100dd5d10)
#4 js::ReadBarriered<js::ObjectGroup*>::operator js::ObjectGroup* const() const Barrier.h:620 (js+0x000100daeaf8)
#5 js::ObjectGroup::defaultNewGroup(js::ExclusiveContext*, js::Class const*, js::TaggedProto, JSObject*) ObjectGroup.cpp:495 (js+0x000100dade09)
#6 js::NewObjectWithClassProtoCommon(js::ExclusiveContext*, js::Class const*, JS::Handle<JSObject*>, js::gc::AllocKind, js::NewObjectKind) jsobj.cpp:795 (js+0x000100a4642e)
#7 js::NewObjectWithClassProto(js::ExclusiveContext*, js::Class const*, JS::Handle<JSObject*>, js::gc::AllocKind, js::NewObjectKind) jsobjinlines.h:703 (js+0x000100017b52)
Flags: needinfo?(terrence)
Assignee | ||
Comment 6•8 years ago
|
||
Does the traverse.py analysis only give us one path from A to B?
Flags: needinfo?(terrence) → needinfo?(sphink)
Assignee | ||
Comment 7•8 years ago
|
||
I think the best way to address this is to finally make the on-main-thread assertion we've wanted to have forever. I think if we add a custom ReadBarriered::get that takes the ExclusiveContext and does the get() || unbarrieredGet() internally, this shouldn't be so cumbersome. We do need to go out of line for this, so hopefully it won't be a slowdown. If it is, we can always open code it.
Assignee: nobody → terrence
Assignee | ||
Comment 8•8 years ago
|
||
Note that this is not a sec issue: we never fire the barrier from OMT so our correctness does not depend at all on the value we're reading.
Keywords: sec-want
Comment 9•8 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #6)
> Does the traverse.py analysis only give us one path from A to B?
If that's what you ask for. You can also use "routes from <X> to <Y>" and it'll give you multiple routes, making some attempt to make the routes look different. In this case, it still only displayed that one route, which means it is all it could find.
As for *why* it didn't see any other paths, it looks like the callgraph.txt I had lying around is too old. Sorry about that. But if I update, I still don't get the above path, which seems like it's a bug in my multiple route processing, since if I trace it through one link at a time it now has the full path. (It isn't terribly useful to print out *all* paths, since for most cases there are an exponential number of distinct ones. So some cleverness is required. Apparently, more than I have implemented at the moment.)
But I can fake it.
(Cmd) route from #1125956 to isHeapBusy avoiding ObjectGroupCompartment::makeGroup
Path from #1125956 to #3050: avoiding [1180999]
#1125956 = _ZN2js11ObjectGroup15defaultNewGroupEPNS_16ExclusiveContextEPKNS_5ClassENS_11TaggedProtoEP8JSObject$js::ObjectGroup* js::ObjectGroup::defaultNewGroup(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*)
#1099478 = _ZN2js11ObjectGroup12setTypeDescrEPNS_9TypeDescrE$void js::ObjectGroup::setTypeDescr(js::TypeDescr*)
#1099465 = _ZN2js11ObjectGroup11setAddendumENS0_12AddendumKindEPvb$void js::ObjectGroup::setAddendum(uint32, void*, uint8)
#1186658 = _ZN2js34PreliminaryObjectArrayWithTemplate15writeBarrierPreEPS0_$void js::PreliminaryObjectArrayWithTemplate::writeBarrierPre(js::PreliminaryObjectArrayWithTemplate*)
#3050 = _ZNK2JS6shadow7Runtime10isHeapBusyEv$uint8 JS::shadow::Runtime::isHeapBusy() const
(Cmd) route from #1125956 to isHeapBusy avoiding ObjectGroupCompartment::makeGroup and setTypeDescr
Path from #1125956 to #3050: avoiding [1180999, 1099478]
#1125956 = _ZN2js11ObjectGroup15defaultNewGroupEPNS_16ExclusiveContextEPKNS_5ClassENS_11TaggedProtoEP8JSObject$js::ObjectGroup* js::ObjectGroup::defaultNewGroup(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*)
#1186716 = _ZN2js14MallocProviderINS_16ExclusiveContextEE4new_INS_22ObjectGroupCompartment8NewTableEIPN2JS4ZoneEEEEPT_DpOT0_$T* js::MallocProvider<Client>::new_(Args&& ...) [with T = js::ObjectGroupCompartment::NewTable; Args = {JS::Zone*}; Client = js::ExclusiveContext]
#1116372 = _ZN2js14MallocProviderINS_16ExclusiveContextEE10pod_mallocIhEEPT_m$T* js::MallocProvider<Client>::pod_malloc(size_t) [with T = unsigned char; Client = js::ExclusiveContext; size_t = long unsigned int]
#1101809 = _ZN2js16ExclusiveContext13onOutOfMemoryENS_13AllocFunctionEmPv$void* js::ExclusiveContext::onOutOfMemory(int32, uint64, void*)
#1101811 = _ZN9JSRuntime13onOutOfMemoryEN2js13AllocFunctionEmPvP9JSContext$void* JSRuntime::onOutOfMemory(int32, uint64, void*, JSContext*)
#3050 = _ZNK2JS6shadow7Runtime10isHeapBusyEv$uint8 JS::shadow::Runtime::isHeapBusy() const
(Cmd) route from #1125956 to isHeapBusy avoiding ObjectGroupCompartment::makeGroup and setTypeDescr and #1186716
Path from #1125956 to #3050: avoiding [1180999, 1099478, 1186716]
#1125956 = _ZN2js11ObjectGroup15defaultNewGroupEPNS_16ExclusiveContextEPKNS_5ClassENS_11TaggedProtoEP8JSObject$js::ObjectGroup* js::ObjectGroup::defaultNewGroup(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*)
#1186728 = _ZN2js13TypeNewScript4makeEP9JSContextPNS_11ObjectGroupEP10JSFunction$uint8 js::TypeNewScript::make(JSContext*, js::ObjectGroup*, JSFunction*)
#1099464 = _ZN2js11ObjectGroup12setNewScriptEPNS_13TypeNewScriptE$void js::ObjectGroup::setNewScript(js::TypeNewScript*)
#1099465 = _ZN2js11ObjectGroup11setAddendumENS0_12AddendumKindEPvb$void js::ObjectGroup::setAddendum(uint32, void*, uint8)
#1186658 = _ZN2js34PreliminaryObjectArrayWithTemplate15writeBarrierPreEPS0_$void js::PreliminaryObjectArrayWithTemplate::writeBarrierPre(js::PreliminaryObjectArrayWithTemplate*)
#3050 = _ZNK2JS6shadow7Runtime10isHeapBusyEv$uint8 JS::shadow::Runtime::isHeapBusy() const
(Cmd) route from #1125956 to isHeapBusy avoiding ObjectGroupCompartment::makeGroup and setTypeDescr and #1186716 and #1186728
Path from #1125956 to #3050: avoiding [1180999, 1099478, 1186716, 1186728]
#1125956 = _ZN2js11ObjectGroup15defaultNewGroupEPNS_16ExclusiveContextEPKNS_5ClassENS_11TaggedProtoEP8JSObject$js::ObjectGroup* js::ObjectGroup::defaultNewGroup(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*)
#1186723 = _ZNK2js13ReadBarrieredIPNS_11ObjectGroupEEcvKS2_Ev$js::ReadBarriered<T>::operator const T() const [with T = js::ObjectGroup*]
#1187231 = _ZNK2js13ReadBarrieredIPNS_11ObjectGroupEE3getEv$const T js::ReadBarriered<T>::get() const [with T = js::ObjectGroup*]
#1187517 = _ZNK2js17ReadBarrieredBaseIPNS_11ObjectGroupEE4readEv$void js::ReadBarrieredBase<T>::read() const [with T = js::ObjectGroup*]
#1187683 = _ZN2js22InternalBarrierMethodsIPNS_11ObjectGroupEE11readBarrierES2_$static void js::InternalBarrierMethods<T*>::readBarrier(T*) [with T = js::ObjectGroup]
#1098443 = _ZN2js2gc11TenuredCell11readBarrierEPS1_$void js::gc::TenuredCell::readBarrier(js::gc::TenuredCell*)
#3050 = _ZNK2JS6shadow7Runtime10isHeapBusyEv$uint8 JS::shadow::Runtime::isHeapBusy() const
Ooh, at last the one jandem mentioned above. Any more?
(Cmd) route from #1125956 to isHeapBusy avoiding ObjectGroupCompartment::makeGroup and setTypeDescr and #1186716 and #1186728 and #1186723
Path from #1125956 to #3050: avoiding [1180999, 1099478, 1186716, 1186728, 1186723]
#1125956 = _ZN2js11ObjectGroup15defaultNewGroupEPNS_16ExclusiveContextEPKNS_5ClassENS_11TaggedProtoEP8JSObject$js::ObjectGroup* js::ObjectGroup::defaultNewGroup(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*)
#1110236 = _ZN2js17AddTypePropertyIdEPNS_16ExclusiveContextEPNS_11ObjectGroupEP8JSObject4jsidNS_7TypeSet4TypeE$void js::AddTypePropertyId(js::ExclusiveContext*, js::ObjectGroup*, JSObject*, jsid, js::TypeSet::Type)
#1110265 = _ZN2js17ConstraintTypeSet7addTypeEPNS_16ExclusiveContextENS_7TypeSet4TypeE$void js::ConstraintTypeSet::addType(js::ExclusiveContext*, js::TypeSet::Type)
#1204170 = _ZN2js17ConstraintTypeSet16postWriteBarrierEPNS_16ExclusiveContextENS_7TypeSet4TypeE$void js::ConstraintTypeSet::postWriteBarrier(js::ExclusiveContext*, js::TypeSet::Type)
#1204171 = _ZN2js2gc11StoreBuffer10putGenericI10TypeSetRefEEvRKT_$void js::gc::StoreBuffer::putGeneric(TypeSetRef*) [with T = TypeSetRef]
#1205212 = _ZN2js2gc11StoreBuffer3putINS1_13GenericBufferE10TypeSetRefEEvRT_RKT0_$void js::gc::StoreBuffer::put(js::gc::StoreBuffer::GenericBuffer*, TypeSetRef*) [with Buffer = js::gc::StoreBuffer::GenericBuffer; Edge = TypeSetRef]
#3050 = _ZNK2JS6shadow7Runtime10isHeapBusyEv$uint8 JS::shadow::Runtime::isHeapBusy() const
...and I can keep going. You can get it through a pre barrier. Calling setDelegate. Here's a fun one:
(Cmd) route from #1125956 to isHeapBusy avoiding ObjectGroupCompartment::makeGroup and setTypeDescr and #1186716 and #1186728 and #1186723 and #1110236 and #1099867 and #1186719
Path from #1125956 to #3050: avoiding [1180999, 1099478, 1186716, 1186728, 1186723, 1110236, 1099867, 1186719]
#1125956 = _ZN2js11ObjectGroup15defaultNewGroupEPNS_16ExclusiveContextEPKNS_5ClassENS_11TaggedProtoEP8JSObject$js::ObjectGroup* js::ObjectGroup::defaultNewGroup(js::ExclusiveContext*, js::Class*, js::TaggedProto, JSObject*)
#1099781 = _ZN2jsL8NameToIdEPNS_12PropertyNameE$String.h:jsid js::NameToId(js::PropertyName*)
#13302 = _ZL24NON_INTEGER_ATOM_TO_JSIDP6JSAtom$jsfriendapi.h:jsid NON_INTEGER_ATOM_TO_JSID(JSAtom*)
#13304 = _ZN2js6detail13IdMatchesAtomE4jsidP6JSAtom$uint8 js::detail::IdMatchesAtom(jsid, JSAtom*)
#15290 = _Z23INTERNED_STRING_TO_JSIDP9JSContextP8JSString$jsid INTERNED_STRING_TO_JSID(JSContext*, JSString*)
#902801 = _Z22JS_StringHasBeenPinnedP9JSContextP8JSString$uint8 JS_StringHasBeenPinned(JSContext*, JSString*)
#1178664 = _ZN2js16AssertHeapIsIdleEP9JSContext$void js::AssertHeapIsIdle(JSContext*)
#1184273 = _ZN2js16AssertHeapIsIdleEP9JSRuntime$void js::AssertHeapIsIdle(JSRuntime*)
#3050 = _ZNK2JS6shadow7Runtime10isHeapBusyEv$uint8 JS::shadow::Runtime::isHeapBusy() const
There are a few more before it fails to find any more. But in short: yes, you can get to isHeapBusy from defaultNewGroup.
Flags: needinfo?(sphink)
Assignee | ||
Comment 10•8 years ago
|
||
The reason that I allowed readBarrier from OMT originally was that I could not figure out how to make rekey use the move constructor instead of the copy constructor. The trick is that perfect forwarding requires a new template name for each and every level forwarded. By deleting the copy constructor in InitialShapeEntry, I was able to track down and fix all of the remaining places in HashTable that did not perfectly forward. With that, rekey can use a mozilla::Move'd value and we can successfully avoid triggering read barriers from OMT.
The next patch adds the actual OMT assertion to readBarrier and fixes up all of the places that we are currently triggering it from the parse thread. (At least all that I was able to turn up with the shell test suites.)
Attachment #8764437 -
Flags: review?(jcoppeard)
Assignee | ||
Comment 11•8 years ago
|
||
Assignee | ||
Comment 12•8 years ago
|
||
Attachment #8764439 -
Flags: review?(jcoppeard)
Comment 13•8 years ago
|
||
Comment on attachment 8764439 [details] [diff] [review]
do_not_fire_read_barriers_omt-v0.diff
Review of attachment 8764439 [details] [diff] [review]:
-----------------------------------------------------------------
Passing the context to get() is a bit of a pain, but it's great that we can now assert that we don't trigger barriers off main thread.
::: js/src/gc/Barrier.h
@@ +609,1 @@
>
I don't see how the copy assignment operator (just before here in the file) triggers the read barrier:
ReadBarriered& operator=(const ReadBarriered& v) {
T prior = this->value;
this->value = v.value;
this->post(prior, v.value);
return *this;
}
Doesn't this need to call get() or read() somewhere?
::: js/src/jsatominlines.h
@@ +29,5 @@
>
> inline JSAtom*
> js::AtomStateEntry::asPtrUnbarriered() const
> {
> MOZ_ASSERT(bits != 0);
This seems to only be used during GC. Can we add an assertion to that effect?
::: js/src/vm/Shape.h
@@ +1107,5 @@
> + }
> +
> + InitialShapeEntry& operator=(InitialShapeEntry&& other) {
> + shape = other.shape.unbarrieredGet();
> + proto = other.proto.unbarrieredGet();
I'm a bit confused by this, because I can't see how assigning the raw T to a ReadBarriered<T> will work. Whatever though, I think we want to give ReadBarriered a move assignment operator taking a T&& and do |x = mozilla::Move(other.x)| here for these.
Attachment #8764439 -
Flags: review?(jcoppeard) → review+
Comment 14•8 years ago
|
||
Comment on attachment 8764437 [details] [diff] [review]
use_perfect_forwarding_for_rekey-v0.diff
Review of attachment 8764437 [details] [diff] [review]:
-----------------------------------------------------------------
I don't understand this (see below).
::: js/public/HashTable.h
@@ +1800,5 @@
> MOZ_ASSERT(table);
> mozilla::ReentrancyGuard g(*this);
> MOZ_ASSERT(p.found());
> + using Entry = typename HashTableEntry<T>::NonConstT;
> + Entry copy(mozilla::Move(const_cast<Entry&>(*p)));
Why does this require a const_cast when it didn't before?
@@ +1801,5 @@
> mozilla::ReentrancyGuard g(*this);
> MOZ_ASSERT(p.found());
> + using Entry = typename HashTableEntry<T>::NonConstT;
> + Entry copy(mozilla::Move(const_cast<Entry&>(*p)));
> + HashPolicy::setKey(copy, const_cast<Key&>(k));
I don't understand what's going on here. We've forwarded the key all the way down to this method and then we just cast it to a reference. What am I missing?
Assignee | ||
Comment 15•8 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #13)
> Comment on attachment 8764439 [details] [diff] [review]
> do_not_fire_read_barriers_omt-v0.diff
>
> Review of attachment 8764439 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Passing the context to get() is a bit of a pain, but it's great that we can
> now assert that we don't trigger barriers off main thread.
>
> ::: js/src/gc/Barrier.h
> @@ +609,1 @@
> >
>
> I don't see how the copy assignment operator (just before here in the file)
> triggers the read barrier:
>
> ReadBarriered& operator=(const ReadBarriered& v) {
> T prior = this->value;
> this->value = v.value;
> this->post(prior, v.value);
> return *this;
> }
>
> Doesn't this need to call get() or read() somewhere?
Yeah, this looks like a bug. I didn't see it because InitialShapeEntry does not have copy-assign, only copy-construct. The copy constructor that it uses on it's children does have the right barrier logic. I'll make this use v.get() and fixup anything else that breaks.
> ::: js/src/jsatominlines.h
> @@ +29,5 @@
> >
> > inline JSAtom*
> > js::AtomStateEntry::asPtrUnbarriered() const
> > {
> > MOZ_ASSERT(bits != 0);
>
> This seems to only be used during GC. Can we add an assertion to that
> effect?
That's not true anymore. I changed AtomHasher::match to use the unbarriered variant on the entries we're checking as we don't want incidental usage in the hashtable to expand the live set. I think this is a pre-existing bug.
> ::: js/src/vm/Shape.h
> @@ +1107,5 @@
> > + }
> > +
> > + InitialShapeEntry& operator=(InitialShapeEntry&& other) {
> > + shape = other.shape.unbarrieredGet();
> > + proto = other.proto.unbarrieredGet();
>
> I'm a bit confused by this, because I can't see how assigning the raw T to a
> ReadBarriered<T> will work. Whatever though, I think we want to give
> ReadBarriered a move assignment operator taking a T&& and do |x =
> mozilla::Move(other.x)| here for these.
It avoids the barrier by using |MOZ_IMPLICIT ReadBarriered(const T& v)|, which appears to be a tighter match than |ReadBarriered& operator=(const ReadBarriered& v)|. I cribbed this from rekey where we don't have the option of Move, but you're quite right: I've updated these to use the move constructor instead, which appears to work just as well.
Assignee | ||
Comment 16•8 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #14)
> Comment on attachment 8764437 [details] [diff] [review]
> use_perfect_forwarding_for_rekey-v0.diff
>
> Review of attachment 8764437 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I don't understand this (see below).
>
> ::: js/public/HashTable.h
> @@ +1800,5 @@
> > MOZ_ASSERT(table);
> > mozilla::ReentrancyGuard g(*this);
> > MOZ_ASSERT(p.found());
> > + using Entry = typename HashTableEntry<T>::NonConstT;
> > + Entry copy(mozilla::Move(const_cast<Entry&>(*p)));
>
> Why does this require a const_cast when it didn't before?
Because we need to use the move constructor to avoid the barriers. The copy takes a const&, so can source from the const Entry here by taking a ref. The move constructor cannot take a const, however, so we have to cast it out here.
This actually turns up an issue that I didn't catch before though: later in this function we remove(*p.entry_) after having moved it. From what I understand about rvalue references, this is actually allowed, if not particularly clear. Move isn't really "move", it's more like a new kind of reference/tag that every one agrees should move, but it doesn't have actual poisoning semantics itself. And indeed, it appears to work fine when used like this, as long as none of the rekeyable keys do poisoning on "move".
> @@ +1801,5 @@
> > mozilla::ReentrancyGuard g(*this);
> > MOZ_ASSERT(p.found());
> > + using Entry = typename HashTableEntry<T>::NonConstT;
> > + Entry copy(mozilla::Move(const_cast<Entry&>(*p)));
> > + HashPolicy::setKey(copy, const_cast<Key&>(k));
>
> I don't understand what's going on here. We've forwarded the key all the
> way down to this method and then we just cast it to a reference. What am I
> missing?
The key was only part of the issue. The majority of the new perfect forwarding is actually so that we can create |copy| using the move constructor instead of degrading to the copy constructor. The setKey here actually just dispatches to HashPolicy::rekey, so we lose the at some point anyway. I've just hacked InitialShapeSet's rekey to avoid the barriers in this particular case.
Actually(!), thinking about it again, we don't have to degrade here if we continue perfectly forwarding all the way into rekey. I was thinking we'd have to change a ton of rekey implementations, but it should just degrade if they take const&, which I think most do. At least it seems to compile alright, I guess I'll send it to try after I iron out the setLive issue.
This stuff is stupid subtle. At least we've got an assertion backstopping it.
Assignee | ||
Comment 17•8 years ago
|
||
Comment 18•8 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #16)
> > > + using Entry = typename HashTableEntry<T>::NonConstT;
> > > + Entry copy(mozilla::Move(const_cast<Entry&>(*p)));
> >
> > Why does this require a const_cast when it didn't before?
>
> Because we need to use the move constructor to avoid the barriers. The copy
> takes a const&, so can source from the const Entry here by taking a ref. The
> move constructor cannot take a const, however, so we have to cast it out
> here.
Oh OK, because T is |const KeyType| for HashSet. So I guess that Move didn't actually do anything before.
> This actually turns up an issue that I didn't catch before though: later in
> this function we remove(*p.entry_) after having moved it. From what I
> understand about rvalue references, this is actually allowed, if not
> particularly clear. Move isn't really "move", it's more like a new kind of
> reference/tag that every one agrees should move, but it doesn't have actual
> poisoning semantics itself.
Well it's down to the type what happens when you move it. IIUC post-move the object must be destructible but not release any originally owned resources.
> And indeed, it appears to work fine when used
> like this, as long as none of the rekeyable keys do poisoning on "move".
So this looks bad since we are accessing something after we move it, and even if that works it relies on the implementation of the move constructors not to poison/null the moved object's contents out after the move.
However I then realised that this is not the case since we move the T stored in the HashTableEntry<T>, but we call remove() with a reference to the outer HashTableEntry<T>.
I think this is fine regardless of how move is implemented, but we could change the code to extract a pointer to this first so it is more clear what's happening.
Assignee | ||
Comment 19•8 years ago
|
||
Oh, good point! You're right: it doesn't even matter as long as we're not violating move semantics.
Assignee | ||
Comment 20•8 years ago
|
||
Assignee | ||
Comment 21•8 years ago
|
||
Comment 22•8 years ago
|
||
Comment on attachment 8764437 [details] [diff] [review]
use_perfect_forwarding_for_rekey-v0.diff
Review of attachment 8764437 [details] [diff] [review]:
-----------------------------------------------------------------
Cancelling review while waiting for new patch.
Attachment #8764437 -
Flags: review?(jcoppeard) → feedback+
Comment 23•8 years ago
|
||
Ping .. is there any progess here?
Assignee | ||
Comment 24•8 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #23)
> Ping .. is there any progess here?
Sadly, no. The TreeHerder failures on jstestbrowser were not trivial to reproduce locally and I've been distracted tracking down similar failures in bug 1237058. I'm almost done there though, at which point I can get back to this. Sorry for the delay.
Assignee | ||
Comment 25•8 years ago
|
||
Assignee | ||
Comment 26•8 years ago
|
||
Assignee | ||
Comment 27•8 years ago
|
||
This is suffering from exactly the same symptoms as those in bug 1237058. Which is *very* odd as the mechanisms being poked are extremely different.
The attachments I've just attached are are from before and after the patches here, running on top of the patch from bug 1285605, piped through grep to collect only the "Compartments Collected" lines, then run through grep -v to strip out the DOM_WORKER "2 of 2" collections.
What's interesting is that even in the before run, our compartments pile up rapidly up to 6000+ strong. There are many, many GC's that are unable to collect these compartments until we happen to trigger one that does collect basically all of them in one swoop. We then build back up to 6000+ again before exiting.
The after run is the same (down to having approximately the same number of GC's), except that there are no collections that free substantial numbers of compartments and we end up collecting over 11000+ compartments before the terminating GC, where we're killed because we take too long.
Oh, the other difference is that the before run takes 4 minutes and the after run takes 11 minutes. Carrying those extra compartments *really* adds up. Exponentially fast even. The following is the GC that clears up all the compartments from the before run:
GC(T+204.028s) =================================================================
Invocation Kind: Normal
Reason: COMPONENT_UTILS
Incremental: no - requested
Zones Collected: 5 of 5 (-1)
Compartments Collected: 5249 of 5249 (-5170)
MinorGCs since last GC: 1
Store Buffer Overflows: 0
MMU 20ms:0.0%; 50ms:0.0%
SCC Sweep Total (MaxPause): 114.209ms (114.209ms)
HeapSize: 993.664 MiB
Chunk Delta (magnitude): -731 (731)
Arenas Relocated: 0.000 MiB
---- Totals ----
Total Time: 711.769ms
Max Pause: 711.769ms
Begin Callback: 5881.877ms
Wait Background Thread: 363.486ms
Relazify Functions: 34.669ms
Purge: 0.187ms
Mark: 33.689ms
Other: 4.273ms
Mark Roots: 3.778ms
Other: 0.443ms
Mark Cross Compartment Wrappers: 0.110ms
Mark Rooters: 0.049ms
Mark Runtime-wide Data: 1.702ms
Mark Embedding: 0.145ms
Mark Compartments: 1.329ms
Unmark: 25.638ms
Sweep: 278.274ms
Other: 9.543ms
Mark During Sweeping: 4.132ms
Mark Incoming Black Pointers: 0.055ms
Mark Weak: 0.138ms
Mark Incoming Gray Pointers: 0.037ms
Mark Gray: 3.726ms
Mark Gray and Weak: 0.176ms
Finalize Start Callbacks: 3.457ms
Per-Slice Weak Callback: 1.056ms
Per-Compartment Weak Callback: 2.399ms
Sweep Atoms: 7.897ms
Sweep Symbol Registry: 0.049ms
Sweep Compartments: 65.518ms
Sweep Discard Code: 4.265ms
Sweep Inner Views: 0.260ms
Sweep Cross Compartment Wrappers: 1.525ms
Sweep Type Objects: 1.051ms
Sweep Breakpoints: 22.585ms
Sweep Regexps: 1.337ms
Sweep Miscellaneous: 395.137ms
Sweep type information: 30.236ms
Other: 30.232ms
Sweep type tables and compilations: 0.002ms
Free type arena: 0.002ms
Sweep Object: 78.921ms
Sweep String: 0.850ms
Sweep Script: 12.353ms
Sweep Shape: 45.617ms
Sweep JIT code: 0.048ms
Finalize End Callback: 13.593ms
Deallocate: 36.296ms
End Callback: 0.101ms
Minor GCs to Evict Nursery: 1.213ms
Other: 0.882ms
Mark Roots: 0.331ms
Other: 0.318ms
Mark Rooters: 0.013ms
Barriers: 0.002ms
Unmark gray: 0.001ms
The Start Callback is 5 seconds and the total GC time is .7 seconds, so that's weird. The before case does a COMPONENT_UTILS collection 168 times and the after does 177 COMPONENT_UTILS collections, so it's not a matter of how we're triggering the collection.
I still have no idea what's going wrong, but it appears to be something subtle.
Assignee | ||
Comment 28•8 years ago
|
||
Running |Cu.forceGC(); Cu.forceCC(); Cu.forceGC(); Cu.forceCC();| after ever test does not keep the ghost windows from piling up endlessly.
Assignee | ||
Comment 29•8 years ago
|
||
I lied. I was doing the new GC's in the parent process by accident. Doing the GC's in the content process is keeping the window/compartment count low, as expected. It seems that forceCC() followed by forceGC() is sufficient to flush all the garbage.
Talking to Bill this morning, it seems like the most likely reason that these windows are kept alive is wrapper remapping. We should have a mechanism to address this: ScheduledForDestruction. I need to figure out how that's supposed to work and see if I can figure out why it's not.
Assignee | ||
Comment 30•8 years ago
|
||
COMPARTMENT_REVIVED collections are getting triggered as expected, but they are not releasing any compartments.
Assignee | ||
Comment 31•8 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #30)
> COMPARTMENT_REVIVED collections are getting triggered as expected, but they
> are not releasing any compartments.
This is because the compartments being held live are held because the [HTMLDocument] wrapper is live because the nsDocument is held in the wrapper cache. COMPARTMENT_REVIVIED collections are only effective at cleaning up compartments that were held live via barriers.
Assignee | ||
Comment 32•8 years ago
|
||
This should all be resolved by bug 1296484.
Assignee | ||
Comment 33•8 years ago
|
||
Assignee | ||
Comment 34•8 years ago
|
||
It looks like bug 1292218 is disabling the collections from 1296484 by disrupting the COMPARTMENT_REVIVED heuristics. I think if we back that out with this landing we should be good. I'll do another try push. Otherwise the actual changes look good now.
Sorry to ask you to review Steve, but Jon is on PTO until next week and I'd like to get this in ASAP.
Attachment #8764437 -
Attachment is obsolete: true
Attachment #8786065 -
Flags: review?(sphink)
Assignee | ||
Comment 35•8 years ago
|
||
Comment 36•8 years ago
|
||
Comment on attachment 8786065 [details] [diff] [review]
use_perfect_forwarding_for_rekey-v1.diff
Review of attachment 8786065 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/public/HashTable.h
@@ +1867,5 @@
> MOZ_ASSERT(table);
> mozilla::ReentrancyGuard g(*this);
> MOZ_ASSERT(p.found());
> + using Entry = typename HashTableEntry<T>::NonConstT;
> + Entry copy(mozilla::Move(const_cast<Entry&>(*p)));
I do not like calling this 'copy', given that it's trying hard to *not* copy. 'entry' would work fine for me.
@@ +1873,2 @@
> remove(*p.entry_);
> + putNewInfallibleInternal(l, mozilla::Move(copy));
Ouch. Ok, this doesn't look right. l and k may be the same thing. k gets moved out via setKey, and then l is used. These are all references, so if eg Lookup and Key are Vector<U>, then the vector storage may have been stolen away and nulled out.
Attachment #8786065 -
Flags: review?(sphink)
Comment 38•8 years ago
|
||
Updated rekeying patch with Steve's review comments.
Attachment #8786065 -
Attachment is obsolete: true
Comment 39•8 years ago
|
||
Here's an alternative approach to removing the race. This just uses the incremental barrier flag which should always be off while we are collecting. I think this should work (but please check my logic), and it doesn't seem to provoke the reftest failures on try.
Attachment #8789853 -
Flags: review?(terrence)
Assignee | ||
Comment 40•8 years ago
|
||
Comment on attachment 8789853 [details] [diff] [review]
bug844769-alternate
Review of attachment 8789853 [details] [diff] [review]:
-----------------------------------------------------------------
Ah, good observation! Let's try it.
Attachment #8789853 -
Flags: review?(terrence) → review+
Comment 41•8 years ago
|
||
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0fda02f7c6a0
Don't access heap state from off main thread r=terrence
Comment 42•8 years ago
|
||
Comment on attachment 8789775 [details] [diff] [review]
bug844769-forward-for-rekey v2
Review of attachment 8789775 [details] [diff] [review]:
-----------------------------------------------------------------
I know you didn't ask for review yet, but this beautifully resolves the only issue I had with the patch. Whether you want to land this or not is up to you, but r=me.
Attachment #8789775 -
Flags: review+
Comment 43•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment 44•8 years ago
|
||
Since this is marked RESOLVED now and can easily get lost -- what's the state of the other patch? It would still be good to fix the rekeying barriers, if I'm understanding the state of things correctly. Is the patch landable, or should it be forked off to another bug?
needinfo both since Jon is on PTO.
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
Updated•8 years ago
|
Flags: needinfo?(jcoppeard)
You need to log in
before you can comment on or make changes to this bug.
Description
•