Don't collect the nursery for every sweeping GCSlice
Categories
(Core :: JavaScript: GC, enhancement, P1)
Tracking
()
People
(Reporter: pbone, Assigned: jonco)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: perf)
Attachments
(1 file, 1 obsolete file)
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 1•6 years ago
|
||
Assignee | ||
Comment 2•6 years ago
|
||
Reporter | ||
Comment 3•6 years ago
|
||
Updated•6 years ago
|
Comment 4•6 years ago
|
||
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 5•6 years ago
|
||
Reporter | ||
Comment 6•6 years ago
|
||
I seem to have the patches rebased, I mostly remember how they work. They work in the shell and a quick browser session. Now going to run them through try/talos.
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 7•6 years ago
|
||
Hi Jonco, I want to check something with you. I get the following crash (recall this bug avoids nursery collection during sweeping). So it looks like some sweeping code, when sweeping some WeakCache containing something about shapes, it updates a reference to the nursery which is barriered. I think this still needs to get barriered so that whatever nursery object is being pointed to is evicted and not collected. However this is the kind of assertion that would be very useful to keep, so I don't want to remove it. I could create an alternative code path, but I am unsure how to get that setup with the barriers, or maybe change the assertion to some other condition that somehow knows that this kind of update is okay.
I wanted to ask you because there's a chance you have an idea I can try straight away while it might take me at least a few hours to think of something (not having looked into our barrier code before).
[task 2019-01-25T06:33:10.784Z] 06:33:10 INFO - Thread 3 (crashed)
[task 2019-01-25T06:33:10.785Z] 06:33:10 INFO - 0 libxul.so!void js::gc::StoreBuffer::put<js::gc::StoreBuffer::MonoTypeBuffer<js::gc::StoreBuffer::CellPtrEdge>, js::gc::StoreBuffer::CellPtrEdge>(js::gc::StoreBuffer::MonoTypeBuffer<js::gc::StoreBuffer::CellPtrEdge>&, js::gc::StoreBuffer::CellPtrEdge const&) [StoreBuffer.h:6266f50dad3377bf57e89133c11936289077ddaa : 429 + 0x0]
[task 2019-01-25T06:33:10.787Z] 06:33:10 INFO - eip = 0xf0e92c29 esp = 0xe83fe9d0 ebp = 0xe83fe9e8 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.789Z] 06:33:10 INFO - esi = 0xe7f04e10 edi = 0xdc99e904 eax = 0xf3d9c345 ecx = 0x565bb51c
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - edx = 0x00000000 efl = 0x00010286
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - Found by: given as instruction pointer in context
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - 1 libxul.so!<name omitted> [StoreBuffer.h:6266f50dad3377bf57e89133c11936289077ddaa : 493 + 0x10]
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - eip = 0xf0ecf345 esp = 0xe83fe9f0 ebp = 0xe83fea18 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - esi = 0xe7f04e10 edi = 0xdc99e904
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - 2 libxul.so!js::InternalBarrierMethods<js::TaggedProto>::postBarrier(js::TaggedProto*, js::TaggedProto, js::TaggedProto) [Barrier.h:6266f50dad3377bf57e89133c11936289077ddaa : 265 + 0xd]
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - eip = 0xf12fb08a esp = 0xe83fea20 ebp = 0xe83fea38 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - esi = 0xe7f04e10 edi = 0xdc99e904
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - 3 libxul.so!void mozilla::detail::EntrySlot<js::InitialShapeEntry const>::setLive<js::InitialShapeEntry>(unsigned int, js::InitialShapeEntry&&) [Barrier.h:6266f50dad3377bf57e89133c11936289077ddaa : 604 + 0x5]
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - eip = 0xf1222b08 esp = 0xe83fea40 ebp = 0xe83fea68 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - esi = 0xddb44368 edi = 0xdc99e904
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - 4 libxul.so!mozilla::detail::HashTable<js::InitialShapeEntry const, mozilla::HashSet<js::InitialShapeEntry, js::InitialShapeEntry, js::SystemAllocPolicy>::SetHashPolicy, js::SystemAllocPolicy>::changeTableSize(unsigned int, mozilla::detail::HashTable<js::InitialShapeEntry const, mozilla::HashSet<js::InitialShapeEntry, js::InitialShapeEntry, js::SystemAllocPolicy>::SetHashPolicy, js::SystemAllocPolicy>::FailureBehavior) [HashTable.h:6266f50dad3377bf57e89133c11936289077ddaa : 1817 + 0x19]
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - eip = 0xf1222722 esp = 0xe83fea70 ebp = 0xe83feaa8 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - esi = 0xddb401b4 edi = 0xe413745a
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - 5 libxul.so!mozilla::detail::HashTable<js::InitialShapeEntry const, mozilla::HashSet<js::InitialShapeEntry, js::InitialShapeEntry, js::SystemAllocPolicy>::SetHashPolicy, js::SystemAllocPolicy>::compact() [HashTable.h:6266f50dad3377bf57e89133c11936289077ddaa : 1977 + 0x14]
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - eip = 0xf12221eb esp = 0xe83feab0 ebp = 0xe83feac8 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - esi = 0xe14d35d0 edi = 0x00001000
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - 6 libxul.so!JS::GCHashSet<js::InitialShapeEntry, js::InitialShapeEntry, js::SystemAllocPolicy>::sweep() [HashTable.h:6266f50dad3377bf57e89133c11936289077ddaa : 1459 + 0x8]
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - eip = 0xf1648240 esp = 0xe83fead0 ebp = 0xe83feb28 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - esi = 0x00ffffff edi = 0xe83feae0
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - 7 libxul.so!JS::WeakCache<JS::GCHashSet<js::InitialShapeEntry, js::InitialShapeEntry, js::SystemAllocPolicy> >::sweep() [GCHashTable.h:6266f50dad3377bf57e89133c11936289077ddaa : 565 + 0x8]
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - eip = 0xf1647fa5 esp = 0xe83feb30 ebp = 0xe83feb48 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.790Z] 06:33:10 INFO - esi = 0x0000066a edi = 0xe14d35c0
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - 8 libxul.so!IncrementalSweepWeakCacheTask::run() [GC.cpp:6266f50dad3377bf57e89133c11936289077ddaa : 6095 + 0x8]
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - eip = 0xf160763b esp = 0xe83feb50 ebp = 0xe83feb68 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - esi = 0xfffdfa34 edi = 0xe14d35c0
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - 9 libxul.so!js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) [GCParallelTask.h:6266f50dad3377bf57e89133c11936289077ddaa : 128 + 0x6]
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - eip = 0xf10d0765 esp = 0xe83feb70 ebp = 0xe83febc8 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - esi = 0xe83ff308 edi = 0xe83fec20
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - 10 libxul.so!js::HelperThread::handleGCParallelWorkload(js::AutoLockHelperThreadState&) [HelperThreads.cpp:6266f50dad3377bf57e89133c11936289077ddaa : 1647 + 0xc]
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - eip = 0xf10d0cf7 esp = 0xe83febd0 ebp = 0xe83fec08 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - esi = 0xfffdfa34 edi = 0xe8530040
[task 2019-01-25T06:33:10.791Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.792Z] 06:33:10 INFO - 11 libxul.so!js::HelperThread::threadLoop() [HelperThreads.cpp:6266f50dad3377bf57e89133c11936289077ddaa : 2433 + 0x18]
[task 2019-01-25T06:33:10.792Z] 06:33:10 INFO - eip = 0xf10d287a esp = 0xe83fec10 ebp = 0xe83ff328 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.792Z] 06:33:10 INFO - esi = 0xe8530040 edi = 0xe8530040
[task 2019-01-25T06:33:10.792Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.793Z] 06:33:10 INFO - 12 libxul.so!js::HelperThread::ThreadMain(void*) [HelperThreads.cpp:6266f50dad3377bf57e89133c11936289077ddaa : 1932 + 0x8]
[task 2019-01-25T06:33:10.793Z] 06:33:10 INFO - eip = 0xf10cd13d esp = 0xe83ff330 ebp = 0xe83ff348 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.793Z] 06:33:10 INFO - esi = 0xe8530040 edi = 0x003d0f00
[task 2019-01-25T06:33:10.794Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.795Z] 06:33:10 INFO - 13 libxul.so!js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) [Thread.h:6266f50dad3377bf57e89133c11936289077ddaa : 239 + 0x5]
[task 2019-01-25T06:33:10.795Z] 06:33:10 INFO - eip = 0xf110297f esp = 0xe83ff350 ebp = 0xe83ff368 ebx = 0xf53e9000
[task 2019-01-25T06:33:10.795Z] 06:33:10 INFO - esi = 0xe8531050 edi = 0x003d0f00
[task 2019-01-25T06:33:10.796Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.797Z] 06:33:10 INFO - 14 libpthread-2.23.so + 0x6295
[task 2019-01-25T06:33:10.797Z] 06:33:10 INFO - eip = 0xf76ec295 esp = 0xe83ff370 ebp = 0xe83ff428 ebx = 0x00000000
[task 2019-01-25T06:33:10.798Z] 06:33:10 INFO - esi = 0x00000000 edi = 0x003d0f00
[task 2019-01-25T06:33:10.798Z] 06:33:10 INFO - Found by: call frame info
[task 2019-01-25T06:33:10.798Z] 06:33:10 INFO - 15 libc-2.23.so + 0xe70ae
[task 2019-01-25T06:33:10.798Z] 06:33:10 INFO - eip = 0xf74170ae esp = 0xe83ff430 ebp = 0x00000000
[task 2019-01-25T06:33:10.798Z] 06:33:10 INFO - Found by: previous frame's frame pointer
Assignee | ||
Comment 8•6 years ago
|
||
(In reply to Paul Bone [:pbone] from comment #7)
It's the MOZ_ASSERT(!JS::RuntimeHeapIsBusy()) assertion right?
Previously we never created store buffer entries while collecting because we always evicted the nursery first, so this was valid. What's happening here is that we're moving hash table entries around and one of them has a pointer into the nursery, so we do a put() which I guess would have been followed by an unput() if we hadn't hit that assert. This seems legit since we now can have nursery pointers that have been created since the start of collection.
Making an alternative code path would probably be quite painful. I'd suggest updating the assertion instead. Maybe we can allow this while in a major GC, but still disallow the other heap states - see the definition of JS::RuntimeHeapIsBusy().
Reporter | ||
Comment 9•6 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #8)
(In reply to Paul Bone [:pbone] from comment #7)
It's the MOZ_ASSERT(!JS::RuntimeHeapIsBusy()) assertion right?
Previously we never created store buffer entries while collecting because we always evicted the nursery first, so this was valid. What's happening here is that we're moving hash table entries around and one of them has a pointer into the nursery, so we do a put() which I guess would have been followed by an unput() if we hadn't hit that assert. This seems legit since we now can have nursery pointers that have been created since the start of collection.
That was my diagnosis also, and yeah, I NIed you because I figured this would be painful.
Making an alternative code path would probably be quite painful. I'd suggest updating the assertion instead. Maybe we can allow this while in a major GC, but still disallow the other heap states - see the definition of JS::RuntimeHeapIsBusy().
Okay.
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Updated•6 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 10•4 years ago
|
||
The main problem here is that we sweep weak caches off-thread, and when we finish sweeping a hash table the Enum class' destructor can rehash or resize the table, causing store buffer entries to be added or removed (since the table may now contain nursery pointers).
To address this the patch adds a store buffer lock and establishes that all off-thread store buffer access from inside the GC must take place with this lock held. The changes to GCHashSet/Map are a little gross; perhaps it would be better to add an explicit API to hash tables to allow us to postpone the rehash/resize operations but I haven't done that here.
Other complications are:
The TypeSetRef generic buffer entries can contain pointers into TI data that is moved during sweeping. We therefore do need to collect the nursery if there are any of those present. This was relatively rare in testing.
Finally, swapping objects can result in pointers into dying objects being put in the whole cell store buffer (because we do tricks with skipping barriers when we remap wrappers to not keep otherwise dead wrappers alive). We need to collect the nursery if these are present to prevent them being accessed after the dying objects are finalized.
Reporter | ||
Comment 11•4 years ago
|
||
Thanks Jon.
My patch went with the API aproach, refusing to rehash tables when called from sweeping. But I was still getting crashes, i think that's because I hadn't yet found the TI and swapping issues.
Nice to see this done.
Comment 12•4 years ago
|
||
Comment 13•4 years ago
|
||
bugherder |
Updated•4 years ago
|
Updated•4 years ago
|
Updated•3 years ago
|
Description
•