Prototype concurrent marking implementation
Categories
(Core :: JavaScript: GC, enhancement, P3)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox149 | --- | fixed |
People
(Reporter: jonco, Assigned: jonco)
References
(Blocks 1 open bug)
Details
Attachments
(15 files, 1 obsolete file)
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
Bug 1693542 - Part 3: Delay freeing buffers that could be being traced by concurrent marking r?sfink
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review |
This bug aims to deliver a prototype implementation of concurrent GC marking on Intel 64 bit platforms. This will be used as a base for further testing and performance evaluation.
High level design:
- For the sake of simplicity only one thread will mark at a time (either the main thread or a helper thread)
- Write barriers on the main thread will add unmarked cells to a buffer to mark later in case concurrent marking is happening
- Not everything can be traced concurrently; when such things are encountered off the main thread they added to a buffer to mark later on the main thread
- There is occasional synchronisation between the main thread and the marking thread to exchange these buffers
- Concurrent marking may observe stale data but the snapshot-at-the-beginning invariant saves us by ensuring that any missed marking happens eventually
- Every pointer the marker follows must be atomic to avoid undefined behaviour and prevent problematic compiler optimizations
- Ditto for all metadata about cell layout necessary for tracing
Limitations:
- Some things will still get marked on main thread
- We need occasional ‘micro-slices’ to synchronize between main/marker threads
- The array shift optimization will be disabled during concurrent marking
- We must pre-initialize unused slot/element storage, which causes a small perf regression
- This requires 64 bit platforms because we need atomic JS::Value access
| Assignee | ||
Comment 1•4 years ago
|
||
The shell allocation metadata builder creates a bunch of objects for every
allocation, including a dummy representation of the stack at which the allocation
happened. This puts the stack frames into a JS array object. This would usually
use dense elements to store the indexed properties but the code uses property
attribute flags of zero which creates non-enumerable properties. This leads to
the object going in to dictionary mode instead which is much slower.
This causes a problem with the test
jit-test/tests/errors/overrecursed-double-fault-1.js which enables allocation
metadabta and then enters an infinite loop. The test can recurse ~40000 times
before throwing an overrecursion error. The error also contains a
representation of the stack, including a SavedFrame JS object for every frame.
The allocation metadata builder then tries to create metadata for each one of
these objects, and so this is quadratic in the number of frames. All of this
means that this test can easily time out.
The simplest solution for all this is to use the default attributes when
creating these properties which results in the faster dense element storage
being used.
Comment 2•4 years ago
|
||
Comment on attachment 9259553 [details]
Bug 1693542 - Part 1: Make the shell allocation metadata hook create properties with default attributes r?sfink
Revision D136248 was moved to bug 1750794. Setting attachment 9259553 [details] to obsolete.
| Assignee | ||
Comment 3•3 years ago
|
||
This turned out to be difficult for two main reasons:
-
It requires making everything traced by GC atomic which is a large and invasive change. This is much more than wrapped pointers like GCPtr - for example the JIT has a ton of malloc-allocated data structures that are traversed on GC. Also this results in a lot of API churn when we currently return a reference to something that can be traced by GC.
-
It interferes with several optimisations which must now be made synchronised. For example rope flattening would potentially require synchronising on the main thread and every time we mark a string off thread.
I didn't get to the point of stabilising this in the browser and there are sure to be other issues beyond these.
Problem 1 could be addressed by making more things into GC things and this is desirable in the long term, but is a large change.
Problem 2 is solvable but would potentially affect performance and would have to be evaluated carefully.
I think that this viable eventually but that in the short to medium term we can get more return on our efforts by working on parallel marking instead.
| Assignee | ||
Comment 4•5 months ago
|
||
Updated•5 months ago
|
| Assignee | ||
Comment 5•5 months ago
|
||
| Assignee | ||
Comment 6•5 months ago
|
||
This stops the buffer allocator freeing buffers that are explictly freed while
concurrent marking is happening, as the marking thread may still be accessing
them. The buffers will be freed during sweeping instead.
| Assignee | ||
Comment 7•5 months ago
|
||
Initialize dynamic slot allocations up to their maximum capacity and clear
freed slots.
Currently this is necessary because it's hard to correctly calculate the number
of used slots during concurrent marking as this may be changed by the mutator.
| Assignee | ||
Comment 8•5 months ago
|
||
This will be used for concurrent marking on a backgrond thread, allowing the
mutator to interrupt it.
| Assignee | ||
Comment 9•5 months ago
|
||
This extends the previous needsIncrementalBarrier_ field to be a bitset that
can be tested to check whether concurrent marking is happening in the zone.
The needsIncrementalBarrer() method is renamed to needsMarkingBarrier().
| Assignee | ||
Comment 10•5 months ago
|
||
Concurrent marking may have a pointer to the object elements header so it's not
safe to move this while marking concurrently.
This restriction may be revisted in a productized version.
| Assignee | ||
Comment 11•5 months ago
|
||
| Assignee | ||
Comment 12•5 months ago
|
||
During concurrent marking we need to avoid reading object fields more than
once, since we may observe different values for the fields on each occasion.
This means we can't call some of the existing methods on objects we use to
determine object layout since these can re-read the fields every time.
Instead, we can factor out the functions we need and make them take the object
field values direct. The marking code can then ensure these are only read once.
| Assignee | ||
Comment 13•5 months ago
|
||
Add a marking option and tracer kind for concurrent marking.
Add main thread trace buffers to hold cells that can't be traced concurrently.
Use this to ensure that we only call object class hooks and trace JitScripts on
the main thread.
Change object marking to make it more robust in the face of concurrent changes.
Update some assertions that no longer hold when a concurrent thread can mark
things.
| Assignee | ||
Comment 14•5 months ago
|
||
| Assignee | ||
Comment 15•5 months ago
|
||
This patch adds APIs to perform release and acquire memory fences for use with
relaxed atomics.
Note that heap memory does not use relaxed atomics yet with this set of patches
so prototype concurrent marking should not be considered fully thread safe
according to the C++ specification.
The approach is to use fence/fence synchronization to ensure that concurrent
marking does not observe uninitialized memory and instead observes some valid
state (it doesn't necessarily matter if it observes stale state as barriers
ensure everything gets marked).
(See https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence.html for
details about fence/fence synchronization)
To help acheive that this patch adds 1) a release fence between initializing
new GC memory and adding an edge to it from the GC heap and 2) an acquire fence
before tranversing a GC edge during concurrent marking.
Under the covers, acquire and release fences restrict reordering of loads or
stores respectively. They do this by restricting code motion in the compiler
and may generate CPU instructions depending on the hardware.
| Assignee | ||
Comment 16•5 months ago
|
||
As explained in the comments this synchronizes concurrent marking with changes
to string layout.
| Assignee | ||
Comment 17•5 months ago
|
||
| Assignee | ||
Comment 18•5 months ago
|
||
This adds a SpiderMonkey build to test concurrent marking, which currently
requires a special build option. This is a tier-2 linux build.
This is needed to test this code before it is enabled by default.
Comment 19•4 months ago
|
||
Comment 20•4 months ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/330e28a7f5ff
https://hg.mozilla.org/mozilla-central/rev/9d7b6f6a0d4e
https://hg.mozilla.org/mozilla-central/rev/3840e49f9b62
https://hg.mozilla.org/mozilla-central/rev/246360f747d5
https://hg.mozilla.org/mozilla-central/rev/998067fc96e2
https://hg.mozilla.org/mozilla-central/rev/efbc350ed5dc
https://hg.mozilla.org/mozilla-central/rev/0a9860113a15
https://hg.mozilla.org/mozilla-central/rev/5c83d0a2290f
https://hg.mozilla.org/mozilla-central/rev/dad07d189c49
https://hg.mozilla.org/mozilla-central/rev/2f928ce4ac35
https://hg.mozilla.org/mozilla-central/rev/553790c4b84a
https://hg.mozilla.org/mozilla-central/rev/d373c2d8fb05
https://hg.mozilla.org/mozilla-central/rev/182946fde440
https://hg.mozilla.org/mozilla-central/rev/71056befc94d
https://hg.mozilla.org/mozilla-central/rev/80d750be3fa6
Description
•