Closed Bug 1693542 Opened 5 years ago Closed 4 months ago

Prototype concurrent marking implementation

Categories

(Core :: JavaScript: GC, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
149 Branch
Tracking Status
firefox149 --- fixed

People

(Reporter: jonco, Assigned: jonco)

References

(Blocks 1 open bug)

Details

Attachments

(15 files, 1 obsolete file)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

This bug aims to deliver a prototype implementation of concurrent GC marking on Intel 64 bit platforms. This will be used as a base for further testing and performance evaluation.

High level design:

  • For the sake of simplicity only one thread will mark at a time (either the main thread or a helper thread)
  • Write barriers on the main thread will add unmarked cells to a buffer to mark later in case concurrent marking is happening
  • Not everything can be traced concurrently; when such things are encountered off the main thread they added to a buffer to mark later on the main thread
  • There is occasional synchronisation between the main thread and the marking thread to exchange these buffers
  • Concurrent marking may observe stale data but the snapshot-at-the-beginning invariant saves us by ensuring that any missed marking happens eventually
  • Every pointer the marker follows must be atomic to avoid undefined behaviour and prevent problematic compiler optimizations
  • Ditto for all metadata about cell layout necessary for tracing

Limitations:

  • Some things will still get marked on main thread
  • We need occasional ‘micro-slices’ to synchronize between main/marker threads
  • The array shift optimization will be disabled during concurrent marking
  • We must pre-initialize unused slot/element storage, which causes a small perf regression
  • This requires 64 bit platforms because we need atomic JS::Value access
Depends on: 1693590
Depends on: 1694209
Depends on: 1749329
Depends on: 1750792
Depends on: 1750794

The shell allocation metadata builder creates a bunch of objects for every
allocation, including a dummy representation of the stack at which the allocation
happened. This puts the stack frames into a JS array object. This would usually
use dense elements to store the indexed properties but the code uses property
attribute flags of zero which creates non-enumerable properties. This leads to
the object going in to dictionary mode instead which is much slower.

This causes a problem with the test
jit-test/tests/errors/overrecursed-double-fault-1.js which enables allocation
metadabta and then enters an infinite loop. The test can recurse ~40000 times
before throwing an overrecursion error. The error also contains a
representation of the stack, including a SavedFrame JS object for every frame.
The allocation metadata builder then tries to create metadata for each one of
these objects, and so this is quadratic in the number of frames. All of this
means that this test can easily time out.

The simplest solution for all this is to use the default attributes when
creating these properties which results in the faster dense element storage
being used.

Comment on attachment 9259553 [details]
Bug 1693542 - Part 1: Make the shell allocation metadata hook create properties with default attributes r?sfink

Revision D136248 was moved to bug 1750794. Setting attachment 9259553 [details] to obsolete.

Attachment #9259553 - Attachment is obsolete: true
Depends on: 1750964

This turned out to be difficult for two main reasons:

  1. It requires making everything traced by GC atomic which is a large and invasive change. This is much more than wrapped pointers like GCPtr - for example the JIT has a ton of malloc-allocated data structures that are traversed on GC. Also this results in a lot of API churn when we currently return a reference to something that can be traced by GC.

  2. It interferes with several optimisations which must now be made synchronised. For example rope flattening would potentially require synchronising on the main thread and every time we mark a string off thread.

I didn't get to the point of stabilising this in the browser and there are sure to be other issues beyond these.

Problem 1 could be addressed by making more things into GC things and this is desirable in the long term, but is a large change.

Problem 2 is solvable but would potentially affect performance and would have to be evaluated carefully.

I think that this viable eventually but that in the short to medium term we can get more return on our efforts by working on parallel marking instead.

Assignee: jcoppeard → nobody
Depends on: 2009450
Assignee: nobody → jcoppeard
Status: NEW → ASSIGNED

This stops the buffer allocator freeing buffers that are explictly freed while
concurrent marking is happening, as the marking thread may still be accessing
them. The buffers will be freed during sweeping instead.

Initialize dynamic slot allocations up to their maximum capacity and clear
freed slots.

Currently this is necessary because it's hard to correctly calculate the number
of used slots during concurrent marking as this may be changed by the mutator.

This will be used for concurrent marking on a backgrond thread, allowing the
mutator to interrupt it.

This extends the previous needsIncrementalBarrier_ field to be a bitset that
can be tested to check whether concurrent marking is happening in the zone.

The needsIncrementalBarrer() method is renamed to needsMarkingBarrier().

Concurrent marking may have a pointer to the object elements header so it's not
safe to move this while marking concurrently.

This restriction may be revisted in a productized version.

During concurrent marking we need to avoid reading object fields more than
once, since we may observe different values for the fields on each occasion.
This means we can't call some of the existing methods on objects we use to
determine object layout since these can re-read the fields every time.

Instead, we can factor out the functions we need and make them take the object
field values direct. The marking code can then ensure these are only read once.

Add a marking option and tracer kind for concurrent marking.

Add main thread trace buffers to hold cells that can't be traced concurrently.
Use this to ensure that we only call object class hooks and trace JitScripts on
the main thread.

Change object marking to make it more robust in the face of concurrent changes.

Update some assertions that no longer hold when a concurrent thread can mark
things.

This patch adds APIs to perform release and acquire memory fences for use with
relaxed atomics.

Note that heap memory does not use relaxed atomics yet with this set of patches
so prototype concurrent marking should not be considered fully thread safe
according to the C++ specification.

The approach is to use fence/fence synchronization to ensure that concurrent
marking does not observe uninitialized memory and instead observes some valid
state (it doesn't necessarily matter if it observes stale state as barriers
ensure everything gets marked).

(See https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence.html for
details about fence/fence synchronization)

To help acheive that this patch adds 1) a release fence between initializing
new GC memory and adding an edge to it from the GC heap and 2) an acquire fence
before tranversing a GC edge during concurrent marking.

Under the covers, acquire and release fences restrict reordering of loads or
stores respectively. They do this by restricting code motion in the compiler
and may generate CPU instructions depending on the hardware.

As explained in the comments this synchronizes concurrent marking with changes
to string layout.

This adds a SpiderMonkey build to test concurrent marking, which currently
requires a special build option. This is a tier-2 linux build.

This is needed to test this code before it is enabled by default.

Pushed by jcoppeard@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/96db6259ebca https://hg.mozilla.org/integration/autoland/rev/330e28a7f5ff Part 1: Add configuration option for concurrent marking build r=sfink https://github.com/mozilla-firefox/firefox/commit/be7ac4944ab7 https://hg.mozilla.org/integration/autoland/rev/9d7b6f6a0d4e Part 2: Add GC parameter and shell options to control concurrent marking r=sfink,dom-worker-reviewers,edenchuang https://github.com/mozilla-firefox/firefox/commit/c70e67589669 https://hg.mozilla.org/integration/autoland/rev/3840e49f9b62 Part 3: Delay freeing buffers that could be being traced by concurrent marking r=sfink https://github.com/mozilla-firefox/firefox/commit/108ce5c3bbe7 https://hg.mozilla.org/integration/autoland/rev/246360f747d5 Part 4: Preinitialize dynamic slots in concurrent marking builds r=jandem https://github.com/mozilla-firefox/firefox/commit/a5153a772e91 https://hg.mozilla.org/integration/autoland/rev/998067fc96e2 Part 5: Allow an interruptible unlimited slice budget r=sfink https://github.com/mozilla-firefox/firefox/commit/2e990c855a29 https://hg.mozilla.org/integration/autoland/rev/efbc350ed5dc Part 6: Set a flag on the zone when concurrent marking is in progress r=sfink https://github.com/mozilla-firefox/firefox/commit/c8c130673b59 https://hg.mozilla.org/integration/autoland/rev/0a9860113a15 Part 7: Don't move object elements header while marking concurrently r=jandem https://github.com/mozilla-firefox/firefox/commit/00468fbbd185 https://hg.mozilla.org/integration/autoland/rev/5c83d0a2290f Part 8: Add APIs to move all the work from one mark stack to another r=sfink https://github.com/mozilla-firefox/firefox/commit/4585855b8c95 https://hg.mozilla.org/integration/autoland/rev/dad07d189c49 Part 9: Factor out functions to get object layout r=jandem https://github.com/mozilla-firefox/firefox/commit/6b7bef8af450 https://hg.mozilla.org/integration/autoland/rev/2f928ce4ac35 Part 10: Marking changes r=sfink https://github.com/mozilla-firefox/firefox/commit/1b46536f11c3 https://hg.mozilla.org/integration/autoland/rev/553790c4b84a Part 11: Use a background task to mark concurrently with the mutator r=sfink https://github.com/mozilla-firefox/firefox/commit/4df2d905647b https://hg.mozilla.org/integration/autoland/rev/d373c2d8fb05 Part 12: Add memory fences for synchronization between mutator and marking thread r=sfink https://github.com/mozilla-firefox/firefox/commit/40a148bbb905 https://hg.mozilla.org/integration/autoland/rev/182946fde440 Part 13: Handle rope type transitions during marking r=sfink https://github.com/mozilla-firefox/firefox/commit/d54f7cb29460 https://hg.mozilla.org/integration/autoland/rev/71056befc94d Part 14: Update test expectations r=sfink https://github.com/mozilla-firefox/firefox/commit/dd4961257ed3 https://hg.mozilla.org/integration/autoland/rev/80d750be3fa6 Part 15: Add concurrent marking test build r=sfink,taskgraph-reviewers,bhearsum
Regressions: 2016094
Regressions: 2024333
No longer regressions: 2024333
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: