Closed Bug 1461619 Opened 2 years ago Closed 2 years ago

Updating atom mark bits can race with object group sweeping

Categories

(Core :: JavaScript: GC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 61+ fixed
firefox60 --- wontfix
firefox61 + fixed
firefox62 + fixed

People

(Reporter: jonco, Assigned: jonco)

References

Details

(Keywords: csectype-race, sec-moderate, Whiteboard: [adv-main61+][adv-esr60.1+][post-critsmash-triage])

Attachments

(4 files)

As reported by decoder in bug 1457703 comment 38, helgrind detects the following race:

==18222== Possible data race during write of size 8 at 0x5DFF8E0 by thread #4
==18222== Locks held: none
==18222==    at 0xDE4908: bitwiseOrRangeInto (Bitmap.h:53)
==18222==    by 0xDE4908: AddBitmapToChunkMarkBits<js::DenseBitmap> (AtomMarking.cpp:125)
==18222==    by 0xDE4908: js::gc::AtomMarkingRuntime::updateChunkMarkBits(JSRuntime*) (AtomMarking.cpp:148)
==18222==    by 0xDF3BDF: UpdateAtomsBitmap(js::GCParallelTask*) (GC.cpp:5400)
==18222==    by 0xAA8A1B: runTask (GCParallelTask.h:127)  
==18222==    by 0xAA8A1B: js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1586)
==18222==    by 0xAAEFD3: js::HelperThread::handleGCParallelWorkload(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1612)
==18222==    by 0xAA74EF: js::HelperThread::threadLoop() (HelperThreads.cpp:2385)
==18222==    by 0xAA761B: js::HelperThread::ThreadMain(void*) (HelperThreads.cpp:1869)
==18222==    by 0xABCC07: callMain<0ul> (Thread.h:242)
==18222==    by 0xABCC07: js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) (Thread.h:235)
==18222==    by 0x484BA37: mythread_wrapper (hg_intercepts.c:389)
==18222==    by 0x486AFC3: start_thread (pthread_create.c:335)
==18222==    by 0x4BDA2DF: thread_start (clone.S:89)
==18222==
==18222== This conflicts with a previous read of size 8 by thread #3
==18222== Locks held: none
==18222==    at 0xBC7B04: markBit (Heap.h:611)
==18222==    by 0xBC7B04: isMarkedAny (Heap.h:615)
==18222==    by 0xBC7B04: js::gc::TenuredCell::isMarkedAny() const (Cell.h:286)
==18222==    by 0xE3D96B: js::gc::IsAboutToBeFinalizedDuringSweep(js::gc::TenuredCell&) (Marking.cpp:3389)
==18222==    by 0xE5F5AF: operator()<JSString> (Marking.cpp:3423)
==18222==    by 0xE5F5AF: decltype ({parm#1}(static_cast<JSString*>((decltype(nullptr))0), (Forward<bool*>)({parm#3}))) js::DispatchTyped<IsAboutToBeFinalizedInternalFunctor<jsid>, bool*>(IsAboutToBeFinalizedInternalFunctor<jsid>, jsid const&, bool*&&) (Id.h:228)
==18222==    by 0xE5F607: IsAboutToBeFinalizedInternal<jsid> (Marking.cpp:3433)
==18222==    by 0xE5F607: bool js::gc::IsAboutToBeFinalizedUnbarriered<jsid>(jsid*) (Marking.cpp:3458)
==18222==    by 0xB64D0B: needsSweep (ObjectGroup.cpp:1083)
==18222==    by 0xB64D0B: needsSweep (GCPolicyAPI.h:87)   
==18222==    by 0xB64D0B: needsSweep (ObjectGroup.cpp:1772)
==18222==    by 0xB64D0B: sweep (GCHashTable.h:83)
==18222==    by 0xB64D0B: js::ObjectGroupCompartment::sweep() (ObjectGroup.cpp:1790)
==18222==    by 0xDF5D17: SweepObjectGroups(js::GCParallelTask*) (GC.cpp:5422)
==18222==    by 0xAA8A1B: runTask (GCParallelTask.h:127)  
==18222==    by 0xAA8A1B: js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1586)
==18222==    by 0xAAEFD3: js::HelperThread::handleGCParallelWorkload(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1612)
==18222==  Address 0x5dff8e0 is in a rw- anonymous segment
This concerns the per-zone atoms bitmaps, which are overapproximations of the set of atoms referenced by the zone.  When we start sweeping the atoms zone we OR the bitmaps for uncollected zones into the atoms zone's mark bits so that we don't throw away atoms referenced by these zones.  This allows us to collect the atoms zone without marking through all zones.

This happens in a parallel task (UpdateAtomsBitmap) which is run at the same time as several other tasks including sweeping of object groups (SweepObjectGroups).  This can query the mark bits for atoms, which races with the previous update.

What would happen is that some atoms could look like there going to be swept in a zone where they is no longer referenced, resulting in some other data being swept (e.g. entries in the plain objets table).

This sounds bad, but if the atoms were no longer referenced in that zone I don't think this would result in any observable difference in behaviour.  (It might actually be considered prefereable not to have stuff held alive in one zone because an atom is still in use by a different zone).

That said, we should still fix this.

It would be nice to update the atom zone mark bits at the start of marking like we do for incoming cross compartment references.  That's problematic at the moment because we use the final mark bits to refine the per-zone bitmaps for collected zones at the end of marking (before we do this update).  Perhaps we could clear the per-zone bitmap at the start of collection and mark atoms as referenced by that zone when they are traced, making the refinement step unnecessary.
Just to note, I can reproduce this on an x86_64-linux build, configured as
below, using the test case for bug 1457703 (also a sec bug).  So it's not
ARM64 specific.

  (cd ../src && autoconf-2.13) && \
  ../src/configure \
    --enable-debug --enable-optimize="-g -Og" \
    --enable-valgrind --disable-jemalloc
(In reply to Jon Coppeard (:jonco) from comment #1)
> Perhaps we could clear the per-zone bitmap at the start of collection and
> mark atoms as referenced by that zone when they are traced

That won't work because we don't know the source zone for the edge when we trace it.  But we can move the refinement step to when we start sweeping a zone and do it in parallel with all the other things we do there.
First of all, rename some methods to hopefully make it easier to understand what they do.
Assignee: nobody → jcoppeard
Attachment #8982464 - Flags: review?(sphink)
Patch to not update the atoms marking bitmaps in parallel.

I tried making the refinement part happen earlier as it should be possible to do this for each zone as it finishes sweeping rather than waiting until we sweep the atoms zone, but that caused crashes on try.  I don't know what that should be.  It's possible there are still some cases where we're not marking all the atoms we need to.
Attachment #8982502 - Flags: review?(sphink)
Attachment #8982464 - Flags: review?(sphink) → review+
Comment on attachment 8982502 [details] [diff] [review]
bug1461619-no-parallel-update

Review of attachment 8982502 [details] [diff] [review]:
-----------------------------------------------------------------

This seems safe, at least.
Attachment #8982502 - Flags: review?(sphink) → review+
https://hg.mozilla.org/mozilla-central/rev/aefbe3ce9476
https://hg.mozilla.org/mozilla-central/rev/c71b1bbac905

Is this something we should consider backporting or can it ride the trains? It grafts cleanly to Beta/ESR60.
Blocks: 1457703
Group: javascript-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(jcoppeard)
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
(In reply to Ryan VanderMeulen [:RyanVM] from comment #8)
> https://hg.mozilla.org/mozilla-central/rev/aefbe3ce9476
> https://hg.mozilla.org/mozilla-central/rev/c71b1bbac905
> 
> Is this something we should consider backporting or can it ride the trains?
> It grafts cleanly to Beta/ESR60.

I was in two minds about this, but it is a race and it affects mark bits so it's best to be safe and backport this I think.
Flags: needinfo?(jcoppeard)
Attached patch bug1461619-betaSplinter Review
Approval Request Comment
[Feature/Bug causing the regression]: Bug 1369444
[User impact if declined]: Possible crash / security vulnerability.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This change simplifies the current scheduling by no longer performing the update to the atoms mark bits in parallel with other work.
[String changes made/needed]: None.
Attachment #8984872 - Flags: approval-mozilla-beta?
Attached patch bug1461619-esr60Splinter Review
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: This a race condition with possibly bad consequences.  So it's not a definitely a problem but I think it would be prudent to fix it.
User impact if declined: Possible crash / security vulnerability.
Fix Landed on Version: 62.
Risk to taking this patch (and alternatives if risky): Low.
String or UUID changes made by this patch: None.
Attachment #8984873 - Flags: approval-mozilla-esr60?
Comment on attachment 8984872 [details] [diff] [review]
bug1461619-beta

Fixes a latent race condition in the GC which can be potentially exploitable. Approved for 61.0b14 and ESR 60.1.
Attachment #8984872 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8984873 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Whiteboard: [adv-main61+][adv-esr60.1+]
Flags: qe-verify-
Whiteboard: [adv-main61+][adv-esr60.1+] → [adv-main61+][adv-esr60.1+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.