Closed Bug 1342181 Opened 3 years ago Closed 3 years ago

FinalizeDeferredThings can change gray marking state between FixGrayBits GC and start of CC

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- wontfix
firefox53 --- wontfix
firefox54 --- fixed

People

(Reporter: jonco, Assigned: jonco)

References

Details

(Keywords: sec-moderate, Whiteboard: [post-critsmash-triage][adv-main54+])

Attachments

(1 file, 1 obsolete file)

Looking into why the gray marking state assertions in bug 1335751 sometimes fail I found that the CC's callback that runs at the end of the GC calls FinalizeDeferredThings which can result in gray unmarking happening.  (The example I saw was triggering notifications because of nsAccessibilityService shutting down).  In general running destructors seems like it can result in pretty much any code in the system being executed.

I'm not sure what the best way forward is here.  I'd really like it if we never did anything like this from destructors and just used them to free resources, but the cost of fixing and working round this is going to be large.

Do you think it's possible we could defer this finalization somehow?

The easiest approach might just be to run FixWeakMappingGrayBits() even after the GC though.

This all raises another question though.  Since the CC is incremental, is it safe for gray unmarking to happen between slices if can break the no black->gray invariant?
Flags: needinfo?(continuation)
(In reply to Jon Coppeard (:jonco) from comment #0)
> Do you think it's possible we could defer this finalization somehow?

It probably wouldn't be too hard to detect that we're in the middle of a CC, and spin off a runnable to do the finalization there, so then it wouldn't run until we return from the CC. There already is some kind of deferred finalization mechanism here.

> The easiest approach might just be to run FixWeakMappingGrayBits() even
> after the GC though.

That would be reasonable, though in general you might have to run the GC again!

> Since the CC is incremental, is it
> safe for gray unmarking to happen between slices if can break the no
> black->gray invariant?

Hmm, yeah I guess I forgot about how UnmarkGray needs to be fixed up. The way that I deal with JS stuff getting UMG called on it in between ICC slices is that we check if any objects that we saw as gray during the CC are now black here at the end. (The basic philosophy of ICC is that if an object was touched during ICC, we treat it as live, because we're not sure if it is really alive or not.) It does seem like we should call FixWeakMappingGrayBits() before we do that. I'm not sure how to handle the case where we know our gray bits are bad. I guess you could treat all JS objects as live, though that could end badly.
Flags: needinfo?(continuation)
My first attempt was to skip calling these finalizers in the OnGC hook if a CC is in progress.  I also had to make sure that and ongoing incremental finalization was finished before we start the CC otherwise it will end up happening when we try to create the new runnable.  The finalization happens in the cleanup phase of the CC.

This causes all mochitests to leak, and I haven't worked out why.
My second attempt was to re-run the weak mapping fixup after the pre-CC GC.  This works in most cases but for a couple of devtools mochitests we end up looping for > 10 times.  So that's pretty bad.  I guess that since unmark gray is recursive (doesn't use a mark stack) for certain object graphs we can hit the stack limit.  There's a bug open for that already somewhere.
(In reply to Jon Coppeard (:jonco) from comment #3)
> I guess that since unmark
> gray is recursive (doesn't use a mark stack) for certain object graphs we
> can hit the stack limit.  

It turns out that the problem is that even if you request a full GC you may not get one if there is off-thread parsing happening, if this happens the 'gray bits are valid' flag is not set.
Duplicate of this bug: 1341065
Comment on attachment 8840986 [details] [diff] [review]
bug1342181-fixup-after-GC

With the other fixes that have gone in recently (in particular bug 1343986) this patch now works and we can assert the gray marking state is correct before CC.
Attachment #8840986 - Flags: review?(continuation)
Attachment #8840984 - Attachment is obsolete: true
Comment on attachment 8840986 [details] [diff] [review]
bug1342181-fixup-after-GC

Review of attachment 8840986 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsCycleCollector.cpp
@@ +3522,5 @@
> +  uint32_t count = 0;
> +  do {
> +    mJSContext->GarbageCollect(aForceGC ? JS::gcreason::SHUTDOWN_CC :
> +                                          JS::gcreason::CC_FORCED);
> +    aTimeLog.Checkpoint("FixGrayBits GC");

I'd just have a single Checkpoint after the entire loop.
Attachment #8840986 - Flags: review?(continuation) → review+
(In reply to Andrew McCreight [:mccr8] from comment #7)
> I'd just have a single Checkpoint after the entire loop.

Oh, sure.

Thanks for the quick review!
https://hg.mozilla.org/mozilla-central/rev/e9068fac3968
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Seems like backporting this to Beta53/ESR52 would be a sane idea?
Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)
Oh, yes that might be a good idea.  It's a bit of a sensitive area thought, this is the kind of thing I'd like to leave to settle for a while in nightly before backporting it everywhere.

This would also require the fix for bug 1343986 (which I just commented wasn't worth backporting).
Flags: needinfo?(jcoppeard)
Happy to let this bake awhile, it's easy enough to leave on the radar in the mean time.
Group: javascript-core-security → core-security-release
Hey Jon, just wanted to check in with you about the possibility of backporting this to Beta53/ESR52 along with bug 1343986 now?
Flags: needinfo?(jcoppeard)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #14)
> Hey Jon, just wanted to check in with you about the possibility of
> backporting this to Beta53/ESR52 along with bug 1343986 now?

It turns out that this will also require the fix for bug 1340537 which is in an unrelated area.

I'm a bit leery of backporting all this unless we have evidence of significant impact.  Are you OK with just letting this ride the trains?
Flags: needinfo?(jcoppeard)
If you think the risks outweigh the benefits, then sure.
My only question/concern is that there was a reduction in oranges that corresponds to the time around March 5th and I *would* be interested in taking this for, e.g. esr52, if this was one of the contributing factors to that.
Flags: needinfo?(jcoppeard)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #17)
This is most likely to be due to the final patch in bug 1338623 (a change which really should have been made as part of bug 1335117).  How about we just backport that patch?  These other bugs all have a bunch of dependencies and I think it's risky to start backporting everything.  I'll file a separate bug for this so it's clear(er) what's happening.
Flags: needinfo?(jcoppeard)
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main54+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.