Closed Bug 937818 Opened 11 years ago Closed 11 years ago

ICC: Safely handle object mutation in between ICC slices

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla29

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

Attachments

(6 files, 1 obsolete file)

part 1 - Add objects to the purple buffer on AddRef. 11 years ago Andrew McCreight [:mccr8] 12.52 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
part 2 - Add js::ZoneGlobalsAreAllGray. 11 years ago Andrew McCreight [:mccr8] 1.61 KB, patch	jonco : review+	Details \| Diff \| Splinter Review
part 3 - Add ScanIncrementalRoots(). 11 years ago Andrew McCreight [:mccr8] 7.05 KB, patch		Details \| Diff \| Splinter Review
part 4 - Exceeded refcount nodes should already be black. 11 years ago Andrew McCreight [:mccr8] 3.33 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
part 3 - Add ScanIncrementalRoots(). 11 years ago Andrew McCreight [:mccr8] 7.83 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
part 1b - AddRef should call a specialized form of Suspect that really doesn't destruct anything. 11 years ago Andrew McCreight [:mccr8] 4.32 KB, patch		Details \| Diff \| Splinter Review
part 0 - Root newly created objects in Wrap across ADDREF. r=bz 11 years ago Andrew McCreight [:mccr8] 1.38 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review

Andrew McCreight [:mccr8]

Assignee

Description

•

11 years ago

Bug 937766 deals with objects dying during an incremental cycle collection, but a trickier issue is handling live objects changing while we are running the incremental collection. If this happens, we want to not end up unlinking an object that is actually alive, and we don't want to have to rescan objects.

Here's a simple example of what can go wrong. Say we have three objects, A, B and C. B points to A. A has a refcount of 2, from B and from some other non-CCed object. ICC starts, and looks at A and B. Then, B stops pointing to A, and C starts pointing to A. Then ICC looks at C, and sees that it points at A.

At this point, the CC sees A having a refcount of 2, and thinks that B and C point to it, even though only C does. If B and C are garbage, the CC will think A is also garbage, even though it is held alive by some non-CCed object.

The basic solution (from Levanoni and Petrank) is to notice if a new reference to an object has been created during the ICC, and if so, don't collect that object during the current collection. Instead, treat it as suspicious for the next CC. If the object is truly dead, we will be able to collect it then, because nothing else will touch it (this ignores weak references, which in theory can keep AddRefing and Releasing an object).

As in bug 937766, there are three cases: snow-white-able objects, JS objects, and XPCWN/XPCWJS. For each of these, we have two stages of stuff we have to do. First, while the browser is running, we have to do something to notice an object has been AddRefed. Then, in ScanRoots, when we're deciding if an object is alive or not, we have to check the stored information to see what needs to be treated as live.

For snow-white-able objects, we add them to the purple buffer on AddRef, in addition to Release. Then, at the end of graph building, if an object is in the purple buffer, we treat it as being live during the current CC. Because it is in the purple buffer, we will treat it as suspicious during the next CC. Note that this is an approximation, because if an object is Released but not AddRefed during the entire CC, it will be treated as live when it may not be. I don't think that's a very common case, and not worth using a ref count bit to distinguish the cases.

For JS, the cycle collector is only looking at gray objects. If a gray object is touched during ICC, it will be made black by UnmarkGray. Thus, if a JS object has become black during the ICC, we treat it as live. Merged JS zones have to be handled specially: we scan all zone globals. If any are black, we treat the zone as being black.

For XPCJS/XPCWN, again they can't be put in the purple buffer. Instead, I add a dirty bit to them, which gets set when the object is AddRefed (and maybe Released for consistency?). In ScanRoots, the CC iterates over all of these objects and checks for ones with the dirty bit set.

Logging also needs to record which objects were treated as incremental roots.

Andrew McCreight [:mccr8]

Assignee

Comment 1

•

11 years ago

Rather than hacking up weird infrastructure for dealing with XPCWN and XPCWJ, as described in comment 0, I am planning on turning them into regular purple buffered classes (bug 944492 and bug 942528).

Andrew McCreight [:mccr8]

Assignee

Comment 2

•

11 years ago

Hopefully BC is okay.  L64 BC is so backed up I just cancelled it.
https://tbpl.mozilla.org/?tree=Try&rev=1b251ea3b0ae

Andrew McCreight [:mccr8]

Assignee

Comment 3

•

11 years ago

Attached patch part 1 - Add objects to the purple buffer on AddRef. — Details — Splinter Review

ICC uses this to track objects that have been AddRef'd during ICC graph building.
For those objects, we may not have the proper information for them, so treat them
as live.

This may be the most controversial patch. It adds a check on each AddRef, but I think
that almost all objects that are AddRefed are Released very soon after, so we're not
hitting the slow path any more often.

Attachment #8345999 - Flags: review?(bugs)

Andrew McCreight [:mccr8]

Assignee

Comment 4

•

11 years ago

Attached patch part 2 - Add js::ZoneGlobalsAreAllGray. — Details — Splinter Review

If all globals in a zone are gray, then all live objects in that zone
should also be gray.

Attachment #8346000 - Flags: review?(jcoppeard)

Andrew McCreight [:mccr8]

Assignee

Comment 5

•

11 years ago

Attached patch part 3 - Add ScanIncrementalRoots(). (obsolete) — Details — Splinter Review

Any object that has been stored away somewhere in the middle of incremental graph
building must be treated as live, because we can't trust that the CC graph has
accurate information about it. If such an object is truly garbage, we'll unlink it
in the next cycle collection instead.

Attachment #8346001 - Flags: review?(bugs)

Andrew McCreight [:mccr8]

Assignee

Comment 6

•

11 years ago

Attached patch part 4 - Exceeded refcount nodes should already be black. — Details — Splinter Review

Due to graph mutation during an incremental cycle collection, objects in the CC graph
may end up with more things pointing to them than they have a ref count. However, these
objects should never become garbage.

Attachment #8346002 - Flags: review?(bugs)

Jon Coppeard (:jonco)

Updated

•

11 years ago

Attachment #8346000 - Flags: review?(jcoppeard) → review+

Andrew McCreight [:mccr8]

Assignee

Comment 7

•

11 years ago

The JS stuff in ScanIncrementalRoots may look a little scary in terms of performance, as it iterates over every object in the graph, but I haven't noticed it taking much time.

This is closing two TechCrunch tabs:

cc: ScanIncrementalRoots::fix JS took 1ms
cc: ScanRoots() took 7ms
cc: CollectWhite::Root took 1ms
cc: CollectWhite::Unlink took 15ms
cc: total cycle collector time was 393ms
cc: visited 15346 ref counted and 56503 GCed objects, freed 15058 ref counted and 56368 GCed objects.

That's after spending a total of about 70ms building the graph.  I should actually change how the TimeLog is done for ScanIncrementalRoots, come to think of it...

Andrew McCreight [:mccr8]

Assignee

Comment 8

•

11 years ago

Attached patch part 3 - Add ScanIncrementalRoots(). — Details — Splinter Review

Any object that has been stored away somewhere in the middle of incremental graph
building must be treated as live, because we can't trust that the CC graph has
accurate information about it. If such an object is truly garbage, we'll unlink it
in the next cycle collection instead.

Attachment #8346001 - Attachment is obsolete: true

Attachment #8346001 - Flags: review?(bugs)

Attachment #8346977 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 9

•

11 years ago

Comment on attachment 8345999 [details] [diff] [review]
part 1 - Add objects to the purple buffer on AddRef.

>-    nsrefcnt count = mRefCnt.incr();                                          \
>+    nsrefcnt count = mRefCnt.incr(static_cast<void*>(this),                   \
>+                   _class::NS_CYCLE_COLLECTION_INNERCLASS::GetParticipant()); \
A bit odd indentation.
Perhaps 
>+    nsrefcnt count =                                                          \
>+       mRefCnt.incr(static_cast<void*>(this),                                 \
>+                   _class::NS_CYCLE_COLLECTION_INNERCLASS::GetParticipant()); \

Attachment #8345999 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

11 years ago

Attachment #8346002 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

11 years ago

Attachment #8346977 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 10

•

11 years ago

But how does this all work with forgetSkippable which may remove stuff from purple buffer?
Perhaps Bug 937960 explains.

Andrew McCreight [:mccr8]

Assignee

Comment 11

•

11 years ago

Good point.  There are two factors.  First, when we start up the slice timer, we disable the forgetSkippable timer, so they shouldn't be happening at the same time.

Secondly, and more fundamentally, forgetSkippable should only be removing things that are definitely alive, so if we did scan them, they'd just be registered as live anyways.  If something goes from definitely alive to not definitely alive, then it should get added to the purple buffer, and because it isn't definitely alive any more, it shouldn't get removed from the purple buffer.

Andrew McCreight [:mccr8]

Assignee

Comment 12

•

11 years ago

I suppose if something it not definitely alive during an ICC, then we scan it, then we addref it again and make it definitely alive, then we could end up removing it from the purple buffer, if we ran forgetSkippable during a CC, so we need to not do that.  I guess I should add an assert to that effect, if there isn't already one.

Andrew McCreight [:mccr8]

Assignee

Comment 13

•

11 years ago

Reading over the code, I think it is possible that we can trigger ForgetSkippable during the middle of an incremental cycle collection, but only when somebody calls CycleCollectNow and explicitly requests more forget skippables, which I don't think actually happens now.  I'll make ForgetSkippable not do anything if we're in the middle of a CC.  I don't think anything should depend on that right now.  Good catch!

Andrew McCreight [:mccr8]

Assignee

Comment 14

•

11 years ago

For now, I'll just add a dynamic check for idleness.  I filed bug 950949 for fixing this the right way, in the weird case where somebody wants extra forgetSkippable calls.

Andrew McCreight [:mccr8]

Assignee

Comment 15

•

11 years ago

I fixed the indentation, and a dynamic check that we're idle in ForgetSkippable.

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/261091719842
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/8f48aaba4752
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/a9715a3b0e13
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/ae01d2862600

Carsten Book [:Tomcat]

Comment 16

•

11 years ago

(In reply to Andrew McCreight [:mccr8] from comment #15)
> I fixed the indentation, and a dynamic check that we're idle in
> ForgetSkippable.
> 

Hi, sorry had to backout this changes in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=03ff4ad4bc05 because of perma red testfailures like https://tbpl.mozilla.org/php/getParsedLog.php?id=32075416&tree=Mozilla-Inbound

Andrew McCreight [:mccr8]

Assignee

Comment 17

•

11 years ago

The problem is that this made the cycle collecting AddRef functions possibly GC, because now they can call SuspectAfterShutdown, where they can call destructors, which looks like they can, through some convoluted path GC.  Of course, Release is already like that (and I confirmed that at least in one case the Release can GC).  Next I'll look into why we don't get hazards from Release, which will either be some kind of odd whitelisting, or somebody actually just fixed all the problems.  Hopefully there's a safe way to do the former, because my patch added over 200 new hazards.

Andrew McCreight [:mccr8]

Assignee

Comment 18

•

11 years ago

After thinking about it some more, this is just a false positive. We should always call Suspect with a positive refcount from AddRef, so we will never invoke the destructor. To work around this, I've created a version of Suspect that never calls delete, and used that from AddRef.  Hopefully that will work.  I have no idea why Release is okay!  Maybe there are just no hazard sites.

analysis run: https://tbpl.mozilla.org/?tree=Try&rev=c2a5dc82ba35
debug mochitests: https://tbpl.mozilla.org/?tree=Try&rev=7653736e168e

(for the record: previous try run, which did not include the hazard analysis: https://tbpl.mozilla.org/?tree=Try&rev=4779339e52ab )

Andrew McCreight [:mccr8]

Assignee

Comment 19

•

11 years ago

Attached patch part 1b - AddRef should call a specialized form of Suspect that really doesn't destruct anything. — Details — Splinter Review

This will get folded into part 1.  I'll ask for review when it is a little further along on try.

Andrew McCreight [:mccr8]

Assignee

Comment 20

•

11 years ago

Attached patch part 0 - Root newly created objects in Wrap across ADDREF. r=bz — Details — Splinter Review

In Bind, we generate code like this:
  JSObject *obj;
  obj = JS_NewObject(aCx, &Class.mBase, proto, parent);
  [...]
  js::SetReservedSlot(obj, DOM_OBJECT_SLOT, PRIVATE_TO_JSVAL(aObject));
  NS_ADDREF(aObject);
  aCache->SetWrapper(obj);

With patch 1 in this bug, the rooting analysis thinks that AddRef could GC for
two reasons, due to calling NS_CycleCollectorSuspect3:

1. If you pass in an object to suspect with a refcount of 0, we may call a destructor,
which could GC. Of course, we don't actually do that here, and I could work around
this by making a special version of Suspect that never does that.

2. In debug builds, there's an assert that calls QI, which could GC.

Now, the analysis is supposed to not treat AddRef or Release for nsISupports objects
as possibly GCing, so I'm not sure why this is failing anyways, but I don't really see
why that's okay to do, so I'd rather just fix it.

With this patch, we'll root |obj|.

My main concern here is that this code is code could be very hot, so that rooting another
var would be too slow, but it looks like there are another few Roots in this function
already, plus the call to JS_NewObject, so hopefully it looks okay to you, Boris.

I could probably also mess around with codegen and call SetWrapper before the AddRef if
that seems better.

(This fixes all of the rooting hazards from my patches.)

Attachment #8349104 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 21

•

11 years ago

Comment on attachment 8349104 [details] [diff] [review]
part 0 - Root newly created objects in Wrap across ADDREF. r=bz

This seems fine, though maybe we should just simplify and always root?

The one worry I have is whether it's OK to GC before we've done the SetWrapper() call, but after the JS object points to the C++ one via the slot.  I guess it should be OK, since it doesn't do anything interesting with that slot...

Attachment #8349104 - Flags: review?(bzbarsky) → review+

Andrew McCreight [:mccr8]

Assignee

Comment 22

•

11 years ago

I changed it to always root, and added a comment that we don't actually need to always root.

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/654769041ad1
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/36a1e65b9497
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/72c963039107
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/b4a69193ee33
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/3d96bb9ca5b1

Carsten Book [:Tomcat]

Comment 23

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/654769041ad1
https://hg.mozilla.org/mozilla-central/rev/36a1e65b9497
https://hg.mozilla.org/mozilla-central/rev/72c963039107
https://hg.mozilla.org/mozilla-central/rev/b4a69193ee33
https://hg.mozilla.org/mozilla-central/rev/3d96bb9ca5b1

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla29

You need to log in before you can comment on or make changes to this bug.