Closed Bug 937818 Opened 11 years ago Closed 11 years ago

ICC: Safely handle object mutation in between ICC slices

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

Attachments

(6 files, 1 obsolete file)

Bug 937766 deals with objects dying during an incremental cycle collection, but a trickier issue is handling live objects changing while we are running the incremental collection.  If this happens, we want to not end up unlinking an object that is actually alive, and we don't want to have to rescan objects.

Here's a simple example of what can go wrong. Say we have three objects, A, B and C.  B points to A.  A has a refcount of 2, from B and from some other non-CCed object.  ICC starts, and looks at A and B.  Then, B stops pointing to A, and C starts pointing to A.  Then ICC looks at C, and sees that it points at A.

At this point, the CC sees A having a refcount of 2, and thinks that B and C point to it, even though only C does.  If B and C are garbage, the CC will think A is also garbage, even though it is held alive by some non-CCed object.

The basic solution (from Levanoni and Petrank) is to notice if a new reference to an object has been created during the ICC, and if so, don't collect that object during the current collection.  Instead, treat it as suspicious for the next CC.  If the object is truly dead, we will be able to collect it then, because nothing else will touch it (this ignores weak references, which in theory can keep AddRefing and Releasing an object).

As in bug 937766, there are three cases: snow-white-able objects, JS objects, and XPCWN/XPCWJS.  For each of these, we have two stages of stuff we have to do.  First, while the browser is running, we have to do something to notice an object has been AddRefed.  Then, in ScanRoots, when we're deciding if an object is alive or not, we have to check the stored information to see what needs to be treated as live.

For snow-white-able objects, we add them to the purple buffer on AddRef, in addition to Release. Then, at the end of graph building, if an object is in the purple buffer, we treat it as being live during the current CC.  Because it is in the purple buffer, we will treat it as suspicious during the next CC. Note that this is an approximation, because if an object is Released but not AddRefed during the entire CC, it will be treated as live when it may not be.  I don't think that's a very common case, and not worth using a ref count bit to distinguish the cases.

For JS, the cycle collector is only looking at gray objects.  If a gray object is touched during ICC, it will be made black by UnmarkGray.  Thus, if a JS object has become black during the ICC, we treat it as live.  Merged JS zones have to be handled specially: we scan all zone globals.  If any are black, we treat the zone as being black.

For XPCJS/XPCWN, again they can't be put in the purple buffer. Instead, I add a dirty bit to them, which gets set when the object is AddRefed (and maybe Released for consistency?).  In ScanRoots, the CC iterates over all of these objects and checks for ones with the dirty bit set.

Logging also needs to record which objects were treated as incremental roots.
Rather than hacking up weird infrastructure for dealing with XPCWN and XPCWJ, as described in comment 0, I am planning on turning them into regular purple buffered classes (bug 944492 and bug 942528).
Hopefully BC is okay.  L64 BC is so backed up I just cancelled it.
https://tbpl.mozilla.org/?tree=Try&rev=1b251ea3b0ae
ICC uses this to track objects that have been AddRef'd during ICC graph building.
For those objects, we may not have the proper information for them, so treat them
as live.

This may be the most controversial patch. It adds a check on each AddRef, but I think
that almost all objects that are AddRefed are Released very soon after, so we're not
hitting the slow path any more often.
Attachment #8345999 - Flags: review?(bugs)
If all globals in a zone are gray, then all live objects in that zone
should also be gray.
Attachment #8346000 - Flags: review?(jcoppeard)
Any object that has been stored away somewhere in the middle of incremental graph
building must be treated as live, because we can't trust that the CC graph has
accurate information about it. If such an object is truly garbage, we'll unlink it
in the next cycle collection instead.
Attachment #8346001 - Flags: review?(bugs)
Due to graph mutation during an incremental cycle collection, objects in the CC graph
may end up with more things pointing to them than they have a ref count. However, these
objects should never become garbage.
Attachment #8346002 - Flags: review?(bugs)
Attachment #8346000 - Flags: review?(jcoppeard) → review+
The JS stuff in ScanIncrementalRoots may look a little scary in terms of performance, as it iterates over every object in the graph, but I haven't noticed it taking much time.

This is closing two TechCrunch tabs:

cc: ScanIncrementalRoots::fix JS took 1ms
cc: ScanRoots() took 7ms
cc: CollectWhite::Root took 1ms
cc: CollectWhite::Unlink took 15ms
cc: total cycle collector time was 393ms
cc: visited 15346 ref counted and 56503 GCed objects, freed 15058 ref counted and 56368 GCed objects.

That's after spending a total of about 70ms building the graph.  I should actually change how the TimeLog is done for ScanIncrementalRoots, come to think of it...
Any object that has been stored away somewhere in the middle of incremental graph
building must be treated as live, because we can't trust that the CC graph has
accurate information about it. If such an object is truly garbage, we'll unlink it
in the next cycle collection instead.
Attachment #8346001 - Attachment is obsolete: true
Attachment #8346001 - Flags: review?(bugs)
Attachment #8346977 - Flags: review?(bugs)
Comment on attachment 8345999 [details] [diff] [review]
part 1 - Add objects to the purple buffer on AddRef.

>-    nsrefcnt count = mRefCnt.incr();                                          \
>+    nsrefcnt count = mRefCnt.incr(static_cast<void*>(this),                   \
>+                   _class::NS_CYCLE_COLLECTION_INNERCLASS::GetParticipant()); \
A bit odd indentation.
Perhaps 
>+    nsrefcnt count =                                                          \
>+       mRefCnt.incr(static_cast<void*>(this),                                 \
>+                   _class::NS_CYCLE_COLLECTION_INNERCLASS::GetParticipant()); \
Attachment #8345999 - Flags: review?(bugs) → review+
Attachment #8346002 - Flags: review?(bugs) → review+
Attachment #8346977 - Flags: review?(bugs) → review+
But how does this all work with forgetSkippable which may remove stuff from purple buffer?
Perhaps Bug 937960 explains.
Good point.  There are two factors.  First, when we start up the slice timer, we disable the forgetSkippable timer, so they shouldn't be happening at the same time.

Secondly, and more fundamentally, forgetSkippable should only be removing things that are definitely alive, so if we did scan them, they'd just be registered as live anyways.  If something goes from definitely alive to not definitely alive, then it should get added to the purple buffer, and because it isn't definitely alive any more, it shouldn't get removed from the purple buffer.
I suppose if something it not definitely alive during an ICC, then we scan it, then we addref it again and make it definitely alive, then we could end up removing it from the purple buffer, if we ran forgetSkippable during a CC, so we need to not do that.  I guess I should add an assert to that effect, if there isn't already one.
Reading over the code, I think it is possible that we can trigger ForgetSkippable during the middle of an incremental cycle collection, but only when somebody calls CycleCollectNow and explicitly requests more forget skippables, which I don't think actually happens now.  I'll make ForgetSkippable not do anything if we're in the middle of a CC.  I don't think anything should depend on that right now.  Good catch!
For now, I'll just add a dynamic check for idleness.  I filed bug 950949 for fixing this the right way, in the weird case where somebody wants extra forgetSkippable calls.
(In reply to Andrew McCreight [:mccr8] from comment #15)
> I fixed the indentation, and a dynamic check that we're idle in
> ForgetSkippable.
> 

Hi, sorry had to backout this changes in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=03ff4ad4bc05 because of perma red testfailures like https://tbpl.mozilla.org/php/getParsedLog.php?id=32075416&tree=Mozilla-Inbound
The problem is that this made the cycle collecting AddRef functions possibly GC, because now they can call SuspectAfterShutdown, where they can call destructors, which looks like they can, through some convoluted path GC.  Of course, Release is already like that (and I confirmed that at least in one case the Release can GC).  Next I'll look into why we don't get hazards from Release, which will either be some kind of odd whitelisting, or somebody actually just fixed all the problems.  Hopefully there's a safe way to do the former, because my patch added over 200 new hazards.
After thinking about it some more, this is just a false positive. We should always call Suspect with a positive refcount from AddRef, so we will never invoke the destructor. To work around this, I've created a version of Suspect that never calls delete, and used that from AddRef.  Hopefully that will work.  I have no idea why Release is okay!  Maybe there are just no hazard sites.

analysis run: https://tbpl.mozilla.org/?tree=Try&rev=c2a5dc82ba35
debug mochitests: https://tbpl.mozilla.org/?tree=Try&rev=7653736e168e

(for the record: previous try run, which did not include the hazard analysis: https://tbpl.mozilla.org/?tree=Try&rev=4779339e52ab )
This will get folded into part 1.  I'll ask for review when it is a little further along on try.
In Bind, we generate code like this:
  JSObject *obj;
  obj = JS_NewObject(aCx, &Class.mBase, proto, parent);
  [...]
  js::SetReservedSlot(obj, DOM_OBJECT_SLOT, PRIVATE_TO_JSVAL(aObject));
  NS_ADDREF(aObject);
  aCache->SetWrapper(obj);

With patch 1 in this bug, the rooting analysis thinks that AddRef could GC for
two reasons, due to calling NS_CycleCollectorSuspect3:

1. If you pass in an object to suspect with a refcount of 0, we may call a destructor,
which could GC. Of course, we don't actually do that here, and I could work around
this by making a special version of Suspect that never does that.

2. In debug builds, there's an assert that calls QI, which could GC.

Now, the analysis is supposed to not treat AddRef or Release for nsISupports objects
as possibly GCing, so I'm not sure why this is failing anyways, but I don't really see
why that's okay to do, so I'd rather just fix it.

With this patch, we'll root |obj|.

My main concern here is that this code is code could be very hot, so that rooting another
var would be too slow, but it looks like there are another few Roots in this function
already, plus the call to JS_NewObject, so hopefully it looks okay to you, Boris.

I could probably also mess around with codegen and call SetWrapper before the AddRef if
that seems better.

(This fixes all of the rooting hazards from my patches.)
Attachment #8349104 - Flags: review?(bzbarsky)
Comment on attachment 8349104 [details] [diff] [review]
part 0 - Root newly created objects in Wrap across ADDREF. r=bz

This seems fine, though maybe we should just simplify and always root?

The one worry I have is whether it's OK to GC before we've done the SetWrapper() call, but after the JS object points to the C++ one via the slot.  I guess it should be OK, since it doesn't do anything interesting with that slot...
Attachment #8349104 - Flags: review?(bzbarsky) → review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: