Closed
Bug 937818
Opened 11 years ago
Closed 11 years ago
ICC: Safely handle object mutation in between ICC slices
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla29
People
(Reporter: mccr8, Assigned: mccr8)
References
Details
Attachments
(6 files, 1 obsolete file)
12.52 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
1.61 KB,
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
3.33 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
7.83 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
4.32 KB,
patch
|
Details | Diff | Splinter Review | |
1.38 KB,
patch
|
bzbarsky
:
review+
|
Details | Diff | Splinter Review |
Bug 937766 deals with objects dying during an incremental cycle collection, but a trickier issue is handling live objects changing while we are running the incremental collection. If this happens, we want to not end up unlinking an object that is actually alive, and we don't want to have to rescan objects. Here's a simple example of what can go wrong. Say we have three objects, A, B and C. B points to A. A has a refcount of 2, from B and from some other non-CCed object. ICC starts, and looks at A and B. Then, B stops pointing to A, and C starts pointing to A. Then ICC looks at C, and sees that it points at A. At this point, the CC sees A having a refcount of 2, and thinks that B and C point to it, even though only C does. If B and C are garbage, the CC will think A is also garbage, even though it is held alive by some non-CCed object. The basic solution (from Levanoni and Petrank) is to notice if a new reference to an object has been created during the ICC, and if so, don't collect that object during the current collection. Instead, treat it as suspicious for the next CC. If the object is truly dead, we will be able to collect it then, because nothing else will touch it (this ignores weak references, which in theory can keep AddRefing and Releasing an object). As in bug 937766, there are three cases: snow-white-able objects, JS objects, and XPCWN/XPCWJS. For each of these, we have two stages of stuff we have to do. First, while the browser is running, we have to do something to notice an object has been AddRefed. Then, in ScanRoots, when we're deciding if an object is alive or not, we have to check the stored information to see what needs to be treated as live. For snow-white-able objects, we add them to the purple buffer on AddRef, in addition to Release. Then, at the end of graph building, if an object is in the purple buffer, we treat it as being live during the current CC. Because it is in the purple buffer, we will treat it as suspicious during the next CC. Note that this is an approximation, because if an object is Released but not AddRefed during the entire CC, it will be treated as live when it may not be. I don't think that's a very common case, and not worth using a ref count bit to distinguish the cases. For JS, the cycle collector is only looking at gray objects. If a gray object is touched during ICC, it will be made black by UnmarkGray. Thus, if a JS object has become black during the ICC, we treat it as live. Merged JS zones have to be handled specially: we scan all zone globals. If any are black, we treat the zone as being black. For XPCJS/XPCWN, again they can't be put in the purple buffer. Instead, I add a dirty bit to them, which gets set when the object is AddRefed (and maybe Released for consistency?). In ScanRoots, the CC iterates over all of these objects and checks for ones with the dirty bit set. Logging also needs to record which objects were treated as incremental roots.
Assignee | ||
Comment 1•11 years ago
|
||
Rather than hacking up weird infrastructure for dealing with XPCWN and XPCWJ, as described in comment 0, I am planning on turning them into regular purple buffered classes (bug 944492 and bug 942528).
Assignee | ||
Comment 2•11 years ago
|
||
Hopefully BC is okay. L64 BC is so backed up I just cancelled it. https://tbpl.mozilla.org/?tree=Try&rev=1b251ea3b0ae
Assignee | ||
Comment 3•11 years ago
|
||
ICC uses this to track objects that have been AddRef'd during ICC graph building. For those objects, we may not have the proper information for them, so treat them as live. This may be the most controversial patch. It adds a check on each AddRef, but I think that almost all objects that are AddRefed are Released very soon after, so we're not hitting the slow path any more often.
Attachment #8345999 -
Flags: review?(bugs)
Assignee | ||
Comment 4•11 years ago
|
||
If all globals in a zone are gray, then all live objects in that zone should also be gray.
Attachment #8346000 -
Flags: review?(jcoppeard)
Assignee | ||
Comment 5•11 years ago
|
||
Any object that has been stored away somewhere in the middle of incremental graph building must be treated as live, because we can't trust that the CC graph has accurate information about it. If such an object is truly garbage, we'll unlink it in the next cycle collection instead.
Attachment #8346001 -
Flags: review?(bugs)
Assignee | ||
Comment 6•11 years ago
|
||
Due to graph mutation during an incremental cycle collection, objects in the CC graph may end up with more things pointing to them than they have a ref count. However, these objects should never become garbage.
Attachment #8346002 -
Flags: review?(bugs)
Updated•11 years ago
|
Attachment #8346000 -
Flags: review?(jcoppeard) → review+
Assignee | ||
Comment 7•11 years ago
|
||
The JS stuff in ScanIncrementalRoots may look a little scary in terms of performance, as it iterates over every object in the graph, but I haven't noticed it taking much time. This is closing two TechCrunch tabs: cc: ScanIncrementalRoots::fix JS took 1ms cc: ScanRoots() took 7ms cc: CollectWhite::Root took 1ms cc: CollectWhite::Unlink took 15ms cc: total cycle collector time was 393ms cc: visited 15346 ref counted and 56503 GCed objects, freed 15058 ref counted and 56368 GCed objects. That's after spending a total of about 70ms building the graph. I should actually change how the TimeLog is done for ScanIncrementalRoots, come to think of it...
Assignee | ||
Comment 8•11 years ago
|
||
Any object that has been stored away somewhere in the middle of incremental graph building must be treated as live, because we can't trust that the CC graph has accurate information about it. If such an object is truly garbage, we'll unlink it in the next cycle collection instead.
Attachment #8346001 -
Attachment is obsolete: true
Attachment #8346001 -
Flags: review?(bugs)
Attachment #8346977 -
Flags: review?(bugs)
Comment 9•11 years ago
|
||
Comment on attachment 8345999 [details] [diff] [review] part 1 - Add objects to the purple buffer on AddRef. >- nsrefcnt count = mRefCnt.incr(); \ >+ nsrefcnt count = mRefCnt.incr(static_cast<void*>(this), \ >+ _class::NS_CYCLE_COLLECTION_INNERCLASS::GetParticipant()); \ A bit odd indentation. Perhaps >+ nsrefcnt count = \ >+ mRefCnt.incr(static_cast<void*>(this), \ >+ _class::NS_CYCLE_COLLECTION_INNERCLASS::GetParticipant()); \
Attachment #8345999 -
Flags: review?(bugs) → review+
Updated•11 years ago
|
Attachment #8346002 -
Flags: review?(bugs) → review+
Updated•11 years ago
|
Attachment #8346977 -
Flags: review?(bugs) → review+
Comment 10•11 years ago
|
||
But how does this all work with forgetSkippable which may remove stuff from purple buffer? Perhaps Bug 937960 explains.
Assignee | ||
Comment 11•11 years ago
|
||
Good point. There are two factors. First, when we start up the slice timer, we disable the forgetSkippable timer, so they shouldn't be happening at the same time. Secondly, and more fundamentally, forgetSkippable should only be removing things that are definitely alive, so if we did scan them, they'd just be registered as live anyways. If something goes from definitely alive to not definitely alive, then it should get added to the purple buffer, and because it isn't definitely alive any more, it shouldn't get removed from the purple buffer.
Assignee | ||
Comment 12•11 years ago
|
||
I suppose if something it not definitely alive during an ICC, then we scan it, then we addref it again and make it definitely alive, then we could end up removing it from the purple buffer, if we ran forgetSkippable during a CC, so we need to not do that. I guess I should add an assert to that effect, if there isn't already one.
Assignee | ||
Comment 13•11 years ago
|
||
Reading over the code, I think it is possible that we can trigger ForgetSkippable during the middle of an incremental cycle collection, but only when somebody calls CycleCollectNow and explicitly requests more forget skippables, which I don't think actually happens now. I'll make ForgetSkippable not do anything if we're in the middle of a CC. I don't think anything should depend on that right now. Good catch!
Assignee | ||
Comment 14•11 years ago
|
||
For now, I'll just add a dynamic check for idleness. I filed bug 950949 for fixing this the right way, in the weird case where somebody wants extra forgetSkippable calls.
Assignee | ||
Comment 15•11 years ago
|
||
I fixed the indentation, and a dynamic check that we're idle in ForgetSkippable. remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/261091719842 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/8f48aaba4752 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/a9715a3b0e13 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/ae01d2862600
Comment 16•11 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #15) > I fixed the indentation, and a dynamic check that we're idle in > ForgetSkippable. > Hi, sorry had to backout this changes in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=03ff4ad4bc05 because of perma red testfailures like https://tbpl.mozilla.org/php/getParsedLog.php?id=32075416&tree=Mozilla-Inbound
Assignee | ||
Comment 17•11 years ago
|
||
The problem is that this made the cycle collecting AddRef functions possibly GC, because now they can call SuspectAfterShutdown, where they can call destructors, which looks like they can, through some convoluted path GC. Of course, Release is already like that (and I confirmed that at least in one case the Release can GC). Next I'll look into why we don't get hazards from Release, which will either be some kind of odd whitelisting, or somebody actually just fixed all the problems. Hopefully there's a safe way to do the former, because my patch added over 200 new hazards.
Assignee | ||
Comment 18•11 years ago
|
||
After thinking about it some more, this is just a false positive. We should always call Suspect with a positive refcount from AddRef, so we will never invoke the destructor. To work around this, I've created a version of Suspect that never calls delete, and used that from AddRef. Hopefully that will work. I have no idea why Release is okay! Maybe there are just no hazard sites. analysis run: https://tbpl.mozilla.org/?tree=Try&rev=c2a5dc82ba35 debug mochitests: https://tbpl.mozilla.org/?tree=Try&rev=7653736e168e (for the record: previous try run, which did not include the hazard analysis: https://tbpl.mozilla.org/?tree=Try&rev=4779339e52ab )
Assignee | ||
Comment 19•11 years ago
|
||
This will get folded into part 1. I'll ask for review when it is a little further along on try.
Assignee | ||
Comment 20•11 years ago
|
||
In Bind, we generate code like this: JSObject *obj; obj = JS_NewObject(aCx, &Class.mBase, proto, parent); [...] js::SetReservedSlot(obj, DOM_OBJECT_SLOT, PRIVATE_TO_JSVAL(aObject)); NS_ADDREF(aObject); aCache->SetWrapper(obj); With patch 1 in this bug, the rooting analysis thinks that AddRef could GC for two reasons, due to calling NS_CycleCollectorSuspect3: 1. If you pass in an object to suspect with a refcount of 0, we may call a destructor, which could GC. Of course, we don't actually do that here, and I could work around this by making a special version of Suspect that never does that. 2. In debug builds, there's an assert that calls QI, which could GC. Now, the analysis is supposed to not treat AddRef or Release for nsISupports objects as possibly GCing, so I'm not sure why this is failing anyways, but I don't really see why that's okay to do, so I'd rather just fix it. With this patch, we'll root |obj|. My main concern here is that this code is code could be very hot, so that rooting another var would be too slow, but it looks like there are another few Roots in this function already, plus the call to JS_NewObject, so hopefully it looks okay to you, Boris. I could probably also mess around with codegen and call SetWrapper before the AddRef if that seems better. (This fixes all of the rooting hazards from my patches.)
Attachment #8349104 -
Flags: review?(bzbarsky)
Comment 21•11 years ago
|
||
Comment on attachment 8349104 [details] [diff] [review] part 0 - Root newly created objects in Wrap across ADDREF. r=bz This seems fine, though maybe we should just simplify and always root? The one worry I have is whether it's OK to GC before we've done the SetWrapper() call, but after the JS object points to the C++ one via the slot. I guess it should be OK, since it doesn't do anything interesting with that slot...
Attachment #8349104 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 22•11 years ago
|
||
I changed it to always root, and added a comment that we don't actually need to always root. remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/654769041ad1 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/36a1e65b9497 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/72c963039107 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/b4a69193ee33 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/3d96bb9ca5b1
Comment 23•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/654769041ad1 https://hg.mozilla.org/mozilla-central/rev/36a1e65b9497 https://hg.mozilla.org/mozilla-central/rev/72c963039107 https://hg.mozilla.org/mozilla-central/rev/b4a69193ee33 https://hg.mozilla.org/mozilla-central/rev/3d96bb9ca5b1
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
You need to log in
before you can comment on or make changes to this bug.
Description
•