Igor pointed out that the patch in Bug 653248 will probably lead to the cycle collector sometimes classifying reachable JS objects as unreachable, which in turn will lead to the CC incorrectly freeing DOM objects that are reachable from JS side. I'm not sure what exactly happens, but I'd imagine that in the worst case it could make freed memory reachable from JS, potentially leading to crashes and other badness.
The scenario he came up with is this: a WeakMap m is reachable from JS roots, and m contains a key k and value v which are reachable only from DOM. After the patch in Bug 653248, the GC will mark m black, and k and v grey. When the cycle collector runs, its graph will have an edge from m to v, but not from k to v (which is wrong). Because m is marked black, the CC will not traverse it. If v is part of what looks like a garbage cycle through DOM, then the CC will mark v as garbage, as it doesn't see that k is holding on to v. The DOM part of the cycle involving v will be freed, but this is still reachable from JS via m and k.
With the previous behavior, the GC would mark the value black if either the map or the key is reachable in any way (ie are marked black or grey). This leaks, as it will never free cycles that pass through a WeakMap value, but is safe because it ensures that anything reachable from a reachable value is never freed.
The short term solution is to back out the patch in Bug 653248.
Longer term, the cycle collector needs a more accurate picture of the weak map edges. It seems likely this will require making the cycle collector actually aware of WeakMaps. The basic problem is that most edges act like an "or": if an object is reachable from any incoming edge, it is reachable. For a weak map, the map and key act more like "and" wrt the value: both map and key must be reachable for the value to be reachable via the weak map.
I'll try to come up with a test for this.
I believe that the behavior before Bug 653248 landed is safe (but leaky), so this should only affect Firefox 7.
If I want to back out that patch, should I put up a patch in here to do that for review or what?
FF7 hasn't branched yet, so we should try to fix it for real (without a leak).
Created attachment 542186 [details]
test behavior of weak map and GC
As part of looking at fixing the interaction of WeakMaps and the cycle collector, I've started working on some exhaustive tests. To start with, I have this test, that tests the interaction of WeakMaps with the JS GC. It tests out various combinations of the WeakMap and key being alive or dead, and sees if the value is alive or not. The value is reachable only via the WeakMap. The only time the value should be live is when both the WeakMap and the key are alive. I do this by using finalizeCount().
Of the four combinations, most work as expected, but oddly when the WeakMap is dead but the key is live the value is not being collected. It looks like this is because the WeakMap itself is not being collected. I don't know if this is a problem with the handling of WeakMaps or (more likely...) some kind of problem with my test, as this is basically the first JS test I have written. I'm not really sure offhand how this could happen, as I'd think that if the map is unreachable that the WeakMap code wouldn't even be invoked.
Oh, and I meant to add, is there any way to examine the JS GC behavior in a script that also has DOM nodes around? I want to observe the behavior of WeakMaps when grey objects are involved (eg reachable from XPCOM). It looks like finalizeCount is only available in jstests... can those interact with the DOM? At least when being tested with the browser, of course.
In the short term, I can dump cycle collector graphs and examine those to figure out what is happening, but I wouldn't want to use those for automated regression testing.
I am guessing the stack scanner is keeping your weak map alive.
I return from the function where the weak map is defined, and overwrite the value in the local before I leave in an attempt to combat this, but I guess junk could still be hanging around somewhere the stack scanner would find it? I could try adding a random recursive loop to overwrite the stack if that might help.
Hmm. Does look like it is something along those lines. I get this when I run that same function over and over again:
counts: 0 --> 0
counts: 0 --> 1
counts: 1 --> 2
counts: 2 --> 3
counts: 3 --> 4
counts: 4 --> 5
counts: 5 --> 6
counts: 6 --> 8
First time, the value is not freed, then 1 value is freed every call, then two values are freed at the end. (This is also with a function that recurses to a depth of 1000 in an apparently vain attempt to clear out the stack.) So I guess it works out at the end.
Hmm. I think that contrary to what I said earlier the original behavior was not safe either. Consider the following slight variation on Igor's original scenario: map is marked black, key is marked grey. Value will be (incorrectly) marked black. Value connects to an object that is part of a cycle that passes through DOM, but otherwise has no references.
As in Igor's scenario, the CC will not trace through the value, because it is marked. Thus, the CC thinks that the cycle the value connects to is garbage, and collects the DOM part of it, and we end up with the same sort of dangling pointer into DOM-land.
Wait, no, I am dumb. In that case, the JS GC will mark the cycle that the value is connected to black, so the CC won't free it. Sorry for the bugspam.
I'm going to back out Bug 653248 today to fix this until we can get the CC-WeakMap interaction fixed.
I don't see a solution for this without actually recording the edges m->v and k->v (which is why we needed bug 548733, no?).
Adding m->v and k->v edges to the graph doesn't actually give us the right results. That will make v black if either m or k are black, whereas we need v to be black only if both m and k are black.
Igor had an idea that sounds reasonable to me. First, don't add any edges to the CC graph for WeakMaps. Then, after ScanRoots() has run, iterate over all of the WeakMaps. If a WeakMap and a key were marked black, flood black on the value (if it isn't black already). Otherwise, flood white (if the value is grey). Like the GC's WeakMap algorithm, if you end up doing anything during a scan, you must rescan the WeakMap list, because you may have changed the color of a WeakMap.
The only modification to his idea that I think we need is that we need to add all of the WeakMap values (and anything reachable from there) to the CC graph while we are building it, because objects may be reachable only via the WeakMap, and we don't want to have to add them later. We don't have to add the map or key, but when we're scanning the WeakMap list we need to handle the case where the key or value are not in the graph, which is the same as them being marked white.
We could also add another optimization that we only add a WeakMap value if both the map and key are in the graph (after we've otherwise finished building the graph), but this will require the same sort of doing another pass over the WeakMaps if something is changed.
Backout has landed in moz-central, so I'm marking Firefox 7 as unaffected.
(In reply to comment #12)
> The only modification to his idea that I think we need is that we need to
> add all of the WeakMap values (and anything reachable from there) to the CC
> graph while we are building it, because objects may be reachable only via
> the WeakMap, and we don't want to have to add them later.
Only if they're not black though, right?
Do you mean, we don't want to add the value if it has been marked black by the JS GC? That's a good point.
Also, we'll need to update the CC graph dumping code to do WeakMaps correctly.
The backout "fixes" this bug (on 7/central). When bug 653248 finally lands it should incorporate fixes required to make this bug not happen again.
I filed Bug 668855 for general discussion of making the cycle collector work with WeakMaps. This bug is mostly about how half a fix causes dangling pointers.
Can anyone please provide a test case or with STR / guidelines that I can use to verify this fix? Do I need to create any specific environment to test the fix?
We never had a test case. This was more of a theoretical problem.
qa- as no QA verification needed