Closed Bug 597736 Opened 14 years ago Closed 14 years ago

TM: TreeFragment leak

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+
blocking1.9.2 --- -
status1.9.2 --- wontfix
status1.9.1 --- unaffected

People

(Reporter: igor, Assigned: igor)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 3 obsolete files)

The TreeFragment GC is implemented using the following algorithm:

1. During the marking phase enumerate in TraceMonitor::mark all fragments and mark all the GC things they contain.

2. During the sweeping phase when a script is destroyed purge the corresponding fragment.

Now suppose a trace contains an object that references the script itself. Then a normal GC would not collect it as TraceMonitor::mark would mark it creating a noncollectable loop. So such script would leak until it is expunged from the cache. That may take a while.
blocking2.0: --- → beta7+
Nice find!
Depends on: 597906
Here is a test case that demonstrates the leak (it is based on functions that I add to js shell in bug 597906):

function test() {
    for (var j = 0; j != 2; ++j) {
	var f = Function("a", "var s = 0; for (var i = 0; i != 100; ++i) s += a.b; return s;");

	var c = {b: 1, f: f, leakDetection: makeFinalizeObserver()};
	f({ __proto__: { __proto__: c}});
	f = c = a = null;
	gc();
    }
}

var base = finalizeCount();
test();
gc();
gc();
var n = finalizeCount();
if (base == finalizeCount())
    print("Leak");


Here the leak comes from an object embedded into the trace as a cached prototype.

Note that at shutdown, due to the JS_CommenceRuntimeShutDown call, all code caches effectively flushed before the final GC. For this reason the leak persists only until the last GC is run. Also, due to the size limit on the  code cache, eventually the generated traces would be expunged from it allowing the garbage scripts to be collected.

What is puzzling is that why this leak was not observed before. The global object is often embedded into the trace and the global object contains references to scripts of all functions that are compiled against it. I guess the size of the code cache plays a role here.
The code cache is flushed very frequently (minimum every time the window navigates, often more frequently). This might change soon though, so certainly a very nice fix.
To fix this my plan is to replace the fragment marking code in TraceMonitor::mark with something that sweeps all fragments right after marking phase and remove any that has about to be finalized gc things. Then TraceMonitor::mark will only mark the recorder and currently executed traces. This should also allow to remove PurgeScriptFragments that can be rather expensive with a big fragment cache.

I hope I have not missed something big here.
Complication for the above schema: if PurgeScriptFragments is removed, then the fragment sweep should also check if the script it refers to is AboutToBeFinalized. But the script is not a GC thing so js_AboutToBeFinalized cannot be called on it. I guess for now I should keep PurgeScriptFragments.
Attached patch v1 (obsolete) — Splinter Review
The patch flushes the whole tree if any of its fragments have dead GC things. This is enough to fix the leak but it could be too gross. A better way would be to flush just a particular fragment. But is is sound idea? I.e. is it possible to just remove a particular fragment without touching too much code?

Also for now I have kept PurgeScriptFragments until we have something like JSScript::isMarked.
Attachment #476794 - Flags: feedback?(gal)
blocking2.0: beta7+ → betaN+
Attached patch v2 (obsolete) — Splinter Review
Attachment #476794 - Attachment is obsolete: true
Attachment #477578 - Flags: review?(gal)
Attachment #476794 - Flags: feedback?(gal)
Comment on attachment 477578 [details] [diff] [review]
v2

How much time does this take? Can you do some quick measurements? (I like the patch in general, a lot actually).
(In reply to comment #8)
> Comment on attachment 477578 [details] [diff] [review]
> v2
> 
> How much time does this take? 

Do you worry about killing all the fragments for the given tree even if just one embeds a GC thing?
Not really. We can rebuild fragments easily. Purge whatever gets in your way. I am just worried about sweep time.
The patch effectively moves the fragment walk from TraceMonitor::mark into TraceMonitor::sweep replacing, for fragments with GC things, Mark(trc, v.asGCThing(), v.gcKind()) calls with js_IsAboutToBeFinalized. The latter is strictly faster. So the slowdown can only happen if the patch would trash a lot of dead trees. But those trees must be trushed at some point in any case so the extra trash activity should be temporary.
Comment on attachment 477578 [details] [diff] [review]
v2

There is no point to mark the recorded fragment - if it already contains dead gc things we better abort the recording.
Attachment #477578 - Attachment is obsolete: true
Attachment #477578 - Flags: review?(gal)
Attached patch v3 (obsolete) — Splinter Review
The new version removes TraceMonitor::mark moving all its functionality into TraceMonitor::sweep() and extra asserts.
Attachment #477644 - Flags: review?(gal)
Attached patch v4Splinter Review
Here is a rebased patch
Attachment #477644 - Attachment is obsolete: true
Attachment #478543 - Flags: review?(gal)
Attachment #477644 - Flags: review?(gal)
Comment on attachment 478543 [details] [diff] [review]
v4

The XXX should either big fixed in the patch, or get a proper bug #.
Attachment #478543 - Flags: review?(gal) → review+
http://hg.mozilla.org/tracemonkey/rev/b079aae53212
Whiteboard: fixed-in-tracemonkey
Should this leak be addressed on 1.9.2 branch?
blocking1.9.2: --- → ?
On the understanding that it's just a memory leak we don't want to muck with GC on the branches if we don't have to. Also slightly concerned that you appear to be removing a public entrypoint--JS_CommenceRuntimeShutDown()--which we frown on for branches.

But maybe this makes applying future security patches in this area harder? That would make the patch more interesting to us for the branch, but we can reconsider at that time if it comes to pass.
blocking1.9.2: ? → -
Dan, JS API entry points are not used in the same way as public (frozen or not) XPCOM interfaces. JS_CommenceRuntimeShutDownOMGMyNameIsTooLong was added only very recently (see bug 511252 comment 5 :-P), and it should be painless to remove.

/be
http://hg.mozilla.org/mozilla-central/rev/b079aae53212
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: