Closed Bug 551598 Opened 15 years ago Closed 2 years ago
Investigate and (try to) improve GC performance.
Categories: Core :: JavaScript Engine (enhancement)
Status: RESOLVED INCOMPLETE
People: (Reporter: jbramley, Unassigned)
Attachments: 1 file (12.96 KB, patch)
By hacking in profiling code using gettimeofday, I can get a break-down of the performance of calls to js_GC(). The following numbers came from a trivial JavaScript loop allocating (and then forgetting) Objects. They were run on an ARM platform:
js_GC() {
Time since last invocation: 4642215 us
Time until mark phase: 233 us
Mark phase time: 506 us
js_SweepAtomState: 76 us
CloseNativeIterators: 7 us
js_SweepWatchPoints: 5 us
FinalizeArenaList lead-in: 4 us
FinalizeArenaList big loop: 332316 us
16379 iterations were executed.
FinalizeArenaList<JSObject, FinalizeObject>: 332362 us
FinalizeArenaList lead-in: 5 us
FinalizeArenaList big loop: 17 us
3 iterations were executed.
FinalizeArenaList<JSFunction, FinalizeFunction>: 29 us
FinalizeArenaList<JSXML, FinalizeXML>: 4 us
FinalizeArenaList lead-in: 4 us
FinalizeArenaList big loop: 11 us
1 iterations were executed.
FinalizeArenaList<JSString, FinalizeString>: 333 us
Loop: FinalizeArenaList<JSString, FinalizeExternalString>: 7 us
Metering code (?): 6 us
Misc arena-handling code: 332744 us
js_SweepScopeProperties: 32 us
js_SweepScriptFilenames: 7 us
DestroyGCArenas: 34486 us
Sweep phase time: 367388 us
Post-GC clean-up: 7 us
}
The positions of the mark/sweep divisions (etc) are based on comments in the source (and obvious function calls). The output is awkward to read because it's nested and the name of the block is at the _end_ of the nest. It was a quick & dirty instrumentation hack, so I didn't bother too much about making it look pretty.
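For reference, a minimal sketch of this kind of gettimeofday-based phase timing. This is not the attached patch: MarkPhase()/SweepPhase() are placeholder stand-ins for the real js_GC() internals.
----
/*
 * Sketch of gettimeofday-based phase timing (not the attached patch).
 * MarkPhase()/SweepPhase() stand in for the real js_GC() phases.
 */
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

static uint64_t
NowMicroseconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t) tv.tv_sec * 1000000 + tv.tv_usec;
}

static void MarkPhase(void)  { /* stand-in for the mark phase  */ }
static void SweepPhase(void) { /* stand-in for the sweep phase */ }

int
main(void)
{
    uint64_t t0 = NowMicroseconds();
    MarkPhase();
    uint64_t t1 = NowMicroseconds();
    SweepPhase();
    uint64_t t2 = NowMicroseconds();

    printf("Mark phase time: %llu us\n",  (unsigned long long) (t1 - t0));
    printf("Sweep phase time: %llu us\n", (unsigned long long) (t2 - t1));
    return 0;
}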
----
As Andreas mentioned, sweeping is clearly where we are weak. On x86, the sweep function doesn't stand out as much, but it's still by far the largest component in the profile:
js_GC() {
Time since last invocation: 868579 us
Time until mark phase: 44 us
Mark phase time: 59 us
js_SweepAtomState: 14 us
CloseNativeIterators: 2 us
js_SweepWatchPoints: 2 us
FinalizeArenaList lead-in: 2 us
FinalizeArenaList big loop: 35948 us
16379 iterations were executed.
FinalizeArenaList<JSObject, FinalizeObject>: 35963 us
FinalizeArenaList lead-in: 2 us
FinalizeArenaList big loop: 4 us
3 iterations were executed.
FinalizeArenaList<JSFunction, FinalizeFunction>: 8 us
FinalizeArenaList<JSXML, FinalizeXML>: 2 us
FinalizeArenaList lead-in: 2 us
FinalizeArenaList big loop: 3 us
1 iterations were executed.
FinalizeArenaList<JSString, FinalizeString>: 81 us
Loop: FinalizeArenaList<JSString, FinalizeExternalString>: 3 us
Metering code (?): 2 us
Misc arena-handling code: 36061 us
js_SweepScopeProperties: 6 us
js_SweepScriptFilenames: 3 us
DestroyGCArenas: 11895 us
Sweep phase time: 47990 us
Post-GC clean-up: 3 us
}
----
This gives us a clear candidate for optimizations: FinalizeArenaList. DestroyGCArenas is probably worth looking at too.
Comment 1•15 years ago (Reporter)
In the interest of reproducibility, here is the instrumentation I used (with a couple of extra things that I added after gathering the figures).
Comment 2•15 years ago
Jacob, do you mind attaching the JS testcase you used as well?
I wonder how much of this time is calling FinalizeObject....
Comment 3•15 years ago (Reporter)
The test-case is trivial:
----
function f() {}
for (var i = 0; i < 10000000; i++) {
    var a = new String("I am a string.");
    var b = "I am also a string.";
    f(a, b);
}
----
It isn't clever, and certainly isn't representative, but the sweep phase is still too slow (at least on ARM).
Comment 4•15 years ago (Reporter)
(In reply to comment #2)
> I wonder how much of this time is calling FinalizeObject....
Actually, the instrumentation patch I posted reports this, though that's something I added after gathering the numbers. I think finalize() is a fairly hefty component.
Comment 5•15 years ago
Hm. So if shark's not lying to me, then a pretty large fraction of the time under FinalizeObject seems to be the !obj->map test (which also accounts for something like 10% of total L2 misses during GC as well). The clasp->finalize test is the next hotspot there (1.5% of total L2 misses). At least on Mac FinalizeObject ends up inlined into js_GC, so the timer percentages are of total self js_GC time, but those are 40% and 17% respectively.
5% of the js_GC time is the !nextFree check after setting nextFree to a->info.freeList in FinalizeArenaList (41% of the L2 misses are here(?)).
I wonder whether it's worth running under something like cachegrind to get a better idea of what's going on here in L2 terms, and whether we can issue some sort of prefetch instructions that would help....
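To make the prefetch idea concrete, here is a hedged sketch; Thing, map, and Finalize are hypothetical stand-ins, not SpiderMonkey's FinalizeArenaList. While finalizing thing i, it prefetches thing i+1 so the next map load is more likely to be in cache by the time the branch needs it.
----
/*
 * Hedged sketch of the prefetch idea, not SpiderMonkey code.
 * __builtin_prefetch is a GCC/Clang builtin.
 */
#include <stddef.h>

struct Thing {
    void *map;          /* stand-in for JSObject::map          */
    char  payload[56];  /* pad to a plausible GC-thing size    */
};

static void
Finalize(struct Thing *t)
{
    t->map = NULL;      /* stand-in finalizer */
}

void
FinalizeArenaSketch(struct Thing *things, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        /* Start fetching the next thing before its map is needed. */
        if (i + 1 < count)
            __builtin_prefetch(&things[i + 1], 0 /* read */, 1 /* low locality */);
        if (things[i].map)          /* the obj->map test discussed above */
            Finalize(&things[i]);
    }
}
----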
Comment 6•15 years ago
Oh, and the numbers in comment 5 are for an x86 with a 6MB L2; I can't imagine that the L2 miss situation is any better on ARM.
Comment 7•15 years ago
(In reply to comment #5)
> Hm. So if shark's not lying to me, then a pretty large fraction of the time
> under FinalizeObject seems to be the !obj->map test (which also accounts for
> something like 10% of total L2 misses during GC as well).
We could fix that if failed or dead objects used a special non-null map.
> The clasp->finalize
> test is the next hotspot there (1.5% of total L2 misses).
That could also be dealt with by allocating objects with and without a finalize hook from separate GC arenas, but doing so inevitably increases heap fragmentation and the size of the working set.
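A minimal sketch of the sentinel-map idea, with hypothetical Map/Obj types rather than SpiderMonkey's actual structures: dead or failed objects point at a shared map whose finalize hook is a no-op, so the sweep loop can dispatch uniformly instead of branching on !obj->map.
----
/*
 * Hedged sketch of the "special non-null map" idea; Map, Obj and the
 * hooks are hypothetical, not SpiderMonkey structures.
 */
#include <stddef.h>

struct Obj;

struct Map {
    void (*finalize)(struct Obj *obj);  /* per-map finalize hook */
};

struct Obj {
    struct Map *map;
};

static void FinalizeNoop(struct Obj *obj) { (void) obj; }      /* hook on the sentinel map */
static void FinalizeLive(struct Obj *obj) { obj->map = NULL; }

/* Live objects point at liveMap; failed or dead objects get &deadMap
 * instead of a NULL map. */
static struct Map deadMap = { FinalizeNoop };
static struct Map liveMap = { FinalizeLive };

void
SweepSketch(struct Obj *objs, size_t count)
{
    /* No !obj->map branch: dead objects dispatch to the no-op hook. */
    for (size_t i = 0; i < count; ++i)
        objs[i].map->finalize(&objs[i]);
}
----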
Comment 8•15 years ago
> We could fix that if failed or dead objects used a special non-null map.
Maybe. If the time is actually spent in an L2 miss fetching the bits of *obj, then it doesn't matter what the actual value of map is, I'd think. Unless we think that an unconditional access would get reordered so as to start the fetch much sooner?
Comment 9•15 years ago
(In reply to comment #0)
> By hacking in profiling code using gettimeofday, I can get a break-down of the
> performance of calls to js_GC(). The following numbers came from a trivial
> JavaScript loop allocating (and then forgetting) Objects.
This test is not what happens on a typical web site; see the reference in bug 548388. That paper shows that objects are typically long-lived. That, of course, does not mean we should not optimize their finalization, but it is important that any optimization in that area does not slow down, for example, the marking phase of the GC.
Comment 10•15 years ago
Hi, just letting you know that I have filed a separate bug about a different instrumentation patch: bug 601949.
It does something different from, and covers different use cases than, the patch discussed here. It's mostly for non-JS people wanting to know whether GC pauses are responsible for a performance/smoothness issue they are having.
Updated•11 years ago
Assignee: general → nobody
Updated•3 years ago
Severity: normal → S3
Updated•2 years ago
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE