Closed
Bug 417052
Opened 17 years ago
Closed 17 years ago
Analyze quality of JS_MaybeGC heuristic
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla1.9
People
(Reporter: jorendorff, Assigned: igor)
References
Details
(Keywords: perf)
Attachments
(13 files, 5 obsolete files)
13.63 KB,
text/plain
|
Details | |
82.68 KB,
image/png
|
Details | |
157.84 KB,
image/png
|
Details | |
163.37 KB,
image/png
|
Details | |
166.68 KB,
image/png
|
Details | |
130.39 KB,
image/png
|
Details | |
145.35 KB,
image/png
|
Details | |
99.58 KB,
image/png
|
Details | |
106.14 KB,
image/png
|
Details | |
101.40 KB,
image/png
|
Details | |
111.38 KB,
image/png
|
Details | |
110.50 KB,
image/png
|
Details | |
2.35 KB,
patch
|
brendan
:
review+
brendan
:
approval1.9+
|
Details | Diff | Splinter Review |
<shaver> when running some of the sunspider tests in-browser, we trigger GC bunch
<shaver> 3d-cube is a good example, and access-nbody
<shaver> we would like to know what conditions in MaybeGC are tripping the GC
<shaver> and what the disposition of the garbage is
Comment 1•17 years ago
|
||
10:29 schrep do you think there are things we can improve r.e. gc timing?
10:30 igor Yes: it just a matter of retuning the heuristics.
10:30 igor It can take time, but is definitely fixable.
10:31 igor There are number in js_MaybeGC that one can definitely play with and the last ditch scheduling in js_NewGCThing can be altered as well.
10:31 igor I meant the last ditch *GC*
10:31 schrep yea sweet
10:32 schrep conceptually can we just run GC less often?
10:33 igor The problem now is that we run GC with very little garbage.
10:33 igor So yes, we can and should run it less often.
10:33 schrep is there an easy patch there?
10:34 igor JS_MaybeGC from jsapi.c contains:
10:34 igor bytes > 8192 && bytes > lastBytes + lastBytes / 3) ||
10:35 igor Changing that lastBytes/3 into lastBytes * x/ y with different x and y may help. For a start x = y = 1 can be a good case.
10:40 schrep k
10:46 igor There are lengthy explanations about that number. They should be updated in any case as using mmap for GC arena decreased internal heap fragmentation. As such less holes exists on the heap and less space among lastBytes wasted.
10:47 igor => changing the condition to bytes > 8192 && bytes > 2 * lastBytes can be a good start.
Assignee: general → jorendorff
Comment 2•17 years ago
|
||
--> Jason can you run with this?
Reporter | ||
Comment 4•17 years ago
|
||
The higher I push the constant, the faster it goes.
GC final speed----------------------------------
constant cycles heap size overall binary-trees nbody
======== ======= ========= ============ ============ ===========
1.33 552 1433600 (baseline) (baseline) (baseline)
2 244 1236992 10.0% faster 60% faster 58% faster
3 123 1224704 13.2% faster 57% faster 72% faster
8 29 1331200 16.6% faster 53% faster 102% faster
I'll attach a file containing SunSpider URLs in a second.
constant -- the heuristic constant in JS_MaybeGC.
GC cycles -- the number of times a call to JS_MaybeGC resulted in an actual collection during the test run
final heap size -- the value of rt->gcBytes at the end of the tests. This is the size of the SpiderMonkey GC heap; it doesn't include string buffers or anything outside the JS engine (like COM objects reachable from JS objects).
overall speed -- the SunSpider ** TOTAL ** number
binary-trees, nbody -- Times on the two tests that exercise the GC the most.
Increasing the constant from 4/3 to 3 actually *decreases* the final heap size (again, not taking into account string buffers etc.) My guess is that the higher constant gives new objects long enough to become unreachable before GC, thus allowing the GC to reclaim more completely empty pages. Don't know how plausible that is.
Pushing the constant to 8 makes the heap size go up again. We're performing GC quite rarely at this point.
Comment 5•17 years ago
|
||
(In reply to comment #4)
> Pushing the constant to 8 makes the heap size go up again. We're performing GC
> quite rarely at this point.
>
What happens if you take it to 11?
Comment 6•17 years ago
|
||
nbody
===========
102% faster
This is faster than I can get this test to run in the shell. We should consider taking a patch with a constant value of 3 while we continue to tune this, to get the noise out of our tests.
Comment 7•17 years ago
|
||
(In reply to comment #6)
> nbody
> ===========
> 102% faster
>
> This is faster than I can get this test to run in the shell. We should consider
> taking a patch with a constant value of 3 while we continue to tune this, to
> get the noise out of our tests.
>
Why not 8 - still a smaller heap than the current impl and faster.
Comment 8•17 years ago
|
||
(In reply to comment #7)
>
> Why not 8 - still a smaller heap than the current impl and faster.
Because the heap grew at 8. We'll take the smallest heap for now, and revisit speed vs. memory later, after more analysis and review.
Comment 9•17 years ago
|
||
(In reply to comment #8)
> (In reply to comment #7)
> >
> > Why not 8 - still a smaller heap than the current impl and faster.
>
> Because the heap grew at 8. We'll take the smallest heap for now, and revisit
> speed vs. memory later, after more analysis and review.
>
Also we have other patches that are going to change the speed number, so we can't make a reasonable tradeoff decision until those are in. Also note that we lose binary-trees speed with fewer cycles. Too much garbage builds up.
Comment 10•17 years ago
|
||
(In reply to comment #8)
> Because the heap grew at 8. We'll take the smallest heap for now, and revisit
> speed vs. memory later, after more analysis and review.
>
Sounds like a plan.
Reporter | ||
Comment 11•17 years ago
|
||
Whoa, slow down! I think my baseline run was abnormally bad, especially on the memory-intensive tests. I shut everything down, turned off my second monitor, rebooted, and ran the tests again.
I'll post numbers early tomorrow. To summarize: there *is* a pretty big speed gain to be had here, maybe 10% on SunSpider overall. I tested with the constant up to 12 and it kept getting faster. The "heap size" numbers in comment 4 are probably bogus though. More to come.
Reporter | ||
Comment 12•17 years ago
|
||
Checking in a constant of 3 or 4 would increase overall memory usage. (My swag is 2-3 MB in the case of starting the browser and running SunSpider once. This is based on numbers produced by "top", though, so take with a grain of salt.)
Comment 13•17 years ago
|
||
> Whoa, slow down!
> The "heap size" numbers in comment 4 are probably bogus though.
...sigh. ;)
Comment 14•17 years ago
|
||
jorendorff,
Could you post a patch with the exact constant you're tweaking? Not sure which of Igor's suggestions you're exploring. I'd like to test these against Massif and watch heap size over time.
Comment 15•17 years ago
|
||
Yea - an in particular you want to watch peak heap size during the test run
Reporter | ||
Comment 16•17 years ago
|
||
Reporter | ||
Comment 17•17 years ago
|
||
Does massif measure what we want to be measuring here?
Summary of my new run:
constant GC cycles overall speed
======== ========== =============
1.33 550 (baseline)
1.5 438 +3.9%
2 251 +8.3%
3 126 +11%
4 84 +13%
5 57 +14%
6 42 +15%
8 27 +15%
10 18 +17%
12 10 +19%
16 3 +22%
Inf 0 +22%
For the last run, I disabled JS_MaybeGC entirely. There's variability between runs, 2-3%.
Comment 18•17 years ago
|
||
Massif will give you lovely pretty pictures of the process' memory-growth over time, which is very handy. Also it will colorize/record allocation call-sites, so you can blame specific chunks of code for what they allocate.
Comment 19•17 years ago
|
||
If someone gives me a mac build with 16 as a constant I can do a talos multi-window pageload run as I was doing with Pav to tune out JEMalloc
Reporter | ||
Comment 20•17 years ago
|
||
I was equivocating yesterday because I found out that a little variability in the baseline numbers made a big difference in the results. But after running this a lot of times, I think the numbers in comment 17 are actually pretty solid.
Attaching raw SunSpider numbers, in case anybody cares... These numbers aren't terribly interesting in isolation. We need the memory usage info to make a decision.
Comment 21•17 years ago
|
||
schrep: When the TryServer is done building it, there will be a shiny Mac build with 16 ready-to-go for you.
Comment 22•17 years ago
|
||
Comment 23•17 years ago
|
||
Here's the results with the build brian provided compared to today's trunk doing a pageload run. Short summary - no difference.
Comment 24•17 years ago
|
||
Updated•17 years ago
|
Attachment #303321 -
Attachment is patch: false
Updated•17 years ago
|
Attachment #303321 -
Attachment mime type: text/plain → image/png
Comment 25•17 years ago
|
||
Comment on attachment 303321 [details]
massif graph for 3 sunspider laps, trunk
ugh, wrong graph
Attachment #303321 -
Attachment is obsolete: true
Comment 26•17 years ago
|
||
Comment 27•17 years ago
|
||
I am seeing higher terminal memory usage after a SS run. Let me do a few more cycles and see if that stays consistent.
Comment 28•17 years ago
|
||
Yep - on SS runs I see higher memory usage - but the total memory allocated does not grow after multiple runs.
On my Zimbra, GReader, Zoho tests the builds are within 1%. I'm also seeing a 1.19 improvement on SS times. I think it might be worth it to try this out and see what broader testing reveals. Also - is this possible to make an about:config pref to tune at runtime?
Comment 29•17 years ago
|
||
Comment 30•17 years ago
|
||
Yeah, we could make this about:configable without too much work.
Assignee | ||
Comment 31•17 years ago
|
||
(In reply to comment #28)
> Also - is this possible to make an
> about:config pref to tune at runtime?
This suggests to make the ratio dynamic and dd a time-dependent component to the tests. For example, if a script runs, user UI is blocked and it does not make sence to call GC unless there is a lot of garbage. On the other hand, when the system is idle, it is a perfect time to schedule a GC.
Assignee | ||
Comment 32•17 years ago
|
||
> For example, if a script runs, user UI is blocked and it does not
> make sence to call GC unless there is a lot of garbage.
I meant from GUI responsiveness point of view since GC cycles just prolongs the pause.
Comment 33•17 years ago
|
||
Comment 34•17 years ago
|
||
this one hurts max heap
Comment 35•17 years ago
|
||
Smaug has done work with jst & sicking reviewing, IIRC, on keeping GC out of the user's face, since cycle collection ups the ante.
/be
Comment 36•17 years ago
|
||
Comment on attachment 303339 [details]
massif graph for 3 sunspider laps, constant=8
pay no attention to the flat ending, I just forgot to close the browser quickly
Assignee | ||
Comment 37•17 years ago
|
||
(In reply to comment #34)
> Created an attachment (id=303339) [details]
> massif graph for 3 sunspider laps, constant=8
>
> this one hurts max heap
The data AFAIC is incomplete. JS uses mmap so to get full data JSRuntime.gcBytes must be included in the graph. Alternatively one can disable mmap when compiling. For that make sure that JS_GC_USE_MMAP define is set to 0 when compiling jsgc.c.
>
Comment 38•17 years ago
|
||
I think the most important question is how does it impact memory usage during normal browsing sessions...
Comment 39•17 years ago
|
||
Comment 40•17 years ago
|
||
(In reply to comment #38)
> I think the most important question is how does it impact memory usage during
> normal browsing sessions...
>
Oh, I'll get that too.
Comment 41•17 years ago
|
||
1.) open browser to http://google.com
2.) type "gmail.com" in the location bar
3.) sign into gmail
4.) view inbox
5.) open new window (mozilla default page)
6.) navigate new window to techcrunch.com
7.) close gmail window
8.) quit
Updated•17 years ago
|
Attachment #303353 -
Attachment description: massif data from gmail / techcrunch session → massif data from gmail / techcrunch session -- constant=1.33
Comment 42•17 years ago
|
||
Comment 43•17 years ago
|
||
Comment 44•17 years ago
|
||
Comment 45•17 years ago
|
||
Comment 46•17 years ago
|
||
So jason what's left here? I'm thinking we land 16 and see what it does in nightlies. Otherwise we've got other areas where you can help...
Reporter | ||
Comment 47•17 years ago
|
||
This changes the constant to 16. Kinda gutsy, but sayrer and I didn't see any major ill effects. We'll see if anyone complains.
Attachment #303277 -
Attachment is obsolete: true
Attachment #304238 -
Flags: review?(crowder)
Comment 48•17 years ago
|
||
Comment on attachment 304238 [details] [diff] [review]
v1
Looks good to me based on the testing in this bug.
Attachment #304238 -
Flags: review?(crowder)
Attachment #304238 -
Flags: review+
Attachment #304238 -
Flags: approval1.9?
Assignee | ||
Comment 49•17 years ago
|
||
Comment on attachment 304238 [details] [diff] [review]
v1
>- if ((bytes > 8192 && bytes > lastBytes + lastBytes / 3) ||
>+ if ((bytes > 8192 && bytes > 16 * lastBytes) ||
This can overflow with huge heap. Instead use bytes / 16 > lastBytes
Reporter | ||
Comment 50•17 years ago
|
||
Thanks to Igor for pointing this out.
Attachment #304238 -
Attachment is obsolete: true
Attachment #304240 -
Flags: review?(crowder)
Attachment #304238 -
Flags: approval1.9?
Updated•17 years ago
|
Attachment #304240 -
Flags: review?(crowder) → review+
Comment 51•17 years ago
|
||
Comment on attachment 304240 [details] [diff] [review]
v2 - fix overflow
>- if ((bytes > 8192 && bytes > lastBytes + lastBytes / 3) ||
>+ if ((bytes > 8192 && bytes / 16 > lastBytes) ||
Strongly suggest again that, for the sake of other embeddings, we parameterize using JS_SetGCParameter. A multiplier and divisor packed into 32 bits, split out into new uint16 JSRuntime members, could do it:
if (bytes > 8192 &&
(bytes * rt->gcMaybeGCMultiplier) / rt->gcMaybeGCDivisor > lastBytes) {
...
}
/be
Comment 52•17 years ago
|
||
And suggest that the defaults for the new params reflect historical norms that embeddings may have come to count on.
/be
Assignee | ||
Comment 53•17 years ago
|
||
(In reply to comment #51)
> Strongly suggest again that, for the sake of other embeddings, we parameterize
> using JS_SetGCParameter. A multiplier and divisor packed into 32 bits, split
> out into new uint16 JSRuntime members, could do it:
This means packing more and more logic into JS_MaybeGC that even in an advanced case may not suite an embedding. Note how XPConnect effectively disables the second condition JS_MaybeGC(), rt->gcMallocBytes >= rt->gcMaxMallocBytes, by setting gcMaxMallocBytes to UINT32_MAX.
So why not to allow for the embedding to make the decision for itself and call JS_GC as necessary? For that it is sufficient to expose rt->gc(Last)Bytes.
Comment 54•17 years ago
|
||
The embedding can make any decision it likes about when to call JS_GC, and I agree with you that it would be better for it to make an informed decision, rather than making a wish and calling JS_MaybeGC. JS_MaybeGC has always been a hack at best.
However, embeddings may count on it to behave as it has, and fiddling with the scaling of either term in the comparison may cause non-linearly bad space growth from too little maybe-GC'ing, *or* runtime cost from too much maybe-GC'ing. I am pleading for some semblance of API compatibility here.
If Gecko embeddings stop using JS_MaybeGC, great -- then we can leave it be, and hope to stub it out in a future where GC auto-schedules better.
/be
Reporter | ||
Comment 55•17 years ago
|
||
To avoid floating-point math (or parameters), risk of overflow, and having two parameters where there really should just be one, I arbitrarily set the current behavior = 100 on a linear scale. What we've been calling "16" will then be 4500. The formula is:
JSGC_MAYBEGC_PATIENCE parameter = (old constant - 1) * 300.
Examples:
100 = (4/3 - 1) * 300
4500 = (16 - 1) * 300
Attachment #304270 -
Flags: review?(crowder)
Reporter | ||
Updated•17 years ago
|
Attachment #304240 -
Attachment is obsolete: true
Assignee | ||
Comment 56•17 years ago
|
||
Here is an alternative that just patches nsJSEnvironment.cpp to call JS_GC() directly as necessary. It access JSRuntime members directly for now.
Assignee: jorendorff → igor
Attachment #304272 -
Flags: review?(brendan)
Reporter | ||
Updated•17 years ago
|
Attachment #304270 -
Attachment is obsolete: true
Attachment #304270 -
Flags: review?(crowder)
Comment 57•17 years ago
|
||
Comment on attachment 304272 [details] [diff] [review]
custom MaybeGC for dom
Great, who could object? I do not see any other JS_MaybeGC uses in Gecko.
/be
Attachment #304272 -
Flags: review?(brendan)
Attachment #304272 -
Flags: review+
Attachment #304272 -
Flags: approval1.9+
Comment 58•17 years ago
|
||
The patch makes me wonder whether JSRuntime.gcBytes, etc. should be size_t not uint32 -- do we cope with very large heaps on 64-bit systems well, or at all?
/be
Updated•17 years ago
|
Flags: blocking1.9+
Target Milestone: --- → mozilla1.9beta4
Comment 59•17 years ago
|
||
I checked this in to get data ASAP
Checking in nsJSEnvironment.cpp;
/cvsroot/mozilla/dom/src/base/nsJSEnvironment.cpp,v <-- nsJSEnvironment.cpp
new revision: 1.390; previous revision: 1.389
done
Target Milestone: mozilla1.9beta4 → ---
Updated•17 years ago
|
Target Milestone: --- → mozilla1.9beta4
Comment 60•17 years ago
|
||
http://graphs.mozilla.org/#spst=range&spss=1203465506.9558232&spse=1203474760&spstart=1202322756&spend=1203474760&bpst=Cursor&bpstart=1203459220&bpend=1203474760&m1tid=134821&m1bl=0&m1avg=0
xp02 has reported in. 845 pre vs 649 post (1.3x faster). So not up to vlad's 2 line 3x faster standards. But not bad :-)
Private-bytes looks unchanged:
http://graphs.mozilla.org/#spst=range&spss=1203421819.566265&spse=1203474760&spstart=1196883676&spend=1203474760&bpst=Cursor&bpstart=1203421819.566265&bpend=1203474760&m1tid=53258&m1bl=0&m1avg=0
If we leave this in overnight I can repeat in-browser mem tests with tomorrow's nightly.
Comment 61•17 years ago
|
||
About at 2% Ts win on windows as well:
http://graphs.mozilla.org/#spst=range&spss=1203466265.3052208&spse=1203481173.8323293&spstart=1202931767&spend=1203481726&bpst=Cursor&bpstart=1203466265.3052208&bpend=1203481173.8323293&m1tid=148133&m1bl=0&m1avg=0&m2tid=148131&m2bl=0&m2avg=0&m3tid=148165&m3bl=0&m3avg=0
Assignee | ||
Comment 62•17 years ago
|
||
(In reply to comment #60)
> Private-bytes looks unchanged:
Does private test account for memory allocated with direct mmap/Virtualalloc calls?
Comment 63•17 years ago
|
||
Moving bugs that aren't beta 4 blockers to target final release.
Target Milestone: mozilla1.9beta4 → mozilla1.9
Comment 64•17 years ago
|
||
This has landed. Please file followup bugs if necessary.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Updated•17 years ago
|
Flags: in-testsuite-
Flags: in-litmus-
You need to log in
before you can comment on or make changes to this bug.
Description
•