Closed Bug 417052 Opened 17 years ago Closed 17 years ago

Analyze quality of JS_MaybeGC heuristic

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9

People

(Reporter: jorendorff, Assigned: igor)

References

Details

(Keywords: perf)

Attachments

(13 files, 5 obsolete files)

13.63 KB, text/plain
Details
82.68 KB, image/png
Details
157.84 KB, image/png
Details
163.37 KB, image/png
Details
166.68 KB, image/png
Details
130.39 KB, image/png
Details
145.35 KB, image/png
Details
99.58 KB, image/png
Details
106.14 KB, image/png
Details
101.40 KB, image/png
Details
111.38 KB, image/png
Details
110.50 KB, image/png
Details
2.35 KB, patch
brendan
: review+
brendan
: approval1.9+
Details | Diff | Splinter Review
<shaver> when running some of the sunspider tests in-browser, we trigger GC bunch <shaver> 3d-cube is a good example, and access-nbody <shaver> we would like to know what conditions in MaybeGC are tripping the GC <shaver> and what the disposition of the garbage is
10:29 schrep do you think there are things we can improve r.e. gc timing? 10:30 igor Yes: it just a matter of retuning the heuristics. 10:30 igor It can take time, but is definitely fixable. 10:31 igor There are number in js_MaybeGC that one can definitely play with and the last ditch scheduling in js_NewGCThing can be altered as well. 10:31 igor I meant the last ditch *GC* 10:31 schrep yea sweet 10:32 schrep conceptually can we just run GC less often? 10:33 igor The problem now is that we run GC with very little garbage. 10:33 igor So yes, we can and should run it less often. 10:33 schrep is there an easy patch there? 10:34 igor JS_MaybeGC from jsapi.c contains: 10:34 igor bytes > 8192 && bytes > lastBytes + lastBytes / 3) || 10:35 igor Changing that lastBytes/3 into lastBytes * x/ y with different x and y may help. For a start x = y = 1 can be a good case. 10:40 schrep k 10:46 igor There are lengthy explanations about that number. They should be updated in any case as using mmap for GC arena decreased internal heap fragmentation. As such less holes exists on the heap and less space among lastBytes wasted. 10:47 igor => changing the condition to bytes > 8192 && bytes > 2 * lastBytes can be a good start.
Assignee: general → jorendorff
--> Jason can you run with this?
Yes.
Status: NEW → ASSIGNED
The higher I push the constant, the faster it goes. GC final speed---------------------------------- constant cycles heap size overall binary-trees nbody ======== ======= ========= ============ ============ =========== 1.33 552 1433600 (baseline) (baseline) (baseline) 2 244 1236992 10.0% faster 60% faster 58% faster 3 123 1224704 13.2% faster 57% faster 72% faster 8 29 1331200 16.6% faster 53% faster 102% faster I'll attach a file containing SunSpider URLs in a second. constant -- the heuristic constant in JS_MaybeGC. GC cycles -- the number of times a call to JS_MaybeGC resulted in an actual collection during the test run final heap size -- the value of rt->gcBytes at the end of the tests. This is the size of the SpiderMonkey GC heap; it doesn't include string buffers or anything outside the JS engine (like COM objects reachable from JS objects). overall speed -- the SunSpider ** TOTAL ** number binary-trees, nbody -- Times on the two tests that exercise the GC the most. Increasing the constant from 4/3 to 3 actually *decreases* the final heap size (again, not taking into account string buffers etc.) My guess is that the higher constant gives new objects long enough to become unreachable before GC, thus allowing the GC to reclaim more completely empty pages. Don't know how plausible that is. Pushing the constant to 8 makes the heap size go up again. We're performing GC quite rarely at this point.
(In reply to comment #4) > Pushing the constant to 8 makes the heap size go up again. We're performing GC > quite rarely at this point. > What happens if you take it to 11?
nbody =========== 102% faster This is faster than I can get this test to run in the shell. We should consider taking a patch with a constant value of 3 while we continue to tune this, to get the noise out of our tests.
(In reply to comment #6) > nbody > =========== > 102% faster > > This is faster than I can get this test to run in the shell. We should consider > taking a patch with a constant value of 3 while we continue to tune this, to > get the noise out of our tests. > Why not 8 - still a smaller heap than the current impl and faster.
(In reply to comment #7) > > Why not 8 - still a smaller heap than the current impl and faster. Because the heap grew at 8. We'll take the smallest heap for now, and revisit speed vs. memory later, after more analysis and review.
(In reply to comment #8) > (In reply to comment #7) > > > > Why not 8 - still a smaller heap than the current impl and faster. > > Because the heap grew at 8. We'll take the smallest heap for now, and revisit > speed vs. memory later, after more analysis and review. > Also we have other patches that are going to change the speed number, so we can't make a reasonable tradeoff decision until those are in. Also note that we lose binary-trees speed with fewer cycles. Too much garbage builds up.
(In reply to comment #8) > Because the heap grew at 8. We'll take the smallest heap for now, and revisit > speed vs. memory later, after more analysis and review. > Sounds like a plan.
Whoa, slow down! I think my baseline run was abnormally bad, especially on the memory-intensive tests. I shut everything down, turned off my second monitor, rebooted, and ran the tests again. I'll post numbers early tomorrow. To summarize: there *is* a pretty big speed gain to be had here, maybe 10% on SunSpider overall. I tested with the constant up to 12 and it kept getting faster. The "heap size" numbers in comment 4 are probably bogus though. More to come.
Checking in a constant of 3 or 4 would increase overall memory usage. (My swag is 2-3 MB in the case of starting the browser and running SunSpider once. This is based on numbers produced by "top", though, so take with a grain of salt.)
> Whoa, slow down! > The "heap size" numbers in comment 4 are probably bogus though. ...sigh. ;)
jorendorff, Could you post a patch with the exact constant you're tweaking? Not sure which of Igor's suggestions you're exploring. I'd like to test these against Massif and watch heap size over time.
Yea - an in particular you want to watch peak heap size during the test run
Attached patch patch showing what I'm tuning (obsolete) — Splinter Review
Does massif measure what we want to be measuring here? Summary of my new run: constant GC cycles overall speed ======== ========== ============= 1.33 550 (baseline) 1.5 438 +3.9% 2 251 +8.3% 3 126 +11% 4 84 +13% 5 57 +14% 6 42 +15% 8 27 +15% 10 18 +17% 12 10 +19% 16 3 +22% Inf 0 +22% For the last run, I disabled JS_MaybeGC entirely. There's variability between runs, 2-3%.
Massif will give you lovely pretty pictures of the process' memory-growth over time, which is very handy. Also it will colorize/record allocation call-sites, so you can blame specific chunks of code for what they allocate.
If someone gives me a mac build with 16 as a constant I can do a talos multi-window pageload run as I was doing with Pav to tune out JEMalloc
Attached file SunSpider results
I was equivocating yesterday because I found out that a little variability in the baseline numbers made a big difference in the results. But after running this a lot of times, I think the numbers in comment 17 are actually pretty solid. Attaching raw SunSpider numbers, in case anybody cares... These numbers aren't terribly interesting in isolation. We need the memory usage info to make a decision.
schrep: When the TryServer is done building it, there will be a shiny Mac build with 16 ready-to-go for you.
Here's the results with the build brian provided compared to today's trunk doing a pageload run. Short summary - no difference.
Attached image massif graph for 3 sunspider laps, trunk (obsolete) —
Attachment #303321 - Attachment is patch: false
Attachment #303321 - Attachment mime type: text/plain → image/png
Comment on attachment 303321 [details] massif graph for 3 sunspider laps, trunk ugh, wrong graph
Attachment #303321 - Attachment is obsolete: true
I am seeing higher terminal memory usage after a SS run. Let me do a few more cycles and see if that stays consistent.
Yep - on SS runs I see higher memory usage - but the total memory allocated does not grow after multiple runs. On my Zimbra, GReader, Zoho tests the builds are within 1%. I'm also seeing a 1.19 improvement on SS times. I think it might be worth it to try this out and see what broader testing reveals. Also - is this possible to make an about:config pref to tune at runtime?
Yeah, we could make this about:configable without too much work.
(In reply to comment #28) > Also - is this possible to make an > about:config pref to tune at runtime? This suggests to make the ratio dynamic and dd a time-dependent component to the tests. For example, if a script runs, user UI is blocked and it does not make sence to call GC unless there is a lot of garbage. On the other hand, when the system is idle, it is a perfect time to schedule a GC.
> For example, if a script runs, user UI is blocked and it does not > make sence to call GC unless there is a lot of garbage. I meant from GUI responsiveness point of view since GC cycles just prolongs the pause.
this one hurts max heap
Smaug has done work with jst & sicking reviewing, IIRC, on keeping GC out of the user's face, since cycle collection ups the ante. /be
Comment on attachment 303339 [details] massif graph for 3 sunspider laps, constant=8 pay no attention to the flat ending, I just forgot to close the browser quickly
(In reply to comment #34) > Created an attachment (id=303339) [details] > massif graph for 3 sunspider laps, constant=8 > > this one hurts max heap The data AFAIC is incomplete. JS uses mmap so to get full data JSRuntime.gcBytes must be included in the graph. Alternatively one can disable mmap when compiling. For that make sure that JS_GC_USE_MMAP define is set to 0 when compiling jsgc.c. >
I think the most important question is how does it impact memory usage during normal browsing sessions...
(In reply to comment #38) > I think the most important question is how does it impact memory usage during > normal browsing sessions... > Oh, I'll get that too.
1.) open browser to http://google.com 2.) type "gmail.com" in the location bar 3.) sign into gmail 4.) view inbox 5.) open new window (mozilla default page) 6.) navigate new window to techcrunch.com 7.) close gmail window 8.) quit
Attachment #303353 - Attachment description: massif data from gmail / techcrunch session → massif data from gmail / techcrunch session -- constant=1.33
So jason what's left here? I'm thinking we land 16 and see what it does in nightlies. Otherwise we've got other areas where you can help...
Attached patch v1 (obsolete) — Splinter Review
This changes the constant to 16. Kinda gutsy, but sayrer and I didn't see any major ill effects. We'll see if anyone complains.
Attachment #303277 - Attachment is obsolete: true
Attachment #304238 - Flags: review?(crowder)
Comment on attachment 304238 [details] [diff] [review] v1 Looks good to me based on the testing in this bug.
Attachment #304238 - Flags: review?(crowder)
Attachment #304238 - Flags: review+
Attachment #304238 - Flags: approval1.9?
Comment on attachment 304238 [details] [diff] [review] v1 >- if ((bytes > 8192 && bytes > lastBytes + lastBytes / 3) || >+ if ((bytes > 8192 && bytes > 16 * lastBytes) || This can overflow with huge heap. Instead use bytes / 16 > lastBytes
Attached patch v2 - fix overflow (obsolete) — Splinter Review
Thanks to Igor for pointing this out.
Attachment #304238 - Attachment is obsolete: true
Attachment #304240 - Flags: review?(crowder)
Attachment #304238 - Flags: approval1.9?
Attachment #304240 - Flags: review?(crowder) → review+
Comment on attachment 304240 [details] [diff] [review] v2 - fix overflow >- if ((bytes > 8192 && bytes > lastBytes + lastBytes / 3) || >+ if ((bytes > 8192 && bytes / 16 > lastBytes) || Strongly suggest again that, for the sake of other embeddings, we parameterize using JS_SetGCParameter. A multiplier and divisor packed into 32 bits, split out into new uint16 JSRuntime members, could do it: if (bytes > 8192 && (bytes * rt->gcMaybeGCMultiplier) / rt->gcMaybeGCDivisor > lastBytes) { ... } /be
And suggest that the defaults for the new params reflect historical norms that embeddings may have come to count on. /be
(In reply to comment #51) > Strongly suggest again that, for the sake of other embeddings, we parameterize > using JS_SetGCParameter. A multiplier and divisor packed into 32 bits, split > out into new uint16 JSRuntime members, could do it: This means packing more and more logic into JS_MaybeGC that even in an advanced case may not suite an embedding. Note how XPConnect effectively disables the second condition JS_MaybeGC(), rt->gcMallocBytes >= rt->gcMaxMallocBytes, by setting gcMaxMallocBytes to UINT32_MAX. So why not to allow for the embedding to make the decision for itself and call JS_GC as necessary? For that it is sufficient to expose rt->gc(Last)Bytes.
The embedding can make any decision it likes about when to call JS_GC, and I agree with you that it would be better for it to make an informed decision, rather than making a wish and calling JS_MaybeGC. JS_MaybeGC has always been a hack at best. However, embeddings may count on it to behave as it has, and fiddling with the scaling of either term in the comparison may cause non-linearly bad space growth from too little maybe-GC'ing, *or* runtime cost from too much maybe-GC'ing. I am pleading for some semblance of API compatibility here. If Gecko embeddings stop using JS_MaybeGC, great -- then we can leave it be, and hope to stub it out in a future where GC auto-schedules better. /be
Attached patch v3 - parameterized (obsolete) — Splinter Review
To avoid floating-point math (or parameters), risk of overflow, and having two parameters where there really should just be one, I arbitrarily set the current behavior = 100 on a linear scale. What we've been calling "16" will then be 4500. The formula is: JSGC_MAYBEGC_PATIENCE parameter = (old constant - 1) * 300. Examples: 100 = (4/3 - 1) * 300 4500 = (16 - 1) * 300
Attachment #304270 - Flags: review?(crowder)
Attachment #304240 - Attachment is obsolete: true
Here is an alternative that just patches nsJSEnvironment.cpp to call JS_GC() directly as necessary. It access JSRuntime members directly for now.
Assignee: jorendorff → igor
Attachment #304272 - Flags: review?(brendan)
Attachment #304270 - Attachment is obsolete: true
Attachment #304270 - Flags: review?(crowder)
Comment on attachment 304272 [details] [diff] [review] custom MaybeGC for dom Great, who could object? I do not see any other JS_MaybeGC uses in Gecko. /be
Attachment #304272 - Flags: review?(brendan)
Attachment #304272 - Flags: review+
Attachment #304272 - Flags: approval1.9+
The patch makes me wonder whether JSRuntime.gcBytes, etc. should be size_t not uint32 -- do we cope with very large heaps on 64-bit systems well, or at all? /be
Flags: blocking1.9+
Target Milestone: --- → mozilla1.9beta4
I checked this in to get data ASAP Checking in nsJSEnvironment.cpp; /cvsroot/mozilla/dom/src/base/nsJSEnvironment.cpp,v <-- nsJSEnvironment.cpp new revision: 1.390; previous revision: 1.389 done
Target Milestone: mozilla1.9beta4 → ---
Target Milestone: --- → mozilla1.9beta4
(In reply to comment #60) > Private-bytes looks unchanged: Does private test account for memory allocated with direct mmap/Virtualalloc calls?
Moving bugs that aren't beta 4 blockers to target final release.
Target Milestone: mozilla1.9beta4 → mozilla1.9
This has landed. Please file followup bugs if necessary.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Flags: in-litmus-
Depends on: 446379
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: