Last Comment Bug 731837 - GC: investigate pause time increase between Aurora and Nightly
: GC: investigate pause time increase between Aurora and Nightly
Status: RESOLVED FIXED
[qa-]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla14
Assigned To: Bill McCloskey (:billm)
:
Mentors:
Depends on: 759880
Blocks: IncrementalGC
  Show dependency treegraph
 
Reported: 2012-02-29 15:43 PST by Terrence Cole [:terrence]
Modified: 2012-05-30 13:31 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
unaffected
+
fixed
+
fixed


Attachments
patch (6.34 KB, patch)
2012-04-12 16:24 PDT, Bill McCloskey (:billm)
no flags Details | Diff | Splinter Review
patch v2 (6.31 KB, patch)
2012-04-13 14:00 PDT, Bill McCloskey (:billm)
igor: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Terrence Cole [:terrence] 2012-02-29 15:43:46 PST
Split off from bug 731398.
Comment 1 David Mandelin [:dmandelin] 2012-02-29 17:36:31 PST
Can you provide a bit more description? (And please nominate for tracking if you think it should be tracked.)
Comment 2 Terrence Cole [:terrence] 2012-02-29 18:09:31 PST
The relevant text is: "From Aurora to Nightly the GC pause time seems to increase. The GC log shows only *RESET* GCs."

This is an incremental reset bug.  I opened this as a reminder to re-check the raytracer pause times after Bill fixes the other incremental reset bugs.
Comment 3 David Mandelin [:dmandelin] 2012-03-14 19:05:26 PDT
Fixed for now by the disabling of IGC.
Comment 4 Gregor Wagner [:gwagner] 2012-03-15 16:21:35 PDT
(In reply to David Mandelin from comment #3)
> Fixed for now by the disabling of IGC.

I don't see that this is fixed. The pause time for the realtime raytracer on Aurora (3/13) is about 85msec and Nightly with IGC disabled about 120 msec.
Comment 5 David Mandelin [:dmandelin] 2012-03-15 16:36:45 PDT
(In reply to Gregor Wagner [:gwagner] from comment #4)
> (In reply to David Mandelin from comment #3)
> > Fixed for now by the disabling of IGC.
> 
> I don't see that this is fixed. The pause time for the realtime raytracer on
> Aurora (3/13) is about 85msec and Nightly with IGC disabled about 120 msec.

Which versions should we be tracking? I think Aurora was 12 when you originally reported this but it's 13 now.
Comment 6 Gregor Wagner [:gwagner] 2012-03-15 16:48:17 PDT
(In reply to David Mandelin from comment #5)
> (In reply to Gregor Wagner [:gwagner] from comment #4)
> > (In reply to David Mandelin from comment #3)
> > > Fixed for now by the disabling of IGC.
> > 
> > I don't see that this is fixed. The pause time for the realtime raytracer on
> > Aurora (3/13) is about 85msec and Nightly with IGC disabled about 120 msec.
> 
> Which versions should we be tracking? I think Aurora was 12 when you
> originally reported this but it's 13 now.

Todays Aurora build is still 12. So 12 is not affected. Nightly is 14 but even with IGC disabled I see the regression. So I guess once Aurora 13 is out it will have the same regression.
Comment 7 Gregor Wagner [:gwagner] 2012-03-15 16:59:53 PDT
I am on the update-channel for Aurora.
Comment 8 David Mandelin [:dmandelin] 2012-03-15 17:00:52 PDT
(In reply to Gregor Wagner [:gwagner] from comment #6)
> Todays Aurora build is still 12. So 12 is not affected. Nightly is 14 but
> even with IGC disabled I see the regression. So I guess once Aurora 13 is
> out it will have the same regression.

Thanks. OK, back on the radar.
Comment 9 Bill McCloskey (:billm) 2012-03-16 18:17:18 PDT
Terrence tracked this down to the incremental GC landing. Before incremental GC landed, we were doing compartmental GCs. Now we're doing global GCs. I'll check on Monday if fixing that completely eliminates the regression.
Comment 10 Gregor Wagner [:gwagner] 2012-03-28 17:36:12 PDT
Also the flight of the navigator seems to see this regression. The average GC time increased from 70msec to 120msec
Comment 11 David Mandelin [:dmandelin] 2012-04-12 11:46:54 PDT
(In reply to Bill McCloskey (:billm) from comment #9)
> Terrence tracked this down to the incremental GC landing. Before incremental
> GC landed, we were doing compartmental GCs. Now we're doing global GCs. I'll
> check on Monday if fixing that completely eliminates the regression.

Did it fix it?
Comment 12 Bill McCloskey (:billm) 2012-04-12 16:24:41 PDT
Created attachment 614607 [details] [diff] [review]
patch

There were two issues here. The first was that we were doing full GCs instead of compartment GCs. This was easy to fix.

The other problem is that the marking fast path got a little slower. This patch makes some changes that bring us back to pre-incremental GC speed on my machine. I had to remove one of the isOverBudget checks to do this. It's possible that this will cause some incremental GC slices to be too long. However, we can address that when incremental GC is ready to be turned on again.
Comment 13 Igor Bukanov 2012-04-13 05:04:02 PDT
Comment on attachment 614607 [details] [diff] [review]
patch

Review of attachment 614607 [details] [diff] [review]:
-----------------------------------------------------------------

Is this a branch patch?

::: js/src/jsgcmark.cpp
@@ +1055,3 @@
>      }
>  
>      if (tag == ObjectTag) {

Use JS_LIKELY here so the compiler can optimize even without PGO. Without that GCC assumes that all values of a tag are likely. Based on that it may even push the code under "if" out of the fast path. With that change processMarkStackOther does not need JS_NEVER_INLINE.

@@ -1058,5 @@
>    scan_value_array:
>      JS_ASSERT(vp <= end);
>      while (vp != end) {
> -        budget.step();
> -        if (budget.isOverBudget()) {

I am surprised that this extra check costs so much. Have you tried to use JS_LIKELY in isOverBudget on the fast path and move budget.step() after the isOverBudget check?
Comment 14 Bill McCloskey (:billm) 2012-04-13 14:00:45 PDT
Created attachment 614906 [details] [diff] [review]
patch v2

The good news is that the JS_NEVER_INLINE turned out to be unnecessary. I added it along with some other stuff and never measured how much it mattered.

The bad news is that I added a JS_LIKELY to the (tag == ObjectTag) branch. That actually made us go a little bit slower (83ms instead of 81ms for a GC).

I also tried adding back the isOverBudget check to the scan_value_array section, but with JS_UNLIKELY around it. That increased the mark time to about about 100ms, which is about the same as it was without the JS_UNLIKELY.

I've seen JS_LIKELY help in some places, but unfortunately it doesn't seem to be having any effect here.

And regarding branches, I think we should try to land this on FF14 and aurora.
Comment 16 Bill McCloskey (:billm) 2012-04-17 12:45:22 PDT
Comment on attachment 614906 [details] [diff] [review]
patch v2

[Approval Request Comment]
Regression caused by (bug #): Bug 641025
User impact if declined: Slower GC mark times, longer GC pauses
Testing completed (on m-c, etc.): Landed on m-c
Risk to taking this patch (and alternatives if risky): Pretty low risk. It moves some code around and makes a few semantic changes that should only affect incremental GC, which is disabled for FF13.
String changes made by this patch: None
Comment 18 Lukas Blakk [:lsblakk] use ?needinfo 2012-04-18 15:21:50 PDT
[Triage comment]
low risk, incremental GC disabled in 13 so we shouldn't see issues there.
Comment 19 Bill McCloskey (:billm) 2012-04-23 17:32:52 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/480c0fe7f591
Comment 20 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-10 12:23:40 PDT
Is this something QA can verify?
Comment 21 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-29 10:39:55 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #20)
> Is this something QA can verify?

Taking silence as a 'no' and marking [qa-]. Please mark as [qa+] if there is something QA can do to verify this fix.

Note You need to log in before you can comment on or make changes to this bug.