Last Comment Bug 731837 - GC: investigate pause time increase between Aurora and Nightly
: GC: investigate pause time increase between Aurora and Nightly
Status: RESOLVED FIXED
[qa-]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
-- normal (vote)
: mozilla14
Assigned To: Bill McCloskey (:billm)
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on: 759880
Blocks: IncrementalGC
  Show dependency treegraph
 
Reported: 2012-02-29 15:43 PST by Terrence Cole [:terrence]
Modified: 2012-05-30 13:31 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
unaffected
+
fixed
+
fixed


Attachments
patch (6.34 KB, patch)
2012-04-12 16:24 PDT, Bill McCloskey (:billm)
no flags Details | Diff | Splinter Review
patch v2 (6.31 KB, patch)
2012-04-13 14:00 PDT, Bill McCloskey (:billm)
igor: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description User image Terrence Cole [:terrence] 2012-02-29 15:43:46 PST
Split off from bug 731398.
Comment 1 User image David Mandelin [:dmandelin] 2012-02-29 17:36:31 PST
Can you provide a bit more description? (And please nominate for tracking if you think it should be tracked.)
Comment 2 User image Terrence Cole [:terrence] 2012-02-29 18:09:31 PST
The relevant text is: "From Aurora to Nightly the GC pause time seems to increase. The GC log shows only *RESET* GCs."

This is an incremental reset bug.  I opened this as a reminder to re-check the raytracer pause times after Bill fixes the other incremental reset bugs.
Comment 3 User image David Mandelin [:dmandelin] 2012-03-14 19:05:26 PDT
Fixed for now by the disabling of IGC.
Comment 4 User image Gregor Wagner [:gwagner] 2012-03-15 16:21:35 PDT
(In reply to David Mandelin from comment #3)
> Fixed for now by the disabling of IGC.

I don't see that this is fixed. The pause time for the realtime raytracer on Aurora (3/13) is about 85msec and Nightly with IGC disabled about 120 msec.
Comment 5 User image David Mandelin [:dmandelin] 2012-03-15 16:36:45 PDT
(In reply to Gregor Wagner [:gwagner] from comment #4)
> (In reply to David Mandelin from comment #3)
> > Fixed for now by the disabling of IGC.
> 
> I don't see that this is fixed. The pause time for the realtime raytracer on
> Aurora (3/13) is about 85msec and Nightly with IGC disabled about 120 msec.

Which versions should we be tracking? I think Aurora was 12 when you originally reported this but it's 13 now.
Comment 6 User image Gregor Wagner [:gwagner] 2012-03-15 16:48:17 PDT
(In reply to David Mandelin from comment #5)
> (In reply to Gregor Wagner [:gwagner] from comment #4)
> > (In reply to David Mandelin from comment #3)
> > > Fixed for now by the disabling of IGC.
> > 
> > I don't see that this is fixed. The pause time for the realtime raytracer on
> > Aurora (3/13) is about 85msec and Nightly with IGC disabled about 120 msec.
> 
> Which versions should we be tracking? I think Aurora was 12 when you
> originally reported this but it's 13 now.

Todays Aurora build is still 12. So 12 is not affected. Nightly is 14 but even with IGC disabled I see the regression. So I guess once Aurora 13 is out it will have the same regression.
Comment 7 User image Gregor Wagner [:gwagner] 2012-03-15 16:59:53 PDT
I am on the update-channel for Aurora.
Comment 8 User image David Mandelin [:dmandelin] 2012-03-15 17:00:52 PDT
(In reply to Gregor Wagner [:gwagner] from comment #6)
> Todays Aurora build is still 12. So 12 is not affected. Nightly is 14 but
> even with IGC disabled I see the regression. So I guess once Aurora 13 is
> out it will have the same regression.

Thanks. OK, back on the radar.
Comment 9 User image Bill McCloskey (:billm) 2012-03-16 18:17:18 PDT
Terrence tracked this down to the incremental GC landing. Before incremental GC landed, we were doing compartmental GCs. Now we're doing global GCs. I'll check on Monday if fixing that completely eliminates the regression.
Comment 10 User image Gregor Wagner [:gwagner] 2012-03-28 17:36:12 PDT
Also the flight of the navigator seems to see this regression. The average GC time increased from 70msec to 120msec
Comment 11 User image David Mandelin [:dmandelin] 2012-04-12 11:46:54 PDT
(In reply to Bill McCloskey (:billm) from comment #9)
> Terrence tracked this down to the incremental GC landing. Before incremental
> GC landed, we were doing compartmental GCs. Now we're doing global GCs. I'll
> check on Monday if fixing that completely eliminates the regression.

Did it fix it?
Comment 12 User image Bill McCloskey (:billm) 2012-04-12 16:24:41 PDT
Created attachment 614607 [details] [diff] [review]
patch

There were two issues here. The first was that we were doing full GCs instead of compartment GCs. This was easy to fix.

The other problem is that the marking fast path got a little slower. This patch makes some changes that bring us back to pre-incremental GC speed on my machine. I had to remove one of the isOverBudget checks to do this. It's possible that this will cause some incremental GC slices to be too long. However, we can address that when incremental GC is ready to be turned on again.
Comment 13 User image Igor Bukanov 2012-04-13 05:04:02 PDT
Comment on attachment 614607 [details] [diff] [review]
patch

Review of attachment 614607 [details] [diff] [review]:
-----------------------------------------------------------------

Is this a branch patch?

::: js/src/jsgcmark.cpp
@@ +1055,3 @@
>      }
>  
>      if (tag == ObjectTag) {

Use JS_LIKELY here so the compiler can optimize even without PGO. Without that GCC assumes that all values of a tag are likely. Based on that it may even push the code under "if" out of the fast path. With that change processMarkStackOther does not need JS_NEVER_INLINE.

@@ -1058,5 @@
>    scan_value_array:
>      JS_ASSERT(vp <= end);
>      while (vp != end) {
> -        budget.step();
> -        if (budget.isOverBudget()) {

I am surprised that this extra check costs so much. Have you tried to use JS_LIKELY in isOverBudget on the fast path and move budget.step() after the isOverBudget check?
Comment 14 User image Bill McCloskey (:billm) 2012-04-13 14:00:45 PDT
Created attachment 614906 [details] [diff] [review]
patch v2

The good news is that the JS_NEVER_INLINE turned out to be unnecessary. I added it along with some other stuff and never measured how much it mattered.

The bad news is that I added a JS_LIKELY to the (tag == ObjectTag) branch. That actually made us go a little bit slower (83ms instead of 81ms for a GC).

I also tried adding back the isOverBudget check to the scan_value_array section, but with JS_UNLIKELY around it. That increased the mark time to about about 100ms, which is about the same as it was without the JS_UNLIKELY.

I've seen JS_LIKELY help in some places, but unfortunately it doesn't seem to be having any effect here.

And regarding branches, I think we should try to land this on FF14 and aurora.
Comment 16 User image Bill McCloskey (:billm) 2012-04-17 12:45:22 PDT
Comment on attachment 614906 [details] [diff] [review]
patch v2

[Approval Request Comment]
Regression caused by (bug #): Bug 641025
User impact if declined: Slower GC mark times, longer GC pauses
Testing completed (on m-c, etc.): Landed on m-c
Risk to taking this patch (and alternatives if risky): Pretty low risk. It moves some code around and makes a few semantic changes that should only affect incremental GC, which is disabled for FF13.
String changes made by this patch: None
Comment 18 User image Lukas Blakk [:lsblakk] use ?needinfo 2012-04-18 15:21:50 PDT
[Triage comment]
low risk, incremental GC disabled in 13 so we shouldn't see issues there.
Comment 19 User image Bill McCloskey (:billm) 2012-04-23 17:32:52 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/480c0fe7f591
Comment 20 User image Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-10 12:23:40 PDT
Is this something QA can verify?
Comment 21 User image Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-05-29 10:39:55 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #20)
> Is this something QA can verify?

Taking silence as a 'no' and marking [qa-]. Please mark as [qa+] if there is something QA can do to verify this fix.

Note You need to log in before you can comment on or make changes to this bug.