Last Comment Bug 762580 - Pageload responsiveness perf regressions, likely from IGC
: Pageload responsiveness perf regressions, likely from IGC
Status: RESOLVED FIXED
[js:inv:p1]
: regression
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: mozilla17
Assigned To: Bill McCloskey (:billm)
:
:
Mentors:
Depends on:
Blocks: 735099
  Show dependency treegraph
 
Reported: 2012-06-07 10:41 PDT by Boris Zbarsky [:bz] (still a bit busy)
Modified: 2012-08-27 10:52 PDT (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
wontfix
fixed


Attachments
patch (1.02 KB, patch)
2012-08-09 11:12 PDT, Bill McCloskey (:billm)
continuation: review+
Details | Diff | Splinter Review

Description Boris Zbarsky [:bz] (still a bit busy) 2012-06-07 10:41:04 PDT
There are a bunch of huge (40+%) pageload test responsiveness regressions on inbound and m-c in the range when IGC landed.  See m.d.tree-management for May 32 and June 1, but a sample:

  Tp5 Row Major Responsiveness MozAfterPaint increase 41.3% on
    XP Mozilla-Inbound-Non-PGO
  Tp5 Row Major Responsiveness MozAfterPaint increase 36.7% on
    Win7 Mozilla-Inbound-Non-PGO
  Tp5 Row Major Responsiveness MozAfterPaint increase 37.7% on
    Win7 Mozilla-Inbound
  Tp5 Row Major Responsiveness MozAfterPaint increase 67.7% on
    MacOSX 10.6 (rev4) Mozilla-Inbound
  
oddly, nothing on Linux.

The range for that Mac regression, btw, _only_ includes IGC.
Comment 1 David Mandelin [:dmandelin] 2012-07-30 17:45:04 PDT
Bill, do you know what this is? I find it hard to believe it's real, given that nothing has been noticed otherwise. The numbers also look pretty crazy. Whatever it's measuring, it's hard to believe that it's 1000ms on NT 5.1, especially if it's really 500 ms on NT 5.2.
Comment 2 Andrew McCreight [:mccr8] 2012-07-30 17:47:02 PDT
IGC was disabled on 15, so I'd imagine there should be a corresponding drop in this metric when that happened, then another regression when it was turned on again.  Something to look into at least.
Comment 3 David Mandelin [:dmandelin] 2012-07-30 18:28:41 PDT
(In reply to Andrew McCreight [:mccr8] from comment #2)
> IGC was disabled on 15, so I'd imagine there should be a corresponding drop
> in this metric when that happened, then another regression when it was
> turned on again.  Something to look into at least.

I do see "up then down" on Aurora, e.g.:

http://graphs.mozilla.org/graph.html#tests=[[196,1,19],[196,1,12],[196,1,1],[196,1,13],[196,1,21],[196,52,12],[196,52,1],[196,52,21],[196,52,13]]&sel=none&displayrange=90&datatype=running
Comment 4 Bill McCloskey (:billm) 2012-08-03 18:45:33 PDT
It looks like this was just a change I made to nsDOMWindowUtils::GarbageCollect. I don't know why it would affect benchmarks like this. Anyway, I'm pretty sure I can revert it since it was just papering over some IGC bugs that have now been fixed.
Comment 5 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-08-03 19:33:04 PDT
(I'm assuming the clobbering of the status flags was accidental)
Comment 6 Bill McCloskey (:billm) 2012-08-09 11:12:15 PDT
Created attachment 650621 [details] [diff] [review]
patch

This seems like it makes the regression go away. We should really fix the tests so that they're not benchmarking an operation that is slow and that never happens during normal browsing. I don't really want to tackle that right now, though.
Comment 7 Andrew McCreight [:mccr8] 2012-08-09 14:49:42 PDT
Hurray!
Comment 8 Andrew McCreight [:mccr8] 2012-08-09 14:50:34 PDT
IGC has been disabled in 15, so I think it isn't affected, but the trains have rolled on so 17 is.
Comment 9 Andrew McCreight [:mccr8] 2012-08-09 14:52:58 PDT
Well, more to the point, this patch has already essentially been applied there, so it shouldn't be affected.
Comment 11 Ryan VanderMeulen [:RyanVM] 2012-08-11 19:55:11 PDT
https://hg.mozilla.org/mozilla-central/rev/90a3045cda15
Comment 12 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-08-11 22:33:16 PDT
Do you know why nsDOMWindowUtils::GarbageCollect was being called during talos performance tests?
Comment 13 Andrew McCreight [:mccr8] 2012-08-12 07:35:45 PDT
Should this be landed on Aurora?
Comment 14 Martin Best (:mbest) 2012-08-13 07:04:27 PDT
The sooner we can get this from a games perspective the better.  In tests, it helps reduce frame stutter in games that use code that triggers GC.  I'd love to try and get this for Aurora (Fx16).  It's a pretty big feature so I understand if the risk is too large but I would very much like to see the feature sooner than later.

The current flags are a bit confusing, says it's fixed in fx15 and fx17 but not in fx16?  I'd like to get a better idea of where this currently sits for my upcoming conversation with marketing about the fx15 release blog post.
Comment 15 Andrew McCreight [:mccr8] 2012-08-13 07:14:22 PDT
Incremental GC is already turned on in Fx16 and 17. This bug is about fixing some large Talos regressions that showed up when it was enabled, which appear to be due to Talos manually forcing GCs, which isn't an operation that a page would normally do. It should land on 16, but it won't affect the normal user experience.

As to the confusing flags, incremental GC is disabled in fx15, which involved applying the equivalent of this patch there. I'll set it to disabled, as maybe that is more germane to the description in this bug.
Comment 16 Andrew McCreight [:mccr8] 2012-08-13 07:17:23 PDT
Well, okay, fixed is right I think. The perf regressions were (I believe) fixed by disabling IGC in 15. In 16 and 17 they need to be fixed by this patch.  Sorry for the bugspam.
Comment 17 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-08-15 11:38:37 PDT
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #12)
> Do you know why nsDOMWindowUtils::GarbageCollect was being called during
> talos performance tests?

Looks like it was added in bug 403835. The intent seemed to be to measure the CC/GC time separately from the rest of the measurements, but I guess that got lost somewhere along the way.
Comment 18 Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-08-15 14:46:39 PDT
Yes, we were having a few problems:

* The pageload talos numbers got added noise because any time a GC happened that run got significantly slower. Forcing a GC between runs reduced the likelyhood that a GC would happen during the pageload test.
* We were adding a lot of complexity to the GC logic in order to try to prevent it from triggering GC during talos runs. Despite that this wasn't helping real-world behavior at all since normally users take plenty of time between page loads and so we could use much simpler and saner logic could be used to GC during those times.
* It was harder to measure things separately since GC times ended up affecting pageload times, even though it's entirely different groups of people working on those two things.
Comment 19 Alex Keybl [:akeybl] 2012-08-22 16:19:10 PDT
Are we comfortable with uplifting this prior to our first FF16 beta build on Monday?
Comment 20 Bill McCloskey (:billm) 2012-08-27 10:49:06 PDT
I don't think there's any need to land this on 16. Users are not affected by this problem at all.

Note You need to log in before you can comment on or make changes to this bug.