The default bug view has changed. See this FAQ.

Pageload responsiveness perf regressions, likely from IGC

RESOLVED FIXED in Firefox 15

Status

()

Core
JavaScript Engine
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: bz, Assigned: billm)

Tracking

({regression})

unspecified
mozilla17
x86
Mac OS X
regression
Points:
---

Firefox Tracking Flags

(firefox15+ fixed, firefox16+ wontfix, firefox17 fixed)

Details

(Whiteboard: [js:inv:p1])

Attachments

(1 attachment)

There are a bunch of huge (40+%) pageload test responsiveness regressions on inbound and m-c in the range when IGC landed.  See m.d.tree-management for May 32 and June 1, but a sample:

  Tp5 Row Major Responsiveness MozAfterPaint increase 41.3% on
    XP Mozilla-Inbound-Non-PGO
  Tp5 Row Major Responsiveness MozAfterPaint increase 36.7% on
    Win7 Mozilla-Inbound-Non-PGO
  Tp5 Row Major Responsiveness MozAfterPaint increase 37.7% on
    Win7 Mozilla-Inbound
  Tp5 Row Major Responsiveness MozAfterPaint increase 67.7% on
    MacOSX 10.6 (rev4) Mozilla-Inbound
  
oddly, nothing on Linux.

The range for that Mac regression, btw, _only_ includes IGC.
tracking-firefox15: --- → ?
status-firefox15: --- → affected
status-firefox16: --- → affected
tracking-firefox15: ? → +
tracking-firefox16: --- → +
Whiteboard: [js:inv:p1]
Bill, do you know what this is? I find it hard to believe it's real, given that nothing has been noticed otherwise. The numbers also look pretty crazy. Whatever it's measuring, it's hard to believe that it's 1000ms on NT 5.1, especially if it's really 500 ms on NT 5.2.
IGC was disabled on 15, so I'd imagine there should be a corresponding drop in this metric when that happened, then another regression when it was turned on again.  Something to look into at least.
(In reply to Andrew McCreight [:mccr8] from comment #2)
> IGC was disabled on 15, so I'd imagine there should be a corresponding drop
> in this metric when that happened, then another regression when it was
> turned on again.  Something to look into at least.

I do see "up then down" on Aurora, e.g.:

http://graphs.mozilla.org/graph.html#tests=[[196,1,19],[196,1,12],[196,1,1],[196,1,13],[196,1,21],[196,52,12],[196,52,1],[196,52,21],[196,52,13]]&sel=none&displayrange=90&datatype=running
(Assignee)

Comment 4

5 years ago
It looks like this was just a change I made to nsDOMWindowUtils::GarbageCollect. I don't know why it would affect benchmarks like this. Anyway, I'm pretty sure I can revert it since it was just papering over some IGC bugs that have now been fixed.
Assignee: general → wmccloskey
status-firefox15: affected → ---
status-firefox16: affected → ---
tracking-firefox15: + → ?
tracking-firefox16: + → ---
(I'm assuming the clobbering of the status flags was accidental)
status-firefox15: --- → affected
status-firefox16: --- → affected
tracking-firefox15: ? → +
tracking-firefox16: --- → +
(Assignee)

Comment 6

5 years ago
Created attachment 650621 [details] [diff] [review]
patch

This seems like it makes the regression go away. We should really fix the tests so that they're not benchmarking an operation that is slow and that never happens during normal browsing. I don't really want to tackle that right now, though.
Attachment #650621 - Flags: review?(continuation)
Attachment #650621 - Flags: review?(continuation) → review+
Hurray!
IGC has been disabled in 15, so I think it isn't affected, but the trains have rolled on so 17 is.
status-firefox15: affected → disabled
status-firefox17: --- → affected
tracking-firefox17: --- → ?
Well, more to the point, this patch has already essentially been applied there, so it shouldn't be affected.
status-firefox15: disabled → fixed
(Assignee)

Comment 10

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/90a3045cda15

Also, here's a try link for it:
https://tbpl.mozilla.org/?tree=Try&rev=853838f9465c
https://hg.mozilla.org/mozilla-central/rev/90a3045cda15
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Do you know why nsDOMWindowUtils::GarbageCollect was being called during talos performance tests?
Should this be landed on Aurora?
status-firefox17: affected → fixed
tracking-firefox17: ? → ---
The sooner we can get this from a games perspective the better.  In tests, it helps reduce frame stutter in games that use code that triggers GC.  I'd love to try and get this for Aurora (Fx16).  It's a pretty big feature so I understand if the risk is too large but I would very much like to see the feature sooner than later.

The current flags are a bit confusing, says it's fixed in fx15 and fx17 but not in fx16?  I'd like to get a better idea of where this currently sits for my upcoming conversation with marketing about the fx15 release blog post.
Incremental GC is already turned on in Fx16 and 17. This bug is about fixing some large Talos regressions that showed up when it was enabled, which appear to be due to Talos manually forcing GCs, which isn't an operation that a page would normally do. It should land on 16, but it won't affect the normal user experience.

As to the confusing flags, incremental GC is disabled in fx15, which involved applying the equivalent of this patch there. I'll set it to disabled, as maybe that is more germane to the description in this bug.
status-firefox15: fixed → disabled
Well, okay, fixed is right I think. The perf regressions were (I believe) fixed by disabling IGC in 15. In 16 and 17 they need to be fixed by this patch.  Sorry for the bugspam.
status-firefox15: disabled → fixed
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #12)
> Do you know why nsDOMWindowUtils::GarbageCollect was being called during
> talos performance tests?

Looks like it was added in bug 403835. The intent seemed to be to measure the CC/GC time separately from the rest of the measurements, but I guess that got lost somewhere along the way.
Yes, we were having a few problems:

* The pageload talos numbers got added noise because any time a GC happened that run got significantly slower. Forcing a GC between runs reduced the likelyhood that a GC would happen during the pageload test.
* We were adding a lot of complexity to the GC logic in order to try to prevent it from triggering GC during talos runs. Despite that this wasn't helping real-world behavior at all since normally users take plenty of time between page loads and so we could use much simpler and saner logic could be used to GC during those times.
* It was harder to measure things separately since GC times ended up affecting pageload times, even though it's entirely different groups of people working on those two things.
Are we comfortable with uplifting this prior to our first FF16 beta build on Monday?
(Assignee)

Comment 20

5 years ago
I don't think there's any need to land this on 16. Users are not affected by this problem at all.
status-firefox16: affected → wontfix
You need to log in before you can comment on or make changes to this bug.