Closed Bug 762580 Opened 12 years ago Closed 12 years ago

Pageload responsiveness perf regressions, likely from IGC

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17
Tracking Status
firefox15 + fixed
firefox16 + wontfix
firefox17 --- fixed

People

(Reporter: bzbarsky, Assigned: billm)

References

Details

(Keywords: regression, Whiteboard: [js:inv:p1])

Attachments

(1 file)

There are a bunch of huge (40+%) pageload test responsiveness regressions on inbound and m-c in the range when IGC landed.  See m.d.tree-management for May 32 and June 1, but a sample:

  Tp5 Row Major Responsiveness MozAfterPaint increase 41.3% on
    XP Mozilla-Inbound-Non-PGO
  Tp5 Row Major Responsiveness MozAfterPaint increase 36.7% on
    Win7 Mozilla-Inbound-Non-PGO
  Tp5 Row Major Responsiveness MozAfterPaint increase 37.7% on
    Win7 Mozilla-Inbound
  Tp5 Row Major Responsiveness MozAfterPaint increase 67.7% on
    MacOSX 10.6 (rev4) Mozilla-Inbound
  
oddly, nothing on Linux.

The range for that Mac regression, btw, _only_ includes IGC.
Whiteboard: [js:inv:p1]
Bill, do you know what this is? I find it hard to believe it's real, given that nothing has been noticed otherwise. The numbers also look pretty crazy. Whatever it's measuring, it's hard to believe that it's 1000ms on NT 5.1, especially if it's really 500 ms on NT 5.2.
IGC was disabled on 15, so I'd imagine there should be a corresponding drop in this metric when that happened, then another regression when it was turned on again.  Something to look into at least.
(In reply to Andrew McCreight [:mccr8] from comment #2)
> IGC was disabled on 15, so I'd imagine there should be a corresponding drop
> in this metric when that happened, then another regression when it was
> turned on again.  Something to look into at least.

I do see "up then down" on Aurora, e.g.:

http://graphs.mozilla.org/graph.html#tests=[[196,1,19],[196,1,12],[196,1,1],[196,1,13],[196,1,21],[196,52,12],[196,52,1],[196,52,21],[196,52,13]]&sel=none&displayrange=90&datatype=running
It looks like this was just a change I made to nsDOMWindowUtils::GarbageCollect. I don't know why it would affect benchmarks like this. Anyway, I'm pretty sure I can revert it since it was just papering over some IGC bugs that have now been fixed.
Assignee: general → wmccloskey
(I'm assuming the clobbering of the status flags was accidental)
Attached patch patchSplinter Review
This seems like it makes the regression go away. We should really fix the tests so that they're not benchmarking an operation that is slow and that never happens during normal browsing. I don't really want to tackle that right now, though.
Attachment #650621 - Flags: review?(continuation)
Attachment #650621 - Flags: review?(continuation) → review+
Hurray!
IGC has been disabled in 15, so I think it isn't affected, but the trains have rolled on so 17 is.
Well, more to the point, this patch has already essentially been applied there, so it shouldn't be affected.
https://hg.mozilla.org/mozilla-central/rev/90a3045cda15
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Do you know why nsDOMWindowUtils::GarbageCollect was being called during talos performance tests?
Should this be landed on Aurora?
The sooner we can get this from a games perspective the better.  In tests, it helps reduce frame stutter in games that use code that triggers GC.  I'd love to try and get this for Aurora (Fx16).  It's a pretty big feature so I understand if the risk is too large but I would very much like to see the feature sooner than later.

The current flags are a bit confusing, says it's fixed in fx15 and fx17 but not in fx16?  I'd like to get a better idea of where this currently sits for my upcoming conversation with marketing about the fx15 release blog post.
Incremental GC is already turned on in Fx16 and 17. This bug is about fixing some large Talos regressions that showed up when it was enabled, which appear to be due to Talos manually forcing GCs, which isn't an operation that a page would normally do. It should land on 16, but it won't affect the normal user experience.

As to the confusing flags, incremental GC is disabled in fx15, which involved applying the equivalent of this patch there. I'll set it to disabled, as maybe that is more germane to the description in this bug.
Well, okay, fixed is right I think. The perf regressions were (I believe) fixed by disabling IGC in 15. In 16 and 17 they need to be fixed by this patch.  Sorry for the bugspam.
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #12)
> Do you know why nsDOMWindowUtils::GarbageCollect was being called during
> talos performance tests?

Looks like it was added in bug 403835. The intent seemed to be to measure the CC/GC time separately from the rest of the measurements, but I guess that got lost somewhere along the way.
Yes, we were having a few problems:

* The pageload talos numbers got added noise because any time a GC happened that run got significantly slower. Forcing a GC between runs reduced the likelyhood that a GC would happen during the pageload test.
* We were adding a lot of complexity to the GC logic in order to try to prevent it from triggering GC during talos runs. Despite that this wasn't helping real-world behavior at all since normally users take plenty of time between page loads and so we could use much simpler and saner logic could be used to GC during those times.
* It was harder to measure things separately since GC times ended up affecting pageload times, even though it's entirely different groups of people working on those two things.
Are we comfortable with uplifting this prior to our first FF16 beta build on Monday?
I don't think there's any need to land this on 16. Users are not affected by this problem at all.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: