Closed Bug 1088154 Opened 10 years ago Closed 1 year ago

35% OSX 10.8|10.6 TP5 Scroll regression on Inbound (v.36) on Oct 21/22 from revision fa9c6845338e

Categories

(Core :: Graphics: Layers, defect, P3)

36 Branch
All
macOS
defect

RESOLVED INCOMPLETE

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression][gfx-noted])

We also got a decent tscroll win from this changeset, right?

Is there any easy way to get a breakdown of which individual test pages regressed?
I see a 5.5% tscroll win here:
http://alertmanager.allizom.org:8080/alerts.html?rev=fa9c6845338e&showAll=1&table=1

it could be that other improvements have not shown up yet, or are miscategorized to another revision (based on coalescing, etc.)
There's a m.d.tree-management email showing a 15.9% tscrollx win for 10.8 for this changeset.
got it, that showed up for m-c (I was looking at inbound), here is a list of all the alerts (almost all are generated when we merge between branches and pgo/non-pgo):
http://alertmanager.allizom.org:8080/alerts.html?rev=fa9c6845338e&showAll=1

thanks for being persistent there, now let me figure out what tp5 scroll pages are causing problems.
it appears that about half of the pages had a noticeable regression:
https://datazilla.mozilla.org/?start=1413439230&stop=1414096959&product=Firefox&repository=Mozilla-Inbound&os=mac&os_version=OS%20X%2010.8&test=tp5o_scroll&graph_search=fa9c6845338e&tr_id=7456526&graph=naver.com&x86=true&x86_64=true&project=talos (be patient, this did load for me)

for the pages that regressed, the break in the graph is large and has been sustained.
Sweet, thanks Joel! A few of those are real clear shifts, should be easy enough to test with.
do ask if you need any help running locally
I can't reproduce any change locally. I'm getting a 'save file' dialog for every page load though, not sure if that's affecting things. I'm running via ./mach talos-test tp5o.
tp5 scroll will need to be tested via ./mach.  also mach doesn't set up the pagesets.

Can you follow the directions here:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

you can find tp5n.zip here: http://people.mozilla.org/~jmaher/taloszips/zips/.  Download that and put it in your talos/talos/page_load_test directory, then unzip it.  Finally you can run talos!

if you are in your virtualenv, do:
./talos -e <path/to/firefox> -a tp5o_scroll --develop --results_url tp5.out --datazilla-url tp5.json
Yeah, got it running fine, but I'm not seeing any regression locally.
I suspect this boils down to machine specifics.  Here is a summary of the machines we use in automation:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation

I have double checked and the changeset is definitely a culprit, although if this doesn't reproduce locally, that reduces the severity of it.
this has landed on Aurora with a 45.4% regression: the original 35% plus another ~10% around November 17th.  OSX 10.6 shows the same two regressions, just much milder (17% overall).
Summary: 35% OSX 10.8 TP5 Scroll regression on Inbound (v.36) on Oct 21/22 from revision fa9c6845338e → 35% OSX 10.8|10.6 TP5 Scroll regression on Inbound (v.36) on Oct 21/22 from revision fa9c6845338e
I wonder if the culling can break a layer from one draw call to multiple draw calls:

XXX    XXX
XXX -> X.X
XXX    XXX

where . is a culled rect. We'd now need to do 4 draw calls instead of one. This could explain a regression.
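
To make the draw-call count concrete, here is a minimal sketch of the splitting described above. It is not the compositor's actual code: `Rect` and `subtract` are hypothetical names, and the decomposition into top, left, right, and bottom strips is just one assumed way to cover the visible area with rectangles.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    # Hypothetical axis-aligned rect: origin (x, y), width w, height h.
    x: int
    y: int
    w: int
    h: int


def subtract(outer: Rect, hole: Rect) -> List[Rect]:
    """Return non-overlapping rects covering `outer` minus `hole`.

    Each returned rect stands for one draw call in this sketch.
    """
    ox1 = outer.x + outer.w
    oy1 = outer.y + outer.h
    # Clamp the hole to the outer rect.
    hx0 = max(outer.x, hole.x)
    hy0 = max(outer.y, hole.y)
    hx1 = min(ox1, hole.x + hole.w)
    hy1 = min(oy1, hole.y + hole.h)
    if hx0 >= hx1 or hy0 >= hy1:
        return [outer]  # nothing culled: still a single draw call
    pieces = [
        Rect(outer.x, outer.y, outer.w, hy0 - outer.y),  # strip above the hole
        Rect(outer.x, hy0, hx0 - outer.x, hy1 - hy0),    # strip left of the hole
        Rect(hx1, hy0, ox1 - hx1, hy1 - hy0),            # strip right of the hole
        Rect(outer.x, hy1, outer.w, oy1 - hy1),          # strip below the hole
    ]
    # Drop empty strips (hole touching an edge of the outer rect).
    return [r for r in pieces if r.w > 0 and r.h > 0]


layer = Rect(0, 0, 3, 3)
culled = Rect(1, 1, 1, 1)
print(len(subtract(layer, culled)))
```

With the 3x3 layer and the centre cell culled, as in the diagram, this yields four rects. A culled rect that touches an edge of the layer yields fewer, and one that misses the layer entirely leaves the single original rect.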
:BenWa, could we fix this in Beta or Aurora?  Not sure if there is a good reason for doing 3 extra draw calls.
Flags: needinfo?(bgirard)
So I just disabled Matt's culling on inbound and it's reporting regressions on TP5+tscroll on 10.8. So the culling probably makes things better overall and I'll probably re-enable it shortly.

What I point out in Comment 13 is just a theoretical problem. I need to find time to look at it more closely but I don't have any bandwidth left. This may have to wait until I can run some tests (so we don't land a patch blind without understanding it).
Flags: needinfo?(bgirard)
ok, we can schedule this in due time.  Thanks for looking into this so far!
Bug 1136766 will bring additional performance improvements to culling. It should ensure that culling does not add any extra draw calls. We're hoping this will address this regression.
No longer blocks: 1085223
Depends on: 1136766, 1085223
Version: unspecified → 36 Branch
Whiteboard: [talos_regression] → [talos_regression][gfx-noted]
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE