Closed Bug 1513521 Opened 5 years ago Closed 5 years ago

(rasterizing blobs twice) 23.48 - 60.08% tsvg_static / tsvgr_opacity / tsvgx (linux64-qr, windows10-64-qr) regression on push 5d24ac329431289a8966eabc3325228dfd7f2c4d (Tue Dec 11 2018)

Categories

(Core :: Graphics: WebRender, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla66
Tracking Status
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: Bebe, Assigned: jrmuizel)

References

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=5d24ac329431289a8966eabc3325228dfd7f2c4d

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 60%  tsvgx opt e10s stylo             220.25 -> 352.57
 59%  tsvgx opt e10s stylo             346.94 -> 552.44
 36%  tsvgr_opacity opt e10s stylo     119.98 -> 162.59
 34%  tsvg_static opt e10s stylo       64.03 -> 86.09
 27%  tsvgr_opacity opt e10s stylo     112.24 -> 143.03
 23%  tsvg_static opt e10s stylo       45.92 -> 56.70


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=18133

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Blocks: 1512730, 1513237
Component: General → Graphics: WebRender
Flags: needinfo?(dmalyshau)
Product: Testing → Core
Version: Version 3 → unspecified
This also caused a regression on raptor:

== Change summary for alert #18115 (as of Tue, 11 Dec 2018 00:07:31 GMT) ==

Regressions:

  8%  raptor-motionmark-animometer-firefox opt      39.17 -> 35.91
  5%  raptor-motionmark-animometer-firefox opt      38.79 -> 36.77

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18115
I think we might be hitting a case more often where we allocate the heap space for rasterized blobs and move stuff around. I don't think we do more work actually rasterizing any SvGs. Working on a quick fix now.
Assignee: nobody → dmalyshau
Status: NEW → ASSIGNED
Flags: needinfo?(dmalyshau)
Priority: -- → P2
This caused a regression in beta

== Change summary for alert #18249 (as of Thu, 13 Dec 2018 18:20:53 GMT) ==

Regressions:

 69%  tsvgx opt e10s stylo             320.02 -> 542.08
 58%  tsvgx opt e10s stylo             218.93 -> 345.76
 39%  tsvgr_opacity opt e10s stylo     115.22 -> 160.58
 35%  tsvg_static opt e10s stylo       62.54 -> 84.37

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18249
Also the raptor regression

== Change summary for alert #18250 (as of Thu, 13 Dec 2018 18:20:53 GMT) ==

Regressions:

  6%  raptor-motionmark-animometer-firefox opt      39.54 -> 36.99

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18250
Wow, that's quite a significant change here, and not expected. I'm actually puzzled on how this small change could have caused a 50% `tsvgx` regression. Going to look at it ASAP.
Does bug 1513746 just need an uplift? Did it fix the perf regression on Nightly?
(In reply to Dzmitry Malyshau [:kvark] from comment #7)
> I looked at tsvg_static tests from the try run, and they seemed to indicate
> an improvement:
> https://treeherder.mozilla.org/perf.html#/
> comparesubtest?originalProject=try&originalRevision=a571d97ed59e0341e27380c37
> 8b9b33353be7cb1&newProject=try&newRevision=26ea31fd69e8ed47167f33afefbd711424
> f0250a&originalSignature=6e4abd046a0f519e61e3bf63050d34467c391e19&newSignatur
> e=6e4abd046a0f519e61e3bf63050d34467c391e19&framework=1
> 
> Was I looking at the wrong thing?

This link is showing that the patch https://hg.mozilla.org/try/rev/ca44fb823d41296681792f75c3cc777c0c32f077 made things worse. i.e. the "new" results from [1] are worse than the "base" results from [2].

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=26ea31fd69e8ed47167f33afefbd711424f0250a
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=a571d97ed59e0341e27380c378b9b33353be7cb1

(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #6)
> Does bug 1513746 just need an uplift? Did it fix the perf regression on
> Nightly?

It doesn't look like the regression is fixed on Nightly:

Windows: https://treeherder.mozilla.org/perf.html#/graphs?series=autoland,1686448,1,1&series=autoland,1686495,1,1&series=autoland,1686440,1,1
Linux: https://treeherder.mozilla.org/perf.html#/graphs?series=autoland,1683650,1,1&series=autoland,1683603,1,1&series=autoland,1683595,1,1
Oh right, it's base vs new, not the other way around :/ I read it wrong.
See Also: → 1517039
(In reply to Dzmitry Malyshau [:kvark] from comment #9)
> Oh right, it's base vs new, not the other way around :/ I read it wrong.

Any updates on this matter?
Flags: needinfo?(dmalyshau)

It looks like tsvgx 4.xml was one of the subsets that moved the most.

Here is what I know about the situation:

  1. I'm just back from a long holiday, sorry about the silence!
  2. The patch fixed a glaring correctness issue that was causing a number one crash. Note: it's not the follow-ups that caused the regression (i.e https://github.com/servo/webrender/pull/3405), they are harmless.
  3. The patch makes it mostly correct, and fully addressess the crashing. There is a scenario though where the frame builder would work with newer version of the blobs than the scene has, so it's not fully correct.
  4. It is possible that making it fully correct would resolve the performance regression, even though I don't think calling it "regression" is fare at all, since the old code wasn't proper in the first place.
  5. Regardless, we just need to investigate the case in more depth and fix it. If :nical is currently too busy to do so, I'll be willing to take care of it.
Flags: needinfo?(dmalyshau) → needinfo?(nical.bugzilla)

So I did some quick counting of blob image rasterizations before and after https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=5d24ac329431289a8966eabc3325228dfd7f2c4d.

Before, we rasterize testing/talos/talos/tests/svgx/hixie-004.xml about 1444 times. After, we rasterize the same test 2888 times. This explains the performance regression.

On trunk we still rasterize 2888 times.

Assignee: dmalyshau → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
Blocks: 1477369
Summary: 23.48 - 60.08% tsvg_static / tsvgr_opacity / tsvgx (linux64-qr, windows10-64-qr) regression on push 5d24ac329431289a8966eabc3325228dfd7f2c4d (Tue Dec 11 2018) → (rasterizing blobs twice) 23.48 - 60.08% tsvg_static / tsvgr_opacity / tsvgx (linux64-qr, windows10-64-qr) regression on push 5d24ac329431289a8966eabc3325228dfd7f2c4d (Tue Dec 11 2018)
Attached patch A fixSplinter Review

This patch seems to fix the problem. I don't yet know why.

Assignee: nical.bugzilla → jmuizelaar
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla66
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: