Closed Bug 1365333 Opened 3 years ago Closed 3 years ago

Firefox is very slow on Live Nation ticket finder, with APZ enabled

Categories

(Core :: Panning and Zooming, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr52 55+ fixed
firefox53 --- wontfix
firefox54 + fixed
firefox55 + fixed

People

(Reporter: dmajor, Assigned: botond)

Details

(Keywords: regression, Whiteboard: [qf:p1])

Attachments

(1 file)

https://perfht.ml/2qp7fq1 -- See content process 2 after 11s.

It eats a full CPU core roughly forever. I only sampled a few seconds so my profiler wouldn't fail the upload!

STR:
- https://concerts1.livenation.com/event/06005266C5C24E09
- Click See Tickets
- Zoom in until seats visible
- Pan around
Here's a link directly to the correct thread & region of time:
https://perf-html.io/public/13533b874ddc9fc15d6867fce9ed8a36bc87ce32/calltree/?range=10.5637_15.2759&thread=5

Looks like we're spending all our time in nsLayoutUtils::PaintFrame, if I'm interpreting the profile correctly. And in there, we're spending most of our time (1967ms) in:
 nsLayoutUtils::PaintFrame::BuildDisplayList

Here are a few of the nested function calls inside of that, each of which are also reported as 1967ms or close to it:
 nsSVGOuterSVGFrame::BuildDisplayList
 nsDisplayLayerEventRegions::AddFrame(nsDisplayListBuilder *,nsIFrame *)
 moz_pixman_region32_union

So this might be an SVG issue or a Graphics issue.  I'll start it out in SVG.
Component: General → SVG
Whiteboard: [qf] → [qf:p3]
Could you elaborate a bit on the reasoning behind the priority triage? (I found it surprising.) The user experience in this scenario is pretty horrible and I could see it being the kind of thing that would make a person switch browsers and tell their friends that Firefox is slow.
Flags: needinfo?(nihsanullah)
[I'll steal the needinfo, since I was in the triage meeting]

Yeah, on second examination (and on visiting the site myself), this does feel pretty painful and it seems like we're spinning our wheels doing something wrong.  And livenation is a pretty prominent site for event tickets. I'll bump this to p1 for now.

Also, by my testing, this is a regression! Regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=891ee0d0ba3ec42b6484cf0205b3c95e21c58f74&tochange=096c0f407f8ba3ef7cfe4e0b831761993cac38b1

Before that range, I can pan around the seat view pretty responsively.  (Small delay in update -- not butter-smooth -- but it's usable.)
After that range, there's often ~second-long janks when I do a long click-and-drag operation to pan around the seat view.

dvander, could bug 1207041 have caused this sort of painting perf regression?
> 1a3938f2dc56 David Anderson — Treat mix-blend-mode layers as transparent for occlusion culling. (bug 1207041, r=mstange)

I searched the pushlog for "paint", "layer", and "svg", and there's only one other bug -- bug 1208661 -- which seems to be about logging rather than actual painting.
Flags: needinfo?(nihsanullah) → needinfo?(dvander)
Keywords: regression
Whiteboard: [qf:p3] → [qf:p1]
(In reply to Daniel Holbert [:dholbert] from comment #3)
> dvander, could bug 1207041 have caused this sort of painting perf regression?

My mistake -- I've now confirmed that the proximal cause is APZ, and the regression range does include the pref-flip for "Kartikaya Gupta — Bug 1143856 - Enable APZ on Linux desktop nightly builds. r=botond"

I confirmed this by toggling the pref "layers.async-pan-zoom.enabled" to "false" in current Nightly, and restarting, and then retrying the STR. That basically fixes the bug for me.  dmajor, could you confirm that this suppresses the bug on your end, too?

Kats, this is a pretty serious perf regression caused by APZ -- maybe you could take a look?  (Or maybe you know if this is a dupe of a known bug?)
Component: SVG → Panning and Zooming
Flags: needinfo?(dvander)
Summary: Firefox is very slow on Live Nation ticket finder → Firefox is very slow on Live Nation ticket finder, with APZ enabled
Flags: needinfo?(bugmail)
Flags: needinfo?(dmajor)
> I confirmed this by toggling the pref "layers.async-pan-zoom.enabled" to
> "false" in current Nightly, and restarting, and then retrying the STR. That
> basically fixes the bug for me.  dmajor, could you confirm that this
> suppresses the bug on your end, too?

Confirmed.
Flags: needinfo?(dmajor)
Oh fun. Another event-regions scalability issue. I'll look into this.
Flags: needinfo?(bugmail) → needinfo?(botond)
Yep, we're building a maybe-hit region with 580,000 rects. It's basically another incarnation of bug 1363423, except that in this case the rects are all going into the same event-regions display item, so the performance problem manifests in nsDisplayLayerEventRegions::AddFrame(), rather than PaintedLayerData::AccumulateEventRegions().
Assignee: nobody → botond
Flags: needinfo?(botond)
The attached patch helps significantly for me. Daniel, could you confirm that it does for you too?
Flags: needinfo?(dholbert)
Yes, that patch helps significantly for me as well.
Flags: needinfo?(dholbert)
Comment on attachment 8873134 [details]
Bug 1365333 - Avoid quadratic performance in nsDisplayLayerEventRegions::AddFrame() when the maybe-hit region has many rects.

https://reviewboard.mozilla.org/r/144600/#review148522
Attachment #8873134 - Flags: review?(tnikkel) → review+
Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/44c831aee2d0
Avoid quadratic performance in nsDisplayLayerEventRegions::AddFrame() when the maybe-hit region has many rects. r=tnikkel
https://hg.mozilla.org/mozilla-central/rev/44c831aee2d0
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Thanks Botond! The repro page works much better in today's nightly. I wonder if this is worth a backport?
(In reply to David Major [:dmajor] from comment #16)
> I wonder if this is worth a backport?

It seems a bit late in the cycle for a backport to 54, but we can try.
Comment on attachment 8873134 [details]
Bug 1365333 - Avoid quadratic performance in nsDisplayLayerEventRegions::AddFrame() when the maybe-hit region has many rects.

Approval Request Comment
[Feature/Bug causing the regression]:
  APZ (first enabled in 48 for most platforms, or 52 in
  some configurations)

[User impact if declined]:
  Some pages containing complex SVG images, such as the
  LiveNation ticket finder page at issue in this bug, take
  an unreasonably long time (tens of seconds) to paint, 
  making them effectively unusable.

[Is this code covered by automated tests?]:
  This specific performance issue isn't, but the event-
  regions code being touched has pretty good coverage.

[Has the fix been verified in Nightly?]:
  Yes. The difference in performance when loading the
  affected page without the fix vs. with the fix is
  readily apparent.

[Needs manual test from QE? If yes, steps to reproduce]: 
  No.

[List of other uplifts needed for the feature/fix]:
  None.

[Is the change risky?]:
  No.

[Why is the change risky/not risky?]:
  The change is a one-line addition, and is essentially
  the same fix we made previously in bug 1363423, but in
  a different place in the code.

  A change of this nature - making the maybe-hit region 
  larger by simplifying it to contain fewer rects - should not
  cause correctness regressions. In theory, the maybe-hit 
  region being larger than it needs to be means that during 
  event handling, for some events we can potentially take a 
  slower main-thread code path over a faster async code path, 
  but this theoretical disadvantage is far outweighed by the 
  very concrete advantage of avoiding performance that's
  quadratic in the number of elements of a large SVG.

[String changes made/needed]:
  None.
Attachment #8873134 - Flags: approval-mozilla-beta?
Hi :kats & :jrmuizel,
I might need your second opinion for this severe perf issue. Since we've already built the 54 RC build today, what do you think if we have to take this in 54 and if this is a safe/low-risk fix?
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(bugmail)
I'm in favour of uplifting, the fix should be pretty safe.
Flags: needinfo?(bugmail)
Track 54+/55+ as perf regression.
This should be pretty safe.
Flags: needinfo?(jmuizelaar)
Comment on attachment 8873134 [details]
Bug 1365333 - Avoid quadratic performance in nsDisplayLayerEventRegions::AddFrame() when the maybe-hit region has many rects.

Fix a perf issue and was verified. Let's take it in 54 RC2.
Attachment #8873134 - Flags: approval-mozilla-release+
Attachment #8873134 - Flags: approval-mozilla-beta?
Attachment #8873134 - Flags: approval-mozilla-beta+
(In reply to Botond Ballo [:botond] from comment #18)
> [Is this code covered by automated tests?]:
>   This specific performance issue isn't, but the event-
>   regions code being touched has pretty good coverage.
> 
> [Has the fix been verified in Nightly?]:
>   Yes. The difference in performance when loading the
>   affected page without the fix vs. with the fix is
>   readily apparent.
> 
> [Needs manual test from QE? If yes, steps to reproduce]: 
>   No.

Setting qe-verify- based on Botond's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
Comment on attachment 8873134 [details]
Bug 1365333 - Avoid quadratic performance in nsDisplayLayerEventRegions::AddFrame() when the maybe-hit region has many rects.

This is a pretty ugly perf issue on a major site. See comment 18 for uplift justification.
Attachment #8873134 - Flags: approval-mozilla-esr52?
Comment on attachment 8873134 [details]
Bug 1365333 - Avoid quadratic performance in nsDisplayLayerEventRegions::AddFrame() when the maybe-hit region has many rects.

Fix a perf issue. Let's uplift to ESR52.3.
Attachment #8873134 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
You need to log in before you can comment on or make changes to this bug.