1398697 - SVG sprite sheets can peg the main thread at some zoom-levels, when the tab is focused (originally at web.whatsapp.com)

Reporter

Description

•

7 years ago

Up to date nightly on OSX 10.12.6 on a 2016 15" Macbook Pro.

STR:
- open `web.whatsapp.com`, associate your phone with the QR code or something
- start typing in a message

Expected:
- Text entry is instant

Actual:
- characters are inserted in the text box with a 2-3s delay, letters are sometimes not in the correct order (I was typing very carefuly a slowly to make sur it was not a user error).

Main thread for this process is pegged to 100% CPU when tab is active. I have captured a gecko profiler and an instruments profile. Both would tend to point towards an SVG-related issue when painting a CSS background.

Relevant profile [0], `web.whatsapp.com` was in content process 3. This short profile shows the following:
- Switching to the `web.whatsapp.com`, that is already logged in
- Focusing the message box at the bottom
- Typing in "asd asd asd asd"

[0]: https://perfht.ml/2vZLu4c. I've saved the instruments profile, available on request.

Paul Adenot (:padenot)

Reporter

Comment 1

•

7 years ago

tracking? 57 for obvious reasons. I'm able to repro consistently on OSX, and sometimes Linux. I don't use windows often enough these day to know if it also happens there as well or not.

Restarting the browser does not make the issue go away.

tracking-firefox57: --- → ?

Paul Adenot (:padenot)

Reporter

Comment 2

•

7 years ago

Jamie, Bas, any idea about this one?

Flags: needinfo?(jnicol)

Flags: needinfo?(bas)

Jeff Muizelaar [:jrmuizel]

Comment 3

•

7 years ago

This is us mostly repainting vector images: https://perfht.ml/2jhW54D. I expect we or what's app need to be doing something to avoid this happening. If you turn on paint flashing it should give a better picture of what's going on.

Flags: needinfo?(padenot)

Jamie Nicol [:jnicol]

Assignee

Comment 4

•

7 years ago

Could be some bad layerisation. The output with layout.display-list.dump enabled would be very useful.

As would be a screencast with nglayout.debug.paint_flashing enabled, if you have the time. So we can see what it is that is being repainted constantly.

FWIW I cannot reproduce on my Linux laptop, but I will try on my macbook when I get home tonight.

Flags: needinfo?(jnicol)

Bas Schouten (:bas.schouten)

Comment 5

•

7 years ago

I remember a bunch of people in the flow meeting talking about WhatsApp perf issues.

Flags: needinfo?(bas)

Jamie Nicol [:jnicol]

Assignee

Comment 6

•

7 years ago

I have tested on my macbook (2013, 13", 10.12.6), and I can't reproduce the poor performance when at rest or typing in to the message box.

I still have terrible performance when switching conversations and when moving the mouse around, which is bug 1328933, but this seems unrelated.

Perhaps there is some content in your messages which does not exist in mine.

Is this a regression, and would it be possible to use mozregression to find what caused it?

Paul Adenot (:padenot)

Reporter

Comment 7

•

7 years ago

I can't repro without my profile. I could however make a screencast, it's very sad.

Flags: needinfo?(padenot)

Paul Adenot (:padenot)

Reporter

Comment 8

•

7 years ago

Screencast: http://paul.cx/public/screencast-whatsapp-paint-flashing.mp4

We see it takes some time to load the page. I also start typing at the 0:34 mark, the character appears in the text box around 0:40.

Paul Adenot (:padenot)

Reporter

Updated

•

7 years ago

Flags: needinfo?(jnicol)

Jamie Nicol [:jnicol]

Assignee

Comment 9

•

7 years ago

It seems like all your icons, which are presumably svgs, are continually being repainted.

This does not happen for me. Mozregression has options for using an existing profile, so it might be possible to find a regression range using that.

Another log but with nglayout.debug.invalidation=true as well as layout.display-list.dump would be useful. (This might need to be a debug build, or built with MOZ_DUMP_PAINTING defined.)

Flags: needinfo?(jnicol)

Paul Adenot (:padenot)

Reporter

Comment 10

•

7 years ago

This seem to be caused by setting a non-100% zoom level on the page. I can repro with 110% and jamie can repro with 133%.

Julien Cristau [:jcristau] (back April 22)

Comment 11

•

7 years ago

tracking for 57

tracking-firefox57: ? → +

Jamie Nicol [:jnicol]

Assignee

Updated

•

7 years ago

Assignee: nobody → jnicol

Jamie Nicol [:jnicol]

Assignee

Comment 12

•

7 years ago

This was quite fun to work out.

Whatsapp's icons are all in a large svg atlas, and it renders different subregions for each icon. So when we go to paint the first icon, we render the whole svg at the correct size and stick it in the surface cache. Then we go to paint the second icon. Because we are zoomed this happens to be at a different subpixel offset, so when the dest rect is snapped it happens to be 1 pixel larger or smaller than for the first icon. So we now render the svg at this second size. Unfortunately there isn't any room left in the cache, so we evict the first one. Then we go to paint the third icon, whose subpixel offset means the dest rect is snapped to the same size as for the first icon. We get a cache miss for this surface size so we have to render the entire svg again, and evict the other sized surface because the cache is full. This repeats forever.

The guilty code is in nsLayoutUtils.cpp ComputeSnappedImageDrawingParameters() and gfxContext::UserToDevicePixelSnapped().
In UserToDevicePixelSnapped, rather than rounding the topleft and bottomright of the rect, I think we need to round the topleft and the size. Haven't quite figured that out yet.

Jamie Nicol [:jnicol]

Assignee

Comment 13

•

7 years ago

Doing what I proposed in comment 12 is wrong. Layout deliberately rounds to the nearest pixels rather than keeping size constant. This would break a lot.

An alternative approach might be for VectorImage::Draw to check the SurfaceCache for surfaces +- 1px (perhaps only when greater than a certain size). Andrew, does that sound reasonable to you?

Flags: needinfo?(aosmond)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 14

•

7 years ago

This is not a (recent) regression in Firefox but a change to whatsapp.com, based on the comments?

status-firefox55: --- → fix-optional

status-firefox56: --- → affected

status-firefox-esr52: --- → affected

Priority: -- → P2

Whiteboard: [gfx-noted]

Andrew Overholt [:overholt]

Updated

•

7 years ago

Comment 15

•

7 years ago

Here's a reduced test case: https://jrmuizel.github.io/implementation-tests/svg-sprite-sheet.html

If you zoom in on that to 120% with paint flashing on you get really slow painting and you can see that we paint the svg sprite sheet multiple times. We should stop doing so.

I can't think of a reason that we shouldn't be able to reuse the same image on all of the lines.

Jamie, I'm not convinced that comment 12 is wrong. Can you elaborate? I don't special casing +-1 should be necessary.

Jamie Nicol [:jnicol]

Assignee

Comment 16

•

7 years ago

Layout snaps things to the nearest pixel. So when a rect is (x=0.3 w=9.1) it gets snapped to (l=0 r=9), but (x=0.4 w=9.1) it gets snapped to (l=0 r=10). Speaking to Bas he said this maybe isn't a spec, but seems to be a very widespread rule.

Here was the try push when I tried that: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0fd570147bc561d02cf27ce4b5355de050b8f832

It might be possible to change this behaviour throughout gecko to fix the reftests, but I'm not sure if that's a good idea.

Because of how we snapped the rects, in some cases whatsapp wants to draw a (eg) 40x40 icon from a 80000x80000 atlas, and other times wants to draw a 41x40 icon from a 80001x80000 atlas. I figure in the latter case drawing from the 80000x80000 atlas might be good enough, but we want to limit this to the surface lookup rather than changing the size of things drawn to the screen.

This could also perhaps be fixed in a different way - by not caching the entire svg but only subregions of it. And perhaps only in cases where the entire image is much larger than the area being drawn.

Jeff Muizelaar [:jrmuizel]

Comment 17

•

7 years ago

Where are the different scales coming from in the test case that I posted?

Flags: needinfo?(jnicol)

Jeff Muizelaar [:jrmuizel]

Comment 18

•

7 years ago

I've updated the test case to actually show the issue. OptimalImageSizeForDest gets called with 1034.0 1049.0 and 1033.0 1049.0

Jeff Muizelaar [:jrmuizel]

Comment 19

•

7 years ago

This suggests that the problem is in ComputeSnappedImageDrawingParameters. It's not clear to me yet how it should be fixed.

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

7 years ago

Blocks: 1396479

Mike Taylor [:miketaylr]

Updated

•

7 years ago

URL: https://webcompat.com/issues/9848

Jamie Nicol [:jnicol]

Assignee

Comment 20

•

7 years ago

If we change how rects are snapped then things will be the wrong sizes. And if we rasterize the svg at a different size than the snapped dest rect then things will look blurry. For small images rasterizing the svg at couple of different sizes isn't a huge issue in terms of memory or time, and in fact is visually correct. This is only an issue due to the massive size.

Potential solutions I see are therefore:
* Enforce a maximum size in VectorImage::OptimalImageSizeForDest()
* Above a certain size (so as to not make most content blurry) clamp VectorImage::OptimalImageSizeForDest() to e.g. the previous power of two.
* Rasterize only the fill rect and store it in the cache (perhaps only when fill rect is much smaller than dest rect)


The latter would presumably be a fairly significant change to how the SurfaceCache works, so I'd lean towards either or both of the first two.

Flags: needinfo?(jnicol)

Jeff Muizelaar [:jrmuizel]

Comment 21

•

7 years ago

(In reply to Jamie Nicol [:jnicol] from comment #20)
> If we change how rects are snapped then things will be the wrong sizes. And
> if we rasterize the svg at a different size than the snapped dest rect then
> things will look blurry. For small images rasterizing the svg at couple of
> different sizes isn't a huge issue in terms of memory or time, and in fact
> is visually correct. This is only an issue due to the massive size

What if we change the area of the image that's sampled? Will that let us have nice things?

Jeff Muizelaar [:jrmuizel]

Comment 22

•

7 years ago

Here's a test case that shows us not leaking the background while other browsers do:
https://jrmuizel.github.io/implementation-tests/background-leaking.html

If we choose to also leak we can have nice things.

Andrew Osmond [:aosmond] (he/him)

Updated

•

7 years ago

Flags: needinfo?(aosmond)

Jamie Nicol [:jnicol]

Assignee

Comment 23

•

7 years ago

Attached patch leak-background.patch — Details — Splinter Review

This patch isn't right. Here's the try run here - https://treeherder.mozilla.org/#/jobs?repo=try&revision=b07e6f1a2c05e01f2841294c828eff867ff948f9

The R3 and R5 failures are from leaking, and chrome fails them too. But the R1 failures are because we're still downscaling.

It uses the unsnapped dest rect to calculate the optimal image size, which I'm pretty sure is what we want. Then I tried to use the snapped fill and dest rects to calculate the sample subimage so that we do not scale when rendering, but that isn't correct.

As Jeff said on irc the correct fix probably requires simplifying that function a fair bit, rather than the tinkering I was trying to do there.

Liz Henry (:lizzard) (relman/hg->git project)

Comment 24

•

7 years ago

Too late for a patch in 57, but we could still potentially fix this in 58.

status-firefox55: fix-optional → wontfix

status-firefox56: affected → wontfix

status-firefox57: affected → wontfix

status-firefox58: --- → affected

Jamie Nicol [:jnicol]

Assignee

Comment 25

•

7 years ago

Oops, this sort of fell off my radar. I haven't yet been able to come up with a general-case clamping algorithm that does what we want. For what is's worth, however, the problem has now been fixed on whatsapp's end.

Daniel Holbert [:dholbert]

Comment 26

•

6 years ago

Given comment 25, it sounds like the summary has fallen out of sync with what this bug is actually about.
--> Taking a stab at an clarified summary to reflect reality; please correct if my update is off the mark.

Summary: web.whatsapp.com pegging the main thread when the tab is focused. → SVG sprite sheets can peg the main thread at some zoom-levels, when the tab is focused (originally at web.whatsapp.com)

Marcela (Please NI request to Brindusa Tot)

Comment 27

•

3 years ago

Hi Jamie!
I'm trying to reproduce old bugs to see if we can resolve some. I was unable to reproduce it on latest Nightly 96.01 (2021-11-02)(64-bit) on macOS 10.15.
Since this bug is still active, is there any ongoing investigation about this?
Thanks!

Flags: needinfo?(jnicol)

Jamie Nicol [:jnicol]

Assignee

Comment 28

•

2 years ago

The bug was resolved on whatsapp's end years ago. And we have bug 1673653 which has ongoing work to solve the root of this problem. So let's close this.

Status: NEW → RESOLVED

Closed: 2 years ago

Flags: needinfo?(jnicol)

Resolution: --- → FIXED