Closed Bug 1398697 Opened 7 years ago Closed 2 years ago

SVG sprite sheets can peg the main thread at some zoom-levels, when the tab is focused (originally at web.whatsapp.com)

Categories

(Core :: Graphics, defect, P2)

x86_64
All
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr52 --- affected
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 + wontfix
firefox58 --- affected

People

(Reporter: padenot, Assigned: jnicol)

References

()

Details

(Whiteboard: [gfx-noted])

Attachments

(1 file)

Up to date nightly on OSX 10.12.6 on a 2016 15" Macbook Pro.

STR:
- open `web.whatsapp.com`, associate your phone with the QR code or something
- start typing in a message

Expected:
- Text entry is instant

Actual:
- characters are inserted in the text box with a 2-3s delay, letters are sometimes not in the correct order (I was typing very carefuly a slowly to make sur it was not a user error).

Main thread for this process is pegged to 100% CPU when tab is active. I have captured a gecko profiler and an instruments profile. Both would tend to point towards an SVG-related issue when painting a CSS background.

Relevant profile [0], `web.whatsapp.com` was in content process 3. This short profile shows the following:
- Switching to the `web.whatsapp.com`, that is already logged in
- Focusing the message box at the bottom
- Typing in "asd asd asd asd"

[0]: https://perfht.ml/2vZLu4c. I've saved the instruments profile, available on request.
tracking? 57 for obvious reasons. I'm able to repro consistently on OSX, and sometimes Linux. I don't use windows often enough these day to know if it also happens there as well or not.

Restarting the browser does not make the issue go away.
Jamie, Bas, any idea about this one?
Flags: needinfo?(jnicol)
Flags: needinfo?(bas)
This is us mostly repainting vector images: https://perfht.ml/2jhW54D. I expect we or what's app need to be doing something to avoid this happening. If you turn on paint flashing it should give a better picture of what's going on.
Flags: needinfo?(padenot)
Could be some bad layerisation. The output with layout.display-list.dump enabled would be very useful.

As would be a screencast with nglayout.debug.paint_flashing enabled, if you have the time. So we can see what it is that is being repainted constantly.

FWIW I cannot reproduce on my Linux laptop, but I will try on my macbook when I get home tonight.
Flags: needinfo?(jnicol)
I remember a bunch of people in the flow meeting talking about WhatsApp perf issues.
Flags: needinfo?(bas)
I have tested on my macbook (2013, 13", 10.12.6), and I can't reproduce the poor performance when at rest or typing in to the message box.

I still have terrible performance when switching conversations and when moving the mouse around, which is bug 1328933, but this seems unrelated.

Perhaps there is some content in your messages which does not exist in mine.

Is this a regression, and would it be possible to use mozregression to find what caused it?
I can't repro without my profile. I could however make a screencast, it's very sad.
Flags: needinfo?(padenot)
Screencast: http://paul.cx/public/screencast-whatsapp-paint-flashing.mp4

We see it takes some time to load the page. I also start typing at the 0:34 mark, the character appears in the text box around 0:40.
Flags: needinfo?(jnicol)
It seems like all your icons, which are presumably svgs, are continually being repainted.

This does not happen for me. Mozregression has options for using an existing profile, so it might be possible to find a regression range using that.

Another log but with nglayout.debug.invalidation=true as well as layout.display-list.dump would be useful. (This might need to be a debug build, or built with MOZ_DUMP_PAINTING defined.)
Flags: needinfo?(jnicol)
This seem to be caused by setting a non-100% zoom level on the page. I can repro with 110% and jamie can repro with 133%.
Assignee: nobody → jnicol
This was quite fun to work out.

Whatsapp's icons are all in a large svg atlas, and it renders different subregions for each icon. So when we go to paint the first icon, we render the whole svg at the correct size and stick it in the surface cache. Then we go to paint the second icon. Because we are zoomed this happens to be at a different subpixel offset, so when the dest rect is snapped it happens to be 1 pixel larger or smaller than for the first icon. So we now render the svg at this second size. Unfortunately there isn't any room left in the cache, so we evict the first one. Then we go to paint the third icon, whose subpixel offset means the dest rect is snapped to the same size as for the first icon. We get a cache miss for this surface size so we have to render the entire svg again, and evict the other sized surface because the cache is full. This repeats forever.

The guilty code is in nsLayoutUtils.cpp ComputeSnappedImageDrawingParameters() and gfxContext::UserToDevicePixelSnapped().
In UserToDevicePixelSnapped, rather than rounding the topleft and bottomright of the rect, I think we need to round the topleft and the size. Haven't quite figured that out yet.
Doing what I proposed in comment 12 is wrong. Layout deliberately rounds to the nearest pixels rather than keeping size constant. This would break a lot.

An alternative approach might be for VectorImage::Draw to check the SurfaceCache for surfaces +- 1px (perhaps only when greater than a certain size). Andrew, does that sound reasonable to you?
Flags: needinfo?(aosmond)
This is not a (recent) regression in Firefox but a change to whatsapp.com, based on the comments?
Priority: -- → P2
Whiteboard: [gfx-noted]
See Also: → 1376536
Here's a reduced test case: https://jrmuizel.github.io/implementation-tests/svg-sprite-sheet.html

If you zoom in on that to 120% with paint flashing on you get really slow painting and you can see that we paint the svg sprite sheet multiple times. We should stop doing so.

I can't think of a reason that we shouldn't be able to reuse the same image on all of the lines.

Jamie, I'm not convinced that comment 12 is wrong. Can you elaborate? I don't special casing +-1 should be necessary.
Layout snaps things to the nearest pixel. So when a rect is (x=0.3 w=9.1) it gets snapped to (l=0 r=9), but (x=0.4 w=9.1) it gets snapped to (l=0 r=10). Speaking to Bas he said this maybe isn't a spec, but seems to be a very widespread rule.

Here was the try push when I tried that: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0fd570147bc561d02cf27ce4b5355de050b8f832

It might be possible to change this behaviour throughout gecko to fix the reftests, but I'm not sure if that's a good idea.

Because of how we snapped the rects, in some cases whatsapp wants to draw a (eg) 40x40 icon from a 80000x80000 atlas, and other times wants to draw a 41x40 icon from a 80001x80000 atlas. I figure in the latter case drawing from the 80000x80000 atlas might be good enough, but we want to limit this to the surface lookup rather than changing the size of things drawn to the screen.

This could also perhaps be fixed in a different way - by not caching the entire svg but only subregions of it. And perhaps only in cases where the entire image is much larger than the area being drawn.
Where are the different scales coming from in the test case that I posted?
Flags: needinfo?(jnicol)
I've updated the test case to actually show the issue. OptimalImageSizeForDest gets called with 1034.0 1049.0 and 1033.0 1049.0
This suggests that the problem is in ComputeSnappedImageDrawingParameters. It's not clear to me yet how it should be fixed.
If we change how rects are snapped then things will be the wrong sizes. And if we rasterize the svg at a different size than the snapped dest rect then things will look blurry. For small images rasterizing the svg at couple of different sizes isn't a huge issue in terms of memory or time, and in fact is visually correct. This is only an issue due to the massive size.

Potential solutions I see are therefore:
* Enforce a maximum size in VectorImage::OptimalImageSizeForDest()
* Above a certain size (so as to not make most content blurry) clamp VectorImage::OptimalImageSizeForDest() to e.g. the previous power of two.
* Rasterize only the fill rect and store it in the cache (perhaps only when fill rect is much smaller than dest rect)


The latter would presumably be a fairly significant change to how the SurfaceCache works, so I'd lean towards either or both of the first two.
Flags: needinfo?(jnicol)
(In reply to Jamie Nicol [:jnicol] from comment #20)
> If we change how rects are snapped then things will be the wrong sizes. And
> if we rasterize the svg at a different size than the snapped dest rect then
> things will look blurry. For small images rasterizing the svg at couple of
> different sizes isn't a huge issue in terms of memory or time, and in fact
> is visually correct. This is only an issue due to the massive size

What if we change the area of the image that's sampled? Will that let us have nice things?
Here's a test case that shows us not leaking the background while other browsers do:
https://jrmuizel.github.io/implementation-tests/background-leaking.html

If we choose to also leak we can have nice things.
Flags: needinfo?(aosmond)
This patch isn't right. Here's the try run here - https://treeherder.mozilla.org/#/jobs?repo=try&revision=b07e6f1a2c05e01f2841294c828eff867ff948f9

The R3 and R5 failures are from leaking, and chrome fails them too. But the R1 failures are because we're still downscaling.

It uses the unsnapped dest rect to calculate the optimal image size, which I'm pretty sure is what we want. Then I tried to use the snapped fill and dest rects to calculate the sample subimage so that we do not scale when rendering, but that isn't correct.

As Jeff said on irc the correct fix probably requires simplifying that function a fair bit, rather than the tinkering I was trying to do there.
Too late for a patch in 57, but we could still potentially fix this in 58.
Oops, this sort of fell off my radar. I haven't yet been able to come up with a general-case clamping algorithm that does what we want. For what is's worth, however, the problem has now been fixed on whatsapp's end.
Given comment 25, it sounds like the summary has fallen out of sync with what this bug is actually about.
--> Taking a stab at an clarified summary to reflect reality; please correct if my update is off the mark.
Summary: web.whatsapp.com pegging the main thread when the tab is focused. → SVG sprite sheets can peg the main thread at some zoom-levels, when the tab is focused (originally at web.whatsapp.com)

Hi Jamie!
I'm trying to reproduce old bugs to see if we can resolve some. I was unable to reproduce it on latest Nightly 96.01 (2021-11-02)(64-bit) on macOS 10.15.
Since this bug is still active, is there any ongoing investigation about this?
Thanks!

Flags: needinfo?(jnicol)

The bug was resolved on whatsapp's end years ago. And we have bug 1673653 which has ongoing work to solve the root of this problem. So let's close this.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(jnicol)
Resolution: --- → FIXED
See Also: → deferrable-blobs
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: