Closed
Bug 1398697
Opened 7 years ago
Closed 2 years ago
SVG sprite sheets can peg the main thread at some zoom-levels, when the tab is focused (originally at web.whatsapp.com)
Categories
(Core :: Graphics, defect, P2)
Tracking
()
People
(Reporter: padenot, Assigned: jnicol)
References
()
Details
(Whiteboard: [gfx-noted])
Attachments
(1 file)
2.58 KB,
patch
|
Details | Diff | Splinter Review |
Up to date nightly on OSX 10.12.6 on a 2016 15" Macbook Pro. STR: - open `web.whatsapp.com`, associate your phone with the QR code or something - start typing in a message Expected: - Text entry is instant Actual: - characters are inserted in the text box with a 2-3s delay, letters are sometimes not in the correct order (I was typing very carefuly a slowly to make sur it was not a user error). Main thread for this process is pegged to 100% CPU when tab is active. I have captured a gecko profiler and an instruments profile. Both would tend to point towards an SVG-related issue when painting a CSS background. Relevant profile [0], `web.whatsapp.com` was in content process 3. This short profile shows the following: - Switching to the `web.whatsapp.com`, that is already logged in - Focusing the message box at the bottom - Typing in "asd asd asd asd" [0]: https://perfht.ml/2vZLu4c. I've saved the instruments profile, available on request.
Reporter | ||
Comment 1•7 years ago
|
||
tracking? 57 for obvious reasons. I'm able to repro consistently on OSX, and sometimes Linux. I don't use windows often enough these day to know if it also happens there as well or not. Restarting the browser does not make the issue go away.
tracking-firefox57:
--- → ?
Reporter | ||
Comment 2•7 years ago
|
||
Jamie, Bas, any idea about this one?
Flags: needinfo?(jnicol)
Flags: needinfo?(bas)
Comment 3•7 years ago
|
||
This is us mostly repainting vector images: https://perfht.ml/2jhW54D. I expect we or what's app need to be doing something to avoid this happening. If you turn on paint flashing it should give a better picture of what's going on.
Flags: needinfo?(padenot)
Assignee | ||
Comment 4•7 years ago
|
||
Could be some bad layerisation. The output with layout.display-list.dump enabled would be very useful. As would be a screencast with nglayout.debug.paint_flashing enabled, if you have the time. So we can see what it is that is being repainted constantly. FWIW I cannot reproduce on my Linux laptop, but I will try on my macbook when I get home tonight.
Flags: needinfo?(jnicol)
Comment 5•7 years ago
|
||
I remember a bunch of people in the flow meeting talking about WhatsApp perf issues.
Flags: needinfo?(bas)
Assignee | ||
Comment 6•7 years ago
|
||
I have tested on my macbook (2013, 13", 10.12.6), and I can't reproduce the poor performance when at rest or typing in to the message box. I still have terrible performance when switching conversations and when moving the mouse around, which is bug 1328933, but this seems unrelated. Perhaps there is some content in your messages which does not exist in mine. Is this a regression, and would it be possible to use mozregression to find what caused it?
Reporter | ||
Comment 7•7 years ago
|
||
I can't repro without my profile. I could however make a screencast, it's very sad.
Flags: needinfo?(padenot)
Reporter | ||
Comment 8•7 years ago
|
||
Screencast: http://paul.cx/public/screencast-whatsapp-paint-flashing.mp4 We see it takes some time to load the page. I also start typing at the 0:34 mark, the character appears in the text box around 0:40.
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(jnicol)
Assignee | ||
Comment 9•7 years ago
|
||
It seems like all your icons, which are presumably svgs, are continually being repainted. This does not happen for me. Mozregression has options for using an existing profile, so it might be possible to find a regression range using that. Another log but with nglayout.debug.invalidation=true as well as layout.display-list.dump would be useful. (This might need to be a debug build, or built with MOZ_DUMP_PAINTING defined.)
Flags: needinfo?(jnicol)
Reporter | ||
Comment 10•7 years ago
|
||
This seem to be caused by setting a non-100% zoom level on the page. I can repro with 110% and jamie can repro with 133%.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → jnicol
Assignee | ||
Comment 12•7 years ago
|
||
This was quite fun to work out. Whatsapp's icons are all in a large svg atlas, and it renders different subregions for each icon. So when we go to paint the first icon, we render the whole svg at the correct size and stick it in the surface cache. Then we go to paint the second icon. Because we are zoomed this happens to be at a different subpixel offset, so when the dest rect is snapped it happens to be 1 pixel larger or smaller than for the first icon. So we now render the svg at this second size. Unfortunately there isn't any room left in the cache, so we evict the first one. Then we go to paint the third icon, whose subpixel offset means the dest rect is snapped to the same size as for the first icon. We get a cache miss for this surface size so we have to render the entire svg again, and evict the other sized surface because the cache is full. This repeats forever. The guilty code is in nsLayoutUtils.cpp ComputeSnappedImageDrawingParameters() and gfxContext::UserToDevicePixelSnapped(). In UserToDevicePixelSnapped, rather than rounding the topleft and bottomright of the rect, I think we need to round the topleft and the size. Haven't quite figured that out yet.
Assignee | ||
Comment 13•7 years ago
|
||
Doing what I proposed in comment 12 is wrong. Layout deliberately rounds to the nearest pixels rather than keeping size constant. This would break a lot. An alternative approach might be for VectorImage::Draw to check the SurfaceCache for surfaces +- 1px (perhaps only when greater than a certain size). Andrew, does that sound reasonable to you?
Flags: needinfo?(aosmond)
This is not a (recent) regression in Firefox but a change to whatsapp.com, based on the comments?
status-firefox55:
--- → fix-optional
status-firefox56:
--- → affected
status-firefox-esr52:
--- → affected
Priority: -- → P2
Whiteboard: [gfx-noted]
Comment 15•7 years ago
|
||
Here's a reduced test case: https://jrmuizel.github.io/implementation-tests/svg-sprite-sheet.html If you zoom in on that to 120% with paint flashing on you get really slow painting and you can see that we paint the svg sprite sheet multiple times. We should stop doing so. I can't think of a reason that we shouldn't be able to reuse the same image on all of the lines. Jamie, I'm not convinced that comment 12 is wrong. Can you elaborate? I don't special casing +-1 should be necessary.
Assignee | ||
Comment 16•7 years ago
|
||
Layout snaps things to the nearest pixel. So when a rect is (x=0.3 w=9.1) it gets snapped to (l=0 r=9), but (x=0.4 w=9.1) it gets snapped to (l=0 r=10). Speaking to Bas he said this maybe isn't a spec, but seems to be a very widespread rule. Here was the try push when I tried that: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0fd570147bc561d02cf27ce4b5355de050b8f832 It might be possible to change this behaviour throughout gecko to fix the reftests, but I'm not sure if that's a good idea. Because of how we snapped the rects, in some cases whatsapp wants to draw a (eg) 40x40 icon from a 80000x80000 atlas, and other times wants to draw a 41x40 icon from a 80001x80000 atlas. I figure in the latter case drawing from the 80000x80000 atlas might be good enough, but we want to limit this to the surface lookup rather than changing the size of things drawn to the screen. This could also perhaps be fixed in a different way - by not caching the entire svg but only subregions of it. And perhaps only in cases where the entire image is much larger than the area being drawn.
Comment 17•7 years ago
|
||
Where are the different scales coming from in the test case that I posted?
Flags: needinfo?(jnicol)
Comment 18•7 years ago
|
||
I've updated the test case to actually show the issue. OptimalImageSizeForDest gets called with 1034.0 1049.0 and 1033.0 1049.0
Comment 19•7 years ago
|
||
This suggests that the problem is in ComputeSnappedImageDrawingParameters. It's not clear to me yet how it should be fixed.
Updated•7 years ago
|
Assignee | ||
Comment 20•7 years ago
|
||
If we change how rects are snapped then things will be the wrong sizes. And if we rasterize the svg at a different size than the snapped dest rect then things will look blurry. For small images rasterizing the svg at couple of different sizes isn't a huge issue in terms of memory or time, and in fact is visually correct. This is only an issue due to the massive size. Potential solutions I see are therefore: * Enforce a maximum size in VectorImage::OptimalImageSizeForDest() * Above a certain size (so as to not make most content blurry) clamp VectorImage::OptimalImageSizeForDest() to e.g. the previous power of two. * Rasterize only the fill rect and store it in the cache (perhaps only when fill rect is much smaller than dest rect) The latter would presumably be a fairly significant change to how the SurfaceCache works, so I'd lean towards either or both of the first two.
Flags: needinfo?(jnicol)
Comment 21•7 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #20) > If we change how rects are snapped then things will be the wrong sizes. And > if we rasterize the svg at a different size than the snapped dest rect then > things will look blurry. For small images rasterizing the svg at couple of > different sizes isn't a huge issue in terms of memory or time, and in fact > is visually correct. This is only an issue due to the massive size What if we change the area of the image that's sampled? Will that let us have nice things?
Comment 22•7 years ago
|
||
Here's a test case that shows us not leaking the background while other browsers do: https://jrmuizel.github.io/implementation-tests/background-leaking.html If we choose to also leak we can have nice things.
Updated•7 years ago
|
Flags: needinfo?(aosmond)
Assignee | ||
Comment 23•7 years ago
|
||
This patch isn't right. Here's the try run here - https://treeherder.mozilla.org/#/jobs?repo=try&revision=b07e6f1a2c05e01f2841294c828eff867ff948f9 The R3 and R5 failures are from leaking, and chrome fails them too. But the R1 failures are because we're still downscaling. It uses the unsnapped dest rect to calculate the optimal image size, which I'm pretty sure is what we want. Then I tried to use the snapped fill and dest rects to calculate the sample subimage so that we do not scale when rendering, but that isn't correct. As Jeff said on irc the correct fix probably requires simplifying that function a fair bit, rather than the tinkering I was trying to do there.
Comment 24•7 years ago
|
||
Too late for a patch in 57, but we could still potentially fix this in 58.
Assignee | ||
Comment 25•7 years ago
|
||
Oops, this sort of fell off my radar. I haven't yet been able to come up with a general-case clamping algorithm that does what we want. For what is's worth, however, the problem has now been fixed on whatsapp's end.
Comment 26•6 years ago
|
||
Given comment 25, it sounds like the summary has fallen out of sync with what this bug is actually about. --> Taking a stab at an clarified summary to reflect reality; please correct if my update is off the mark.
Summary: web.whatsapp.com pegging the main thread when the tab is focused. → SVG sprite sheets can peg the main thread at some zoom-levels, when the tab is focused (originally at web.whatsapp.com)
Comment 27•3 years ago
|
||
Hi Jamie!
I'm trying to reproduce old bugs to see if we can resolve some. I was unable to reproduce it on latest Nightly 96.01 (2021-11-02)(64-bit) on macOS 10.15.
Since this bug is still active, is there any ongoing investigation about this?
Thanks!
Flags: needinfo?(jnicol)
Assignee | ||
Comment 28•2 years ago
|
||
The bug was resolved on whatsapp's end years ago. And we have bug 1673653 which has ongoing work to solve the root of this problem. So let's close this.
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(jnicol)
Resolution: --- → FIXED
See Also: → deferrable-blobs
You need to log in
before you can comment on or make changes to this bug.
Description
•