web.whatsapp.com pegging the main thread when the tab is focused.

Assigned to



6 months ago
5 months ago


(Reporter: padenot, Assigned: jnicol)



Firefox Tracking Flags

(firefox-esr52 affected, firefox55 wontfix, firefox56 wontfix, firefox57+ wontfix, firefox58 affected)


(Whiteboard: [gfx-noted], URL)


(1 attachment)



6 months ago
Up to date nightly on OSX 10.12.6 on a 2016 15" Macbook Pro.

- open `web.whatsapp.com`, associate your phone with the QR code or something
- start typing in a message

- Text entry is instant

- characters are inserted in the text box with a 2-3s delay, letters are sometimes not in the correct order (I was typing very carefuly a slowly to make sur it was not a user error).

Main thread for this process is pegged to 100% CPU when tab is active. I have captured a gecko profiler and an instruments profile. Both would tend to point towards an SVG-related issue when painting a CSS background.

Relevant profile [0], `web.whatsapp.com` was in content process 3. This short profile shows the following:
- Switching to the `web.whatsapp.com`, that is already logged in
- Focusing the message box at the bottom
- Typing in "asd asd asd asd"

[0]: https://perfht.ml/2vZLu4c. I've saved the instruments profile, available on request.

Comment 1

6 months ago
tracking? 57 for obvious reasons. I'm able to repro consistently on OSX, and sometimes Linux. I don't use windows often enough these day to know if it also happens there as well or not.

Restarting the browser does not make the issue go away.
tracking-firefox57: --- → ?

Comment 2

6 months ago
Jamie, Bas, any idea about this one?
Flags: needinfo?(jnicol)
Flags: needinfo?(bas)
This is us mostly repainting vector images: https://perfht.ml/2jhW54D. I expect we or what's app need to be doing something to avoid this happening. If you turn on paint flashing it should give a better picture of what's going on.
Flags: needinfo?(padenot)

Comment 4

6 months ago
Could be some bad layerisation. The output with layout.display-list.dump enabled would be very useful.

As would be a screencast with nglayout.debug.paint_flashing enabled, if you have the time. So we can see what it is that is being repainted constantly.

FWIW I cannot reproduce on my Linux laptop, but I will try on my macbook when I get home tonight.
Flags: needinfo?(jnicol)
I remember a bunch of people in the flow meeting talking about WhatsApp perf issues.
Flags: needinfo?(bas)

Comment 6

6 months ago
I have tested on my macbook (2013, 13", 10.12.6), and I can't reproduce the poor performance when at rest or typing in to the message box.

I still have terrible performance when switching conversations and when moving the mouse around, which is bug 1328933, but this seems unrelated.

Perhaps there is some content in your messages which does not exist in mine.

Is this a regression, and would it be possible to use mozregression to find what caused it?

Comment 7

6 months ago
I can't repro without my profile. I could however make a screencast, it's very sad.
Flags: needinfo?(padenot)

Comment 8

6 months ago
Screencast: http://paul.cx/public/screencast-whatsapp-paint-flashing.mp4

We see it takes some time to load the page. I also start typing at the 0:34 mark, the character appears in the text box around 0:40.


6 months ago
Flags: needinfo?(jnicol)

Comment 9

6 months ago
It seems like all your icons, which are presumably svgs, are continually being repainted.

This does not happen for me. Mozregression has options for using an existing profile, so it might be possible to find a regression range using that.

Another log but with nglayout.debug.invalidation=true as well as layout.display-list.dump would be useful. (This might need to be a debug build, or built with MOZ_DUMP_PAINTING defined.)
Flags: needinfo?(jnicol)

Comment 10

6 months ago
This seem to be caused by setting a non-100% zoom level on the page. I can repro with 110% and jamie can repro with 133%.
tracking for 57
tracking-firefox57: ? → +


6 months ago
Assignee: nobody → jnicol

Comment 12

6 months ago
This was quite fun to work out.

Whatsapp's icons are all in a large svg atlas, and it renders different subregions for each icon. So when we go to paint the first icon, we render the whole svg at the correct size and stick it in the surface cache. Then we go to paint the second icon. Because we are zoomed this happens to be at a different subpixel offset, so when the dest rect is snapped it happens to be 1 pixel larger or smaller than for the first icon. So we now render the svg at this second size. Unfortunately there isn't any room left in the cache, so we evict the first one. Then we go to paint the third icon, whose subpixel offset means the dest rect is snapped to the same size as for the first icon. We get a cache miss for this surface size so we have to render the entire svg again, and evict the other sized surface because the cache is full. This repeats forever.

The guilty code is in nsLayoutUtils.cpp ComputeSnappedImageDrawingParameters() and gfxContext::UserToDevicePixelSnapped().
In UserToDevicePixelSnapped, rather than rounding the topleft and bottomright of the rect, I think we need to round the topleft and the size. Haven't quite figured that out yet.

Comment 13

6 months ago
Doing what I proposed in comment 12 is wrong. Layout deliberately rounds to the nearest pixels rather than keeping size constant. This would break a lot.

An alternative approach might be for VectorImage::Draw to check the SurfaceCache for surfaces +- 1px (perhaps only when greater than a certain size). Andrew, does that sound reasonable to you?
Flags: needinfo?(aosmond)
This is not a (recent) regression in Firefox but a change to whatsapp.com, based on the comments?
status-firefox55: --- → fix-optional
status-firefox56: --- → affected
status-firefox-esr52: --- → affected
Priority: -- → P2
Whiteboard: [gfx-noted]
Here's a reduced test case: https://jrmuizel.github.io/implementation-tests/svg-sprite-sheet.html

If you zoom in on that to 120% with paint flashing on you get really slow painting and you can see that we paint the svg sprite sheet multiple times. We should stop doing so.

I can't think of a reason that we shouldn't be able to reuse the same image on all of the lines.

Jamie, I'm not convinced that comment 12 is wrong. Can you elaborate? I don't special casing +-1 should be necessary.

Comment 16

6 months ago
Layout snaps things to the nearest pixel. So when a rect is (x=0.3 w=9.1) it gets snapped to (l=0 r=9), but (x=0.4 w=9.1) it gets snapped to (l=0 r=10). Speaking to Bas he said this maybe isn't a spec, but seems to be a very widespread rule.

Here was the try push when I tried that: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0fd570147bc561d02cf27ce4b5355de050b8f832

It might be possible to change this behaviour throughout gecko to fix the reftests, but I'm not sure if that's a good idea.

Because of how we snapped the rects, in some cases whatsapp wants to draw a (eg) 40x40 icon from a 80000x80000 atlas, and other times wants to draw a 41x40 icon from a 80001x80000 atlas. I figure in the latter case drawing from the 80000x80000 atlas might be good enough, but we want to limit this to the surface lookup rather than changing the size of things drawn to the screen.

This could also perhaps be fixed in a different way - by not caching the entire svg but only subregions of it. And perhaps only in cases where the entire image is much larger than the area being drawn.
Where are the different scales coming from in the test case that I posted?
Flags: needinfo?(jnicol)
I've updated the test case to actually show the issue. OptimalImageSizeForDest gets called with 1034.0 1049.0 and 1033.0 1049.0
This suggests that the problem is in ComputeSnappedImageDrawingParameters. It's not clear to me yet how it should be fixed.
Blocks: 1396479

Comment 20

6 months ago
If we change how rects are snapped then things will be the wrong sizes. And if we rasterize the svg at a different size than the snapped dest rect then things will look blurry. For small images rasterizing the svg at couple of different sizes isn't a huge issue in terms of memory or time, and in fact is visually correct. This is only an issue due to the massive size.

Potential solutions I see are therefore:
* Enforce a maximum size in VectorImage::OptimalImageSizeForDest()
* Above a certain size (so as to not make most content blurry) clamp VectorImage::OptimalImageSizeForDest() to e.g. the previous power of two.
* Rasterize only the fill rect and store it in the cache (perhaps only when fill rect is much smaller than dest rect)

The latter would presumably be a fairly significant change to how the SurfaceCache works, so I'd lean towards either or both of the first two.
Flags: needinfo?(jnicol)
(In reply to Jamie Nicol [:jnicol] from comment #20)
> If we change how rects are snapped then things will be the wrong sizes. And
> if we rasterize the svg at a different size than the snapped dest rect then
> things will look blurry. For small images rasterizing the svg at couple of
> different sizes isn't a huge issue in terms of memory or time, and in fact
> is visually correct. This is only an issue due to the massive size

What if we change the area of the image that's sampled? Will that let us have nice things?
Here's a test case that shows us not leaking the background while other browsers do:

If we choose to also leak we can have nice things.
Flags: needinfo?(aosmond)

Comment 23

6 months ago
Created attachment 8913214 [details] [diff] [review]

This patch isn't right. Here's the try run here - https://treeherder.mozilla.org/#/jobs?repo=try&revision=b07e6f1a2c05e01f2841294c828eff867ff948f9

The R3 and R5 failures are from leaking, and chrome fails them too. But the R1 failures are because we're still downscaling.

It uses the unsnapped dest rect to calculate the optimal image size, which I'm pretty sure is what we want. Then I tried to use the snapped fill and dest rects to calculate the sample subimage so that we do not scale when rendering, but that isn't correct.

As Jeff said on irc the correct fix probably requires simplifying that function a fair bit, rather than the tinkering I was trying to do there.
Too late for a patch in 57, but we could still potentially fix this in 58.
status-firefox55: fix-optional → wontfix
status-firefox56: affected → wontfix
status-firefox57: affected → wontfix
status-firefox58: --- → affected

Comment 25

5 months ago
Oops, this sort of fell off my radar. I haven't yet been able to come up with a general-case clamping algorithm that does what we want. For what is's worth, however, the problem has now been fixed on whatsapp's end.
You need to log in before you can comment on or make changes to this bug.