Closed Bug 1274991 Opened 4 years ago Closed 4 years ago

Glitches/high memory usage when using hardware acceleration to render HTML element with large scale

Categories

(Core :: Graphics: Layers, defect)

43 Branch
defect
Not set

Tracking


VERIFIED FIXED
mozilla49
Tracking Status
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 + verified
firefox49 + verified

People

(Reporter: mail.vince, Assigned: jnicol)

References

Details

(Keywords: regression, testcase, Whiteboard: [gfx-noted])

Attachments

(3 files)

Attached file gpu-config.txt
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
Build ID: 20160502172042

Steps to reproduce:

The following JS Bin is an example isolated from a web app that renders documents in a scalable viewport. Objects can be moved around. The example contains an SVG triangle, animated rightwards using the translate(x y) style. The SVG is within a DIV that has 150x scaling applied:

http://jsbin.com/pekedupihi/edit?html,js,output


Actual results:

Under Windows 10, Firefox 46.0.1 newly refreshed, and latest graphics drivers (downloaded from Dell website), the triangle blinks briefly, but then fails to render at all for the rest of the animation. See attached file for GPU config, and note listed errors.

Under Mac OS 10.9.5, Firefox 46.0.1 newly refreshed, the above example fails to render correctly. Additionally, memory consumption increases to over 1 GB, causing the entire machine to slow down.


Expected results:

Triangle should be shown as animating smoothly across the page, ideally at full resolution (it's a vector).

The high memory usage under Mac OS suggests that the browser is attempting to rasterise the entire SVG element at full resolution. Reducing the scaling of the DIV, and turning on layers.draw-borders (about:config) shows that the browser is attempting to accelerate the entire element.

Note that it may be necessary to increase the DIV scaling, or increase the window size, in order to recreate the issue.

The test passes, and memory usage is normal, if hardware acceleration is turned off in Settings > Advanced. Furthermore, after the test fails, Firefox can sometimes turn off acceleration of its own accord, allowing the test to run as expected (still shows as turned on in Settings); a refresh is required to turn acceleration back on.

The full web app appeared to work successfully in an older Firefox version (43). The shape was rendered at lower resolution (presumably at 1x scaling), thereby allowing successful acceleration. On a further machine running the latest Firefox (46), the same strategy appeared to be used to allow successful acceleration. Why does the latest version of the browser not enforce a maximum resolution for an accelerated layer? Has this heuristic decision changed recently?
With Win 10, you should update the drivers to the latest versions, especially for your Nvidia GPU. I know Dell is bad about providing the latest drivers, but check on the Nvidia website whether you can install vanilla drivers.
Sage advice regarding Win 10 drivers - thanks. I was able to update to the latest Nvidia drivers directly from their website, but unfortunately this didn't improve matters; the shape in the example now fails to show at all. I was not allowed to install the latest drivers from Intel, so I've stuck with the Dell ones.
That JSBin also fails in Firefox on my machine running Windows 8.1. Mac Mini with latest drivers. (Intel GPU). Works fine in all other browsers on the same machine.
I'm able to reproduce the issue with FF46 on Win 7 when HWA is disabled. But I see the red triangle with HWA enabled.

Regression range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9e4218058d633480b4a17db6cf3d3831418b6ec3&tochange=29f37fd4c13e4309b97e15a5879931819c8220e9

Nicolas Silva — Bug 1224254 - Don't try to allocate unreasonably large textures. r=Bas

Nicolas, could you explain why the result is different with HWA on/off on Windows than on Mac OS X?
Blocks: 1224254
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(nical.bugzilla)
Keywords: regression, testcase
When accelerated compositing is disabled, Gecko more aggressively flattens the layer tree and draws the triangle directly into the layer underneath it, even while it is animating. This is because compositing on the CPU is expensive, so we gain nothing from putting the animated element in its own layer. As a result the triangle is drawn without allocating a layer the size of the triangle.
With acceleration enabled, however, we render the triangle on its own layer because compositing is cheaper. The problem is that this triangle is pretty big, and with a high-DPI display we end up trying to allocate a layer that is bigger than the size that is safe to allocate.
GPU APIs are sometimes terrible at reporting the maximum size they can handle: they tend to report the size at which allocation is certain to fail, rather than a size they can reliably handle. That's why we had to tighten the limit. It worked before this patch, but probably not reliably across devices and drivers.

I think that what makes sense here is to detect that the animated element is bigger than what we can layerize and flatten if need be, even when accelerated compositing is on.

Jamie, does that sound reasonable to you (you know that stuff way better than I do)?
Flags: needinfo?(nical.bugzilla) → needinfo?(jnicol)
Yes, that sounds reasonable to me. We already have a mechanism in place to avoid layerizing too large an area due to animated offset properties; see the AGR budget in nsDisplayList.cpp. But since the example uses shape.style.transform, this does not apply. Indeed, if you comment out that line and uncomment the next one, the budget would apply.

The size of the budget probably needs to be lowered for desktop (I set it low for mobile, and deliberately high enough for desktop not to affect anything, so that it would land). We may want to add a maximum size per individual item, or make it an absolute value rather than relative to the size of the pres shell. And of course we'd need to link animated transform properties to this budget. But I believe that would be the best course of action.
Flags: needinfo?(jnicol)
Whiteboard: [gfx-noted]
Bug 1224254 landed in 45 and got uplifted to 43, tagging as such.
Version: 46 Branch → 43 Branch
Assignee: nobody → jnicol
My understanding in comment 7 wasn't quite correct. Transforms are actively/inactively layerized by nsDisplayTransform::GetLayerState(), not by having a separate AGR.

In nsDisplayTransform::GetLayerState() we actively layerize the nsDisplayTransform if the transform css property is changing (by calling MayBeAnimated()). We already check that the item is larger than a minimum size here, we could check that it is smaller than a maximum size too. Doing so is complicated by the fact that the nsDisplayTransform with the changing translation is small, but the layer is large due to the parent nsDisplayTransform which has a large scale. Presumably not an insurmountable problem.

However, I'm wondering if the correct solution is actually just to ensure that the parent layer's display port applies to the resulting active layer. This would limit the size of the layer's visible region, and we'd no longer attempt to allocate a massive texture. For some reason the display port isn't being applied, however. I'm trying to work out why.
The reason we were rendering the entire layer was that the SVG is small and its transform is animated, so nsDisplayTransform::ShouldPrerenderTransformedContent() decides we should prerender the entire thing. But it ends up not being small because of the scale transform. The above patch takes the scale transform into account in ShouldPrerenderTransformedContent(), so we no longer prerender the entire shape and only render the portion of it that falls within the display port.
Comment on attachment 8759593 [details]
Bug 1274991 - Consider ancestor scale transforms when deciding whether to prerender transformed content;

https://reviewboard.mozilla.org/r/57554/#review54312

::: layout/base/nsDisplayList.cpp:5868
(Diff revision 1)
>      nsRect visual = aFrame->GetVisualOverflowRect();
> -    if (visual.width <= maxInAppUnits && visual.height <= maxInAppUnits) {
> +    if (visual.width * scale.width <= maxInAppUnits &&
> +        visual.height * scale.height <= maxInAppUnits) {

Let's remove "visual" here and use frameSize. The sizes of aFrame->GetVisualOverflowRectRelativeToSelf() and aFrame->GetVisualOverflowRect() are the same, I think somebody missed an opportunity to simplify this code at some point.
Attachment #8759593 - Flags: review?(mstange) → review+
Comment on attachment 8759593 [details]
Bug 1274991 - Consider ancestor scale transforms when deciding whether to prerender transformed content;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/57554/diff/1-2/
The updated patch just makes mstange's recommended change; r+ carries over.

try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a3873092b567e0f9049af055cb0c0afb56daed9&selectedJob=21932572
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d935da74d415
Consider ancestor scale transforms when deciding whether to prerender transformed content. r=mstange
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/d935da74d415
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Jamie or Markus, is this something you'd like to uplift to beta, or are you OK with letting this fix ride along with 49?
Flags: needinfo?(mstange)
Flags: needinfo?(jnicol)
Comment on attachment 8759593 [details]
Bug 1274991 - Consider ancestor scale transforms when deciding whether to prerender transformed content;

This should be safe to uplift, but I'm not sure how important it is / how many real world cases of it there are.

Approval Request Comment
[Feature/regressing bug #]: N/A
[User impact if declined]: higher memory usage / incorrect rendering due to failed large allocations
[Describe test coverage new/current, TreeHerder]: Been on nightly for a couple of weeks without problems
[Risks and why]: Fairly low, this makes us more conservative about large allocations
[String/UUID change made/needed]: N/A
Flags: needinfo?(jnicol)
Attachment #8759593 - Flags: approval-mozilla-beta?
Attachment #8759593 - Flags: approval-mozilla-aurora?
We have a real-world graphics app that is broken in FF at present because of this issue, and that works in all other browsers.
Comment on attachment 8759593 [details]
Bug 1274991 - Consider ancestor scale transforms when deciding whether to prerender transformed content;

OK, with Charles's comment, I think we should take it. Thanks for your input.
As we are early in the cycle, we will have time to back it out if it causes unexpected regressions.
Flags: needinfo?(mstange)
Attachment #8759593 - Flags: approval-mozilla-beta?
Attachment #8759593 - Flags: approval-mozilla-beta+
Attachment #8759593 - Flags: approval-mozilla-aurora?
Attachment #8759593 - Flags: approval-mozilla-aurora+
This landed while 49 was on trunk. 49 got merged to aurora a day or so later, so nothing needs to be uplifted to 49.
I managed to reproduce this bug using Fx 49.0a1, build ID:20160523030225 on Windows 10 x64.
Confirmed the fix on Fx 48.0b4, build ID 20160627144420 and Fx 49.0a2, build ID 20160628004053, on Windows 10 x64, Mac 10.9.5 and Ubuntu 14.04.
Status: RESOLVED → VERIFIED
Flags: qe-verify+ → qe-verify-
Depends on: 1292856
Depends on: 1431778