Closed Bug 1085267 Opened 10 years ago Closed 10 years ago

Browser hang due to huge memory and CPU usage

Categories

(Core :: Graphics, defect)

35 Branch
x86_64
Windows 7
defect
Not set
critical

Tracking


RESOLVED DUPLICATE of bug 1081926
Tracking Status
firefox34 --- unaffected
firefox35 --- affected
firefox36 --- affected

People

(Reporter: alice0775, Assigned: mvujovic)

References


Details

(Keywords: hang, memory-footprint, regression)

Attachments

(3 files)

[Tracking Requested - why for this release]:

Build Identifier:
https://hg.mozilla.org/mozilla-central/rev/33c0181c4a25
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0 ID:20141019030207


Reported at http://forums.mozillazine.org/viewtopic.php?p=13828103#p13828103

Steps To Reproduce:
1. Open http://www.apple.com/imac-with-retina/

Actual Results:
Huge memory and CPU usage

Regression window (m-i):
Good:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9b157630e5ab
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0 ID:20141003095048
Bad:
https://hg.mozilla.org/integration/mozilla-inbound/rev/697c4b245de0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0 ID:20141003100802
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9b157630e5ab&tochange=697c4b245de0

Regressed by
697c4b245de0	Tim Nguyen — Bug 1057180 - Enable CSS Filters by default. r=dbaron
Stack when hang (using crashfirefox.exe): bp-9bdfb1de-bbb9-483c-8443-715f22141020
Setting gfx.direct2d.use1_1 = false fixes the problem.
So, I added a dependency on Bug 902952.
Blocks: 902952
Graphics
--------

Adapter Description: ATI Radeon HD 4300/4500 Series
Adapter Drivers: aticfx64 aticfx64 aticfx32 aticfx32 atiumd64 atidxx64 atiumdag atidxx32 atiumdva atiumd6a atitmm64
Adapter RAM: 512
ClearType Parameters: Gamma: 2200 Pixel Structure: R ClearType Level: 50 Enhanced Contrast: 200
Device ID: 0x954f
Direct2D Enabled: true
DirectWrite Enabled: true (6.2.9200.16571)
Driver Date: 4-29-2013
Driver Version: 8.970.100.1100
GPU #2 Active: false
GPU Accelerated Windows: 1/1 Direct3D 11 (OMTC)
Subsys ID: 00000000
Vendor ID: 0x1002
WebGL Renderer: Google Inc. -- ANGLE (ATI Radeon HD 4300/4500 Series Direct3D9Ex vs_3_0 ps_3_0)
windowLayerManagerRemote: true
AzureCanvasBackend: direct2d
AzureContentBackend: direct2d 1.1
AzureFallbackCanvasBackend: cairo
AzureSkiaAccelerated: 0
Thanks for the report! I'm taking a look at this.
Assignee: nobody → mvujovic
Status: NEW → ASSIGNED
In addition to the STR: to reproduce the huge memory and CPU usage, you might need to reload (F5) the page.
I tried this on Mac, and we can see significant CPU usage and rendering problems as well.

First, regarding the CPU usage:

With CSS filters enabled, the rich scrolling experience portion of the page is also enabled. With CSS filters disabled, the whole section containing the scrolling experience is display: none. The scrolling experience is contained within the element with class “section-intro”.

The scrolling experience involves zooming into an extremely large image as you scroll down the page (try it in Safari or Chrome).

The scrolling experience itself doesn’t use CSS filters much. There’s only one element on the page with a CSS filter, and that element always has “filter:opacity(1)”. The element has the class “intro-content”. If you remove this CSS filter from the page, there is no visible effect on performance or rendering.

That particular element (“intro-content”) contains a 5x3 grid of 1310x1547 canvases (resulting in a fairly large element). These canvases are the slices of the extremely large image that’s being “zoomed into” as you scroll down.
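
To make the slicing concrete, here's a minimal model of how the page might carve the big image into that grid (the site's JS is minified, so the function name and exact math below are an illustration, not the page's actual code):

```javascript
// Hypothetical model of slicing a 6500x4610 image into a 5x3 grid of
// 1310x1547 tiles; edge tiles get clipped to the image bounds.
function sliceGrid(imageW, imageH, cols, rows, tileW, tileH) {
  const tiles = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      const sx = col * tileW;
      const sy = row * tileH;
      tiles.push({
        sx, sy,
        sw: Math.min(tileW, imageW - sx),
        sh: Math.min(tileH, imageH - sy),
      });
    }
  }
  return tiles;
}

const tiles = sliceGrid(6500, 4610, 5, 3, 1310, 1547);
// 15 tiles; in the browser, each would back one <canvas> and get a
// drawImage(img, sx, sy, sw, sh, 0, 0, sw, sh) call.
```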

Interestingly, the high CPU usage is not from these elements. Rather, it appears to be from a small circular button with the text “Scroll” at the bottom of the page. The scroll button has an element with the class “shimmer”. This element has a gradient background and an infinite CSS animation of background-position. This gives the scroll button a “shimmery” effect to get the user’s attention and indicate that they should scroll down. For this element to appear on the page, you have to stay on the page for a few seconds and not scroll.

When you delete the “shimmer” element, CPU usage (on my MBP) in Firefox Nightly drops from 50% to <10%. Similarly, in Safari, CPU usage is at 30% with the “shimmer” element present.

I profiled the CPU usage, and it appears to be mostly painting (and display list reconstruction), apparently due to the continuous background-position animation. CSS filters related code doesn’t show up as anything significant in the profile.

Regarding the rendering problems:

Ignoring the CPU usage problem, the scrolling experience doesn’t appear to work properly. The extremely large image made up of 15 canvases is not centered in the viewport as it should be (as seen in Safari and Chrome). Instead, its top left corner is positioned at the top left corner of the viewport. Also, there is no “zooming” effect as you scroll down the page. Additionally, the shimmer button doesn’t look right. It’s not clipped to a circle and you can see an opaque gradient background animating.

It seems like the developers of the page need to do some work to get the scrolling experience to work as expected in Firefox. However, it seems they haven’t had a chance to do that yet, and their CSS Filters feature detection is turning on the scrolling experience prematurely in Firefox. We should probably get in touch with them.

On our side, I’m not quite sure what to do here. From a perf standpoint, we can keep making painting faster in general and perhaps look at this specific animation. I’m not quite sure how to fix the rendering problems on the page itself (the JS code is minified, so it’s hard to tell what’s causing incompatibility between Firefox and Safari + Chrome).
This is the profile / trace for Mac. You can open it up in Instruments. It shows there's a lot of painting going on, and nothing much related to CSS filters.
I've sent Apple website feedback at [1] regarding that URL to give them a heads up that the rich scrolling experience needs some testing in Firefox.

[1]: http://www.apple.com/contact/feedback.html
This bug affects me using Aurora 35.0a2 under Linux as well. When I visit that page it hangs the browser and increases the memory usage and swap by a couple of gigs until I force kill Firefox.
I've managed to repro the hang during page load on Windows. (It did not occur on Mac or Linux for me.)

The hang is in a JavaScript loop which creates 15 canvas elements (sized 1310x1547) and draws the same massive image [1] (sized 6500x4610) into each one using the canvas drawImage JS API.

I attached the Visual Studio debugger and during the hang, if you pause, you usually get a call stack as shown below. It looks like Firefox is spinning its wheels creating the bitmap for the massive 6500x4610 image over and over again (6500x4610 * 4 bytes = 114MB!).

In CanvasRenderingContext2D::DrawImage, we do a CanvasImageCache::Lookup, using the img element and the canvas element as the cache key. I'm wondering why we don't share an image cache across all canvas elements. That might avoid regenerating the image 15 times.
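
As a back-of-the-envelope illustration of why the (image, canvas) cache key hurts here, this toy model (not the real CanvasImageCache code) counts how many times the expensive surface must be rebuilt under per-canvas versus shared keys:

```javascript
// Toy model: 15 canvases each draw the same image. With a per-canvas key,
// every canvas misses and the "surface" (models SourceSurfaceD2D::InitFromData,
// ~114 MB of work per miss here) is rebuilt 15 times; with a shared key, once.
function drawAll(image, canvases, useSharedKey) {
  const cache = new Map();
  let rebuilds = 0;
  for (const canvas of canvases) {
    const key = useSharedKey ? image : image + "|" + canvas;
    if (!cache.has(key)) {
      rebuilds++; // cache miss: decode/upload the big image again
      cache.set(key, "surface-for-" + image);
    }
  }
  return rebuilds;
}

const canvases = Array.from({ length: 15 }, (_, i) => "canvas" + i);
const perCanvas = drawAll("intro_large_2x.jpg", canvases, false);
const shared = drawAll("intro_large_2x.jpg", canvases, true);
```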

---
xul.dll!ID2D1RenderTarget::CreateBitmap(D2D_SIZE_U size, const D2D1_BITMAP_PROPERTIES & bitmapProperties, ID2D1Bitmap * * bitmap) Line 3227     C++

xul.dll!mozilla::gfx::SourceSurfaceD2D::InitFromData(unsigned char * aData, const mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> & aSize, int aStride, mozilla::gfx::SurfaceFormat aFormat, ID2D1RenderTarget * aRT) Line 79     C++

xul.dll!mozilla::gfx::DrawTargetD2D::OptimizeSourceSurface(mozilla::gfx::SourceSurface * aSurface) Line 1223     C++

xul.dll!nsLayoutUtils::SurfaceFromElement(nsIImageLoadingContent * aElement, unsigned int aSurfaceFlags, mozilla::gfx::DrawTarget * aTarget) Line 5851     C++

xul.dll!nsLayoutUtils::SurfaceFromElement(mozilla::dom::Element * aElement, unsigned int aSurfaceFlags, mozilla::gfx::DrawTarget * aTarget) Line 5997     C++

xul.dll!mozilla::dom::CanvasRenderingContext2D::DrawImage(const mozilla::dom::HTMLImageElementOrHTMLCanvasElementOrHTMLVideoElement & image, double sx, double sy, double sw, double sh, double dx, double dy, double dw, double dh, unsigned char optional_argc, mozilla::ErrorResult & error) Line 3918     C++

xul.dll!mozilla::dom::CanvasRenderingContext2D::DrawImage(const mozilla::dom::HTMLImageElementOrHTMLCanvasElementOrHTMLVideoElement & image, double dx, double dy, mozilla::ErrorResult & error) Line 275     C++

xul.dll!mozilla::dom::CanvasRenderingContext2DBinding::drawImage(JSContext * cx, JS::Handle<JSObject *> obj, mozilla::dom::CanvasRenderingContext2D * self, const JSJitMethodCallArgs & args) Line 3969     C++

xul.dll!mozilla::dom::GenericBindingMethod(JSContext * cx, unsigned int argc, JS::Value * vp) Line 2425     C++

xul.dll!js::CallJSNative(JSContext * cx, bool (JSContext *, unsigned int, JS::Value *) * native, const JS::CallArgs & args) Line 231     C++
---

[1]: http://images.apple.com/v/imac-with-retina/a/images/overview/intro_large_2x.jpg
Related to the above comment, Rik tried making all canvases share the same image cache, and it fixed the hang. (He passed in NULL for the canvas part of the (image element, canvas) cache key.)

@roc - Is there a reason to keep images cached separately for different canvas elements?

CanvasImageCache: http://dxr.mozilla.org/mozilla-central/source/dom/canvas/CanvasImageCache.cpp#23
Flags: needinfo?(roc)
Thanks for delving into this, Max! Since it's not really your bug :-).

There's a security issue with just making the cache global. On a cache miss, CanvasRenderingContext2D::DrawImage calls DoDrawImageSecurityCheck, but on a cache hit, it doesn't. The idea is that the first draw of an image to a particular canvas does the security check and after that we don't need to do it again (since the canvas will already have been tainted if necessary). Using a global cache would break that.

We need to make nsLayoutUtils::SurfaceFromElement return the same optimized surface every time it's called. In this case, I don't understand why OptimizeSourceSurface is doing work here. The image should already have been optimized for D2D, I would have thought, as soon as decoding finished. Seth, am I right? Max, can you check whether imgFrame::Optimize was called for the image and if not, why not? Thanks!!!
Flags: needinfo?(roc) → needinfo?(seth)
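
The tainting hazard described above can be sketched with a toy model (hypothetical code, not Gecko's actual logic): because the security check only runs on a cache miss, a naive global key lets the second canvas escape tainting:

```javascript
// Toy model of canvas tainting vs. cache keying. A cross-origin image must
// taint every canvas it's drawn into, but the check (models
// DoDrawImageSecurityCheck) only fires on a cache miss.
function makeContext(useGlobalCache) {
  const cache = new Set();
  const tainted = new Set();
  return {
    drawImage(image, canvas) {
      const key = useGlobalCache ? image.url : image.url + "|" + canvas;
      if (!cache.has(key)) {
        if (image.crossOrigin) tainted.add(canvas); // security check on miss
        cache.add(key);
      }
      // On a hit, no check runs — safe only if this canvas was checked before.
    },
    isTainted: (canvas) => tainted.has(canvas),
  };
}

const img = { url: "https://other-origin.example/x.jpg", crossOrigin: true };

const perCanvas = makeContext(false);
perCanvas.drawImage(img, "A");
perCanvas.drawImage(img, "B"); // miss for B, so B is correctly tainted

const global = makeContext(true);
global.drawImage(img, "A");
global.drawImage(img, "B");    // cache hit: B escapes tainting — the bug
```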
BTW on Linux right now I'm not seeing this effect you're describing. Has Apple changed the page already?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #16)
> Thanks for delving into this, Max! Since it's not really your bug :-).
> 
> There's a security issue with just making the cache global. On a cache miss,
> CanvasRenderingContext2D::DrawImage calls DoDrawImageSecurityCheck, but on a
> cache hit, it doesn't. The idea is that the first draw of an image to a
> particular canvas does the security check and after that we don't need to do
> it again (since the canvas will already have been tainted if necessary).
> Using a global cache would break that.

I think we can probably make a global cache work, though as you mentioned the security issues make it slightly tricky. We could, for example, store a list of principals a given imgFrame has already been checked against, and pass the principal as part of the cache lookup. I'd love for us to use the ImageLib SurfaceCache for this, but as I understand it we also store video frames in the CanvasImageCache, so what to do with them also requires some thought.
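
A rough sketch of that idea (hypothetical — nothing like this has landed): keep one shared surface per image, but remember which principals the image has already been security-checked against, so the expensive decode happens once while the check still fires once per new principal:

```javascript
// Hypothetical shared cache: one surface per image URL, plus the set of
// principals that surface has already been security-checked against.
class SharedCanvasImageCache {
  constructor() {
    this.entries = new Map(); // image url -> { surface, checkedPrincipals }
  }
  lookup(imageUrl, principal) {
    let entry = this.entries.get(imageUrl);
    let builtSurface = false;
    if (!entry) {
      entry = { surface: "surface-for-" + imageUrl, checkedPrincipals: new Set() };
      this.entries.set(imageUrl, entry);
      builtSurface = true; // the expensive decode/upload happens only once
    }
    const needsSecurityCheck = !entry.checkedPrincipals.has(principal);
    if (needsSecurityCheck) entry.checkedPrincipals.add(principal);
    return { surface: entry.surface, builtSurface, needsSecurityCheck };
  }
}

const cache = new SharedCanvasImageCache();
const a = cache.lookup("intro_large_2x.jpg", "https://example.com");
const b = cache.lookup("intro_large_2x.jpg", "https://example.com");
```

Note this only memoizes the check itself; the caller would still have to taint each canvas whenever needsSecurityCheck reports a cross-origin draw.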

> We need to make nsLayoutUtils::SurfaceFromElement return the same optimized
> surface every time it's called. In this case, I don't understand why
> OptimizeSourceSurface is doing work here. The image should already have been
> optimized for D2D, I would have thought, as soon as decoding finished. Seth,
> am I right? Max, can you check whether imgFrame::Optimize was called for the
> image and if not, why not? Thanks!!!

It's true that imgFrame::Optimize should be called whenever we finish decoding. Two possible gotchas are:

1. The backend might be different for the canvas than for content, so maybe OptimizeSourceSurface needs to do a conversion. I think that's not true on desktop, though, right?

2. We *deoptimize* the imgFrame when we do HQ scaling, and we keep it deoptimized until scaling is finished. I don't think that should be happening for images that are only drawn into a canvas and nowhere else, since we should be drawing without FLAG_HIGH_QUALITY_SCALING set.

There are also other scenarios where we might not have an optimized surface that I don't think apply here (animated images, single color images).
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17)
> BTW on Linux right now I'm not seeing this effect you're describing. Has
> Apple changed the page already?

We discussed the issue with the lack of centering in #gfx when Apple posted the page, and ISTR that they noticed the problem and changed the page within a couple of hours of posting it.
Flags: needinfo?(seth)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #16)
> Thanks for delving into this, Max! Since it's not really your bug :-).
> 
> There's a security issue with just making the cache global. On a cache miss,
> CanvasRenderingContext2D::DrawImage calls DoDrawImageSecurityCheck, but on a
> cache hit, it doesn't. The idea is that the first draw of an image to a
> particular canvas does the security check and after that we don't need to do
> it again (since the canvas will already have been tainted if necessary).
> Using a global cache would break that.

Would it help if you make the cache per document instead of per canvas instance?
(In reply to Rik Cabanier from comment #20)
> Would it help if you make the cache per document instead of per canvas
> instance?

No. Then a page could draw an image to canvas A, tainting canvas A and putting the image in the CanvasImageCache for the document. Then the page could draw the image to canvas B, getting a cache hit and bypassing DoDrawImageSecurityCheck so canvas B is not tainted.
(In reply to Seth Fowler [:seth] from comment #18)
> It's true that imgFrame::Optimize should be called whenever we finish
> decoding. Two possible gotchas are:
> 
> 1. The backend might be different for the canvas than for content, so maybe
> OptimizeSourceSurface needs to do a conversion. I think that's not true on
> desktop, though, right?

Yeah, it should be DrawTargetD2D in both cases.

Now the page is gone, I guess it's going to be hard to figure out why the image wasn't already optimized :-(. We may have to resolve this RESOLVED INCOMPLETE unless someone can come up with a testcase that shows the problem.
Isn't content currently DrawTargetD2D1 and canvas DrawTargetD2D, because we couldn't enable D2D 1.1 for canvas yet?
Attached image Linux Screenshot
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #22)
> (In reply to Seth Fowler [:seth] from comment #18)
> Now the page is gone, I guess it's going to be hard to figure out why the
> image wasn't already optimized :-(. We may have to resolve this RESOLVED
> INCOMPLETE unless someone can come up with a testcase that shows the problem.

Thanks for chiming in! I still see the broken scrolling experience on Linux and other platforms (see attachment). However, I've only seen the hang during page load on Windows.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #22)
> Now the page is gone,

I can still reproduce the huge memory problem on the page on Windows 7.
(In reply to Milan Sreckovic [:milan] from comment #23)
> Isn't content currently DrawTargetD2D1 and canvas DrawTargetD2D, because we
> couldn't enable D2D 1.1 for canvas yet?

You are correct.
There's a bug involving huge memory usage when images are drawn to multiple canvases; it's being fixed in bug 1081926. It might help with this page on Windows.
(In reply to Michael Wu [:mwu] from comment #27)
> There's a bug involving huge memory usage when images are drawn to multiple
> canvases; it's being fixed in bug 1081926. It might help with this page on
> Windows.

Thanks, Michael. I'll take a look.

I'm halfway done with a potential fix for this one. Unfortunately, I got derailed this week with a different project. Hoping to post a patch next week.
See Also: → 1081926
The try build from Bug 1081926 Comment 36 fixes this problem, so I marked this as a duplicate.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
See Also: 1081926
(In reply to Max Vujovic from comment #28)
> Hoping to post a patch next week.

Nevermind! Looks like you have a patch up on bug 1081926 that'll fix this. Thanks, Michael.