1008942 - Firefox crashes rendering 100 images and it takes 1500ms for 10 images.

Reporter

Description

•

10 years ago

Attached file foobar.zip — Details

I'm not sure this testcase can be public: there is an image with a logo of some pharmaceutical company and I don't think we can show it. Unfortunately this bug can be reproduced just with this image.

It seems that the problem is in the image rendering: What I know is that this image was generated from a PPT or a PDF file which some online software.

In attach a html page + an image. Chromium spends 40~ ms for 10 images. Firefox: 1500ms and if I use 100 images, it crashes.

Andrew Overholt [:overholt]

Comment 1

•

10 years ago

Milan suggested CCing Seth and Jeff and himself.

Timothy Nikkel (:tnikkel)

Assignee

Comment 2

•

10 years ago

I think our logic to decode images or not isn't working well here leading us to decode all the images, and the decode being complete will block the onload handler in this case (but not in all cases). The crash with 100 images is likely due to oom. I don't think there is anything special about the image except for it's size. So we can replace it with a generic image of the same size and see the same issue, which we should do so we can open this bug up.

What I was seeing was that our post-first reflow callback was deciding the images were visible because they get a default size of 24 x 24 before the size of the image is known. This means that all the images fit on the screen at this size and are visible. However giving the images an explicit size of their actual size doesn't always fix the problem. I suspect that the fragile code in nsImageLoadingContent::OnStopRequest which asks for a decode under some circumstances while we have paint suppressed plays a role because doing the img element creation, insertion, etc after load via a button click doesn't exhibit the problem.

Andrea Marchesini [:baku]

Reporter

Comment 3

•

10 years ago

Maybe it's useful to know that I wrote that JS code just to use 1 single image. This issue can be recreated also with a simple list of image tags: <img src="image1.jpg"><img src="image2.jpg">..

Milan Sreckovic [:milan] (needinfo for best results)

Comment 4

•

10 years ago

Is this test representing a "real world" (tm) situation?

Andrea Marchesini [:baku]

Reporter

Comment 5

•

10 years ago

(In reply to Milan Sreckovic [:milan] from comment #4)
> Is this test representing a "real world" (tm) situation?

The "real world" (tm) situation was a kiosk app, a html page fullscreen with several images like that and a touch-screen monitor. Firefox crashed all the time.

Timothy Nikkel (:tnikkel)

Assignee

Comment 6

•

10 years ago

And the intended experience for the user at the kiosk is to just scroll down a webpage through a bunch of images?

It's important that the testcase accurately represents the use case as it is trivial to make a page that requires arbitrary amounts of memory.

Andrea Marchesini [:baku]

Reporter

Comment 7

•

10 years ago

(In reply to Timothy Nikkel (:tn) from comment #6)
> And the intended experience for the user at the kiosk is to just scroll down
> a webpage through a bunch of images?

not really. This is a test case. The purpose of test case is to show the problem without all the complexity of the 'real world' :) Part of the kiosk was a slide show between these images.

Timothy Nikkel (:tnikkel)

Assignee

Comment 8

•

10 years ago

Scrolling through images and a slideshow of images are very different testcases. So you may have simplified out important parts of the real world (although reduced testcases are very much appreciated) so that although you have a testcase that shows _a_ problem, it may not show _the_ problem that the kiosk app is having.

Would it be possible to get something that more closely matches what the kiosk is doing? Even a complicated un-simplified testcase would be fine.

Timothy Nikkel (:tnikkel)

Assignee

Comment 9

•

10 years ago

I would love to investigate a testcase that more accurately represents this specific use case.

Flags: needinfo?(amarchesini)

Andrea Marchesini [:baku]

Reporter

Comment 10

•

10 years ago

(In reply to Timothy Nikkel (:tn) from comment #9)
> I would love to investigate a testcase that more accurately represents this
> specific use case.

Ok! Finally I received the all zip file. It's 98mb. what about if I upload it somewhere instead bugzilla?
Maybe you can ping me on IRC.

Flags: needinfo?(amarchesini) → needinfo?(tnikkel)

Timothy Nikkel (:tnikkel)

Assignee

Comment 11

•

10 years ago

Got the zip from Andrea. Leaving the needinfo to investigate its contents.

Timothy Nikkel (:tnikkel)

Assignee

Comment 12

•

10 years ago

I can reproduce spikes of memory used by decoded images going to 3.8GB. The steady state seems to be about 300mb.

Timothy Nikkel (:tnikkel)

Assignee

Comment 13

•

10 years ago

When decode-on-draw was implemented code was added to start decoding of images at the very start of page loading when we are suppressing painting because the lack of painting would lead to no drawing, and hence no decoding of images until paint suppression ended, at which point we would have wasted time we could have used to decode images.

In http://hg.mozilla.org/mozilla-central/rev/f2ff265af286 I made this smarter: instead of just blindly asking to decode any images whose network requests complete during paint suppression we first check if there is a frame for the image that has had a first reflow and hence has already checked if it is (nearly) visible. This means we don't have to decode images that aren't near the viewport if their network requests happen to finish during paint suppression.

The testcase has all of the images it ever uses (3.9GB decoded, the testcase is an app, with several 'pages', each one containing some subset of the images) in the page DOM at page load, and they are all in a display none subtree. This means they don't have a frame (and they won't get a frame until the user navigates to one of the sub-'pages' in the app). This means the smarts of f2ff265af286 can't kick in and we just have to fall back to decoding the image.

The problem with improving upon this is there isn't an efficient way that I can think of to determine that an element isn't going to get a frame at the next flush. In this case we'd have to walk up the DOM tree looking for an element with an undisplayed map entry. So we can't distinguish 1) the element will never have a frame from 2) the element will get a frame, it just needs a flush.

However, I think we should consider elements without a frame not visible. In the unlikely case that the element will get a frame at the next flush and it's related network request just happens to complete in that small time window (and during paint suppression) then the element will get a frame at the next refresh driver tick, which is at most 16.6ms away since this code is only applied to visible presshells, and it's first reflow will determine if it's visible or not and kick off a decode if needed.

Timothy Nikkel (:tnikkel)

Assignee

Comment 14

•

10 years ago

(In reply to Timothy Nikkel (:tn) from comment #13)
> However, I think we should consider elements without a frame not visible. In
> the unlikely case that the element will get a frame at the next flush and
> it's related network request just happens to complete in that small time
> window (and during paint suppression) then the element will get a frame at
> the next refresh driver tick, which is at most 16.6ms away since this code
> is only applied to visible presshells, and it's first reflow will determine
> if it's visible or not and kick off a decode if needed.

One slight complication: we might get a generic frame type (ie inline) if the image size isn't known at that time (and layout doesn't give it one). And so it won't check its own visibility. However we have all of the image data (since we hit OnStopRequest) and so if the size isn't already known it is sure to be soon.

Timothy Nikkel (:tnikkel)

Assignee

Comment 15

•

10 years ago

Attached patch patch — Details — Splinter Review

Assignee: nobody → tnikkel

Attachment #8445756 - Flags: review?(mats)

Flags: needinfo?(tnikkel)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 16

•

10 years ago

Nice description, change to the heuristic seems reasonable.  I'd like to have a public bug on this, so that we can publicize this change and get good testing on it - since we're worried about the data in the attachment, maybe once we have a fix, we open the bug, and make that attachment private?

Mats Palmgren (inactive)

Comment 17

•

10 years ago

Comment on attachment 8445756 [details] [diff] [review]
patch

Nit: s/will get very/will get one very/

r=mats

Attachment #8445756 - Flags: review?(mats) → review+

Timothy Nikkel (:tnikkel)

Assignee

Comment 18

•

10 years ago

(In reply to Milan Sreckovic [:milan] from comment #16)
> Nice description, change to the heuristic seems reasonable.  I'd like to
> have a public bug on this, so that we can publicize this change and get good
> testing on it - since we're worried about the data in the attachment, maybe
> once we have a fix, we open the bug, and make that attachment private?

Yes, that's what I was planning. We can do that now. Except either I don't know how to make an attachment private or my bugzilla account doesn't have the ability to, anyone know how?

Timothy Nikkel (:tnikkel)

Assignee

Comment 19

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/11ce0e5823f4

Timothy Nikkel (:tnikkel)

Assignee

Comment 20

•

10 years ago

The attachment has been made private. I'll post two testcases with generic contents, one for the attachment, and one to represent the problem from the testcase Andrea sent me (comment 10). We should be able to make this bug public now, unless anyone objects.

Carsten Book [:Tomcat]

Comment 21

•

10 years ago

sorry had to backout this out for test failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=42509086&tree=Mozilla-Inbound

Timothy Nikkel (:tnikkel)

Assignee

Updated

•

10 years ago

Depends on: 1051530

Timothy Nikkel (:tnikkel)

Assignee

Comment 22

•

10 years ago

The patch in bug 1051530 should fix the test failure and let us land this patch.

Timothy Nikkel (:tnikkel)

Assignee

Comment 23

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/cb5509ae7f96

Ryan VanderMeulen [:RyanVM]

Comment 24

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/cb5509ae7f96

Status: NEW → RESOLVED

Closed: 10 years ago

status-firefox34: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla34

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

Depends on: 1061894

Timothy Nikkel (:tnikkel)

Assignee

Comment 25

•

10 years ago

ni'ing myself for comment 20

Flags: needinfo?(tnikkel)

Timothy Nikkel (:tnikkel)

Assignee

Updated

•

10 years ago

Depends on: 1060876

Timothy Nikkel (:tnikkel)

Assignee

Comment 26

•

10 years ago

Opening up as the testcase is now hidden. No reason to keep it closed.

Group: mozilla-employee-confidential

Timothy Nikkel (:tnikkel)

Assignee

Updated

•

10 years ago

Depends on: 1063973

Timothy Nikkel (:tnikkel)

Assignee

Updated

•

1 year ago

Flags: needinfo?(tnikkel)