Closed Bug 1008942 Opened 6 years ago Closed 5 years ago

Firefox crashes rendering 100 images and it takes 1500ms for 10 images.

Categories

(Core :: ImageLib, defect, major)

defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla34
Tracking Status
firefox34 --- fixed

People

(Reporter: baku, Assigned: tnikkel, NeedInfo)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

Attached file foobar.zip
I'm not sure this testcase can be public: there is an image with a logo of some pharmaceutical company and I don't think we can show it. Unfortunately this bug can be reproduced just with this image.

It seems that the problem is in the image rendering: What I know is that this image was generated from a PPT or a PDF file which some online software.

In attach a html page + an image. Chromium spends 40~ ms for 10 images. Firefox: 1500ms and if I use 100 images, it crashes.
Milan suggested CCing Seth and Jeff and himself.
I think our logic to decode images or not isn't working well here leading us to decode all the images, and the decode being complete will block the onload handler in this case (but not in all cases). The crash with 100 images is likely due to oom. I don't think there is anything special about the image except for it's size. So we can replace it with a generic image of the same size and see the same issue, which we should do so we can open this bug up.

What I was seeing was that our post-first reflow callback was deciding the images were visible because they get a default size of 24 x 24 before the size of the image is known. This means that all the images fit on the screen at this size and are visible. However giving the images an explicit size of their actual size doesn't always fix the problem. I suspect that the fragile code in nsImageLoadingContent::OnStopRequest which asks for a decode under some circumstances while we have paint suppressed plays a role because doing the img element creation, insertion, etc after load via a button click doesn't exhibit the problem.
Maybe it's useful to know that I wrote that JS code just to use 1 single image. This issue can be recreated also with a simple list of image tags: <img src="image1.jpg"><img src="image2.jpg">..
Is this test representing a "real world" (tm) situation?
(In reply to Milan Sreckovic [:milan] from comment #4)
> Is this test representing a "real world" (tm) situation?

The "real world" (tm) situation was a kiosk app, a html page fullscreen with several images like that and a touch-screen monitor. Firefox crashed all the time.
And the intended experience for the user at the kiosk is to just scroll down a webpage through a bunch of images?

It's important that the testcase accurately represents the use case as it is trivial to make a page that requires arbitrary amounts of memory.
(In reply to Timothy Nikkel (:tn) from comment #6)
> And the intended experience for the user at the kiosk is to just scroll down
> a webpage through a bunch of images?

not really. This is a test case. The purpose of test case is to show the problem without all the complexity of the 'real world' :) Part of the kiosk was a slide show between these images.
Scrolling through images and a slideshow of images are very different testcases. So you may have simplified out important parts of the real world (although reduced testcases are very much appreciated) so that although you have a testcase that shows _a_ problem, it may not show _the_ problem that the kiosk app is having.

Would it be possible to get something that more closely matches what the kiosk is doing? Even a complicated un-simplified testcase would be fine.
I would love to investigate a testcase that more accurately represents this specific use case.
Flags: needinfo?(amarchesini)
(In reply to Timothy Nikkel (:tn) from comment #9)
> I would love to investigate a testcase that more accurately represents this
> specific use case.

Ok! Finally I received the all zip file. It's 98mb. what about if I upload it somewhere instead bugzilla?
Maybe you can ping me on IRC.
Flags: needinfo?(amarchesini) → needinfo?(tnikkel)
Got the zip from Andrea. Leaving the needinfo to investigate its contents.
I can reproduce spikes of memory used by decoded images going to 3.8GB. The steady state seems to be about 300mb.
When decode-on-draw was implemented code was added to start decoding of images at the very start of page loading when we are suppressing painting because the lack of painting would lead to no drawing, and hence no decoding of images until paint suppression ended, at which point we would have wasted time we could have used to decode images.

In http://hg.mozilla.org/mozilla-central/rev/f2ff265af286 I made this smarter: instead of just blindly asking to decode any images whose network requests complete during paint suppression we first check if there is a frame for the image that has had a first reflow and hence has already checked if it is (nearly) visible. This means we don't have to decode images that aren't near the viewport if their network requests happen to finish during paint suppression.

The testcase has all of the images it ever uses (3.9GB decoded, the testcase is an app, with several 'pages', each one containing some subset of the images) in the page DOM at page load, and they are all in a display none subtree. This means they don't have a frame (and they won't get a frame until the user navigates to one of the sub-'pages' in the app). This means the smarts of f2ff265af286 can't kick in and we just have to fall back to decoding the image.

The problem with improving upon this is there isn't an efficient way that I can think of to determine that an element isn't going to get a frame at the next flush. In this case we'd have to walk up the DOM tree looking for an element with an undisplayed map entry. So we can't distinguish 1) the element will never have a frame from 2) the element will get a frame, it just needs a flush.

However, I think we should consider elements without a frame not visible. In the unlikely case that the element will get a frame at the next flush and it's related network request just happens to complete in that small time window (and during paint suppression) then the element will get a frame at the next refresh driver tick, which is at most 16.6ms away since this code is only applied to visible presshells, and it's first reflow will determine if it's visible or not and kick off a decode if needed.
(In reply to Timothy Nikkel (:tn) from comment #13)
> However, I think we should consider elements without a frame not visible. In
> the unlikely case that the element will get a frame at the next flush and
> it's related network request just happens to complete in that small time
> window (and during paint suppression) then the element will get a frame at
> the next refresh driver tick, which is at most 16.6ms away since this code
> is only applied to visible presshells, and it's first reflow will determine
> if it's visible or not and kick off a decode if needed.

One slight complication: we might get a generic frame type (ie inline) if the image size isn't known at that time (and layout doesn't give it one). And so it won't check its own visibility. However we have all of the image data (since we hit OnStopRequest) and so if the size isn't already known it is sure to be soon.
Attached patch patchSplinter Review
Assignee: nobody → tnikkel
Attachment #8445756 - Flags: review?(mats)
Flags: needinfo?(tnikkel)
Nice description, change to the heuristic seems reasonable.  I'd like to have a public bug on this, so that we can publicize this change and get good testing on it - since we're worried about the data in the attachment, maybe once we have a fix, we open the bug, and make that attachment private?
Comment on attachment 8445756 [details] [diff] [review]
patch

Nit: s/will get very/will get one very/

r=mats
Attachment #8445756 - Flags: review?(mats) → review+
(In reply to Milan Sreckovic [:milan] from comment #16)
> Nice description, change to the heuristic seems reasonable.  I'd like to
> have a public bug on this, so that we can publicize this change and get good
> testing on it - since we're worried about the data in the attachment, maybe
> once we have a fix, we open the bug, and make that attachment private?

Yes, that's what I was planning. We can do that now. Except either I don't know how to make an attachment private or my bugzilla account doesn't have the ability to, anyone know how?
The attachment has been made private. I'll post two testcases with generic contents, one for the attachment, and one to represent the problem from the testcase Andrea sent me (comment 10). We should be able to make this bug public now, unless anyone objects.
Depends on: 1051530
The patch in bug 1051530 should fix the test failure and let us land this patch.
https://hg.mozilla.org/mozilla-central/rev/cb5509ae7f96
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Depends on: 1061894
ni'ing myself for comment 20
Flags: needinfo?(tnikkel)
Depends on: 1060876
Opening up as the testcase is now hidden. No reason to keep it closed.
Group: mozilla-employee-confidential
Depends on: 1063973
You need to log in before you can comment on or make changes to this bug.