Closed Bug 523528 Opened 12 years ago Closed 12 years ago

Crashes at [@imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) ]

Categories

(Core :: ImageLib, defect, P2)

x86
Windows XP
defect

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta4-fixed

People

(Reporter: jrmuizel, Assigned: joe)

References

(Blocks 1 open bug)

Details

(Keywords: regression, topcrash)

Crash Data

Attachments

(2 files, 2 obsolete files)

This is a top crasher on 1.9.2 and trunk
Keywords: topcrash
Summary: topcrash: [@imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) ] → Crashes at [@imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) ]
These crashes are on Windows, not OS X :-)
OS: Mac OS X → Windows XP
Blocking to get us to reproduce this, and we've seen it enough to worry me.  Jeff, can we get it over and over?
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P2
I asked for minidumps for these because the backtraces don't make a ton of sense. I'm not sure if that ended up happening. Aravind?
I don't recall you asking me for any minidumps.  Is there a bug you filed that I overlooked?  If not, what minidumps do you need?
Lars and I were talking with you about the 24-hour storing of minidumps. That's probably necessary in this case, so we can grab one for this crash.
(In reply to comment #6)
> Lars and I were talking with you about the 24-hour storing of minidumps. That's
> probably necessary in this case, so we can grab one for this crash.

There was a request sometime back for a way to store crashes for 24 hours.  That request was done and the crashes on mounted on people.  There is another bug on file to have some sort of a rolling window with the last 24 hours of crashes saved constantly.  I am assuming you are talking about that one.  Its being tracked in bug 523650 (currently blocked by bug 523660).
OK. This bug is blocked on that bug, then, but I can't mark it as such since I don't have access to bug 523650.
Assignee: nobody → joe
Whiteboard: [depends on 523650]
Whiteboard: [depends on 523650]
Whiteboard: [#6 3.6b1 topcrash]
Whiteboard: [#6 3.6b1 topcrash]
this is near the top of list for 3.6b1 crashes

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6b1&query_search=signature&query_type=startswith&query=imgFrame%3A%3ADraw&date=&range_value=1&range_unit=weeks&do_query=1&signature=imgFrame%3A%3ADraw%28gfxContext*%2C%20gfxPattern%3A%3AGraphicsFilter%2C%20gfxMatrix%20const%26%2C%20gfxRect%20const%26%2C%20nsIntMargin%20const%26%2C%20nsIntRect%20const%26%29


doesn't look like its happening near startup or restart.

17 total crashes for imgFrame::Draw on 20091102-crashdata.csv
1 start up crashes inside 3 minutes

imgFrame::Draw signature breakdown
signature distribution
       1
signature list
  17 imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&)

os breakdown
  10 imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) Windows NT 5.1.2600 Service Pack 2
   6 imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) Windows NT 5.1.2600 Service Pack 3
   1 imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) Windows NT 5.1.2600 Service Pack 3, v.3244

distribution of all versions where the imgFrame::Draw crash was found on 20091102-crashdata.csv
  16 Firefox 3.6b1
   1 Firefox 3.6a1

top of the stack source file in most of these crashes is
http://hg.mozilla.org/releases/mozilla-1.9.2/annotate/3b49c063bb42/modules/libpr0n/src/imgFrame.cpp#l521

I don't see this on any of the 3.0.x/3.5.x  releases.  could this be related to some changes landed last summer?
http://hg.mozilla.org/releases/mozilla-1.9.2/log/c60105da01a0/modules/libpr0n/src/imgFrame.cpp
Keywords: regression
Whiteboard: [crashkill]
Summary: Crashes at [@imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) ] → Crashes at [@ imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) ]
This occurs since surface is null?  (ThebesSurface() returns null?).
ThebesSurface() will only return null if the frame is a single-pixel image that has been optimized away, or if the frame is paletted. Jeff Muizelaar and I did some debugging yesterday using a minidump that showed that it cannot be a single-pixel image, so clearly the second is true. I have not yet figured out how it is possible for us to enter Draw() on a paletted frame.
Pretty sure this is an OOM condition when trying to create a compositing frame. I was able to replicate this exact crash by setting compositingFrame to null unconditionally in DoComposite().

There's not a lot we can do to be more graceful here, as it's very likely the browser's going to crash shortly. However, this precise crash can be solved by adding a couple of checks, which I will do tomorrow.
Component: Image: Painting → HTML: Parser
Summary: Crashes at [@ imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) ] → Crashes at [@imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) ]
Whiteboard: [crashkill]
Assignee: joe → nobody
Component: HTML: Parser → Graphics
QA Contact: image.gfx → thebes
Are we more likely to hit OOM when creating a compositing frame than elsewhere?
Not any more than anywhere else we allocate frame-sized chunks of data, but it's in fairly complex code, unlike the other imagelib allocations.

We actually do have a fair number of crashes in nsThebesImage::Draw() in 1.9.1 and 1.9.0, though they aren't as clear-cut as these (all the crashes in 3.6b1 are at 0x4). That at least lets me know that there is _something_ wrong here, and I haven't been able to find other ways this can crash yet.
It's possible for us to end up with bad compositing frames in a couple of different situations: if we're out of memory, or even close to out of memory, we will fail to Init() the imgFrame. (Note also that the implementation of IsLowMemory() changed in 1.9.2, so we might hit this in different situations.)

This patch makes us a little more robust in those situations. It's untested (uncompiled, even), but I wanted to get it out there for review ASAP, because we're going to be spinning a 3.6b2 build, and I want to see if this fixes our problems.

Note that I also made this check very specific: if my assumption about why this problem occurs is wrong, we'll continue to crash as before, which is a good thing.
Assignee: nobody → joe
Attachment #410664 - Flags: superreview?(vladimir)
Attachment #410664 - Flags: review?(jmuizelaar)
Attachment #410664 - Flags: review?(bobbyholley)
Actually, after thinking about this, I think we should have two functions - GetImgFrame and GetDrawableImgFrame, the latter of which should be permitted to return null. I'll leave the review requests as-is, though, to let this percolate through my mind a little more.
Joe, you asked me to hold the beta refresh for this bug; just wondering if you could give some more context about why that is. Because of the number of people it will catch out (do we have an understanding of what that audience looks like) or because of the risk of the fix?
The problem is that we don't have a guaranteed fix. I am 95% confident that what I've identified as the problem is actually the problem, but if I don't test that out, I won't be sure. I don't want the first time this is tested out to be an RC, and since we don't have a guaranteed third beta build, I'd like this to be included in the second beta.
Attachment #410664 - Flags: review?(jmuizelaar) → review-
Comment on attachment 410664 [details] [diff] [review]
potential fix - handle failure more gracefully

>diff --git a/modules/libpr0n/src/imgContainer.cpp b/modules/libpr0n/src/imgContainer.cpp
>--- a/modules/libpr0n/src/imgContainer.cpp
>+++ b/modules/libpr0n/src/imgContainer.cpp
>@@ -531,25 +531,35 @@ NS_IMETHODIMP imgContainer::CopyFrame(PR
>   PRUint32 frameIndex = (aWhichFrame == FRAME_FIRST) ?
>                         0 : GetCurrentImgFrameIndex();
>   imgFrame *frame = GetImgFrame(frameIndex);
>   if (!frame) {
>     *_retval = nsnull;
>     return NS_ERROR_FAILURE;
>   }
> 
>+  // If we failed to create a composite frame and we're paletted, we can't
>+  // draw. Bail out.
>+  if (mAnim && mAnim->lastCompositedFrameIndex == -1 && frame->GetIsPaletted())
>+    return NS_ERROR_FAILURE;
>+

I don't understand this condition or why it belongs here.

>@@ -1437,21 +1447,21 @@ NS_IMETHODIMP imgContainer::Notify(nsITi
>     imgFrame *prevFrame = mFrames[previousFrameIndex];
>     if (!prevFrame)
>       return NS_OK;
> 
>     // Change frame and announce it
>     if (NS_FAILED(DoComposite(&dirtyRect, prevFrame,
>                               nextFrame, nextFrameIndex))) {
>       // something went wrong, move on to next
>-      NS_WARNING("imgContainer::Notify(): Composing Frame Failed\n");

Needless churn?

>       mAnim->currentAnimationFrameIndex = nextFrameIndex;
>       return NS_OK;
>     }
>   }
>+

And more?

>   // Set currentAnimationFrameIndex at the last possible moment
>   mAnim->currentAnimationFrameIndex = nextFrameIndex;
>   // Refreshes the screen
>   observer->FrameChanged(this, &dirtyRect);
>   
>   return NS_OK;
> }
> 
>@@ -2320,16 +2335,21 @@ NS_IMETHODIMP imgContainer::Draw(gfxCont
> 
>+  // If we failed to create a composite frame and we're paletted, we can't
>+  // draw. Bail out.
>+  if (mAnim && mAnim->lastCompositedFrameIndex == -1 && frame->GetIsPaletted())
>+    return NS_ERROR_FAILURE;
>+

This looks like the same condition as above?
The fundamental problem here is that it's possible for us to fail to composite frames. In order to make sure that we handle that failure to composite, and don't accidentally paper over other latent bugs, we add a CompositingFailed field to imgFrame, and set it to true only in those cases that compositing actually failed. This'll ensure that if we (buggily) use a paletted frame for reasons other than compositing fail, we will continue to crash, and we'll be able to find the cause of that bug.

As a follow-up to this bug, I'm going to change GetImgFrame() to always return a frame from mFrames, and never return the compositing frame. This will ensure that when GetFrameColormap is called, we don't try to get the (absent) colormap of the compositing frame. This is a source of latent bugs, but probably isn't a current issue because the relevant functions are only called from the decoder, before Notify() is called.
Attachment #410664 - Attachment is obsolete: true
Attachment #410884 - Flags: review?(jmuizelaar)
Attachment #410884 - Flags: review?(bobbyholley)
Attachment #410664 - Flags: superreview?(vladimir)
Attachment #410664 - Flags: review?(bobbyholley)
Comment on attachment 410884 [details] [diff] [review]
(against 1.9.2) Add GetDrawableImgFrame(), and call it appropriately

>diff --git a/modules/libpr0n/src/imgContainer.cpp b/modules/libpr0n/src/imgContainer.cpp
>--- a/modules/libpr0n/src/imgContainer.cpp
>+++ b/modules/libpr0n/src/imgContainer.cpp
>@@ -140,17 +140,17 @@ NS_IMETHODIMP imgContainer::ExtractCurre
> {
>   NS_ENSURE_ARG_POINTER(_retval);
> 
>   nsRefPtr<imgContainer> img(new imgContainer());
>   NS_ENSURE_TRUE(img, NS_ERROR_OUT_OF_MEMORY);
> 
>   img->Init(aRegion.width, aRegion.height, nsnull);
> 
>-  imgFrame *frame = GetCurrentImgFrame();
>+  imgFrame *frame = GetCurrentDrawableImgFrame();
>   NS_ENSURE_TRUE(frame, NS_ERROR_FAILURE);
> 
>   // The frame can be smaller than the image. We want to extract only the part
>   // of the frame that actually exists.
>   nsIntRect framerect = frame->GetRect();
>   framerect.IntersectRect(framerect, aRegion);
> 
>   if (framerect.IsEmpty())
>@@ -198,29 +198,58 @@ imgFrame *imgContainer::GetImgFrame(PRUi
>     NS_ASSERTION(framenum == 0, "Don't ask for a frame > 0 if we're not animated!");
>     return mFrames.SafeElementAt(0, nsnull);
>   }
>   if (mAnim->lastCompositedFrameIndex == PRInt32(framenum))
>     return mAnim->compositingFrame;
>   return mFrames.SafeElementAt(framenum, nsnull);
> }
> 

This should probably just call GetImgFrame?

>+imgFrame *imgContainer::GetDrawableImgFrame(PRUint32 framenum)
>+{
>+  nsresult rv = RestoreDiscardedData();
>+  NS_ENSURE_SUCCESS(rv, nsnull);
>+
>+  imgFrame *frame = nsnull;
>+
>+  if (!mAnim) {
>+    NS_ASSERTION(framenum == 0, "Don't ask for a frame > 0 if we're not animated!");
>+    frame = mFrames.SafeElementAt(0, nsnull);
>+  }
>+  else if (mAnim->lastCompositedFrameIndex == PRInt32(framenum))
>+    frame = mAnim->compositingFrame;
>+  else
>+    frame = mFrames.SafeElementAt(framenum, nsnull);
>+
>+  // We will return a paletted frame if it's not marked as compositing failed
>+  // so we can catch crashes for reasons we haven't investigated.
>+  if (frame && frame->GetIsPaletted() && frame->GetCompositingFailed())
>+    return nsnull;

As discussed with joe offline, the check for frame->GetIsPaletted() is only for 1.9.2.

Overall, the approach here is much better than last time. It doesn't feel great, but I'm not really sure how it could be improved, especially on short time frame.
Attachment #410884 - Flags: review?(jmuizelaar) → review-
This addresses Jeff's requested changes. I'm asking for review from these different folks because Bobby's been busy, so I'm reaching out to anyone with domain expertise.
Attachment #410884 - Attachment is obsolete: true
Attachment #411244 - Flags: review?(vladimir)
Attachment #411244 - Flags: review?(jmuizelaar)
Attachment #411244 - Flags: review?(bobbyholley)
Attachment #410884 - Flags: review?(bobbyholley)
Attachment #411244 - Flags: review?(alfredkayser)
Attachment #411244 - Flags: review?(jmuizelaar) → review+
Comment on attachment 411244 [details] [diff] [review]
(against 1.9.2) Add GetDrawableImgFrame() v2

r=ak.
It is indeed a good thing to differentiate between a 'normal' getFrame, which gets the current source frame, and a 'composition' frame which should be the current display/draw frame, showing the animation as it currently looks like.

Note, that even the first could return null if not all frames were successfully decoded (or it is then always skipped?).
Attachment #411244 - Flags: review?(alfredkayser) → review+
Component: Graphics → ImageLib
QA Contact: thebes → imagelib
Whiteboard: [has patch]
Pushed a merged-to-trunk version of this to m-c: http://hg.mozilla.org/mozilla-central/rev/fe0a0e18a51f

This needs to stay open until we're sure it's fixed, which we probably won't (due to low usage on trunk builds) until after a 3.6 beta is pushed out with the fix. I'll push this to 3.6 once it's baked on m-c a little bit, probably tomorrow.

Alfred: Luckily, the first frame returning null is handled properly everywhere. Also, the first frame is always RGBA, never paletted.
Severity: normal → critical
So far, so good. Recently, we've been having these crashes on trunk builds, but they have yet to occur with a buildID of today's build. In response, I've pushed this to 1.9.2: http://hg.mozilla.org/releases/mozilla-1.9.2/rev/ab1659095001
Unfortunately we can still crash at this precise location - bug 289763 has some testcases. Investigating....
Depends on: 289763
Whiteboard: [has patch]
I don't know how this bit got lost from the patch I ended up checking in, but it's pretty essential. If compositingFrame is non-null, we assume it's initialized successfully, which isn't the case here. And the check we added, for CompositingFailed, wasn't used here.

This fixes the crash in bug 289763.
Attachment #412745 - Flags: review?(jmuizelaar)
Attachment #412745 - Flags: review?(bobbyholley)
Comment on attachment 412745 [details] [diff] [review]
handle compositing frame initialization failure

Seems sane.
Attachment #412745 - Flags: review?(jmuizelaar) → review+
And this should get a test case.
http://hg.mozilla.org/mozilla-central/rev/cccd2091409c

and a basic crashtest from bug 289763: http://hg.mozilla.org/mozilla-central/rev/9ee50f360dd4
Blocks: 289763
Status: NEW → RESOLVED
Closed: 12 years ago
No longer depends on: 289763
Resolution: --- → FIXED
I would prefer more separation between "the image frame to draw" and "the image frame as generated by the decoder". Ideally, I would like GetImgFrame to always return pure decoded frames, and only multiplex with the composited frames at the stage of GetDrawableImgFrame. The current way seems kind of confusing to me, because GetDrawableImgFrame just presents the possibility of nulling-out something otherwise non-null. Furthermore, I don't the weirdness of GetImgFrame returning something totally different depending on whether we call DoComposite first. Not all consumers of GetImgFrame() do so with the intent of drawing (for example, the decoders use it), so having it alter its behavior based on drawing state seems wrong.

So I'm proposing that GetImgFrame become a straight-up accessor to the decoded frame array. If somebody wants to draw a frame, they call GetDrawableImgFrame(index), which ensures that all the proper compositing is iteratively done from 0 to index if it hasn't been done already, and then returns the appropriate drawable.

Thoughts joe?
Duplicate of this bug: 530940
This is also #4 top ranked in new 3.6beata regression crashes not previously seen in 3.5.x

http://people.mozilla.com/~chofmann/crash-data/new-crashes/new-36b3-from-35x-as-of-20091122.html

so we need to keep a close eye on beta 4 to confirm and verify the fix(es).
This will never appear in 3.5, because imgFrame does not exist there.
Attachment #411244 - Flags: review?(bobbyholley+bmo) → feedback?(bobbyholley+bmo)
Attachment #412745 - Flags: review?(bobbyholley+bmo) → feedback?(bobbyholley+bmo)
Attachment #411244 - Flags: feedback?(bobbyholley+bmo)
(In reply to comment #32)
> I would prefer more separation between "the image frame to draw" and "the image
> frame as generated by the decoder". Ideally, I would like GetImgFrame to always
> return pure decoded frames, and only multiplex with the composited frames at
> the stage of GetDrawableImgFrame. The current way seems kind of confusing to
> me, because GetDrawableImgFrame just presents the possibility of nulling-out
> something otherwise non-null. Furthermore, I don't the weirdness of GetImgFrame
> returning something totally different depending on whether we call DoComposite
> first. Not all consumers of GetImgFrame() do so with the intent of drawing (for
> example, the decoders use it), so having it alter its behavior based on drawing
> state seems wrong.
> 
> So I'm proposing that GetImgFrame become a straight-up accessor to the decoded
> frame array. If somebody wants to draw a frame, they call
> GetDrawableImgFrame(index), which ensures that all the proper compositing is
> iteratively done from 0 to index if it hasn't been done already, and then
> returns the appropriate drawable.
> 
> Thoughts joe?

Joe - thoughts on this? Is it worth filing a bug?
Attachment #412745 - Flags: feedback?(bobbyholley+bmo)
Hey, sorry for never getting back to you - that sounds fine. File a bug!
Crash Signature: [@imgFrame::Draw(gfxContext*, gfxPattern::GraphicsFilter, gfxMatrix const&, gfxRect const&, nsIntMargin const&, nsIntRect const&) ]
You need to log in before you can comment on or make changes to this bug.