Last Comment Bug 525326 - Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864][@nsGIFDecoder2::GifWrite(unsigned char const*, unsigned int)]
: Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864][@nsGIFDecoder...
Status: RESOLVED FIXED
[sg:dos] null deref
: crash, testcase, topcrash, verified1.9.0.16, verified1.9.1
Product: Core
Classification: Components
Component: ImageLib (show other bugs)
: Trunk
: All All
: P1 critical (vote)
: mozilla1.9.3a1
Assigned To: Jeff Muizelaar [:jrmuizel]
:
:
Mentors:
: 527116 (view as bug list)
Depends on:
Blocks: 525814
  Show dependency treegraph
 
Reported: 2009-10-29 14:33 PDT by Joe Drew (not getting mail)
Modified: 2011-06-13 10:01 PDT (History)
17 users (show)
samuel.sidler+old: blocking1.9.0.16+
samuel.sidler+old: wanted1.9.0.x+
joe: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
.5+
.5-fixed


Attachments
Gif that causes a crash (673 bytes, image/gif)
2009-10-29 23:21 PDT, Jeff Muizelaar [:jrmuizel]
no flags Details
A gif from the wild that crashes (11.33 KB, image/gif)
2009-10-29 23:30 PDT, Jeff Muizelaar [:jrmuizel]
no flags Details
a patch to avoid dereferencing null mImageFrame's (551 bytes, patch)
2009-10-30 11:27 PDT, Jeff Muizelaar [:jrmuizel]
vladimir: review+
alfredkayser: review+
Details | Diff | Splinter Review
Patch including the checkin message (1.01 KB, patch)
2009-10-31 08:55 PDT, Jeff Muizelaar [:jrmuizel]
samuel.sidler+old: approval1.9.1.5+
samuel.sidler+old: approval1.9.0.16+
Details | Diff | Splinter Review
gif tests (1.49 KB, patch)
2009-11-02 07:24 PST, Jeff Muizelaar [:jrmuizel]
joe: review+
Details | Diff | Splinter Review

Description Joe Drew (not getting mail) 2009-10-29 14:33:51 PDT
We're seeing lots of crashes in the GIF decoder, involving nsGIFDecoder2::ProcessData calling GifWrite(), then a null dereference. Right now, this is showing up in the noted crash site, but this is a Windows-specific, 3.5.4-specific crash site, since the offset in xul.dll will change with every build, and obviously xul.dll has no meaning on OS X or Linux.

Our first priority is figuring out where we're crashing; Jeff Muizelaar and I are working on that. A parallel process that someone else could look into is whether this started in 3.5.4, or whether similar crashes existed elsewhere.
Comment 1 Mike Beltzner [:beltzner, not reading bugmail] 2009-10-29 14:52:21 PDT
A lot of the comments for this crash implicate the bookmarks stuff, which makes me think that it's to do with .ICO files. Is there a bandaid that we might be able to issue or way we can figure out if the favicon service might be causing this to fail hard?
Comment 3 Samuel Sidler (old account; do not CC) 2009-10-29 14:55:04 PDT
(Note that there very well could be a Mac- or Linux-specific crash with a different signature that's exactly the same, but it might not be in the top 100.)
Comment 4 Joe Drew (not getting mail) 2009-10-29 14:56:55 PDT
It shouldn't have anything to do with .ICO files, but if a site specifies a GIF as its favicon, I think we'll happily load it.
Comment 5 Samuel Sidler (old account; do not CC) 2009-10-29 14:58:25 PDT
The first time I see this stack signature on 1.9.1 is in 3.5.4 builds on 10/22. It's not present in 3.5.4pre builds prior to that, which almost makes me think we regressed this in the patch we took for build 3 of 3.5.4... wasn't that a gif bug?
Comment 7 Samuel Sidler (old account; do not CC) 2009-10-29 15:00:11 PDT
I'm thinking of bug 519589.
Comment 8 Samuel Sidler (old account; do not CC) 2009-10-29 15:38:09 PDT
I'm a liar. Joe pointed out that I should be searching for other xul.dll crashes in general.

In the first 3.5.4 build, we saw some of these crashes at xul.dll@0x350ca9. e.g.: http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.4&query_search=signature&query_type=startswith&query=xul.dll&date=10%2F20%2F2009&range_value=1&range_unit=weeks&do_query=1&signature=xul.dll%400x350ca9

There's not much data for 3.5.4pre builds (because there's not many users) and there's only a few crashes in 7-day periods that crash in xul.dll, but the first time I see this crash on 1.9.1 is bp-7c622a06-1d57-4d34-afd8-540af2091016. That crash happened using the 20091002095927 nightly.

Note: I also discovered that this crash has a second xul.dll signature in 3.5.4: http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.4&query_search=signature&query_type=startswith&query=xul.dll&date=&range_value=1&range_unit=weeks&do_query=1&signature=xul.dll%400x348864
Comment 9 Joe Drew (not getting mail) 2009-10-29 16:12:20 PDT
Things I need to fix this bug: 
 - A line number this crashes on, or at least a general area; or
 - a GIF file that can reproduce this.

I believe that Sam has a couple of URLs that look suspicious; if someone on Windows (though I believe it to be a general problem, no use limiting ourselves) wants to talk to Sam, then browse through some of the URLs, trying to crash, that would be a big help.
Comment 10 Mike Beltzner [:beltzner, not reading bugmail] 2009-10-29 16:13:54 PDT
Can we get in touch (via email, if provided) from some of the people who wrote comments in the crash stacks asking them to tell us the URLs of the sites in those folders? That might be a way to get GIFs that cause the crash.
Comment 12 Mike Beltzner [:beltzner, not reading bugmail] 2009-10-29 16:23:28 PDT
http://twitter.com/beltzner/status/5272436091 - let's see if that helps
Comment 14 Jeff Muizelaar [:jrmuizel] 2009-10-29 20:52:51 PDT
It's pretty weird that we don't get any symbols or line numbers for this location. There's no relevant information in xul.sym for this range, so the problem isn't in the stack unwinder. Interestingly enough the symbol information for GifWrite() is duplicated in xul.sym. I'll check the pdb tomorrow to see if it's also a mess. Perhaps this problem is related to bug 520651.
Comment 15 Jeff Muizelaar [:jrmuizel] 2009-10-29 21:13:57 PDT
Having a quick read through of the assembly at that address I'd guess that the crash is near this block at line 1137 in 1.9.1

        if (mGIFStruct.images_decoded) {
          // Copy global colormap into the palette of current frame
          PRUint32 size;
          mImageFrame->GetPaletteData(&mColormap, &size);
          memcpy(mColormap, mGIFStruct.global_colormap, size);
        }
Comment 16 Samuel Sidler (old account; do not CC) 2009-10-29 21:30:43 PDT
(In reply to comment #14)
> It's pretty weird that we don't get any symbols or line numbers for this
> location. 

17:37 < lars> or do you mean that you're looking to get an explanation from breakpad as to why there are no symbols for that offset into xul.dll?  In the past, I've seen ted indicate that it might mean a jump into random memory because other offsets in that dll do have symbols
19:08 <@ted> ss: yeah, that looks like a jump to random memory, basically
19:10 <@ted> looks like GifWrite got inlined into ProcessData there
Comment 17 Jeff Muizelaar [:jrmuizel] 2009-10-29 21:34:53 PDT
Here's my suggested annotation of the assembly:

We crash dereferencing eax which is presumably null.
52f78945 8b10            mov     edx,dword ptr [eax]
if we assume that eax is mImageFrame then edx will be a pointer to the vtable

52f78947 8b5240          mov     edx,dword ptr [edx+40h]
here we load the address of GetPaletteData into edx

52f7894a 8d4c241c        lea     ecx,[esp+1Ch]

52f7894e 51              push    ecx
push the address of 'size' onto the stack
52f7894f 57              push    edi
push the address of 'mColormap' onto the stack
52f78950 50              push    eax
push 'this' onto the stack
52f78951 ffd2            call    edx
call GetPaletteData


52f78953 8b44241c        mov     eax,dword ptr [esp+1Ch]
put the value of 'size' into eax. esp+1C is stack local variable

52f78957 8b0f            mov     ecx,dword ptr [edi]
put the value of 'mColormap' onto the stack

52f78959 50              push    eax
push the size onto the stack

52f7895a 8d86b8210000    lea     eax,[esi+21B8h]
get the address mGIFStruct.global_colormap? 

52f78960 50              push    eax
push mGIFStruct.global_colormap?
52f78961 51              push    ecx
push mColormap

52f78962 e8ef7df3ff      call    xul!memcpy (52eb0756)


This annotation fits very well with my original hypothesis. So it looks like the problem is that mImageFrame is null.
Comment 18 Jeff Muizelaar [:jrmuizel] 2009-10-29 21:38:43 PDT
(In reply to comment #16)
> (In reply to comment #14)
> > It's pretty weird that we don't get any symbols or line numbers for this
> > location. 
> 
> 17:37 < lars> or do you mean that you're looking to get an explanation from
> breakpad as to why there are no symbols for that offset into xul.dll?  In the
> past, I've seen ted indicate that it might mean a jump into random memory
> because other offsets in that dll do have symbols
> 19:08 <@ted> ss: yeah, that looks like a jump to random memory, basically
> 19:10 <@ted> looks like GifWrite got inlined into ProcessData there

From what I can tell, there was no jump into random memory. The code at 0x348945 is part of the GIF decoder and seems to make sense. If you look at the addresses in the .sym file there are no symbols even close to this region...
Comment 19 Jeff Muizelaar [:jrmuizel] 2009-10-29 21:50:02 PDT
This area of code saw a lot of churn during the 3.5.4 release:
bug 514776, bug 511689 and bug 519589
Comment 20 Samuel Sidler (old account; do not CC) 2009-10-29 21:58:26 PDT
Joe was pointing at bug 511689 on IRC earlier. It's not bug 519589 because that landed on 1.9.1 after our first sighting of this crash.
Comment 21 Jeff Muizelaar [:jrmuizel] 2009-10-29 21:58:47 PDT
My next guess is that the problem is related to invalid depth handling.
Comment 22 Jeff Muizelaar [:jrmuizel] 2009-10-29 23:21:17 PDT
Created attachment 409290 [details]
Gif that causes a crash

Looks like my guess may not have been right.

The attached image has two frames, the second one has dimensions of 65535x65535. These dimensions cause BeginImageFrame to fail, after which mImageFrame is null.

Since, mGIFStruct.images_decoded is true we pass into this block:

           // Copy global colormap into the palette of current frame
          PRUint32 size;
          mImageFrame->GetPaletteData(&mColormap, &size);
          memcpy(mColormap, mGIFStruct.global_colormap, size);

and proceed to crash. The likely cause of this is bug 511689, but I haven't figured out why.

I should've looked at the url's before making a test case, but for what it's worth, this page: http://gallery.site.hu/u/Snokuk_54/GO/Vegyes_egyebek/smileyk/smileyk/?g2_page=2 has gif that also causes crashes on 3.5.4. I've not investigated why.
Comment 23 Jeff Muizelaar [:jrmuizel] 2009-10-29 23:30:07 PDT
Created attachment 409291 [details]
A gif from the wild that crashes

Here's the gif from the url that I mentioned earlier that crashes.

That's it for me tonight.
Comment 24 Ted Mielczarek [:ted.mielczarek] 2009-10-30 03:58:37 PDT
Yeah, "random memory" was too strong of a statement there. Interesting that we don't have symbols for that area, then. Is the symbol dumper missing anything, or is this just the optimizer confusing the heck out of us? I noticed that it appeared to have inlined all of GifWrite.
Comment 25 Jeff Muizelaar [:jrmuizel] 2009-10-30 09:20:33 PDT
Some new information:
- Trunk and 1.9.2 do not seem to crash with the examples I've found. Have we seen any crashes like this with those trees?
- The gif from the wild seems to have a similar problem to the one I created.
Comment 26 Jeff Muizelaar [:jrmuizel] 2009-10-30 11:11:53 PDT
The reason we don't crash on trunk and 1.9.2 is that we don't assume that mImageFrame is valid when copying over palette data.
Comment 27 Jeff Muizelaar [:jrmuizel] 2009-10-30 11:27:41 PDT
Created attachment 409368 [details] [diff] [review]
a patch to avoid dereferencing null mImageFrame's

This patch will fix the problem, but I'm not sure about its risk. We didn't have the check before and were fine, so I don't want to break anything else. Still, this patch is better than what we currently have.
Comment 28 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2009-10-30 12:56:24 PDT
Just for reference. This also crashes on OS X: bp-6011f77f-de07-4131-b61d-986502091030

First 5 frames:
0  	XUL  	nsGIFDecoder2::GifWrite  	 modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:1141
1 	XUL 	nsGIFDecoder2::ProcessData 	modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:250
2 	XUL 	ReadDataOut 	modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:190
3 	XUL 	nsInputStreamTee::WriteSegmentFun 	xpcom/io/nsInputStreamTee.cpp:102
4 	XUL 	nsPipeInputStream::ReadSegments 	xpcom/io/nsPipe3.cpp:799
5 	XUL 	nsGIFDecoder2::WriteFrom 	modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:269
Comment 29 Jeff Muizelaar [:jrmuizel] 2009-10-30 14:28:44 PDT
This bug was actually caused by bug 514776 which removed the check for null mImageFrame. I don't know why that check was removed. The patch that I posted earlier should be safe to go in.
Comment 30 Jeff Muizelaar [:jrmuizel] 2009-10-31 08:55:23 PDT
Created attachment 409517 [details] [diff] [review]
Patch including the checkin message

Pushed to 1.9.1 as:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/544bc11bc532

I don't have cvs access so I'll need a volunteer to push this to 1.9.0 cvs.
Comment 31 Reed Loden [:reed] (use needinfo?) 2009-10-31 12:24:50 PDT
Comment on attachment 409517 [details] [diff] [review]
Patch including the checkin message

Why did this land on 1.9.1 without approval? Please read http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/3ca007cf277a5147.
Comment 32 Jeff Muizelaar [:jrmuizel] 2009-10-31 12:41:27 PDT
(In reply to comment #31)
> (From update of attachment 409517 [details] [diff] [review])
> Why did this land on 1.9.1 without approval? Please read
> http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/3ca007cf277a5147.

I didn't realize that was required. Should I back it out?
Comment 33 Samuel Sidler (old account; do not CC) 2009-10-31 12:51:09 PDT
Comment on attachment 409517 [details] [diff] [review]
Patch including the checkin message

No, don't back it out. The more coverage this gets, the better.
Comment 34 Samuel Sidler (old account; do not CC) 2009-10-31 12:52:09 PDT
(Approved across those two branches.)
Comment 35 Samuel Sidler (old account; do not CC) 2009-11-01 02:48:18 PST
Joe/Jeff (or anyone else): Can you please get this landed on 1.9.1 ASAP? We'd like as much coverage as we can get.
Comment 36 Samuel Sidler (old account; do not CC) 2009-11-01 13:21:27 PST
(What am I talking about? It already landed. *sigh*)
Comment 37 Joe Drew (not getting mail) 2009-11-01 19:56:49 PST
Checking in modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp,v  <--  nsGIFDecoder2.cpp
new revision: 1.108; previous revision: 1.107
Comment 38 Joe Drew (not getting mail) 2009-11-01 20:02:09 PST
I do not believe there is any reason to leave the gifs that cause crashes as secret attachments, correct? This is just a DoS, and one that has been well-described in checkin comments, at that?

Also, Jeff, can you check in a crash test for this bug (m-c only is ok)?

Finally, I think we should probably file a bug on breakpad, or at least our installation of it, for serving areas of GifWrite without symbol information? Or do we honestly think that that 483-line function was inlined?
Comment 39 Samuel Sidler (old account; do not CC) 2009-11-01 20:53:32 PST
(In reply to comment #38)
> I do not believe there is any reason to leave the gifs that cause crashes as
> secret attachments, correct? This is just a DoS, and one that has been
> well-described in checkin comments, at that?

Correct. Open them up and land them away!
Comment 40 Ted Mielczarek [:ted.mielczarek] 2009-11-02 04:35:34 PST
(In reply to comment #38)
> Finally, I think we should probably file a bug on breakpad, or at least our
> installation of it, for serving areas of GifWrite without symbol information?
> Or do we honestly think that that 483-line function was inlined?

We could file a bug on this yes. I do believe it was inlined, since it's only called in one place. Why wouldn't it be? PGO does some pretty aggressive inlining. Lacking symbols for that case is bad, of course.
Comment 41 Jeff Muizelaar [:jrmuizel] 2009-11-02 07:24:19 PST
Created attachment 409684 [details] [diff] [review]
gif tests
Comment 42 Samuel Sidler (old account; do not CC) 2009-11-02 08:15:12 PST
This needs to land on GECKO1914_20091006_RELBRANCH for 1.9.1.5.
Comment 43 Joe Drew (not getting mail) 2009-11-02 10:02:14 PST
thur ya go
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/edf189567edc
Comment 44 Samuel Sidler (old account; do not CC) 2009-11-02 11:19:39 PST
Verified this is fixed on 1.9.1 by making sure both testcases didn't crash 3.5.3, did crash 3.5.4, and didn't crash using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5pre) Gecko/20091102 Shiretoko/3.5.5pre

I'll save the keyword for when we verify using official builds.
Comment 45 Jeff Muizelaar [:jrmuizel] 2009-11-02 11:42:35 PST
Test case pushed:
http://hg.mozilla.org/mozilla-central/rev/a954f52e3149
Comment 46 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2009-11-02 17:49:29 PST
Jeff, when this patch will be checked into 1.9.2?
Comment 47 Jeff Muizelaar [:jrmuizel] 2009-11-02 17:52:08 PST
(In reply to comment #46)
> Jeff, when this patch will be checked into 1.9.2?

1.9.2 is not affected.
Comment 48 timeless 2009-11-03 00:17:07 PST
can the tests be pushed to 1.9.2?
Comment 49 Al Billings [:abillings] 2009-11-03 10:47:35 PST
Verified for 1.9.1.5 with the official build 1 (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5). 

Verified the crash in 1.9.1.4 and the lack of crash in 1.9.1.3 as well.
Comment 50 Ted Mielczarek [:ted.mielczarek] 2009-11-04 07:01:13 PST
(In reply to comment #38) 
> Finally, I think we should probably file a bug on breakpad, or at least our
> installation of it, for serving areas of GifWrite without symbol information?
> Or do we honestly think that that 483-line function was inlined?

I filed bug 526484 on investigating this.
Comment 51 Daniel Veditz [:dveditz] 2009-11-06 14:53:22 PST
*** Bug 527116 has been marked as a duplicate of this bug. ***
Comment 52 Jeff Muizelaar [:jrmuizel] 2009-11-08 18:57:30 PST
Comment on attachment 409517 [details] [diff] [review]
Patch including the checkin message

This patch doesn't need 1.9.1.6 approval because it is already in 1.9.1.5
Comment 53 Al Billings [:abillings] 2009-11-23 10:41:05 PST
Verified again for 1.9.0.16 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16pre) Gecko/2009111921 GranParadiso/3.0.16pre (.NET CLR 3.5.30729). Behavior is the same (good) behavior as 1.9.0.15.

Note You need to log in before you can comment on or make changes to this bug.