Closed Bug 525326 Opened 10 years ago Closed 10 years ago

Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864][@nsGIFDecoder2::GifWrite(unsigned char const*, unsigned int)]

Categories

(Core :: ImageLib, defect, P1, critical)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1
Tracking Status
blocking1.9.1 --- .5+
status1.9.1 --- .5-fixed

People

(Reporter: joe, Assigned: jrmuizel)

References

Details

(5 keywords, Whiteboard: [sg:dos] null deref)

Crash Data

Attachments

(4 files, 1 obsolete file)

We're seeing lots of crashes in the GIF decoder, involving nsGIFDecoder2::ProcessData calling GifWrite(), then a null dereference. Right now, this is showing up in the noted crash site, but this is a Windows-specific, 3.5.4-specific crash site, since the offset in xul.dll will change with every build, and obviously xul.dll has no meaning on OS X or Linux.

Our first priority is figuring out where we're crashing; Jeff Muizelaar and I are working on that. A parallel process that someone else could look into is whether this started in 3.5.4, or whether similar crashes existed elsewhere.
A lot of the comments for this crash implicate the bookmarks stuff, which makes me think that it's to do with .ICO files. Is there a bandaid that we might be able to issue or way we can figure out if the favicon service might be causing this to fail hard?
(Note that there very well could be a Mac- or Linux-specific crash with a different signature that's exactly the same, but it might not be in the top 100.)
It shouldn't have anything to do with .ICO files, but if a site specifies a GIF as its favicon, I think we'll happily load it.
The first time I see this stack signature on 1.9.1 is in 3.5.4 builds on 10/22. It's not present in 3.5.4pre builds prior to that, which almost makes me think we regressed this in the patch we took for build 3 of 3.5.4... wasn't that a gif bug?
I'm a liar. Joe pointed out that I should be searching for other xul.dll crashes in general.

In the first 3.5.4 build, we saw some of these crashes at xul.dll@0x350ca9. e.g.: http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.4&query_search=signature&query_type=startswith&query=xul.dll&date=10%2F20%2F2009&range_value=1&range_unit=weeks&do_query=1&signature=xul.dll%400x350ca9

There's not much data for 3.5.4pre builds (because there's not many users) and there's only a few crashes in 7-day periods that crash in xul.dll, but the first time I see this crash on 1.9.1 is bp-7c622a06-1d57-4d34-afd8-540af2091016. That crash happened using the 20091002095927 nightly.

Note: I also discovered that this crash has a second xul.dll signature in 3.5.4: http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.4&query_search=signature&query_type=startswith&query=xul.dll&date=&range_value=1&range_unit=weeks&do_query=1&signature=xul.dll%400x348864
Summary: Crashes in gif decoder [@ xul.dll@0x348945] → Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864]
Things I need to fix this bug: 
 - A line number this crashes on, or at least a general area; or
 - a GIF file that can reproduce this.

I believe that Sam has a couple of URLs that look suspicious; if someone on Windows (though I believe it to be a general problem, no use limiting ourselves) wants to talk to Sam, then browse through some of the URLs, trying to crash, that would be a big help.
Can we get in touch (via email, if provided) from some of the people who wrote comments in the crash stacks asking them to tell us the URLs of the sites in those folders? That might be a way to get GIFs that cause the crash.
Keywords: testcase-wanted
Whiteboard: [sg:dos] null deref
It's pretty weird that we don't get any symbols or line numbers for this location. There's no relevant information in xul.sym for this range, so the problem isn't in the stack unwinder. Interestingly enough the symbol information for GifWrite() is duplicated in xul.sym. I'll check the pdb tomorrow to see if it's also a mess. Perhaps this problem is related to bug 520651.
Having a quick read through of the assembly at that address I'd guess that the crash is near this block at line 1137 in 1.9.1

        if (mGIFStruct.images_decoded) {
          // Copy global colormap into the palette of current frame
          PRUint32 size;
          mImageFrame->GetPaletteData(&mColormap, &size);
          memcpy(mColormap, mGIFStruct.global_colormap, size);
        }
(In reply to comment #14)
> It's pretty weird that we don't get any symbols or line numbers for this
> location. 

17:37 < lars> or do you mean that you're looking to get an explanation from breakpad as to why there are no symbols for that offset into xul.dll?  In the past, I've seen ted indicate that it might mean a jump into random memory because other offsets in that dll do have symbols
19:08 <@ted> ss: yeah, that looks like a jump to random memory, basically
19:10 <@ted> looks like GifWrite got inlined into ProcessData there
Here's my suggested annotation of the assembly:

We crash dereferencing eax which is presumably null.
52f78945 8b10            mov     edx,dword ptr [eax]
if we assume that eax is mImageFrame then edx will be a pointer to the vtable

52f78947 8b5240          mov     edx,dword ptr [edx+40h]
here we load the address of GetPaletteData into edx

52f7894a 8d4c241c        lea     ecx,[esp+1Ch]

52f7894e 51              push    ecx
push the address of 'size' onto the stack
52f7894f 57              push    edi
push the address of 'mColormap' onto the stack
52f78950 50              push    eax
push 'this' onto the stack
52f78951 ffd2            call    edx
call GetPaletteData


52f78953 8b44241c        mov     eax,dword ptr [esp+1Ch]
put the value of 'size' into eax. esp+1C is stack local variable

52f78957 8b0f            mov     ecx,dword ptr [edi]
put the value of 'mColormap' onto the stack

52f78959 50              push    eax
push the size onto the stack

52f7895a 8d86b8210000    lea     eax,[esi+21B8h]
get the address mGIFStruct.global_colormap? 

52f78960 50              push    eax
push mGIFStruct.global_colormap?
52f78961 51              push    ecx
push mColormap

52f78962 e8ef7df3ff      call    xul!memcpy (52eb0756)


This annotation fits very well with my original hypothesis. So it looks like the problem is that mImageFrame is null.
(In reply to comment #16)
> (In reply to comment #14)
> > It's pretty weird that we don't get any symbols or line numbers for this
> > location. 
> 
> 17:37 < lars> or do you mean that you're looking to get an explanation from
> breakpad as to why there are no symbols for that offset into xul.dll?  In the
> past, I've seen ted indicate that it might mean a jump into random memory
> because other offsets in that dll do have symbols
> 19:08 <@ted> ss: yeah, that looks like a jump to random memory, basically
> 19:10 <@ted> looks like GifWrite got inlined into ProcessData there

From what I can tell, there was no jump into random memory. The code at 0x348945 is part of the GIF decoder and seems to make sense. If you look at the addresses in the .sym file there are no symbols even close to this region...
This area of code saw a lot of churn during the 3.5.4 release:
bug 514776, bug 511689 and bug 519589
Joe was pointing at bug 511689 on IRC earlier. It's not bug 519589 because that landed on 1.9.1 after our first sighting of this crash.
My next guess is that the problem is related to invalid depth handling.
Attached image Gif that causes a crash
Looks like my guess may not have been right.

The attached image has two frames, the second one has dimensions of 65535x65535. These dimensions cause BeginImageFrame to fail, after which mImageFrame is null.

Since, mGIFStruct.images_decoded is true we pass into this block:

           // Copy global colormap into the palette of current frame
          PRUint32 size;
          mImageFrame->GetPaletteData(&mColormap, &size);
          memcpy(mColormap, mGIFStruct.global_colormap, size);

and proceed to crash. The likely cause of this is bug 511689, but I haven't figured out why.

I should've looked at the url's before making a test case, but for what it's worth, this page: http://gallery.site.hu/u/Snokuk_54/GO/Vegyes_egyebek/smileyk/smileyk/?g2_page=2 has gif that also causes crashes on 3.5.4. I've not investigated why.
Here's the gif from the url that I mentioned earlier that crashes.

That's it for me tonight.
blocking1.9.1: --- → ?
Flags: blocking1.9.2?
Flags: blocking1.9.0.16?
blocking1.9.1: ? → .5+
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.16?
Flags: blocking1.9.0.16+
Yeah, "random memory" was too strong of a statement there. Interesting that we don't have symbols for that area, then. Is the symbol dumper missing anything, or is this just the optimizer confusing the heck out of us? I noticed that it appeared to have inlined all of GifWrite.
blocking1.9.1: .5+ → ---
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P1
Some new information:
- Trunk and 1.9.2 do not seem to crash with the examples I've found. Have we seen any crashes like this with those trees?
- The gif from the wild seems to have a similar problem to the one I created.
blocking1.9.1: --- → .5+
The reason we don't crash on trunk and 1.9.2 is that we don't assume that mImageFrame is valid when copying over palette data.
blocking1.9.1: .5+ → ---
blocking1.9.1: --- → .5+
This patch will fix the problem, but I'm not sure about its risk. We didn't have the check before and were fine, so I don't want to break anything else. Still, this patch is better than what we currently have.
Attachment #409368 - Flags: superreview?
Attachment #409368 - Flags: review?(joe)
Attachment #409368 - Flags: review?(alfredkayser)
Attachment #409368 - Flags: superreview?
Just for reference. This also crashes on OS X: bp-6011f77f-de07-4131-b61d-986502091030

First 5 frames:
0  	XUL  	nsGIFDecoder2::GifWrite  	 modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:1141
1 	XUL 	nsGIFDecoder2::ProcessData 	modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:250
2 	XUL 	ReadDataOut 	modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:190
3 	XUL 	nsInputStreamTee::WriteSegmentFun 	xpcom/io/nsInputStreamTee.cpp:102
4 	XUL 	nsPipeInputStream::ReadSegments 	xpcom/io/nsPipe3.cpp:799
5 	XUL 	nsGIFDecoder2::WriteFrom 	modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:269
Assignee: nobody → jmuizelaar
Severity: normal → critical
Status: NEW → ASSIGNED
Keywords: crash, topcrash
OS: Mac OS X → All
Hardware: x86 → All
Summary: Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864] → Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864][@nsGIFDecoder2::GifWrite(unsigned char const*, unsigned int)]
This bug was actually caused by bug 514776 which removed the check for null mImageFrame. I don't know why that check was removed. The patch that I posted earlier should be safe to go in.
Attachment #409368 - Flags: review?(alfredkayser) → review+
Pushed to 1.9.1 as:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/544bc11bc532

I don't have cvs access so I'll need a volunteer to push this to 1.9.0 cvs.
Attachment #409368 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment on attachment 409517 [details] [diff] [review]
Patch including the checkin message

Why did this land on 1.9.1 without approval? Please read http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/3ca007cf277a5147.
Attachment #409517 - Flags: approval1.9.1.5?
Attachment #409517 - Flags: approval1.9.0.16?
(In reply to comment #31)
> (From update of attachment 409517 [details] [diff] [review])
> Why did this land on 1.9.1 without approval? Please read
> http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/3ca007cf277a5147.

I didn't realize that was required. Should I back it out?
Comment on attachment 409517 [details] [diff] [review]
Patch including the checkin message

No, don't back it out. The more coverage this gets, the better.
Attachment #409517 - Flags: approval1.9.1.5?
Attachment #409517 - Flags: approval1.9.1.5+
Attachment #409517 - Flags: approval1.9.0.16?
Attachment #409517 - Flags: approval1.9.0.16+
(Approved across those two branches.)
Joe/Jeff (or anyone else): Can you please get this landed on 1.9.1 ASAP? We'd like as much coverage as we can get.
(What am I talking about? It already landed. *sigh*)
Checking in modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp,v  <--  nsGIFDecoder2.cpp
new revision: 1.108; previous revision: 1.107
Keywords: fixed1.9.0.16
I do not believe there is any reason to leave the gifs that cause crashes as secret attachments, correct? This is just a DoS, and one that has been well-described in checkin comments, at that?

Also, Jeff, can you check in a crash test for this bug (m-c only is ok)?

Finally, I think we should probably file a bug on breakpad, or at least our installation of it, for serving areas of GifWrite without symbol information? Or do we honestly think that that 483-line function was inlined?
(In reply to comment #38)
> I do not believe there is any reason to leave the gifs that cause crashes as
> secret attachments, correct? This is just a DoS, and one that has been
> well-described in checkin comments, at that?

Correct. Open them up and land them away!
(In reply to comment #38)
> Finally, I think we should probably file a bug on breakpad, or at least our
> installation of it, for serving areas of GifWrite without symbol information?
> Or do we honestly think that that 483-line function was inlined?

We could file a bug on this yes. I do believe it was inlined, since it's only called in one place. Why wouldn't it be? PGO does some pretty aggressive inlining. Lacking symbols for that case is bad, of course.
Attached patch gif testsSplinter Review
Attachment #409684 - Flags: review?(joe)
Attachment #409517 - Flags: approval1.9.1.5+
blocking1.9.1: .6+ → .5+
This needs to land on GECKO1914_20091006_RELBRANCH for 1.9.1.5.
Attachment #409684 - Flags: review?(joe) → review+
Verified this is fixed on 1.9.1 by making sure both testcases didn't crash 3.5.3, did crash 3.5.4, and didn't crash using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5pre) Gecko/20091102 Shiretoko/3.5.5pre

I'll save the keyword for when we verify using official builds.
Flags: in-testsuite?
Target Milestone: --- → mozilla1.9.3a1
Flags: in-testsuite? → in-testsuite+
Attachment #409290 - Attachment is private: false
Attachment #409291 - Attachment is private: false
Jeff, when this patch will be checked into 1.9.2?
(In reply to comment #46)
> Jeff, when this patch will be checked into 1.9.2?

1.9.2 is not affected.
Flags: blocking1.9.2+
can the tests be pushed to 1.9.2?
Verified for 1.9.1.5 with the official build 1 (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5). 

Verified the crash in 1.9.1.4 and the lack of crash in 1.9.1.3 as well.
Keywords: verified1.9.1
(In reply to comment #38) 
> Finally, I think we should probably file a bug on breakpad, or at least our
> installation of it, for serving areas of GifWrite without symbol information?
> Or do we honestly think that that 483-line function was inlined?

I filed bug 526484 on investigating this.
Duplicate of this bug: 527116
Comment on attachment 409517 [details] [diff] [review]
Patch including the checkin message

This patch doesn't need 1.9.1.6 approval because it is already in 1.9.1.5
Attachment #409517 - Flags: approval1.9.1.6+
Verified again for 1.9.0.16 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16pre) Gecko/2009111921 GranParadiso/3.0.16pre (.NET CLR 3.5.30729). Behavior is the same (good) behavior as 1.9.0.15.
Crash Signature: [@ xul.dll@0x348945] [@ xul.dll@0x348864] [@nsGIFDecoder2::GifWrite(unsigned char const*, unsigned int)]
You need to log in before you can comment on or make changes to this bug.