The default bug view has changed. See this FAQ.

Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864][@nsGIFDecoder2::GifWrite(unsigned char const*, unsigned int)]

RESOLVED FIXED in mozilla1.9.3a1

Status

()

Core
ImageLib
P1
critical
RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: Joe Drew (not getting mail), Assigned: jrmuizel)

Tracking

(5 keywords)

Trunk
mozilla1.9.3a1
crash, testcase, topcrash, verified1.9.0.16, verified1.9.1
Points:
---
Bug Flags:
blocking1.9.0.16 +
wanted1.9.0.x +
in-testsuite +

Firefox Tracking Flags

(blocking1.9.1 .5+, status1.9.1 .5-fixed)

Details

(Whiteboard: [sg:dos] null deref, crash signature)

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

8 years ago
We're seeing lots of crashes in the GIF decoder, involving nsGIFDecoder2::ProcessData calling GifWrite(), then a null dereference. Right now, this is showing up in the noted crash site, but this is a Windows-specific, 3.5.4-specific crash site, since the offset in xul.dll will change with every build, and obviously xul.dll has no meaning on OS X or Linux.

Our first priority is figuring out where we're crashing; Jeff Muizelaar and I are working on that. A parallel process that someone else could look into is whether this started in 3.5.4, or whether similar crashes existed elsewhere.
A lot of the comments for this crash implicate the bookmarks stuff, which makes me think that it's to do with .ICO files. Is there a bandaid that we might be able to issue or way we can figure out if the favicon service might be causing this to fail hard?
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.4&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=xul.dll%400x348945 shows crashes on 3.5.4 in the last week
(Note that there very well could be a Mac- or Linux-specific crash with a different signature that's exactly the same, but it might not be in the top 100.)
(Reporter)

Comment 4

8 years ago
It shouldn't have anything to do with .ICO files, but if a site specifies a GIF as its favicon, I think we'll happily load it.
The first time I see this stack signature on 1.9.1 is in 3.5.4 builds on 10/22. It's not present in 3.5.4pre builds prior to that, which almost makes me think we regressed this in the patch we took for build 3 of 3.5.4... wasn't that a gif bug?
Oh, sorry. URL: http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.4&query_search=signature&query_type=exact&query=xul.dll%400x348945&date=10%2F23%2F2009&range_value=1&range_unit=weeks&do_query=1&signature=xul.dll%400x348945
I'm thinking of bug 519589.
I'm a liar. Joe pointed out that I should be searching for other xul.dll crashes in general.

In the first 3.5.4 build, we saw some of these crashes at xul.dll@0x350ca9. e.g.: http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.4&query_search=signature&query_type=startswith&query=xul.dll&date=10%2F20%2F2009&range_value=1&range_unit=weeks&do_query=1&signature=xul.dll%400x350ca9

There's not much data for 3.5.4pre builds (because there's not many users) and there's only a few crashes in 7-day periods that crash in xul.dll, but the first time I see this crash on 1.9.1 is bp-7c622a06-1d57-4d34-afd8-540af2091016. That crash happened using the 20091002095927 nightly.

Note: I also discovered that this crash has a second xul.dll signature in 3.5.4: http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.4&query_search=signature&query_type=startswith&query=xul.dll&date=&range_value=1&range_unit=weeks&do_query=1&signature=xul.dll%400x348864
Summary: Crashes in gif decoder [@ xul.dll@0x348945] → Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864]
(Reporter)

Comment 9

8 years ago
Things I need to fix this bug: 
 - A line number this crashes on, or at least a general area; or
 - a GIF file that can reproduce this.

I believe that Sam has a couple of URLs that look suspicious; if someone on Windows (though I believe it to be a general problem, no use limiting ourselves) wants to talk to Sam, then browse through some of the URLs, trying to crash, that would be a big help.
Can we get in touch (via email, if provided) from some of the people who wrote comments in the crash stacks asking them to tell us the URLs of the sites in those folders? That might be a way to get GIFs that cause the crash.
http://twitter.com/beltzner/status/5272436091 - let's see if that helps
Keywords: testcase-wanted
Whiteboard: [sg:dos] null deref
(Assignee)

Comment 14

8 years ago
It's pretty weird that we don't get any symbols or line numbers for this location. There's no relevant information in xul.sym for this range, so the problem isn't in the stack unwinder. Interestingly enough the symbol information for GifWrite() is duplicated in xul.sym. I'll check the pdb tomorrow to see if it's also a mess. Perhaps this problem is related to bug 520651.
(Assignee)

Comment 15

8 years ago
Having a quick read through of the assembly at that address I'd guess that the crash is near this block at line 1137 in 1.9.1

        if (mGIFStruct.images_decoded) {
          // Copy global colormap into the palette of current frame
          PRUint32 size;
          mImageFrame->GetPaletteData(&mColormap, &size);
          memcpy(mColormap, mGIFStruct.global_colormap, size);
        }
(In reply to comment #14)
> It's pretty weird that we don't get any symbols or line numbers for this
> location. 

17:37 < lars> or do you mean that you're looking to get an explanation from breakpad as to why there are no symbols for that offset into xul.dll?  In the past, I've seen ted indicate that it might mean a jump into random memory because other offsets in that dll do have symbols
19:08 <@ted> ss: yeah, that looks like a jump to random memory, basically
19:10 <@ted> looks like GifWrite got inlined into ProcessData there
(Assignee)

Comment 17

8 years ago
Here's my suggested annotation of the assembly:

We crash dereferencing eax which is presumably null.
52f78945 8b10            mov     edx,dword ptr [eax]
if we assume that eax is mImageFrame then edx will be a pointer to the vtable

52f78947 8b5240          mov     edx,dword ptr [edx+40h]
here we load the address of GetPaletteData into edx

52f7894a 8d4c241c        lea     ecx,[esp+1Ch]

52f7894e 51              push    ecx
push the address of 'size' onto the stack
52f7894f 57              push    edi
push the address of 'mColormap' onto the stack
52f78950 50              push    eax
push 'this' onto the stack
52f78951 ffd2            call    edx
call GetPaletteData


52f78953 8b44241c        mov     eax,dword ptr [esp+1Ch]
put the value of 'size' into eax. esp+1C is stack local variable

52f78957 8b0f            mov     ecx,dword ptr [edi]
put the value of 'mColormap' onto the stack

52f78959 50              push    eax
push the size onto the stack

52f7895a 8d86b8210000    lea     eax,[esi+21B8h]
get the address mGIFStruct.global_colormap? 

52f78960 50              push    eax
push mGIFStruct.global_colormap?
52f78961 51              push    ecx
push mColormap

52f78962 e8ef7df3ff      call    xul!memcpy (52eb0756)


This annotation fits very well with my original hypothesis. So it looks like the problem is that mImageFrame is null.
(Assignee)

Comment 18

8 years ago
(In reply to comment #16)
> (In reply to comment #14)
> > It's pretty weird that we don't get any symbols or line numbers for this
> > location. 
> 
> 17:37 < lars> or do you mean that you're looking to get an explanation from
> breakpad as to why there are no symbols for that offset into xul.dll?  In the
> past, I've seen ted indicate that it might mean a jump into random memory
> because other offsets in that dll do have symbols
> 19:08 <@ted> ss: yeah, that looks like a jump to random memory, basically
> 19:10 <@ted> looks like GifWrite got inlined into ProcessData there

From what I can tell, there was no jump into random memory. The code at 0x348945 is part of the GIF decoder and seems to make sense. If you look at the addresses in the .sym file there are no symbols even close to this region...
(Assignee)

Comment 19

8 years ago
This area of code saw a lot of churn during the 3.5.4 release:
bug 514776, bug 511689 and bug 519589
Joe was pointing at bug 511689 on IRC earlier. It's not bug 519589 because that landed on 1.9.1 after our first sighting of this crash.
(Assignee)

Comment 21

8 years ago
My next guess is that the problem is related to invalid depth handling.
(Assignee)

Comment 22

8 years ago
Created attachment 409290 [details]
Gif that causes a crash

Looks like my guess may not have been right.

The attached image has two frames, the second one has dimensions of 65535x65535. These dimensions cause BeginImageFrame to fail, after which mImageFrame is null.

Since, mGIFStruct.images_decoded is true we pass into this block:

           // Copy global colormap into the palette of current frame
          PRUint32 size;
          mImageFrame->GetPaletteData(&mColormap, &size);
          memcpy(mColormap, mGIFStruct.global_colormap, size);

and proceed to crash. The likely cause of this is bug 511689, but I haven't figured out why.

I should've looked at the url's before making a test case, but for what it's worth, this page: http://gallery.site.hu/u/Snokuk_54/GO/Vegyes_egyebek/smileyk/smileyk/?g2_page=2 has gif that also causes crashes on 3.5.4. I've not investigated why.
(Assignee)

Comment 23

8 years ago
Created attachment 409291 [details]
A gif from the wild that crashes

Here's the gif from the url that I mentioned earlier that crashes.

That's it for me tonight.
(Assignee)

Updated

8 years ago
blocking1.9.1: --- → ?
Flags: blocking1.9.2?
Flags: blocking1.9.0.16?
blocking1.9.1: ? → .5+
status1.9.1: --- → wanted
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.16?
Flags: blocking1.9.0.16+
Yeah, "random memory" was too strong of a statement there. Interesting that we don't have symbols for that area, then. Is the symbol dumper missing anything, or is this just the optimizer confusing the heck out of us? I noticed that it appeared to have inlined all of GifWrite.
blocking1.9.1: .5+ → ---
status1.9.1: wanted → ---
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P1
(Assignee)

Comment 25

8 years ago
Some new information:
- Trunk and 1.9.2 do not seem to crash with the examples I've found. Have we seen any crashes like this with those trees?
- The gif from the wild seems to have a similar problem to the one I created.
blocking1.9.1: --- → .5+
status1.9.1: --- → wanted
(Assignee)

Comment 26

8 years ago
The reason we don't crash on trunk and 1.9.2 is that we don't assume that mImageFrame is valid when copying over palette data.
blocking1.9.1: .5+ → ---
status1.9.1: wanted → ---
blocking1.9.1: --- → .5+
status1.9.1: --- → wanted
(Assignee)

Comment 27

8 years ago
Created attachment 409368 [details] [diff] [review]
a patch to avoid dereferencing null mImageFrame's

This patch will fix the problem, but I'm not sure about its risk. We didn't have the check before and were fine, so I don't want to break anything else. Still, this patch is better than what we currently have.
(Assignee)

Updated

8 years ago
Attachment #409368 - Flags: superreview?
Attachment #409368 - Flags: review?(joe)
Attachment #409368 - Flags: review?(alfredkayser)
(Assignee)

Updated

8 years ago
Attachment #409368 - Flags: superreview?
Just for reference. This also crashes on OS X: bp-6011f77f-de07-4131-b61d-986502091030

First 5 frames:
0  	XUL  	nsGIFDecoder2::GifWrite  	 modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:1141
1 	XUL 	nsGIFDecoder2::ProcessData 	modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:250
2 	XUL 	ReadDataOut 	modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:190
3 	XUL 	nsInputStreamTee::WriteSegmentFun 	xpcom/io/nsInputStreamTee.cpp:102
4 	XUL 	nsPipeInputStream::ReadSegments 	xpcom/io/nsPipe3.cpp:799
5 	XUL 	nsGIFDecoder2::WriteFrom 	modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:269
Assignee: nobody → jmuizelaar
Severity: normal → critical
Status: NEW → ASSIGNED
Keywords: crash, topcrash
OS: Mac OS X → All
Hardware: x86 → All
Summary: Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864] → Crashes in gif decoder [@ xul.dll@0x348945][@ xul.dll@0x348864][@nsGIFDecoder2::GifWrite(unsigned char const*, unsigned int)]
(Assignee)

Comment 29

8 years ago
This bug was actually caused by bug 514776 which removed the check for null mImageFrame. I don't know why that check was removed. The patch that I posted earlier should be safe to go in.
Attachment #409368 - Flags: review?(joe) → review+

Updated

8 years ago
Attachment #409368 - Flags: review?(alfredkayser) → review+
(Assignee)

Comment 30

8 years ago
Created attachment 409517 [details] [diff] [review]
Patch including the checkin message

Pushed to 1.9.1 as:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/544bc11bc532

I don't have cvs access so I'll need a volunteer to push this to 1.9.0 cvs.
Attachment #409368 - Attachment is obsolete: true
(Assignee)

Updated

8 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
status1.9.1: wanted → .5-fixed
Resolution: --- → FIXED
Comment on attachment 409517 [details] [diff] [review]
Patch including the checkin message

Why did this land on 1.9.1 without approval? Please read http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/3ca007cf277a5147.
Attachment #409517 - Flags: approval1.9.1.5?
Attachment #409517 - Flags: approval1.9.0.16?
(Assignee)

Comment 32

8 years ago
(In reply to comment #31)
> (From update of attachment 409517 [details] [diff] [review])
> Why did this land on 1.9.1 without approval? Please read
> http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/3ca007cf277a5147.

I didn't realize that was required. Should I back it out?
Comment on attachment 409517 [details] [diff] [review]
Patch including the checkin message

No, don't back it out. The more coverage this gets, the better.
Attachment #409517 - Flags: approval1.9.1.5?
Attachment #409517 - Flags: approval1.9.1.5+
Attachment #409517 - Flags: approval1.9.0.16?
Attachment #409517 - Flags: approval1.9.0.16+
(Approved across those two branches.)
Keywords: testcase-wanted → testcase
Joe/Jeff (or anyone else): Can you please get this landed on 1.9.1 ASAP? We'd like as much coverage as we can get.
(What am I talking about? It already landed. *sigh*)
(Reporter)

Comment 37

8 years ago
Checking in modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp,v  <--  nsGIFDecoder2.cpp
new revision: 1.108; previous revision: 1.107
Keywords: fixed1.9.0.16
(Reporter)

Comment 38

8 years ago
I do not believe there is any reason to leave the gifs that cause crashes as secret attachments, correct? This is just a DoS, and one that has been well-described in checkin comments, at that?

Also, Jeff, can you check in a crash test for this bug (m-c only is ok)?

Finally, I think we should probably file a bug on breakpad, or at least our installation of it, for serving areas of GifWrite without symbol information? Or do we honestly think that that 483-line function was inlined?
(In reply to comment #38)
> I do not believe there is any reason to leave the gifs that cause crashes as
> secret attachments, correct? This is just a DoS, and one that has been
> well-described in checkin comments, at that?

Correct. Open them up and land them away!
Blocks: 525814
(In reply to comment #38)
> Finally, I think we should probably file a bug on breakpad, or at least our
> installation of it, for serving areas of GifWrite without symbol information?
> Or do we honestly think that that 483-line function was inlined?

We could file a bug on this yes. I do believe it was inlined, since it's only called in one place. Why wouldn't it be? PGO does some pretty aggressive inlining. Lacking symbols for that case is bad, of course.
(Assignee)

Comment 41

8 years ago
Created attachment 409684 [details] [diff] [review]
gif tests
Attachment #409684 - Flags: review?(joe)
Attachment #409517 - Flags: approval1.9.1.5+
blocking1.9.1: .6+ → .5+
status1.9.1: .6-fixed → ---
This needs to land on GECKO1914_20091006_RELBRANCH for 1.9.1.5.
(Reporter)

Comment 43

8 years ago
thur ya go
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/edf189567edc
status1.9.1: --- → .5-fixed
(Reporter)

Updated

8 years ago
Attachment #409684 - Flags: review?(joe) → review+
Verified this is fixed on 1.9.1 by making sure both testcases didn't crash 3.5.3, did crash 3.5.4, and didn't crash using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5pre) Gecko/20091102 Shiretoko/3.5.5pre

I'll save the keyword for when we verify using official builds.
(Assignee)

Comment 45

8 years ago
Test case pushed:
http://hg.mozilla.org/mozilla-central/rev/a954f52e3149
Flags: in-testsuite?
Target Milestone: --- → mozilla1.9.3a1
(Reporter)

Updated

8 years ago
Flags: in-testsuite? → in-testsuite+
(Reporter)

Updated

8 years ago
Attachment #409290 - Attachment is private: false
(Reporter)

Updated

8 years ago
Attachment #409291 - Attachment is private: false
Jeff, when this patch will be checked into 1.9.2?
(Assignee)

Comment 47

8 years ago
(In reply to comment #46)
> Jeff, when this patch will be checked into 1.9.2?

1.9.2 is not affected.
Flags: blocking1.9.2+

Comment 48

8 years ago
can the tests be pushed to 1.9.2?
Verified for 1.9.1.5 with the official build 1 (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5). 

Verified the crash in 1.9.1.4 and the lack of crash in 1.9.1.3 as well.
Keywords: verified1.9.1
(In reply to comment #38) 
> Finally, I think we should probably file a bug on breakpad, or at least our
> installation of it, for serving areas of GifWrite without symbol information?
> Or do we honestly think that that 483-line function was inlined?

I filed bug 526484 on investigating this.
Duplicate of this bug: 527116
(Assignee)

Comment 52

8 years ago
Comment on attachment 409517 [details] [diff] [review]
Patch including the checkin message

This patch doesn't need 1.9.1.6 approval because it is already in 1.9.1.5
Attachment #409517 - Flags: approval1.9.1.6+
Verified again for 1.9.0.16 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16pre) Gecko/2009111921 GranParadiso/3.0.16pre (.NET CLR 3.5.30729). Behavior is the same (good) behavior as 1.9.0.15.
Keywords: fixed1.9.0.16 → verified1.9.0.16
Crash Signature: [@ xul.dll@0x348945] [@ xul.dll@0x348864] [@nsGIFDecoder2::GifWrite(unsigned char const*, unsigned int)]
You need to log in before you can comment on or make changes to this bug.