Note: There are a few cases of duplicates in user autocompletion which are being worked on.

Stack corruption in Windows' nsIconChannel::MakeInputStream due to GetDIBits being incorrectly used

RESOLVED FIXED in mozilla2.0b7

Status

()

Core
ImageLib
RESOLVED FIXED
7 years ago
3 years ago

People

(Reporter: sid0, Assigned: sid0)

Tracking

unspecified
mozilla2.0b7
x86
Windows 7
Points:
---
Bug Flags:
sec-bounty +

Firefox Tracking Flags

(blocking2.0 betaN+, blocking1.9.2 needed, status1.9.2 .11-fixed, status1.9.1 unaffected)

Details

(Whiteboard: [sg:critical?])

Attachments

(2 attachments, 4 obsolete attachments)

I can reproduce this on the official builds: Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; rv:2.0b3) Gecko/20100805 Firefox/4.0b3 as well as on Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b5pre) Gecko/20100823 Minefield/4.0b5pre (.NET CLR 3.5.30729). However, I can't reproduce this on a debug build, either VC8 or VC10.

A call in nsIconChannel::MakeInputStream [1] to Win32's GetDIBits [2] seems to clobber the stack on returning. This happens whenever this point in code is reached. In most cases those locations aren't touched any more. However, in at least one case at least one location is -- when stockIcon [3] contains a shared string, during its destruction that value on the stack (now 0x00ffffff) is used.

An easy way to trigger a crash on Windows 7 using it is to load the URL moz-icon://stock/uac-shield. This immediately causes a crash, e.g. [4], which I submitted.

[1] http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp#538
[2] http://msdn.microsoft.com/en-us/library/dd144879%28VS.85%29.aspx -- the call is in http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp#632
[3] http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp#551
[4] http://crash-stats.mozilla.com/report/index/bp-fb1225e0-ca4a-4c4b-99ae-84df22100824

There doesn't seem to be anything obviously wrong with the API calls. This might be something hidden in the API or a codegen/PGO bug.
http://crash-stats.mozilla.com/report/index/bp-fb1225e0-ca4a-4c4b-99ae-84df22100824
0  	nspr4.dll  	PR_AtomicDecrement  	 nsprpub/pr/src/misc/pratom.c:312
1 	xul.dll 	nsACString_internal::Finalize 	xpcom/string/src/nsTSubstring.cpp:186
2 	xul.dll 	nsIconChannel::MakeInputStream 	modules/libpr0n/decoders/icon/win/nsIconChannel.cpp:559
3 	xul.dll 	nsIconChannel::AsyncOpen 	modules/libpr0n/decoders/icon/win/nsIconChannel.cpp:246
4 	xul.dll 	nsURILoader::OpenURI 	uriloader/base/nsURILoader.cpp:863
5 	xul.dll 	nsDocShell::DoChannelLoad 	docshell/base/nsDocShell.cpp:8862
6 	xul.dll 	nsDocShell::DoURILoad 	docshell/base/nsDocShell.cpp:8704
7 	xul.dll 	nsDocShell::InternalLoad 	docshell/base/nsDocShell.cpp:8374
8 	xul.dll 	nsDocShell::LoadURI 	docshell/base/nsDocShell.cpp:1414

Updated

7 years ago
blocking2.0: --- → betaN+
When did this start happening? I've recently landed a bunch of imagelib changes...
(Assignee)

Comment 3

7 years ago
(In reply to comment #2)
> When did this start happening? I've recently landed a bunch of imagelib
> changes...

Not sure, but at least the crash doesn't happen with 3.6.8.

However it seems we've been using the API incorrectly all this while -- the BITMAPINFO structure [1] actually expects an array of RGBQUADs rather an a single RGBQUAD. The number of RGBQUADs is determined by biBitCount and biClrUsed [2]. A mask has 1 bit per pixel, which means according to [2] that bmiColors has to contain two entries, not the one you would get by allocating on the stack. This results in the four bytes right after maskInfo being overwritten.

[1] http://msdn.microsoft.com/en-us/library/dd183375%28v=VS.85%29.aspx
[2] http://msdn.microsoft.com/en-us/library/dd183376%28v=VS.85%29.aspx
Bobby, can you look at this crash?
Assignee: nobody → bobbyholley+bmo
Whiteboard: [sg:critical?]

Comment 5

7 years ago
(In reply to comment #2)
> When did this start happening? I've recently landed a bunch of imagelib
> changes...

we did see a bump in this signature shortly after the release of beta1 around July 7 going from 15-25 crashes per day  before 7/7 to 30-55 crashes per day after 7/7.   then we started to see a decline again around July 27.

peak volume day was july 15.  here is the breakdown of releases for this signature on that day.

checking --- PR_AtomicDecrement 20100715-crashdata.csv
found in: 4.0b1 3.0.19 3.0 3.0.5 3.7a5 3.0b4 3.0.4 3.0.3 3.0.10 3.0.1
release total-crashes
              PR_AtomicDecrement crashes
                         pct.
all     292787  55      0.00018785
4.0b1   20302   37      0.00182248
3.0.19  7892    6       0.000760264
3.0     818     4       0.00488998
3.0.5   769     2       0.00260078
3.7a5   795     1       0.00125786
3.0b4   254     1       0.00393701
3.0.4   493     2       0.0040568
3.0.3   510     1       0.00196078
3.0.10  420     1       0.00238095
3.0.1   1202    1       0.000831947


maybe skip listing would help to sort out possible multiple different crashes under the signature PR_AtomicDecrement.

It looks like we still see the crash on the trunk/beta3.  here is the breakdown for yesterday.

checking --- PR_AtomicDecrement 20100823-crashdata.csv
found in: 4.0b3 4.0b4pre 3.0 4.0b5pre 3.0.19 4.0b3pre 4.0b1 3.6.8 3.0.1
release total-crashes
              PR_AtomicDecrement crashes
                         pct.
all     287185  41      0.000142765
4.0b3   17548   24      0.00136768
4.0b4pre 146     6       0.0410959
3.0      822     3       0.00364964
4.0b5pre 1962    2       0.00101937
3.0.19   7382    2       0.000270929
4.0b3pre 115     1       0.00869565
4.0b1    1689    1       0.000592066
3.6.8    186488  1       5.36228e-06
3.0.1    1176    1       0.00085034
Given that this would involve me investigating things on windows (not my native platform - I have a VM, but not really a development environment), I don't really have the bandwidth for it right now. Maybe in a few weeks? Or maybe joe could look at it?
Resetting assignee to default to signify the fact that I'm not owning this for the time being.
Assignee: bobbyholley+bmo → nobody
(Assignee)

Comment 8

7 years ago
Updating summary to reflect current information.
Summary: Possible stack corruption in Windows' nsIconChannel::MakeInputStream with Firefox release build → Stack corruption in Windows' nsIconChannel::MakeInputStream due to GetDIBits being incorrectly used
(Assignee)

Comment 9

7 years ago
I bet bug 526038 has the same cause.
Created attachment 469080 [details] [diff] [review]
possible fix

Here's one possible way to fix it. The variable length of BITMAPINFO means that we'll need to allocate it on the heap.

I'm still not sure whether the colour table should never be part of the icon. (If I do include it with a 32-bit icon, the image doesn't come out right, so I removed it.)
Attachment #469080 - Flags: review?
(Assignee)

Updated

7 years ago
Attachment #469080 - Flags: review? → review?(joe)
Created attachment 469350 [details] [diff] [review]
Possible fix, v1.01

Oops, missed a couple of error checks.
Attachment #469080 - Attachment is obsolete: true
Attachment #469350 - Flags: review?(joe)
Attachment #469080 - Flags: review?(joe)
Comment on attachment 469350 [details] [diff] [review]
Possible fix, v1.01


>+// Given a header and a size, creates a freshly allocated BITMAPINFO structure.
>+// It is the caller's responsibility to free the structure.
>+static BITMAPINFO* CreateBitmapInfo(BITMAPINFOHEADER* aHeader,
>+                                    size_t aColorTableSize)
>+{
>+  BITMAPINFO* bmi = (BITMAPINFO*) moz_xmalloc(sizeof(BITMAPINFOHEADER) +
>+                                              aColorTableSize);

We should probably use fallible alloc here, because this is a user-controlled, mostly arbitrarily-large size. I know that'll complicate things and add error handling, though. The style here is also to use new[]/delete[], so it might not be a bad thing to use fallible new[].

Looks good otherwise, but this also needs sr.
Attachment #469350 - Flags: superreview?(vladimir)
Attachment #469350 - Flags: review?(joe)
Attachment #469350 - Flags: review+
Comment on attachment 469350 [details] [diff] [review]
Possible fix, v1.01

Looks good to me, though most of the patch seems to be some cleanup (using a BITMAPINFOHEADER instead of a BITMAPINFO), with the core change happening in the last hunk.  I don't have anything against the cleanup, but would be nice to split it up into two separate changesets for better inspectability.
Attachment #469350 - Flags: superreview?(vladimir) → superreview+
(In reply to comment #12)
> The style here is also to use new[]/delete[] , so it might not
> be a bad thing to use fallible new[].

The size of the pointer is statically unknown, and I'd rather do a malloc than a new char[].

(In reply to comment #13)
> Comment on attachment 469350 [details] [diff] [review]
> Possible fix, v1.01
> 
> Looks good to me, though most of the patch seems to be some cleanup (using a
> BITMAPINFOHEADER instead of a BITMAPINFO)

Oh, I only did that because a statically allocated BITMAPINFO is 4 bytes larger than just the header, and at that stage just the header's required.
Created attachment 469854 [details] [diff] [review]
patch addressing review comments, to check in

OK, I've switched to fallible operator new (there could also possibly be alignment issues with new char[]).

The bug affects trunk and all release branches. How is this going to be landed?
Assignee: nobody → sid.bugzilla
Attachment #469350 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #469854 - Flags: superreview+
Attachment #469854 - Flags: review+
Created attachment 469987 [details] [diff] [review]
fix for one more potential issue

Hmm, I noticed one potential issue my patch could introduce. We could convince ourselves to allocate a potentially unbounded amount of memory if the header is malformed and has a strange value for aHeader->biClrUsed. Is failing the right thing to do here?

(sorry for the additional review requests!)
Attachment #469854 - Attachment is obsolete: true
Attachment #469987 - Flags: superreview?(vladimir)
Attachment #469987 - Flags: review?(joe)
Created attachment 469991 [details] [diff] [review]
... correct patch to fix one more potential issue

Er, and here's the correct patch. Stupid hg.
Attachment #469987 - Attachment is obsolete: true
Attachment #469991 - Flags: superreview?(vladimir)
Attachment #469991 - Flags: review?(joe)
Attachment #469987 - Flags: superreview?(vladimir)
Attachment #469987 - Flags: review?(joe)
Attachment #469991 - Flags: review?(joe) → review+
Attachment #469991 - Flags: superreview?(vladimir) → superreview+
Landed: http://hg.mozilla.org/mozilla-central/rev/08937a340174
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Comment on attachment 469991 [details] [diff] [review]
... correct patch to fix one more potential issue

This bug affects 1.9.2 as well, even though we don't outright crash there. The stack corruption still occurs (I verified this using the VS debugger's memory view).

The patch applies cleanly on 1.9.2.
Attachment #469991 - Flags: approval1.9.2.10?
Comment on attachment 469991 [details] [diff] [review]
... correct patch to fix one more potential issue

... but it doesn't actually build, since 1.9.2 doesn't have infallible malloc.
Attachment #469991 - Flags: approval1.9.2.10?
Created attachment 475169 [details] [diff] [review]
patch for 1.9.2

This works.

As for 1.9.1, it seems to be unaffected by the bug, but entirely by accident. In <http://mxr.mozilla.org/mozilla1.9.1/source/modules/libpr0n/decoders/icon/win/nsIconChannel.cpp?mark=497-505#497>, the pointer that's passed in for the bitmap info is an offset of the buffer itself, so I think it first writes the colour table out, then overwrites it with the actual bitmap.
Attachment #475169 - Flags: approval1.9.2.10?
(Assignee)

Updated

7 years ago
Target Milestone: --- → mozilla2.0b7
blocking1.9.2: --- → needed
status1.9.1: --- → unaffected
status1.9.2: --- → wanted
Comment on attachment 475169 [details] [diff] [review]
patch for 1.9.2

Approved for 1.9.2.11, a=dveditz

Does this patch work for 1.9.1.x too, or do we need a different version there?
Attachment #475169 - Flags: approval1.9.2.11? → approval1.9.2.11+
- 1.9.1 is unaffected by the actual corruption issue by accident, as I mentioned in comment 21.
- The 1.9.1 code is completely different, so if we do decide to explicitly fix it there we'll have to write a completely different patch.
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/7e02b0747e53
status1.9.2: wanted → .11-fixed
(In reply to comment #25)
> Created attachment 482359 [details]
> Bug Bounty Nomination

I haven't been able to do anything worse than a denial of service, and I think DEP + ASLR would protect against this, so this bug might not be eligible. On the other hand, the executable is 32-bit only, so one might in theory be able to brute-force one's way past ASLR easily.
Of course, ASLR isn't present on Windows XP.
... and Win2k doesn't have DEP, and we still support it. Never mind. :)
Group: core-security
Flags: sec-bounty+
You need to log in before you can comment on or make changes to this bug.