Closed Bug 1409440 Opened 7 years ago Closed 6 years ago

Crash in sse2::convolve_horizontally

Categories

(Core :: Graphics, defect, P3)

56 Branch
x86
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 60+ fixed
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 + fixed
firefox61 + fixed

People

(Reporter: jesup, Assigned: aosmond)

References

Details

(5 keywords, Whiteboard: [gfx-noted][adv-main60+][adv-esr52.8+][post-critsmash-triage])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-7b0725ed-cf5d-4bf6-b5e2-555e90171016.
=============================================================

UAF/wildptr crashes starting in 56 (56b1 seems to be the earliest crash recorded).  Writes and reads.
Group: core-security → gfx-core-security
-> milan for triage

Note: this is in JPEG decoding, trying to write to the bitmap generally.  It appears something is letting the buffer go away
Flags: needinfo?(milan)
Flags: needinfo?(milan) → needinfo?(tnikkel)
Priority: -- → P3
Whiteboard: [gfx-noted]
Has Regression Range: --- → yes
Has STR: --- → no
Keywords: testcase-wanted
STR will be tough; it smells like a timing issue (perhaps with navigation?)
Is this any different then bug 1353620?
Flags: needinfo?(tnikkel)
Flags: needinfo?(rjesup)
maybe not - the one I looked at there was called from PNG decode; but I suspect this is a weakness in all the decoder paths (buffer gets yanked from underneath the decoder).

It also might be a signature change starting in 56
Flags: needinfo?(rjesup)
Hi Milan:

I have assigned these security bugs to you to reassign them to appropriate developers in your team to investigate and fix them.

Thanks!

Wennie
Assignee: nobody → milan
Flags: needinfo?(milan)
Look if this is still around after we do a Skia update in 60.
Assignee: milan → lsalzman
Looking at the crash report, you can clearly see this is the imagelib code sending a deallocated address into the Skia convolver code, rather than the Skia convolver code in particular doing anything wrong. It looks like mWindow itself is no longer valid, or mLinesInBuffer is overreading it into a currently freed section of the heap, so that when mWindow[mLinesInBuffer++] yields the bogus address in question (0xFFFFE5E5) to the convolver code, it goes boom.

I'm going to punt this over to Andrew and Timothy to look into.
Assignee: lsalzman → aosmond
Flags: needinfo?(tnikkel)
Flags: needinfo?(aosmond)
I can see the mLinesInBuffer is not initialized in Downscaler::Downscaler called in Decoder::PostSize but it should be done by Downscaler::BeginFrame which, from what I can tell, is always called before we start using mDownscaler for anything.
What seems more plausible to me is that mLinesInBuffer++ overflowed mWindow. Since mWindow is a UniquePtr of a pointer array instead of an nsTArray, it won't crash immediately if we overflow.
This happens with other decoders too that use the DownscalingFilter / SurfacePipe instead, e.g. PNG decoder:

https://crash-stats.mozilla.com/report/index/c202fb4a-3238-4e1a-b67a-c43af0180319

The relevant logic is more-or-less identical, so if there was a problem in Downscaler::mLinesInBuffer itself or the filter methods it derives the value from, this corroborates that.
There is another crash variant for convolving vertically, e.g.:

https://crash-stats.mozilla.com/report/index/6f9e6f11-d155-44e0-944d-038080180321

Going through minidumps makes me question whether or not we actually overflowed mWindow in Downscaler::CommitRow. It should not be possible to overflow mWindow via mYFilter.ConvolveVertically, since it should be larger than the biggest filter, but we still get 0xe5e5e5e5.

In Downscaler::DownscaleInputLine we do some row swapping, and we don't do any bounds checking here. So in theory it could overflow mWindow, swap in 0xe5e5e5e5, but not crash there since it doesn't dereference it. Then we trip it on either the vertical or horizontal convolve. That said, I haven't manage to poke a hole in the swapping logic yet.
I have not reproduced myself, so while it is an educated guess at a fix, this is still speculative based on the crash reports. I cannot conceive of an image original size and downscale size that will create the necessary conditions, but when we create the filters in ConvolutionFilter::ComputeResizeFilter, there is some rounding from floating point going on that may be playing into this.

I tweaked the algorithm to support the case where we already have more rows buffered in our window than the next filter requires. Before we would underflow mWindow when performing the row pointer swaps, and with this patch we will not perform the swap at all (because we have all of the data we need) and continue looping on DownscaleInputLine/Row.

Additionally, if by some happenstance mLinesInBuffer/mRowsInWindow exceeds the window capacity (it should never, by definition) when we recalculate it after a vertical convolution, I limit it to mWindowCapacity in the calculations, similar to how we prevent it from going negative (which it should also never). This should prevent any overflows during swapping. (By definition mWindowCapacity >= filterLength, so I didn't check for that again. The filter lengths are fixed at the time of calling ComputeResizeFilter.)

Finally, I release assert that the index is less than mWindowCapacity as a final failsafe. If we end hitting that release assert, then we know something else went wrong, but it is a safe crash.
Flags: needinfo?(tnikkel)
Flags: needinfo?(milan)
Flags: needinfo?(aosmond)
Attachment #8962427 - Flags: review?(tnikkel)
Crash Signature: [@ sse2::convolve_horizontally] → [@ sse2::convolve_horizontally] [@ sse2::ConvolveVertically<T>]
Comment on attachment 8962427 [details] [diff] [review]
0001-Bug-1409440.patch, v1

Review of attachment 8962427 [details] [diff] [review]:
-----------------------------------------------------------------

Nice catch to all of this.

::: image/Downscaler.cpp
@@ +201,5 @@
>  
>      MOZ_ASSERT(mCurrentOutLine < mTargetSize.height,
>                 "Writing past end of output");
>  
> +    while (mLinesInBuffer >= filterLength) {

I could definitely see use hitting this condition.
Attachment #8962427 - Flags: review?(tnikkel) → review+
https://hg.mozilla.org/mozilla-central/rev/52c14d32d981

Judging by the crash frequency on Nightly, we may want to just proceed with the Beta uplift request before formal verification to get a better sense of whether there's more to be done here or not. It grafts cleanly as-landed, so go ahead and request approval when you're comfortable doing so.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(aosmond)
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Comment on attachment 8962427 [details] [diff] [review]
0001-Bug-1409440.patch, v1

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: In rare cases it may overwrite memory elsewhere in the process while downscaling a JPEG during decoding. It may be possible to craft a JPEG / website to cause this consistently if you understand the mechanism.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: It is in nightly, but the low reproduction rate in the wild and no STR make verification difficult. Uplifting to beta will help confirm/deny the fix.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: The common case is unchanged. The patch only affects edge cases which are currently already broken and causing overwrites/crashes. 
[String changes made/needed]: None.
Flags: needinfo?(aosmond)
Attachment #8962427 - Flags: approval-mozilla-beta?
Comment on attachment 8962427 [details] [diff] [review]
0001-Bug-1409440.patch, v1

Approved for 60.0b8.
Attachment #8962427 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Group: gfx-core-security → core-security-release
Comment on attachment 8962427 [details] [diff] [review]
0001-Bug-1409440.patch, v1

It is only now that I am realizing I missed the sec-approval... Mea culpa x100 :(. It is already in mozilla-central.

[Security approval request comment]
> How easily could an exploit be constructed based on the patch?

The theory of what to do (craft an image of some size, and request downscale to a certain size) is very intuitive from the patch. What that combination is not clear to me.

> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

The changes themselves are sufficient to indicate we overflowed.

> Which older supported branches are affected by this flaw?

I believe the issue was introduced in 38, and crash reports in the last few months suggest at most 40. So all branches.

> If not all supported branches, which bug introduced the flaw?

Bug 1045929.

> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

This code has not evolved much, and the patch applies to release and beta cleanly. It applies to esr52 *almost* cleanly; I will attach a patch for it forthwith. The changes were just to help the merge along, very small.

> How likely is this patch to cause regressions; how much testing does it need?

I believe it is unlikely to cause regressions. It is well contained, and should only change behaviour when it was already broken and causing the issue at hand.
Attachment #8962427 - Flags: sec-approval?
(In reply to Andrew Osmond [:aosmond] from comment #17)
> Comment on attachment 8962427 [details] [diff] [review]
> 0001-Bug-1409440.patch, v1
> 
> It is only now that I am realizing I missed the sec-approval... Mea culpa
> x100 :(. It is already in mozilla-central.

Sigh.

sec-approval+ after the fact. At least you didn't check in tests.

We need an ESR52 patch made and nominated?
Attachment #8962427 - Flags: sec-approval? → sec-approval+
Comment on attachment 8963283 [details] [diff] [review]
[esr52] 0001-Bug-1409440.patch, v1 [carries r=tnikkel]

Approved for ESR 52.8.
Attachment #8963283 - Flags: approval-mozilla-esr52+
Looked at the crash reports on nightly since it landed, but I suspect it is unrelated. The pointer looks fine / aligned, and SIGBUS could easily just be Linux's overcommit was too aggressive. It was for a PNG, and assuming it has has alpha, then the first time we write to a downscaled PNG would be in the downscaler path:

https://crash-stats.mozilla.com/report/index/ccbcf90d-a08e-4f69-bc72-e6d730180330

The same install received a similar SIGBUS crash here:

https://crash-stats.mozilla.com/report/index/e96c964b-e1f1-4803-bb10-23bb50180330

This time for a JPEG, but it was when creating the frame, not in the downscaler. We memset for JPEGs since they lack alpha, and so the first time we write is sooner.

I also note the user has WebRender enabled which has a higher memory footprint at this time.
Bug 1353620 was for skia::ConvolveHorizonally, and that signature seems to end with FF 55
Whiteboard: [gfx-noted] → [gfx-noted][adv-main60+][adv-esr52.8+]
Flags: qe-verify-
Whiteboard: [gfx-noted][adv-main60+][adv-esr52.8+] → [gfx-noted][adv-main60+][adv-esr52.8+][post-critsmash-triage]
Group: core-security-release

It is hard for me to tell if the remaining volume is just noise or a different problem. We release assert for the original problem, so I'm fairly confident we are not hitting that anymore.

Flags: needinfo?(aosmond)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: