Closed Bug 1409440 Opened 7 years ago Closed 6 years ago

Crash in sse2::convolve_horizontally

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla61

Tracking Flags:

Tracking

Status

firefox-esr52

60+

fixed

firefox56

---

wontfix

firefox57

---

wontfix

firefox58

---

wontfix

firefox59

---

wontfix

firefox60

fixed

firefox61

fixed

People

(Reporter: jesup, Assigned: aosmond)

References

Details

(5 keywords, Whiteboard: [gfx-noted][adv-main60+][adv-esr52.8+][post-critsmash-triage])

Crash Data

Attachments

(2 files)

0001-Bug-1409440.patch, v1 6 years ago Andrew Osmond [:aosmond] (he/him) 4.44 KB, patch	tnikkel : review+ RyanVM : approval-mozilla-beta+ abillings : sec-approval+	Details \| Diff \| Splinter Review
[esr52] 0001-Bug-1409440.patch, v1 [carries r=tnikkel] 6 years ago Andrew Osmond [:aosmond] (he/him) 3.51 KB, patch	aosmond : review+ RyanVM : approval-mozilla-esr52+	Details \| Diff \| Splinter Review

Randell Jesup [:jesup] (needinfo me)

Reporter

Description

•

7 years ago

This bug was filed from the Socorro interface and is 
report bp-7b0725ed-cf5d-4bf6-b5e2-555e90171016.
=============================================================

UAF/wildptr crashes starting in 56 (56b1 seems to be the earliest crash recorded).  Writes and reads.

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: core-security → gfx-core-security

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 1

•

7 years ago

-> milan for triage

Note: this is in JPEG decoding, trying to write to the bitmap generally.  It appears something is letting the buffer go away

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

7 years ago

Flags: needinfo?(milan) → needinfo?(tnikkel)

Priority: -- → P3

Whiteboard: [gfx-noted]

Panos Astithas (he/him) [:past] (please ni?)

Updated

•

7 years ago

status-firefox56: affected → wontfix

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Has Regression Range: --- → yes

Has STR: --- → no

Keywords: testcase-wanted

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 2

•

7 years ago

STR will be tough; it smells like a timing issue (perhaps with navigation?)

Timothy Nikkel (:tnikkel)

Comment 3

•

7 years ago

Is this any different then bug 1353620?

Flags: needinfo?(tnikkel)

Timothy Nikkel (:tnikkel)

Updated

•

7 years ago

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 4

•

7 years ago

maybe not - the one I looked at there was called from PNG decode; but I suspect this is a weakness in all the decoder paths (buffer gets yanked from underneath the decoder).

It also might be a signature change starting in 56

Flags: needinfo?(rjesup)

Wennie

Comment 5

•

7 years ago

Hi Milan:

I have assigned these security bugs to you to reassign them to appropriate developers in your team to investigate and fix them.

Thanks!

Wennie

Assignee: nobody → milan

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

7 years ago

Flags: needinfo?(milan)

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox57: affected → wontfix

status-firefox58: affected → wontfix

status-firefox59: --- → affected

status-firefox60: --- → affected

Milan Sreckovic [:milan] (needinfo for best results)

Comment 6

•

6 years ago

Look if this is still around after we do a Skia update in 60.

Assignee: milan → lsalzman

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox59: affected → wontfix

status-firefox61: --- → affected

Lee Salzman [:lsalzman]

Comment 7

•

6 years ago

Looking at the crash report, you can clearly see this is the imagelib code sending a deallocated address into the Skia convolver code, rather than the Skia convolver code in particular doing anything wrong. It looks like mWindow itself is no longer valid, or mLinesInBuffer is overreading it into a currently freed section of the heap, so that when mWindow[mLinesInBuffer++] yields the bogus address in question (0xFFFFE5E5) to the convolver code, it goes boom.

I'm going to punt this over to Andrew and Timothy to look into.

Assignee: lsalzman → aosmond

Flags: needinfo?(tnikkel)

Flags: needinfo?(aosmond)

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 8

•

6 years ago

I can see the mLinesInBuffer is not initialized in Downscaler::Downscaler called in Decoder::PostSize but it should be done by Downscaler::BeginFrame which, from what I can tell, is always called before we start using mDownscaler for anything.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 9

•

6 years ago

What seems more plausible to me is that mLinesInBuffer++ overflowed mWindow. Since mWindow is a UniquePtr of a pointer array instead of an nsTArray, it won't crash immediately if we overflow.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 10

•

6 years ago

This happens with other decoders too that use the DownscalingFilter / SurfacePipe instead, e.g. PNG decoder:

https://crash-stats.mozilla.com/report/index/c202fb4a-3238-4e1a-b67a-c43af0180319

The relevant logic is more-or-less identical, so if there was a problem in Downscaler::mLinesInBuffer itself or the filter methods it derives the value from, this corroborates that.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 11

•

6 years ago

There is another crash variant for convolving vertically, e.g.:

https://crash-stats.mozilla.com/report/index/6f9e6f11-d155-44e0-944d-038080180321

Going through minidumps makes me question whether or not we actually overflowed mWindow in Downscaler::CommitRow. It should not be possible to overflow mWindow via mYFilter.ConvolveVertically, since it should be larger than the biggest filter, but we still get 0xe5e5e5e5.

In Downscaler::DownscaleInputLine we do some row swapping, and we don't do any bounds checking here. So in theory it could overflow mWindow, swap in 0xe5e5e5e5, but not crash there since it doesn't dereference it. Then we trip it on either the vertical or horizontal convolve. That said, I haven't manage to poke a hole in the swapping logic yet.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 12

•

6 years ago

Attached patch 0001-Bug-1409440.patch, v1 — Details — Splinter Review

I have not reproduced myself, so while it is an educated guess at a fix, this is still speculative based on the crash reports. I cannot conceive of an image original size and downscale size that will create the necessary conditions, but when we create the filters in ConvolutionFilter::ComputeResizeFilter, there is some rounding from floating point going on that may be playing into this.

I tweaked the algorithm to support the case where we already have more rows buffered in our window than the next filter requires. Before we would underflow mWindow when performing the row pointer swaps, and with this patch we will not perform the swap at all (because we have all of the data we need) and continue looping on DownscaleInputLine/Row.

Additionally, if by some happenstance mLinesInBuffer/mRowsInWindow exceeds the window capacity (it should never, by definition) when we recalculate it after a vertical convolution, I limit it to mWindowCapacity in the calculations, similar to how we prevent it from going negative (which it should also never). This should prevent any overflows during swapping. (By definition mWindowCapacity >= filterLength, so I didn't check for that again. The filter lengths are fixed at the time of calling ComputeResizeFilter.)

Finally, I release assert that the index is less than mWindowCapacity as a final failsafe. If we end hitting that release assert, then we know something else went wrong, but it is a safe crash.

Flags: needinfo?(tnikkel)

Flags: needinfo?(milan)

Flags: needinfo?(aosmond)

Attachment #8962427 - Flags: review?(tnikkel)

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

6 years ago

Crash Signature: [@ sse2::convolve_horizontally] → [@ sse2::convolve_horizontally] [@ sse2::ConvolveVertically<T>]

Timothy Nikkel (:tnikkel)

Comment 13

•

6 years ago

Comment on attachment 8962427 [details] [diff] [review]
0001-Bug-1409440.patch, v1

Review of attachment 8962427 [details] [diff] [review]:
-----------------------------------------------------------------

Nice catch to all of this.

::: image/Downscaler.cpp
@@ +201,5 @@
>  
>      MOZ_ASSERT(mCurrentOutLine < mTargetSize.height,
>                 "Writing past end of output");
>  
> +    while (mLinesInBuffer >= filterLength) {

I could definitely see use hitting this condition.

Attachment #8962427 - Flags: review?(tnikkel) → review+

Ryan VanderMeulen [:RyanVM]

Comment 14

•

6 years ago

https://hg.mozilla.org/mozilla-central/rev/52c14d32d981

Judging by the crash frequency on Nightly, we may want to just proceed with the Beta uplift request before formal verification to get a better sense of whether there's more to be done here or not. It grafts cleanly as-landed, so go ahead and request approval when you're comfortable doing so.

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox61: affected → fixed

Flags: needinfo?(aosmond)

Resolution: --- → FIXED

Target Milestone: --- → mozilla61

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 15

•

6 years ago

Comment on attachment 8962427 [details] [diff] [review]
0001-Bug-1409440.patch, v1

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: In rare cases it may overwrite memory elsewhere in the process while downscaling a JPEG during decoding. It may be possible to craft a JPEG / website to cause this consistently if you understand the mechanism.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: It is in nightly, but the low reproduction rate in the wild and no STR make verification difficult. Uplifting to beta will help confirm/deny the fix.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: The common case is unchanged. The patch only affects edge cases which are currently already broken and causing overwrites/crashes. 
[String changes made/needed]: None.

Flags: needinfo?(aosmond)

Attachment #8962427 - Flags: approval-mozilla-beta?

Ryan VanderMeulen [:RyanVM]

Comment 16

•

6 years ago

Comment on attachment 8962427 [details] [diff] [review]
0001-Bug-1409440.patch, v1

Approved for 60.0b8.

Attachment #8962427 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: gfx-core-security → core-security-release

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 17

•

6 years ago

Comment on attachment 8962427 [details] [diff] [review]
0001-Bug-1409440.patch, v1

It is only now that I am realizing I missed the sec-approval... Mea culpa x100 :(. It is already in mozilla-central.

[Security approval request comment]
> How easily could an exploit be constructed based on the patch?

The theory of what to do (craft an image of some size, and request downscale to a certain size) is very intuitive from the patch. What that combination is not clear to me.

> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

The changes themselves are sufficient to indicate we overflowed.

> Which older supported branches are affected by this flaw?

I believe the issue was introduced in 38, and crash reports in the last few months suggest at most 40. So all branches.

> If not all supported branches, which bug introduced the flaw?

Bug 1045929.

> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

This code has not evolved much, and the patch applies to release and beta cleanly. It applies to esr52 *almost* cleanly; I will attach a patch for it forthwith. The changes were just to help the merge along, very small.

> How likely is this patch to cause regressions; how much testing does it need?

I believe it is unlikely to cause regressions. It is well contained, and should only change behaviour when it was already broken and causing the issue at hand.

Attachment #8962427 - Flags: sec-approval?

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 18

•

6 years ago

Attached patch [esr52] 0001-Bug-1409440.patch, v1 [carries r=tnikkel] — Details — Splinter Review

Attachment #8963283 - Flags: review+

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

6 years ago

status-firefox-esr52: unaffected → affected

Ryan VanderMeulen [:RyanVM]

Comment 19

•

6 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/e724c32284b2

status-firefox60: affected → fixed

tracking-firefox60: --- → +

tracking-firefox61: --- → +

tracking-firefox-esr52: --- → 60+

Al Billings [:abillings - ex-MoCo]

Comment 20

•

6 years ago

(In reply to Andrew Osmond [:aosmond] from comment #17)
> Comment on attachment 8962427 [details] [diff] [review]
> 0001-Bug-1409440.patch, v1
> 
> It is only now that I am realizing I missed the sec-approval... Mea culpa
> x100 :(. It is already in mozilla-central.

Sigh.

sec-approval+ after the fact. At least you didn't check in tests.

We need an ESR52 patch made and nominated?

Al Billings [:abillings - ex-MoCo]

Updated

•

6 years ago

Attachment #8962427 - Flags: sec-approval? → sec-approval+

Ryan VanderMeulen [:RyanVM]

Comment 21

•

6 years ago

Comment on attachment 8963283 [details] [diff] [review]
[esr52] 0001-Bug-1409440.patch, v1 [carries r=tnikkel]

Approved for ESR 52.8.

Attachment #8963283 - Flags: approval-mozilla-esr52+

Ryan VanderMeulen [:RyanVM]

Comment 22

•

6 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr52/rev/2f3e1ccf1661

status-firefox-esr52: affected → fixed

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 23

•

6 years ago

Looked at the crash reports on nightly since it landed, but I suspect it is unrelated. The pointer looks fine / aligned, and SIGBUS could easily just be Linux's overcommit was too aggressive. It was for a PNG, and assuming it has has alpha, then the first time we write to a downscaled PNG would be in the downscaler path:

https://crash-stats.mozilla.com/report/index/ccbcf90d-a08e-4f69-bc72-e6d730180330

The same install received a similar SIGBUS crash here:

https://crash-stats.mozilla.com/report/index/e96c964b-e1f1-4803-bb10-23bb50180330

This time for a JPEG, but it was when creating the frame, not in the downscaler. We memset for JPEGs since they lack alpha, and so the first time we write is sooner.

I also note the user has WebRender enabled which has a higher memory footprint at this time.

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 25

•

6 years ago

Bug 1353620 was for skia::ConvolveHorizonally, and that signature seems to end with FF 55

Al Billings [:abillings - ex-MoCo]

Updated

•

6 years ago

Whiteboard: [gfx-noted] → [gfx-noted][adv-main60+][adv-esr52.8+]

Andrei Vaida [:avaida]

Updated

•

6 years ago

Flags: qe-verify-

Whiteboard: [gfx-noted][adv-main60+][adv-esr52.8+] → [gfx-noted][adv-main60+][adv-esr52.8+][post-critsmash-triage]

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 26

•

6 years ago

andrew - I see a number of 60b crashes after this landed, such as https://crash-stats.mozilla.com/report/index/81e34421-e6e3-4983-a4c5-58f060180426 with wildptr addresses.  The e5e5 crashes do seem to have gone away (so far, volume is low).

https://crash-stats.mozilla.com/signature/?signature=sse2%3A%3Aconvolve_horizontally&date=%3E%3D2018-04-08T08%3A12%3A59.000Z&date=%3C2018-05-08T08%3A12%3A59.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-version&_sort=-date&page=1

There are a bunch of 61 (only) ConvolveVertically<> crashes with SIGBUS; these seem to have started when your patch landed.  (There are 2 before your patch in total, and 24 since it landed) - if it's not new, it spiked on your landing.

https://crash-stats.mozilla.com/signature/?platform=Linux&signature=sse2%3A%3AConvolveVertically%3CT%3E&date=%3E%3D2018-02-08T06%3A10%3A01.000Z&date=%3C2018-05-08T08%3A10%3A01.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1

Flags: needinfo?(aosmond)

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: core-security-release

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 27

•

4 years ago

It is hard for me to tell if the remaining volume is just noise or a different problem. We release assert for the original problem, so I'm fairly confident we are not hitting that anymore.

Flags: needinfo?(aosmond)

You need to log in before you can comment on or make changes to this bug.