Closed Bug 1235605 Opened 8 years ago Closed 8 years ago

Integer overflow in Deinterlacer::Deinterlacer leading to OOM crash

Categories

(Core :: Graphics: ImageLib, defect)

43 Branch
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla46
Tracking Status
firefox42 --- unaffected
firefox43 + wontfix
firefox44 + fixed
firefox45 + fixed
firefox46 + fixed
firefox-esr38 --- unaffected
firefox-esr45 - fixed
b2g-v2.0 --- unaffected
b2g-v2.0M --- unaffected
b2g-v2.1 --- unaffected
b2g-v2.1S --- unaffected
b2g-v2.2 --- unaffected
b2g-v2.5 --- fixed
b2g-v2.2r --- unaffected
b2g-master --- fixed

People

(Reporter: arroway, Assigned: eflores)

References

Details

(4 keywords, Whiteboard: [dupe of 1231761][adv-main44+])

Crash Data

Attachments

(5 files)

Seth, it seems to be image-related. Does it ring a bell? Thanks
Flags: needinfo?(seth)
Having a security rating would be great!
Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::detail::UniqueSelector<T>::UnknownBound mozilla::MakeUnique<T>]
Keywords: crash
Summary: Crash in Firefox 43 → crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::detail::UniqueSelector<T>::UnknownBound mozilla::MakeUnique<T>
Al, Andrew, can either of you have a look if you're around? People at 32C3 are tweeting about it as a Firefox 0-day.
Flags: needinfo?(continuation)
Flags: needinfo?(abillings)
Flags: needinfo?(abillings)
Attached file poc.html
Attached the PoC; it's a lot easier than following the link from the tweet mentioned in comment #0.

STR:

- simply open the poc.html file in fx (you'll get an instant crash)

Tried it on fx 46.0a1 [Build ID: 20151228030213 Changeset: 7c83da46ea74]

- https://crash-stats.mozilla.com/report/index/8d45ef79-e48f-4f2d-8e66-7053f2151229
- https://crash-stats.mozilla.com/report/index/4a7b5dfc-0662-4826-8139-586652151229
- https://crash-stats.mozilla.com/report/index/e567d95e-922c-49b9-912f-c65c22151229
The crash signatures in comment 0 and comment 4 are totally safe OOMs. However, we have had a number of recent issues with GIF images. It is possible this is a related unfixed issue, or maybe this is something that has been fixed on trunk already. Somebody should try this in an ASan build and see if it shows up as something other than an OOM.
Component: Security → ImageLib
(In reply to Andrew McCreight [:mccr8] (PTO-ish) from comment #5)
> Somebody should try this in an ASan build and see if it shows up as something other than an OOM.

Also, trying the test case in a debug build would be useful, as we do have some additional checks there.
Flags: needinfo?(continuation)
I had Gerv do a quick look (because I trust his opinions even if this isn't his area). He said to me in IRC:

If I had to guess, I'd say the GIF has bogusly massive values for either the logical screen width/height or the image width/height and we are simply mallocing an array big enough and OOMing.

To verify, one would need to de-base64 the GIF, open it in a hex editor, and start reading the spec and decoding the fields. The spec isn't complex: http://www.w3.org/Graphics/GIF/spec-gif89a.txt

http://hg.mozilla.org/mozilla-central/annotate/7c83da46ea74/image/Deinterlacer.cpp#l18
allocates width * height * sizeof(uint32_t) bytes

I'd be very suspicious of where it got those values given that the GIF format has fields for them. I bet they are untrusted.
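For reference, here is a minimal sketch (not ImageLib code) of where those fields live in a GIF file: the logical screen width and height are little-endian 16-bit values immediately following the 6-byte signature, so anything up to 65535 can be encoded there. The header bytes below are hypothetical, chosen to match the dimensions that turn up later in this bug.

  #include <cstdint>
  #include <cstdio>

  int main() {
    // Hypothetical first 10 bytes of a GIF with a 65534x65531 logical screen.
    const uint8_t header[] = { 'G', 'I', 'F', '8', '9', 'a',
                               0xFE, 0xFF,    // logical screen width  = 65534
                               0xFB, 0xFF };  // logical screen height = 65531
    unsigned width  = header[6] | (header[7] << 8);
    unsigned height = header[8] | (header[9] << 8);
    std::printf("logical screen: %u x %u\n", width, height);
    return 0;
  }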
Nathan, Edwin, maybe you can look at this if Seth is on PTO right now.
Edwin has been looking at some similar issues.
Flags: needinfo?(edwin)
Bug 1229825 is an unfixed issue in related code.
Doesn't crash esr38 or Fx42. I'm betting on this being a regression from bug 1194058, but will mozregression it to confirm. I'm also running a local build with the patch from bug 1229825 applied to see if that helps.
Attached file oom-asan.txt
I was asked to run this crashtest with an ASan build on Linux; this is the result. It looks to me like a typical OOM crash, which we see a lot of with our cluster.
Looks like the same one Christoph attached in comment #13. Here's the trace, symbolized:

==2738==WARNING: AddressSanitizer failed to allocate 0xffffffffffe40028 bytes
==2738==AddressSanitizer's allocator is terminating the process instead of returning 0
==2738==If you don't like this behavior set allocator_may_return_null=1
==2738==AddressSanitizer CHECK failed: /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:149 "((0)) != (0)" (0x0, 0x0)
    #0 0x47b6eb in AsanCheckFailed _asan_rtl_
    #1 0x481ca1 in CheckFailed sanitizer_common.cc:76
    #2 0x480940 in AllocatorReturnNull sanitizer_allocator.cc:149
    #3 0x475128 in __interceptor_malloc _asan_rtl_
    #4 0x48dd5d in moz_xmalloc mozalloc.cpp:83
    #5 0x7ff9b6997204 in operator new[] mozalloc.h:198
    #6 0x7ff9b69f3346 in emplace<const mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> &> Maybe.h:386
    #7 0x7ff9b6993ad2 in Write Decoder.cpp:183
    #8 0x7ff9b6991d8c in Decode Decoder.cpp:128
    #9 0x7ff9b69917b2 in Decode DecodePool.cpp:453
    #10 0x7ff9b69b039c in Run DecodePool.cpp:282
    #11 0x7ff9b49d3f0f in ProcessNextEvent nsThread.cpp:972
    #12 0x7ff9b4a4d2ea in NS_ProcessNextEvent nsThreadUtils.cpp:297
    #13 0x7ff9b52f983f in Run MessagePump.cpp:326
    #14 0x7ff9b5266e8c in RunInternal message_loop.cc:234
    #15 0x7ff9b49cfc20 in ThreadFunc nsThread.cpp:384
    #16 0x7ff9c1f724b5 in _pt_root ptthread.c:212
    #17 0x7ff9c25b1181 in start_thread pthread_create.c:312 (discriminator 2)
    #18 0x7ff9b241947c in clone clone.S:111
I tried a local debug build and it basically looks the same as the opt crash signatures.
This may have been previously reported as bug 1231761, which we unhid because it looked like just an OOM. At least, it has the same OOM value, and the same stack.

While this particular crash doesn't look bad, there's definitely something bad going on in this test case.

This is where the crash is in Deinterlacer::Deinterlacer:
  , mBuffer(MakeUnique<uint8_t[]>(mImageSize.width *
                                  mImageSize.height *
                                  sizeof(uint32_t)))

The width is 65534 and the height is 65531, and sizeof(uint32_t) is 4. The size that this is OOMing on is 18446744073707716648. If I multiply all three together using the OS X calculator, I get 17178034216, so something is clearly going wrong.

It looks like multiplication in C++ is right-to-left associative, so it sounds like this should be calculated as (mImageSize.width * (mImageSize.height * sizeof(uint32_t))), but the result it is giving seems like it is doing (mImageSize.width * (mImageSize.height * sizeof(uint32_t))). I'm not too familiar with integer promotion rules; it all depends on when the result is converted to size_t.

Anyways, if you change the above to
  , mBuffer(MakeUnique<uint8_t[]>(mImageSize.width *
                                  (mImageSize.height *
                                   sizeof(uint32_t))))

then it at least doesn't OOM crash any more. I'm not sure if that is the right fix.
I'm not sure exactly how exploitable some variation of this might be. In this particular case, it looks like we end up requesting a far larger size than what is expected. What you'd want for an exploit would be something that overflows in a way that we end up with a size smaller than what is expected. Some other later code might then end up writing off the end of mBuffer, which would be a buffer overflow. I'm not sure if that is possible or not.
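To make the concern concrete, here is a purely hypothetical calculation, assuming a 32-bit build (where size_t is 32 bits) and ignoring any other size checks the decoders perform; signed overflow is technically undefined behavior, but common compilers wrap as shown. It only illustrates the arithmetic, not a known reachable path.

  #include <cstdint>
  #include <cstdio>

  int main() {
    int32_t width = 46341, height = 46341;    // both fit in GIF's 16-bit fields
    int32_t pixels = width * height;          // 2147488281 wraps to -2147479015
    uint32_t bytes = static_cast<uint32_t>(pixels) * 4u;  // 32-bit size_t: 18532
    long long expected = 4LL * width * height;             // 8589953124 (~8 GiB)
    std::printf("requested %u bytes; the image needs %lld\n", bytes, expected);
    return 0;
  }

With numbers like these, later code that trusts width and height could write far past the end of an 18 KB buffer.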
> but the result it is giving seems like it is doing (mImageSize.width * (mImageSize.height * sizeof(uint32_t))))

Sorry, that should be: ((mImageSize.width * mImageSize.height) * sizeof(uint32_t))
Sounds like we just need to use a CheckedInt<> and make the allocation fallible. Can put a patch up in the morning. Leaving ni?me so I don't forget.
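Roughly, the shape of that fix might look like the sketch below. This is not the attached patch; the constructor signature is assumed, and CheckedInt (mozilla/CheckedInt.h) plus MakeUniqueFallible (mozilla/UniquePtrExtensions.h) are the helpers being referred to.

  #include "mozilla/CheckedInt.h"
  #include "mozilla/UniquePtrExtensions.h"

  Deinterlacer::Deinterlacer(const nsIntSize& aImageSize)
    : mImageSize(aImageSize)
  {
    // Compute the buffer size with overflow tracking instead of raw int math.
    mozilla::CheckedInt<size_t> bufferSize = mImageSize.width;
    bufferSize *= mImageSize.height;
    bufferSize *= sizeof(uint32_t);

    if (!bufferSize.isValid()) {
      return;  // leave mBuffer null; callers treat that as a decode failure
    }

    // Fallible allocation: returns nullptr instead of aborting on OOM.
    mBuffer = mozilla::MakeUniqueFallible<uint8_t[]>(bufferSize.value());
  }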
18446744073707716648 is a little under 2^64 (1834968 less).

Gerv
Hello everyone,

I created the reported jsbin and posted it on Reddit. It is just the same file from bug 1231761, encoded as base64 to avoid loading an extra file (there is nothing malicious; you can verify it is exactly the same file). Sorry if I created panic about it! I asked at least twice whether the bug was security-related and got a negative answer.
Looks like the problem here is mostly due to mImageSize.{width,height} being signed 32 bit ints.

65534 * 65531 * 4 truncated to 32 bits is 0xffe40028 which is negative. It looks like it keeps the sign when it's expanded to 64 bits which makes it 0xffffffffffe40028, and *then* interpreted as unsigned.
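That sequence can be reproduced outside of ImageLib. The following sketch assumes a typical 64-bit platform where int is 32 bits and size_t is 64 bits; signed overflow is undefined behavior, but common compilers wrap, which yields exactly the value ASan refused to allocate.

  #include <cstdint>
  #include <cstdio>

  int main() {
    int32_t width = 65534, height = 65531;     // dimensions from the PoC GIF
    int32_t pixels = width * height;           // wraps to -458742 (0xfff9000a)
    size_t bytes = pixels * sizeof(uint32_t);  // sign-extended to 64 bits, then * 4
    std::printf("pixels = %d, bytes = 0x%zx\n", pixels, bytes);
    // Prints: pixels = -458742, bytes = 0xffffffffffe40028
    // The intended size was 65534 * 65531 * 4 = 17178034216 bytes (~16 GiB).
    return 0;
  }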
Assignee: nobody → edwin
Status: NEW → ASSIGNED
Attached patch 1235605.patch
Flags: needinfo?(seth)
Flags: needinfo?(edwin)
Attachment #8702890 - Flags: review?(tnikkel)
So, dupe of bug 1231761?
Whiteboard: [dupe of 1231761]
Attachment #8702890 - Flags: review?(tnikkel) → review+
This doesn't sound like a driver for a dot release; wontfixing for 43.
Flags: in-testsuite+
Adding a sec keyword for completeness' sake.
Keywords: sec-moderate
You know, this should have had a rating *before* checkin...
Comment on attachment 8702890 [details] [diff] [review]
1235605.patch

Applies cleanly on Aurora.

Approval Request Comment
[Feature/regressing bug #]: bug 1194058
[User impact if declined]: Crashing on certain GIFs.
[Describe test coverage new/current, TreeHerder]: Added crashtest.
[Risks and why]: Very little. Patch just adds more sanity checks.
[String/UUID change made/needed]: None.
Attachment #8702890 - Flags: approval-mozilla-aurora?
See comment 33.
Attachment #8703118 - Flags: review+
Attachment #8703118 - Flags: approval-mozilla-beta?
This is only a sec-moderate, but the example was publicly visible to various security researchers, and we can't be 100% sure that this buggy code might be exploitable in some other more severe way, so I agree with Edwin that we should fix this on Aurora and Beta.
Attachment #8702890 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8703118 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
FWIW, with a Nightly build from today on Fedora 23 with 16 GB of RAM, loading the URL from bug 1235826 caused my workstation to lock up (possibly due to all of the RAM being consumed) with:

> kernel: perf interrupt took too long (5004 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
> systemd-coredump[5808]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.27786736c13843bab0c98fea27d00897.798.1451584854000000.xz.
> systemd-coredump: Process 1100 (systemd-logind) of user 0 dumped core.#012#012Stack trace of thread 1100:#012#0  0x00007f4de6ece073 epoll_wait (libc.so.6)#012#1  0x000055711d353eb3 sd_event_wait (systemd-logind)#012#2  0x000055711d2fd010 main (systemd-logind)#012#3  0x00007f4de6deb580 __libc_start_main (libc.so.6)#012#4  0x000055711d2fd969 _start (systemd-logind)
> audit: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:syslogd_t:s0 pid=798 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" sig=6

After some minutes I was able to log in via ssh, but the display did not reappear and I had to reboot via ssh. RyanVM could not reproduce this hang on Ubuntu 15 with 8 GB. This is much worse than a simple OOM crash.
(In reply to Bob Clary [:bc:] from comment #36)
> fwiw, with a Nightly build from today on Fedora 23 with 16G, loading the url
> from bug 1235826 caused my workstation to lock up (possibly due to all of
> the ram being consumed) with 

Does the same thing happen if all you do is try to allocate a ridiculously huge amount of memory (18446744073707716648 from comment 20)? Your machine could just be trying to fulfill that request.
Target Milestone: --- → mozilla46
Summary: crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::detail::UniqueSelector<T>::UnknownBound mozilla::MakeUnique<T> → Integer overflow in Deinterlacer::Deinterlacer leading to OOM crash
Group: core-security → core-security-release
(In reply to Wes Kocher (:KWierso) (On vacation until Dec 28) from comment #40)
> I had to back this out from beta for build bustage:
> https://treeherder.mozilla.org/logviewer.html#?job_id=709617&repo=mozilla-beta
> 
> https://hg.mozilla.org/releases/mozilla-beta/rev/a602dbc1a620

Looks like we just need to uplift UniquePtrExtensions.h from bug 1216611. I'll post a minimal beta uplift patch in that bug.
Flags: needinfo?(edwin)
I can still reproduce this with nightly ASan builds.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
What is the stack or failure exactly?
Attached file call_stack.txt
(In reply to Timothy Nikkel (:tnikkel) from comment #48)
> What is the stack or failure exactly?

Attached.
That looks like it is failing to allocate... a specifically fallible allocation, which the deinterlacer and GIF decoder should be able to handle after this patch. I don't know why ASan is complaining, but the current code should handle the failure of that allocation fine.
Group: gfx-core-security
Has this been fixed, or is this something different and we should create a new bug? I can reproduce the same crash Tyson reported in comment #49 using an ASan build [1].

[1] http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-linux64-asan/1452596547/

==3574==AddressSanitizer CHECK failed: /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_posix.cc:68 "(("unable to mmap" && 0)) != (0)" (0x0, 0x0)
    #0 0x47b70b in AsanCheckFailed _asan_rtl_
    #1 0x481cc1 in CheckFailed sanitizer_common.cc:76
    #2 0x485c9e in RoundUpTo sanitizer_common.h:268
    #3 0x43f438 in Allocate sanitizer_allocator.h:1011
    #4 0x43b2f1 in Allocate sanitizer_allocator.h:1253
    #5 0x475148 in __interceptor_malloc _asan_rtl_
    #6 0x7fdf427eb872 in operator new[] mozalloc.h:260
    #7 0x7fdf4284b335 in FrameSize Maybe.h:386
    #8 0x7fdf427e80e2 in Write Decoder.cpp:183
    #9 0x7fdf427e639c in Decode Decoder.cpp:128
    #10 0x7fdf427e5dc2 in Decode DecodePool.cpp:455
    #11 0x7fdf4280507c in Run DecodePool.cpp:281
    #12 0x7fdf406a8984 in ProcessNextEvent nsThread.cpp:989
    #13 0x7fdf4072254a in NS_ProcessNextEvent nsThreadUtils.cpp:297
    #14 0x7fdf4105395f in Run MessagePump.cpp:326
    #15 0x7fdf40fbfedc in RunInternal message_loop.cc:234
    #16 0x7fdf406a44d0 in ThreadFunc nsThread.cpp:401
    #17 0x7fdf4e3044b5 in _pt_root ptthread.c:212
    #18 0x7fdf4e943181 in start_thread pthread_create.c:312 (discriminator 2)
    #19 0x7fdf3e05747c in clone clone.S:111
(In reply to Kamil Jozwiak [:kjozwiak] from comment #51)
> Has this been fixed or is this something different and we should create a
> new bug? I can reproduce the same crash Tyson reported in comment # 49 using
> an asan build [1].

Is that the same crash? The stack looks different.

Anyway, the crash from comment 49 is reporting a failure to allocate memory. In that stack, the allocation is specifically marked as fallible (meaning the code is written to tolerate a memory allocation failure). So it sounds like a bug in either ASan or our fallible memory allocation code, or in the interaction of the two: allocations specifically marked fallible shouldn't be reported as failures, because the code is written to handle them.
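For context, a small sketch of the distinction being drawn here, assuming Gecko's MakeUnique / MakeUniqueFallible helpers behave as described: the infallible path aborts the process on OOM (that is the mozalloc_abort / moz_xmalloc signature in comment 0), while the fallible path returns null and lets the caller recover. The function name is made up for illustration.

  #include "mozilla/UniquePtr.h"
  #include "mozilla/UniquePtrExtensions.h"

  mozilla::UniquePtr<uint8_t[]> TryAllocateBuffer(size_t aBytes)
  {
    // Infallible form: never returns null; the process aborts on failure.
    //   auto buffer = mozilla::MakeUnique<uint8_t[]>(aBytes);

    // Fallible form: returns nullptr on failure, which the caller must check.
    auto buffer = mozilla::MakeUniqueFallible<uint8_t[]>(aBytes);
    if (!buffer) {
      return nullptr;  // recoverable: the decode simply fails
    }
    return buffer;
  }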
Can a new issue be cloned off of this and this re-resolved as fixed?
Whiteboard: [dupe of 1231761] → [dupe of 1231761][adv-main44+]
(In reply to Al Billings [:abillings] from comment #53)
> Can a new issue be cloned off of this and this re-resolved as fixed?

Yes. I'll leave it to those that see remaining issues to file a new bug.
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Group: gfx-core-security
Group: core-security-release
Keywords: regression
Version: unspecified → 43 Branch