Closed Bug 1263384 Opened 8 years ago Closed 8 years ago

VP8 encoder: Heap block overrun (writing) from copy_and_extend_plane -> memset

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox45 --- wontfix
firefox46 --- wontfix
firefox47 + fixed
firefox48 + fixed
firefox-esr38 --- wontfix
firefox-esr45 47+ fixed

People

(Reporter: jseward, Assigned: jesup, NeedInfo)

References

Details

(Keywords: csectype-bounds, sec-high, Whiteboard: [post-critsmash-triage][adv-main47+][adv-esr45.2+])

Attachments

(3 files, 1 obsolete file)

      No description provided.
STR: 32-bit Linux build, valgrind-enabled (--enable-valgrind --disable-jemalloc)

DISPLAY=:1.0 ./mach mochitest --valgrind-args=--show-mismatched-frees=no \
   --valgrind=/home/sewardj/TEST/valgrind-3.11.0/Inst/bin/valgrind \
   dom/media/tests/mochitest/test_peerConnection_addSecondVideoStream.html 2>&1 \
   | tee spew-03-mc-4

The DISPLAY is a 32-bit deep vncserver, 800x600, with no window manager running.

There seems to be a significant (at least 13 bytes) block overrun with
memset, which trashes the metadata after the block and causes some subsequent
malloc to segfault in the usual way.
Attached file Valgrind complaints
This shows the same errors interleaved with the test output, which
is pretty verbose.  In short this happens after this:

551 INFO Run step 73: PC_REMOTE_SET_LOCAL_DESCRIPTION

It somewhat concerns me that I cannot reproduce the crash when running
natively.  This strikes me as odd, and also it is odd that the ASan
automation runs (presumably?) haven't shown this.  So I am left wondering
if it is some Valgrind weirdness, but I did some cursory sanity checks,
and I can't see anything obviously wrong.  It is at least repeatable;
I repeated it 5 times in 6 runs, I think.
Group: core-security → media-core-security
Likely the same or similar to bug 1252488.

This isn't a use-after-free as we thought that one was/is, but clearly a write-past-end.  The only obvious way I can see that happening is if the input size changed (to bigger), but there's a test in vp8_receive_raw_frame() to reinit the lookahead buffer if the frame size changes.  I was wondering if that might get faked out by a live config change - something we had disabled due to visual anomalies (on frame sizes getting smaller due to load), but had reenabled some time back after a vp8 update.  However, vp8_change_config() appears to reinit the lookahead buffers on a size change.  The test is a little muddy, though, with rounding:

    if (((cm->Width + 15) & 0xfffffff0) !=
          cm->yv12_fb[cm->lst_fb_idx].y_width ||
        ((cm->Height + 15) & 0xfffffff0) !=
          cm->yv12_fb[cm->lst_fb_idx].y_height ||
        cm->yv12_fb[cm->lst_fb_idx].y_width == 0)
    {
        dealloc_raw_frame_buffers(cpi);
        alloc_raw_frame_buffers(cpi);

but vp8_lookahead_init() does something similar:
    /* Align the buffer dimensions */
    width = (width + 15) & ~15;
    height = (height + 15) & ~15;

so I don't think that guess is correct.  We may try re-enabling the full encoder reset as a safety measure though, if there's no solution seen or ability to repro.

jseward: is this reproducible?
Rank: 10
Component: Audio/Video: Playback → WebRTC: Audio/Video
Flags: needinfo?(jseward)
Priority: -- → P1
Summary: Heap block overrun (writing) from copy_and_extend_plane -> memset → VP8 encoder: Heap block overrun (writing) from copy_and_extend_plane -> memset
(In reply to Randell Jesup [:jesup] from comment #4)
> jseward: is this reproducible?

I can reproduce this every time, using a 32-bit build on valgrind.

I cannot reproduce it not-on-valgrind, although I wonder if that has to do
with the slowness pushing it down some other execution path.

I cannot reproduce using a 64-bit build on valgrind.
Flags: needinfo?(jseward)
I did a bit of digging.  Memory is trashed because copy_and_extend_plane
is passed er=-96.  This is handed verbatim as the length argument to memset,
which interprets it unsignedly (as size_t) and hence as the value 4294967200.

The value er=-96 comes from the caller vp8_copy_and_extend_frame, which 
does this

    int er = dst->border + dst->y_width - src->y_width;

with the values

  dst->border 32, dst->y_width 352, src->y_width 480, er -96
(In reply to Julian Seward [:jseward] from comment #6)
> (In reply to Randell Jesup [:jesup] from comment #4)
> > jseward: is this reproducible?
> I cannot reproduce using a 64-bit build on valgrind.

Correction: now I *can* reproduce it on a 64-bit valgrind run.  I still
can't reproduce it with any native (non-valgrind) runs though.  Any ideas
why?  Timing issues?
(In reply to Julian Seward [:jseward] from comment #7)
>   dst->border 32, dst->y_width 352, src->y_width 480, er -96

For both the 32- and 64-bit valgrind runs, the only other value
printed at this point -- and it is printed 729 times before the
crash -- is:

    dst->border 32, dst->y_width 640, src->y_width 640, er 32

clearly giving a valid (non negative) |er| value.
I contemplated making copy_and_extend_plane safe against negative
values for |et|, |el|, |eb| and |er| but this seems complex and I
don't know this code, so I gave up.
(In reply to Julian Seward [:jseward] from comment #6)
> I cannot reproduce it not-on-valgrind, although I wonder if that has to do
> with the slowness pushing it down some other execution path.

Test output (+ vp8_copy_and_extend_frame debug printing) immediately prior
to the failure looks like this.  Any clues there?

550 INFO Run step 73: PC_REMOTE_SET_LOCAL_DESCRIPTION
**32588** VP8caef: dst->border 32, dst->y_width 640, src->y_width 640, er 32
72121344[450c380]: [main|sdp_config] sdp_config.c:86: SDP: Initialized config pointer: 0x4882a20
72121344[450c380]: [1460565391527177 (id=2147483655 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_addSecondVideoSt]: have-remot\
e-offer -> stable
**32588** VP8caef: dst->border 32, dst->y_width 640, src->y_width 640, er 32
72121344[450c380]: [main|WebrtcVideoSessionConduit] VideoConduit.cpp:876: ConfigureRecvMediaCodecs Successfully Set the codec VP8
532532032[189c6a98]: [VideoFrameConverter #1|WebrtcVideoSessionConduit] VideoConduit.cpp:1455: SendVideoFrame Engine not transmitting
**32588** VP8caef: dst->border 32, dst->y_width 640, src->y_width 640, er 32
72121344[450c380]: [main|WebrtcVideoSessionConduit] VideoConduit.cpp:461: Init Initialization Done
72121344[450c380]: [main|WebrtcVideoSessionConduit] VideoConduit.cpp:876: ConfigureRecvMediaCodecs Successfully Set the codec VP8
**32588** VP8caef: dst->border 32, dst->y_width 352, src->y_width 480, er -96
==32588== Thread 43 ViECaptureThread:
TEST-UNEXPECTED-VALGRIND-ERROR | Invalid write of size 4 at memset / copy_and_extend_plane / vp8_copy_and_extend_frame / vp8_lookahead_push
==32588== Invalid write of size 4
==32588==    at 0x400D425: memset (vg_replace_strmem.c:1234)
==32588==    by 0xB8F5EF5: copy_and_extend_plane (extend.c:50)
==32588==    by 0xB8F60AA: vp8_copy_and_extend_frame (extend.c:93)
I tried kludging copy_and_extend_plane in an obvious way, shown below,
but it doesn't help.  It gets rid of the memset-overwrites but the
problem just reappears as overlapping and memory-overwriting memcpys
instead.

diff --git a/media/libvpx/vp8/common/extend.c b/media/libvpx/vp8/common/extend.c
--- a/media/libvpx/vp8/common/extend.c
+++ b/media/libvpx/vp8/common/extend.c
@@ -27,26 +28,38 @@ static void copy_and_extend_plane
     int er            /* extend right border */
 )
 {
     int i;
     unsigned char *src_ptr1, *src_ptr2;
     unsigned char *dest_ptr1, *dest_ptr2;
     int linesize;
 
+#if 1
+    /* guard against negative extensions */
+    if (et < 0) et = 0;
+    if (el < 0) el = 0;
+    if (eb < 0) eb = 0;
+    if (er < 0) er = 0;
+#endif
+
     /* copy the left and right most columns out */
     src_ptr1 = s;
     src_ptr2 = s + w - 1;
     dest_ptr1 = d - el;
     dest_ptr2 = d + w;
 
     for (i = 0; i < h; i++)
     {
I've manager to repro locally with a valgrind build and
./mach mochitest --valgrind=/usr/local/bin/valgrind --timeout 999999 dom/media/tests/mochitest/test_peerConnection_addSecondVideoStream.html

jseward found that src frame is larger than the dst frame, causing a negative er, which is type-punned to an unsigned value, so it believes the size of the frame is very, very large.
The problem (perhaps due to a race) is that the vp8 init for the stream is 352x288 (aka CIF).  The first frame appears to be down-sampled from 640x480 to 480x360 due to load (fine).   The webrtc code notices the resolution change, and updates the VP8 config.

In vp8e_set_config(), it validates new resolutions with this:
        if ((ctx->cpi->initial_width && (int)cfg->g_w > ctx->cpi->initial_width) ||
            (ctx->cpi->initial_height && (int)cfg->g_h > ctx->cpi->initial_height))
            ERROR("Cannot increast width or height larger than their initial values");

Since 480x360 > 352x288, this fails and returns VPX_CODEC_INVALID_PARAM.
A few options:
a) go back to full encoder reset on resolution changes
   May cost us a few frames and bits when it happens; unfortunate when we're down-resolution due to lack of bits.
b) do a full encoder reset if the resolution change fails
   rarely would be needed
   more code
c) remember/compare initial size before trying a reconfig
   little advantage over b)
d) make (default?) initial size larger
   how large?
   how much memory hit?
e) find out why it's being inited with CIF (isn't clear...) and defer it until we have a frame.
   Initial frame may be downsampled, or the source may get bigger (window share), so still can fail

I think a) or b) are the best solutions, plus code to throw away frames larger than the safe size so we don't trash memory just in case (that would be in libvpx).
I managed to reproduce running natively (not on Valgrind).  I used a slow
machine -- a dual-core "Intel(R) Celeron(R) CPU  N3050  @ 1.60GHz" and
ran on it, two gcc builds (-j10) and two Firefox builds (-j4) and a bunch
of C spin loop programs.  The steady-state load average is around 35, and
I ran the test with nice -19.

I had to run the 2 gcc and 2 m-c builds from separate remote logins since it
seems like the linux kernel allocates time per login rather than per process
(or some wierd thing like that).  In any case I had difficulty persuading the
kernel to allocate only a small amount of time to the test -- typically it
would give the firefox under test quite a lot of CPU.

In order to spot the failure I added the (obvious) assertions shown below.
It then failed in 2 runs out of 3 with

media/libvpx/vp8/common/extend.c:39: copy_and_extend_plane: Assertion `eb >= 0' failed.

I noticed that increasing logging causes it not to fail, which is something
I also observed with the on-valgrind runs.  I don't know why that is.  For
example, this fails (at loadavg around 35):

DISPLAY=:1.0 nice -19 ./mach mochitest --keep-open=no dom/media/tests/mochitest/test_peerConnection_addSecondVideoStream.html

but this doesnt (twice out of two runs)

DISPLAY=:1.0 NSPR_LOG_MODULES=MediaManager:4,GetUserMedia:4,webrtc_trace:65535 WEBRTC_TRACE_FILE=nspr nice -19 ./mach mochitest --keep-open=no dom/media/tests/mochitest/test_peerConnection_addSecondVideoStream.html

Anyway, the patch I used is as follows.

diff --git a/media/libvpx/vp8/common/extend.c b/media/libvpx/vp8/common/extend.c
--- a/media/libvpx/vp8/common/extend.c
+++ b/media/libvpx/vp8/common/extend.c
@@ -7,16 +7,18 @@
  *  in the file PATENTS.  All contributing project authors may
  *  be found in the AUTHORS file in the root of the source tree.
  */
 
 
 #include "extend.h"
 #include "vpx_mem/vpx_mem.h"
 
+#undef NDEBUG
+#include <assert.h>
 
 static void copy_and_extend_plane
 (
     unsigned char *s, /* source */
     int sp,           /* source pitch */
     unsigned char *d, /* destination */
     int dp,           /* destination pitch */
     int h,            /* height */
@@ -27,16 +29,21 @@ static void copy_and_extend_plane
     int er            /* extend right border */
 )
 {
     int i;
     unsigned char *src_ptr1, *src_ptr2;
     unsigned char *dest_ptr1, *dest_ptr2;
     int linesize;
 
+    assert(et >= 0);
+    assert(el >= 0);
+    assert(eb >= 0);
+    assert(er >= 0);
+
     /* copy the left and right most columns out */
     src_ptr1 = s;
     src_ptr2 = s + w - 1;
     dest_ptr1 = d - el;
     dest_ptr2 = d + w;
 
     for (i = 0; i < h; i++)
     {
with an added printf to dump out when this occurs, we no longer blow up in valgrind - it complains about invalid input frames instead.  This is the safety/wallpaper patch for uplift and probably to propose upstream.  A second patch to notice config failures and reinit will be wanted.
Attachment #8741839 - Flags: review?(giles)
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
I'll fix the indent, note
Comment on attachment 8741839 [details] [diff] [review]
validate input frames against configured resolution in vp8

Review of attachment 8741839 [details] [diff] [review]:
-----------------------------------------------------------------

Per irc, please put the same content in a .patch file and add it to the apply list in update.py.

r=me with issues addressed.

::: media/libvpx/vp8/vp8_cx_iface.c
@@ +925,5 @@
> +              /* from vp8_encoder.h for g_w/g_h:
> +                 "Note that the frames passed as input to the encoder must have this resolution"
> +              */
> +              ctx->base.err_detail = "Invalid input frame resolution";
> +              res = VPX_CODEC_INVALID_PARAM;

Does it make sense to run the reset of the function if we can't receive new frame data? Can you just `return VPX_CODEC_INVALID_PARAM` here?
Attachment #8741839 - Flags: review?(giles) → review+
Patch does apply upstream. Adding James Zern for upstream notification. James, please note this bug is confidential.
> ::: media/libvpx/vp8/vp8_cx_iface.c
> @@ +925,5 @@
> > +              /* from vp8_encoder.h for g_w/g_h:
> > +                 "Note that the frames passed as input to the encoder must have this resolution"
> > +              */
> > +              ctx->base.err_detail = "Invalid input frame resolution";
> > +              res = VPX_CODEC_INVALID_PARAM;
> 
> Does it make sense to run the reset of the function if we can't receive new
> frame data? Can you just `return VPX_CODEC_INVALID_PARAM` here?

I decided that was the least-risk solution since if vp8_receive_raw_frame() fails it continues as well.  I might have put the test there, except it doesn't have as easy access to the config values.
Attachment #8741839 - Attachment is obsolete: true
Attachment #8741859 - Flags: review?(giles)
Comment on attachment 8741859 [details] [diff] [review]
validate input frames against configured resolution in vp8

Review of attachment 8741859 [details] [diff] [review]:
-----------------------------------------------------------------

Ok. LGTM, thanks.
Attachment #8741859 - Flags: review?(giles) → review+
Comment on attachment 8741859 [details] [diff] [review]
validate input frames against configured resolution in vp8

[Security approval request comment]
How easily could an exploit be constructed based on the patch?  Hard.  Hard to build an exploit at all; it will try to overwrite all of memory.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  No, beyond it landing in a sec bug.

Which older supported branches are affected by this flaw? All

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  Trivial, not risky

How likely is this patch to cause regressions; how much testing does it need?  Very unlikely.  Minimal testing; I've verified it reliably blocks the failure under valgrind (and his hitting the failure case many times).
Attachment #8741859 - Flags: sec-approval?
Attachment #8741859 - Flags: approval-mozilla-esr45?
Attachment #8741859 - Flags: approval-mozilla-beta?
Attachment #8741859 - Flags: approval-mozilla-aurora?
This is too late to make Firefox 46. We're making release builds early next week. I'm giving sec-approval for checkin on May 10, two weeks into the next cycle.
Whiteboard: [checkin on 5/10]
Attachment #8741859 - Flags: sec-approval? → sec-approval+
Comment on attachment 8741859 [details] [diff] [review]
validate input frames against configured resolution in vp8

Adding m-r approval request flag since 46 migrated to the release channel today
Attachment #8741859 - Flags: approval-mozilla-release?
Randell, this isn't approved until May 10. Why is this on inbound now?
Flags: needinfo?(rjesup)
Ugh, sorry - I misread.  What would you like me to do?  I could back out, or leave it be.
I think the risk is pretty low - it's a "trash all of memory" type of bug, and VERY hard to repro
Flags: needinfo?(rjesup)
Flags: needinfo?(abillings)
Well, it is exposed now since you have an inbound checkin for a hidden bug so at this point I'd check it into trunk. It is really up to release management about whether to take it in branches or not.
Flags: needinfo?(rkothari)
Flags: needinfo?(lhenry)
Flags: needinfo?(abillings)
My apologies.  I got confused/distracted looking at Liz's comment about mozilla-release.

This really doesn't make anything clear about why you'd need this or what risk they're be; if it hadn't landed on a sec bug it would be totally unremarkable.  Normal use has lots of trouble tripping the problem; the only report of non-crazy-load we have is ian's case with Hello, and he couldn't repro it when asked (nor could I).

Even if you can with pain induce it (and inducing it without user help would be far harder), it will write over all of memory starting there with a repeat of a single byte from the last byte of the first line of the luma channel.  I (should) never say "impossible to exploit", but this would be pretty insanely hard to do so.  I hope that helps with the assessment.

I will slow down and not check stuff like this in when sitting down late on a Friday night, though. :-/
We could take this for 46 still, but then we should also take it for esr45. Hard to exploit based on reading the patch, so maybe we can leave it to 47 and the corresponding esr. On the other hand we still have time to get this into tomorrow's RC build. Sylvestre, what do you think? I'm ok with either but leaning towards waiting till 47.
Flags: needinfo?(lhenry) → needinfo?(sledru)
If we can wait, happy to wait for 45.2.0esr and 47.
Flags: needinfo?(sledru)
https://hg.mozilla.org/mozilla-central/rev/1150a9442ad2
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
(In reply to Sylvestre Ledru [:sylvestre] from comment #33)
> If we can wait, happy to wait for 45.2.0esr and 47.

I think that's reasonable given the risk.  I've continued to think about it; I'll revise slightly my analysis above:  If someone manages to increase the height above the init value and doesn't increase the width, it will fill memory with the last line of source data in the buffer.  This slightly increases the risk (as opposed to filling with a byte, since the "video" could be pointers), but only slightly, given how hard it is to even get to the point of this happening without user help (and jseward's method will always cause the width to be too large).
OK. I would rather we wait for 47 here.  Thanks for the extra feedback!
Comment on attachment 8741859 [details] [diff] [review]
validate input frames against configured resolution in vp8

Sec-high, taking it in Aurora47. Beta- and Release- based on Liz and Sylvestre's comments.
Flags: needinfo?(rkothari)
Attachment #8741859 - Flags: approval-mozilla-release?
Attachment #8741859 - Flags: approval-mozilla-release-
Attachment #8741859 - Flags: approval-mozilla-beta?
Attachment #8741859 - Flags: approval-mozilla-beta-
Attachment #8741859 - Flags: approval-mozilla-aurora?
Attachment #8741859 - Flags: approval-mozilla-aurora+
Group: media-core-security → core-security-release
Thanks for the note Ralph. I'll take a closer look and try to upstream the patch this week. The bug is restricted, can I add a link to the commit as in previous changes?
Is vpx_codec_enc_config_set() being called from a different thread than vpx_codec_encode()?

Note vp8e_encode() validates the input at the top of the function:
    if (img)
        res = validate_img(ctx, img);
which checks:
    if ((img->d_w != ctx->cfg.g_w) || (img->d_h != ctx->cfg.g_h))
        ERROR("Image size must match encoder init configuration size");

The final patch adds later in the function:
            if (sd.y_width != ctx->cfg.g_w || sd.y_height != ctx->cfg.g_h) {
where y_width/y_height are pass-throughs from img:
    const int y_w = img->d_w;
    const int y_h = img->d_h;
    ...
    yv12->y_width  = y_w;
    yv12->y_height = y_h;

So I don't see how this patch helps in a non-threaded case unless I'm misreading something. libvpx doesn't provide any protection for use of encode or decode instances between threads.
See comment 14.  You may have a point about validate_img() (and note this is not used from multiple threads), but...

later in vp8e_encode():
   res = set_reference_and_update(ctx, flags);

Note, unlike an earlier call, there's no "if (!res)" before it - i.e. if validate_img() fails, it's ignored and overwritten.

Very likely the code should exit earlier, though clearly there's intent to do some of the processing regardless of errors in the img or the config (why, I don't know).
Flags: needinfo?(jzern)
> later in vp8e_encode():
>    res = set_reference_and_update(ctx, flags);
> 
> Note, unlike an earlier call, there's no "if (!res)" before it - i.e. if
> validate_img() fails, it's ignored and overwritten.
> 

Thanks, I was misreading and assumed ERROR() was doing more.

> Very likely the code should exit earlier, though clearly there's intent to do
> some of the processing regardless of errors in the img or the config (why, I
> don't know).
>

In cases like this I wouldn't assume too much. It's likely an old pattern to
return from one location.
Note:
https://chromium-review.googlesource.com/#/c/295437/

The point about code-refactoring still stands.
Comment on attachment 8741859 [details] [diff] [review]
validate input frames against configured resolution in vp8

Please uplift to esr45, sec-high and this is fixed in 47 now.
Attachment #8741859 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
has problems to apply to esr45

Tomcats-MacBook-Pro-2:mozilla-central Tomcat$ hg graft --edit -r c07c7171bcc9
grafting 340477:c07c7171bcc9 "Bug 1263384: validate input frames against configured resolution in vp8 r=rillian, a=ritu"
merging media/libvpx/update.py
warning: conflicts while merging media/libvpx/update.py! (edit, then use 'hg resolve --mark')

could you take a look thanks!
Flags: needinfo?(rjesup)
Flags: qe-verify-
Whiteboard: [checkin on 5/10] → [post-critsmash-triage][checkin on 5/10]
Whiteboard: [post-critsmash-triage][checkin on 5/10] → [post-critsmash-triage][adv-main47+][adv-esr45.2+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: