Closed Bug 1765480 Opened 2 years ago Closed 2 years ago

Crash in [@ ff_vp9_avg4_8_mmxext] (ffmpeg update needed)

Categories

(Core :: Audio/Video, defect, P3)

defect

Tracking

()

RESOLVED FIXED
106 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- disabled
firefox104 --- disabled
firefox105 --- disabled
firefox106 --- fixed

People

(Reporter: gsvelto, Assigned: padenot)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, csectype-bounds, testcase)

Crash Data

Attachments

(8 files)

Crash report: https://crash-stats.mozilla.org/report/index/9b784938-1970-4ed0-9af4-5d6a90220420

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0 mozavcodec.dll ff_vp9_avg4_8_mmxext 
1 mozavcodec.dll inter_pred_8bpp media/ffvpx/libavcodec/vp9_mc_template.c:431
2 mozavcodec.dll ff_vp9_inter_recon_8bpp media/ffvpx/libavcodec/vp9recon.c:650
3 mozavcodec.dll ff_vp9_decode_block media/ffvpx/libavcodec/vp9block.c:1397
4 mozavcodec.dll decode_sb media/ffvpx/libavcodec/vp9.c:1107
5 mozavcodec.dll decode_sb media/ffvpx/libavcodec/vp9.c:1135
6 mozavcodec.dll decode_sb media/ffvpx/libavcodec/vp9.c:1135
7 mozavcodec.dll decode_sb media/ffvpx/libavcodec/vp9.c:1135
8 mozavcodec.dll vp9_decode_frame media/ffvpx/libavcodec/vp9.c:1737
9 mozavcodec.dll frame_worker_thread media/ffvpx/libavcodec/pthread_frame.c:219

This looks like a regression. Most crashes are on YouTube and in particular this video comes up in several reports coming from different users.

It looks like this is unrelated to any of my changes in that pushlog at least, I've tested backing out those three patches locally and can still reproduce the issue.

I've created a test case which I've attached to here. To reproduce on Windows, set "media.wmf.vp9.enabled" to false, then open the video and the access violation should occur after a second or so.

Flags: needinfo?(Zaggy1024)

Thank you, Zaggy.

Bisection points to bug 1757802 as the regressing change.

Has Regression Range: --- → yes
Has STR: --- → yes
Flags: needinfo?(nika)
Regressed by: 1757802

The regressing bug 1757802 has been backed out

Flags: needinfo?(nika)

Fixed by backout.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

Bob, can you check if the STR work for you?

Assignee: nobody → bobowencode
Flags: needinfo?(bobowencode)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)

Bob, can you check if the STR work for you?

I must have been using a slightly older build.
I can reproduce this before the backout (STR from comment 2), but not after.

Flags: needinfo?(bobowencode)

(In reply to Bob Owen (:bobowen) from comment #7)

I can reproduce this before the backout (STR from comment 2), but not after.

Can we reproduce this with an ASan build with the Shmem patch (or from before the backout)? Hopefully that will give us a stack from when the memory was unmapped.

Flags: needinfo?(bobowencode)
Priority: -- → P2

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #8)

(In reply to Bob Owen (:bobowen) from comment #7)

I can reproduce this before the backout (STR from comment 2), but not after.

Can we reproduce this with an ASan build with the Shmem patch (or from before the backout)? Hopefully that will give us a stack from when the memory was unmapped.

I've tried using an ASan build from just before the backout, but I don't seem to get anything interesting logged out.

Flags: needinfo?(bobowencode)

From looking at some of the media code around shmems, it seems like there may be some weird interaction between the shmem recycler and the changes in the patch (https://searchfox.org/mozilla-central/rev/9902932742fcdce2c956eeb81fd38350f5394ab2/dom/media/ipc/ShmemRecycleAllocator.h#13-56). I don't know what behavior would have changed in a bad way around this, but perhaps there's some edge-case I'm not thinking of.

If we can get it to reproduce, it might be worth seeing if it stops being an issue after we stub out the recycler to just allocate a new shmem every time.

These crashes happen on a page boundary. In fact, the minidump for the crash in comment 0 shows the code actually accesses nearby memory in the prior page, just before the crash address of 0x0000017bb390c000. Notably, the instructions are:

00007FFB22A5A464 0F E0 01             pavgb       mm0,mmword ptr [rcx]  
00007FFB22A5A467 0F E0 0C 11          pavgb       mm1,mmword ptr [rcx+rdx]  
00007FFB22A5A46B 0F E0 14 51          pavgb       mm2,mmword ptr [rcx+rdx*2]  
00007FFB22A5A46F 42 0F E0 1C 19       pavgb       mm3,mmword ptr [rcx+r11]  

Where %rcx = 0x0000017BB390B87C, %rdx = 0x0000000000000280 and %r11 = 0000000000000780. The crash is on the last line. The address rcx+r11 works out to 0x17bb390bffc, which is 4 bytes before the crash address of 0x0000017bb390c000... and the instruction is reading 8 bytes. The other addresses that didn't crash are clearly on the same page as 0x17bb390bffc -- the page just before the invalid access.

Since Shmem can't be asked to allocate memory adjacent to other memory and this code looks to be treating all of this as one block of memory, this may not be about a freed Shmem. Our Shmems are bigger without bug 1757802's patch -- at least by 4 bytes since they would stick a 4-byte size at the end of the block. I don't think the size field is used outside of debug builds so the codec could just be stomping on it using garbage (it's a read access) and we don't notice.

Given that this now looks like a buffer overrun, I'm going to repurpose this bug and set its attributes accordingly.

Group: core-security
Severity: S2 → --
Status: RESOLVED → REOPENED
Has Regression Range: yes → ---
OS: Windows 11 → Unspecified
Priority: P2 → --
No longer regressed by: 1757802
Resolution: FIXED → ---
See Also: → 1757802
Target Milestone: 101 Branch → ---
Group: core-security → media-core-security

This reproduces reliably on a debug build on Linux, using the attached file; the guard pages are set up differently on debug builds (thanks to Nika for pointing that out). I've uploaded a recording to Pernosco, if that helps.

I've looked at the pernosco trace a bit, and identified which shmem region is being accessed out of bounds. There are some notes in the notebook which I've added to help with it. The shmem is a recycled shmem, and I've identified which recycle callback for the shmem region happened most recently as well (as well as the call which originally allocated it).

I'm guessing there's some subtle off-by-one error or similar in the decode logic, and I don't understand media well enough to figure out what it is, but hopefully with the context of this being a shmem texture and the stacks of where it is allocated etc. it'll be possible for someone who does know to figure out what is causing the issue.

I think I know what's going on here. Here's ff_vp9_avg8_8_mmxext, which if I understand the naming scheme (see init_fpel_func, and how it's used, and the comments on VP9DSPContext::mc) operates on an 8×n region of 8bpp data:

   0x7f63c0395aa0 <ff_vp9_avg8_8_mmxext>:       lea    (%rcx,%rcx,2),%rax
   0x7f63c0395aa4 <ff_vp9_avg8_8_mmxext+4>:     lea    (%rsi,%rsi,2),%r9
   0x7f63c0395aa8 <ff_vp9_avg8_8_mmxext.loop>:  movq   (%rdx),%mm0
   0x7f63c0395aab <ff_vp9_avg8_8_mmxext.loop+3>:        movq   (%rdx,%rcx,1),%mm1
   0x7f63c0395aaf <ff_vp9_avg8_8_mmxext.loop+7>:        movq   (%rdx,%rcx,2),%mm2
   0x7f63c0395ab3 <ff_vp9_avg8_8_mmxext.loop+11>:       movq   (%rdx,%rax,1),%mm3
   0x7f63c0395ab7 <ff_vp9_avg8_8_mmxext.loop+15>:       lea    (%rdx,%rcx,4),%rdx
   0x7f63c0395abb <ff_vp9_avg8_8_mmxext.loop+19>:       pavgb  (%rdi),%mm0
   0x7f63c0395abe <ff_vp9_avg8_8_mmxext.loop+22>:       pavgb  (%rdi,%rsi,1),%mm1
   0x7f63c0395ac2 <ff_vp9_avg8_8_mmxext.loop+26>:       pavgb  (%rdi,%rsi,2),%mm2
   0x7f63c0395ac6 <ff_vp9_avg8_8_mmxext.loop+30>:       pavgb  (%rdi,%r9,1),%mm3
   0x7f63c0395acb <ff_vp9_avg8_8_mmxext.loop+35>:       movq   %mm0,(%rdi)
   0x7f63c0395ace <ff_vp9_avg8_8_mmxext.loop+38>:       movq   %mm1,(%rdi,%rsi,1)
   0x7f63c0395ad2 <ff_vp9_avg8_8_mmxext.loop+42>:       movq   %mm2,(%rdi,%rsi,2)
   0x7f63c0395ad6 <ff_vp9_avg8_8_mmxext.loop+46>:       movq   %mm3,(%rdi,%r9,1)
   0x7f63c0395adb <ff_vp9_avg8_8_mmxext.loop+51>:       lea    (%rdi,%rsi,4),%rdi
   0x7f63c0395adf <ff_vp9_avg8_8_mmxext.loop+55>:       sub    $0x4,%r8d
   0x7f63c0395ae3 <ff_vp9_avg8_8_mmxext.loop+59>:       jne    0x7f63c0395aa8 <ff_vp9_avg8_8_mmxext.loop>
   0x7f63c0395ae5 <..@6767.branch_instr>:       repz ret 

Here's the version that's crashing, intended for a 4×n area:

   0x7f63c0395a50 <ff_vp9_avg4_8_mmxext>:       lea    (%rcx,%rcx,2),%rax
   0x7f63c0395a54 <ff_vp9_avg4_8_mmxext+4>:     lea    (%rsi,%rsi,2),%r9
   0x7f63c0395a58 <ff_vp9_avg4_8_mmxext.loop>:  movd   (%rdx),%mm0
   0x7f63c0395a5b <ff_vp9_avg4_8_mmxext.loop+3>:        movd   (%rdx,%rcx,1),%mm1
   0x7f63c0395a5f <ff_vp9_avg4_8_mmxext.loop+7>:        movd   (%rdx,%rcx,2),%mm2
   0x7f63c0395a63 <ff_vp9_avg4_8_mmxext.loop+11>:       movd   (%rdx,%rax,1),%mm3
   0x7f63c0395a67 <ff_vp9_avg4_8_mmxext.loop+15>:       lea    (%rdx,%rcx,4),%rdx
   0x7f63c0395a6b <ff_vp9_avg4_8_mmxext.loop+19>:       pavgb  (%rdi),%mm0
   0x7f63c0395a6e <ff_vp9_avg4_8_mmxext.loop+22>:       pavgb  (%rdi,%rsi,1),%mm1
   0x7f63c0395a72 <ff_vp9_avg4_8_mmxext.loop+26>:       pavgb  (%rdi,%rsi,2),%mm2
=> 0x7f63c0395a76 <ff_vp9_avg4_8_mmxext.loop+30>:       pavgb  (%rdi,%r9,1),%mm3
   0x7f63c0395a7b <ff_vp9_avg4_8_mmxext.loop+35>:       movd   %mm0,(%rdi)
   0x7f63c0395a7e <ff_vp9_avg4_8_mmxext.loop+38>:       movd   %mm1,(%rdi,%rsi,1)
   0x7f63c0395a82 <ff_vp9_avg4_8_mmxext.loop+42>:       movd   %mm2,(%rdi,%rsi,2)
   0x7f63c0395a86 <ff_vp9_avg4_8_mmxext.loop+46>:       movd   %mm3,(%rdi,%r9,1)
   0x7f63c0395a8b <ff_vp9_avg4_8_mmxext.loop+51>:       lea    (%rdi,%rsi,4),%rdi
   0x7f63c0395a8f <ff_vp9_avg4_8_mmxext.loop+55>:       sub    $0x4,%r8d
   0x7f63c0395a93 <ff_vp9_avg4_8_mmxext.loop+59>:       jne    0x7f63c0395a58 <ff_vp9_avg4_8_mmxext.loop>
   0x7f63c0395a95 <..@6638.branch_instr>:       repz ret 

Almost the same, but notice how the load from the first buffer and the write back to the second buffer are now a 4-byte movd instead of an 8-byte movq, but the pavgb is still reading 8 bytes. However, the results of reading those extra pixels aren't written back, so I think this isn't a security bug.

Specifically, it looks like we're trying to operate on a 4×4 region of a 1920×960 8bpp plane (a chroma plane from a 3840×1920 4:2:0 YUV image, apparently), and I found a variable uvoff with a value equal to 1920*956+1916, which is 4×4 from the end, so we're reading 4 pixels off the end of the image on the last line, which matches comment #11.

A workaround that seems to work: comment out this line of vp9dsp_init.c, so that we don't use the broken assembly function and fall back to something else (probably C code).

Assignee: bobowencode → nobody

:alwu, any chance you could take a look at this?

STR to easily reproduce the issue without patches from bug 1757802 is to load the testcase video from comment 2 in a debug build of Firefox. See comment 11, comment 14, and comment 15 for some debugging context.

Flags: needinfo?(alwu)

Ronald, can you have a look at this?

Flags: needinfo?(rsbultje)

I can look, yes. I think the over-read is intentional because FFmpeg always over-allocates buffers, but it's possible that this isn't the case for user-supplied buffers. There's two ways to fix this: ask user-supplied buffers to be over-allocated also, or alternatively to indeed prevent the overread by adding one extra instruction for reading the 4 bytes before pavgb on the already-read data.

Flags: needinfo?(rsbultje)

I'm assuming this would fix it:
http://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297028.html

Apparently the over-allocation is only present in FFmpeg's default allocator and is not part of the API or callback requirements, so the bug is in our decoder and the above patch is then the correct way to fix it. Could you test that for me on your end?

Flags: needinfo?(alwu)

(In reply to Ronald S. Bultje from comment #19)

I'm assuming this would fix it:
http://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297028.html

Apparently the over-allocation is only present in FFmpeg's default allocator and is not part of the API or callback requirements, so the bug is in our decoder and the above patch is then the correct way to fix it. Could you test that for me on your end?

The test case from comment 2 appears to run fine with that fix, thanks.

Looks like a second patch was posted to the ffmpeg mailing list: http://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297029.html

Should be fixed upstream. I indeed slightly modified the patch. Let me know if there's anything else I can help with.

Group: media-core-security
Keywords: crash, testcase

Not sure who to ask to get an updated version of the code from ffmpeg in-tree. ni? :jimm who might be know/be able to.

Flags: needinfo?(jmathies)
Blocks: media-triage
Flags: needinfo?(jmathies)
Summary: Crash in [@ ff_vp9_avg4_8_mmxext] → Crash in [@ ff_vp9_avg4_8_mmxext] (ffmpeg update needed)

Last update was a couple months ago by Stransky. We're not sure level of complexity here, would be good to get this into updatebot's queue.

Severity: -- → S4
Priority: -- → P3
Assignee: nobody → padenot

I'm updating ffmpeg, it's building / working locally here on my linux box, but I have all OSes handy.

Depends on D150970

This is what the documentation says we should be doing (and it's clearly the
right thing to do). We miss decoding a packet otherwise.

Depends on D150974

Blocks: 1757802

Is it possible to update ffvpx via update-bot?

The update process is a bit weird for now, but I'll have a go at it after landing this.

No longer blocks: media-triage
Attachment #9285794 - Attachment description: WIP: Bug 1765480 - Conditionally include bsf, codec and parser list with CONFIG_* macros.r ?alwu → Bug 1765480 - Conditionally include bsf, codec and parser list with CONFIG_* macros. r?alwu
Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e3017c8a70af
Remove the file ffvpx/FILES and prefer rsync to update ffvpx. r=alwu
https://hg.mozilla.org/integration/autoland/rev/37cfee2c325e
Overhaul ffvpx/README_MOZILLA. r=alwu
https://hg.mozilla.org/integration/autoland/rev/2979c28076f7
Regenerate config* files for ffvpx on all platforms needed, splitting off `config_components.h`. r=alwu
https://hg.mozilla.org/integration/autoland/rev/e393cf609b9b
Update ffvpx to a recent ffmpeg version, reapply the in-tree patch, fix moz.build for the new files, fix the symbol files. r=alwu
https://hg.mozilla.org/integration/autoland/rev/c0efff24b361
Conditionally include bsf, codec and parser list with CONFIG_* macros. r=alwu
https://hg.mozilla.org/integration/autoland/rev/3a362936969a
Switch in-tree FFVPX PDM to use header from version 59. r=alwu
https://hg.mozilla.org/integration/autoland/rev/ed10a546db4f
"send" before "receive"-ing when decoding audio using ffmpeg. r=alwu
See Also: 1757802
Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/94e2e399316f
Remove the file ffvpx/FILES and prefer rsync to update ffvpx. r=alwu
https://hg.mozilla.org/integration/autoland/rev/3ea1516df102
Overhaul ffvpx/README_MOZILLA. r=alwu
https://hg.mozilla.org/integration/autoland/rev/f877e032405a
Regenerate config* files for ffvpx on all platforms needed, splitting off `config_components.h`. r=alwu
https://hg.mozilla.org/integration/autoland/rev/e1d1d4cc9835
Update ffvpx to a recent ffmpeg version, reapply the in-tree patch, fix moz.build for the new files, fix the symbol files. r=alwu
https://hg.mozilla.org/integration/autoland/rev/677ab47a9f49
Conditionally include bsf, codec and parser list with CONFIG_* macros. r=alwu
https://hg.mozilla.org/integration/autoland/rev/93bc1d846152
Switch in-tree FFVPX PDM to use header from version 59. r=alwu
https://hg.mozilla.org/integration/autoland/rev/82d99be977dd
"send" before "receive"-ing when decoding audio using ffmpeg. r=alwu

Note to self to add configure files for Linux Desktop aarch64 when relanding.

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:padenot, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.

Flags: needinfo?(padenot)
Flags: needinfo?(alwu)

Per comment39, paul will add configure files for Linux Desktop aarch64 after he is back from PTO.

Flags: needinfo?(alwu)
Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6077ae857148
Remove the file ffvpx/FILES and prefer rsync to update ffvpx. r=alwu
https://hg.mozilla.org/integration/autoland/rev/7e2ce67588d3
Overhaul ffvpx/README_MOZILLA. r=alwu
https://hg.mozilla.org/integration/autoland/rev/104a811c42e5
Regenerate config* files for ffvpx on all platforms needed, splitting off `config_components.h`. r=alwu
https://hg.mozilla.org/integration/autoland/rev/50b0534f07a5
Update ffvpx to a recent ffmpeg version, reapply the in-tree patch, fix moz.build for the new files, fix the symbol files. r=alwu
https://hg.mozilla.org/integration/autoland/rev/a54225be60ac
Conditionally include bsf, codec and parser list with CONFIG_* macros. r=alwu
https://hg.mozilla.org/integration/autoland/rev/7176e2a3b1a0
Switch in-tree FFVPX PDM to use header from version 59. r=alwu
https://hg.mozilla.org/integration/autoland/rev/5403acde30c2
"send" before "receive"-ing when decoding audio using ffmpeg. r=alwu

Backed out 7 changesets (Bug 1765480) for causing build bustages on config.h.
Backout link
Push with failures <--> B
Failure Log
Also Bx failure log

Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/355596087041
Remove the file ffvpx/FILES and prefer rsync to update ffvpx. r=alwu
https://hg.mozilla.org/integration/autoland/rev/8b8b5d364961
Overhaul ffvpx/README_MOZILLA. r=alwu
https://hg.mozilla.org/integration/autoland/rev/c174bd2624e2
Regenerate config* files for ffvpx on all platforms needed, splitting off `config_components.h`. r=alwu
https://hg.mozilla.org/integration/autoland/rev/2b8ef32d5cbe
Update ffvpx to a recent ffmpeg version, reapply the in-tree patch, fix moz.build for the new files, fix the symbol files. r=alwu
https://hg.mozilla.org/integration/autoland/rev/86f5be2f7d24
Conditionally include bsf, codec and parser list with CONFIG_* macros. r=alwu
https://hg.mozilla.org/integration/autoland/rev/c0bdf286aeea
Switch in-tree FFVPX PDM to use header from version 59. r=alwu
https://hg.mozilla.org/integration/autoland/rev/1d959fb3ed39
"send" before "receive"-ing when decoding audio using ffmpeg. r=alwu
Regressions: 1786799
Regressions: 1786824
Regressions: 1787281
Regressions: 1793289
Regressions: 1796893
See Also: → 1793507
Blocks: 1799132
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: