Crash in [@ ff_vp9_avg4_8_mmxext] (ffmpeg update needed)
Categories
(Core :: Audio/Video, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox-esr102 | --- | disabled |
firefox104 | --- | disabled |
firefox105 | --- | disabled |
firefox106 | --- | fixed |
People
(Reporter: gsvelto, Assigned: padenot)
References
(Blocks 2 open bugs)
Details
(Keywords: crash, csectype-bounds, testcase)
Crash Data
Attachments
(8 files)
6.93 MB,
video/webm
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/9b784938-1970-4ed0-9af4-5d6a90220420
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 mozavcodec.dll ff_vp9_avg4_8_mmxext
1 mozavcodec.dll inter_pred_8bpp media/ffvpx/libavcodec/vp9_mc_template.c:431
2 mozavcodec.dll ff_vp9_inter_recon_8bpp media/ffvpx/libavcodec/vp9recon.c:650
3 mozavcodec.dll ff_vp9_decode_block media/ffvpx/libavcodec/vp9block.c:1397
4 mozavcodec.dll decode_sb media/ffvpx/libavcodec/vp9.c:1107
5 mozavcodec.dll decode_sb media/ffvpx/libavcodec/vp9.c:1135
6 mozavcodec.dll decode_sb media/ffvpx/libavcodec/vp9.c:1135
7 mozavcodec.dll decode_sb media/ffvpx/libavcodec/vp9.c:1135
8 mozavcodec.dll vp9_decode_frame media/ffvpx/libavcodec/vp9.c:1737
9 mozavcodec.dll frame_worker_thread media/ffvpx/libavcodec/pthread_frame.c:219
This looks like a regression. Most crashes are on YouTube and in particular this video comes up in several reports coming from different users.
Updated•3 years ago
|
Comment 1•3 years ago
|
||
Pushlog from last good to first bad Nightly build: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a1be0e0a7515604ac0a9bf5d08ac8cdb798867bc&tochange=61d8c578d367ea82c64fa5f55c8a4af54fe84c21
Zaggy, could this be from the change in bug 1764425?
It looks like this is unrelated to any of my changes in that pushlog at least, I've tested backing out those three patches locally and can still reproduce the issue.
I've created a test case which I've attached to here. To reproduce on Windows, set "media.wmf.vp9.enabled" to false, then open the video and the access violation should occur after a second or so.
Comment 3•3 years ago
|
||
Thank you, Zaggy.
Bisection points to bug 1757802 as the regressing change.
Comment 5•3 years ago
|
||
Fixed by backout.
Updated•3 years ago
|
Comment 6•3 years ago
|
||
Bob, can you check if the STR work for you?
Comment 7•3 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)
Bob, can you check if the STR work for you?
I must have been using a slightly older build.
I can reproduce this before the backout (STR from comment 2), but not after.
Comment 8•3 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #7)
I can reproduce this before the backout (STR from comment 2), but not after.
Can we reproduce this with an ASan build with the Shmem patch (or from before the backout)? Hopefully that will give us a stack from when the memory was unmapped.
Updated•3 years ago
|
Comment 9•3 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #8)
(In reply to Bob Owen (:bobowen) from comment #7)
I can reproduce this before the backout (STR from comment 2), but not after.
Can we reproduce this with an ASan build with the Shmem patch (or from before the backout)? Hopefully that will give us a stack from when the memory was unmapped.
I've tried using an ASan build from just before the backout, but I don't seem to get anything interesting logged out.
Comment 10•2 years ago
|
||
From looking at some of the media code around shmems, it seems like there may be some weird interaction between the shmem recycler and the changes in the patch (https://searchfox.org/mozilla-central/rev/9902932742fcdce2c956eeb81fd38350f5394ab2/dom/media/ipc/ShmemRecycleAllocator.h#13-56). I don't know what behavior would have changed in a bad way around this, but perhaps there's some edge-case I'm not thinking of.
If we can get it to reproduce, it might be worth seeing if it stops being an issue after we stub out the recycler to just allocate a new shmem every time.
Comment 11•2 years ago
•
|
||
These crashes happen on a page boundary. In fact, the minidump for the crash in comment 0 shows the code actually accesses nearby memory in the prior page, just before the crash address of 0x0000017bb390c000
. Notably, the instructions are:
00007FFB22A5A464 0F E0 01 pavgb mm0,mmword ptr [rcx]
00007FFB22A5A467 0F E0 0C 11 pavgb mm1,mmword ptr [rcx+rdx]
00007FFB22A5A46B 0F E0 14 51 pavgb mm2,mmword ptr [rcx+rdx*2]
00007FFB22A5A46F 42 0F E0 1C 19 pavgb mm3,mmword ptr [rcx+r11]
Where %rcx = 0x0000017BB390B87C
, %rdx = 0x0000000000000280
and %r11 = 0000000000000780
. The crash is on the last line. The address rcx+r11
works out to 0x17bb390bffc
, which is 4 bytes before the crash address of 0x0000017bb390c000
... and the instruction is reading 8 bytes. The other addresses that didn't crash are clearly on the same page as 0x17bb390bffc
-- the page just before the invalid access.
Since Shmem can't be asked to allocate memory adjacent to other memory and this code looks to be treating all of this as one block of memory, this may not be about a freed Shmem. Our Shmems are bigger without bug 1757802's patch -- at least by 4 bytes since they would stick a 4-byte size at the end of the block. I don't think the size field is used outside of debug builds so the codec could just be stomping on it using garbage (it's a read access) and we don't notice.
Comment 12•2 years ago
|
||
Given that this now looks like a buffer overrun, I'm going to repurpose this bug and set its attributes accordingly.
Updated•2 years ago
|
Comment 13•2 years ago
|
||
This reproduces reliably on a debug build on Linux, using the attached file; the guard pages are set up differently on debug builds (thanks to Nika for pointing that out). I've uploaded a recording to Pernosco, if that helps.
Comment 14•2 years ago
|
||
I've looked at the pernosco trace a bit, and identified which shmem region is being accessed out of bounds. There are some notes in the notebook which I've added to help with it. The shmem is a recycled shmem, and I've identified which recycle callback for the shmem region happened most recently as well (as well as the call which originally allocated it).
I'm guessing there's some subtle off-by-one error or similar in the decode logic, and I don't understand media well enough to figure out what it is, but hopefully with the context of this being a shmem texture and the stacks of where it is allocated etc. it'll be possible for someone who does know to figure out what is causing the issue.
Comment 15•2 years ago
|
||
I think I know what's going on here. Here's ff_vp9_avg8_8_mmxext
, which if I understand the naming scheme (see init_fpel_func
, and how it's used, and the comments on VP9DSPContext::mc
) operates on an 8×n region of 8bpp data:
0x7f63c0395aa0 <ff_vp9_avg8_8_mmxext>: lea (%rcx,%rcx,2),%rax
0x7f63c0395aa4 <ff_vp9_avg8_8_mmxext+4>: lea (%rsi,%rsi,2),%r9
0x7f63c0395aa8 <ff_vp9_avg8_8_mmxext.loop>: movq (%rdx),%mm0
0x7f63c0395aab <ff_vp9_avg8_8_mmxext.loop+3>: movq (%rdx,%rcx,1),%mm1
0x7f63c0395aaf <ff_vp9_avg8_8_mmxext.loop+7>: movq (%rdx,%rcx,2),%mm2
0x7f63c0395ab3 <ff_vp9_avg8_8_mmxext.loop+11>: movq (%rdx,%rax,1),%mm3
0x7f63c0395ab7 <ff_vp9_avg8_8_mmxext.loop+15>: lea (%rdx,%rcx,4),%rdx
0x7f63c0395abb <ff_vp9_avg8_8_mmxext.loop+19>: pavgb (%rdi),%mm0
0x7f63c0395abe <ff_vp9_avg8_8_mmxext.loop+22>: pavgb (%rdi,%rsi,1),%mm1
0x7f63c0395ac2 <ff_vp9_avg8_8_mmxext.loop+26>: pavgb (%rdi,%rsi,2),%mm2
0x7f63c0395ac6 <ff_vp9_avg8_8_mmxext.loop+30>: pavgb (%rdi,%r9,1),%mm3
0x7f63c0395acb <ff_vp9_avg8_8_mmxext.loop+35>: movq %mm0,(%rdi)
0x7f63c0395ace <ff_vp9_avg8_8_mmxext.loop+38>: movq %mm1,(%rdi,%rsi,1)
0x7f63c0395ad2 <ff_vp9_avg8_8_mmxext.loop+42>: movq %mm2,(%rdi,%rsi,2)
0x7f63c0395ad6 <ff_vp9_avg8_8_mmxext.loop+46>: movq %mm3,(%rdi,%r9,1)
0x7f63c0395adb <ff_vp9_avg8_8_mmxext.loop+51>: lea (%rdi,%rsi,4),%rdi
0x7f63c0395adf <ff_vp9_avg8_8_mmxext.loop+55>: sub $0x4,%r8d
0x7f63c0395ae3 <ff_vp9_avg8_8_mmxext.loop+59>: jne 0x7f63c0395aa8 <ff_vp9_avg8_8_mmxext.loop>
0x7f63c0395ae5 <..@6767.branch_instr>: repz ret
Here's the version that's crashing, intended for a 4×n area:
0x7f63c0395a50 <ff_vp9_avg4_8_mmxext>: lea (%rcx,%rcx,2),%rax
0x7f63c0395a54 <ff_vp9_avg4_8_mmxext+4>: lea (%rsi,%rsi,2),%r9
0x7f63c0395a58 <ff_vp9_avg4_8_mmxext.loop>: movd (%rdx),%mm0
0x7f63c0395a5b <ff_vp9_avg4_8_mmxext.loop+3>: movd (%rdx,%rcx,1),%mm1
0x7f63c0395a5f <ff_vp9_avg4_8_mmxext.loop+7>: movd (%rdx,%rcx,2),%mm2
0x7f63c0395a63 <ff_vp9_avg4_8_mmxext.loop+11>: movd (%rdx,%rax,1),%mm3
0x7f63c0395a67 <ff_vp9_avg4_8_mmxext.loop+15>: lea (%rdx,%rcx,4),%rdx
0x7f63c0395a6b <ff_vp9_avg4_8_mmxext.loop+19>: pavgb (%rdi),%mm0
0x7f63c0395a6e <ff_vp9_avg4_8_mmxext.loop+22>: pavgb (%rdi,%rsi,1),%mm1
0x7f63c0395a72 <ff_vp9_avg4_8_mmxext.loop+26>: pavgb (%rdi,%rsi,2),%mm2
=> 0x7f63c0395a76 <ff_vp9_avg4_8_mmxext.loop+30>: pavgb (%rdi,%r9,1),%mm3
0x7f63c0395a7b <ff_vp9_avg4_8_mmxext.loop+35>: movd %mm0,(%rdi)
0x7f63c0395a7e <ff_vp9_avg4_8_mmxext.loop+38>: movd %mm1,(%rdi,%rsi,1)
0x7f63c0395a82 <ff_vp9_avg4_8_mmxext.loop+42>: movd %mm2,(%rdi,%rsi,2)
0x7f63c0395a86 <ff_vp9_avg4_8_mmxext.loop+46>: movd %mm3,(%rdi,%r9,1)
0x7f63c0395a8b <ff_vp9_avg4_8_mmxext.loop+51>: lea (%rdi,%rsi,4),%rdi
0x7f63c0395a8f <ff_vp9_avg4_8_mmxext.loop+55>: sub $0x4,%r8d
0x7f63c0395a93 <ff_vp9_avg4_8_mmxext.loop+59>: jne 0x7f63c0395a58 <ff_vp9_avg4_8_mmxext.loop>
0x7f63c0395a95 <..@6638.branch_instr>: repz ret
Almost the same, but notice how the load from the first buffer and the write back to the second buffer are now a 4-byte movd
instead of an 8-byte movq
, but the pavgb
is still reading 8 bytes. However, the results of reading those extra pixels aren't written back, so I think this isn't a security bug.
Specifically, it looks like we're trying to operate on a 4×4 region of a 1920×960 8bpp plane (a chroma plane from a 3840×1920 4:2:0 YUV image, apparently), and I found a variable uvoff
with a value equal to 1920*956+1916
, which is 4×4 from the end, so we're reading 4 pixels off the end of the image on the last line, which matches comment #11.
A workaround that seems to work: comment out this line of vp9dsp_init.c
, so that we don't use the broken assembly function and fall back to something else (probably C code).
Updated•2 years ago
|
Comment 16•2 years ago
|
||
:alwu, any chance you could take a look at this?
STR to easily reproduce the issue without patches from bug 1757802 is to load the testcase video from comment 2 in a debug build of Firefox. See comment 11, comment 14, and comment 15 for some debugging context.
Comment 18•2 years ago
|
||
I can look, yes. I think the over-read is intentional because FFmpeg always over-allocates buffers, but it's possible that this isn't the case for user-supplied buffers. There's two ways to fix this: ask user-supplied buffers to be over-allocated also, or alternatively to indeed prevent the overread by adding one extra instruction for reading the 4 bytes before pavgb on the already-read data.
Comment 19•2 years ago
|
||
I'm assuming this would fix it:
http://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297028.html
Apparently the over-allocation is only present in FFmpeg's default allocator and is not part of the API or callback requirements, so the bug is in our decoder and the above patch is then the correct way to fix it. Could you test that for me on your end?
Updated•2 years ago
|
Comment 20•2 years ago
|
||
(In reply to Ronald S. Bultje from comment #19)
I'm assuming this would fix it:
http://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297028.htmlApparently the over-allocation is only present in FFmpeg's default allocator and is not part of the API or callback requirements, so the bug is in our decoder and the above patch is then the correct way to fix it. Could you test that for me on your end?
The test case from comment 2 appears to run fine with that fix, thanks.
Comment 21•2 years ago
|
||
Looks like a second patch was posted to the ffmpeg mailing list: http://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297029.html
Comment 22•2 years ago
|
||
Should be fixed upstream. I indeed slightly modified the patch. Let me know if there's anything else I can help with.
Updated•2 years ago
|
Comment 23•2 years ago
|
||
Not sure who to ask to get an updated version of the code from ffmpeg in-tree. ni? :jimm who might be know/be able to.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 24•2 years ago
|
||
Last update was a couple months ago by Stransky. We're not sure level of complexity here, would be good to get this into updatebot's queue.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 25•2 years ago
|
||
I'm updating ffmpeg, it's building / working locally here on my linux box, but I have all OSes handy.
Assignee | ||
Comment 26•2 years ago
|
||
Assignee | ||
Comment 27•2 years ago
|
||
Depends on D150970
Assignee | ||
Comment 28•2 years ago
|
||
Depends on D150971
Assignee | ||
Comment 29•2 years ago
|
||
Depends on D150972
Assignee | ||
Comment 30•2 years ago
|
||
Depends on D150973
Assignee | ||
Comment 31•2 years ago
|
||
This is what the documentation says we should be doing (and it's clearly the
right thing to do). We miss decoding a packet otherwise.
Depends on D150974
Comment 32•2 years ago
|
||
Is it possible to update ffvpx via update-bot?
Assignee | ||
Comment 33•2 years ago
|
||
The update process is a bit weird for now, but I'll have a go at it after landing this.
Updated•2 years ago
|
Assignee | ||
Comment 34•2 years ago
|
||
Depends on D150973
Updated•2 years ago
|
Comment 35•2 years ago
|
||
Comment 36•2 years ago
|
||
Backed out for causing build bustages.
Backout link: https://hg.mozilla.org/integration/autoland/rev/96609971f3cf351e0f8d3fe309add7147df405eb
Failure log: https://treeherder.mozilla.org/logviewer?job_id=385027489&repo=autoland&lineNumber=36619
Comment 37•2 years ago
|
||
Comment 38•2 years ago
•
|
||
Backout- > too big for the soft-freeze period
Backout link: https://hg.mozilla.org/integration/autoland/rev/e433aaa78ab4960fcd770860678223ba2221e8cc
Build bustage: https://treeherder.mozilla.org/logviewer?job_id=385131038&repo=autoland
Assignee | ||
Comment 39•2 years ago
|
||
Note to self to add configure files for Linux Desktop aarch64 when relanding.
Comment 40•2 years ago
|
||
There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:padenot, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.
Comment 41•2 years ago
|
||
Per comment39, paul will add configure files for Linux Desktop aarch64 after he is back from PTO.
Comment 42•2 years ago
|
||
Comment 43•2 years ago
•
|
||
Backed out 7 changesets (Bug 1765480) for causing build bustages on config.h.
Backout link
Push with failures <--> B
Failure Log
Also Bx failure log
Comment 44•2 years ago
|
||
Comment 45•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/355596087041
https://hg.mozilla.org/mozilla-central/rev/8b8b5d364961
https://hg.mozilla.org/mozilla-central/rev/c174bd2624e2
https://hg.mozilla.org/mozilla-central/rev/2b8ef32d5cbe
https://hg.mozilla.org/mozilla-central/rev/86f5be2f7d24
https://hg.mozilla.org/mozilla-central/rev/c0bdf286aeea
https://hg.mozilla.org/mozilla-central/rev/1d959fb3ed39
Updated•2 years ago
|
Description
•