
Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 4

•

3 years ago

The regressing bug 1757802 has been backed out

Flags: needinfo?(nika)

Pascal Chevrel:pascalc

Comment 5

•

3 years ago

Fixed by backout.

Status: NEW → RESOLVED

Closed: 3 years ago

status-firefox101: affected → fixed

Resolution: --- → FIXED

Updated

•

3 years ago

Target Milestone: --- → 101 Branch

Comment 6

•

3 years ago

Bob, can you check if the STR work for you?

Assignee: nobody → bobowencode

Flags: needinfo?(bobowencode)

Comment 7

•

3 years ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)

Bob, can you check if the STR work for you?

I must have been using a slightly older build.
I can reproduce this before the backout (STR from comment 2), but not after.

Flags: needinfo?(bobowencode)

Comment 8

•

3 years ago

(In reply to Bob Owen (:bobowen) from comment #7)

I can reproduce this before the backout (STR from comment 2), but not after.

Can we reproduce this with an ASan build with the Shmem patch (or from before the backout)? Hopefully that will give us a stack from when the memory was unmapped.

Flags: needinfo?(bobowencode)

Updated

•

3 years ago

Priority: -- → P2

Comment 9

•

3 years ago

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #8)

(In reply to Bob Owen (:bobowen) from comment #7)

I can reproduce this before the backout (STR from comment 2), but not after.

Can we reproduce this with an ASan build with the Shmem patch (or from before the backout)? Hopefully that will give us a stack from when the memory was unmapped.

I've tried using an ASan build from just before the backout, but I don't seem to get anything interesting logged out.

Flags: needinfo?(bobowencode)

Comment 10

•

2 years ago

From looking at some of the media code around shmems, it seems like there may be some weird interaction between the shmem recycler and the changes in the patch (https://searchfox.org/mozilla-central/rev/9902932742fcdce2c956eeb81fd38350f5394ab2/dom/media/ipc/ShmemRecycleAllocator.h#13-56). I don't know what behavior would have changed in a bad way around this, but perhaps there's some edge-case I'm not thinking of.

If we can get it to reproduce, it might be worth seeing if it stops being an issue after we stub out the recycler to just allocate a new shmem every time.

David Parks [:handyman]

Comment 11

•

2 years ago

•

Edited

These crashes happen on a page boundary. In fact, the minidump for the crash in comment 0 shows the code actually accesses nearby memory in the prior page, just before the crash address of 0x0000017bb390c000. Notably, the instructions are:

00007FFB22A5A464 0F E0 01             pavgb       mm0,mmword ptr [rcx]  
00007FFB22A5A467 0F E0 0C 11          pavgb       mm1,mmword ptr [rcx+rdx]  
00007FFB22A5A46B 0F E0 14 51          pavgb       mm2,mmword ptr [rcx+rdx*2]  
00007FFB22A5A46F 42 0F E0 1C 19       pavgb       mm3,mmword ptr [rcx+r11]

Where %rcx = 0x0000017BB390B87C, %rdx = 0x0000000000000280 and %r11 = 0x0000000000000780. The crash is on the last line. The address rcx+r11 works out to 0x17bb390bffc, which is 4 bytes before the crash address of 0x0000017bb390c000... and the instruction is reading 8 bytes. The other addresses that didn't crash are clearly on the same page as 0x17bb390bffc -- the page just before the invalid access.
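The address arithmetic above can be checked with a short sketch. The register values are the ones quoted from the minidump; the 4 KiB page size and the helper name `crosses_page` are assumptions made for illustration.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def crosses_page(addr: int, size: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True if the access [addr, addr + size) touches two pages."""
    first_page = addr // page_size
    last_page = (addr + size - 1) // page_size
    return first_page != last_page


# Register values quoted from the minidump.
rcx = 0x0000017BB390B87C  # base pointer of the first row
rdx = 0x280               # row stride
r11 = 0x780               # three times the stride

# The four memory operands of the pavgb instructions, in order.
rows = [rcx, rcx + rdx, rcx + 2 * rdx, rcx + r11]
last = rows[-1]

print(hex(last))                             # address of the faulting operand
print([crosses_page(a, 8) for a in rows])    # 8-byte reads, as pavgb performs
print(crosses_page(last, 4))                 # a 4-byte read at the same address
```

Only the last 8-byte operand straddles the page boundary; a 4-byte read at the same address would stay inside the valid page.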

Since Shmem can't be asked to allocate memory adjacent to other memory and this code looks to be treating all of this as one block of memory, this may not be about a freed Shmem. Our Shmems are bigger without bug 1757802's patch -- at least by 4 bytes since they would stick a 4-byte size at the end of the block. I don't think the size field is used outside of debug builds so the codec could just be ~~stomping on it~~ using garbage (it's a read access) and we don't notice.
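A rough model of why a 4-byte trailer could hide the over-read, assuming the mapping is rounded up to whole pages and that whatever follows the mapping is unmapped. The payload size, the page size and the function names are hypothetical; this is not the actual Shmem layout code.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def mapped_bytes(payload: int, trailer: int, page_size: int = PAGE_SIZE) -> int:
    """Round payload + trailer up to whole pages, as a page-granular mapping would."""
    total = payload + trailer
    return -(-total // page_size) * page_size  # ceiling division


def overread_faults(payload: int, trailer: int, overread: int) -> bool:
    """True if reading `overread` bytes past the payload leaves the mapping."""
    return payload + overread > mapped_bytes(payload, trailer)


payload = 10 * PAGE_SIZE  # hypothetical payload that ends exactly on a page boundary

print(overread_faults(payload, trailer=4, overread=4))  # trailer present
print(overread_faults(payload, trailer=0, overread=4))  # trailer removed
```

In this model the over-read only faults when the payload ends exactly at the end of the mapping, which is consistent with the crash appearing once the trailing size field was removed.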

Comment 12

•

2 years ago

Given that this now looks like a buffer overrun, I'm going to repurpose this bug and set its attributes accordingly.

Group: core-security

Severity: S2 → --

Status: RESOLVED → REOPENED

Has Regression Range: yes → ---

status-firefox100: unaffected → ---

status-firefox101: fixed → ---

status-firefox99: unaffected → ---

status-firefox-esr91: unaffected → ---

tracking-firefox101: blocking → ---

Keywords: crash, regression → csectype-bounds

OS: Windows 11 → Unspecified

Priority: P2 → --

No longer regressed by: 1757802

Resolution: FIXED → ---

Updated

•

2 years ago

Group: core-security → media-core-security

Comment 13

•

2 years ago

This reproduces reliably on a debug build on Linux, using the attached file; the guard pages are set up differently on debug builds (thanks to Nika for pointing that out). I've uploaded a recording to Pernosco, if that helps.

Comment 14

•

2 years ago

I've looked at the pernosco trace a bit, and identified which shmem region is being accessed out of bounds. There are some notes in the notebook which I've added to help with it. The shmem is a recycled shmem, and I've also identified which recycle callback for the shmem region happened most recently, along with the call which originally allocated it.

I'm guessing there's some subtle off-by-one error or similar in the decode logic, and I don't understand media well enough to figure out what it is, but hopefully with the context of this being a shmem texture and the stacks of where it is allocated etc. it'll be possible for someone who does know to figure out what is causing the issue.

Comment 15

•

2 years ago

I think I know what's going on here. Here's ff_vp9_avg8_8_mmxext, which if I understand the naming scheme (see init_fpel_func, and how it's used, and the comments on VP9DSPContext::mc) operates on an 8×n region of 8bpp data:

   0x7f63c0395aa0 <ff_vp9_avg8_8_mmxext>:       lea    (%rcx,%rcx,2),%rax
   0x7f63c0395aa4 <ff_vp9_avg8_8_mmxext+4>:     lea    (%rsi,%rsi,2),%r9
   0x7f63c0395aa8 <ff_vp9_avg8_8_mmxext.loop>:  movq   (%rdx),%mm0
   0x7f63c0395aab <ff_vp9_avg8_8_mmxext.loop+3>:        movq   (%rdx,%rcx,1),%mm1
   0x7f63c0395aaf <ff_vp9_avg8_8_mmxext.loop+7>:        movq   (%rdx,%rcx,2),%mm2
   0x7f63c0395ab3 <ff_vp9_avg8_8_mmxext.loop+11>:       movq   (%rdx,%rax,1),%mm3
   0x7f63c0395ab7 <ff_vp9_avg8_8_mmxext.loop+15>:       lea    (%rdx,%rcx,4),%rdx
   0x7f63c0395abb <ff_vp9_avg8_8_mmxext.loop+19>:       pavgb  (%rdi),%mm0
   0x7f63c0395abe <ff_vp9_avg8_8_mmxext.loop+22>:       pavgb  (%rdi,%rsi,1),%mm1
   0x7f63c0395ac2 <ff_vp9_avg8_8_mmxext.loop+26>:       pavgb  (%rdi,%rsi,2),%mm2
   0x7f63c0395ac6 <ff_vp9_avg8_8_mmxext.loop+30>:       pavgb  (%rdi,%r9,1),%mm3
   0x7f63c0395acb <ff_vp9_avg8_8_mmxext.loop+35>:       movq   %mm0,(%rdi)
   0x7f63c0395ace <ff_vp9_avg8_8_mmxext.loop+38>:       movq   %mm1,(%rdi,%rsi,1)
   0x7f63c0395ad2 <ff_vp9_avg8_8_mmxext.loop+42>:       movq   %mm2,(%rdi,%rsi,2)
   0x7f63c0395ad6 <ff_vp9_avg8_8_mmxext.loop+46>:       movq   %mm3,(%rdi,%r9,1)
   0x7f63c0395adb <ff_vp9_avg8_8_mmxext.loop+51>:       lea    (%rdi,%rsi,4),%rdi
   0x7f63c0395adf <ff_vp9_avg8_8_mmxext.loop+55>:       sub    $0x4,%r8d
   0x7f63c0395ae3 <ff_vp9_avg8_8_mmxext.loop+59>:       jne    0x7f63c0395aa8 <ff_vp9_avg8_8_mmxext.loop>
   0x7f63c0395ae5 <..@6767.branch_instr>:       repz ret

Here's the version that's crashing, intended for a 4×n area:

   0x7f63c0395a50 <ff_vp9_avg4_8_mmxext>:       lea    (%rcx,%rcx,2),%rax
   0x7f63c0395a54 <ff_vp9_avg4_8_mmxext+4>:     lea    (%rsi,%rsi,2),%r9
   0x7f63c0395a58 <ff_vp9_avg4_8_mmxext.loop>:  movd   (%rdx),%mm0
   0x7f63c0395a5b <ff_vp9_avg4_8_mmxext.loop+3>:        movd   (%rdx,%rcx,1),%mm1
   0x7f63c0395a5f <ff_vp9_avg4_8_mmxext.loop+7>:        movd   (%rdx,%rcx,2),%mm2
   0x7f63c0395a63 <ff_vp9_avg4_8_mmxext.loop+11>:       movd   (%rdx,%rax,1),%mm3
   0x7f63c0395a67 <ff_vp9_avg4_8_mmxext.loop+15>:       lea    (%rdx,%rcx,4),%rdx
   0x7f63c0395a6b <ff_vp9_avg4_8_mmxext.loop+19>:       pavgb  (%rdi),%mm0
   0x7f63c0395a6e <ff_vp9_avg4_8_mmxext.loop+22>:       pavgb  (%rdi,%rsi,1),%mm1
   0x7f63c0395a72 <ff_vp9_avg4_8_mmxext.loop+26>:       pavgb  (%rdi,%rsi,2),%mm2
=> 0x7f63c0395a76 <ff_vp9_avg4_8_mmxext.loop+30>:       pavgb  (%rdi,%r9,1),%mm3
   0x7f63c0395a7b <ff_vp9_avg4_8_mmxext.loop+35>:       movd   %mm0,(%rdi)
   0x7f63c0395a7e <ff_vp9_avg4_8_mmxext.loop+38>:       movd   %mm1,(%rdi,%rsi,1)
   0x7f63c0395a82 <ff_vp9_avg4_8_mmxext.loop+42>:       movd   %mm2,(%rdi,%rsi,2)
   0x7f63c0395a86 <ff_vp9_avg4_8_mmxext.loop+46>:       movd   %mm3,(%rdi,%r9,1)
   0x7f63c0395a8b <ff_vp9_avg4_8_mmxext.loop+51>:       lea    (%rdi,%rsi,4),%rdi
   0x7f63c0395a8f <ff_vp9_avg4_8_mmxext.loop+55>:       sub    $0x4,%r8d
   0x7f63c0395a93 <ff_vp9_avg4_8_mmxext.loop+59>:       jne    0x7f63c0395a58 <ff_vp9_avg4_8_mmxext.loop>
   0x7f63c0395a95 <..@6638.branch_instr>:       repz ret

Almost the same, but notice how the load from the first buffer and the write back to the second buffer are now a 4-byte movd instead of an 8-byte movq, but the pavgb is still reading 8 bytes. However, the results of reading those extra pixels aren't written back, so I think this isn't a security bug.
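To illustrate why the extra bytes never reach the output, here is a small model of one row of the 4-wide routine. The per-byte rounding average is what PAVGB computes; the function names and the sample values are made up for the sketch.

```python
def pavgb(a: bytes, b: bytes) -> bytes:
    """Per-byte unsigned rounding average, as PAVGB computes: (x + y + 1) >> 1."""
    assert len(a) == len(b) == 8
    return bytes((x + y + 1) >> 1 for x, y in zip(a, b))


def avg4_row(dst_mem: bytes, src_mem: bytes) -> bytes:
    """Model one row: 4-byte movd load, 8-byte pavgb, 4-byte movd store."""
    mm = src_mem[:4] + bytes(4)   # movd zero-extends into the 64-bit register
    mm = pavgb(mm, dst_mem[:8])   # the memory operand is a full 8 bytes
    return mm[:4]                 # movd writes back only the low 4 bytes


src = bytes([0, 1, 254, 255])
dst_a = bytes([10, 20, 30, 40, 1, 2, 3, 4])          # some trailing bytes
dst_b = bytes([10, 20, 30, 40, 255, 255, 255, 255])  # different trailing bytes

print(list(avg4_row(dst_a, src)))
print(avg4_row(dst_a, src) == avg4_row(dst_b, src))
```

The stored result depends only on the first 4 bytes of each operand, so the over-read changes nothing that is written, but the read itself can still fault if those extra 4 bytes are unmapped.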

Specifically, it looks like we're trying to operate on a 4×4 region of a 1920×960 8bpp plane (a chroma plane from a 3840×1920 4:2:0 YUV image, apparently), and I found a variable uvoff with a value equal to 1920*956+1916, which is 4×4 from the end, so we're reading 4 pixels off the end of the image on the last line, which matches comment #11.
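The offset arithmetic can be spelled out as follows. The plane dimensions and `uvoff` come from the comment above; treating the stride as equal to the width (no row padding) is an assumption, and `row_overrun` is a name invented for the sketch.

```python
WIDTH, HEIGHT = 1920, 960   # chroma plane dimensions from the comment
STRIDE = WIDTH              # assumption: rows are not padded
plane_size = STRIDE * HEIGHT

uvoff = 1920 * 956 + 1916   # offset of the 4x4 block, as observed


def row_overrun(offset: int, row: int, access_bytes: int) -> int:
    """Bytes read past the end of the plane for the given row of the block."""
    start = offset + row * STRIDE
    return max(0, start + access_bytes - plane_size)


overruns_8 = [row_overrun(uvoff, r, 8) for r in range(4)]  # pavgb reads 8 bytes
overruns_4 = [row_overrun(uvoff, r, 4) for r in range(4)]  # a 4-byte read instead

print(overruns_8)
print(overruns_4)
```

Under these assumptions only the last row of the block runs past the plane, and only when 8 bytes are read.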

A workaround that seems to work: comment out this line of vp9dsp_init.c, so that we don't use the broken assembly function and fall back to something else (probably C code).

Updated

•

2 years ago

Assignee: bobowencode → nobody

Comment 16

•

2 years ago

:alwu, any chance you could take a look at this?

STR to easily reproduce the issue without patches from bug 1757802 is to load the testcase video from comment 2 in a debug build of Firefox. See comment 11, comment 14, and comment 15 for some debugging context.

Flags: needinfo?(alwu)

Comment 17

•

2 years ago

Ronald, can you have a look at this?

Flags: needinfo?(rsbultje)

Ronald S. Bultje

Comment 18

•

2 years ago

I can look, yes. I think the over-read is intentional because FFmpeg always over-allocates buffers, but it's possible that this isn't the case for user-supplied buffers. There are two ways to fix this: require user-supplied buffers to be over-allocated as well, or prevent the over-read by adding one extra instruction that reads only the 4 bytes first, so that pavgb then operates on already-read data.
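The two options can be compared in a toy model where reads past the end of a buffer raise, standing in for a guard page. Everything here (`GuardedBuffer`, `padded`, the 8-byte padding) is hypothetical and only sketches the idea; it is not FFmpeg's allocator or the actual patch.

```python
class GuardedBuffer:
    """Byte buffer whose reads raise IndexError past the end, like a guard page."""

    def __init__(self, data: bytes):
        self._data = bytes(data)

    def read(self, offset: int, size: int) -> bytes:
        if offset < 0 or offset + size > len(self._data):
            raise IndexError("read overruns the buffer")
        return self._data[offset:offset + size]


def avg(a: bytes, b: bytes) -> bytes:
    """Per-byte rounding average, matching pavgb."""
    return bytes((x + y + 1) >> 1 for x, y in zip(a, b))


def avg4_overreading(dst: GuardedBuffer, offset: int, src4: bytes) -> bytes:
    # Current shape: the averaging step reads 8 destination bytes, keeps 4.
    return avg(src4 + bytes(4), dst.read(offset, 8))[:4]


def avg4_exact(dst: GuardedBuffer, offset: int, src4: bytes) -> bytes:
    # Second option: read only the 4 needed bytes, then average.
    return avg(src4, dst.read(offset, 4))


def padded(data: bytes, padding: int = 8) -> GuardedBuffer:
    # First option: over-allocate the buffer (padding size is hypothetical).
    return GuardedBuffer(data + bytes(padding))


def faults(fn, *args) -> bool:
    """Return True if calling fn(*args) overruns the buffer."""
    try:
        fn(*args)
    except IndexError:
        return True
    return False


data = bytes(range(16))
tight = GuardedBuffer(data)
src4 = bytes([100] * 4)

print(faults(avg4_overreading, tight, 12, src4))         # tight buffer, 8-byte read
print(faults(avg4_exact, tight, 12, src4))               # tight buffer, 4-byte read
print(faults(avg4_overreading, padded(data), 12, src4))  # padded buffer, 8-byte read
```

Both options produce the same 4 output bytes; they differ only in whether the caller or the routine takes responsibility for the extra 4 bytes.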

Flags: needinfo?(rsbultje)

Ronald S. Bultje

Comment 19

•

2 years ago

I'm assuming this would fix it:
http://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297028.html

Apparently the over-allocation is only present in FFmpeg's default allocator and is not part of the API or callback requirements, so the bug is in our decoder and the above patch is then the correct way to fix it. Could you test that for me on your end?

Updated

•

2 years ago

Flags: needinfo?(alwu)

Comment 20

•

2 years ago

(In reply to Ronald S. Bultje from comment #19)

I'm assuming this would fix it:
http://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297028.html

Apparently the over-allocation is only present in FFmpeg's default allocator and is not part of the API or callback requirements, so the bug is in our decoder and the above patch is then the correct way to fix it. Could you test that for me on your end?

The test case from comment 2 appears to run fine with that fix, thanks.

Comment 21

•

2 years ago

Looks like a second patch was posted to the ffmpeg mailing list: http://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297029.html

Ronald S. Bultje

Comment 22

•

2 years ago

Should be fixed upstream. I indeed slightly modified the patch. Let me know if there's anything else I can help with.

Daniel Veditz [:dveditz]

Updated

•

2 years ago

Group: media-core-security

Keywords: crash, testcase

Comment 23

•

2 years ago

Not sure who to ask to get an updated version of the code from ffmpeg in-tree. ni? :jimm, who might know or be able to.

Flags: needinfo?(jmathies)

Updated

•

2 years ago

Blocks: media-triage

Flags: needinfo?(jmathies)

Updated

•

2 years ago

Summary: Crash in [@ ff_vp9_avg4_8_mmxext] → Crash in [@ ff_vp9_avg4_8_mmxext] (ffmpeg update needed)

Comment 24

•

2 years ago

Last update was a couple of months ago by Stransky. We're not sure of the level of complexity here; it would be good to get this into updatebot's queue.

Severity: -- → S4

Priority: -- → P3

Assignee

Updated

•

2 years ago

Assignee: nobody → padenot

Assignee

Comment 25

•

2 years ago

I'm updating ffmpeg, it's building / working locally here on my linux box, but I have all OSes handy.

Assignee

Comment 26

•

2 years ago

Attached file Bug 1765480 - Remove the file ffvpx/FILES and prefer rsync to update ffvpx. r?alwu — Details

Assignee

Comment 27

•

2 years ago

Attached file Bug 1765480 - Overhaul ffvpx/README_MOZILLA. r?alwu — Details

Depends on D150970

Assignee

Comment 28

•

2 years ago

Attached file Bug 1765480 - Regenerate config* files for ffvpx on all platforms needed, splitting off `config_components.h`. r?alwu — Details

Depends on D150971

Assignee

Comment 29

•

2 years ago

Attached file Bug 1765480 - Update ffvpx to a recent ffmpeg version, reapply the in-tree patch, fix moz.build for the new files, fix the symbol files. r?alwu — Details

Depends on D150972

Assignee

Comment 30

•

2 years ago

Attached file Bug 1765480 - Switch in-tree FFVPX PDM to use header from version 59. r?alwu — Details

Depends on D150973

Assignee

Comment 31

•

2 years ago

Attached file Bug 1765480 - "send" before "receive"-ing when decoding audio using ffmpeg. r?alwu — Details

This is what the documentation says we should be doing (and it's clearly the
right thing to do). We miss decoding a packet otherwise.
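A toy model of why the order matters. `ToyDecoder` is a hypothetical stand-in that produces one frame per packet; it only mimics the send/receive shape of FFmpeg's `avcodec_send_packet` / `avcodec_receive_frame` API and is not the code touched by this patch.

```python
from collections import deque


class ToyDecoder:
    """Hypothetical send/receive style decoder: one frame per packet."""

    def __init__(self):
        self._ready = deque()

    def send(self, packet):
        self._ready.append(f"frame({packet})")

    def receive(self):
        """Return a decoded frame, or None when more input is needed."""
        return self._ready.popleft() if self._ready else None


def decode_send_first(packets):
    dec, out = ToyDecoder(), []
    for p in packets:
        dec.send(p)  # feed the packet, then collect whatever is ready
        while (frame := dec.receive()) is not None:
            out.append(frame)
    return out


def decode_receive_first(packets):
    dec, out = ToyDecoder(), []
    for p in packets:
        while (frame := dec.receive()) is not None:
            out.append(frame)
        dec.send(p)  # the frame for this packet is never collected for the last one
    return out


packets = ["a", "b", "c"]
print(decode_send_first(packets))
print(decode_receive_first(packets))
```

In this model, receiving before sending leaves the frame for the final packet inside the decoder unless a separate drain step follows, which is one way a packet could go missing.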

Depends on D150974

Assignee

Updated

•

2 years ago

Blocks: 1757802

C.M.Chang[:chunmin]

Comment 32

•

2 years ago

Is it possible to update ffvpx via update-bot?

Assignee

Comment 33

•

2 years ago

The update process is a bit weird for now, but I'll have a go at it after landing this.

Updated

•

2 years ago

No longer blocks: media-triage

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=ed10a546db4fab2e2d7824f96f73fe89902f7aff&selectedTaskRun=HjNHpszASUqi8913gI75dA.0

Assignee

Comment 34

•

2 years ago

Attached file Bug 1765480 - Conditionally include bsf, codec and parser list with CONFIG_* macros. r?alwu — Details

Depends on D150973

Phabricator Automation

Updated

•

2 years ago

Attachment #9285794 - Attachment description: WIP: Bug 1765480 - Conditionally include bsf, codec and parser list with CONFIG_* macros.r ?alwu → Bug 1765480 - Conditionally include bsf, codec and parser list with CONFIG_* macros. r?alwu

Pulsebot

Comment 35

•

2 years ago

Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e3017c8a70af
Remove the file ffvpx/FILES and prefer rsync to update ffvpx. r=alwu
https://hg.mozilla.org/integration/autoland/rev/37cfee2c325e
Overhaul ffvpx/README_MOZILLA. r=alwu
https://hg.mozilla.org/integration/autoland/rev/2979c28076f7
Regenerate config* files for ffvpx on all platforms needed, splitting off `config_components.h`. r=alwu
https://hg.mozilla.org/integration/autoland/rev/e393cf609b9b
Update ffvpx to a recent ffmpeg version, reapply the in-tree patch, fix moz.build for the new files, fix the symbol files. r=alwu
https://hg.mozilla.org/integration/autoland/rev/c0efff24b361
Conditionally include bsf, codec and parser list with CONFIG_* macros. r=alwu
https://hg.mozilla.org/integration/autoland/rev/3a362936969a
Switch in-tree FFVPX PDM to use header from version 59. r=alwu
https://hg.mozilla.org/integration/autoland/rev/ed10a546db4f
"send" before "receive"-ing when decoding audio using ffmpeg. r=alwu

Atila Butkovits

Comment 36

•

2 years ago

Backed out for causing build bustages.

Backout link: https://hg.mozilla.org/integration/autoland/rev/96609971f3cf351e0f8d3fe309add7147df405eb

Failure log: https://treeherder.mozilla.org/logviewer?job_id=385027489&repo=autoland&lineNumber=36619

Flags: needinfo?(padenot)

Takanori MATSUURA

Updated

•

2 years ago

Comment 37

•

2 years ago

Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/94e2e399316f
Remove the file ffvpx/FILES and prefer rsync to update ffvpx. r=alwu
https://hg.mozilla.org/integration/autoland/rev/3ea1516df102
Overhaul ffvpx/README_MOZILLA. r=alwu
https://hg.mozilla.org/integration/autoland/rev/f877e032405a
Regenerate config* files for ffvpx on all platforms needed, splitting off `config_components.h`. r=alwu
https://hg.mozilla.org/integration/autoland/rev/e1d1d4cc9835
Update ffvpx to a recent ffmpeg version, reapply the in-tree patch, fix moz.build for the new files, fix the symbol files. r=alwu
https://hg.mozilla.org/integration/autoland/rev/677ab47a9f49
Conditionally include bsf, codec and parser list with CONFIG_* macros. r=alwu
https://hg.mozilla.org/integration/autoland/rev/93bc1d846152
Switch in-tree FFVPX PDM to use header from version 59. r=alwu
https://hg.mozilla.org/integration/autoland/rev/82d99be977dd
"send" before "receive"-ing when decoding audio using ffmpeg. r=alwu

Cristian Tuns

Comment 38

•

2 years ago

•

Edited

Backed out: too big for the soft-freeze period.
Backout link: https://hg.mozilla.org/integration/autoland/rev/e433aaa78ab4960fcd770860678223ba2221e8cc
Build bustage: https://treeherder.mozilla.org/logviewer?job_id=385131038&repo=autoland