Closed Bug 1881185 Opened 1 year ago Closed 10 months ago

Crash in [@ put_8tap_4h_43]

Categories

(Core :: Audio/Video: Playback, defect)

ARM64
All
defect

Tracking

()

RESOLVED FIXED
136 Branch
Tracking Status
firefox-esr128 --- wontfix
firefox124 --- wontfix
firefox125 --- wontfix
firefox133 --- wontfix
firefox134 --- wontfix
firefox135 --- fixed
firefox136 --- fixed

People

(Reporter: RyanVM, Assigned: az)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/8002f140-24bf-4294-b6ed-78f510240219

Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS

Top 10 frames of crashing thread:

0  libmozavcodec.dylib  put_8tap_4h_43  
1  libmozavcodec.dylib  put_regular4_hv_neon  media/ffvpx/libavcodec/aarch64/vp9dsp_init_aarch64.c:100
2  libmozavcodec.dylib  mc_chroma_unscaled  media/ffvpx/libavcodec/vp9recon.c
2  libmozavcodec.dylib  inter_pred_8bpp  media/ffvpx/libavcodec/vp9_mc_template.c:418
3  libmozavcodec.dylib  inter_recon  media/ffvpx/libavcodec/vp9recon.c:593
3  libmozavcodec.dylib  ff_vp9_inter_recon_8bpp  media/ffvpx/libavcodec/vp9recon.c:648
4  libmozavcodec.dylib  ff_vp9_decode_block  media/ffvpx/libavcodec/vp9block.c:1397
5  libmozavcodec.dylib  decode_sb  media/ffvpx/libavcodec/vp9.c
6  libmozavcodec.dylib  decode_sb  media/ffvpx/libavcodec/vp9.c:1130
7  libmozavcodec.dylib  decode_sb  media/ffvpx/libavcodec/vp9.c:1135

Adding another signature that looks quite close to the first one, crash addresses also look similar.

Crash Signature: [@ put_8tap_4h_43] → [@ put_8tap_4h_43] [@ put_8tap_4h_34]

And more signatures, I suspect we might be looking at a single installation sending all of these.

Crash Signature: [@ put_8tap_4h_43] [@ put_8tap_4h_34] → [@ put_8tap_4h_43] [@ put_8tap_4h_34] [@ put_8tap_16h_43] [@ put_8tap_8h_34]

And one more... these crashes all seem to be coming from a single ARM-based machine.

Crash Signature: [@ put_8tap_4h_43] [@ put_8tap_4h_34] [@ put_8tap_16h_43] [@ put_8tap_8h_34] → [@ put_8tap_4h_43] [@ put_8tap_4h_34] [@ put_8tap_16h_43] [@ put_8tap_8h_34] [@ put_8tap_8h_43 ]
Hardware: Unspecified → ARM64
Severity: -- → S3

https://crash-pings.mozilla.org/ now shows this to be the nr. 1 topcrasher on macOS, affecting at least 250 clients and causing thousands of crashes. We're not seeing this on Socorro because it's in the RDD process so people never submit reports. Raising to S2.

Severity: S3 → S2

Added a couple more signatures which are clearly related.

Crash Signature: [@ put_8tap_4h_43] [@ put_8tap_4h_34] [@ put_8tap_16h_43] [@ put_8tap_8h_34] [@ put_8tap_8h_43 ] → [@ avg_8tap_16h_34] [@ put_8tap_16h_34] [@ put_8tap_16h_43] [@ put_8tap_4h_34] [@ put_8tap_4h_43] [@ put_8tap_8h_34] [@ put_8tap_8h_43]
See Also: → 1906684

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit BugBot documentation.

Severity: S2 → S3

The crash volume is much higher than what we can see on Socorro. Crash ping data indicates that there are thousand of instances per week with hundreds of users being affected.

Severity: S3 → S2

Symbols for Linux 64-bit ARM builds show that this is a problem there too, so likely tied to the AArch64-specific code paths and not to any particular operating system. It might be worth looking into ffmpeg's tracker if they're already aware of this issue or not.

OS: macOS → All

From our mastodon conversation I gather that firefox allocates the frame buffers as '1280 * 640 + 640 * 320 + 640 * 320 == 300 * 4096', i.e. the buffer size matches the pixel data exactly? This triggers the issue.

There is a mismatch in FFmpeg's VP9 decoder on when to emulate edges and how much the 8tap NEON mc filter over reads from the reference frame. This is hidden within FFmpeg since its avcodec_default_get_buffer2() over allocates the planes by at least 31-bytes, see https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/HEAD:/libavcodec/get_buffer.c#l122

A quick workaround could be to do the same in firefox. Fixing the issue in FFmpeg might take a few days.

Since nightly and release are affected, beta will likely be affected too.
For more information, please visit BugBot documentation.

Thank you very much for your help here Janne.

Assignee: nobody → azebrowski
Status: NEW → ASSIGNED

We're not currently using libavcodec/arm/vp9mc_neon.S so it's not included in the attached patch.

media/ffvpx/libavcodec/vp9recon.c is generating a code coverage warning. Do we not test VP9 decoding in CI?

(In reply to Janne Grunau from comment #10)

There is a mismatch in FFmpeg's VP9 decoder on when to emulate edges and how much the 8tap NEON mc filter over reads from the reference frame. This is hidden within FFmpeg since its avcodec_default_get_buffer2() over allocates the planes by at least 31-bytes, see https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/HEAD:/libavcodec/get_buffer.c#l122

A quick workaround could be to do the same in firefox.

Hi guys. I've been discussing the patch with Janne upstream. Thanks to him for finding the root cause of the crash. Upstream, buffers are allocated with extra padding (31 bytes) to allow minor overreads in individual decoder implementations. Individual decoder implementations (including SIMD optimized variants for particular target systems) may use this at their discretion to do minor overreads, as we see here.

Janne's patch fixes the issue, but other issues may remain in other corners of the FFmpeg codebase. A more generic approach would be to over-allocate the buffer allocated by Firefox in the AVCodecContext.get_buffer() callback also - by the same 31 bytes also. This would be identical to what upstream does and would therefore be more future-proof.

Would you guys be open to such a change?

Janne's original patch-set was merged upstream yesterday. Providing padding in the buffer returned from the get_buffer() callback would still be desirable but VP9 now no longer relies on that. (Other codecs may, though.)

We should very much do the same as upstream indeed, thanks for taking the time to mention it.

I filed this as https://bugzilla.mozilla.org/show_bug.cgi?id=1940034.

Pushed by azebrowski@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/98d52ba85f81 Add and apply ffmpeg patch to fix VP9 decoding edge emulation and potential over-reads on aarch64. r=alwu
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 136 Branch

The patch landed in nightly and beta is affected.
:az, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox135 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(azebrowski)

Comment on attachment 9444751 [details]
Bug 1881185 - Add and apply ffmpeg patch to fix VP9 decoding edge emulation and potential over-reads on aarch64.

Beta/Release Uplift Approval Request

  • User impact if declined/Reason for urgency: Top crasher on mac
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Patch from upstream where quite a few people have already looked at it
  • String changes made/needed:
  • Is Android affected?: Yes
Flags: needinfo?(azebrowski)
Attachment #9444751 - Flags: approval-mozilla-beta?

Comment on attachment 9444751 [details]
Bug 1881185 - Add and apply ffmpeg patch to fix VP9 decoding edge emulation and potential over-reads on aarch64.

Approved for 135.0b4.

Attachment #9444751 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: