1352894 - Crash in ff_vp9_loop_filter_v_16_16_sse2

Reporter

Description

•

8 years ago

This bug was filed from the Socorro interface and is report bp-09052897-8633-4191-a8c8-1f93d2170331.

There have been 71 occurrences of this crash in the past 7 days on Nightly, across 6 installations, and smaller numbers on other channels going back to FF47.

jya, any ideas?

Flags: needinfo?(jyavenard)

Andrew McCreight [:mccr8]

Comment 1

•

8 years ago

If you search on crash-stats for ff_vp9_, there are at least 50 similar signatures, like ff_vp9_loop_filter_v_88_16_sse2, ff_vp9_idct_idct_32x32_add_ssse3, ff_vp9_idct_idct_32x32_add_avx, ff_vp9_loop_filter_h_16_16_sse2, etc.
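To see how those signatures cluster, one option is to strip the instruction-set suffix and group what remains. The following is only a minimal sketch: the suffix list and the helper names `split_signature` and `group_by_family` are assumptions made for illustration, and the input is just the handful of signatures named in this comment, not a crash-stats export.

```python
from collections import defaultdict

# Instruction-set suffixes seen in the signatures quoted in this comment.
# Assumption: this short list is illustrative, not exhaustive.
ISA_SUFFIXES = ("sse2", "ssse3", "avx")


def split_signature(signature):
    """Split e.g. 'ff_vp9_loop_filter_v_16_16_sse2' into (family, isa)."""
    base, _, suffix = signature.rpartition("_")
    if suffix in ISA_SUFFIXES:
        return base, suffix
    # No recognised suffix: keep the signature whole.
    return signature, None


def group_by_family(signatures):
    """Map each function family to the set of ISA variants that crashed."""
    groups = defaultdict(set)
    for sig in signatures:
        family, isa = split_signature(sig)
        groups[family].add(isa)
    return dict(groups)


signatures = [
    "ff_vp9_loop_filter_v_16_16_sse2",
    "ff_vp9_loop_filter_v_88_16_sse2",
    "ff_vp9_idct_idct_32x32_add_ssse3",
    "ff_vp9_idct_idct_32x32_add_avx",
    "ff_vp9_loop_filter_h_16_16_sse2",
]
groups = group_by_family(signatures)
for family in sorted(groups):
    print(family, sorted(groups[family]))
```

A family that crashes under several ISA variants hints at a problem above the individual assembly routine rather than inside one of them.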

Jean-Yves Avenard [:jya]

Comment 2

•

8 years ago

Ronald, any ideas? Is this something known that got fixed upstream and we could have missed during our code integration?

Flags: needinfo?(jyavenard) → needinfo?(rsbultje)

Ronald S. Bultje

Comment 3

•

8 years ago

Do you know what video was being watched (e.g. do we have access to the file, or can we contact the user) while the crash occurred? I'll review the code around the loop filter, but having a complete backtrace (with a "disass" around the assembly crash site) or file access would be very helpful.

Flags: needinfo?(rsbultje)

Jean-Yves Avenard [:jya]

Comment 4

•

8 years ago

Unfortunately, that particular crash report has no URL attached, and due to privacy concerns I wouldn't be able to provide much information anyway.

Being VP9, there's a good chance that the website is YouTube. Of the few reports that do have a URL attached, it is indeed YouTube.

In the past 7 days there have been 194 crashes; 53% are on Windows 7 and the rest on Windows XP.

Two public URLs that caused the crash:
https://www.youtube.com/watch?v=HrAnOqztv5w
https://www.youtube.com/watch?v=hA6VrZbv8Ck

Processors involved appear to always be either:
GenuineIntel family 15 model 4 stepping 9 | 2
or:
GenuineIntel family 15 model 4 stepping 3 | 2

So Pentium 4 (who still uses that in this day and age!?), all 32-bit Firefox, over 50% on Intel G41 Express graphics.
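For anyone tallying these CPU strings across many reports, a small parser helps. This is a rough sketch only: the field layout is inferred from the two strings quoted above, the trailing `| 2` is assumed to be a CPU count, and `parse_cpu_info` and `is_netburst_family` are hypothetical helper names.

```python
import re

# Layout inferred from the two strings quoted in this comment.
# Assumption: the optional trailing "| N" is a CPU count.
CPU_INFO_RE = re.compile(
    r"(?P<vendor>\w+) family (?P<family>\d+) model (?P<model>\d+) "
    r"stepping (?P<stepping>\d+)(?: \| (?P<count>\d+))?"
)


def parse_cpu_info(text):
    """Return a dict of the CPU fields, or None if the text does not match."""
    match = CPU_INFO_RE.fullmatch(text.strip())
    if match is None:
        return None
    info = {"vendor": match.group("vendor")}
    for key in ("family", "model", "stepping", "count"):
        value = match.group(key)
        info[key] = int(value) if value is not None else None
    return info


def is_netburst_family(info):
    """Intel family 15 is the NetBurst (Pentium 4 era) family."""
    return info["vendor"] == "GenuineIntel" and info["family"] == 15


info = parse_cpu_info("GenuineIntel family 15 model 4 stepping 9 | 2")
print(info, is_netburst_family(info))
```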

Ronald S. Bultje

Comment 5

•

8 years ago

I think the reason you see a P4 associated with it is that the crash is in an SSE2 function that has an SSSE3 counterpart. Anyone with a newer CPU would not see a crash in the SSE2 function, but either in the SSSE3 function (if it's a higher-up bug) or not at all (if the bug is specific to the SSE2 code). I see 192 crashes with the SSE2 version and 27 with the SSSE3 counterpart of the same function. That suggests it's not the specific SSE2 function that has a bug, but rather something higher-level (loopfilter template, loopfilter memory, ...).

For both SSE2 and SSSE3, I clicked on a few raw dumps, and it indeed seems they're all in 32bit code. Would I be able to conclude that this means the bug is likely 32bit-specific? Then, looking at the raw dumps, there is an offset in the first frame of the crashing thread; can we somehow link that to a specific instruction in the binary (disassembly)?

I've downloaded the first video using youtube-dl (HrAnOqztv5w) at all VP9 resolutions (id=242, 243, 244, 247, 248, 271, 278, 313) and played them in 32bit ffmpeg restricted to SSE2 with address sanitizer, and everything worked fine:

$ ls *.webm
Flawless FULL COVERAGE Foundation Routine-HrAnOqztv5w.242.webm
Flawless FULL COVERAGE Foundation Routine-HrAnOqztv5w.243.webm
Flawless FULL COVERAGE Foundation Routine-HrAnOqztv5w.244.webm
Flawless FULL COVERAGE Foundation Routine-HrAnOqztv5w.247.webm
Flawless FULL COVERAGE Foundation Routine-HrAnOqztv5w.248.webm
Flawless FULL COVERAGE Foundation Routine-HrAnOqztv5w.271.webm
Flawless FULL COVERAGE Foundation Routine-HrAnOqztv5w.278.webm
Flawless FULL COVERAGE Foundation Routine-HrAnOqztv5w.313.webm
$ for n in *.webm; do ./ffmpeg -i "${n}" -f null -v error -nostats -; done
$ git diff
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 16e0c92..20d81db 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -93,7 +93,8 @@ int av_get_cpu_flags(void)
         flags = get_cpu_flags();
         atomic_store_explicit(&cpu_flags, flags, memory_order_relaxed);
     }
-    return flags;
+    return flags & (AV_CPU_FLAG_MMX | AV_CPU_FLAG_MMXEXT |
+                    AV_CPU_FLAG_SSE | AV_CPU_FLAG_SSE2);
 }
 
 void av_set_cpu_flags_mask(int mask)
$ grep address config.mak
CFLAGS=-m32 -std=c11 -mdynamic-no-pic -fomit-frame-pointer -pthread -g -Wdeclaration-after-statement -Wall -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes -Wno-pointer-to-int-cast -Wstrict-prototypes -Wempty-body -Wno-parentheses -Wno-switch -Wno-format-zero-length -Wno-pointer-sign -Wno-unused-const-variable -O0 -fsanitize=address -fno-math-errno -fno-signed-zeros -mstack-alignment=16 -Qunused-arguments -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=return-type
LDFLAGS=-g -fsanitize=address -Wl,-dynamic,-search_paths_first -Qunused-arguments
$

I realize there are some issues with this test:
- it's a 64bit system running a 32bit binary (the fact that the bug occurs only on 32bit binaries and has a far higher crash count in SSE2 than in SSSE3 functions makes me believe, by distribution, that the crashing systems were 32bit as well, not 64bit systems running 32bit binaries);
- asan doesn't cover assembly (valgrind does, I believe, but unfortunately valgrind doesn't work on macOS Sierra);
- I'm on a Mac (not Windows).

However, things like memory management inside ffvp9 do not really differ by architecture or system. So, some questions that go more to the higher level (where I suspect the bug may lie):
- Can you reproduce the crash using a 32bit build on the videos above?
- How do you guys allocate memory for AVFrame data[] planes? Do you use a custom callback or do you let FFmpeg allocate buffers internally? Assuming you're using a custom implementation, do you have a link to the code for that? Does it provide the same characteristics as avcodec_default_get_buffer2() in terms of plane/line padding, line/buffer alignment, etc.? If you could reproduce the crash earlier, does it go away after removing the custom callback?
- How is av_malloc() implemented in your (32bit Windows) build? Do you know if symbols like HAVE_POSIX_MEMALIGN, HAVE_ALIGNED_MALLOC or HAVE_MEMALIGN are available for that target platform? (I'm assuming that HAVE_ALIGNED_MALLOC is 1 and the rest are 0.)
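The dispatch reasoning and the flag mask from the diff in this comment can be modelled in a few lines. This is a simplified sketch, not FFmpeg's dispatch code: the bit values are stand-ins (check libavutil/cpu.h for the real ones), and `pick_loop_filter`, `restrict_to_sse2` and the C fallback name are hypothetical.

```python
# Stand-in bit values for this sketch; the real ones live in libavutil/cpu.h.
AV_CPU_FLAG_MMX = 0x0001
AV_CPU_FLAG_MMXEXT = 0x0002
AV_CPU_FLAG_SSE = 0x0008
AV_CPU_FLAG_SSE2 = 0x0010
AV_CPU_FLAG_SSSE3 = 0x0080


def pick_loop_filter(cpu_flags):
    """Pick the most capable variant the flags allow (hypothetical table)."""
    if cpu_flags & AV_CPU_FLAG_SSSE3:
        return "ff_vp9_loop_filter_v_16_16_ssse3"
    if cpu_flags & AV_CPU_FLAG_SSE2:
        return "ff_vp9_loop_filter_v_16_16_sse2"
    return "loop_filter_v_16_16_c"  # hypothetical plain-C fallback name


def restrict_to_sse2(cpu_flags):
    """Apply the same mask as the patched av_get_cpu_flags() in the diff."""
    return cpu_flags & (AV_CPU_FLAG_MMX | AV_CPU_FLAG_MMXEXT |
                        AV_CPU_FLAG_SSE | AV_CPU_FLAG_SSE2)


# A CPU with SSSE3 never reaches the SSE2 variant unless the flags are masked.
newer_cpu = (AV_CPU_FLAG_MMX | AV_CPU_FLAG_MMXEXT | AV_CPU_FLAG_SSE |
             AV_CPU_FLAG_SSE2 | AV_CPU_FLAG_SSSE3)
print(pick_loop_filter(newer_cpu))
print(pick_loop_filter(restrict_to_sse2(newer_cpu)))
```

Under this model, crashes reported in the SSE2 variant can only come from CPUs without SSSE3, which fits the Pentium 4 reports.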

Flags: needinfo?(jyavenard)

Jean-Yves Avenard [:jya]

Comment 6

•

8 years ago

The config file used for the win32 build can be found here: https://dxr.mozilla.org/mozilla-central/source/media/ffvpx/config_win32.h

config.h was indeed produced on a 64-bit machine running the Visual Studio SDK in 32-bit mode, using an FFmpeg checkout of the same version as the one being resynced. It's then manually copied into our own tree.

The macros matching /HAVE_(MALLOC_H|ARC4RANDOM|LOCALTIME_R|MEMALIGN|POSIX_MEMALIGN)/ are set by the Mozilla build system. I'm not sure what those would be here; :glandium will know.

:glandium, what would those be on a Windows 32-bit build?

Flags: needinfo?(jyavenard) → needinfo?(mh+mozilla)

Jean-Yves Avenard [:jya]

Comment 7

•

8 years ago

1- I haven't. I don't have a 32-bit-only machine available these days.

2- For the AVFrame: if you're referring to the AVFrame used internally, we let FFmpeg manage the memory internally (we used to make use of callbacks but got rid of that over a year ago). If you're referring to the AVFrame passed to avcodec_decode_video2, where the result will be copied, then the allocation of that one is done here: https://dxr.mozilla.org/mozilla-central/source/dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp#154

Mike Hommey [:glandium]

Comment 8

•

8 years ago

(In reply to Jean-Yves Avenard [:jya] from comment #6)
> The config file used for the win32 build can be found there:
> https://dxr.mozilla.org/mozilla-central/source/media/ffvpx/config_win32.h
>
> config.h was indeed produced on a 64 bits machine running Visual Studio SDK
> in 32 bits mode, using a FFmpeg checkout of the same version as what's being
> resynced. It's then manually copied into our own tree.
>
> The macro: /HAVE_(MALLOC_H|ARC4RANDOM|LOCALTIME_R|MEMALIGN|POSIX_MEMALIGN)
> are as set by Mozilla build system. I'm not sure on what those would be
> here. :glandium will know
>
> :glandium what would those be on windows 32 build?

You can check yourself in the configure logs for Windows 32-bit builds, e.g. https://archive.mozilla.org/pub/firefox/nightly/2017/04/2017-04-06-03-02-06-mozilla-central/mozilla-central-win32-nightly-bm91-build1-build3.txt.gz for the last nightly:

03:12:53 INFO - checking for malloc.h... yes
03:13:07 INFO - checking for memalign... no
03:13:07 INFO - checking for posix_memalign... no

arc4random and localtime_r are not checked on Windows at all (the tests are skipped entirely), so the defines are not set.
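To pull those answers out of a full configure log, something like the sketch below would do. It assumes that each `checking for X... yes|no` line maps to a `HAVE_X` macro with non-word characters turned into underscores; that naming rule and the helper `have_macros` are assumptions for illustration, not a description of the Mozilla build system.

```python
import re

# Matches lines such as "03:13:07 INFO - checking for memalign... no".
CHECK_RE = re.compile(
    r"checking for (?P<name>[\w.]+)\.\.\. (?P<result>yes|no)\b"
)


def have_macros(log_text):
    """Map each configure check to an assumed HAVE_* macro value (1 or 0)."""
    macros = {}
    for match in CHECK_RE.finditer(log_text):
        # Assumed naming rule: upper-case, non-word characters become "_".
        macro = "HAVE_" + re.sub(r"\W", "_", match.group("name")).upper()
        macros[macro] = 1 if match.group("result") == "yes" else 0
    return macros


log_excerpt = """\
03:12:53 INFO - checking for malloc.h... yes
03:13:07 INFO - checking for memalign... no
03:13:07 INFO - checking for posix_memalign... no
"""
macros = have_macros(log_excerpt)
print(macros)
```

Checks that are skipped entirely, as with arc4random and localtime_r here, simply never appear in the result, which mirrors the defines not being set.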

Flags: needinfo?(mh+mozilla)

Ronald S. Bultje

Comment 9

•

8 years ago

We've tried to do some tests with ffmpeg developers on this feature. On 32bit Mac with tsan, the whole thing is clear. The fact that this is specific to Windows/32bit makes me suspect that it may be related to the manual alignment feature, but it's hard to know that for sure. In the logs that I looked at, the stack pointer was always 16-byte aligned.

Would it be possible for you guys to run a representative Firefox or ffmpeg build on such a file on a 32bit Windows machine under tsan, valgrind (I don't know if either of those makes sense), drmemory or something similar? I'm hoping for some new insights that I can't get right now because I lack the right combination of tools, machine, etc.
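The alignment arithmetic behind that suspicion is simple to state. The sketch below is generic and is not FFmpeg's or Firefox's allocator: it shows the 16-byte check one would apply to a stack pointer or buffer address taken from a dump, plus the usual round-up step of an over-allocate-and-align scheme. The helper names and example addresses are made up.

```python
def is_aligned(address, alignment=16):
    """True if the address is a multiple of the alignment."""
    return address % alignment == 0


def align_up(address, alignment=16):
    """Round up to the next multiple of alignment (must be a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def manually_aligned(raw_address, alignment=16):
    """Model an over-allocate-and-round-up scheme.

    Returns the aligned address and how many slack bytes were skipped.
    """
    aligned = align_up(raw_address, alignment)
    return aligned, aligned - raw_address


# Made-up addresses, standing in for values read from a crash dump.
for addr in (0x0012FFE0, 0x0012FFE4):
    print(hex(addr), is_aligned(addr), hex(align_up(addr)))
```

Aligned SSE loads and stores such as movdqa fault on addresses that fail this check, which is why a buffer or stack slot that is off by even a few bytes can crash only in the SIMD paths.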

Anthony Jones (:ajones, :kentuckyfriedtakahe, :k17e)

Updated

•

8 years ago

Priority: -- → P1

Emma Humphries ☕️🎸🧞‍♀️✨ (she/they) [:emceeaich] (Pacific Time) use needinfo

Comment 10

•

8 years ago

This is a P1 bug without an assignee. P1 are bugs which are being worked on for the current release cycle/iteration/sprint. If the bug is not assigned by Monday, 28 August, the bug's priority will be reset to '--'.

Keywords: stale-bug

Bulk Bug Changes for mreavy's org

Comment 11

•

8 years ago

Mass change P1->P2 to align with new Mozilla triage process

Priority: P1 → P2

Sylvestre Ledru [:Sylvestre]

Comment 12

•

6 years ago

Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information

Priority: P2 → P3

Rares Doghi, Desktop QA

Updated

•

4 years ago

Whiteboard: qa-not-actionable

Gian-Carlo Pascutto [:gcp]

Comment 13

•

3 years ago

This is still getting the occasional report, e.g. https://crash-stats.mozilla.org/report/index/404b18c2-cd73-4823-8f71-a35b70211129

BugBot [:suhaib / :marco/ :calixte]

Comment 14

•

3 years ago

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED

Closed: 3 years ago

Resolution: --- → WORKSFORME

Categories

(Core :: Audio/Video: Playback, defect, P3)

Tracking

()

People

(Reporter: n.nethercote, Unassigned)

References

Details

(Keywords: crash, stale-bug, Whiteboard: qa-not-actionable)

Crash Data

Security

(public)
