Crash in dav1d_ipred_z1_avx2

RESOLVED FIXED in Firefox 66

Status

defect
P2
critical
RESOLVED FIXED
7 months ago
4 months ago

People

(Reporter: marcus.husar, Assigned: achronop)

Tracking

(Blocks 1 bug, {crash, regression})

66 Branch
mozilla66
Bug Flags:
qe-verify +

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox64 unaffected, firefox65 unaffected, firefox66 fixed)

Attachments

(6 attachments)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

Enable dav1d in about:config:
media.av1.use-dav1d -> true

Enable AV1 playback on YouTube:
https://www.youtube.com/testtube

Play a video on AV1 beta playlist:
https://www.youtube.com/playlist?list=PLyqf6gJt7KuHBmeVzZteZUlNUQAVLwrZS

This has been happening since the Nightly of 2018-12-20. Before that, everything was fine.


Actual results:

Firefox is crashing; some crash reports:
https://crash-stats.mozilla.org/report/index/0882cda8-33b3-460c-899d-195820181221
https://crash-stats.mozilla.org/report/index/17763ac9-a2f9-455c-8d36-9de500181221
https://crash-stats.mozilla.org/report/index/5fbdf308-2e2c-49e8-b843-edf1f0181221
https://crash-stats.mozilla.org/report/index/befa5507-59a2-4489-970d-32c010181221


Expected results:

Firefox plays AV1 videos with dav1d.
Severity: normal → critical
Component: Untriaged → Audio/Video: Playback
Keywords: crash
Product: Firefox → Core
Crash Signature: [@ dav1d_ipred_z1_avx2 ]
Confirming based on the crash reports.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
How often does it happen? Does it occur at the beginning of playback or at a later point, for example after seeking? Is it a local build, or did you download Firefox from the official location? I have tried to reproduce it here but cannot. Could you please post your about:support output, in case I see something helpful in there?
Flags: needinfo?(marcus.husar)
It happens right at the beginning when the video starts to play. I press play, the video buffers, and when it starts to play it crashes immediately. I’m not able to see one frame of the video.

It’s the official nightly build which is automatically updated by Mozilla. My system is Fedora 29 with nothing fancy. Maybe the problem is that I enabled WebRender. But the stacktrace doesn’t look like there is something else involved.
Flags: needinfo?(marcus.husar)
This also seems to involve an AMD Ryzen CPU, like bug 1516235.
Blocks: dav1d
I did some testing. On December 28 I built a Firefox with debugging symbols from mozilla-central hg. But I couldn’t reproduce a crash.

> ac_add_options --disable-optimize
> ac_add_options --enable-debug

Then I compiled a normal build. Still not able to reproduce. Today I added a printf to dav1d_decode_frame(f) in decode.c to make sure that dav1d is running. And yes, it is running.

For all my tests I used the same profile with identical settings in about:config.

> ./mach run --profile ~/.mozilla/firefox/f4902ahb.default/

My Nightly automatically updated by Mozilla (2019-01-01) is still crashing when dav1d is enabled. There must be something different in Mozilla’s build system. I use plain Fedora 29. No adjustments like backported packages or self built packages.
Marcus, Thank you for the info. What nasm version do you use locally?

My nasm version is 2.13.03.

Thanks Marcus. Would it be possible to roll back your nasm to version 2.13.01 and attempt a new build? There is a known bug in that version of nasm that might trigger these crashes. If you cannot do it, no worries, I will try it.

Now I downgraded to nasm 2.13.01. After that I restarted my machine and compiled everything again after deleting the obj-x86_64-pc-linux-gnu folder. With dav1d enabled I still can’t reproduce a crash with my usual test video (https://www.youtube.com/watch?v=KOOhPfMbuIQ with AV1 and Opus).

I looked into the Fedora src git. I used version 2.13.01-4. It has a fix for use-after-free and heap buffer overflow vulnerabilities. I’ll try to find an older version.

See: https://src.fedoraproject.org/cgit/rpms/nasm.git/log/?h=f27

Thank you very much for looking at this. I have done the same thing and I cannot reproduce the crash either. I am also on Fedora, with the same version of nasm:
Last metadata expiration check: 1:22:37 ago on Tue 08 Jan 2019 04:43:20 PM EET.
Installed Packages
nasm.x86_64 2.13.01-4.fc27 @fedora

I have raised the issue with dav1d; they suggest that 2.13.1 should be sufficient. I am checking this comment [1].

[1] https://code.videolan.org/videolan/dav1d/issues/225#note_27075
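
A version check of the kind discussed here can be sketched as follows. This is only an illustration, not part of the Firefox or dav1d build systems: the function names are hypothetical, the input is assumed to be the first line printed by `nasm -v`, and the minimum of 2.13.1 is taken from the suggestion quoted above.

```python
import re
from typing import Tuple


def parse_nasm_version(text: str) -> Tuple[int, ...]:
    """Extract the dotted version number from `nasm -v` style output.

    Hypothetical helper: assumes the text contains "version X.Y.Z".
    """
    match = re.search(r"version\s+(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    # Compare numerically so that "2.13.01" and "2.13.1" are treated alike.
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(text: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if the reported version is at least `minimum`."""
    return parse_nasm_version(text) >= minimum


# Assumed minimum, based only on the "2.13.1 should be sufficient" remark.
ASSUMED_MINIMUM = (2, 13, 1)

if __name__ == "__main__":
    sample = "NASM version 2.13.03 compiled on Jul 14 2018"
    print(parse_nasm_version(sample), meets_minimum(sample, ASSUMED_MINIMUM))
```

Comparing integer tuples rather than strings avoids treating 2.13.01 and 2.13.1 as different versions.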

Hmm, also, the nasm fix is for MachO64. We must look elsewhere for the cause of this crash ...

Now I just recompiled media/libdav1d. Still can’t reproduce a crash with nasm 2.13.01-1 (fc26).

Hi Marcus, this is a debug build of the latest Nightly (Linux x64). Could you try it and tell me if it crashes for you? Thank you in advance.

https://queue.taskcluster.net/v1/task/LaHWIwiPQ4KZbphrYB3WKA/runs/0/artifacts/public/build/target.tar.bz2

This is the output of a crashing media playback thread from gdb. Gdb bt and disassemble.

Here’s another backtrace with line numbers. This should help a lot.

I used a Python script to get some symbols for gdb (source path/to/symbols.py; https://gist.github.com/luser/193572147c401c8a965c).

Marcus tried the debug build on his system and it did not crash; the opt build keeps crashing. The GDB outputs above are from the opt build of the latest Nightly.

Output of gdb info all-registers.

A developer in a videolan bug report asked for more information. I’ll attach the output here to have all information in one place.

See https://code.videolan.org/videolan/dav1d/issues/235 for reference.

Thank you Marcus for following up. It's nice to have all the information in one place. I reported it in the dav1d issue: https://code.videolan.org/videolan/dav1d/issues/235#note_27695

Somebody suggested that it would be useful if you could connect to dav1d's channels on IRC, in case they need to debug interactively. I am forwarding this suggestion in case you can. Thanks!

Yes, I could connect to dav1d’s channel on IRC. The question is, when is the right time? My timezone is CET (UTC+1). If a fast internet connection is needed, it would be possible from 9 to 5 (or later). Downloading large amounts of data at home takes too much time.

Regarding IRC, I don't think it's necessary for now. The dav1d developers concluded that the crash is caused by the stack alignment: the stack is not 32-byte aligned as it should be. This is odd because we set all the necessary compiler flags to get 32-byte alignment.

One solution would be to return to 16-byte alignment, which is the default. I will create a custom build with that setting and let you know so that you can test it if possible, since the problem does not appear here.

A second thing I would like you to try, when you have the time, is the latest Nightly. I have done a new import from upstream (in Bug 1520174) and that could affect it. The fix should be in your latest Nightly by now. No need to provide any GDB output; just let us know whether it is crashing for you or not.

Thanks!
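
To make the alignment point above concrete: an address that is 16-byte aligned is not necessarily 32-byte aligned, and AVX2 aligned loads and stores on 256-bit registers fault when their address is not a multiple of 32. The sketch below only illustrates that arithmetic. It is not dav1d or Firefox code, the helper names are hypothetical, and the stack pointer value is made up for the example.

```python
def _check_power_of_two(alignment: int) -> None:
    """Reject alignments that are not positive powers of two."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")


def is_aligned(address: int, alignment: int) -> bool:
    """Return True if `address` is a multiple of `alignment`."""
    _check_power_of_two(alignment)
    return address & (alignment - 1) == 0


def align_down(address: int, alignment: int) -> int:
    """Round `address` down to the nearest multiple of `alignment`."""
    _check_power_of_two(alignment)
    return address & ~(alignment - 1)


# Hypothetical stack pointer value, chosen only for illustration.
EXAMPLE_SP = 0x7FFFFFFFE010

if __name__ == "__main__":
    print(is_aligned(EXAMPLE_SP, 16))       # 16-byte aligned
    print(is_aligned(EXAMPLE_SP, 32))       # but not 32-byte aligned
    print(hex(align_down(EXAMPLE_SP, 32)))  # nearest 32-byte boundary below
```

Code that assumes a 32-byte aligned stack would be safe at the rounded-down address but not at the original one, which is why building with a 16-byte assumption avoids relying on a guarantee the caller does not provide.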

See Also: → 1516235

A few minutes ago I updated to the latest Nightly. It should be the one from 11:48 (UTC?). Bug 1520174 was merged 21 hours ago, so these fixes should be in the latest Nightly. The browser is still crashing when dav1d is used.

Thank you Marcus. In [1] you can find a Firefox for Linux 64 (opt build) with stack alignment set to 16 bytes (this is coming from run [2]). I have tested locally and it works. Can you please try it and tell us if it still crashes?

[1] https://queue.taskcluster.net/v1/task/LwRgAUn4RuGxT60GPzbvDA/runs/0/artifacts/public/build/target.tar.bz2
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=361d7db8622d82f45db7f3f60a345dcc1438f6e7&selectedJob=222447869

I confirm that the Firefox build from comment 22 works without a crash. Dav1d is activated.

Pushed by achronopoulos@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2a87e6d6e050
Configure 16 bytes stack alignment on Linux x86_64 dav1d builds. r=TD-Linux
Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Assignee: nobody → achronop
Flags: qe-verify+
Duplicate of this bug: 1516235
Crash Signature: [@ dav1d_ipred_z1_avx2 ] → [@ dav1d_ipred_z1_avx2 ] [@ dav1d_ipred_smooth_avx2] [@ dav1d_ipred_smooth_h_avx2]