Closed
Bug 1293996
Opened 8 years ago
Closed 8 years ago
Crash in adapt_probs
Categories
(Core :: Audio/Video: Playback, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla52
People
(Reporter: philipp, Assigned: jya, NeedInfo)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(2 files)
701 bytes,
patch
|
jya
:
review+
|
Details | Diff | Splinter Review |
58 bytes,
text/x-review-board-request
|
jya
:
review+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
|
Details |
This bug was filed from the Socorro interface and is report bp-864dc8d3-c840-4fb3-bcbe-256292160810. ============================================================= Crashing Thread (66) Frame Module Signature Source 0 mozavcodec.dll adapt_probs media/ffvpx/libavcodec/vp9.c:3929 1 mozavcodec.dll vp9_decode_frame media/ffvpx/libavcodec/vp9.c:4225 2 mozavcodec.dll frame_worker_thread media/ffvpx/libavcodec/pthread_frame.c:147 3 mozavcodec.dll win32thread_worker media/ffvpx/compat/w32pthreads.h:81 4 ucrtbase.dll _crt_at_quick_exit 5 kernel32.dll BaseThreadInitThunk 6 ntdll.dll __RtlUserThreadStart 7 ntdll.dll _RtlUserThreadStart this vp9 related crash is regressing in crash stats data since firefox 46 and seems to be increasing in numbers in 49.0b1 again. many user comments seem to reference playing content on youtube as source of the crash.
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(rbultje)
Flags: needinfo?(michael)
Comment 1•8 years ago
|
||
Pretty small volume, we won't take any fix as ride along in 48
Comment 2•8 years ago
|
||
@jya, @k17e - Can you find an owner for this? It's flagged as a carryover regression with no owner. Thanks.
Flags: needinfo?(jyavenard)
Flags: needinfo?(ajones)
Comment 3•8 years ago
|
||
if this is a libavcodec/vp9 bug then its probably ronald who knows this best, he is on vacation till end of august IIRC though. Either way, can this be reproduced somehow with command line ffmpeg?
Flags: needinfo?(michael)
Assignee | ||
Comment 4•8 years ago
|
||
I dont know if it's ever been reproduced. We can provide the memory dump if required on Windows. Easy to track with visual studio then.
Flags: needinfo?(jyavenard)
Updated•8 years ago
|
Flags: needinfo?(ajones)
Priority: -- → P1
(In reply to Jean-Yves Avenard [:jya] from comment #4) > We can provide the memory dump if required on Windows. We can't share memory dumps from Socorro outside of Mozilla. We'd need permission from the person submitting the crash.
Updated•8 years ago
|
Assignee: nobody → jyavenard
Updated•8 years ago
|
Reporter | ||
Updated•8 years ago
|
status-firefox52:
--- → affected
Comment 6•8 years ago
|
||
(99.15% in signature vs 00.53% overall) reason = EXCEPTION_INT_DIVIDE_BY_ZERO (93.45% in signature vs 19.21% overall) adapter_vendor_id = 0x1002 (60.24% in signature vs 00.77% overall) cpu_info = AuthenticAMD family 18 model 1 stepping 0 | 4 (89.70% in signature vs 45.25% overall) platform_pretty_version = Windows 7 (81.94% in signature vs 38.07% overall) platform_version = 6.1.7601 Service Pack 1 (00.61% in signature vs 34.52% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ (33.21% in signature vs 00.48% overall) cpu_info = AuthenticAMD family 18 model 1 stepping 0 | 2 (29.21% in signature vs 00.16% overall) adapter_subsys_id = 061d1025 (28.61% in signature vs 00.17% overall) adapter_driver_version = 8.862.3.0 (24.00% in signature vs 00.22% overall) adapter_device_id = 0x964a (59.27% in signature vs 37.87% overall) bios_manufacturer = American Megatrends Inc. (04.61% in signature vs 23.11% overall) adapter_vendor_id = 0x10de It's often (93.45%) the same CPU: AuthenticAMD family 18 model 1 stepping 0 (with different number of cores).
Hi Jean-Yves, this crash is spiking on 49.0.1. Is there a fix in the works? I'd be happy to take an uplift in Beta50.
Flags: needinfo?(jyavenard)
Assignee | ||
Comment 8•8 years ago
|
||
Ronald, got any ideas about this one?
Flags: needinfo?(jyavenard) → needinfo?(rsbultje)
Comment 9•8 years ago
|
||
Can you provide a link to the exact source code file used in Firefox 49.0.1 for vp9.c, and can you also provide disassembly for the given function in the 49.0.1 build of Firefox? The source code is pretty obviously correct upstream (it does an explicit check for denominator being zero before doing the division), so maybe there's something funky going on like local patches or local miscompilation? We've had an AMD-specific crash before but that was related to longnops instead of shortnops and usually affected SIMD code. Looking at Chrome, there's a remarkably similar bug (claiming the same issue - technically, really! - in libvpx as well as ffmpeg) here: https://bugs.chromium.org/p/chromium/issues/detail?id=599899 Interesting quotes: "I suspect the issue may be related to something I had seen a while back, where crashes happened due to a hardware bug in certain family of AMD processors." (from Yaowu) It's unclear to me what the patch does exactly, but it seems like it's trying to add sign-extension instructions (adding padding, i.e. specifically to delay the pipeline) hoping that that will workaround the known CPU bug "665 Integer Divide Instruction May Cause Unpredictable Behavior". I suppose we can do that also although I have to admit I'm never a big fan of such "fixes".
Comment 10•8 years ago
|
||
The other patch referenced in the bug seem to split prob adaptation in two functions, where the return-if-zero is done before entering the function that does the divide. Since that introduces a call between the branch and the div (instead of call before branch+div), it works around the CPU bug also, and would do so without a slowdown. We can of course do both also. I suppose it's not easy to get my hands on access (SSH) to one of the affected machines to reproduce the crash and test potential workarounds?
Comment 11•8 years ago
|
||
Here's the source code: https://hg.mozilla.org/releases/mozilla-release/annotate/2d931a5eaf8a/media/ffvpx/libavcodec/vp9.c#l3929 And here's the disassembly around the crashing instruction for one of the crash reports (I've downloaded another one and it's pretty similar): 64f14624 <.data>: mozavcodec.dll+0xe4624 64f14624: 3b c2 cmp %edx,%eax 64f14626: 0f 47 c2 cmova %edx,%eax 64f14629: ba cd cc cc 0c mov $0xccccccd,%edx 64f1462e: f7 e2 mul %edx 64f14630: 0f a4 c2 07 shld $0x7,%eax,%edx 64f14634: 8b 44 24 3c mov 0x3c(%esp),%eax 64f14638: 2b c8 sub %eax,%ecx 64f1463a: 0f af d1 imul %ecx,%edx 64f1463d: 83 ea 80 sub $0xffffff80,%edx 64f14640: c1 ea 08 shr $0x8,%edx 64f14643: 02 d0 add %al,%dl 64f14645: 88 55 01 mov %dl,0x1(%ebp) 64f14648: 8b 07 mov (%edi),%eax 64f1464a: 2b d8 sub %eax,%ebx 64f1464c: 03 d8 add %eax,%ebx 64f1464e: 74 54 je 0x64f146a4 64f14650: 0f b6 4d 02 movzbl 0x2(%ebp),%ecx 64f14654: 33 d2 xor %edx,%edx 64f14656: 89 4c 24 3c mov %ecx,0x3c(%esp) 64f1465a: 8b cb mov %ebx,%ecx 64f1465c: c1 e0 08 shl $0x8,%eax 64f1465f: d1 e9 shr %ecx 64f14661: 03 c1 add %ecx,%eax 64f14663: f7 f3 div %ebx 64f14665: 8b c8 mov %eax,%ecx 64f14667: 83 f9 01 cmp $0x1,%ecx 64f1466a: 7d 05 jge 0x64f14671 64f1466c: 33 c9 xor %ecx,%ecx 64f1466e: 41 inc %ecx 64f1466f: eb 0a jmp 0x64f1467b 64f14671: b8 ff 00 00 00 mov $0xff,%eax 64f14676: 3b c8 cmp %eax,%ecx 64f14678: 0f 4f c8 cmovg %eax,%ecx 64f1467b: 6a 14 push $0x14 64f1467d: 58 pop %eax 64f1467e: 3b d8 cmp %eax,%ebx 64f14680: ba cd cc cc 0c mov $0xccccccd,%edx 64f14685: 0f 47 d8 cmova %eax,%ebx 64f14688: 8b c3 mov %ebx,%eax 64f1468a: f7 e2 mul %edx 64f1468c: 0f a4 c2 07 shld $0x7,%eax,%edx 64f14690: 8b 44 24 3c mov 0x3c(%esp),%eax 64f14694: 2b c8 sub %eax,%ecx 64f14696: 0f af d1 imul %ecx,%edx 64f14699: 83 ea 80 sub $0xffffff80,%edx 64f1469c: c1 ea 08 shr $0x8,%edx 64f1469f: 02 d0 add %al,%dl 64f146a1: 88 55 02 mov %dl,0x2(%ebp) 64f146a4: 8b 5f 10 mov 0x10(%edi),%ebx ^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^ 64f146a7: 8b 47 0c mov 0xc(%edi),%eax 64f146aa: 03 d8 add %eax,%ebx 64f146ac: 74 54 je 0x64f14702 64f146ae: 0f b6 4d 03 movzbl 0x3(%ebp),%ecx 64f146b2: 33 d2 xor %edx,%edx 64f146b4: 89 4c 24 3c mov %ecx,0x3c(%esp) 64f146b8: 8b cb mov %ebx,%ecx 64f146ba: c1 e0 08 shl $0x8,%eax 64f146bd: d1 e9 shr %ecx 64f146bf: 03 c1 add %ecx,%eax 64f146c1: f7 f3 div %ebx 64f146c3: 8b c8 mov %eax,%ecx 64f146c5: 83 f9 01 cmp $0x1,%ecx 64f146c8: 7d 05 jge 0x64f146cf 64f146ca: 33 c9 xor %ecx,%ecx 64f146cc: 41 inc %ecx 64f146cd: eb 0a jmp 0x64f146d9 64f146cf: b8 ff 00 00 00 mov $0xff,%eax 64f146d4: 3b c8 cmp %eax,%ecx 64f146d6: 0f 4f c8 cmovg %eax,%ecx 64f146d9: 6a 14 push $0x14 64f146db: 58 pop %eax 64f146dc: 3b d8 cmp %eax,%ebx 64f146de: ba cd cc cc 0c mov $0xccccccd,%edx 64f146e3: 0f 47 d8 cmova %eax,%ebx 64f146e6: 8b c3 mov %ebx,%eax 64f146e8: f7 e2 mul %edx 64f146ea: 0f a4 c2 07 shld $0x7,%eax,%edx 64f146ee: 8b 44 24 3c mov 0x3c(%esp),%eax 64f146f2: 2b c8 sub %eax,%ecx 64f146f4: 0f af d1 imul %ecx,%edx 64f146f7: 83 ea 80 sub $0xffffff80,%edx 64f146fa: c1 ea 08 shr $0x8,%edx 64f146fd: 02 d0 add %al,%dl 64f146ff: 88 55 03 mov %dl,0x3(%ebp) 64f14702: 8b 47 08 mov 0x8(%edi),%eax 64f14705: 2b f0 sub %eax,%esi 64f14707: 8d 1c 06 lea (%esi,%eax,1),%ebx 64f1470a: 85 db test %ebx,%ebx 64f1470c: 74 54 je 0x64f14762 64f1470e: 0f b6 4d 04 movzbl 0x4(%ebp),%ecx 64f14712: 33 d2 xor %edx,%edx 64f14714: 89 4c 24 3c mov %ecx,0x3c(%esp) 64f14718: 8b cb mov %ebx,%ecx 64f1471a: c1 e0 08 shl $0x8,%eax 64f1471d: d1 e9 shr %ecx 64f1471f: 03 c1 add %ecx,%eax 64f14721: f7 f3 div %ebx 64f14723: 8b .byte 0x8b The values of the registers at the time of the crash: Thread 37 (crashed) 0 mozavcodec.dll!adapt_probs [vp9.c:2d931a5eaf8a : 3929 + 0x0] eip = 0x64f146a4 esp = 0x123bf714 ebp = 0x00eb776f ebx = 0x00000000 esi = 0x00000000 edi = 0x00ebb2d8 eax = 0x00000000 ecx = 0x00000000 edx = 0x00000086 efl = 0x00010246 (In reply to Ronald S. Bultje from comment #10) > I suppose it's not easy to get my hands on access (SSH) to one of the > affected machines to reproduce the crash and test potential workarounds? Unfortunately yes. We would need to contact by email the people who reported crashes and ask them.
Reporter | ||
Comment 12•8 years ago
|
||
(In reply to Ronald S. Bultje from comment #9) > Looking at Chrome, there's a remarkably similar bug (claiming the same issue > - technically, really! - in libvpx as well as ffmpeg) here: > https://bugs.chromium.org/p/chromium/issues/detail?id=599899 > > Interesting quotes: "I suspect the issue may be related to something I had > seen a while back, where crashes happened due to a hardware bug in certain > family of AMD processors." (from Yaowu) ...and the cpu model that was mentioned there (AMD A6-3400M, https://bugs.chromium.org/p/chromium/issues/attachmentText?aid=232024) is the same one that we see failing in our crash stats data according to comment #6 as well.
Comment 13•8 years ago
|
||
The "problem" in this disass is that adapt_prob appears inlined, so branch-call-div wouldn't do anything. I'd really like SSH access to an affected machine to test potential fixes. Can we get one from either an affected user or can we contact AMD directly? Maybe AMD (since they have insight in the exact details of this bug) can suggest how we should change the code to work around it?
Assignee | ||
Comment 14•8 years ago
|
||
(In reply to Ronald S. Bultje from comment #9) > Can you provide a link to the exact source code file used in Firefox 49.0.1 > for vp9.c, and can you also provide disassembly for the given function in The link in the description gives you exactly the code where the crash occurred. It's using the revision you're interested in.. https://hg.mozilla.org/mozilla-unified/annotate/f21e48559ecd/media/ffvpx/libavcodec/vp9.c#l3929 The ffvpx code included is almost identical to what's found upstream. We have almost zero patches applied. Great care was taken to ensure we made as little modifications as possible so resyncing with upstream will be easy.
Comment 15•8 years ago
|
||
Thanks @jya, I indeed just wanted to confirm there were no strange firefox-specific modifications. I'm at this point pretty sure it's the AMD hardware bug... Not 100% sure how to go from there to a fix yet... Do you guys have suggestions or recommendations?
Assignee | ||
Comment 16•8 years ago
|
||
This is the only patch applied to our tree: https://dxr.mozilla.org/mozilla-central/source/media/ffvpx/changes.patch
Assignee | ||
Comment 17•8 years ago
|
||
Hi David... Didn't you identify an AMD bug a while ago and crafted a solution around it? Any suggestions?
Flags: needinfo?(dmajor)
Updated•8 years ago
|
Comment 18•8 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #17) > Hi David... > > Didn't you identify an AMD bug a while ago and crafted a solution around it? > Any suggestions? This looks exactly like the issue that Chrome encountered. How about taking their workaround as a starting point: https://chromium-review.googlesource.com/#/c/73570/
Flags: needinfo?(dmajor)
Comment 19•8 years ago
|
||
Going by what libvpx did, I propose the attached patch. However, as mentioned, I have no way of testing this patch so I'm not sure how to confirm that this does indeed fix the issue. Can we get in touch with one of the bug submitters and see if they can test a modified binary with this patch included to see if that resolves the crash?
Flags: needinfo?(rsbultje)
Comment 20•8 years ago
|
||
Another option is to take the patch and see if it fixes the crash in the wild (we would need to wait for the patch to reach beta or uplift it, since we don't have any report with Aurora or Nightly).
Hello Jean-Yves, as Marco said, could we get stabilize this patch on pre-beta (to catch unexpected fallouts) and then uplift to Beta50 to determine if it fixes the crash or not? Thoughts?
Flags: needinfo?(jyavenard)
Assignee | ||
Comment 22•8 years ago
|
||
I have no problem with the patch, it looks identical to the earlier code, just different order of operations. So there's nothing to lose
Flags: needinfo?(jyavenard)
Updated•8 years ago
|
Attachment #8798908 -
Flags: review?(jyavenard)
Comment 23•8 years ago
|
||
jya, can you take care of landing the patch or helping Ronald to do it?
Comment 24•8 years ago
|
||
Dear Firefox programmer, I had plenty of crashes when whatching vp9 encoded content on youtube, so I followed the bug reports and I ended here. One of my crash report is 852b8ad6-ff1b-4eb7-8f39-4ef152161011. I'll be happy to test a patched binary, and see if the issue is still ocurring. Best regards, Nicolas DEROUINEAU
Comment 25•8 years ago
|
||
Thanks Nicolas, this is great. Jya, given the nature of the bug, perhaps we should build Firefox on try with the official configuration and let Nicolas try that instead of a local build.
Flags: needinfo?(jyavenard)
Assignee | ||
Updated•8 years ago
|
Attachment #8798908 -
Flags: review?(jyavenard) → review+
Comment hidden (mozreview-request) |
Assignee | ||
Comment 27•8 years ago
|
||
mozreview-review |
Comment on attachment 8800178 [details] Bug 1293996: Workaround AMD hardware related crash. https://reviewboard.mozilla.org/r/85168/#review83740
Attachment #8800178 -
Flags: review?(jyavenard) → review+
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(jyavenard)
Assignee | ||
Comment 28•8 years ago
|
||
Windows 32 bits installer: https://archive.mozilla.org/pub/firefox/try-builds/jyavenard@mozilla.com-8af877fa2637fab950e9b2986886c5cbb8f207c1/try-win32/firefox-52.0a1.en-US.win32.installer.exe Windws 64 bits installer: https://archive.mozilla.org/pub/firefox/try-builds/jyavenard@mozilla.com-8af877fa2637fab950e9b2986886c5cbb8f207c1/try-win64/firefox-52.0a1.en-US.win64.installer.exe Thank you for testing !
Flags: needinfo?(rbultje) → needinfo?(nicodero72)
Comment 29•8 years ago
|
||
I have installed the nightly, and so far vp9 content decoding doesn't crash. However, I noticed that this bug happened after a while, so I'll keep some content running and keep you updated.
Comment 30•8 years ago
|
||
Pushed by jyavenard@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/6154e22334bc Workaround AMD hardware related crash. r=jya
Comment 31•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/6154e22334bc
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment 32•8 years ago
|
||
[Tracking Requested - why for this release]: It's a low volume crash on Beta, but high volume on release. Fix verified by Nicolas (comment 29), testing with a larger population on Beta will confirm that the crash if fixed for everybody. The risk is pretty low, considering that the patch is just changing the order of operations.
tracking-firefox50:
--- → ?
Hello Jean-Yves, should we uplift this patch to Aurora51 and Beta50?
Flags: needinfo?(jyavenard)
Assignee | ||
Comment 35•8 years ago
|
||
Comment on attachment 8800178 [details] Bug 1293996: Workaround AMD hardware related crash. Approval Request Comment [Feature/regressing bug #]: ffvp9 [User impact if declined]: crashes on some AMD machines [Describe test coverage new/current, TreeHerder]: in central. manual check. Lots of Michu tests. [Risks and why]: can't think of any. It's just reordering instructions to go around AMD CPU pipeline bug. Fix is also found in chrome code [String/UUID change made/needed]: none
Attachment #8800178 -
Flags: approval-mozilla-beta?
Attachment #8800178 -
Flags: approval-mozilla-aurora?
Comment on attachment 8800178 [details] Bug 1293996: Workaround AMD hardware related crash. Crash fix that was verified by at least one person, Aurora51+, Beta50+
Attachment #8800178 -
Flags: approval-mozilla-beta?
Attachment #8800178 -
Flags: approval-mozilla-beta+
Attachment #8800178 -
Flags: approval-mozilla-aurora?
Attachment #8800178 -
Flags: approval-mozilla-aurora+
Comment 37•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/b1c2d90536fc
Comment 38•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/5ffa4de2f343
Comment 39•8 years ago
|
||
Needinfoing myself so I don't forget to check if this is actually fixed.
Flags: needinfo?(mcastelluccio)
Comment 41•8 years ago
|
||
Awesome. I've upstreamed the patch to ffmpeg git also. Thanks very much for your help and input, guys!
You need to log in
before you can comment on or make changes to this bug.
Description
•