Closed Bug 1374231 Opened 8 years ago Closed 7 years ago

Crash in RtlpWaitOnCriticalSection | RtlpEnterCriticalSectionContended | RtlEnterCriticalSection | CFrameSurfaceAllocator::ReleaseDXVASurface

Categories

(Core :: Graphics, defect, P1)

55 Branch
x86_64
Windows 10
defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 blocking fixed
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix

People

(Reporter: philipp, Assigned: bas.schouten)

References

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-e45070de-d043-4b7f-8a65-7957d0170619. ============================================================= Crashing Thread (14) Frame Module Signature Source 0 ntdll.dll RtlpWaitOnCriticalSection 1 ntdll.dll RtlpEnterCriticalSectionContended 2 ntdll.dll RtlEnterCriticalSection 3 msvp9dec.dll CFrameSurfaceAllocator::ReleaseDXVASurface(unsigned char) 4 msvp9dec.dll CVP9SWinDXVAFrameManager::Invoke(IMFAsyncResult*) 5 rtworkq.dll CSerialWorkQueue::OnReplyAsyncCallback::Release() 6 ntdll.dll TppWorkpExecuteCallback 7 ntdll.dll TppWorkerThread 8 kernel32.dll BaseThreadInitThunk 9 ntdll.dll RtlUserThreadStart crash reports with this signature started regressing in frequency in 55.0a1 20170610030207 and subsequent builds. they all are from installations with an intel gpu and from firefox 64bit versions on windows 10 so far. this would be the pushlog from 20170610030207 to the day before: https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-06-09&tochange=91dc9525c422f11041da33b008b14a8117ed9a40 some av/graphics bugs getting fixed in that timeframe: https://bugzilla.mozilla.org/buglist.cgi?list_id=13640005&resolution=---&resolution=FIXED&resolution=INVALID&resolution=WONTFIX&resolution=DUPLICATE&resolution=WORKSFORME&resolution=INCOMPLETE&resolution=SUPPORT&resolution=EXPIRED&resolution=MOVED&chfieldto=2017-06-10&chfield=cf_status_firefox55&query_format=advanced&chfieldfrom=2017-06-09&chfieldvalue=fixed&component=Audio%2FVideo&component=Audio%2FVideo%3A%20MediaStreamGraph&component=Audio%2FVideo%3A%20Playback&component=Graphics&component=Graphics%3A%20Layers&component=Graphics%3A%20Text&component=Graphics%3A%20WebRender&product=Core
Looked into the suspected bugs, there is a few related (1370079 and 1370805). Diving deeper requires more insight on our video subsystem. Adding Jean-Yves.
Whiteboard: [gfx-noted]
https://crash-stats.mozilla.com/report/index/74a1f299-29ab-499f-9e67-a85190170725 1. With Servo enabled I went to https://www.youtube.com/playlist?list=PLFRSDckdQc1vs7FJS-it5nCreCQ3UGQEp Middle clicked on top 10 video links to open them in new tabs, one per second. The browser was somewhat struggling. When it came to the 11th one I left clicked to open it. The YouTube video didn't play. It showed a standard YouTube error about a video not being available alongside with a long (ex: ererbyg1dfdn458wn273) error code. There was no other visual indication of a crash. 2. Usually the 11th video plays without a crash. Sometimes if I let it play the first couple of seconds and quickly press the back button to go back to playlist page, the video will continue to play on the background. The tab (now displaying playlist) has the audio button indicating there is something playing. The button can be used to mute it. Audio only stops when I refresh the page. While this is happening, the red YouTube loading line on the very top of the page is stuck http://i.imgur.com/V1rvDRx.png I can't reproduce the crash but the audio being stuck can be reproduced. Should I file a separate bug for that 2nd part? Intel 630 graphics, Windows 10, 64-bit Nightly.
I was able to reproduce the playback issue but not the crash with a higher accuracy by changing the following: Open 15 tabs instead of 10. Don't let the 16th video play for a second before pressing the back button. Press back as soon as the tab (but not the video) loads. When the crash happens, no video will play on any other tab because of playback error message. The browser has to completely restart.
[Tracking Requested - why for this release]: it's still a bit early to tell, but this signature is jumping up to one of the top crashes (around 10% of browser crashes) in 56.0b12 that was released a few hours ago and might be a late breaking regression in this cycle. the only obvious graphics related change i could spot in b12 was bug 1352016 - could that be related? the crashes are still all coming from 64bit firefox installations on windows 10 with a particular intel gpu family: Adapter device id facet 1 0x5916 1120 70.44 % 2 0x5912 193 12.14 % 3 0x591b 147 9.25 % 4 0x5a85 56 3.52 % 5 0x5902 19 1.19 % 6 0x5a84 19 1.19 % 7 0x591e 10 0.63 % 8 0x5906 9 0.57 % 9 0x5926 9 0.57 % 10 0x591d 5 0.31 % 11 0x5927 2 0.13 %
Flags: needinfo?(jyavenard)
(In reply to [:philipp] from comment #4) > [Tracking Requested - why for this release]: > it's still a bit early to tell, but this signature is jumping up to one of > the top crashes (around 10% of browser crashes) in 56.0b12 that was released > a few hours ago and might be a late breaking regression in this cycle. > > the only obvious graphics related change i could spot in b12 was bug 1352016 > - could that be related? > > the crashes are still all coming from 64bit firefox installations on windows > 10 with a particular intel gpu family: > Adapter device id facet > 1 0x5916 1120 70.44 % > 2 0x5912 193 12.14 % > 3 0x591b 147 9.25 % > 4 0x5a85 56 3.52 % > 5 0x5902 19 1.19 % > 6 0x5a84 19 1.19 % > 7 0x591e 10 0.63 % > 8 0x5906 9 0.57 % > 9 0x5926 9 0.57 % > 10 0x591d 5 0.31 % > 11 0x5927 2 0.13 % I think it's extremely unlikely, that bug 1352016 had anything to do with the crash rate increase. It reverted to code we had back in 54. We can of course turn on that pref again and see if the crash rate goes down. Do we get another beta release in between to just trial this? Bas has worked on a similar crash in but 1397040, but chose not to uplift it. Bas, got an idea?
Flags: needinfo?(jyavenard) → needinfo?(bas)
The crash appears to come from Intel hardware VP9 decoder, only available on Intel 7th generation and later. From the comments earlier, it may be occurring when there are too many hardware decoders being in use, but this would have nothing to do with the pref change
this was the list of changes going into 56.0b12: https://mzl.la/2vVyxo4
If it wasn't for the regression range provided earlier, I would have thought bug 1397307 was the culprit.
Joe Is there a known issue with the VP9 Intel decoder that could yield crashes as seen above? Is there a limit to how many hardware decoder can be in use at the same Time? The only DXVA/VP9 change that occurred in 56, is thst we went back to using RGB frames instead of nv12. That is, the nv12 frame coming out of the decoder is converted to RGB using a sync texture. This is something we used to do in 54, but un 55, we just copy the NV12 surface into another. following video corruptions on some AMD and Nvidia GPU we went back to doing a NV12->RGB conversion/copy Blake, when was the suspend hidden video option disabled in 56?
Flags: needinfo?(joseph.k.olivas)
Flags: needinfo?(bwu)
(In reply to Jean-Yves Avenard [:jya] from comment #9) > > Blake, when was the suspend hidden video option disabled in 56? 56.0a1 on 7/27 @https://hg.mozilla.org/mozilla-central/rev/c95cc38d6065
Flags: needinfo?(bwu)
(In reply to Jean-Yves Avenard [:jya] from comment #5) > (In reply to [:philipp] from comment #4) > > [Tracking Requested - why for this release]: > > it's still a bit early to tell, but this signature is jumping up to one of > > the top crashes (around 10% of browser crashes) in 56.0b12 that was released > > a few hours ago and might be a late breaking regression in this cycle. > > > > the only obvious graphics related change i could spot in b12 was bug 1352016 > > - could that be related? > > > > the crashes are still all coming from 64bit firefox installations on windows > > 10 with a particular intel gpu family: > > Adapter device id facet > > 1 0x5916 1120 70.44 % > > 2 0x5912 193 12.14 % > > 3 0x591b 147 9.25 % > > 4 0x5a85 56 3.52 % > > 5 0x5902 19 1.19 % > > 6 0x5a84 19 1.19 % > > 7 0x591e 10 0.63 % > > 8 0x5906 9 0.57 % > > 9 0x5926 9 0.57 % > > 10 0x591d 5 0.31 % > > 11 0x5927 2 0.13 % > > I think it's extremely unlikely, that bug 1352016 had anything to do with > the crash rate increase. It reverted to code we had back in 54. > We can of course turn on that pref again and see if the crash rate goes > down. Do we get another beta release in between to just trial this? > > Bas has worked on a similar crash in but 1397040, but chose not to uplift it. > Bas, got an idea? We actually did end up uplifting it :-). Having said that the crash in the bug here doesn't seem to have anything to do with race conditions (i.e. the mainthread isn't working on anything graphics related in this case).
Flags: needinfo?(bas)
Blake, décode suspend would have prevented problems as per comment 2, as only when a video is visible would we have a decoder active
This crash jumped to a crazy volume in 56.0b12. Liz FYI.
Flags: needinfo?(lhenry)
bug 1388309 would have been another a/v-related bug in the changelog for b12.
Can we backout 1352016, we'll see soon enough if that was the problem.
Assignee: nobody → bas
Priority: P3 → P1
After talking to the team, there aren't any known issues or limitations on VP9 decode.
Flags: needinfo?(joseph.k.olivas)
Bas, what do you think, is this from bug 1352016, or something else?
Flags: needinfo?(lhenry) → needinfo?(bas)
rather than reversing bug 1352016 as-is (it does after all fix a serious regression introduced in 55), I'll wrap a very small change that keep the 55 behaviour for intel users. Be done in about 1h
The other alternative is to backout the beta only change of bug 1352016 and instead uplift the full 57 version which has been baking in central (and now beta) for several weeks.
Flags: needinfo?(lhenry)
(In reply to Jean-Yves Avenard [:jya] from comment #19) > The other alternative is to backout the beta only change of bug 1352016 and > instead uplift the full 57 version which has been baking in central (and now > beta) for several weeks. It's tempting to say that's better. Especially since we're shipping 57 as a performance release and.. well.. without NV12 surfaces, at least on my machine, video performance is pretty abysmal compared to Chrome and Edge.
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #21) > (In reply to Jean-Yves Avenard [:jya] from comment #19) > > The other alternative is to backout the beta only change of bug 1352016 and > > instead uplift the full 57 version which has been baking in central (and now > > beta) for several weeks. > > It's tempting to say that's better. Especially since we're shipping 57 as a > performance release and.. well.. without NV12 surfaces, at least on my > machine, video performance is pretty abysmal compared to Chrome and Edge. Having said that, this is on an intel machine. Likely the performance implications of not using NV12 surfaces are less serious on NVidia and AMD cards with dedicated memory busses.
Comment on attachment 8910156 [details] Bug 1374231 - Work around intel drivers crashing for unknown reasons. https://reviewboard.mozilla.org/r/181652/#review187082
Attachment #8910156 - Flags: review?(bas) → review+
Sorry, I got versions mixed up, for 56, the intel only enabling of NV12 seems like a fine solution.
OK, thanks, please be super clear what you want to uplift to mozilla-release for 56.
Flags: needinfo?(lhenry) → needinfo?(jyavenard)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #25) > OK, thanks, please be super clear what you want to uplift to mozilla-release > for 56. I thought I had already answered. I have requested uplift of bug 1352016. If that is not deemed okay, then I will request uplift for this one. but 1352016 has been baking in 57 for two weeks now
Flags: needinfo?(jyavenard)
I think we should uplift the one in this bug, although I really like 1352016, our nightly audience isn't very representative. And for uplifting into RC2 we'd have to be really certain :s.
This is what I currently understand, please correct me if I'm wrong. If we land the patch on this bug we will enable NV12 on Intel, we will leave disabled NV12 on non-Intel. We are seeing crashes with Intel but no NV12, and we have previously seen crashes with non-Intel and NV12, so from the point of view of simply enabling/disabling functionality, this is the only option that remains. I suppose there is a chance of there being problems with Intel+NV12, but we haven't really seen those. I'd certainly rather see this bug's patch on release, given the timing of 56.
Bas, can you request uplift to m-r? Thanks.
Flags: needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #28) > This is what I currently understand, please correct me if I'm wrong. If we > land the patch on this bug we will enable NV12 on Intel, we will leave > disabled NV12 on non-Intel. We are seeing crashes with Intel but no NV12, > and we have previously seen crashes with non-Intel and NV12, so from the > point of view of simply enabling/disabling functionality, this is the only Core issue that bug attempted to fix in bug 1352016 was that on old AMD GPU or AMD drivers prior this year, or nvidia Tesla based GPU : we have video corruptions (either completely wrong colors, or green bars) So: Intel + RGB -> Crashes Intel + NV12 -> No report of issues NVidia/AMD + RGB -> Ok Nvidia/AMD + NV12 -> Video corruption 55 made NV12 the default (it was RGB in 54), but in 55 we also changed the way we convert the frame coming out of the decoder (NV12) into a RGB surface. I think it's the change in that copy/conversion that is causing issue with intel. If there's an issue with NV12+Intel, it's been there during the entire 55 lifetime
Comment on attachment 8910156 [details] Bug 1374231 - Work around intel drivers crashing for unknown reasons. Approval Request Comment [Feature/Bug causing the regression]: 1340398 [User impact if declined]: Some users will experience corrupted videos: wrong colours or big green lines around or within the video [Is this code covered by automated tests?]: no [Has the fix been verified in Nightly?]: no [Needs manual test from QE? If yes, steps to reproduce]: hard to say, we'll see the crash rate soon enough [List of other uplifts needed for the feature/fix]: no [Is the change risky?]: low. [Why is the change risky/not risky?]: hard to say these days. [String changes made/needed]: none
Attachment #8910156 - Flags: approval-mozilla-release?
Comment on attachment 8910156 [details] Bug 1374231 - Work around intel drivers crashing for unknown reasons. Let's try this since it sounds like it may fix the top crash while being less risky than the option in the other bug.
Attachment #8910156 - Flags: approval-mozilla-release? → approval-mozilla-release+
Flags: needinfo?(bas)
We see the crash in 57, and we don't use RGB there. https://crash-stats.mozilla.com/report/index/de2ddb30-2028-472e-8289-74b880170922 This leads me to think that the crash has ZERO to do with NV12 vs RGB and as such has no relation to bug 1352016 Joe, the issue only occurs on intel devices, all 7th gen intel. That is all intel with VP9 hardware decoding support. device id = 0x5916, 0x5912, 0x5a84 (this one is particular) Now, all those crashes occur in the GPU process, so it will recover, albeit disabling hardware acceleration. In any case, I have my doubts that we will see the crash rate drops in the new 56 build... The fix is unlikely to fix anything.
Flags: needinfo?(joseph.k.olivas)
Flags: needinfo?(bas)
:Bas what has changed between 55 and 56, where suddenly intel is crashing but didn't used to? VP9 HW decoding was enabled in Firefox 55, so this code was already in use there.
Looking at the various crash, things make little sense to me. Most of the time the sites that caused the crash doesn't have a video element at all (the crash is in msvp9dec.dll) and if there is, more often than not it's H264. The majority of non H264 sites seem to be VP8. Seeing that the web site almost exclusively using VP9 is YouTube, there are almost no youtube related crash. So, very puzzled.
(In reply to Jean-Yves Avenard [:jya] from comment #36) > Looking at the various crash, things make little sense to me. > > Most of the time the sites that caused the crash doesn't have a video > element at all (the crash is in msvp9dec.dll) > and if there is, more often than not it's H264. The majority of non H264 > sites seem to be VP8. > > Seeing that the web site almost exclusively using VP9 is YouTube, there are > almost no youtube related crash. > > So, very puzzled. Well, the patch here still makes our performance better for Intel devices so hey, that's good :). I can't think of any major changes in graphics land, Advanced Layers was only enabled in 57, so that can't be it. I'm unsure what the issue could be as well. It might be the crash happens when you switch -away- from a website with video? That would be harder to prove though.
Flags: needinfo?(bas)
I wonder if the crash rate increase correspond to a change on an intel driver update. That crash has been occurring for quite a while, and occurs daily...
Since the crash is happening in Microsoft components (msvp9dec.dll and ntdll.dll), are there other non-Intel gpu systems that are using msvp9dec.dll? It might be hard to pinpoint back to Intel drivers when they're not in the stack.
Flags: needinfo?(joseph.k.olivas)
I haven't seen any crashes using other than 7th gen Intel. They are by far the most common machines doing hardware VP9. I see the stack trace with vp8 videos, are they supposed to be HW accelerated too?
Yes, the Microsoft MFT handles both VP8 and VP9, so it could happen on either type. Has Microsoft been able to confirm that there is no issue on their MFT?
Bas, Chris : any contacts at Microsoft?
Flags: needinfo?(cpeterson)
Flags: needinfo?(bas)
(In reply to Jean-Yves Avenard [:jya] from comment #42) > Bas, Chris : any contacts at Microsoft? Yes. I emailed our Mozilla/Microsoft mailing list about this bug and CC'd you.
Flags: needinfo?(cpeterson)
Flags: needinfo?(bas)
Depends on: 1403063
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: