Closed Bug 1267970 Opened 8 years ago Closed 8 years ago

crash in atiumdag.dll@0x6d4a7 (atiumdag.dll | atiu9pag.dll | TrimNotificationCallback)

Categories

(Core :: Audio/Video: Playback, defect, P1)

38 Branch
x86
Windows 10
defect

RESOLVED FIXED
mozilla51
Tracking Status
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 + wontfix
firefox49 + fixed
firefox-esr38 --- wontfix
firefox-esr45 --- wontfix
firefox50 + fixed
firefox51 + fixed

People

(Reporter: marco, Assigned: mozbugz)

References

Details

(Keywords: crash, topcrash, Whiteboard: [gfx-noted])

Crash Data

Attachments

(5 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-e92c1354-477a-4556-84a7-50d7c2160427.
=============================================================

This crash has been spiking recently (since 2016-04-23).

The crash is in atiumdag.dll, which is an ATI library.

Many crashes contain the error "Too many dropped/corrupted frames, disabling DXVA".

Most crashes happen with the driver version 15.300.1025.1001 (91.29% during the last week). In the last year, 99.07% of crashes were with 15.300.1025.1001 or 15.300.1025.0.
Attached image 1 year trend.png (obsolete) —
I looked at a year's worth of data and it's definitely rising but I wouldn't say it's spiked recently. I'd categorize it as more of a gradual increase, possibly as more AMD users adopt Windows 10.
Attached image graph.png
I'm not sure if I'm doing something wrong, I have started using Socorro only recently.

This is the graph where I saw the spike (https://crash-stats.mozilla.com/signature/?date=%3E%3D2015-04-28T19%3A11%3A35.179370&date=%3C2016-04-27T19%3A11%3A35.179370&signature=atiumdag.dll%400x6d4a7#graphs).
In addition to comment 1, this signature goes back at least to Firefox 38. The driver correlations Marco mentions in comment 0 still hold, and the crashes are mostly, although not exclusively, on the Radeon HD 8000 series of chipsets (E-Series APU revision).

Note that there is an updated driver available as of March 27, 2016 and this version does not show up in the crash stats for this signature. We should probably just block the affected driver versions in this case.

Milan, can you take care of this?
Flags: needinfo?(milan)
Whiteboard: [gfx-noted]
(In reply to Marco Castelluccio [:marco] from comment #2)
> Created attachment 8746145 [details]
> graph.png
> 
> I'm not sure if I'm doing something wrong, I have started using Socorro only
> recently.
> 
> This is the graph where I saw the spike
> (https://crash-stats.mozilla.com/signature/?date=%3E%3D2015-04-
> 28T19%3A11%3A35.179370&date=%3C2016-04-27T19%3A11%3A35.
> 179370&signature=atiumdag.dll%400x6d4a7#graphs).

You're right, I was tracking all atiumdag.dll crashes. I'll upload a new chart for just this signature.
Attached image 1 year trend
Attachment #8746133 - Attachment is obsolete: true
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #5)
> Created attachment 8746149 [details]
> 1 year trend

So there is indeed a spike recently and looking at the data closer it appears this started to spike last week, largely driven by reports against 45.0.2. I'm not sure what's driving the spike but that doesn't negate my ask to just block the driver.
Version: 46 Branch → 38 Branch
Matt?
Flags: needinfo?(milan) → needinfo?(matt.woodrow)
I did some digging around and could not find a specific event on our end that would correlate to the spike. However, I did find that Microsoft released a security update to their Graphics Component on April 12th. I wonder if this could be a factor?

https://support.microsoft.com/en-us/kb/3148522
It's not really clear what we'd need to blacklist from this crash report, so it would probably have to be everything to guarantee a change.

If most reports are showing that DXVA was disabled then it can't be that, but it could instead be our software decoding upload path.

My main worry with crashes like this (crash at binary + offset) is that the crash address is specific to this driver version, and the same crash exists in other driver versions too, just at a different offset. This one could just be on top based on how many users have that driver.

How many/what percentage of our AMD (and total?) users are we going to affect by blacklisting this driver?

Are there any other (maybe lower volume) crashes that seem similar, but are at a different address?
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #9)
> How many/what percentage of our AMD (and total?) users are we going to
> affect by blacklisting this driver?

It looks like this would apply to up to ~0.3% of all users and ~1.8% of AMD users based on Telemetry and Crash Stats.

> Are there any other (maybe lower volume) crashes that seem similar, but are
> at a different address?

There are 7 other signatures in atiumdag.dll from users reporting driver 15.300.1025.1001 but these amount to only 2% of the overall crash volume with this driver. http://bit.ly/244gJph
Ok, sounds fine to go ahead with blacklisting all acceleration then.
This would block all features on this driver version for ATI.  Is this what we want?

<gfxBlacklistEntry>
  <os>WINNT 10.0</os>
  <vendor>0x1002</vendor>
  <featureStatus>BLOCKED_DRIVER_VERSION</featureStatus>
  <driverVersion>15.300.1025.1001</driverVersion>   
  <driverVersionComparator>EQUAL</driverVersionComparator>
</gfxBlacklistEntry>
Flags: needinfo?(jmuizelaar)
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #10)
> There are 7 other signatures in atiumdag.dll from users reporting driver
> 15.300.1025.1001 but these amount to only 2% of the overall crash volume
> with this driver. http://bit.ly/244gJph

What about on other drivers? i.e. do we have strong evidence that this problem is only in this version of the driver and not at a different signature in other drivers?
Flags: needinfo?(jmuizelaar) → needinfo?(anthony.s.hughes)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #13)
> What about on other drivers? i.e. do we have strong evidence that this
> problem is only in this version of the driver and not at a different
> signature in other drivers?

For atiumdag.dll@0x6d4a7 there is a strong but not exclusive correlation to 15.300.1025.1001:
> 94.12% report 15.300.1025.1001
>  2.89% report 15.300.1025.0
>  2.68% report 15.201.1151.1008
>  0.10% report 13.151.0.0
>  0.10% report 15.200.1062.1004
>  0.10% report 15.201.1301.0
No other driver versions show up.

For the other signatures the correlation is not as strong:
> 41.61% report 16.150.2211.0
> 37.97% report 15.201.1151.1008
> 15.51% report 16.150.2211.1001
> 2.69% report 15.300.1025.1001
> 1.11% report 16.150.2111.0
> 0.63% report 15.201.1151.0
> 0.32% report 15.300.1025.0
> 0.16% report 16.150.0.0
No other driver versions show up.

I don't think blocking 15.300.1025.1001 is going to solve this problem completely but I do think it's going to make the situation far better for the most part.
Flags: needinfo?(anthony.s.hughes)
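The per-driver percentages above are shares of crash reports grouped by reported driver version. A minimal Python sketch of that calculation follows, assuming the driver version of each report is already available as a string. The function name `driver_share` and the sample data are hypothetical and are not taken from Socorro.

```python
from collections import Counter

def driver_share(versions):
    """Return (version, percent) pairs sorted by descending share."""
    counts = Counter(versions)
    total = sum(counts.values())
    return [(v, round(100.0 * n / total, 2)) for v, n in counts.most_common()]

# Hypothetical sample of reported driver versions, one entry per crash report.
reports = ["15.300.1025.1001"] * 8 + ["15.300.1025.0"] + ["15.201.1151.1008"]

for version, pct in driver_share(reports):
    print(f"{pct:6.2f}% report {version}")
```

A share like this only describes the crashing population. As noted in the thread, it needs to be compared against how widely each driver is deployed before concluding that one version is worse than another.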
Jorge, let's add this one "Block acceleration on AMD devices on Windows 10 with the driver version 15.300.1025.1001"


<gfxBlacklistEntry>
  <os>WINNT 10.0</os>
  <vendor>0x1002</vendor>
  <featureStatus>BLOCKED_DRIVER_VERSION</featureStatus>
  <driverVersion>15.300.1025.1001</driverVersion>   
  <driverVersionComparator>EQUAL</driverVersionComparator>
</gfxBlacklistEntry>
Flags: needinfo?(jorge)
Flags: needinfo?(jorge) → needinfo?(awilliamson)
Added 
https://addons.mozilla.org/en-US/admin/models/blocklist/blocklistdetail/1209/
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(awilliamson)
Resolution: --- → FIXED
Paul, can you give us any information on this crash, whether it exists in early driver versions, and if it's fixed?
This crash looks the same:
https://crash-stats.mozilla.com/report/index/cf781cbc-ffdd-4a61-9b13-d84bb2160429

but has the signature [@ atiumdag.dll@0x6d357 ]

I'm actually worried that blacklisting this driver is a bad idea. Can we revert this from the list?
Status: RESOLVED → REOPENED
Flags: needinfo?(awilliamson)
Resolution: FIXED → ---
atiumdag.dll@0x6d357 is related to 15.301.1201.0 (100% of the 17 crashes: https://crash-stats.mozilla.com/search/?product=Firefox&signature=%3Datiumdag.dll%400x6d357&_facets=signature&_facets=adapter_driver_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature), is it possible that this crash was introduced in 15.300.1025.1001 and is still there on 15.301.1201.0?
(In reply to Marco Castelluccio [:marco] from comment #19)
> atiumdag.dll@0x6d357 is related to 15.301.1201.0 (100% of the 17 crashes:
> https://crash-stats.mozilla.com/search/
> ?product=Firefox&signature=%3Datiumdag.
> dll%400x6d357&_facets=signature&_facets=adapter_driver_version&_columns=date&
> _columns=signature&_columns=product&_columns=version&_columns=build_id&_colum
> ns=platform#facet-signature), is it possible that this crash was introduced
> in 15.300.1025.1001 and is still there on 15.301.1201.0?

That seems possible. I've confirmed that atiumdag.dll@0x6d357 is very likely to be in the same function as atiumdag.dll@0x6d4a7
Crash Signature: [@ atiumdag.dll@0x6d4a7] → [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357]
Crash Signature: [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] → [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] [@ atiumdag.dll@0x6d3c7]
Block is deleted.  ni me when/if you know what the new block should be.
Flags: needinfo?(awilliamson)
For Crimson 16.1 / 15.301.1901.0 I tried to go the other way from likely crash address to signature and sure enough we see a single crash at: atiumdag.dll@0x6d437
Crash Signature: [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] [@ atiumdag.dll@0x6d3c7] → [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] [@ atiumdag.dll@0x6d3c7] [@ atiumdag.dll@0x6d437]
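To illustrate why these four signatures are plausibly the same crash at a shifted address, here is a small Python sketch that parses the `module@offset` signatures collected so far and compares their offsets. The helper name `parse_signature` is hypothetical. Closeness of offsets is only suggestive and does not prove that the addresses fall in the same function.

```python
def parse_signature(sig):
    """Split 'module@0xOFFSET' into (module, offset as int)."""
    module, _, offset = sig.partition("@")
    return module, int(offset, 16)

# Signatures attached to this bug so far.
signatures = [
    "atiumdag.dll@0x6d4a7",
    "atiumdag.dll@0x6d357",
    "atiumdag.dll@0x6d3c7",
    "atiumdag.dll@0x6d437",
]

offsets = sorted(parse_signature(s)[1] for s in signatures)
gaps = [b - a for a, b in zip(offsets, offsets[1:])]

print([hex(o) for o in offsets])
print("gaps between neighbours:", gaps)
print("total spread:", hex(offsets[-1] - offsets[0]))
```

All four offsets sit within 0x150 bytes of each other, which is consistent with one function moving slightly between driver builds.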
To show that this isn't happening anymore without a signature, we'd need to see a drop in atiumdag crashes per driver version in proportion to that driver's deployment.
Is there a particular repro scenario in which this issue shows up more often?
The crash reports I had spot checked don't seem to have enough contextual information for me to understand if it's happening during e.g. video playback or WebGL render or anything else. Any good pointers to past crash signatures relevant here?
The issue has been found in the driver, next releases should contain the fix.
(In reply to Paul Blinzer from comment #25)
> The issue has been found in the driver, next releases should contain the fix.

Can you tell us which drivers are affected and if there's a work-around? Is it only triggered by DXVA?
Flags: needinfo?(paul.blinzer)
Not clearly, unfortunately. 
The bug seems to have been latent in the last release. It was a race condition in the resource management of video resources, which got partially exposed by some recent optimizations in more recent hotfix drivers. I don't know of a good user-accessible workaround at the moment. The fix should be available in the next public driver release in any case, so updating to a TBA driver revision is the best approach here.
(In reply to Paul Blinzer from comment #27)
> Not clearly, unfortunately. 
> The bug seems to have been latent in the last release, it was a race
> condition in the resource management of video resources which got partially
> exposed due to some recent optimizations in more recent hotfix drivers. I
> don't know of a good user accessible workaround at the moment. The fix
> should be available in the next public driver release in any case so update
> to a TBA driver revision is the best approach here.

Do you know which drivers we should block?
The issue was introduced only recently. I still need to find out how recently, though.
It seems to have been introduced with the Crimson release only, and it should be fixed with the recently released AMD hotfix drivers. Please confirm.
Flags: needinfo?(paul.blinzer)
Paul, which driver versions would these translate into?
Flags: needinfo?(paul.blinzer)
Radeon Crimson edition, 15.12 release up until 16.5.1 hotfix. 16.5.2 onwards should have this fixed.
Flags: needinfo?(paul.blinzer)
Thanks for the info Paul.

What's the magic formula for translating something like 15.300.1025.1001 into something like "15.12" and "16.5.1"?
For all the crashes coming from TrimNotificationCallback, the most common drivers are:

36% - 15.300.1025.1001
28% - 15.201.1101.0, 15.200.1065.0, 15.201.1301.0
21% - 15.200.1055.0, 15.200.1060.0
8% - 16.150.2211.0

How do these match to the above "15.12 release up until 16.5.1 hotfix. 16.5.2 onwards should have this fixed"?
Flags: needinfo?(paul.blinzer)
Crash Signature: [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] [@ atiumdag.dll@0x6d3c7] [@ atiumdag.dll@0x6d437] → [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] [@ atiumdag.dll@0x6d3c7] [@ atiumdag.dll@0x6d437] [@ atiumdag.dll@0x6b563] [@ atiumdag.dll@0x6b627] [@ atiumdag.dll@0x6b857] [@ atiumdag.dll@0x6ce67]
Flags: needinfo?(milan)
Crash Signature: [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] [@ atiumdag.dll@0x6d3c7] [@ atiumdag.dll@0x6d437] [@ atiumdag.dll@0x6b563] [@ atiumdag.dll@0x6b627] [@ atiumdag.dll@0x6b857] [@ atiumdag.dll@0x6ce67] → [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] [@ atiumdag.dll@0x6d3c7] [@ atiumdag.dll@0x6d437] [@ atiumdag.dll@0x6b563] [@ atiumdag.dll@0x6b627] [@ atiumdag.dll@0x6b857] [@ atiumdag.dll@0x6ce67] [atiumdag.dll | atiu9pag.dll | TrimNotification…
Crash Signature: [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] [@ atiumdag.dll@0x6d3c7] [@ atiumdag.dll@0x6d437] [@ atiumdag.dll@0x6b563] [@ atiumdag.dll@0x6b627] [@ atiumdag.dll@0x6b857] [@ atiumdag.dll@0x6ce67] [atiumdag.dll | atiu9pag.dll | TrimNotification… → [@ atiumdag.dll@0x6d4a7] [@ atiumdag.dll@0x6d357] [@ atiumdag.dll@0x6d3c7] [@ atiumdag.dll@0x6d437] [@ atiumdag.dll@0x6b563] [@ atiumdag.dll@0x6b627] [@ atiumdag.dll@0x6b857] [@ atiumdag.dll@0x6ce67] [@ atiumdag.dll | atiu9pag.dll | TrimNotificati…
(In reply to Milan Sreckovic [:milan] from comment #33)
> Thanks for the info Paul.
> 
> What's the magic formula for translating something like 15.300.1025.1001
> into something like "15.12" and "16.5.1"?
> For all the crashes coming from TrimNotificationCallback, the most common
> drivers are:
> 
> 36% - 15.300.1025.1001
> 28% - 15.201.1101.0, 15.200.1065.0, 15.201.1301.0
> 21% - 15.200.1055.0, 15.200.1060.0
> 8% - 16.150.2211.0
> 
> How do these match to the above "15.12 release up until 16.5.1 hotfix.
> 16.5.2 onwards should have this fixed"?

There is no magic formula, unfortunately. The driver package and display driver file versions are different due to the way the packages and their components are created.

Both versions are listed at the download page (see http://support.amd.com/en-us/download/desktop?os=Windows+10+-+64 as an example). The display driver version (which is what you see as part of the OS-reported driver version) is listed in the release notes of a driver package. Crimson Edition 16.6.1, for example, translates to 16.15.2211 display drivers in the doc. It is still listed in your lineup, though at a much lower fail rate. That makes me wonder whether this is truly still the same crash signature as the other ones, or whether the fix has made it in there. I need to verify which case it is.
Flags: needinfo?(paul.blinzer)
Right - it may be a different path into the driver, both coming from TrimNotificationCallback call - having that info would be useful.
Flags: needinfo?(milan)
Summary: crash in atiumdag.dll@0x6d4a7 → crash in atiumdag.dll@0x6d4a7 (atiumdag.dll | atiu9pag.dll | TrimNotificationCallback)
Assigning, since Jeff's been looking at this.

Jeff, what would we block here?  Windows 10, all versions < 16.30.* (that one should have a fix).

All devices?  All features, or just DXVA?

This one is all devices, all features.

<gfxBlacklistEntry>
  <os>WINNT 10.0</os>
  <vendor>0x1002</vendor>
  <featureStatus>BLOCKED_DRIVER_VERSION</featureStatus>
  <driverVersion>16.30.0.0</driverVersion>
  <driverVersionComparator>LESS_THAN</driverVersionComparator>
</gfxBlacklistEntry>
Assignee: nobody → jmuizelaar
Flags: needinfo?(jmuizelaar)
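For readers unfamiliar with how an entry like the one above selects drivers, the following Python sketch applies the two comparators used in this bug (`EQUAL` and `LESS_THAN`) to four-part version strings. It is an assumption-laden illustration: `parse_version` and `entry_matches` are hypothetical names, and each field is compared as a plain integer, which is not necessarily how Firefox normalizes driver versions.

```python
def parse_version(text):
    """Parse 'a.b.c.d' into a tuple of four ints (naive: no padding rules)."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) != 4:
        raise ValueError(f"expected four fields, got {text!r}")
    return tuple(parts)

# Only the comparators that appear in this bug; plain tuple comparison is an assumption.
COMPARATORS = {
    "EQUAL": lambda version, ref: version == ref,
    "LESS_THAN": lambda version, ref: version < ref,
}

def entry_matches(driver_version, entry_version, comparator):
    """Return True if driver_version satisfies the entry's comparator."""
    return COMPARATORS[comparator](
        parse_version(driver_version), parse_version(entry_version)
    )

print(entry_matches("15.300.1025.1001", "15.300.1025.1001", "EQUAL"))
print(entry_matches("15.300.1025.1001", "16.30.0.0", "LESS_THAN"))
print(entry_matches("16.150.2211.0", "16.30.0.0", "LESS_THAN"))
```

Under this naive numeric comparison, 16.150.2211.0 is not less than 16.30.0.0 because 150 is greater than 30. Whether a real blocklist entry behaves the same way depends on how the version fields are normalized, which this sketch does not model, so a range like this is worth checking against the versions actually seen in crash reports.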
[Tracking Requested - why for this release]: Top crash, the fix should be safe.
This is DXVA only. It looks like the best option is to disable hardware video decoding on modern AMD hardware (CARRIZO, VI, SI, CIK) on <= 16.15.2211 and > whatever number corresponds to 15.12.
Component: Graphics → Audio/Video: Playback
Flags: needinfo?(jmuizelaar)
Here's the list of affected device ids:
0x6920
0x6921
0x6928
0x6929
0x692b
0x692f
0x6930
0x6938
0x6939
0x6900
0x6901
0x6902
0x6903
0x6907
0x98e4
0x67c0
0x67c1
0x67c2
0x67c4
0x67c7
0x67c8
0x67c9
0x67ca
0x67cc
0x67cf
0x67df
0x67e0
0x67e1
0x67e3
0x67e7
0x67e8
0x67e9
0x67eb
0x67ef
0x67ff
0x7300
0x1304
0x1305
0x1306
0x1307
0x1309
0x130a
0x130b
0x130c
0x130d
0x130e
0x130f
0x1310
0x1311
0x1312
0x1313
0x1315
0x1316
0x1317
0x1318
0x131b
0x131c
0x131d
0x6640
0x6641
0x6646
0x6647
0x6649
0x6650
0x6651
0x6658
0x665c
0x665d
0x665f
0x67a0
0x67a1
0x67a2
0x67a8
0x67a9
0x67aa
0x67b0
0x67b1
0x67b8
0x67b9
0x67ba
0x67be
0x9830
0x9831
0x9832
0x9833
0x9834
0x9835
0x9836
0x9837
0x9838
0x9839
0x983a
0x983b
0x983c
0x983d
0x983e
0x983f
0x9850
0x9851
0x9852
0x9853
0x9854
0x9855
0x9856
0x9857
0x9858
0x9859
0x985a
0x985b
0x985c
0x985d
0x985e
0x985f
0x6800
0x6801
0x6802
0x6806
0x6808
0x6809
0x6810
0x6811
0x6816
0x6817
0x6818
0x6819
0x684c
0x6600
0x6601
0x6602
0x6603
0x6604
0x6605
0x6606
0x6607
0x6608
0x6610
0x6611
0x6613
0x6617
0x6620
0x6621
0x6623
0x6631
0x6820
0x6821
0x6822
0x6823
0x6824
0x6825
0x6826
0x6827
0x6828
0x6829
0x682a
0x682b
0x682c
0x682d
0x682f
0x6830
0x6831
0x6835
0x6837
0x6838
0x6839
0x683b
0x683d
0x683f
0x6660
0x6663
0x6664
0x6665
0x6667
0x666f
0x6780
0x6784
0x6788
0x678a
0x6790
0x6791
0x6792
0x6798
0x6799
0x679a
0x679b
0x679e
0x679f
0x9870
0x9874
0x9875
0x9876
0x9877
Jeff, et al, can you hold off this change and consider a different workaround approach ? 

After looking at this and other reported issues in the multimedia path on a variety of vendor drivers (AMD and non-AMD), there seems to be a class of these issues, and with it a better way of working around them for the affected drivers than falling back to SW render here.
"Recent" Firefox code apparently exploits thread concurrency in the D3D & DXVA render paths more aggressively, pushing the DX API harder, e.g. for resource allocation and render threads within process contexts. Introducing a less aggressive approach here would address this and other drivers' stress test issues.
(In reply to Paul Blinzer from comment #40)
> Jeff, et al, can you hold off this change and consider a different
> workaround approach ? 
> 
> After looking at this and other reported issues in the multimedia path on a
> variety of vendor drivers (AMD and non-AMD), there seems to be a class of
> these issues and from that a better way of working around it for the affected
> drivers than falling back to SW render here. 
> "Recent" Firefox code has apparently increased exploiting thread concurrency
> in the D3D & DXVA render paths more aggressively, pushing the DX API, e.g.
> for resource allocation and render threads within process contexts.
> Introducing a less aggressive approach here would address this and other
> driver's stress test issues.

Yes. I'd much prefer a workaround. Do you think we can arrange a meeting between the appropriate AMD and Mozilla people to come up with some ideas of how to make this work?
FYI, it looks like this is rising to topcrash territory. It is currently #16 in Firefox 47, up 109 positions from last week with 3297 crashes reported.

There are 38 different driver versions ranging from 15.200.1045.0 to 16.200.1025.0, 15.300.1025.1001 is #1 with 34%. Similarly there are 45 device correlations mostly in the Kabini (60.14%) and Mullins (28.77%) family.
Keywords: topcrash
Depends on: 1284672
Tracking this, as this crash is a top crash.
Both the Crimson 16.6.1 and the just released Crimson 16.7.1 drivers (http://support.amd.com/en-us/download/desktop?os=Windows+10+-+64) should have a fix for the issue here. Can you please identify if the crash pattern is seen on any of these driver versions?
Flags: needinfo?(gchang)
(In reply to Paul Blinzer from comment #44)
> Both the Crimson 16.6.1 and the just released Crimson 16.7.1 drivers
> (http://support.amd.com/en-us/download/desktop?os=Windows+10+-+64) should
> have a fix for the issue here. Can you please identify if the crash pattern
> is seen on any of these driver versions?

Do you know the 4 number style (eg. 6.150.2401.1001) number for those versions?
Flags: needinfo?(paul.blinzer)
The latest version that we've seen any crashes of this kind on is: 16.200.1010.1002 which seems to correspond to 16.5.2.1. However we've not seen any crashes from anyone on 16.200.1035.1001 (16.7.1) which suggests that it is not very widely deployed yet.
The download page still shows 16.15.2211.0, which is also 16.6.1 (comment 34), and there is certainly a number of those crashes (e.g., https://crash-stats.mozilla.com/report/index/7f37cdc0-eb94-4c37-952d-5ff602160703)
16.6.1 looks to me like 16.200.1025.0000 which does seem to have this style of crash which contradicts my earlier statement:
https://crash-stats.mozilla.com/report/index/346a7323-cad4-419d-85ea-368742160707
Hi Marco,
can you help to answer comment #44?
Flags: needinfo?(gchang) → needinfo?(mcastelluccio)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #44)

> Do you know the 4 number style (eg. 6.150.2401.1001) number for those versions?

16.7.1 is Driver Version 16.20.1035, the four-ID driver version (vs the Crimson download package version) can be always found in the release notes, e.g.:
http://support.amd.com/en-us/kb-articles/Pages/Radeon-Software-Crimson-Edition-16.7.1-Release-Notes.aspx

It looks like the links to the release notes on the http://support.amd.com site are not correctly refreshed. The driver package link is correct, but some of the supporting links and notes need updates for both 16.6.1 and 16.7.1.
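Since there is no formula relating Crimson package versions to display driver versions, the relationship has to be kept as a lookup table built from the release notes. The Python sketch below records only the two pairs stated by AMD in this bug. The names `PACKAGE_TO_DISPLAY_DRIVER`, `display_driver_for` and `packages_for` are hypothetical, and the table is deliberately incomplete.

```python
# Crimson package version -> display driver version, as stated in this bug.
# Any other pair would have to come from AMD's release notes; none are guessed here.
PACKAGE_TO_DISPLAY_DRIVER = {
    "16.6.1": "16.15.2211",
    "16.7.1": "16.20.1035",
}

def display_driver_for(package):
    """Return the display driver version for a Crimson package, or None if unknown."""
    return PACKAGE_TO_DISPLAY_DRIVER.get(package)

def packages_for(display_driver):
    """Return all known packages that ship the given display driver version."""
    return sorted(
        pkg for pkg, drv in PACKAGE_TO_DISPLAY_DRIVER.items() if drv == display_driver
    )

print(display_driver_for("16.7.1"))
print(display_driver_for("15.12"))
print(packages_for("16.15.2211"))
```

Note that the display driver versions in the release notes (for example 16.15.2211) do not look identical to the strings seen in crash reports (for example 16.150.2211.0), so a table like this still needs a separate, verified step to match against OS-reported versions.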
16.200.1025.0 and 16.200.1035.0 are ~1.5% of the crashes with the 'atiumdag.dll | atiu9pag.dll | TrimNotificationCallback' signature.
In crash reports in general where the user has an AMD graphics card, the two versions are in ~1.5% of the reports.
Flags: needinfo?(mcastelluccio)
The root cause of the issue has now been found and fixed in an upcoming driver.
Flags: needinfo?(paul.blinzer)
Hi :jrmuizel,
Per comment #52, is there anything we need to do here to fix the issue?
Flags: needinfo?(jmuizelaar)
(In reply to Gerry Chang [:gchang] from comment #53)
> Hi :jrmuizel,
> Per comment #52,is there anything we need to do here to fix the issue?

Sorry, didn't get to update the bugzilla ticket yet. The Crimson 16.8.1 (released 8/7/16) and 16.8.2 (released 8/11/16) driver packages should contain the fix for the issue. 
The root cause was straightforward to address but please verify on your side that the issue is now addressed with this driver and close the ticket. 

http://support.amd.com/en-us/download/desktop?os=Windows+10+-+64
The latest version in the crash reports is 16.300.2311.0, so it looks like 16.30.2511 and 16.30.2511.1001 are unaffected.

I also see 20.19.0.32832, I don't know which version it maps to.

Should we block the driver versions which are still affected?
Flags: needinfo?(milan)
Seems reasonable - we would be blocking DXVA, and there are special ways of doing that, so I will let Anthony et al. sort it out.
Flags: needinfo?(milan) → needinfo?(ajones)
20.19.0.32832 doesn't map to any known drivers from AMD. Curious where that came from. Any more system info here?
(In reply to Paul Blinzer from comment #57)
> 20.19.0.32832 doesn't map to any known drivers from AMD. Curious where that
> came from. Any more system info here?

There are just three crash reports with this version:
AdapterVendorID: 0x1002, AdapterDeviceID: 0x6658, AdapterSubsysID: 227b1458
AdapterVendorID: 0x1002, AdapterDeviceID: 0x6658, AdapterSubsysID: 22741458
AdapterVendorID: 0x1002, AdapterDeviceID: 0x6810, AdapterSubsysID: 30301462
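As a rough illustration of how annotation lines like these can be checked against the affected device list, here is a Python sketch that parses the three lines above and tests each device ID for membership. `parse_adapter_line` and `AFFECTED_EXCERPT` are hypothetical names, and the set holds only a small excerpt of the device IDs listed earlier in this bug, not the full list.

```python
import re

# Matches 'Key: value' pairs where the value is a hex-looking token.
ANNOTATION = re.compile(r"(\w+):\s*([0-9A-Fa-fx]+)")

def parse_adapter_line(line):
    """Parse 'Key: value, Key: value' into a dict with lowercase values."""
    return {key: value.lower() for key, value in ANNOTATION.findall(line)}

# Excerpt of the affected device IDs listed earlier in this bug (not the full list).
AFFECTED_EXCERPT = {"0x6658", "0x6810", "0x67df"}

lines = [
    "AdapterVendorID: 0x1002, AdapterDeviceID: 0x6658, AdapterSubsysID: 227b1458",
    "AdapterVendorID: 0x1002, AdapterDeviceID: 0x6658, AdapterSubsysID: 22741458",
    "AdapterVendorID: 0x1002, AdapterDeviceID: 0x6810, AdapterSubsysID: 30301462",
]

for line in lines:
    info = parse_adapter_line(line)
    device = info["AdapterDeviceID"]
    print(device, "in affected list:", device in AFFECTED_EXCERPT)
```

Both device IDs from these three reports appear in the affected device list posted earlier, so the unusual part of these reports is the driver version rather than the hardware.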
Gerald - can you blacklist DXVA on these?
Flags: needinfo?(ajones) → needinfo?(gsquelart)
On it
Assignee: jmuizelaar → gsquelart
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(gsquelart)
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #59)
> Gerald - can you blacklist DXVA on these?
Anthony, Gerald, please do not blacklist the 20.x version numbers. These reports apparently came from an unreleased, AMD-internal set of drivers. The issue has since been fixed in these, and when they are released to the public the issue will be gone.
Thank you Paul, I've removed the blocking of 20.x from the patch.
Comment on attachment 8787020 [details]
Bug 1267970 - Block hw decoding on ati up to 16.300.2311.0 -

https://reviewboard.mozilla.org/r/75862/#review74884
Attachment #8787020 - Flags: review?(ajones) → review+
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1eb286442c40
Block hw decoding on ati up to 16.300.2311.0 - r=kentuckyfriedtakahe
https://hg.mozilla.org/mozilla-central/rev/1eb286442c40
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
It seems late for beta. We could still take a patch for 50/ aurora. If you think it is important for 49 please let me know since we go to build tomorrow for RC2.
Comment on attachment 8787020 [details]
Bug 1267970 - Block hw decoding on ati up to 16.300.2311.0 -

Approval Request Comment
[Feature/regressing bug #]: Video playback
[User impact if declined]: Crashes for Windows users with older ATI drivers, ~900 crashes per week
[Describe test coverage new/current, TreeHerder]: Existing media tests
[Risks and why]: Very low risk, because:
- It's "just" another blacklisting, similar to many others before, so there's no really new code.
- Later (non-blacklisted) drivers are available, so affected users should be able to upgrade.
- We're only preventing hardware video decoding with the older drivers, so decoding will fall back to software, or another codec (e.g.: webm on YouTube).
[String/UUID change made/needed]: None.


(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #70)
> It seems late for beta. We could still take a patch for 50/ aurora. If you
> think it is important for 49 please let me know since we go to build
> tomorrow for RC2.

Anthony thinks that "we should request uplift [to 49 beta] on the basis that it actually has a low risk of breaking things". I agree, and it should help prevent hundreds of crashes per day.
Attachment #8787020 - Flags: approval-mozilla-beta?
Attachment #8787020 - Flags: approval-mozilla-aurora?
Comment on attachment 8787020 [details]
Bug 1267970 - Block hw decoding on ati up to 16.300.2311.0 -

Crash fix for fairly high volume video playback crash, let's land this for RC2 today.
Attachment #8787020 - Flags: approval-mozilla-release+
Attachment #8787020 - Flags: approval-mozilla-beta?
Attachment #8787020 - Flags: approval-mozilla-beta+
Attachment #8787020 - Flags: approval-mozilla-aurora?
Attachment #8787020 - Flags: approval-mozilla-aurora+
This doesn't apply cleanly to aurora. Can't imagine beta/release would fare much better.
Flags: needinfo?(lhenry)
Flags: needinfo?(gsquelart)
Rebase for Aurora.
Flags: needinfo?(gsquelart)
Comment on attachment 8789061 [details] [diff] [review]
1267970-aurora.patch

Carrying r+
Attachment #8789061 - Flags: review+
Rebase for beta. Carrying r+.
Attachment #8789064 - Flags: review+
(In reply to Wes Kocher (:KWierso) from comment #77)
> https://hg.mozilla.org/releases/mozilla-beta/rev/4fc4a3563ab5
> https://hg.mozilla.org/releases/mozilla-release/rev/4fc4a3563ab5

This was the aurora patch, which apparently is different from the beta patch that got attached after I pushed the aurora patch everywhere.

Backed out the aurora patch and landed the beta patch:
remote:   https://hg.mozilla.org/releases/mozilla-beta/rev/cada5345bf14
remote:   https://hg.mozilla.org/releases/mozilla-beta/rev/1099b4f39b00

And merged both of those to release:
remote:   https://hg.mozilla.org/releases/mozilla-release/rev/cada5345bf14
remote:   https://hg.mozilla.org/releases/mozilla-release/rev/1099b4f39b00
Please let me take a position against this block.

There are many people who can't update past a certain driver version, as AMD/ATI moves boards to "Legacy" support and stops providing updates.

The latest driver we can install is 15.301.1901:
http://support.amd.com/en-us/download/desktop/legacy?product=legacy3&os=Windows+8.1+-+64
http://support.amd.com/en-us/kb-articles/Pages/AMD-Radeon-Software-Crimson-Edition-16.2.1-for-Non-GCN-Products-Release-Notes.aspx

I've never had a crash for video. Maybe these crashes happen only on specific broken sites or under specific system circumstances. Blocking HW video decoding for everyone running these drivers may affect many more people than those having these specific crashes.

Also, as we know, AMD tends to support its GPUs for much less time than nVidia, leaving users with outdated drivers, but not really broken ones.
(In reply to Ricardo from comment #80)
> Please let me position against this block.
> 
> There are many people who can't update past certain driver version, as
> AMD/ATI moves boards to "Legacy" support and stop providing updates.
> 
> The latest driver we can install is 15.301.1901:
> http://support.amd.com/en-us/download/desktop/
> legacy?product=legacy3&os=Windows+8.1+-+64
> http://support.amd.com/en-us/kb-articles/Pages/AMD-Radeon-Software-Crimson-
> Edition-16.2.1-for-Non-GCN-Products-Release-Notes.aspx
> 
> I've never had a crash for video. Maybe these crashes happen only on
> specific broken sites or system circumstances. Blocking HW video decoding
> for everyone running these drivers may affect much more people than those
> having specific crashes.
> 
> Also, we know, AMD is used to support its GPUs for much less time than
> nVidia, leaving users with outdated drivers, but not really broken ones.

Looking at the patch, it seems broader than I would expect. To be clear, the crash signature can only occur on Windows 10, i.e. with drivers that work as WDDM2 drivers for the Win10 interfaces, as opposed to the WDDM1.x driver model used on Windows 8.1 and Windows 7. WDDM1.x is also the interface for the non-GCN GPUs that are now under legacy support, but which still run perfectly fine on Windows 10.
So Ricardo likely wouldn't have experienced issues here if he ran this on a pre-Win10 OS and/or with non-Win10-targeted drivers for non-GCN devices. So Ricardo has a point.
Flags: needinfo?(jmuizelaar)
Sorry, I should have noticed that it was only win10.
I've opened bug 1301615 to restrict the blocklist. Please comment there if you see other issues (e.g., should the blocked range be smaller?)
Flags: needinfo?(jmuizelaar)
Depends on: 1305107
Depends on: 1305296
No longer depends on: 1305296