Closed Bug 1328838 Opened 8 years ago Closed 9 months ago

Crashes due to AMD OpenCL when working with pictures

Categories

(External Software Affecting Firefox :: Other, defect, P3)

All
Windows

Tracking


RESOLVED FIXED
122 Branch
Tracking Status
firefox-esr45 --- wontfix
relnote-firefox --- 122+
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- wontfix
firefox-esr91 --- wontfix
firefox53 --- wontfix
firefox-esr115 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox120 --- wontfix
firefox121 --- wontfix
firefox122 --- fixed

People

(Reporter: philipp, Assigned: gstoll)

Details

(Keywords: crash)

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-7db0e6f7-ead4-43e1-a2f7-127932161128.
=============================================================

There are a number of crash reports with signatures containing references to the amdocl*.dll module. They are spread out among various signatures (I'm attaching the most common ones to the bug report).
These crashes mainly affect Windows 10, Windows 7 and Windows 8.1, have been going on for a while, and happen on both 32-bit and 64-bit builds.

Many comments from affected users indicate that they were somehow interacting with pictures (downloading, uploading, copying...): http://bit.ly/2hVuiUS

Some of those reports, like https://crash-stats.mozilla.com/report/index/b6d53bcd-66a9-4017-a57e-1c4f52170102, have the AMD-related module amf-wic-jpeg-decoder64.dll in the stack, so it might also be related...

The IDs of the most affected AMD graphics adapters are:

Rank  Adapter ID  Count  Share
1     0x9874      536    28.66 %
2     0x6611      191    10.21 %
3     0x9851      159     8.50 %
4     0x675f      107     5.72 %
5     0x9830       75     4.01 %

The drivers where this issue occurs are long outdated; any driver released since the middle of last year should address the problem. The root cause is in amf-wic-jpeg-decoder*.dll. The recommended update is to Crimson ReLive Edition 16.12.x or newer.

Is this something we can work around (other than suggesting that users update)?

Too late for Firefox 52, mass-wontfix.

blocklist candidate. every version of this has issues, we should just eject all of them.
Priority: -- → P3
Crash Signature: amdocl.dll@0x60964] [@ amdocl64.dll@0x2a5b90] [@ amdocl12cl64.dll@0xed66] → amdocl.dll@0x60964] [@ amdocl64.dll@0x2a5b90] [@ amdocl12cl64.dll@0xed66] [@ amdocl64.dll@0x43d56] [@ amdocl64.dll@0x478c6] [@ amdocl64.dll@0x478d6]
(In reply to kyle.plumadore from bug 1403741 comment 4)
> This looks like a corrupt OpenCL installation on the system. This hardware
> doesn't support the amd vp9 decoder, but the amdopencl driver will still be
> loaded to query the hardware name. It appears as if this basic opencl query
> is crashing in the amd opencl driver, likely due to some mismatch of opencl
> components.
> 
> My guess would be that this happened due to the user doing something
> "abnormal" e.g. partly installing amd drivers, or installing an old opencl
> driver on top of a newer driver. We've seen similar crashes when users
> manually install the AMD APP development sdk, which comes with its own
> version of amdocl64.dll. (We actually shipped a workaround for this case in
> a more recent driver version)
> 
> I don't suppose we have access to a failing system to get more info? I would
> be interested to know whether doing something as simple as opening a command
> prompt and typing "clinfo" would also crash. It would also be interesting to
> know the full path of the amdopencl64.dll module that's loaded into the
> firefox process.

(In reply to Jim Mathies [:jimm] from comment #4)
> blocklist candidate. every version of this has issues, we should just eject
> all of them.

I'm afraid if we blocked these DLLs regardless of version we could prevent VP9 decoding acceleration from working for some users.
Low crash volume, wontfix for 57.
Very low volume crash. Wontfix for 58.
Crash Signature: amdocl.dll@0x60964] [@ amdocl64.dll@0x2a5b90] [@ amdocl12cl64.dll@0xed66] [@ amdocl64.dll@0x43d56] [@ amdocl64.dll@0x478c6] [@ amdocl64.dll@0x478d6] → amdocl.dll@0x60964] [@ amdocl64.dll@0x2a5b90] [@ amdocl12cl64.dll@0xed66] [@ amdocl64.dll@0x43d56] [@ amdocl64.dll@0x478c6] [@ amdocl64.dll@0x478d6] [@ RtlpWakeByAddress | RtlpUnWaitCriticalSection | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64…
Crash Signature: amf-wic-jpeg-decoder64.dll@0x88d3] → amf-wic-jpeg-decoder64.dll@0x88d3] [@ RtlpWakeByAddress | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64.dll@0x88d3]
Crash Signature: amf-wic-jpeg-decoder64.dll@0x88d3] [@ RtlpWakeByAddress | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64.dll@0x88d3] → amf-wic-jpeg-decoder64.dll@0x88d3] [@ RtlpWakeByAddress | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64.dll@0x88d3] [@ RtlpWakeByAddress | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64.dll | VirtualQuery] [@ RtlpWaitOnCriticalSection | RtlpEnter…
Crash Signature: RtlpUnWaitCriticalSection | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64.dll | VirtualQuery] → RtlpUnWaitCriticalSection | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64.dll | VirtualQuery] [@ RtlpWakeByAddress | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64.dll | ShouldBlockThread ]

Cleaning up the signatures, as many of them no longer report any crashes. This is still happening and it's still strongly correlated with AMD graphics hardware. I couldn't find crashes with other graphics vendors under these signatures.

Crash Signature: [@ amdocl.dll@0x23507b] [@ amdocl12cl.dll@0xd7f2] [@ amdocl12cl.dll@0xe31e] [@ amdocl64.dll@0x2a7230] [@ amdocl.dll@0x23563b] [@ amdocl.dll@0x234d0b] [@ amdocl12cl.dll@0xe34e] [@ amdocl.dll@0xbd7e] [@ amdocl.dll@0x18d794] [@ amdocl.dll@0x23506b] … → [@ RtlpWakeByAddress | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64.dll | VirtualQuery] [@ RtlpWakeByAddress | RtlLeaveCriticalSection | amf-wic-jpeg-decoder64.dll | BaseThreadInitThunk] [@ RtlpWakeByAddress | RtlLeaveCriticalSection | amf-wic-jpeg-…

The correlation was a little too strong: this isn't an issue with the graphics driver, it's a CPU bug. The vast majority of the crashes comes from AMD Family 22 processors (i.e. Jaguar and friends) which have been a never-ending source of CPU-related issues. The rest of the crashes are coming from AMD Family 21 processors (Bulldozer derivatives) and those have also popped up from time to time. Long story short: don't try to fix this bug, it's not our fault.
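The comment above identifies the affected machines by AMD CPU family: Family 21 (0x15, Bulldozer derivatives) and Family 22 (0x16, Jaguar). As a minimal sketch, assuming the raw CPUID leaf 1 EAX value is available (how Firefox or crash-stats actually collects and reports it is not covered here), this is how that family number is derived. The function names are hypothetical, and the sketch assumes the caller already knows the vendor is AMD.

```python
def cpu_family(eax: int) -> int:
    """Return the displayed CPU family from a CPUID leaf 1 EAX value.

    The base family is EAX bits 11:8. When it equals 0xF, the extended
    family (EAX bits 27:20) is added to it.
    """
    base_family = (eax >> 8) & 0xF
    if base_family == 0xF:
        extended_family = (eax >> 20) & 0xFF
        return base_family + extended_family
    return base_family


# Family numbers named in the bug, written in hex.
BULLDOZER_DERIVATIVES = 0x15  # Family 21
JAGUAR_AND_FRIENDS = 0x16     # Family 22


def in_suspect_amd_family(eax: int) -> bool:
    """True if the signature decodes to AMD Family 21 or 22.

    Hypothetical helper: it does not check the vendor string, so it is
    only meaningful for CPUs already known to be AMD.
    """
    return cpu_family(eax) in (BULLDOZER_DERIVATIVES, JAGUAR_AND_FRIENDS)
```

For example, an EAX value such as 0x00700F01 has base family 0xF and extended family 0x07, so it decodes to 0x16, i.e. Family 22.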

(In reply to Gabriele Svelto [:gsvelto] from comment #10)

> The correlation was a little too strong: this isn't an issue with the graphics driver, it's a CPU bug. The vast majority of the crashes comes from AMD Family 22 processors (i.e. Jaguar and friends) which have been a never-ending source of CPU-related issues. The rest of the crashes are coming from AMD Family 21 processors (Bulldozer derivatives) and those have also popped up from time to time. Long story short: don't try to fix this bug, it's not our fault.

With this pattern, it is likely a GPU-driver-related problem. The affected systems are typically paired with integrated or discrete AMD GPUs supported by the (now long-legacy) driver that contains the aforementioned OpenCL image processing. The last driver released for these products removes it, and updating to that driver is the recommended fix, as mentioned earlier in this bug. I suspect the reports come from systems that have not been updated to the last supported driver for these legacy products, and I'd expect upgrading affected systems to address the issue.

I thought about that given the modules on the stacks, but there are two things that don't match the idea of a driver bug:

  • We have crashes with very recent driver versions, such as this one for version 21.5.2, released in 2021
  • We don't have crashes from users with a combination of newer processors and older graphics cards

If it were purely a driver issue, we'd see the crashes cluster in a subset of driver versions, not including recent ones. We'd also see more combinations of hardware affected, not just two specific CPU families.
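The argument above is a cross-tabulation: a driver bug should concentrate crashes in a few driver versions across many CPU families, while a CPU bug should concentrate them in a few CPU families across many driver versions. A minimal sketch of that check follows, assuming crash reports have already been reduced to (CPU family, driver version) pairs; `CrashRecord` and `summarize` are hypothetical names, and real crash-stats fields and queries differ.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, NamedTuple, Set, Tuple


class CrashRecord(NamedTuple):
    # Hypothetical record shape, not the crash-stats schema.
    cpu_family: int
    driver_version: str


def summarize(
    records: Iterable[CrashRecord],
) -> Tuple[Counter, DefaultDict[int, Set[str]]]:
    """Count crashes per CPU family and collect the driver versions seen per family."""
    per_family: Counter = Counter()
    drivers_by_family: DefaultDict[int, Set[str]] = defaultdict(set)
    for rec in records:
        per_family[rec.cpu_family] += 1
        drivers_by_family[rec.cpu_family].add(rec.driver_version)
    return per_family, drivers_by_family
```

If nearly all crashes land in families 21 and 22 while each of those families shows a wide spread of driver versions (including recent ones), the data points toward the CPU rather than a particular driver release.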

Severity: critical → S2

This crash still has a decent volume, and the user comments on the crashes are pretty angry...

:gsvelto, do you think that blocking this DLL just for AMD Family 21 or 22 CPUs would fix these crashes? Or would it just crash somewhere else? (Note that I don't know how feasible this is; I just want to explore some options.)

Note for the future - this would also be a candidate for user notification of crashes and directing them to a SUMO page, although unless we have a workaround this won't be very helpful...

Flags: needinfo?(gsvelto)

Many comments mention that this crash happens when attaching a file... they don't say it explicitly but I suppose that this means that they happen within the file picker. That would explain why we crash on a thread that doesn't appear to be ours, maybe the Windows shell is spawning a thread on our behalf to create a preview of the picture or something like that and invoking this DLL in the process? If that's the case then I suppose that blocking it shouldn't affect users, but we cannot be certain unless we try.

Flags: needinfo?(gsvelto)

This comment suggests that we can probably safely block the DLL, and users will get a black preview - which we don't care about. The real solution is to upgrade the driver, so yeah, this would be a great candidate for a "smart" crash reporter to inform the user about the fix. Anyway, I agree we should block it (both the 64-bit and 32-bit versions).

Windows DLLBlocklist request form

  1. How were we aware of the problem?
    Crash reports

  2. What is a suspicious product causing the problem?
    AMD JPEG decoder

  3. Is the product downloadable? If so, do we have a local repro?
    It requires a few specific AMD CPUs, so I haven't tried reproing

  4. Which OS versions does the problem occur on?
    Happens on Windows 10.

  5. Which process types does the problem occur on?
    Just in the parent process. Although with the out-of-process file picker turned on, we may need to come back and block this in that process as well.

  6. What is the maximum version of the module in the crash reports?
    1.1.0.0 for the 64-bit version; however based on AMD comments we should block all versions of these.

  7. Is the issue fixed by a newer version of the product?
    No

  8. Do we have data about the module in the third-party-module ping?
    We don't seem to.

  9. Do we know how the module is loaded?
    No.

  10. Describe your conclusion.
    We should block all versions of these two DLLs in the parent process.
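The conclusion of the form (block every version of both DLLs, but only in the parent process) can be illustrated with a hedged sketch of the matching rule. This is not Firefox's actual DLL blocklist implementation or entry format; `BLOCKLIST` and `should_block` are hypothetical names. The 64-bit file name comes from the crash signatures, while the 32-bit name `amf-wic-jpeg-decoder.dll` is an assumption.

```python
from pathlib import PureWindowsPath
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int, int]

# Hypothetical table: lower-case module file name -> highest blocked
# version, or None to block every version (as the request form concludes).
BLOCKLIST: Dict[str, Optional[Version]] = {
    "amf-wic-jpeg-decoder64.dll": None,  # name seen in the crash signatures
    "amf-wic-jpeg-decoder.dll": None,    # assumed 32-bit name
}


def should_block(module_path: str, version: Version, is_parent_process: bool) -> bool:
    """Decide whether a module load should be refused (illustrative sketch)."""
    if not is_parent_process:
        # The form limits the block to the parent process for now.
        return False
    # Windows file names are case-insensitive, so compare lower-cased names.
    name = PureWindowsPath(module_path).name.lower()
    if name not in BLOCKLIST:
        return False
    max_version = BLOCKLIST[name]
    # None means "all versions"; otherwise block only up to max_version.
    return max_version is None or version <= max_version
```

Keeping a per-entry maximum version, even though both entries use None here, shows the difference between answer 6 (the highest version seen in crash reports is 1.1.0.0) and answer 10 (block all versions regardless).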

Assignee: nobody → gstoll
Status: NEW → ASSIGNED

An idle suggestion: in lieu of a "smart" crash reporter, might we want to add a release note telling people to update their driver if they see these blank image previews?

Yes, that's a great idea, thanks!

Release Note Request (optional, but appreciated)
[Why is this notable]: It may cause image previews in the file dialog to show up black.
[Affects Firefox for Android]: No
[Suggested wording]: "Some machines with older AMD CPUs may see image thumbnails incorrectly render as all black in file dialogs. If this is the case, updating the graphics driver should address this issue."
[Links (documentation, blog post, etc)]:

relnote-firefox: --- → ?
Pushed by gstoll@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/037b4a8e4edc
block old AMD JPEG decoder DLLs r=gsvelto

Backed out for causing build bustages in WindowsDllBlocklistA11yDefs.h.stub

Flags: needinfo?(gstoll)

sorry, typo :-/

Flags: needinfo?(gstoll)
Pushed by gstoll@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f5b9bd13761f
block old AMD JPEG decoder DLLs r=gsvelto
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 122 Branch

Thanks, added the relnote as a known issue to the Fx122 nightly release notes, please allow 30 minutes for the site to update.
Keeping the relnote-firefox flag as ? to keep it on the radar for inclusion in the final Fx122 release notes
