Bug 1713230 Comment 0 Edit History

`IOAccelContext2::setContextError(unsigned int error)` is a method defined in the `IOAcceleratorFamily2` kernel-mode graphics driver. It's called, on an error condition, to set the "context error" in an `IOAccelContext2` context -- usually from one of the hardware-specific kernel-mode graphics drivers like `AppleIntelHD5000Graphics` or `AMDRadeonX4000`. When this happens, this error number becomes the "graphics kernel error" in the `mac_crash_info` data written by `gpusGenerateCrashLog.cold.1()`.

Normally these error numbers are simple negative integers -- for example `0xfffffff9/-7` or `0xfffffffc/-4`. But as of macOS 11.4, context errors set (by calls to `setContextError()`) from AMDRadeon hardware-specific kernel-mode graphics drivers can have a different, much more elaborate format -- for example `0x1be385f9` or `0x067900fc`.
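
To make the "simple negative integer" reading concrete, here is a minimal sketch that reinterprets a 32-bit error value as a two's-complement signed integer. The helper name `as_signed32` is hypothetical and only for illustration; it isn't anything from Apple's drivers.

```python
def as_signed32(value):
    """Interpret a 32-bit unsigned value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    # The first two are the "simple" errors quoted above; the last two are
    # the new-style values, which don't decode to small negative numbers.
    for raw in (0xFFFFFFF9, 0xFFFFFFFC, 0x1BE385F9, 0x067900FC):
        print(f"{raw:#010x} -> {as_signed32(raw)}")
```

The old-style values come out as -7 and -4, while the new-style ones stay large positive numbers, which is one quick way to tell the two formats apart.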

I believe this is part of a special effort on Apple's part to get to the bottom of these errors on AMDRadeon hardware. macOS 11.1 and 11.3 are supposed to have included fixes for these problems, but if anything they've grown worse. (See bug 1576767 comment #347 and bug 1576767 comment #348.) It appears that Apple has doubled down on its efforts to resolve them, though not (so far as I know) publicly. (Which probably shows that many more apps than just Firefox and Thunderbird are affected.)

Each of these error numbers has three "fields": "nnnn:nn:nn". I don't (yet) understand the first and the third. But I'm pretty sure the second indicates the kind of "token" being processed when the failure occurred. More on this in a later comment. But even without understanding the error numbers' format, you can see (in a good disassembler) that each one is only ever used once. So you can tell from the error number exactly where the error happened -- exactly which call to `setContextError()` "set" it.
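
As a rough sketch of the "nnnn:nn:nn" layout, the following splits a 32-bit error number into three fields. It assumes each "n" is one hex digit, so the fields are 16, 8 and 8 bits wide from most to least significant; that width assignment is an assumption read off the notation, and the function names are hypothetical.

```python
def split_fields(error):
    """Split a 32-bit error number into (first, second, third) fields.

    Assumed layout: 16 bits, then 8 bits, then 8 bits ("nnnn:nn:nn").
    """
    error &= 0xFFFFFFFF
    first = (error >> 16) & 0xFFFF
    second = (error >> 8) & 0xFF
    third = error & 0xFF
    return first, second, third


def format_fields(error):
    """Render an error number in the "nnnn:nn:nn" notation."""
    first, second, third = split_fields(error)
    return f"{first:04x}:{second:02x}:{third:02x}"


if __name__ == "__main__":
    for raw in (0x1BE385F9, 0x067900FC):
        print(f"{raw:#010x} -> {format_fields(raw)}")
```

Under that assumption, `0x1be385f9` reads as `1be3:85:f9` and `0x067900fc` as `0679:00:fc`.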

Here are some examples from the last few days:

bp-3539c604-8378-4d6d-adcd-f89aa0210526
bp-c5702982-e478-4dad-97fe-4150e0210527
bp-cfb6c9e9-c86b-4449-8bdb-aa9490210527

        {
          "num_records": 2,
          "records": [
            {
              "message": "abort() called",
              "module": "/usr/lib/system/libsystem_c.dylib"
            },
            {
              "module": "/System/Library/PrivateFrameworks/GPUSupport.framework/Versions/A/Libraries/libGPUSupportMercury.dylib",
              "signature_string": "Graphics kernel error: 0x1be385f9\n"
            }
          ]
        }

bp-a6438d18-4304-49d4-990a-4fb2f0210526

        {
          "num_records": 2,
          "records": [
            {
              "message": "abort() called",
              "module": "/usr/lib/system/libsystem_c.dylib"
            },
            {
              "module": "/System/Library/PrivateFrameworks/GPUSupport.framework/Versions/A/Libraries/libGPUSupportMercury.dylib",
              "signature_string": "Graphics kernel error: 0x067900fc\n"
            }
          ]
        }
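
For anyone who wants to pull these error numbers out of crash reports in bulk, here is a minimal sketch that extracts them from a `mac_crash_info` structure shaped like the examples above. The function name `graphics_kernel_errors` is hypothetical, and the sketch assumes the JSON has already been obtained from the crash report by some other means.

```python
import json
import re

# Matches the signature string format seen in the examples above.
_PATTERN = re.compile(r"Graphics kernel error: (0x[0-9a-fA-F]+)")


def graphics_kernel_errors(crash_info):
    """Return every graphics kernel error number found in the records."""
    errors = []
    for record in crash_info.get("records", []):
        match = _PATTERN.search(record.get("signature_string", ""))
        if match:
            errors.append(int(match.group(1), 16))
    return errors


# Sample data copied from bp-a6438d18-4304-49d4-990a-4fb2f0210526 above.
SAMPLE = json.loads(r'''
{
  "num_records": 2,
  "records": [
    {
      "message": "abort() called",
      "module": "/usr/lib/system/libsystem_c.dylib"
    },
    {
      "module": "/System/Library/PrivateFrameworks/GPUSupport.framework/Versions/A/Libraries/libGPUSupportMercury.dylib",
      "signature_string": "Graphics kernel error: 0x067900fc\n"
    }
  ]
}
''')

if __name__ == "__main__":
    for error in graphics_kernel_errors(SAMPLE):
        print(f"{error:#010x}")
```

Records without a `signature_string` (like the `abort() called` one) are simply skipped.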
