Closed Bug 1664843 Opened 4 years ago Closed 4 years ago

Crash in [@ abort | glrAppleSyncState] (WebGL crashes on 10.12/10.13 on Intel HD 3000, affects Google Maps)

Categories

(Core :: Graphics, defect)

80 Branch
Desktop
macOS
defect

Tracking

()

VERIFIED FIXED
83 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox80 --- wontfix
firefox81 + verified
firefox82 --- verified
firefox83 --- verified

People

(Reporter: philipp, Assigned: jrmuizel)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/ea8d92b8-3f3e-4e0e-8188-4a6840200914

Top 10 frames of crashing thread:

0 libsystem_kernel.dylib mach_msg_trap 
1 XUL google_breakpad::ReceivePort::WaitForMessage toolkit/crashreporter/google-breakpad/src/common/mac/MachIPC.mm:249
2 XUL google_breakpad::CrashGenerationClient::RequestDumpForException toolkit/crashreporter/breakpad-client/mac/crash_generation/crash_generation_client.cc:70
3 XUL google_breakpad::ExceptionHandler::WriteMinidumpWithException toolkit/crashreporter/breakpad-client/mac/handler/exception_handler.cc:403
4 XUL google_breakpad::ExceptionHandler::SignalHandler toolkit/crashreporter/breakpad-client/mac/handler/exception_handler.cc:650
5 libsystem_platform.dylib _sigtramp 
6  @0x7ffee7400f3f 
7  @0x7ffee7400a67 
8 libsystem_c.dylib abort 
9 AppleIntelHD3000GraphicsGLDriver glrAppleSyncState 

These crash reports have been showing up from Nightly and Developer Edition users on macOS 10.12 and 10.13 since Firefox 80. They are not visible in beta or release builds at this time.

URL correlations and comments indicate that these crashes are all happening on Google Maps.

Interestingly it seems the first few frames - with the exception handler and crash reporter machinery - aren't stripped from the crash. They should be so I'll file a bug for that.
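As a rough illustration of the stripping that should have happened, the sketch below drops everything up to and including the signal trampoline and skips unsymbolicated frames before building a short signature. The function names, the frame representation, and the two-frame join rule are assumptions made for illustration; this is not the crash-stats signature generator.

```python
# Hypothetical sketch: derive a short signature from (module, function) frames.
# Unsymbolicated frames are represented by an "@0x..." function name.

TOP_FRAMES = [
    ("libsystem_kernel.dylib", "mach_msg_trap"),
    ("XUL", "google_breakpad::ReceivePort::WaitForMessage"),
    ("XUL", "google_breakpad::CrashGenerationClient::RequestDumpForException"),
    ("XUL", "google_breakpad::ExceptionHandler::WriteMinidumpWithException"),
    ("XUL", "google_breakpad::ExceptionHandler::SignalHandler"),
    ("libsystem_platform.dylib", "_sigtramp"),
    ("", "@0x7ffee7400f3f"),
    ("", "@0x7ffee7400a67"),
    ("libsystem_c.dylib", "abort"),
    ("AppleIntelHD3000GraphicsGLDriver", "glrAppleSyncState"),
]


def strip_handler_frames(frames, trampoline="_sigtramp"):
    """Drop crash-reporter frames above the trampoline and unsymbolicated frames."""
    names = [fn for _, fn in frames]
    if trampoline in names:
        # Everything up to and including the trampoline is handler machinery.
        frames = frames[names.index(trampoline) + 1:]
    return [(mod, fn) for mod, fn in frames if not fn.startswith("@0x")]


def signature(frames, max_frames=2):
    """Join the first few remaining function names (max_frames is an assumption)."""
    kept = strip_handler_frames(frames)
    return " | ".join(fn for _, fn in kept[:max_frames])
```

Applied to the ten frames listed at the top of this bug, `signature(TOP_FRAMES)` yields `abort | glrAppleSyncState`, matching the original crash signature.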

Actually, this is happening on release as well, with multiple different signatures, and seems to be a reproducible and fairly obnoxious crash for affected users, judging by the comments in reports and threads like https://support.mozilla.org/en-US/questions/1304314

Crash Signature: [@ abort | glrAppleSyncState] → [@ abort | glrAppleSyncState] [@ BaseAllocator::malloc] [@ mach_msg_trap] [@ libsystem_kernel.dylib@0x1320a] [@ libsystem_kernel.dylib@0x131fa] [@ libsystem_kernel.dylib@0x1234a]

80.0a1 build 20200705214003 seems to have been the first nightly build affected by the problem. These are the patches that landed the day before: https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2020-07-04&tochange=6087e976924f95018479c6f5881878c95b8bd8e2

These 4 bugs in particular look related to macOS and/or gfx: https://bugzilla.mozilla.org/buglist.cgi?bug_id=1649490,1647565,1647186,1324591

Crash Signature: [@ abort | glrAppleSyncState] [@ BaseAllocator::malloc] [@ mach_msg_trap] [@ libsystem_kernel.dylib@0x1320a] [@ libsystem_kernel.dylib@0x131fa] [@ libsystem_kernel.dylib@0x1234a] → [@ abort | glrAppleSyncState] [@ BaseAllocator::malloc] [@ mach_msg_trap] [@ libsystem_kernel.dylib@0x1320a] [@ libsystem_kernel.dylib@0x131fa] [@ libsystem_kernel.dylib@0x1234a] [@ malloc] [@ _sigtramp] [@ libsystem_platform.dylib@0x1f59] [@ abo…

Unless, in addition to loading Google Maps, all of these users just happened to be changing the system pref for overlay scrollbars at the same time, I don't think bug 1647565 is responsible. If it's one of those 4 bugs, I would guess bug 1649490.

If the Apple driver is explicitly calling abort(), isn't this more a stability problem than a potential vulnerability?

Severity: -- → S3

Pretty bad WebGL-related crash in an Apple driver, triggered on Google Maps. Looks like it regressed in 80. Needs a priority.

Flags: needinfo?(gpascutto)

Although it's triggered by WebGL use cases, the (suspected) regressor was in device enumeration code (for Telemetry, to support WebRender), and the crashing code isn't ours (but if it's on old macOS, we can't even ask Apple to fix it).

Jeff, any idea how to approach this?

Flags: needinfo?(gpascutto) → needinfo?(jgilbert)

Likely regressed by the telemetry changes we made in bug 1649490

Blocks: gfx-triage
See Also: → 1663361, 1665838
Group: gfx-core-security

The graphics team triaged this; we felt the telemetry work wasn't related.

Flags: needinfo?(gpascutto)

I really don't see anything else in this list that would cause us to suddenly hit a bug in the HD3000 driver on older macOS:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2020-07-04&tochange=6087e976924f95018479c6f5881878c95b8bd8e2

We have multiple bugs filed on this, let's see if any of the users can use mozregression?
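The bisection that a tool like mozregression automates can be sketched as a binary search over build dates. The `crashes` predicate stands in for "install that nightly and load Google Maps"; the function name and the simulated tester are assumptions for illustration, and the first-affected date is the one reported earlier in this bug.

```python
from datetime import date, timedelta


def first_bad_build(good, bad, crashes):
    """Return the earliest date in (good, bad] for which crashes(date) is True.

    Assumes crashes(good) is False, crashes(bad) is True, and that the
    behaviour flips exactly once between the two dates.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if crashes(mid):
            bad = mid   # regression is at or before mid
        else:
            good = mid  # regression is after mid
    return bad


# Simulated tester: builds on or after the first affected nightly crash.
FIRST_AFFECTED = date(2020, 7, 5)
result = first_bad_build(date(2020, 6, 1), date(2020, 9, 1),
                         lambda d: d >= FIRST_AFFECTED)
```

With a three-month window this needs only about seven test builds, which is why asking an affected user to run the bisection is practical.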

Flags: needinfo?(gpascutto)

(In reply to [:philipp] from comment #3)

> 80.0a1 build 20200705214003 seems to have been the first nightly build affected by the problem. this would have been the patches landing the day before: https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2020-07-04&tochange=6087e976924f95018479c6f5881878c95b8bd8e2
>
> those 4 bugs look related to macos and/or gfx in particular: https://bugzilla.mozilla.org/buglist.cgi?bug_id=1649490,1647565,1647186,1324591

2020-07-05 was when Mozilla started pushing Firefox users on older versions of OS X (pre 10.12) to the ESR branch. Crashes on the release branch are throttled (at about the rate of 1 in 10), but those on ESR releases aren't. So there are many bugs that show a substantial increase in crash numbers starting on that date. (All the other ones I know about are security bugs, so I won't list them here.)
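The effect of that throttling on the graphs can be illustrated with a little arithmetic. The crash counts below are invented purely for the example; only the roughly 1-in-10 rate for release and the unthrottled ESR channel come from the comment above.

```python
# Assumed from the comment: about 1 in 10 release-channel crashes is kept,
# while every ESR crash is kept.
KEEP_ONE_IN = {"release": 10, "esr": 1}


def reports_seen(crashes_by_channel):
    """Reports that appear in the stats for the given true crash counts."""
    return sum(count // KEEP_ONE_IN[channel]
               for channel, count in crashes_by_channel.items())


# Invented numbers: the same 1000 crashes, before and after 200 of them
# move from the release channel to ESR.
before = reports_seen({"release": 1000, "esr": 0})
after = reports_seen({"release": 800, "esr": 200})
```

In this made-up case the visible report count rises from 100 to 280 even though the true number of crashes is unchanged, which is the kind of artifact described above.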

Looking at the crash stats in "Crash Data" now, I notice a large increase starting just after 2020-08-25. That, too, may be an artifact. But until we know otherwise, it's a better place to look for possible triggers for this bug. The bug itself is almost certainly an Apple bug, in one or more of their graphics drivers. But if we can find the trigger, it may be possible to work around it.

Update - we think we have this hardware in Toronto, we're trying to get access to it so we can do some debugging.

I have requested the machines from the office.

Has Regression Range: --- → yes

This reverts commit f559b920572b19d1f943744416c38baf82780b98
for causing crashes on Google Maps on Intel HD 3000 hardware.

Assignee: nobody → jmuizelaar
Status: NEW → ASSIGNED

Comment on attachment 9178205 [details]
Bug 1664843. Revert "Bug 1649490 - detect all Mac GPUs"

Beta/Release Uplift Approval Request

  • User impact if declined: Google Maps crashes on macOS for users with Intel gen6 hardware. The crash appears to be restricted to 10.12/10.13.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This should be relatively low risk. It reverts a change that allowed us to detect multiple GPUs on Mac systems.
  • String changes made/needed:
Attachment #9178205 - Flags: approval-mozilla-release?
Attachment #9178205 - Flags: approval-mozilla-beta?

Comment on attachment 9178205 [details]
Bug 1664843. Revert "Bug 1649490 - detect all Mac GPUs"

approved for 82.0b5

Attachment #9178205 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(jgilbert)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch

I can reproduce this now. I installed macOS 10.12 on a machine with an Intel HD 3000; going to Google Maps in an affected Firefox build crashes the tab.

Summary: Crash in [@ abort | glrAppleSyncState] → Crash in [@ abort | glrAppleSyncState] (WebGL crashes on 10.12/10.13 on Intel HD 3000, affects Google Maps)

Interesting. I don't crash on 10.12, on a mid 2015 MacBook Pro with Intel Graphics hardware (Iris Pro) that uses the AppleIntelHD5000Graphics kernel extension. I tested with FF 81 and yesterday's mozilla-central nightly (which doesn't yet have this bug's patch), using plain-vanilla settings.

Not surprising, though: On 10.12, the only hardware-specific Intel driver that contains a "glrAppleSyncState" function is AppleIntelHD3000GraphicsGLDriver (not AppleIntelHD4000GraphicsGLDriver or AppleIntelHD5000GraphicsGLDriver).

This is the first graphics driver bug I've seen that is so very specific.

I've been comparing a bad build and a good build. It looks like the telemetry gfx environment is the exact same, and matches the one from this crash report. This machine only has a single GPU.

However, the information displayed on https://webglreport.com/ has a lot of differences! For example:

| | Bad build | Good build |
|---|---|---|
| Max Varying Vectors | 32 | 15 |
| Max Combined Texture Image Units | 48 | 16 |
| Aliased Line Width Range | [1, 7] | [1, 1] |
| Advertises support for EXT_float_blend | Yes | No |

I wonder if there's some code in Firefox that would be limiting those queryable numbers to something smaller, and if the different way of querying GPUs broke it.

Maybe this is related to sandboxing. Does the GfxInfo code run in the content process? There's a list of allow-listed IOKit property names here: https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/security/sandbox/mac/SandboxPolicyContent.h#194-209

Notably, the allowlist does not include the property name "class-code" which bug 1649490 queries.
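A toy model of that hypothesis: if the sandbox only lets allow-listed property names through, a lookup of "class-code" comes back empty, and any device detection that filters on it finds nothing. The allowlist contents, the registry entry, the ID values, and the function names below are placeholders for illustration, not the real sandbox policy or the IOKit API.

```python
# Placeholder allowlist; the real list lives in SandboxPolicyContent.h.
ALLOWED_PROPERTIES = {"vendor-id", "device-id"}

# Toy stand-in for one registry entry describing a GPU (example values).
GPU_ENTRY = {"vendor-id": 0x8086, "device-id": 0x0116, "class-code": 0x030000}


def read_property(entry, name, sandboxed):
    """Return the property value, or None if the sandbox hides it."""
    if sandboxed and name not in ALLOWED_PROPERTIES:
        return None
    return entry.get(name)


def detect_gpus(entries, sandboxed):
    """Keep entries whose class-code marks them as display controllers."""
    gpus = []
    for entry in entries:
        class_code = read_property(entry, "class-code", sandboxed)
        # PCI base class 0x03 is "display controller".
        if class_code is not None and (class_code >> 16) == 0x03:
            gpus.append((entry["vendor-id"], entry["device-id"]))
    return gpus


unsandboxed = detect_gpus([GPU_ENTRY], sandboxed=False)
sandboxed = detect_gpus([GPU_ENTRY], sandboxed=True)
```

In this model the unsandboxed call finds the GPU and the sandboxed call returns an empty list, so whatever consumes the result would see no vendor or device ID.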

The sandbox seems to be it: After setting security.sandbox.content.level to 0 and restarting, webglreport in the bad build reports the same (lower) numbers as the good build, and Google Maps no longer crashes.

So in the "bad build", the sandbox in the content process makes it so that GfxInfo computes an empty vendor ID + device ID.

jgilbert supplied the missing piece of the puzzle: We have code in gfxPlatform.cpp which specifically works around an Intel HD 3000 crash by disabling Core Profile contexts based on the vendor ID + device ID. This code is now ineffective and the crash is back.
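The failure mode can be sketched as a device-specific workaround that is keyed on IDs and therefore fails open when the IDs are empty. The ID strings, the blocklist contents, and the function name are illustrative assumptions, not the code in gfxPlatform.cpp.

```python
# Illustrative (vendor ID, device ID) pairs for which Core Profile is disabled.
CORE_PROFILE_BLOCKLIST = {("0x8086", "0x0116")}


def allow_core_profile(vendor_id, device_id):
    """Allow Core Profile contexts unless the device is on the blocklist."""
    return (vendor_id, device_id) not in CORE_PROFILE_BLOCKLIST


# Good build: IDs are detected, so the workaround applies.
with_ids = allow_core_profile("0x8086", "0x0116")

# Bad build: the sandboxed query yields empty IDs, nothing matches,
# and the workaround is silently skipped.
with_empty_ids = allow_core_profile("", "")
```

Here `with_ids` is `False` and `with_empty_ids` is `True`: an empty ID pair never matches a blocklist entry, so the affected hardware ends up on the crashing path again.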

Blocks: 1668145

I've filed bug 1668145 for the second attempt.

Comment on attachment 9178205 [details]
Bug 1664843. Revert "Bug 1649490 - detect all Mac GPUs"

Approved for 81.0.1.

Attachment #9178205 - Flags: approval-mozilla-release? → approval-mozilla-release+

I have reproduced this issue in older versions and verified the fix in Nightly v83.0a1 from 2020-09-29 and Beta v82.0b5 on a Mac mini (mid-2011) running macOS 10.13.6. Waiting for Firefox Release v81.0.1 to verify there as well. Leaving NI on me.

Flags: needinfo?(daniel.bodea)
Hardware: Unspecified → Desktop

This fix was also verified on Firefox Release v81.0.1. Thank you.

Flags: needinfo?(daniel.bodea)
Status: RESOLVED → VERIFIED
No longer blocks: gfx-triage

I'm seeing very similar behavior, but on HD 4000. I reported it as bug 1739870; could anyone take a look at it and suggest how to debug it further?
