Closed Bug 1901043 Opened 23 days ago Closed 9 days ago

Crashes [@ create_protected_copy ] while printing on 127 branch and up

Categories

(Core :: Printing: Output, defect)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
129 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox126 --- unaffected
firefox127 + wontfix
firefox128 --- fixed
firefox129 --- fixed

People

(Reporter: smichaud, Assigned: jfkthame)

References

(Regression)

Details

(Keywords: regression)

Crash Data

Attachments

(2 files)

These crashes happen only on macOS 13 and 14, so they may be partly an Apple bug. But they started somewhere on the 127 branch, so presumably they're also at least partly a Mozilla bug. They don't happen frequently enough to pin down a precise regression range.

Edit: They also happen on macOS 12 and 11.

They all have _cairo_quartz_snapshot_create on the stack.

https://crash-stats.mozilla.org/search/?proto_signature=~cairo_quartz_snapshot_create&date=%3E%3D2024-05-06T15%3A05%3A00.000Z&date=%3C2024-06-06T15%3A05%3A00.000Z&_facets=signature&_facets=version&_facets=platform_version&_facets=proto_signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-proto_signature

Typical crash stack:

Crashing Thread (0), Name: MainThread
Frame  Module  Signature  Source  Trust
0  libsystem_platform.dylib  _platform_memmove   context
1  CoreGraphics  create_protected_copy   cfi
2  CoreGraphics  CGDataProviderCreateWithCopyOfData   cfi
3  CoreGraphics  CGDataProviderCreateTrustedWithCopyOfData   cfi
4  CoreGraphics  CGBitmapContextCreateImage   cfi
5  XUL  _cairo_quartz_snapshot_create  gfx/cairo/cairo/src/cairo-quartz-surface.c:2650  cfi
6  XUL  _cairo_quartz_surface_snapshot_get_image  gfx/cairo/cairo/src/cairo-quartz-surface.c:2676  cfi
7  XUL  _cairo_surface_to_cgimage  gfx/cairo/cairo/src/cairo-quartz-surface.c:756  cfi
8  XUL  _cairo_quartz_setup_pattern_source  gfx/cairo/cairo/src/cairo-quartz-surface.c:987  inlined
8  XUL  _cairo_quartz_setup_state  gfx/cairo/cairo/src/cairo-quartz-surface.c:1248  cfi
9  XUL  _cairo_quartz_cg_fill  gfx/cairo/cairo/src/cairo-quartz-surface.c:1858  cfi
10  XUL  _cairo_compositor_fill  gfx/cairo/cairo/src/cairo-compositor.c:245  cfi
11  XUL  _cairo_surface_fill  gfx/cairo/cairo/src/cairo-surface.c:2502  cfi
12  XUL  _cairo_gstate_fill  gfx/cairo/cairo/src/cairo-gstate.c:1352  cfi
13  XUL  _moz_cairo_fill_preserve  gfx/cairo/cairo/src/cairo.c:2454  cfi
14  XUL  mozilla::gfx::DrawTargetCairo::DrawPattern(mozilla::gfx::Pattern const&, mozilla::gfx::StrokeOptions const&, mozilla::gfx::DrawOptions const&, mozilla::gfx::DrawTargetCairo::DrawPatternType, bool)  gfx/2d/DrawTargetCairo.cpp:1051  cfi
15  XUL  mozilla::gfx::DrawTargetCairo::FillRect(mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits, float> const&, mozilla::gfx::Pattern const&, mozilla::gfx::DrawOptions const&)  gfx/2d/DrawTargetCairo.cpp:1101  cfi
16  XUL  mozilla::gfx::RecordedFillRect::PlayEvent(mozilla::gfx::Translator*) const  gfx/2d/RecordedEventImpl.h:2489  cfi
17  XUL  std::__1::__function::__value_func<bool (mozilla::gfx::RecordedEvent*)>::operator()[abi:un170006](mozilla::gfx::RecordedEvent*&&) const  /builds/worker/fetches/MacOSX14.4.sdk/usr/include/c++/v1/__functional/function.h:518  inlined
17  XUL  std::__1::function<bool (mozilla::gfx::RecordedEvent*)>::operator()(mozilla::gfx::RecordedEvent*) const  /builds/worker/fetches/MacOSX14.4.sdk/usr/include/c++/v1/__functional/function.h:1169  inlined
17  XUL  mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::EventStream>(mozilla::gfx::EventStream&, mozilla::gfx::RecordedEvent::EventType, std::__1::function<bool (mozilla::gfx::RecordedEvent*)> const&)  gfx/2d/RecordedEventImpl.h:4514  cfi
18  XUL  mozilla::layout::PrintTranslator::TranslateRecording(mozilla::layout::PRFileDescStream&)  layout/printing/PrintTranslator.cpp:54  cfi
19  XUL  mozilla::layout::RemotePrintJobParent::PrintPage(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layout::PRFileDescStream&, nsRefCountedHashtable<nsIntegralHashKey<unsigned long long, 0>, RefPtr<mozilla::gfx::RecordedDependentSurface> >*)  layout/printing/ipc/RemotePrintJobParent.cpp:179  cfi
20  XUL  mozilla::layout::RemotePrintJobParent::FinishProcessingPage(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, nsRefCountedHashtable<nsIntegralHashKey<unsigned long long, 0>, RefPtr<mozilla::gfx::RecordedDependentSurface> >*)  layout/printing/ipc/RemotePrintJobParent.cpp:158  inlined
20  XUL  mozilla::layout::RemotePrintJobParent::RecvProcessPage(int const&, int const&, nsTArray<unsigned long long>&&)  layout/printing/ipc/RemotePrintJobParent.cpp:132  cfi
21  XUL  mozilla::layout::PRemotePrintJobParent::OnMessageReceived(IPC::Message const&)  ipc/ipdl/PRemotePrintJobParent.cpp:376  cfi
22  XUL  mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&)  ipc/ipdl/PContentParent.cpp:6517  cfi
23  XUL  mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&)  ipc/glue/MessageChannel.cpp:1820  inlined
23  XUL  mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message> >)  ipc/glue/MessageChannel.cpp:1739  inlined
23  XUL  mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&)  ipc/glue/MessageChannel.cpp:1530  inlined
23  XUL  mozilla::ipc::MessageChannel::MessageTask::Run()  ipc/glue/MessageChannel.cpp:1630  cfi
24  XUL  mozilla::RunnableTask::Run()  xpcom/threads/TaskController.cpp:580  inlined
24  XUL  mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&)  xpcom/threads/TaskController.cpp:907  inlined
24  XUL  mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&)  xpcom/threads/TaskController.cpp:730  cfi
25  XUL  mozilla::TaskController::ProcessPendingMTTask(bool)  xpcom/threads/TaskController.cpp:516  inlined
25  XUL  mozilla::TaskController::TaskController()::$_0::operator()() const  xpcom/threads/TaskController.cpp:234  inlined
25  XUL  mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_0>::Run()  xpcom/threads/nsThreadUtils.h:548  cfi
26  XUL  nsThread::ProcessNextEvent(bool, bool*)  xpcom/threads/nsThread.cpp:1199  inlined
26  XUL  NS_ProcessPendingEvents(nsIThread*, unsigned int)  xpcom/threads/nsThreadUtils.cpp:445  cfi
27  XUL  nsBaseAppShell::NativeEventCallback()  widget/nsBaseAppShell.cpp:87  cfi
28  XUL  nsAppShell::ProcessGeckoEvents(void*)  widget/cocoa/nsAppShell.mm:541  cfi
29  CoreFoundation  __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__   cfi
30  CoreFoundation  __CFRunLoopDoSource0   cfi
31  CoreFoundation  __CFRunLoopDoSources0   cfi
32  CoreFoundation  __CFRunLoopRun   cfi
33  CoreFoundation  CFRunLoopRunSpecific   cfi
34  HIToolbox  RunCurrentEventLoopInMode   cfi
35  HIToolbox  ReceiveNextEventCommon   cfi
36  HIToolbox  _BlockUntilNextEventMatchingListInModeWithFilter   cfi
37  AppKit  _DPSNextEvent   cfi
38  AppKit  -[NSApplication(NSEventRouting) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]   cfi
39  XUL  -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]  widget/cocoa/nsAppShell.mm:196  cfi
40  AppKit  -[NSApplication run]   cfi
41  XUL  -[GeckoNSApplication run]  widget/cocoa/nsAppShell.mm:174  cfi
42  XUL  nsAppShell::Run()  widget/cocoa/nsAppShell.mm:871  cfi
43  XUL  nsAppStartup::Run()  toolkit/components/startup/nsAppStartup.cpp:296  cfi
44  XUL  XREMain::XRE_mainRun()  toolkit/xre/nsAppRunner.cpp:5741  cfi
45  XUL  XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)  toolkit/xre/nsAppRunner.cpp:5953  cfi
46  XUL  XRE_main(int, char**, mozilla::BootstrapConfig const&)  toolkit/xre/nsAppRunner.cpp:6010  cfi
47  firefox  do_main(int, char**, char**)  browser/app/nsBrowserApp.cpp:230  inlined
47  firefox  main  browser/app/nsBrowserApp.cpp:448  cfi
48  dyld  start   cfi

Presumably related to the cairo 1.18.0 update, but without concrete STR it may be difficult to investigate as it's clearly not affecting every print operation.

(There was a cairo-quartz fix recently landed in bug 1900028, but one of the crash reports I see comes from the RC1 build (20240603152359) which included that fix, so apparently that wasn't the issue here.)

The CGDataProviderCreateTrustedWithCopyOfData function on the stack is intriguing; I don't see any mention of it on developer.apple.com. Nor does Google have much about create_protected_copy (some internal CoreGraphics thing, presumably).

Anyhow, I'm going to mark this as a regression from bug 1892913, given that this cairo_quartz_surface code underwent significant changes then, but also call it S3 for now, unless it becomes higher-frequency.

Looking at the current crash reports, there are a couple of pairs that look like they might be the same user making two attempts to print something, and crashing both times; if so, perhaps there's hope that we'll get a bug report with a specific page/document that reproduces this.

Severity: -- → S3
Keywords: regression
Regressed by: 1892913

Steven, if you know (or can discover) anything about this CGDataProviderCreateTrustedWithCopyOfData thing that is getting used internally by CGBitmapContextCreateImage, that might give us clues as to what's triggering this. My Google searches have come up with nothing so far...

Flags: needinfo?(smichaud)

Set release status flags based on info from the regressing bug 1892913

So I ran Hopper Disassembler and took a look at the CoreGraphics framework (on an Intel Mac running macOS 13.6.7).

CGDataProviderCreateTrustedWithCopyOfData() just calls CGDataProviderCreateWithCopyOfData() and sets a flag in the object returned. Within the CoreGraphics framework it's only called from CGBitmapContextCreateImage(), and then only if CGContextGetType() returns 0x4 (kCGContextTypeBitmap) or 0xc (unknown type).
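
For illustration only, here is a rough C model of the relationship described above. It is a sketch under stated assumptions, not Apple's code: the `provider_t` struct, its `trusted` field, and every function name below are hypothetical stand-ins, since the real CGDataProvider layout is private. Only the two context type values (0x4 and 0xc) come from the disassembly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CGDataProvider; the real layout is private. */
typedef struct {
    unsigned char *bytes;   /* private copy of the caller's data */
    size_t         length;
    bool           trusted; /* models the flag set by the "Trusted" variant (name assumed) */
} provider_t;

/* Context type values reported from the disassembly. */
enum {
    CONTEXT_TYPE_BITMAP = 0x4, /* kCGContextTypeBitmap */
    CONTEXT_TYPE_0C     = 0xc  /* meaning unknown */
};

/* Models CGDataProviderCreateWithCopyOfData(): take a private copy of the data.
 * In the crash stack, the corresponding copy is the _platform_memmove under
 * create_protected_copy. */
provider_t *provider_create_with_copy(const void *data, size_t length) {
    if (data == NULL || length == 0) return NULL;
    provider_t *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->bytes = malloc(length);
    if (p->bytes == NULL) { free(p); return NULL; }
    memcpy(p->bytes, data, length);
    p->length = length;
    p->trusted = false;
    return p;
}

/* Models CGDataProviderCreateTrustedWithCopyOfData(): same copy, plus a flag. */
provider_t *provider_create_trusted_with_copy(const void *data, size_t length) {
    provider_t *p = provider_create_with_copy(data, length);
    if (p != NULL) p->trusted = true;
    return p;
}

/* Models the gate in CGBitmapContextCreateImage(): only these two context
 * types take the "trusted" path. */
bool context_type_uses_trusted_copy(int type) {
    return type == CONTEXT_TYPE_BITMAP || type == CONTEXT_TYPE_0C;
}

void provider_release(provider_t *p) {
    if (p != NULL) { free(p->bytes); free(p); }
}
```

In this model the two creation functions differ only in the flag, which matches the observation that the "trusted" variant adds no copying behaviour of its own.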

create_protected_copy() creates a CFData object from raw data, and if it's not greater than vm_page_size also calls vm_protect() on it with set_maximum == true and new_protection == 1 (read-only).
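
As a portable analogue of the "copy, then make read-only" step, the following C sketch uses POSIX `mmap()` and `mprotect()` instead of CFData and Mach `vm_protect()`. Several things are assumptions: the `protected_copy_t` type and both function names are invented for illustration, the size condition mentioned above is not modelled (the sketch always uses whole pages), and `mprotect()` has no equivalent of `set_maximum`, so unlike the Mach call the protection here could later be raised again.

```c
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical handle for a read-only, page-granular copy of some bytes. */
typedef struct {
    void  *base;   /* start of the mapping */
    size_t mapped; /* mapping size, a multiple of the page size */
    size_t length; /* number of meaningful bytes */
} protected_copy_t;

/* Copy `length` bytes from `src` into fresh pages, then drop write access.
 * Returns 0 on success, -1 on failure. */
int protected_copy_create(protected_copy_t *out, const void *src, size_t length) {
    if (out == NULL || src == NULL || length == 0) return -1;

    long page_l = sysconf(_SC_PAGESIZE);
    if (page_l <= 0) return -1;
    size_t page = (size_t)page_l;

    /* Round up to whole pages, guarding against overflow. */
    if (length > SIZE_MAX - (page - 1)) return -1;
    size_t mapped = (length + page - 1) / page * page;

    void *base = mmap(NULL, mapped, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;

    /* A bad `src` pointer or an overstated `length` would fault here, in the
     * copy, which is where the reported stacks crash (in memmove). */
    memcpy(base, src, length);

    /* Analogue of vm_protect(..., new_protection == read-only). */
    if (mprotect(base, mapped, PROT_READ) != 0) {
        munmap(base, mapped);
        return -1;
    }

    out->base = base;
    out->mapped = mapped;
    out->length = length;
    return 0;
}

void protected_copy_destroy(protected_copy_t *c) {
    if (c != NULL && c->base != NULL) {
        munmap(c->base, c->mapped);
        c->base = NULL;
    }
}
```

The point of the sketch is that the copy happens before the protection change, so a crash in the memmove frame suggests a problem with the source buffer or its size rather than with the protection step itself. That is an inference from the stack, not something the disassembly confirms.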

I still don't really know what "trusted" means here. But it has to do with bitmaps. I suppose it could mean "immutable", but create_protected_copy() is called from CGDataProviderCreateWithCopyOfData(), so both providers have immutable data.

Edit: Digging around on https://opensource.apple.com, I found a reference to the term "trusted UI", which might be relevant here.

See comment #7 below.

Flags: needinfo?(smichaud)

(In reply to Jonathan Kew [:jfkthame] from comment #1)

> Looking at the current crash reports, there are a couple of pairs that look like they might be the same user making two attempts to print something, and crashing both times; if so, perhaps there's hope that we'll get a bug report with a specific page/document that reproduces this.

meta-note: none of this bug's associated crash reports were submitted with a URL-of-the-page-that-was-loaded. I assume that's because these are all parent-process crashes, and the parent process isn't specific to any one page/URL. Maybe when we enter PContentParent::OnMessageReceived or somesuch, we should make a note of the content process's URL for usage in possible crash reports, in the event that there's a crash? (if the user checks the box in the crash-report dialog to include the URL of the crashing content) I'm not sure whether that's something that's already supposed to just work.

[Tracking Requested - why for this release]: We should make sure the volume is not concerning once this hits release.

Following up comment #4

I may have figured out what the "trusted" means in CGDataProviderCreateTrustedWithCopyOfData(). If I'm right it doesn't mean "TrustedUI". Instead it means something like "[a bitmap] created the standard way", as opposed to "[a bitmap] created using a CGContextDelegate callback".

CGBitmapContextCreateImage(), before it does anything else, first calls CGContextDelegateImplementsCallback() with type (arg1) set to 0x1a. By digging through the CoreGraphics framework I've found that 0x1a == kCGContextDelegateCreateImage. If this callback is implemented, CGBitmapContextCreateImage() calls it (indirectly, via CGContextDelegateCreateImage()). Otherwise it goes on to create a bitmap "in the standard way".
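
The dispatch described above can be modelled with a small C sketch. Everything here is hypothetical apart from the callback index 0x1a: the `context_t` layout, the callback table size, the `image_t` type and all function names are assumptions made for illustration, not the real CGContextDelegate interface.

```c
#include <stddef.h>

enum {
    DELEGATE_CALLBACK_CREATE_IMAGE = 0x1a, /* kCGContextDelegateCreateImage, per the disassembly */
    DELEGATE_CALLBACK_COUNT        = 0x20  /* table size is an assumption */
};

/* Records which path produced the image, so the model can be inspected. */
typedef enum { IMAGE_FROM_DELEGATE, IMAGE_FROM_TRUSTED_COPY } image_source_t;
typedef struct { image_source_t source; } image_t;

typedef struct context context_t;
typedef image_t (*delegate_callback_t)(context_t *ctx);

/* Hypothetical context: just a table of optional delegate callbacks. */
struct context {
    delegate_callback_t callbacks[DELEGATE_CALLBACK_COUNT];
};

/* Models CGContextDelegateImplementsCallback(). */
int delegate_implements_callback(const context_t *ctx, int type) {
    return type >= 0 && type < DELEGATE_CALLBACK_COUNT &&
           ctx->callbacks[type] != NULL;
}

/* Models CGBitmapContextCreateImage(): prefer the delegate's callback if one
 * is installed, otherwise build the image "in the standard way". */
image_t bitmap_context_create_image(context_t *ctx) {
    if (delegate_implements_callback(ctx, DELEGATE_CALLBACK_CREATE_IMAGE)) {
        return ctx->callbacks[DELEGATE_CALLBACK_CREATE_IMAGE](ctx);
    }
    image_t img = { IMAGE_FROM_TRUSTED_COPY };
    return img;
}

/* Example delegate callback, standing in for Apple's own implementations. */
image_t example_delegate_create_image(context_t *ctx) {
    (void)ctx;
    image_t img = { IMAGE_FROM_DELEGATE };
    return img;
}
```

With no callback installed, as would be expected for a context created by cairo, the model always takes the standard path, which corresponds to the CGDataProviderCreateTrustedWithCopyOfData frame seen in the crash stacks.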

This distinction between "trusted" and "non-trusted" is moot, though. The whole CGContextDelegate API (including CGContextDelegateSetCallback()) is undocumented (though there's been some work to reverse engineer it). So it's highly unlikely that anyone besides Apple uses it. And Apple does use it, in a few cases, to set a kCGContextDelegateCreateImage callback. But the callback is always Apple code, usually also in the CoreGraphics framework (there's one more case in the RenderBox framework).

For the record, the RenderBox callback, whose name is mangled, is create_image(CGContextDelegate*, CGRenderingState*, CGGState*).

(In reply to Daniel Holbert [:dholbert] from comment #5)

> we should make a note of the content process's URL for usage in possible crash reports

I filed bug 1901639 on this, FWIW.

The bug is marked as tracked for firefox127 (release). However, the bug still isn't assigned and has low severity.

:fgriffith, could you please find an assignee and increase the severity for this tracked bug? Given that it is a regression and we know the cause, we could also simply back out the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(fgriffith)
Assignee: nobody → jfkthame
Flags: needinfo?(fgriffith)
Crash Signature: [@ create_protected_copy ] → [@ create_protected_copy ] [@ copy_byte_ptr ]

This eliminates the new _cairo_quartz_surface_snapshot, and the CGContextRef-based
version of cairo_quartz_image_surface, which seems to be the potentially-problematic
codepath.

(Unfortunately, with no known-crashing URL or steps to reproduce, I don't have
any way to actually test this short of landing it and watching crash-stats.)

Here's a possible workaround that we might consider trying. Basically, the idea is to revert part of cairo_quartz_surface to the pre-1.18.0 version, where AFAIK we weren't seeing a crash like this. The implementation of _cairo_surface_to_cgimage and its dependencies is substantially different: cairo_quartz_surface_snapshot doesn't exist, and cairo_quartz_image_surface is backed by a CGImageRef rather than a CGContextRef. Hopefully that will avoid us making the CoreGraphics call that ends up crashing here.

I've pushed a try run at https://treeherder.mozilla.org/jobs?repo=try&revision=8389c00d84d37c993ec292d21de629492d0d6be0 to check how things look there. In a little bit of local testing, printing functionality still seems to work OK; @jwatt, if you're able to do a bit of testing as well, that'd be awesome.

Flags: needinfo?(jwatt)
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/25ffbca1272c
Revert to the pre-1.18.0 version of _cairo_surface_to_cgimage() and cairo_quartz_image_surface code. r=gfx-reviewers,lsalzman
Status: NEW → RESOLVED
Closed: 9 days ago
Resolution: --- → FIXED
Target Milestone: --- → 129 Branch
Flags: needinfo?(jwatt)

The patch landed in nightly and beta is affected.
:jfkthame, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox128 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(jfkthame)

Lacking any known STR, we can't be sure whether this patch will in fact stop the crashes (though I'm hopeful, given that it removes the specific codepath where we're crashing, and reverts to older code that was working OK). Given that watching crash-stats is currently our only way to assess this (and the crash rate is too low for Nightly to provide useful data), I think we should go ahead and take it on beta.

Flags: needinfo?(jfkthame)

Original Revision: https://phabricator.services.mozilla.com/D214297

Attachment #9408897 - Flags: approval-mozilla-beta?

beta Uplift Approval Request

  • User impact if declined: possible parent-process crash while printing on macOS
  • Code covered by automated testing: no
  • Fix verified in Nightly: no
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: no known STR
  • Risk associated with taking this patch: low-ish
  • Explanation of risk level: basically reverting to an earlier version of the cairo-quartz code; but not entirely risk-free given that surrounding code has changed substantially, so there could be unanticipated side-effects (but limited to macOS printing, the only scenario where this code is used)
  • String changes made/needed: none
  • Is Android affected?: no
Attachment #9408897 - Flags: approval-mozilla-beta? → approval-mozilla-beta+