Closed Bug 671064 Opened 13 years ago Closed 13 years ago

[Mac] Update to cairo 1.10 causes Print Preview to always crash on some pages on OS X

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla8
Tracking Status
firefox7 + fixed

People

(Reporter: marcia, Assigned: smichaud)

References

Details

(4 keywords, Whiteboard: [inbound][qa!])

Crash Data

Attachments

(3 files, 1 obsolete file)

Mac only low volume crash seen in Aurora and trunk but right now ranks as #1 top crash on Aurora. Crashes started showing up using the 2011062100 build. Comments mention Print Preview and Google books. https://crash-stats.mozilla.com/report/list?signature=_cairo_surface_release_source_image

https://crash-stats.mozilla.com/report/index/e6b2257b-34ed-4b25-934d-682a02110708

Frame 	Module 	Signature [Expand] 	Source
0 	XUL 	_cairo_surface_release_source_image 	gfx/cairo/cairo/src/cairo-surface.c:1476
1 	XUL 	DataProviderReleaseCallback 	gfx/cairo/cairo/src/cairo-quartz-surface.c:1108
2 	CoreGraphics 	CoreGraphics@0x7409a 	
3 	CoreGraphics 	CoreGraphics@0x74042 	
4 	CoreFoundation 	_CFRelease 	
5 	CoreGraphics 	CoreGraphics@0x73f80 	
6 	CoreFoundation 	_CFRelease 	
7 	CoreGraphics 	CoreGraphics@0x156476 	
8 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0x1e9e 	
9 	CoreFoundation 	__CFSetCallback 	
10 	CoreFoundation 	__CFBasicHashDrain 	
11 	CoreFoundation 	_CFRelease 	
12 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0x20ed 	
13 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0x5c36 	
14 	CoreGraphics 	CoreGraphics@0x741ea 	
15 	CoreFoundation 	_CFRelease 	
16 	CoreGraphics 	CoreGraphics@0x741d4 	
17 	PrintCore 	pdfSpoolingReleaseDrawingContext 	
18 	PrintCore 	PJCReleaseSpoolingSession 	
19 	PrintCore 	PJCEndDocument 	
20 	PrintCore 	PMSessionEndDocumentNoDialog 	
21 	XUL 	nsDeviceContextSpecX::EndDocument 	widget/src/cocoa/nsDeviceContextSpecX.mm:129
22 	XUL 	nsDeviceContext::EndDocument 	gfx/src/nsDeviceContext.cpp:609
23 	XUL 	nsPrintData::~nsPrintData 	layout/printing/nsPrintData.cpp:117
24 	XUL 	nsPrintEngine::Destroy 	layout/printing/nsPrintEngine.cpp:284
25 	XUL 	DocumentViewerImpl::OnDonePrinting 	layout/base/nsDocumentViewer.cpp:4281
26 	XUL 	nsPrintCompletionEvent::Run 	layout/printing/nsPrintEngine.cpp:3377
27 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:617
28 	XUL 	NS_ProcessPendingEvents_P 	obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp:195
29 	XUL 	nsBaseAppShell::NativeEventCallback 	widget/src/xpwidgets/nsBaseAppShell.cpp:130
30 	XUL 	nsAppShell::ProcessGeckoEvents 	widget/src/cocoa/nsAppShell.mm:422
31 	CoreFoundation 	__CFRunLoopDoSources0 	
32 	CoreFoundation 	__CFRunLoopRun 	
33 	CoreFoundation 	CFRunLoopRunSpecific 	
34 	HIToolbox 	HIToolbox@0x2e7ed 	
35 	HIToolbox 	HIToolbox@0x2e550 	
36 	HIToolbox 	HIToolbox@0x2e4ab 	
37 	AppKit 	_DPSNextEvent 	
38 	AppKit 	-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 	
39 	AppKit 	-[NSApplication run] 	
40 	XUL 	nsAppShell::Run 	widget/src/cocoa/nsAppShell.mm:769
41 	XUL 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:222
42 	XUL 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3570
43 	firefox-bin 	main 	browser/app/nsBrowserApp.cpp:198
44 	firefox-bin 	firefox-bin@0xb73
Just able to reproduce this on 10.7. Here are my STR using the latest trunk build:

1. Load my main mail page on GMail - https://mail.google.com/mail/. I have the "Tree" theme installed in GMail but I am not sure if that is a factor or not.
2. File -> Print -> Open PDF in Preview
3. Crash 100%

https://crash-stats.mozilla.com/report/index/bp-15a25f2f-9b91-4b58-8fb9-677412110712
Keywords: reproducible
100% reproducible for me on 10.6 too; File -> Print -> PDF -> Save as PDF. Doesn't seem to matter what theme on Gmail.

Stephen, can you take a look at this?
Assignee: nobody → smichaud
err - Steven. Sorry for the misspelling!
I've just reproduced this on OS X 10.5.8:

bp-b30499bd-bdf1-48ee-8cd0-b2c4f2110713

I'll look into it, and see what I can find out.
This is bad.  Every attempt to use Print Preview crashes!

Here's the regression range:

firefox-2011-05-27-03-mozilla-central
firefox-2011-05-28-03-mozilla-central

I'm pretty sure this incriminates the patch for bug 562746, which updated cairo to version 1.10.
Blocks: 562746
Summary: [Mac] Firefox Crash [@ _cairo_surface_release_source_image ] → [Mac] Update to cairo 1.10 causes Print Preview to always crash on OS X
Breakpad reports several different crash signatures.  I'll get some gdb crash stacks.
There appear to be some pages on which Print Preview *doesn't* crash (or at least doesn't always crash).  I'll try to find some examples.
Attached file Gdb crash stacks
Here are two gdb crash stacks -- the first taken in 64-bit mode, the
other in 32-bit mode.

There was one tab (in one window) open to the default about:home page
when I chose Preview from the Print dialog ... and crashed.

I tested with a	build with debug symbols, with optimization turned off
(not the same as a debug build).  The build was made from current
trunk code.  I tested on OS X 10.6.8.
After some quick tests with today's mozilla-central nightly on OS X 10.6.8:

Pages that (almost) always crash doing Print Preview:

about:home
http://www.mozilla.org/projects/firefox/prerelease.html
http://www.apple.com/

Pages that (usually) don't crash doing Print Preview:

https://bugzilla.mozilla.org/show_bug.cgi?id=0
https://bugzilla.mozilla.org/show_bug.cgi?id=671064
http://www.mozilla.org/
http://www.mozilla.com/en-US/firefox/new/
> <missing bug number>

This was a link to the bug whose id is 0.
Summary: [Mac] Update to cairo 1.10 causes Print Preview to always crash on OS X → [Mac] Update to cairo 1.10 causes Print Preview to always crash on some pages on OS X
It's definitely the patch for bug 562746 (the update to cairo 1.10) that triggered these crashes, if it didn't actually cause them.

A build with http://hg.mozilla.org/mozilla-central/rev/503990b2c5c6 as tip doesn't crash.  A build with http://hg.mozilla.org/mozilla-central/rev/04e8d0b481bc as tip does crash.
Attached patch Fix (obsolete) — Splinter Review
This bug involves accessing a cairo_surface_t object after it's been
deleted.  Here's a very simple patch that seems to fix it.

I'm doing a tryserver build, which should be available tomorrow
morning.
Attachment #546049 - Flags: review?(jmuizelaar)
> This might be related:
> http://lists.freedesktop.org/archives/cairo/2009-February/016677.html

I don't know the cairo code well enough to say.

But my patch does (in effect) make a quartz_source_image_t object hold
a reference to the cairo_surface_t object that it points to, to ensure
that the lifetime of the cairo_surface_t object is at least as long as
the lifetime of the quartz_source_image_t object that points to it.
(Following up comment #16)

In debugging this, though, I noticed that
quartz_source_image_t.surface and quartz_source_image_t.img_out are
always the same ... which puzzles me.
(In reply to comment #17)
> (Following up comment #16)
> 
> In debugging this, though, I noticed that
> quartz_source_image_t.surface and quartz_source_image_t.img_out are
> always the same ... which puzzles me.

Any idea what changed to cause this crash in the first place?
> Any idea what changed to cause this crash in the first place?

No.

Though (of course) it's clear that a cairo_surface_t object is being accessed after deletion -- in _cairo_surface_release_source_image() and (probably) also elsewhere.

Possibly it was just a timing change.
I can confirm that I do not crash on 10.7 using the tryserver build and trying to print using the same STR in Comment 1. Trying 10.6 next.

(In reply to comment #15)
> Here's a tryserver build made with my patch from comment #13
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-
> ea8910103ac1/try-macosx64/firefox-8.0a1.en-US.mac.dmg
> 
> Please try it out!
It occurs to me (as I'm learning more about the cairo code) that it might be better to fix this bug in cairo_quartz_image_surface.c.  New patch coming up shortly.
> It occurs to me (as I'm learning more about the cairo code) that it
> might be better to fix this bug in cairo_quartz_image_surface.c.

But this doesn't work, for reasons I can't yet fathom.

So I'll need to dig further.
Andrea Canciani writes:
regarding the cairo-quartz patch, ref'ing/unref'ing should not be needed as the life of cgimage objects (and of the surface acquisition) does not extend beyond the backend functions. if you have some code which breaks without it, it is probably indicating another bug in cairo[-quartz]. (that ref'ing/unref'ing would be needed if the cgimage was associated to the surface snapshot-like)
>> It occurs to me (as I'm learning more about the cairo code) that it
>> might be better to fix this bug in cairo_quartz_image_surface.c.
>
> But this doesn't work, for reasons I can't yet fathom.

It does work if I try to fix the bug in cairo-image-surface.c instead.
But I now think my original patch is probably the best we can do.

Fixing the problem in a "backend"'s calls to acquire_source_image()
and release_source_image() is overkill -- we only (as far as I can
tell) need the additional calls to cairo_surface_reference() and
cairo_surface_destroy() in this one case -- that of using a
CGDataProvider to create a CGImage.  The problem is that the
releaseCallback (DataProviderReleaseCallback()) is called by the OS at
unpredictable times, which will sometimes be after
quartz_source_image_t.surface has been destroyed.

I've also figured out why these crashes started happening with the
upgrade to Cairo 1.10.  Before the upgrade, a separate surface
snapshot was created in _cairo_surface_to_cgimage() and passed to
_cairo_quartz_create_cgimage() as its last parameter (releaseInfo), so
we didn't have to worry that something else would destroy it.  After
the upgrade, only a pointer to the "original" surface gets passed (as
((quartz_source_image_t*)releaseInfo)->surface), which *can* get
destroyed by something else, unless we create another reference to it.
As best I can tell, what Andrea Canciani says is some combination of wrong and irrelevant.
Jeff, are you going to r- Steven's patch? If not, what would a better solution to this bug be?

It's gotten to the point where, if we don't fix this, we should probably look at backing out the Cairo update.
Attachment #546049 - Flags: review?(jmuizelaar) → review+
(In reply to comment #25)
> As best I can tell, what Andrea Canciani says is some combination of wrong
> and irrelevant.

Yes, unfortunately I didn't understand the issue correctly since I was thinking about immediate drawing.
Quartz printing surfaces seem to be retained, which means that the life of the sources used to draw on them is much longer than cairo-quartz currently expects. This is a known problem, as pointed out in this comment http://cgit.freedesktop.org/cairo/tree/src/cairo-quartz-surface.c#n933

The patch attached to this bugreport handles the simple case of a source surface which does not change for the whole lifetime of the destination printing surface. Unfortunately this assumption is not valid in general and to guarantee the correctness of the output, a snapshot or some other form of deep-cloning should be used instead.

Nevertheless, the patch alleviates the issue, because it guarantees that the source surface will be alive until it is needed (the content might be different from the desired one, but at least it should avoid crashes).

Snapshotting might have a performance impact, but might be acceptable for printing surfaces. A patch which both avoids the crash and provides the correct result would be much preferred upstream.
I've carried forward Jeff's r+.
Attachment #546049 - Attachment is obsolete: true
Attachment #549835 - Flags: review+
(In reply to comment #27)

> The patch attached to this bugreport handles the simple case of a
> source surface which does not change for the whole lifetime of the
> destination printing surface. Unfortunately this assumption is not
> valid in general and to guarantee the correctness of the output, a
> snapshot or some other form of deep-cloning should be used instead.

Would it be acceptable to call _cairo_surface_snapshot() on 'source'
and set source_img->surface to its result?  (This is more-or-less what
our cairo code did before the upgrade to version 1.10.)  Would this
snapshot get updated if the original source surface changed?
(In reply to comment #29)
> (In reply to comment #27)
> 
> > The patch attached to this bugreport handles the simple case of a
> > source surface which does not change for the whole lifetime of the
> > destination printing surface. Unfortunately this assumption is not
> > valid in general and to guarantee the correctness of the output, a
> > snapshot or some other form of deep-cloning should be used instead.
> 
> Would it be acceptable to call _cairo_surface_snapshot() on 'source'
> and set source_img->surface to its result?  (This is more-or-less what

Yes, this should avoid the crash and provide the desired behavior.

> our cairo code did before the upgrade to version 1.10.)  Would this
> snapshot get updated if the original source surface changed?

Cairo snapshot are copy-on-write. When the surface they target is about
to bemodified, snapshots are detached from it and attached to a
non-modificable clone (the clonation takes place before the surface is
changed, so that the content which the snapshot provides never
changes for its whole lifetime).

This is the expected behavior in your case (if I paint A over B, then I
change A, I still expect to have the old content of A in B... in fact this
is how client code usually copies surface contents).

This solution should be simple and correct, so if the performance
does not degrade too much, I believe you should definitely apply it.
Keywords: checkin-needed
Whiteboard: [inbound]
Attachment #549835 - Flags: approval-mozilla-aurora?
Comment on attachment 549835 [details] [diff] [review]
Fix (for others to land)

This patch is (basically) zero risk, and fixes a topcrasher.
Comment on attachment 549835 [details] [diff] [review]
Fix (for others to land)

Approved for releases/mozilla-aurora
Attachment #549835 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
http://hg.mozilla.org/mozilla-central/rev/0cdcf876dd05
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
qa+ for verification on Firefox 7
Whiteboard: [inbound] → [inbound][qa+]
Setting resolution to Verified Fixed on MacOS X 10.6 and 10.7:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0a2) Gecko/20110921 Firefox/8.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0a1) Gecko/20110922 Firefox/9.0a1

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:8.0a2) Gecko/20110921 Firefox/8.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:9.0a1) Gecko/20110919 Firefox/9.0a1

In order to verify this, I've used the steps from comment1 and comment2 and I got no crash.
Status: RESOLVED → VERIFIED
Whiteboard: [inbound][qa+] → [inbound][qa!]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: