Last Comment Bug 671064 - [Mac] Update to cairo 1.10 causes Print Preview to always crash on some pages on OS X
: [Mac] Update to cairo 1.10 causes Print Preview to always crash on some pages...
Status: VERIFIED FIXED
[inbound][qa!]
: crash, reproducible, verified-aurora, verified-beta
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: Trunk
: x86 Mac OS X
: -- critical (vote)
: mozilla8
Assigned To: Steven Michaud [:smichaud] (Retired)
:
: Milan Sreckovic [:milan]
Mentors:
Depends on:
Blocks: 562746
  Show dependency treegraph
 
Reported: 2011-07-12 13:47 PDT by Marcia Knous [:marcia - use ni]
Modified: 2013-12-27 14:21 PST (History)
10 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
Gdb crash stacks (8.84 KB, text/plain)
2011-07-13 13:56 PDT, Steven Michaud [:smichaud] (Retired)
no flags Details
Simplified test case that causes the crash with print preview (245 bytes, patch)
2011-07-14 16:04 PDT, Jeff Muizelaar [:jrmuizel]
no flags Details | Diff | Splinter Review
Fix (1.79 KB, patch)
2011-07-14 17:17 PDT, Steven Michaud [:smichaud] (Retired)
jmuizelaar: review+
Details | Diff | Splinter Review
Fix (for others to land) (1.99 KB, patch)
2011-08-01 10:00 PDT, Steven Michaud [:smichaud] (Retired)
smichaud: review+
christian: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Marcia Knous [:marcia - use ni] 2011-07-12 13:47:31 PDT
Mac only low volume crash seen in Aurora and trunk but right now ranks as #1 top crash on Aurora. Crashes started showing up using the 2011062100 build. Comments mention Print Preview and Google books. https://crash-stats.mozilla.com/report/list?signature=_cairo_surface_release_source_image

https://crash-stats.mozilla.com/report/index/e6b2257b-34ed-4b25-934d-682a02110708

Frame 	Module 	Signature [Expand] 	Source
0 	XUL 	_cairo_surface_release_source_image 	gfx/cairo/cairo/src/cairo-surface.c:1476
1 	XUL 	DataProviderReleaseCallback 	gfx/cairo/cairo/src/cairo-quartz-surface.c:1108
2 	CoreGraphics 	CoreGraphics@0x7409a 	
3 	CoreGraphics 	CoreGraphics@0x74042 	
4 	CoreFoundation 	_CFRelease 	
5 	CoreGraphics 	CoreGraphics@0x73f80 	
6 	CoreFoundation 	_CFRelease 	
7 	CoreGraphics 	CoreGraphics@0x156476 	
8 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0x1e9e 	
9 	CoreFoundation 	__CFSetCallback 	
10 	CoreFoundation 	__CFBasicHashDrain 	
11 	CoreFoundation 	_CFRelease 	
12 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0x20ed 	
13 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0x5c36 	
14 	CoreGraphics 	CoreGraphics@0x741ea 	
15 	CoreFoundation 	_CFRelease 	
16 	CoreGraphics 	CoreGraphics@0x741d4 	
17 	PrintCore 	pdfSpoolingReleaseDrawingContext 	
18 	PrintCore 	PJCReleaseSpoolingSession 	
19 	PrintCore 	PJCEndDocument 	
20 	PrintCore 	PMSessionEndDocumentNoDialog 	
21 	XUL 	nsDeviceContextSpecX::EndDocument 	widget/src/cocoa/nsDeviceContextSpecX.mm:129
22 	XUL 	nsDeviceContext::EndDocument 	gfx/src/nsDeviceContext.cpp:609
23 	XUL 	nsPrintData::~nsPrintData 	layout/printing/nsPrintData.cpp:117
24 	XUL 	nsPrintEngine::Destroy 	layout/printing/nsPrintEngine.cpp:284
25 	XUL 	DocumentViewerImpl::OnDonePrinting 	layout/base/nsDocumentViewer.cpp:4281
26 	XUL 	nsPrintCompletionEvent::Run 	layout/printing/nsPrintEngine.cpp:3377
27 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:617
28 	XUL 	NS_ProcessPendingEvents_P 	obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp:195
29 	XUL 	nsBaseAppShell::NativeEventCallback 	widget/src/xpwidgets/nsBaseAppShell.cpp:130
30 	XUL 	nsAppShell::ProcessGeckoEvents 	widget/src/cocoa/nsAppShell.mm:422
31 	CoreFoundation 	__CFRunLoopDoSources0 	
32 	CoreFoundation 	__CFRunLoopRun 	
33 	CoreFoundation 	CFRunLoopRunSpecific 	
34 	HIToolbox 	HIToolbox@0x2e7ed 	
35 	HIToolbox 	HIToolbox@0x2e550 	
36 	HIToolbox 	HIToolbox@0x2e4ab 	
37 	AppKit 	_DPSNextEvent 	
38 	AppKit 	-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 	
39 	AppKit 	-[NSApplication run] 	
40 	XUL 	nsAppShell::Run 	widget/src/cocoa/nsAppShell.mm:769
41 	XUL 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:222
42 	XUL 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3570
43 	firefox-bin 	main 	browser/app/nsBrowserApp.cpp:198
44 	firefox-bin 	firefox-bin@0xb73
Comment 1 Marcia Knous [:marcia - use ni] 2011-07-12 17:30:04 PDT
Just able to reproduce this on 10.7. Here are my STR using the latest trunk build:

1. Load my main mail page on GMail - https://mail.google.com/mail/. I have the "Tree" theme installed in GMail but I am not sure if that is a factor or not.
2. File -> Print -> Open PDF in Preview
3. Crash 100%

https://crash-stats.mozilla.com/report/index/bp-15a25f2f-9b91-4b58-8fb9-677412110712
Comment 2 Joe Drew (not getting mail) 2011-07-13 10:40:41 PDT
100% reproducible for me on 10.6 too; File -> Print -> PDF -> Save as PDF. Doesn't seem to matter what theme on Gmail.

Stephen, can you take a look at this?
Comment 3 Joe Drew (not getting mail) 2011-07-13 10:41:32 PDT
err - Steven. Sorry for the misspelling!
Comment 4 Steven Michaud [:smichaud] (Retired) 2011-07-13 11:56:29 PDT
I've just reproduced this on OS X 10.5.8:

bp-b30499bd-bdf1-48ee-8cd0-b2c4f2110713

I'll look into it, and see what I can find out.
Comment 5 Steven Michaud [:smichaud] (Retired) 2011-07-13 12:47:23 PDT
This is bad.  Every attempt to use Print Preview crashes!

Here's the regression range:

firefox-2011-05-27-03-mozilla-central
firefox-2011-05-28-03-mozilla-central

I'm pretty sure this incriminates the patch for bug 562746, which updated cairo to version 1.10.
Comment 6 Steven Michaud [:smichaud] (Retired) 2011-07-13 12:51:19 PDT
Breakpad reports several different crash signatures.  I'll get some gdb crash stacks.
Comment 7 Steven Michaud [:smichaud] (Retired) 2011-07-13 12:54:11 PDT
There appear to be some pages on which Print Preview *doesn't* crash (or at least doesn't always crash).  I'll try to find some examples.
Comment 8 Steven Michaud [:smichaud] (Retired) 2011-07-13 13:56:14 PDT
Created attachment 545737 [details]
Gdb crash stacks

Here are two gdb crash stacks -- the first taken in 64-bit mode, the
other in 32-bit mode.

There was one tab (in one window) open to the default about:home page
when I chose Preview from the Print dialog ... and crashed.

I tested with a	build with debug symbols, with optimization turned off
(not the same as a debug build).  The build was made from current
trunk code.  I tested on OS X 10.6.8.
Comment 9 Steven Michaud [:smichaud] (Retired) 2011-07-13 14:13:25 PDT
After some quick tests with today's mozilla-central nightly on OS X 10.6.8:

Pages that (almost) always crash doing Print Preview:

about:home
http://www.mozilla.org/projects/firefox/prerelease.html
http://www.apple.com/

Pages that (usually) don't crash doing Print Preview:

https://bugzilla.mozilla.org/show_bug.cgi?id=0
https://bugzilla.mozilla.org/show_bug.cgi?id=671064
http://www.mozilla.org/
http://www.mozilla.com/en-US/firefox/new/
Comment 10 Steven Michaud [:smichaud] (Retired) 2011-07-13 14:14:40 PDT
> <missing bug number>

This was a link to the bug whose id is 0.
Comment 11 Steven Michaud [:smichaud] (Retired) 2011-07-13 14:51:52 PDT
It's definitely the patch for bug 562746 (the update to cairo 1.10) that triggered these crashes, if it didn't actually cause them.

A build with http://hg.mozilla.org/mozilla-central/rev/503990b2c5c6 as tip doesn't crash.  A build with http://hg.mozilla.org/mozilla-central/rev/04e8d0b481bc as tip does crash.
Comment 12 Jeff Muizelaar [:jrmuizel] 2011-07-14 16:04:22 PDT
Created attachment 546031 [details] [diff] [review]
Simplified test case that causes the crash with print preview
Comment 13 Steven Michaud [:smichaud] (Retired) 2011-07-14 17:17:27 PDT
Created attachment 546049 [details] [diff] [review]
Fix

This bug involves accessing a cairo_surface_t object after it's been
deleted.  Here's a very simple patch that seems to fix it.

I'm doing a tryserver build, which should be available tomorrow
morning.
Comment 14 Jeff Muizelaar [:jrmuizel] 2011-07-15 07:42:30 PDT
This might be related:
http://lists.freedesktop.org/archives/cairo/2009-February/016677.html
Comment 15 Steven Michaud [:smichaud] (Retired) 2011-07-15 07:44:10 PDT
Here's a tryserver build made with my patch from comment #13
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-ea8910103ac1/try-macosx64/firefox-8.0a1.en-US.mac.dmg

Please try it out!
Comment 16 Steven Michaud [:smichaud] (Retired) 2011-07-15 08:06:02 PDT
> This might be related:
> http://lists.freedesktop.org/archives/cairo/2009-February/016677.html

I don't know the cairo code well enough to say.

But my patch does (in effect) make a quartz_source_image_t object hold
a reference to the cairo_surface_t object that it points to, to ensure
that the lifetime of the cairo_surface_t object is at least as long as
the lifetime of the quartz_source_image_t object that points to it.
Comment 17 Steven Michaud [:smichaud] (Retired) 2011-07-15 08:12:46 PDT
(Following up comment #16)

In debugging this, though, I noticed that
quartz_source_image_t.surface and quartz_source_image_t.img_out are
always the same ... which puzzles me.
Comment 18 Jeff Muizelaar [:jrmuizel] 2011-07-15 08:55:29 PDT
(In reply to comment #17)
> (Following up comment #16)
> 
> In debugging this, though, I noticed that
> quartz_source_image_t.surface and quartz_source_image_t.img_out are
> always the same ... which puzzles me.

Any idea what changed to cause this crash in the first place?
Comment 19 Steven Michaud [:smichaud] (Retired) 2011-07-15 09:04:49 PDT
> Any idea what changed to cause this crash in the first place?

No.

Though (of course) it's clear that a cairo_surface_t object is being accessed after deletion -- in _cairo_surface_release_source_image() and (probably) also elsewhere.

Possibly it was just a timing change.
Comment 20 Marcia Knous [:marcia - use ni] 2011-07-15 09:39:58 PDT
I can confirm that I do not crash on 10.7 using the tryserver build and trying to print using the same STR in Comment 1. Trying 10.6 next.

(In reply to comment #15)
> Here's a tryserver build made with my patch from comment #13
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-
> ea8910103ac1/try-macosx64/firefox-8.0a1.en-US.mac.dmg
> 
> Please try it out!
Comment 21 Steven Michaud [:smichaud] (Retired) 2011-07-15 09:42:11 PDT
It occurs to me (as I'm learning more about the cairo code) that it might be better to fix this bug in cairo_quartz_image_surface.c.  New patch coming up shortly.
Comment 22 Steven Michaud [:smichaud] (Retired) 2011-07-15 10:58:22 PDT
> It occurs to me (as I'm learning more about the cairo code) that it
> might be better to fix this bug in cairo_quartz_image_surface.c.

But this doesn't work, for reasons I can't yet fathom.

So I'll need to dig further.
Comment 23 Jeff Muizelaar [:jrmuizel] 2011-07-15 14:10:33 PDT
Andrea Canciani writes:
regarding the cairo-quartz patch, ref'ing/unref'ing should not be needed as the life of cgimage objects (and of the surface acquisition) does not extend beyond the backend functions. if you have some code which breaks without it, it is probably indicating another bug in cairo[-quartz]. (that ref'ing/unref'ing would be needed if the cgimage was associated to the surface snapshot-like)
Comment 24 Steven Michaud [:smichaud] (Retired) 2011-07-15 14:31:05 PDT
>> It occurs to me (as I'm learning more about the cairo code) that it
>> might be better to fix this bug in cairo_quartz_image_surface.c.
>
> But this doesn't work, for reasons I can't yet fathom.

It does work if I try to fix the bug in cairo-image-surface.c instead.
But I now think my original patch is probably the best we can do.

Fixing the problem in a "backend"'s calls to acquire_source_image()
and release_source_image() is overkill -- we only (as far as I can
tell) need the additional calls to cairo_surface_reference() and
cairo_surface_destroy() in this one case -- that of using a
CGDataProvider to create a CGImage.  The problem is that the
releaseCallback (DataProviderReleaseCallback()) is called by the OS at
unpredictable times, which will sometimes be after
quartz_source_image_t.surface has been destroyed.

I've also figured out why these crashes started happening with the
upgrade to Cairo 1.10.  Before the upgrade, a separate surface
snapshot was created in _cairo_surface_to_cgimage() and passed to
_cairo_quartz_create_cgimage() as its last parameter (releaseInfo), so
we didn't have to worry that something else would destroy it.  After
the upgrade, only a pointer to the "original" surface gets passed (as
((quartz_source_image_t*)releaseInfo)->surface), which *can* get
destroyed by something else, unless we create another reference to it.
Comment 25 Steven Michaud [:smichaud] (Retired) 2011-07-15 14:32:22 PDT
As best I can tell, what Andrea Canciani says is some combination of wrong and irrelevant.
Comment 26 Joe Drew (not getting mail) 2011-07-29 14:30:45 PDT
Jeff, are you going to r- Steven's patch? If not, what would a better solution to this bug be?

It's gotten to the point where, if we don't fix this, we should probably look at backing out the Cairo update.
Comment 27 Andrea Canciani 2011-07-29 15:48:57 PDT
(In reply to comment #25)
> As best I can tell, what Andrea Canciani says is some combination of wrong
> and irrelevant.

Yes, unfortunately I didn't understand the issue correctly since I was thinking about immediate drawing.
Quartz printing surfaces seem to be retained, which means that the life of the sources used to draw on them is much longer than cairo-quartz currently expects. This is a known problem, as pointed out in this comment http://cgit.freedesktop.org/cairo/tree/src/cairo-quartz-surface.c#n933

The patch attached to this bugreport handles the simple case of a source surface which does not change for the whole lifetime of the destination printing surface. Unfortunately this assumption is not valid in general and to guarantee the correctness of the output, a snapshot or some other form of deep-cloning should be used instead.

Nevertheless, the patch alleviates the issue, because it guarantees that the source surface will be alive until it is needed (the content might be different from the desired one, but at least it should avoid crashes).

Snapshotting might have a performance impact, but might be acceptable for printing surfaces. A patch which both avoids the crash and provides the correct result would be much preferred upstream.
Comment 28 Steven Michaud [:smichaud] (Retired) 2011-08-01 10:00:57 PDT
Created attachment 549835 [details] [diff] [review]
Fix (for others to land)

I've carried forward Jeff's r+.
Comment 29 Steven Michaud [:smichaud] (Retired) 2011-08-01 10:15:39 PDT
(In reply to comment #27)

> The patch attached to this bugreport handles the simple case of a
> source surface which does not change for the whole lifetime of the
> destination printing surface. Unfortunately this assumption is not
> valid in general and to guarantee the correctness of the output, a
> snapshot or some other form of deep-cloning should be used instead.

Would it be acceptable to call _cairo_surface_snapshot() on 'source'
and set source_img->surface to its result?  (This is more-or-less what
our cairo code did before the upgrade to version 1.10.)  Would this
snapshot get updated if the original source surface changed?
Comment 30 Andrea Canciani 2011-08-01 10:54:20 PDT
(In reply to comment #29)
> (In reply to comment #27)
> 
> > The patch attached to this bugreport handles the simple case of a
> > source surface which does not change for the whole lifetime of the
> > destination printing surface. Unfortunately this assumption is not
> > valid in general and to guarantee the correctness of the output, a
> > snapshot or some other form of deep-cloning should be used instead.
> 
> Would it be acceptable to call _cairo_surface_snapshot() on 'source'
> and set source_img->surface to its result?  (This is more-or-less what

Yes, this should avoid the crash and provide the desired behavior.

> our cairo code did before the upgrade to version 1.10.)  Would this
> snapshot get updated if the original source surface changed?

Cairo snapshot are copy-on-write. When the surface they target is about
to bemodified, snapshots are detached from it and attached to a
non-modificable clone (the clonation takes place before the surface is
changed, so that the content which the snapshot provides never
changes for its whole lifetime).

This is the expected behavior in your case (if I paint A over B, then I
change A, I still expect to have the old content of A in B... in fact this
is how client code usually copies surface contents).

This solution should be simple and correct, so if the performance
does not degrade too much, I believe you should definitely apply it.
Comment 31 Steven Michaud [:smichaud] (Retired) 2011-08-02 12:46:40 PDT
Comment on attachment 549835 [details] [diff] [review]
Fix (for others to land)

This patch is (basically) zero risk, and fixes a topcrasher.
Comment 32 christian 2011-08-02 14:41:41 PDT
Comment on attachment 549835 [details] [diff] [review]
Fix (for others to land)

Approved for releases/mozilla-aurora
Comment 33 Marco Bonardo [::mak] 2011-08-03 02:10:19 PDT
http://hg.mozilla.org/mozilla-central/rev/0cdcf876dd05
Comment 34 Steven Michaud [:smichaud] (Retired) 2011-08-03 12:35:20 PDT
Comment on attachment 549835 [details] [diff] [review]
Fix (for others to land)

Landed on mozilla-aurora:
http://hg.mozilla.org/releases/mozilla-aurora/rev/e0a5feb39665
Comment 35 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-09-22 16:02:48 PDT
qa+ for verification on Firefox 7
Comment 36 Vlad [QA] 2011-09-23 05:14:59 PDT
Setting resolution to Verified Fixed on MacOS X 10.6 and 10.7:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0a2) Gecko/20110921 Firefox/8.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0a1) Gecko/20110922 Firefox/9.0a1

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0) Gecko/20100101 Firefox/7.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:8.0a2) Gecko/20110921 Firefox/8.0a2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:9.0a1) Gecko/20110919 Firefox/9.0a1

In order to verify this, I've used the steps from comment1 and comment2 and I got no crash.

Note You need to log in before you can comment on or make changes to this bug.