Closed Bug 925448 Opened 6 years ago Closed 6 years ago

Crashes @ CoreGraphics@0x7fb9b | nsDeviceContextSpecX::EndPage on OS X 10.8 while printing

Categories

(Core :: Widget: Cocoa, defect, critical)

All
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla28
Tracking Status
firefox24 --- wontfix
firefox25 --- wontfix
firefox26 + wontfix
firefox27 + verified
firefox28 + verified

People

(Reporter: smichaud, Assigned: smichaud)

References

(Depends on 2 open bugs)

Details

(Keywords: topcrash-mac)

Crash Data

Attachments

(6 files, 2 obsolete files)

https://crash-stats.mozilla.com/report/list?signature=CoreGraphics%400x7fb9b&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&platform=mac&hang_type=any&date=2013-10-10+18%3A00%3A00&range_value=4#reports

These crashes go back a long way -- at least to FF 22 (bp-6dc3ad8b-de96-4e56-aa80-cdb5f2131004).  And they're still happening in significant numbers on the trunk (currently the 27 branch) -- enough to be the #9 topcrasher there.

The stacks themselves indicate these crashes happen while printing, and several comments say they happen while "printing" to a PDF file (or to Open PDF in Preview).  This strongly suggests these crashes are related to bug 880140.  But the workaround for that bug (landed on the 24 and 25 branches) doesn't seem to have fixed these crashes.  And bug 880140 as originally reported was triggered by the patch for bug 871530 (which first landed on the 24 branch).
See Also: → 880140
Crash Signature: [@ CoreGraphics@0x7fb9b ]
These are also the #6 Mac topcrasher on the 26 branch.
0 	CoreGraphics 	CoreGraphics@0x7fb9b 	
1 	CoreGraphics 	CoreGraphics@0x7aced 	
2 	CoreGraphics 	CoreGraphics@0x79542 	
3 	CoreGraphics 	CoreGraphics@0x7a2a2 	
4 	CoreGraphics 	CoreGraphics@0x7a0da 	
5 	libmozglue.dylib 	arena_run_alloc 	/builds/slave/b2g-in-osx64-00000000000000000/build/obj-firefox/x86_64/memory/mozjemalloc/../../../../memory/mozjemalloc/jemalloc.c
6 	CoreGraphics 	CoreGraphics@0x79d66 	
7 	CoreGraphics 	CoreGraphics@0x4e8876 	
8 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0xad79 	
9 	libmozglue.dylib 	libmozglue.dylib@0x45d0 	
10 	CoreFoundation 	CoreFoundation@0xfd9c2 	
11 	CoreFoundation 	CoreFoundation@0x127b0 	
12 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0xb803 	
13 	CoreFoundation 	CoreFoundation@0x1271e 	
14 	CoreFoundation 	CoreFoundation@0xfd9b0 	
15 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0xb803 	
16 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0xb7e5 	
17 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0x4cdb 	
18 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0x4c61 	
19 	libPDFRIP.A.dylib 	libPDFRIP.A.dylib@0x37f7 	
20 	PrintCore 	PrintCore@0x3bbab 	
21 	libmozglue.dylib 	arena_dalloc 	/builds/slave/b2g-in-osx64-00000000000000000/build/obj-firefox/x86_64/memory/mozjemalloc/../../../../memory/mozjemalloc/jemalloc.c
22 	PrintCore 	PrintCore@0x32ebf 	
23 	PrintCore 	PrintCore@0x10cc1 	
24 	XUL 	nsDeviceContextSpecX::EndPage() 	widget/cocoa/nsDeviceContextSpecX.mm
25 	XUL 	nsDeviceContext::EndPage() 	/builds/slave/m-cen-osx64-000000000000000000/build/obj-firefox/x86_64/gfx/src/../../../../gfx/src/nsDeviceContext.cpp
26 	XUL 	_ZThn112_N25nsSimplePageSequenceFrame9DoPageEndEv 	/builds/slave/m-cen-osx64-000000000000000000/build/obj-firefox/x86_64/layout/generic/../../../../layout/generic/nsSimplePageSequence.cpp
27 	XUL 	nsPrintEngine::PrintPage(nsPrintObject*, bool&) 	/builds/slave/m-cen-osx64-000000000000000000/build/obj-firefox/x86_64/layout/printing/../../../../layout/printing/nsPrintEngine.cpp
28 	XUL 	nsPagePrintTimer::Run() 	/builds/slave/m-cen-osx64-000000000000000000/build/obj-firefox/x86_64/layout/printing/../../../../layout/printing/nsPagePrintTimer.cpp
29 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	/builds/slave/m-in-osx64-0000000000000000000/build/obj-firefox/x86_64/xpcom/threads/../../../../xpcom/threads/nsThread.cpp
30 	XUL 	NS_ProcessPendingEvents(nsIThread*, unsigned int) 	xpcom/glue/nsThreadUtils.cpp
31 	XUL 	nsBaseAppShell::NativeEventCallback() 	widget/xpwidgets/nsBaseAppShell.cpp
32 	XUL 	nsAppShell::ProcessGeckoEvents(void*) 	/builds/slave/m-cen-osx64-000000000000000000/build/obj-firefox/x86_64/widget/cocoa/../../../../widget/cocoa/nsAppShell.mm
33 	CoreFoundation 	CoreFoundation@0x12b31 	
34 	CoreFoundation 	CoreFoundation@0x12455 	
35 	CoreFoundation 	CoreFoundation@0x357f5 	
36 	HIToolbox 	HIToolbox@0x5a7a0 	
37 	CarbonCore 	CarbonCore@0x1b806 	
38 	libobjc.A.dylib 	objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::erase(objc_object* const&) 	
39 	CoreFoundation 	CoreFoundation@0x350e2 	
40 	HIToolbox 	HIToolbox@0x5feb4 	
41 	HIToolbox 	HIToolbox@0x5fb94 	
42 	HIToolbox 	HIToolbox@0x5fae3 	
43 	AppKit 	AppKit@0x155533 	
44 	CoreFoundation 	CoreFoundation@0x35279
I've no idea why there are so many more of these on OS X 10.8 than on other versions of OS X.
Assignee: nobody → smichaud
Total Count 	URL
18 	about:blank
6 	https://online.manchester.ac.uk/courses/1/I3016-MBSW-60367-1111-2SE-029445/groups/_22549_1//_995834_1/CEOSurvey2010%20IBM.pdf
4 	https://servicing.capitalone.com/C1/Accounts/StatementsPdfHandler.ashx?index=1&StmtIdx=13&PDFType=PDF&OpenType=view&IsVersion1Call=False&FreqType=Monthly&Part=
4 	http://www.toaelectronics.com/products/amplifier/manuals/P-906_12_24MK2.pdf
3 	http://www.spraguereg.com/wp-content/uploads/2011/12/Flyer-4101EM.pdf
3 	https://edoc.etrade.com/e/t/onlinedocs/doc?docType=tax&docId=0
3 	http://web.mail.comcast.net/service/home/~/?auth=co&loc=en_US&id=640186&part=2.6
3 	http://www.connectingthreads.com/cfPatterns/FreePatternDL.cfm?id=07%28U%2A%20L%3DL%408JWZZP30DD%3BTP%20%20%0A
2 	http://3angelsfellowship.com/wp-content/uploads/2013/06/2-You-Can-Gain-the-Victory.pdf
2 	http://www.tirelyna.com/docs/TireLyna_Michelin_TMC_RP(1).pdf
2 	http://ocw.upm.es/educacion-fisica-y-deportiva/kinantropometria/contenidos/temas/Tema-5.pdf
2 	http://www.buildingexamples.com/index.php/lego/buildings31/download-pdf/finish/13-buildings/374-building-castle-mini/0
2 	about:newtab
2 	http://www.diykyoto.com/store/assets/0000/0670/Wattson_Solar_Plus_web.pdf
2 	https://web.ebill-a.telecomitalia.it/ebilla_p187/ebilla_p187_frame.asp
2 	http://www.victoriaplumb.com/images/installation_guides/Minimalist-Dual-Valve_892.pdf
2 	http://westky.paragonrels.com/ParagonLS/Files/AssociatedDocs/WESTKY/5/WESTKY_55909.pdf

pdf seems to be common
Keywords: needURLs
Current Ranking (Mac only):
 * Release: #29 (0.05%)
 * Beta:    #42 (0.03%)
 * Aurora:  #5  (1.97%)
 * Nightly: #10 (1.00%)

I think the numbers on Aurora and Nightly are indicative of a topcrash on OSX 10.8. Since the numbers seem to spike in Aurora it might be a good idea to start investigating with Firefox 26.0a2 builds.

Furthermore, uriloader@pdf.js is fairly high in the add-on correlations:
18% (2/11) vs.   1% (14/1680) onepassword@agilebits.com
18% (2/11) vs.   2% (31/1680) uriloader@pdf.js
18% (2/11) vs.   9% (150/1680) {b9db16a4-6edc-47ec-a1f4-b86292ed211d} (Video DownloadHelper, https://addons.mozilla.org/addon/3006)
9% (1/11) vs.   0% (1/1680) {8ac62a8b-8b3f-43ba-9b1a-90c299b9dfda}
9% (1/11) vs.   0% (1/1680) intgcal@egarracingteam.com.ar
9% (1/11) vs.   0% (1/1680) 122c5ff6ff5c11e0948812313d1adcbe@jetpack
9% (1/11) vs.   0% (1/1680) trackerblock@privacychoice.org
9% (1/11) vs.   0% (1/1680) HTML5LocalStorageExplorer@foundstone.com
9% (1/11) vs.   0% (8/1680) es-es@dictionaries.addons.mozilla.org (Spanish (Spain) Dictionary, https://addons.mozilla.org/addon/3554)
9% (1/11) vs.   0% (8/1680) {0538E3E3-7E9B-4d49-8831-A227C80A7AD3} (Forecastfox, https://addons.mozilla.org/addon/398)
9% (1/11) vs.   1% (11/1680) anttoolbar@ant.com (Ant.com Toolbar - Tube Downloader, https://addons.mozilla.org/addon/8174)
9% (1/11) vs.   1% (11/1680) {d40f5e7b-d2cf-4856-b441-cc613eeffbe3} (BetterPrivacy, https://addons.mozilla.org/addon/6623)
9% (1/11) vs.   1% (12/1680) toolbar@shopathome.com
9% (1/11) vs.   1% (14/1680) en-US@dictionaries.addons.mozilla.org (United States English Dictionary, https://addons.mozilla.org/addon/3497)
9% (1/11) vs.   1% (17/1680) {3d7eb24f-2740-49df-8937-200b1cc08f8a} (Flashblock, https://addons.mozilla.org/addon/433)
9% (1/11) vs.   1% (21/1680) firefox@ghostery.com (Ghostery, https://addons.mozilla.org/addon/9609)
9% (1/11) vs.   2% (31/1680) testpilot@labs.mozilla.com (Mozilla Labs - Test Pilot, https://addons.mozilla.org/addon/13661)
9% (1/11) vs.   2% (36/1680) {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar, https://addons.mozilla.org/addon/2032)
9% (1/11) vs.   2% (39/1680) {c0c9a2c7-2e5c-4447-bc53-97718bc91e1b} (Easy YouTube Video Downloader, https://addons.mozilla.org/addon/10137)
9% (1/11) vs.   3% (47/1680) {195A3098-0BD5-4e90-AE22-BA1C540AFD1E} (Garmin Communicator, https://addons.mozilla.org/addon/10278)
18% (2/11) vs.  13% (220/1680) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)

Does this info give you any ideas of what we might test Steven?
Keywords: topcrash
Note that we expect to crash on the 26 and 27 branches because of bug 880140, because those branches don't have Yury's workaround.  So for the purposes of this bug, we need to test with builds that *do* have Yury's workaround.

I know the FF 24 release does have it, and just now I was able to trigger this bug's crashes a couple of times in FF 24 using http://susanparadis.files.wordpress.com/2013/10/batscatskeyboardcards.pdf, which I got from a comment on one of this bug's crash reports:

bp-adabfe0a-3fda-4449-ab58-55d492131010
bp-7b98fd78-f003-4c05-9e1c-f90bb2131010

But I can't yet reproduce them reliably.

As with bug 880140, the basic steps are to load the pdf file (in pdfjs), choose File : Print, then "Open PDF in Preview" or "Save as PDF".  But extra steps seem to be required for this bug's crashes.  I'll try to find out what they are.
> Note that we expect to crash on the 26 and 27 branches because of bug 880140

Presumably with a different signature.
topcrash is being replaced by more precise keywords per https://bugzilla.mozilla.org/show_bug.cgi?id=927557#c3
Keywords: topcrashtopcrash-mac
I've got pretty reliable STR for these crashes in FF 24 (i.e. in builds containing Yury's workaround), on both OS X 10.8.5 and 10.7.5:

1) Run FF and (in Preferences : Advanced : Network) press the Clear Now button for Cached Web Content.
2) Quit FF and restart it.
3) Visit http://susanparadis.files.wordpress.com/2013/10/batscatskeyboardcards.pdf, then very quickly but not too quickly (just after the blank first page appears in pdfjs's display) choose File : Print (or press Cmd-p).
4) As quickly as possible, choose PDF : Open PDF in Preview.  Then wait 15-20 seconds for a crash.

I'll see what I can learn from this.
Attachment #819930 - Attachment description: Local testcase → Copy of batscatskeyboards.pdf testcase
(Following up comment #9)

I can't get these STR to work with https://bugzilla.mozilla.org/attachment.cgi?id=819930 -- possibly because Mozilla's network connection is better than Wordpress's.

I find the optimal time to press Cmd-p (in step 3) is after the blank first page appears and when the document (judging by the progress bar) is three-quarters loaded.  (When using https://bugzilla.mozilla.org/attachment.cgi?id=819930, the document always finishes loading before the first blank page appears.)

If the STR stop working, you can generally get them working again by rebooting your computer.
(Following up comment #12)

Notice that this stack is basically identical to the ones reported at bug 880140.  I suspect this means that the underlying problem (with or without Yury's workaround) is the same in both bugs, and that it has nothing to do with pdfjs.  Since bug 880140 is easier to reproduce, from this point I'm going to work on the trunk (which doesn't have Yury's workaround).  But I'll do the work here, since bug 880140 is already too "busy".

If/when I come up with a fix/workaround on the trunk, I'll backport it to the 24 branch to test it out there.  (This will be just for testing -- not in preparation for another 24-branch release.)
Yury's minimal testcase from bug 880140, for use here on the trunk:
https://bugzilla.mozilla.org/attachment.cgi?id=791623
See Also: → 932077
Lately I've had some time to work on this, and I've finally made some progress -- I've found that this is our bug.

More specifically I've found that the following change to _cairo_quartz_create_cgimage() in gfx/cairo/cairo/src/cairo-quartz-surface.c stops the crashes from happening:

+    dataCopy = (unsigned char *) malloc(height * stride);
+    memcpy(dataCopy, data, height * stride);
     dataProvider = CGDataProviderCreateWithData (releaseInfo,
-                                                data,
+                                                dataCopy,
                                                 height * stride,
                                                 releaseCallback);

This (of course) *isn't* a proper fix -- it just leaks dataCopy.  But that it works shows us the problem is on our side.  I'll keep digging.
Attached patch Fix (obsolete) — Splinter Review
Here, at long last, is a fix for this bug, and also for bug 880140 and bug 932077.  This is a true fix -- not just a workaround.

Here's a tryserver build made with my patch.  There were no non-spurious tryserver test failures.

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-cb3f0ce3a461/try-macosx64/firefox-28.0a1.en-US.mac.dmg

Turns out the crashes happen when OS printing code accesses a CFImage object that we created.  The object itself is still valid (its reference count is greater than 0), but its data is stored in a DrawTargetCG object that has long since been deleted.  Because of the way jemalloc behaves, the CFImage object's data pointer is still valid (it still points to an accessible memory block), but that block now contains something else entirely.  So the OS crashes because the CFImage's data is (in effect) corrupt.

I'll go into more detail in later comments.
Attachment #8335346 - Flags: review?(jmuizelaar)
The problematic CFImage object is created here:

https://hg.mozilla.org/mozilla-central/annotate/597287004ff5/gfx/cairo/cairo/src/cairo-quartz-surface.c#l262

It gets drawn to a CGContextRef here:

https://hg.mozilla.org/mozilla-central/annotate/597287004ff5/gfx/cairo/cairo/src/cairo-quartz-surface.c#l1895

Previously the DrawTargetCG object holding its data was created and initialized here:

https://hg.mozilla.org/mozilla-central/annotate/597287004ff5/gfx/2d/Factory.cpp#l215

Then this was used to create a SourceSurfaceCGBitmapContext object here, whose mCg object is the CGContextRef the CGImage above eventually gets drawn to (and which from that point holds a strong reference to that CGImage object).

https://hg.mozilla.org/mozilla-central/annotate/597287004ff5/content/canvas/src/CanvasRenderingContext2D.cpp#l1442

All of this ultimately happens from a call to CanvasRenderingContext2D::FillRect() here:

https://hg.mozilla.org/mozilla-central/annotate/597287004ff5/content/canvas/src/CanvasRenderingContext2D.cpp#l1515

Along the way, a DataSourceSurface object is created from the SourceSurfaceCGBitmapContext here (during a call to GetCairoSurfaceForSourceSurface()), and its lifetime is made to match that of the resulting cairo_surface_t object -- which (as it happens) is about as long as the SourceSurfaceCGBitmapContext object itself, and is as long as we need.

https://hg.mozilla.org/mozilla-central/annotate/597287004ff5/gfx/2d/DrawTargetCairo.cpp#l221

Problem is, without my patch the DataSourceSurface doesn't hold a strong reference to its SourceSurfaceCGBitmapContext's DrawTargetCG object (mDrawTarget).  And neither (for complicated reasons) does the SourceSurfaceCGBitmapContext object itself.  So its DrawTargetCG object (which contains the data for our CGImage object created above) can get destroyed before the SourceSurfaceCGBitmapContext it was used to create.  (Even without my patch, the DataSourceSurface object in effect holds a strong reference to its creator's mCg object, which in turn holds a strong reference to our CGImage object.)

The SourceSurfaceCGBitmapContext object lasts as long as we need it to -- it gets garbage collected after it's no longer needed.  So does the cairo_surface_t object returned by GetCairoSurfaceForSourceSurface().  The lifetime of the DataSourceSurface object created in GetCairoSurfaceForSourceSurface() is already tied to that of the cairo_surface_t object.  So it makes sense to tie the lifetime of the SourceSurfaceCGBitmapContext's DrawTargetCG object to that of the DataSourceSurface object -- which my patch does.
This bug would have been impossible to figure out without some reverse engineering.  I started from the crash stack from comment #12, and did most of the work using an interpose library.

The relevant calls above CGSImageStreamRead are made from that method to other methods in the same module (the CoreGraphics framework), so they can't be hooked using an interpose library.  But the call to CGSImageStreamRead itself can be, since it's called from libPDFRIP.dylib.

By digging around with a disassembler in those two modules, I reconstructed the following declaration for CGSImageStreamRead:

uint32_t CGSImageStreamRead(void *arg1, long arg2, long arg3);

I also found two other, similarly named methods in the CoreGraphics framework, which were getting called in concert with the previous method:

void *CGSImageStreamLock(CGImageRef image, uint32_t arg2, uint32_t arg3, uint32_t arg4, uint32 arg5,
                         uint32 arg6, long arg7, uint32 arg8);
void CGSImageStreamUnlock(void *arg1);

First you'd see a call to CGSImageStreamLock(), then a bunch of calls to CGSImageStreamRead() with the first parameter what'd been returned from CGSImageStreamLock(), then a call to CGSImageStreamUnlock() with that same parameter.  By looking at calls to CGSImageStreamLock() in my disassembler, I figured out that its first parameter is a CGImageRef.

So now I needed to find out where this CGImageRef was created.

There are a bunch of methods (documented and undocumented) that can create CGImageRef objects.  But it turns out that all of them indirectly call the following method, which is used to allocate and do preliminary initialization for all CFType objects:

void *_CFRuntimeCreateInstance(void *allocator, CFTypeID type, CFIndex size, int32_t arg4);

By hooking this method and logging only calls to create objects whose type == CGImageGetTypeID(), I was able to find where our CGImageRef had been created, and that we had created it.

I also had to a bunch more debugging, but this was the crucial breakthrough.  I'll post my interpose library and debugging patch next.
Notice that this can call methods in the interpose library, if that's been loaded into memory.  E.g. it can call ilPrintStackTrace() to log a stack trace.
One more thing of note:

https://hg.mozilla.org/mozilla-central/annotate/597287004ff5/gfx/cairo/cairo/src/cairo-quartz-surface.c#l262

Once our CGImageRef object has been created here, in my debugging patch I use CGImageSetProperty() (undocumented) to set an arbitrary property on it.  Then I'm able to hook CFRetain() and CFRelease() in my interpose library, but only log calls to those methods on objects with that property.
Comment on attachment 8335346 [details] [diff] [review]
Fix

Turns out Jeff is on vacation.
Attachment #8335346 - Flags: review?(jmuizelaar) → review?(bgirard)
Comment on attachment 8335346 [details] [diff] [review]
Fix

Review of attachment 8335346 [details] [diff] [review]:
-----------------------------------------------------------------

This is fine with me. Bas?
Attachment #8335346 - Flags: review?(bgirard)
Attachment #8335346 - Flags: review?(bas)
Attachment #8335346 - Flags: review+
Comment on attachment 8335346 [details] [diff] [review]
Fix

Review of attachment 8335346 [details] [diff] [review]:
-----------------------------------------------------------------

I feel like we should be able to grab a strong reference to the CGContextRef, and lose the reference to the DrawTarget. That way if the DrawTarget is destroyed we can lose the DTs reference and be fine. If the DrawTarget is changed DrawTargetWillChange will ensure a copy and we'll be safe as well, and if the DrawTarget stays around quietly there's no reference cycle as far as I can see. DrawTargetCG will hold a strong reference to the SourceSurfaceCGBitmapContext through mSnapshot (as long as MarkChanged hasn't been called on the DrawTarget). Both DrawTargetCG and SourceSurfaceCGBitmapContext will hold on to the CGContext, but from looking at the code, and with my limited knowledge of CoreGraphics, I'd say that shouldn't be a problem?
> I feel like we should be able to grab a strong reference to the CGContextRef

This won't work.  The (external) data for the problemmatic CGImage object is in a DrawTargetCG object (in its mData variable) that gets deleted prematurely.  There's no CGContextRef (or CGImageRef, or any other CFType object) that we can addref to stop this from happening.
(Following up comment #26)

The CGImage object itself does not (and cannot) hold a reference to its external data (or, of course, to the DrawTargetCG object that contains it).

(Technically the pointer to the external data is in a CFDataProvider object to which the CGImage object does hold a reference.  But since the data is external, this has no effect on the data itself.)
Attached patch Fix rev1 (obsolete) — Splinter Review
Bas and I thrashed out my original patch on IRC yesterday -- he now understands what I was trying to do.  But he also pointed out that my original patch allows changes to be made to the image data of a "snapshot", which should be immutable.  He suggested an alternative which doesn't do this (which creates a local copy of the image data in the SourceSurfaceCGBitmapContext object as soon as it returns a DataSourceSurface object from GetDataSurface()).

This patch expands on what he suggested.  Notably, it stores the local image data copy in an aligned array (AlignedArray<uint8_t> mDataHolder), which hopefully results in more efficient drawing.  (DrawTargetCG objects already have an aligned array for image data.)

Bas, don't feel obliged to work on this over the weekend :-)

I've started a tryserver run, whose results should eventually be available here:
https://tbpl.mozilla.org/?tree=Try&rev=28ffbde40010
Attachment #8335346 - Attachment is obsolete: true
Attachment #8335346 - Flags: review?(bas)
Attachment #8337121 - Flags: review?(bas)
Comment on attachment 8337121 [details] [diff] [review]
Fix rev1

Review of attachment 8337121 [details] [diff] [review]:
-----------------------------------------------------------------

This looks like an improvement to me and it will fix the bug. I'd like additional review from someone more Mac savvy though :).

::: gfx/2d/SourceSurfaceCG.cpp
@@ +281,2 @@
>      if (!mData) abort();
> +    mImage = CreateCGImage(nullptr, mData, mSize, mStride, mFormat);

I think this is a fine change, I must admit I don't know too much about CoreGraphics though.
Attachment #8337121 - Flags: review?(bgirard)
Attachment #8337121 - Flags: review?(bas)
Attachment #8337121 - Flags: review+
(Following up comment #28)

> But he also pointed out that my original patch allows changes to be
> made to the image data of a "snapshot", which should be immutable.

Now I suspect the problem with my first patch is that it allowed
changes to the "original" of which the SourceSurfaceCGBitmapContext
object is a snapshot.  But the code is complex, and I'm still not sure
I've understood it correctly.

At the very least, though, my first patch allows changes to the
DrawTargetCG object with which the SourceSurfaceCGBitmapContext object
was created (changes to its image data), while my rev1 patch doesn't.
Comment on attachment 8337121 [details] [diff] [review]
Fix rev1

Review of attachment 8337121 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/SourceSurfaceCG.cpp
@@ +297,5 @@
>      size_t stride = CGBitmapContextGetBytesPerRow(mCg);
>      size_t height = CGBitmapContextGetHeight(mCg);
>  
> +    mDataHolder.Realloc(stride * height);
> +    mData = mDataHolder;

I think there might of been a memory leak before this patch. where you replaced mData but never free-d it.

What about SourceSurfaceCGBitmapContext::SourceSurfaceCGBitmapContext that allocates mData? You still have to free that:
https://developer.apple.com/library/mac/qa/qa1509/_index.html

::: gfx/2d/SourceSurfaceCG.h
@@ +93,5 @@
>  
>    virtual SurfaceType GetType() const { return SURFACE_COREGRAPHICS_CGCONTEXT; }
>    virtual IntSize GetSize() const;
>    virtual SurfaceFormat GetFormat() const { return mFormat; }
> +  virtual TemporaryRef<DataSourceSurface> GetDataSurface() { DrawTargetWillChange(); return this; }

It's not clear from the code why getting a data surface here requires calling DrawTargetWillChange. It should be clear to the reader of the code. I think this needs a comment.

Is returning 'this' with TemporaryRef<DataSourceSurface> safe BTW? Not sure if it will give the right refcount.
Attachment #8337121 - Flags: review?(bgirard) → review-
> I think there might of been a memory leak before this patch. where
> you replaced mData but never free-d it.

No, there wasn't.  The code removed by my patch is what guaranteed this.

> What about
> SourceSurfaceCGBitmapContext::SourceSurfaceCGBitmapContext that
> allocates mData? You still have to free that

The SourceSurfaceCGBitmapContext object itself gets freed (indirectly)
by the cycle collector (when it "collects" a CanvasRenderingContext2D
object).  But it seems you're asking about some other kind of object.
Please clarify.

> It's not clear from the code why getting a data surface here
> requires calling DrawTargetWillChange. It should be clear to the
> reader of the code. I think this needs a comment.

OK.

> Is returning 'this' with TemporaryRef<DataSourceSurface> safe BTW?
> Not sure if it will give the right refcount.

I'm not entirely sure why, but this code *does* work.  But the
following code gives exactly the same results, and is "easier to
read":

RefPtr<DataSourceSurface> temp = this; return temp.forget();

Maybe I should use this in place of:

return this;
>> What about
>> SourceSurfaceCGBitmapContext::SourceSurfaceCGBitmapContext that >
>> allocates mData? You still have to free that
>
> The SourceSurfaceCGBitmapContext object itself gets freed
> (indirectly) by the cycle collector (when it "collects" a
> CanvasRenderingContext2D object).  But it seems you're asking about
> some other kind of object.  Please clarify.

You mean this line?

mData = CGBitmapContextGetData(mCg);

Then no, you're wrong -- we *don't* want to free this.
CGBitmapContextGetData() returns a (weak) void * to internal data.
And SourceSurfaceCGBitmapContext.mCg is a weak pointer, so we don't want to free that, either.
(In reply to Steven Michaud from comment #33)
> >> What about
> >> SourceSurfaceCGBitmapContext::SourceSurfaceCGBitmapContext that >
> >> allocates mData? You still have to free that
> >
> > The SourceSurfaceCGBitmapContext object itself gets freed
> > (indirectly) by the cycle collector (when it "collects" a
> > CanvasRenderingContext2D object).  But it seems you're asking about
> > some other kind of object.  Please clarify.
> 
> You mean this line?

yes
> 
> mData = CGBitmapContextGetData(mCg);
> 
> Then no, you're wrong -- we *don't* want to free this.
> CGBitmapContextGetData() returns a (weak) void * to internal data.

Ahh I see. It wasn't clear at all from the apple example. I believe you now.
> RefPtr<DataSourceSurface> temp = this; return temp.forget();
> 
> Maybe I should use this in place of:
> 
> return this;

Benoit, do you really want me to do this? :-)

The same idiom is used elsewhere, notably here:
https://hg.mozilla.org/releases/mozilla-aurora/annotate/1e7e095cfcf4/gfx/2d/2D.h#l363

But it seems unnecessary.
Comment on attachment 8337121 [details] [diff] [review]
Fix rev1

Review of attachment 8337121 [details] [diff] [review]:
-----------------------------------------------------------------

This looks like an improvement to me and it will fix the bug. I'd like additional review from someone more Mac savvy though :).

::: gfx/2d/SourceSurfaceCG.cpp
@@ +281,2 @@
>      if (!mData) abort();
> +    mImage = CreateCGImage(nullptr, mData, mSize, mStride, mFormat);

I think this is a fine change, I must admit I don't know too much about CoreGraphics though.

@@ +297,5 @@
>      size_t stride = CGBitmapContextGetBytesPerRow(mCg);
>      size_t height = CGBitmapContextGetHeight(mCg);
>  
> +    mDataHolder.Realloc(stride * height);
> +    mData = mDataHolder;

There's never a case where it's replaced. This code nulls out mDrawTarget and it would never be true again.

In the case of the constructor CG will manage the lifetime of the memory pointed to by mData. See how mImage is explicitly nullptr in that constructor, meaning the free would also never be called in that case.

::: gfx/2d/SourceSurfaceCG.h
@@ +93,5 @@
>  
>    virtual SurfaceType GetType() const { return SURFACE_COREGRAPHICS_CGCONTEXT; }
>    virtual IntSize GetSize() const;
>    virtual SurfaceFormat GetFormat() const { return mFormat; }
> +  virtual TemporaryRef<DataSourceSurface> GetDataSurface() { DrawTargetWillChange(); return this; }

The reason we're doing this is because otherwise, if someone does a GetData(), starts working with that, and then someone touches the DrawTarget, the data at the pointer they're looking at would change. This violates the immutability of a SourceSurface. I agree this isn't the nicest solution but in reality we should have enough memory bandwidth to tolerate this in the case of someone using an API which is already not guaranteed to be performant.

You're correct that a comment should be added.

Yes, this is safe. TemporaryRef simply does nothing but when it assigns to a RefPtr will AddRef.
Attachment #8337121 - Flags: review- → review?(bgirard)
Attachment #8337121 - Flags: review?(bgirard) → review+
(Following up comment #17)

I discovered I made a small mistake -- the following is wrong.  Since
I doubt anyone actually read that comment, I'm making this correction
for my own future reference :-)

> Then this was used to create a SourceSurfaceCGBitmapContext object
> here, whose mCg object is the CGContextRef the CGImage above
> eventually gets drawn to (and which from that point holds a strong
> reference to that CGImage object).

This SourceSurfaceCGBitmapContext.mCg object is *not* the one the
CGImage object gets drawn to.  That CGContextRef is a
gfxQuartzSurface.mCGContext object, created by a call to
gfxQuartzSurface.CreateSimilarSurface() on another gfxQuartzSurface
object returned by nsDeviceContextSpecX::GetSurfaceForPrinter().

This CGContextRef object (the one that the CGImage object gets drawn
to) ends up holding a reference to the CGImage object (actually the
CGContextRef's CGContextDelegate object holds the reference).  The
CGImage object gets deleted when the CGContextRef it was drawn to gets
deleted (the CGContextRef holds the last reference).  This happens
when the gfxQuartzSurface object is deleted (the one returned by
CreateSimilarSurface()).  The gfxQuartzSurface object is deleted when
the CanvasRenderingContext2D object is garbage collected on which the
call to FillRect() was made.  Our SourceSurfaceCGBitmapContext object
gets deleted at the same time (via a call to ReleaseData() in
DrawTargetCairo.cpp).
Carrying over r+es.
Attachment #8337121 - Attachment is obsolete: true
Attachment #8338656 - Flags: review+
Too late now to take this to beta and get feedback & bake time to ensure this doesn't cause other regressions and does in fact fix the crash so we'll keep tracking for FF27 and beyond but wontfixing on 26.
https://hg.mozilla.org/mozilla-central/rev/8a5c832a8bbf
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
(In reply to Steven Michaud from comment #40)
> Comment on attachment 8338656 [details] [diff] [review]
> Add comment (what I will land)
> 
> Landed on mozilla-inbound:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/8a5c832a8bbf

Steven, If this is ready we should uplift on aurora asap. Can you please help with that ?
Flags: needinfo?(smichaud)
Comment on attachment 8338656 [details] [diff] [review]
Add comment (what I will land)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Old bug in Mozilla graphics code
User impact if declined: Topcrasher will remain unfixed on 27 branch
Testing completed (on m-c, etc.): A week of baking on m-c, no crashes on 28 branch since then
Risk to taking this patch (and alternatives if risky): low to moderate
String or IDL/UUID changes made by this patch: none

The fix is simple and straightforward, which would normally mean its risk is low.  But the code it changes is *very* complex, so I think it's fairer to say "low to moderate".
Attachment #8338656 - Flags: approval-mozilla-aurora?
Flags: needinfo?(smichaud)
(In reply to Steven Michaud from comment #44)
> Comment on attachment 8338656 [details] [diff] [review]
> Add comment (what I will land)
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): Old bug in Mozilla graphics code
> User impact if declined: Topcrasher will remain unfixed on 27 branch
> Testing completed (on m-c, etc.): A week of baking on m-c, no crashes on 28
> branch since then
> Risk to taking this patch (and alternatives if risky): low to moderate
> String or IDL/UUID changes made by this patch: none
> 
> The fix is simple and straightforward, which would normally mean its risk is
> low.  But the code it changes is *very* complex, so I think it's fairer to
> say "low to moderate".

Lets take this for now to avoid the top-crasher and see if it has any fallouts in which case we can back this out during beta once we get feedback. Also if there is any areas QA can help test to mitigate the risk, please add qawanted and help with pointers on test areas to get targeted testing.
Attachment #8338656 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flagging for verification by checking crashstats.
Keywords: verifyme
Depends on: 972877
You need to log in before you can comment on or make changes to this bug.