Bug 721663 - Crash in unpackImageRow @ CGAccessSessionGetBytes on Mac OS X 10.5 while printing or previewing
Status: VERIFIED FIXED
Whiteboard: [qa!]
Keywords: crash, regression, reproducible, topcrash
Product: Core
Classification: Components
Component: General
Version: 11 Branch
Platform: x86 Mac OS X
Importance: -- critical
Target Milestone: mozilla13
Assigned To: Steven Michaud [:smichaud] (Retired)
Depends on: 737509
Blocks: 533577 685767
Reported: 2012-01-27 01:28 PST by Scoobidiver (away)
Modified: 2012-03-30 15:56 PDT (History)
+
verified
-
verified


Attachments
environment.plist that sets NO_MAC_JEMALLOC (236 bytes, text/plain)
2012-02-14 09:26 PST, Steven Michaud [:smichaud] (Retired)
no flags
Provisional fix (3.45 KB, patch)
2012-02-16 11:01 PST, Steven Michaud [:smichaud] (Retired)
joe: review+
Patch with copy in gfx/cairo (7.17 KB, patch)
2012-02-17 09:11 PST, Steven Michaud [:smichaud] (Retired)
smichaud: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+

Description Scoobidiver (away) 2012-01-27 01:28:00 PST
It's a new crash signature that first appeared in 12.0a1/20111223 and 11.0a2/20111230.
It's #4 top crasher in 11.0a2 on Mac OS X.

Every comment mentions printing or previewing.

Signature 	CGAccessSessionGetBytes
UUID	63a79621-4dc0-48e1-9fb7-ed7762120125
Date Processed	2012-01-25 09:45:18
Uptime	13
Last Crash	38 seconds before submission
Install Age	13 seconds since version was first installed.
Install Time	2012-01-25 09:44:54
Product	Firefox
Version	11.0a2
Build ID	20120124042008
Release Channel	aurora
OS	Mac OS X
OS Version	10.5.8 9L31a
Build Architecture	x86
Build Architecture Info	family 6 model 23 stepping 6
Crash Reason	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address	0x1c9a4000
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863	
EMCheckCompatibility	True

Frame 	Module 	Signature 	Source
0 		@0xffff08a0 	
1 	CoreGraphics 	CGAccessSessionGetBytes 	
2 	libPDFRIP.A.dylib 	unpackImageRow 	
3 	libPDFRIP.A.dylib 	PDFImageEmitData 	
4 	libPDFRIP.A.dylib 	imageRefEmitDefinition 	
5 	libPDFRIP.A.dylib 	PDFImageEmitDefinition 	
6 	libPDFRIP.A.dylib 	emitImageDefinition 	
7 	CoreFoundation 	CFSetApplyFunction 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=CGAccessSessionGetBytes
Comment 1 Scoobidiver (away) 2012-02-10 11:40:26 PST
It's #1 top crasher on Mac OS X in 11.0b1.

There are a few crashes on Mac OS X 10.6.
Comment 2 Alex Keybl [:akeybl] 2012-02-10 12:16:26 PST
Adding the qawanted and regressionwindow-wanted keywords to do some exploratory testing around Printing/Previewing printing on OS X. See the comments at https://crash-stats.mozilla.com/report/list?query_search=signature&query_type=contains&reason_type=contains&range_value=3&range_unit=weeks&hang_type=any&process_type=any&signature=CGAccessSessionGetBytes for more leads.

Also tracking for FF11.
Comment 3 Marcia Knous [:marcia - use ni] 2012-02-10 17:34:35 PST
Here is a set of STR on 10.5 using the most recent 11 beta: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:11.0) Gecko/20100101 Firefox/11.0

STR:
1. Load http://www.psdbox.com/tutorials/new-manga-effect-2011-photoshop-tutorial/
2. Go to file menu - File->Print->Save as PDF.
3. Save the file.
4. Immediately after that click on the new tab button next to the tab open in Step 1.
5. Crash.

https://crash-stats.mozilla.com/report/index/bp-0ba65bfe-6c8e-4426-b6e1-40cdb2120211
Comment 4 Scoobidiver (away) 2012-02-11 05:17:36 PST
Here are correlations:
  CGAccessSessionGetBytes|EXC_BAD_ACCESS / KERN_INVALID_ADDRESS (57 crashes)
    100% (57/57) vs.  17% (86/498) libPDFRIP.A.dylib
    100% (57/57) vs.  19% (95/498) PrintingCocoaPDEs
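
The percentages in this correlation report can be rechecked from the raw counts. The sketch below assumes the usual reading of such reports: the first ratio counts crashes with this signature that had the module loaded, and the second counts all crashes in the comparison set that had it loaded. The helper name `share` is made up for illustration.

```python
def share(hits: int, total: int) -> int:
    """Percentage of crash reports that had the module loaded, rounded."""
    return round(100 * hits / total)


# Counts copied from the correlation report above.
modules = {
    "libPDFRIP.A.dylib": ((57, 57), (86, 498)),
    "PrintingCocoaPDEs": ((57, 57), (95, 498)),
}

for name, (with_signature, baseline) in modules.items():
    print(f"{name}: {share(*with_signature)}% vs. {share(*baseline)}%")
```

Both modules are loaded in every crash with this signature but in under a fifth of the comparison set, which fits the reports that all mention printing or previewing.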
Comment 5 Steven Michaud [:smichaud] (Retired) 2012-02-13 08:32:58 PST
Testing with today's mozilla-central nightly and Marcia's STR from comment #3 (on OS X 10.5.8), I get a completely different error:

The nightly crashes, and then so does crashreporter.  Then I get the following Apple crash report:

Process:         crashreporter [1877]
Path:            /Users/smichaud/Desktop/FirefoxNightly 2012-02-13.app/Contents/MacOS/crashreporter.app/Contents/MacOS/crashreporter
Identifier:      crashreporter
Version:         ??? (???)
Code Type:       X86 (Native)
Parent Process:  firefox [1863]

Interval Since Last Report:          84 sec
Crashes Since Last Report:           1
Per-App Interval Since Last Report:  0 sec
Per-App Crashes Since Last Report:   1

Date/Time:       2012-02-13 10:26:40.583 -0600
OS Version:      Mac OS X 10.5.8 (9L30)
Report Version:  6
Anonymous UUID:  C9512736-EC85-4CD1-B209-AB5833DEE8E2

Exception Type:  EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000002, 0x0000000000000000
Crashed Thread:  0

Dyld Error Message:
  Library not loaded: /usr/lib/libcrypto.0.9.8.dylib
  Referenced from: /Users/smichaud/Desktop/FirefoxNightly 2012-02-13.app/Contents/MacOS/crashreporter.app/Contents/MacOS/crashreporter
  Reason: image not found

Ted, any idea what's going on here?

This is very likely a different bug, unrelated to the one Marcia reported.
Comment 6 Steven Michaud [:smichaud] (Retired) 2012-02-13 08:36:01 PST
(Following up comment #5)

The OS X 10.5.8 version of libcrypto.dylib is 0.9.7.  So it looks like Breakpad is broken on OS X 10.5.8 in current nightlies.  I'll open a new bug.
Comment 7 Ted Mielczarek [:ted.mielczarek] 2012-02-13 08:38:29 PST
Marcia filed bug 721160 on that already.
Comment 8 Steven Michaud [:smichaud] (Retired) 2012-02-13 13:40:48 PST
I can reproduce this in the 2012-01-19 trunk nightly:
bp-253266fa-f4ba-4b16-a228-d10f92120213

I'll look for a regression range.

It's pointless testing with trunk nightlies dated from 2012-01-20 through 2012-02-13, because they all have bug 721160.  But bug 721160 should be fixed in tomorrow's mozilla-central nightly.
Comment 9 Steven Michaud [:smichaud] (Retired) 2012-02-13 14:27:21 PST
Here's the regression range I found:

firefox-2011-11-21-09-23-46-mozilla-central
firefox-2011-11-22-03-09-49-mozilla-central

I can't tell which patch in this range might have triggered these crashes.

Interestingly, though, jemalloc was backed out for OS X 10.5 in this range (bug 702250).  If this does turn out to be what triggered these crashes, jemalloc most likely masked the real bug.
Comment 10 Steven Michaud [:smichaud] (Retired) 2012-02-13 14:28:40 PST
Marcia, please see if you can reproduce my regression range.
Comment 11 Marcia Knous [:marcia - use ni] 2012-02-13 16:33:08 PST
Steven: Happy to oblige, but I am having trouble figuring out what builds the ones in Comment 9 correlate to - I cannot find builds with that exact ID in the directory.

(In reply to Steven Michaud from comment #10)
> Marcia, please see if you can reproduce my regression range.
Comment 12 Steven Michaud [:smichaud] (Retired) 2012-02-13 16:52:56 PST
> Interestingly, though, jemalloc was backed out for OS X 10.5 in this
> range (bug 702250).  If this does turn out to be what triggered
> these crashes, jemalloc most likely masked the real bug.

Yes, turning jemalloc back on (on OS X 10.5) does seem to "fix" these
crashes.

Which isn't good news, because doing that isn't feasible, and because
jemalloc's masking of this bug's real cause makes it much harder to
find.

By the way, all the gdb stack traces I've been able to get of these
crashes are corrupt, and basically useless.  (This is of course with a
non-symbol-stripped build.)

This is likely to turn out to be a memory corruption bug.
Comment 13 Steven Michaud [:smichaud] (Retired) 2012-02-13 16:55:35 PST
(In reply to comment #11)

Here are the two builds I mentioned in comment #9.  The first one doesn't crash (for me).  The second one does.

ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-21-09-23-46-mozilla-central/firefox-11.0a1.en-US.mac.dmg
ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-22-03-09-49-mozilla-central/firefox-11.0a1.en-US.mac.dmg
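
For anyone mapping a nightly build timestamp to its download location, the two URLs above follow a visible pattern. The sketch below only reproduces that pattern as it appears in these two links; the function name `nightly_dmg_url` is hypothetical, and nothing here confirms that other dates, branches, or versions use the same layout.

```python
def nightly_dmg_url(build_timestamp: str, version: str) -> str:
    """Build a Mac nightly URL from a timestamp like '2011-11-21-09-23-46'.

    Assumption: the directory layout matches the two links quoted above
    (year/month/<timestamp>-mozilla-central/firefox-<version>.en-US.mac.dmg).
    """
    year, month = build_timestamp.split("-")[:2]
    return (
        "ftp://ftp.mozilla.org/pub/firefox/nightly/"
        f"{year}/{month}/{build_timestamp}-mozilla-central/"
        f"firefox-{version}.en-US.mac.dmg"
    )


print(nightly_dmg_url("2011-11-21-09-23-46", "11.0a1"))
print(nightly_dmg_url("2011-11-22-03-09-49", "11.0a1"))
```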
Comment 14 Steven Michaud [:smichaud] (Retired) 2012-02-13 17:20:39 PST
> Which isn't good news, because doing that isn't feasible, and because
> jemalloc's masking of this bug's real cause makes it much harder to
> find.

There recently was another bug that jemalloc masked -- bug 700835.  And (something I'd forgotten) you can set the NO_MAC_JEMALLOC to turn off jemalloc on the Mac.  So it won't be so hard, after all, to find this bug's true regression range.

I'll be working on that tomorrow.
Comment 15 Steven Michaud [:smichaud] (Retired) 2012-02-13 17:23:21 PST
The NO_MAC_JEMALLOC environment variable.
Comment 16 Steven Michaud [:smichaud] (Retired) 2012-02-13 17:36:08 PST
For my own future reference:

In order to set the NO_MAC_JEMALLOC environment variable when you double-click an app, add the following to its Info.plist:

  <key>LSEnvironment</key>
  <dict>
    <key>NO_MAC_JEMALLOC</key>
    <string>1</string>
  </dict>
Comment 17 Steven Michaud [:smichaud] (Retired) 2012-02-14 09:26:44 PST
Created attachment 597052 [details]
environment.plist that sets NO_MAC_JEMALLOC

(Following up comment #16)

Though Apple documents this capability, it's widely (and correctly) reported not to work at all.  Fortunately there's another way:

Create a ~/.MacOSX/ directory, and (if you don't have one already) add an environment.plist file to it.  Or edit it if you do have one.  Then log out and back in again.
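
As a rough illustration of the steps above, the file could also be created or updated with a script. This is a minimal sketch, not the contents of the attachment: the function name `write_environment_plist` is made up, and the value "1" is an assumption carried over from the Info.plist snippet in comment #16.

```python
import plistlib
from pathlib import Path


def write_environment_plist(home: Path) -> Path:
    """Add NO_MAC_JEMALLOC to <home>/.MacOSX/environment.plist.

    Existing entries in the file are preserved. The value "1" is an
    assumption; the variable name comes from the comments above.
    """
    plist_path = home / ".MacOSX" / "environment.plist"
    plist_path.parent.mkdir(parents=True, exist_ok=True)

    env = {}
    if plist_path.exists():
        with plist_path.open("rb") as f:
            env = plistlib.load(f)

    env["NO_MAC_JEMALLOC"] = "1"
    with plist_path.open("wb") as f:
        plistlib.dump(env, f)
    return plist_path
```

Calling `write_environment_plist(Path.home())` would touch the real file. As noted above, the change only takes effect after logging out and back in.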
Comment 18 Steven Michaud [:smichaud] (Retired) 2012-02-14 09:32:28 PST
(Following up comment #17)

Unfortunately, though, with NO_MAC_JEMALLOC set, mozilla-central nightlies crash on startup on OS X 10.5.8 for the whole period from when jemalloc was enabled on 10.5 (http://hg.mozilla.org/mozilla-central/rev/c4e4af6b7ae4, bug 694335) until it was disabled again (bug 702250).

So (apparently) NO_MAC_JEMALLOC will be of no help finding this bug's true regression range.
Comment 19 Marcia Knous [:marcia - use ni] 2012-02-14 13:17:17 PST
I see the same thing on my lab machine - the second build crashes with my STR but the first one does not. Thanks for hunting down the regression range.

(In reply to Steven Michaud from comment #13)
> (In reply to comment #11)
> 
> Here are the two builds I mentioned in comment #9.  The first one doesn't
> crash (for me).  The second one does.
> 
> ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-21-09-23-46-
> mozilla-central/firefox-11.0a1.en-US.mac.dmg
> ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-22-03-09-49-
> mozilla-central/firefox-11.0a1.en-US.mac.dmg
Comment 20 Steven Michaud [:smichaud] (Retired) 2012-02-14 15:11:22 PST
I'm still trying to get the "true" regression range (unmasked by jemalloc).  Since I can't use nightlies, it's quite slow.  But I've now managed to narrow the range to somewhere between these two landings (inclusive):

http://hg.mozilla.org/mozilla-central/rev/9ae1d4f44b8b
dougt@mozilla.com
Mon Nov 14 20:38:46 2011 -0800
Doug Turner — Bug 690201 - dead code - mLastDrawEvent never used. r=mbrubeck

http://hg.mozilla.org/mozilla-central/rev/24c8d04f6174
bmo@edmorley.co.uk
Mon Nov 14 04:35:37 2011 -0800
Ed Morley — Merge mozilla-central and mozilla-inbound

I'll keep working at it until I've identified the single patch that triggered this bug's crashes.
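
Narrowing a range of landings down to a single patch is essentially a binary search. The sketch below is a generic illustration, not a record of how the range in this bug was actually narrowed. The `crashes` callback is hypothetical and stands in for "build this revision with jemalloc disabled, run the STR from comment #3, and report whether it crashed".

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], crashes: Callable[[str], bool]) -> str:
    """Return the earliest revision for which crashes(rev) is True.

    Assumes revisions are ordered oldest to newest, the first one is known
    good, the last one is known bad, and the crash persists once introduced.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(revisions[mid]):
            hi = mid  # crash already present, so look earlier
        else:
            lo = mid  # still good, so look later
    return revisions[hi]
```

Each step needs a full local build, which is why this is slow compared with testing prebuilt nightlies, but the number of builds grows only with the logarithm of the range size.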
Comment 21 Steven Michaud [:smichaud] (Retired) 2012-02-14 17:32:06 PST
Here's the patch that triggered these crashes:

Bug 685767 - Factor blurring out into its own class, and use it from gfxAlphaBoxBlur. r=mattwoodrow
author	Joe Drew <joe@drew.ca>
	Mon Nov 14 17:29:28 2011 +1300 (at Mon Nov 14 17:29:28 2011 +1300)
http://hg.mozilla.org/mozilla-central/rev/6ae6d3beeaf4

Tomorrow I'll try to figure out why.
Comment 22 Steven Michaud [:smichaud] (Retired) 2012-02-14 17:35:49 PST
I had to disable jemalloc while testing.  Here's how I did it:

diff --git a/configure.in b/configure.in
--- a/configure.in
+++ b/configure.in
@@ -2343,21 +2343,21 @@ case "$target" in
            MKSHLIB_UNFORCE_ALL=''
        ;;
     esac
     ;;
 
 *-darwin*)
     MKSHLIB='$(CXX) $(CXXFLAGS) $(DSO_PIC_CFLAGS) $(DSO_LDOPTS) -o $@'
     MKCSHLIB='$(CC) $(CFLAGS) $(DSO_PIC_CFLAGS) $(DSO_LDOPTS) -o $@'
     MOZ_OPTIMIZE_FLAGS="-O3"
     _PEDANTIC=
-    MOZ_MEMORY=1
+    #MOZ_MEMORY=1
     CFLAGS="$CFLAGS -fno-common"
     CXXFLAGS="$CXXFLAGS -fno-common"
     DLL_SUFFIX=".dylib"
     DSO_LDOPTS=''
     STRIP="$STRIP -x -S"
     # Check whether we're targeting OS X or iOS
     AC_CACHE_CHECK(for iOS target,
                    ac_cv_ios_target,
                    [AC_TRY_COMPILE([#include <TargetConditionals.h>
 #if !(TARGET_OS_IPHONE || TARGET_IPHONE_SIMULATOR)
Comment 23 Joe Drew (not getting mail) 2012-02-14 19:26:56 PST
oh, wow!

Steven, please don't hesitate to call on me for help here. Ping me on irc, whatever. :)
Comment 24 Ted Mielczarek [:ted.mielczarek] 2012-02-15 04:27:40 PST
If this is memory corruption, presumably you want to run this through valgrind, right?
Comment 25 Steven Michaud [:smichaud] (Retired) 2012-02-16 10:16:12 PST
Marcia, could you test your STR from comment #3 on OS X 10.6 and 10.7 in 32-bit mode?  I'm not able to reproduce the crashes myself under those conditions.  But I don't have printer drivers installed for either of those OS versions, and that may be what makes the difference.

jemalloc is disabled in 32-bit mode, on all versions of OS X.
Comment 26 Steven Michaud [:smichaud] (Retired) 2012-02-16 10:57:16 PST
These crashes are caused by memory corruption that is triggered at either of the following two lines:

http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/thebes/gfxBlur.cpp#l119
http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/thebes/gfxBlur.cpp#l122

The crashes don't happen immediately -- only after a second or two.  But they don't happen if I comment out either of those lines.

More specifically, the memory corruption happens here:

http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/cairo/cairo/src/cairo-quartz-surface.c#l3039

But it doesn't happen if, instead, we use the call to _cairo_quartz_surface_mask_with_generic() on the following line:

http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/cairo/cairo/src/cairo-quartz-surface.c#l3041

Calling _cairo_quartz_surface_mask_with_generic() is (as best I can tell) a little less efficient, but functionally equivalent.

As far as I know (from the tests I mentioned in comment #25) these crashes only happen on OS X 10.5.  But I need help from others to confirm this.

I've ruled out that the crashes have anything to do with accessing deleted objects.  I did this by adding printf statements to the constructors and destructors of all the objects in play (including cairo objects).

I still don't know exactly what sort of memory corruption happens, or why.  But (as I said above) I'm reasonably confident it's confined to OS X 10.5.  So it may be an OS bug.  Or it may conceivably be a bug in how quartz cairo operates on OS X 10.5.  I don't believe we used the "mask" capability before Joe's patch for bug 685767 landed.  This would explain why we didn't have problems earlier.
Comment 27 Steven Michaud [:smichaud] (Retired) 2012-02-16 11:01:18 PST
Created attachment 597900 [details] [diff] [review]
Provisional fix

As I explained above, this patch doesn't really get to the heart of the problem.

But if I'm right that the memory corruption only happens on OS X 10.5, which we aren't going to support for much longer, I'm pretty sure the patch is "good enough".

I've started tryserver builds, which should be available in a few hours.
Comment 28 Steven Michaud [:smichaud] (Retired) 2012-02-17 07:25:58 PST
Here's a tryserver build made with my patch from comment #27:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-13ccb76757b7/try-macosx64/firefox-13.0a1.en-US.mac.dmg

There were no non-spurious test failures.
Comment 29 Steven Michaud [:smichaud] (Retired) 2012-02-17 07:28:59 PST
By the way, I've been trying to use libgmalloc and valgrind on my OS X 10.5.8 machine, so far with no luck.  I suspect the machine has too little RAM -- only 4GB.
Comment 30 Joe Drew (not getting mail) 2012-02-17 08:03:36 PST
Comment on attachment 597900 [details] [diff] [review]
Provisional fix

Review of attachment 597900 [details] [diff] [review]:
-----------------------------------------------------------------

Please also add this as a Cairo patch to gfx/cairo.
Comment 31 Steven Michaud [:smichaud] (Retired) 2012-02-17 09:07:38 PST
Comment on attachment 597900 [details] [diff] [review]
Provisional fix

Landed on mozilla-inbound:
http://hg.mozilla.org/integration/mozilla-inbound/rev/9466529cdbc0

> Please also add this as a Cairo patch to gfx/cairo.

Done.
Comment 32 Steven Michaud [:smichaud] (Retired) 2012-02-17 09:11:16 PST
Created attachment 598258 [details] [diff] [review]
Patch with copy in gfx/cairo

What I landed (carrying forward Joe's r+).

I'll wait a few days for others to test, and then (presuming there aren't any problems) seek aurora and beta branch approval.

Marcia, you're probably in the best position to test :-)
Comment 33 Marcia Knous [:marcia - use ni] 2012-02-17 11:41:17 PST
Testing using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0 (b3), I am not able to reproduce the bug. I will try 10.6 next. Confirming that I was testing in 32 bit mode.

(In reply to Steven Michaud from comment #25)
> Marcia, could you test your STR from comment #3 on OS X 10.6 and 10.7 in
> 32-bit mode?  I'm not able to reproduce the crashes myself under those
> conditions.  But I don't have printer drivers installed for either of those
> OS versions, and that may be what makes the difference.
> 
> jemalloc is disabled in 32-bit mode, on all versions of OS X.
Comment 34 Ed Morley [:emorley] 2012-02-17 17:56:38 PST
https://hg.mozilla.org/mozilla-central/rev/9466529cdbc0
Comment 35 Alex Keybl [:akeybl] 2012-02-21 09:21:34 PST
Is this fix considered low-risk enough to nominate for Aurora/Beta approval? Or does it make more sense to back out bug 685767? Thanks!
Comment 36 Steven Michaud [:smichaud] (Retired) 2012-02-21 09:37:57 PST
> Is this fix considered low-risk enough to nominate for Aurora/Beta approval?

I think it is.  It only changes behavior on OS X 10.5.  And even there it only causes a slight increase in RAM usage and (probably) no change in performance.

It'd be good to hear from others who know more about Cairo than I do, though.
Comment 37 Steven Michaud [:smichaud] (Retired) 2012-02-22 10:05:29 PST
Marcia, could you check with today's mozilla-central nightly to confirm that your STR no longer works with it?
Comment 38 Alex Keybl [:akeybl] 2012-02-23 13:58:16 PST
(In reply to Steven Michaud from comment #36)
> It'd be good to hear from others who know more about Cairo than I do, though.

I've sent email to Joe and Jeff to get their feedback.
Comment 39 Joe Drew (not getting mail) 2012-02-27 10:48:26 PST
This is relatively safe. I'd be OK with shipping it in a beta and aurora. We already have to use the fallback path in many cases, so using it for more cases should continue to be safe.
Comment 40 Lukas Blakk [:lsblakk] use ?needinfo 2012-02-27 15:18:30 PST
Comment on attachment 598258 [details] [diff] [review]
Patch with copy in gfx/cairo

[Triage Comment]
Comfortable taking this low-risk fix, please land today 2/27/12 in preparation for go-to-build on 2/28/12
Comment 41 Steven Michaud [:smichaud] (Retired) 2012-02-27 16:02:42 PST
Comment on attachment 598258 [details] [diff] [review]
Patch with copy in gfx/cairo

Landed on mozilla-aurora:
http://hg.mozilla.org/releases/mozilla-aurora/rev/154b852a9952
Comment 42 Steven Michaud [:smichaud] (Retired) 2012-02-27 16:08:20 PST
Comment on attachment 598258 [details] [diff] [review]
Patch with copy in gfx/cairo

Landed on mozilla-beta:
http://hg.mozilla.org/releases/mozilla-beta/rev/1dc6cd15f683
Comment 43 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-05 15:59:54 PST
See comment 3 for steps when verifying this fix.
Comment 44 Marcia Knous [:marcia - use ni] 2012-03-05 16:29:32 PST
Verified fixed using the STR in Comment 3. I tested with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:11.0) Gecko/20100101 Firefox/11.0 which is the Beta 5 build. No crash observed with saving the PDF and opening a new tab.
Comment 45 Vlad [QA] 2012-03-26 06:56:45 PDT
I have verified this on:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:12.0) Gecko/20100101 Firefox/12.0 beta 2

Firefox didn't crash using the steps from comment 3.

Setting resolution to Verified Fixed.
