Closed Bug 721663 Opened 12 years ago Closed 12 years ago

Crash in unpackImageRow @ CGAccessSessionGetBytes on Mac OS X 10.5 while printing or previewing

Categories

(Core :: General, defect)

11 Branch
x86
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla13
Tracking Status
firefox11 + verified
firefox12 - verified

People

(Reporter: scoobidiver, Assigned: smichaud)

References

Details

(4 keywords, Whiteboard: [qa!])

Crash Data

Attachments

(1 file, 2 obsolete files)

It's a new crash signature that first appeared in 12.0a1/20111223 and 11.0a2/20111230.
It's the #4 top crasher in 11.0a2 on Mac OS X.

Every comment mentions printing or previewing.

Signature 	CGAccessSessionGetBytes
UUID	63a79621-4dc0-48e1-9fb7-ed7762120125
Date Processed	2012-01-25 09:45:18
Uptime	13
Last Crash	38 seconds before submission
Install Age	13 seconds since version was first installed.
Install Time	2012-01-25 09:44:54
Product	Firefox
Version	11.0a2
Build ID	20120124042008
Release Channel	aurora
OS	Mac OS X
OS Version	10.5.8 9L31a
Build Architecture	x86
Build Architecture Info	family 6 model 23 stepping 6
Crash Reason	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address	0x1c9a4000
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863	
EMCheckCompatibility	True

Frame 	Module 	Signature 	Source
0 		@0xffff08a0 	
1 	CoreGraphics 	CGAccessSessionGetBytes 	
2 	libPDFRIP.A.dylib 	unpackImageRow 	
3 	libPDFRIP.A.dylib 	PDFImageEmitData 	
4 	libPDFRIP.A.dylib 	imageRefEmitDefinition 	
5 	libPDFRIP.A.dylib 	PDFImageEmitDefinition 	
6 	libPDFRIP.A.dylib 	emitImageDefinition 	
7 	CoreFoundation 	CFSetApplyFunction 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=CGAccessSessionGetBytes
Version: 12 Branch → 11 Branch
It's the #1 top crasher on Mac OS X in 11.0b1.

There are a few crashes on Mac OS X 10.6.
Keywords: topcrash
Adding the qawanted and regressionwindow-wanted keywords to do some exploratory testing around Printing/Previewing printing on OS X. See the comments at https://crash-stats.mozilla.com/report/list?query_search=signature&query_type=contains&reason_type=contains&range_value=3&range_unit=weeks&hang_type=any&process_type=any&signature=CGAccessSessionGetBytes for more leads.

Also tracking for FF11.
Here is a set of STR on 10.5 using the most recent 11 beta: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:11.0) Gecko/20100101 Firefox/11.0

STR:
1. Load http://www.psdbox.com/tutorials/new-manga-effect-2011-photoshop-tutorial/
2. Go to file menu - File->Print->Save as PDF.
3. Save the file.
4. Immediately after that click on the new tab button next to the tab open in Step 1.
5. Crash.

https://crash-stats.mozilla.com/report/index/bp-0ba65bfe-6c8e-4426-b6e1-40cdb2120211
Keywords: reproducible
Here are correlations:
  CGAccessSessionGetBytes|EXC_BAD_ACCESS / KERN_INVALID_ADDRESS (57 crashes)
    100% (57/57) vs.  17% (86/498) libPDFRIP.A.dylib
    100% (57/57) vs.  19% (95/498) PrintingCocoaPDEs
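
For readers unfamiliar with these correlation lines, here is a minimal sketch of the arithmetic behind them. It assumes (my reading of the report, not something stated in this bug) that the first ratio is the share of crashes with this signature that had the module loaded, and the second is the share among all crashes in the comparison set. The function name `correlation_line` is hypothetical.

```python
def correlation_line(module, with_sig, total_sig, with_all, total_all):
    """Format one correlation line in roughly the style quoted above.

    with_sig / total_sig: crashes with this signature that loaded the module
    with_all / total_all: crashes in the comparison set that loaded the module
    (This interpretation of the two ratios is an assumption.)
    """
    pct_sig = round(100 * with_sig / total_sig)
    pct_all = round(100 * with_all / total_all)
    return (f"{pct_sig}% ({with_sig}/{total_sig}) vs. "
            f"{pct_all}% ({with_all}/{total_all}) {module}")


print(correlation_line("libPDFRIP.A.dylib", 57, 57, 86, 498))
print(correlation_line("PrintingCocoaPDEs", 57, 57, 95, 498))
```

A module that shows up in every crash with the signature but in fewer than a fifth of crashes overall is a strong hint that the crash is tied to the code path that loads it, here the printing path.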
Testing with today's mozilla-central nightly and Marcia's STR from comment #3 (on OS X 10.5.8), I get a completely different error:

The nightly crashes, and then so does crashreporter.  Then I get the following Apple crash report:

Process:         crashreporter [1877]
Path:            /Users/smichaud/Desktop/FirefoxNightly 2012-02-13.app/Contents/MacOS/crashreporter.app/Contents/MacOS/crashreporter
Identifier:      crashreporter
Version:         ??? (???)
Code Type:       X86 (Native)
Parent Process:  firefox [1863]

Interval Since Last Report:          84 sec
Crashes Since Last Report:           1
Per-App Interval Since Last Report:  0 sec
Per-App Crashes Since Last Report:   1

Date/Time:       2012-02-13 10:26:40.583 -0600
OS Version:      Mac OS X 10.5.8 (9L30)
Report Version:  6
Anonymous UUID:  C9512736-EC85-4CD1-B209-AB5833DEE8E2

Exception Type:  EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000002, 0x0000000000000000
Crashed Thread:  0

Dyld Error Message:
  Library not loaded: /usr/lib/libcrypto.0.9.8.dylib
  Referenced from: /Users/smichaud/Desktop/FirefoxNightly 2012-02-13.app/Contents/MacOS/crashreporter.app/Contents/MacOS/crashreporter
  Reason: image not found

Ted, any idea what's going on here?

This is very likely a different bug, unrelated to the one Marcia reported.
(Following up comment #5)

The OS X 10.5.8 version of libcrypto.dylib is 0.9.7.  So it looks like Breakpad is broken on OS X 10.5.8 in current nightlies.  I'll open a new bug.
Marcia filed bug 721160 on that already.
I can reproduce this in the 2012-01-19 trunk nightly:
bp-253266fa-f4ba-4b16-a228-d10f92120213

I'll look for a regression range.

It's pointless testing with trunk nightlies dated from 2012-01-20 through 2012-02-13, because they all have bug 721160.  But bug 721160 should be fixed in tomorrow's mozilla-central nightly.
Assignee: nobody → smichaud
Here's the regression range I found:

firefox-2011-11-21-09-23-46-mozilla-central
firefox-2011-11-22-03-09-49-mozilla-central

I can't tell which patch in this range might have triggered these crashes.

Interestingly, though, jemalloc was backed out for OS X 10.5 in this range (bug 702250).  If this does turn out to be what triggered these crashes, jemalloc most likely masked the real bug.
Marcia, please see if you can reproduce my regression range.
Steven: Happy to oblige, but I am having trouble figuring out what builds the ones in Comment 9 correlate to - I cannot find builds with that exact ID in the directory.

(In reply to Steven Michaud from comment #10)
> Marcia, please see if you can reproduce my regression range.
> Interestingly, though, jemalloc was backed out for OS X 10.5 in this
> range (bug 702250).  If this does turn out to be what triggered
> these crashes, jemalloc most likely masked the real bug.

Yes, turning jemalloc back on (on OS X 10.5) does seem to "fix" these
crashes.

Which isn't good news, because doing that isn't feasible, and because
jemalloc's masking of this bug's real cause makes it much harder to
find.

By the way, all the gdb stack traces I've been able to get of these
crashes are corrupt, and basically useless.  (This is of course with a
non-symbol-stripped build.)

This is likely to turn out to be a memory corruption bug.
> Which isn't good news, because doing that isn't feasible, and because
> jemalloc's masking of this bug's real cause makes it much harder to
> find.

There recently was another bug that jemalloc masked -- bug 700835.  And (something I'd forgotten) you can set the NO_MAC_JEMALLOC to turn off jemalloc on the Mac.  So it won't be so hard, after all, to find this bug's true regression range.

I'll be working on that tomorrow.
The NO_MAC_JEMALLOC environment variable.
For my own future reference:

In order to set the NO_MAC_JEMALLOC environment variable when you double-click an app, add the following to its Info.plist:

  <key>LSEnvironment</key>
  <dict>
  <key>NO_MAC_JEMALLOC</key>
  <string>1</string>
  </dict>
(Following up comment #16)

Though Apple documents this capability, it's widely (and correctly) reported not to work at all.  Fortunately there's another way:

Create a ~/.MacOSX/ directory, and (if you don't have one already) add an environment.plist file to it.  Or edit it if you do have one.  Then log out and back in again.
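
A minimal sketch of the step above, assuming that environment.plist is an ordinary property list whose top-level dictionary maps variable names to string values. The helper name `set_login_env_var` and its `home` parameter are hypothetical, added only so the example can be tried against a scratch directory instead of a real home directory.

```python
import plistlib
from pathlib import Path


def set_login_env_var(name, value, home=None):
    """Add or update one variable in <home>/.MacOSX/environment.plist.

    Existing entries are preserved. Returns the path that was written.
    The file layout (a flat dict of strings) is an assumption.
    """
    home = Path(home) if home is not None else Path.home()
    plist_path = home / ".MacOSX" / "environment.plist"
    plist_path.parent.mkdir(parents=True, exist_ok=True)

    env = {}
    if plist_path.exists():
        # Edit the existing file rather than overwriting it.
        with plist_path.open("rb") as f:
            env = plistlib.load(f)

    env[name] = value
    with plist_path.open("wb") as f:
        plistlib.dump(env, f)
    return plist_path


if __name__ == "__main__":
    import tempfile
    # Demonstrated in a temporary directory so nothing real is modified.
    with tempfile.TemporaryDirectory() as scratch:
        print(set_login_env_var("NO_MAC_JEMALLOC", "1", home=scratch))
```

As noted above, the change only takes effect after logging out and back in.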
(Following up comment #17)

Unfortunately, though, with NO_MAC_JEMALLOC set, mozilla-central nightlies crash on startup on OS X 10.5.8 for the whole period from when jemalloc was enabled on 10.5 (http://hg.mozilla.org/mozilla-central/rev/c4e4af6b7ae4, bug 694335) until it was disabled again (bug 702250).

So (apparently) NO_MAC_JEMALLOC will be of no help finding this bug's true regression range.
I see the same thing on my lab machine - the second build crashes with my STR but the first one does not. Thanks for hunting down the regression range.

(In reply to Steven Michaud from comment #13)
> (In reply to comment #11)
> 
> Here are the two builds I mentioned in comment #9.  The first one doesn't
> crash (for me).  The second one does.
> 
> ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-21-09-23-46-
> mozilla-central/firefox-11.0a1.en-US.mac.dmg
> ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-22-03-09-49-
> mozilla-central/firefox-11.0a1.en-US.mac.dmg
Keywords: qawanted
I'm still trying to get the "true" regression range (unmasked by jemalloc).  Since I can't use nightlies, it's quite slow.  But I've now managed to narrow the range to somewhere between these two landings (inclusive):

http://hg.mozilla.org/mozilla-central/rev/9ae1d4f44b8b
dougt@mozilla.com
Mon Nov 14 20:38:46 2011 -0800
Doug Turner — Bug 690201 - dead code - mLastDrawEvent never used. r=mbrubeck

http://hg.mozilla.org/mozilla-central/rev/24c8d04f6174
bmo@edmorley.co.uk
Mon Nov 14 04:35:37 2011 -0800
Ed Morley — Merge mozilla-central and mozilla-inbound

I'll keep working at it until I've identified the single patch that triggered this bug's crashes.
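
The narrowing described here is essentially a bisection over an ordered list of changesets. The sketch below shows the idea under stated assumptions: `builds` is ordered oldest to newest, the first entry is known not to crash, the last is known to crash, and `crashes` is a hypothetical callback standing in for "build this revision with jemalloc disabled and run the STR". None of these names come from the bug.

```python
def first_bad(builds, crashes):
    """Return the earliest build for which crashes(build) is true.

    Assumes builds[0] is known good, builds[-1] is known bad, and that
    every build after the first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1      # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(builds[mid]):
            hi = mid                 # the regression is at mid or earlier
        else:
            lo = mid                 # the regression is after mid
    return builds[hi]


if __name__ == "__main__":
    revisions = [f"rev{i}" for i in range(10)]
    bad_from = 6                     # pretend rev6 introduced the crash
    print(first_bad(revisions, lambda r: int(r[3:]) >= bad_from))
```

Each step halves the remaining range, which matters when every probe means a full local build rather than downloading a nightly.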
Here's the patch that triggered these crashes:

Bug 685767 - Factor blurring out into its own class, and use it from gfxAlphaBoxBlur. r=mattwoodrow
author	Joe Drew <joe@drew.ca>
	Mon Nov 14 17:29:28 2011 +1300 (at Mon Nov 14 17:29:28 2011 +1300)
http://hg.mozilla.org/mozilla-central/rev/6ae6d3beeaf4

Tomorrow I'll try to figure out why.
I had to disable jemalloc while testing.  Here's how I did it:

diff --git a/configure.in b/configure.in
--- a/configure.in
+++ b/configure.in
@@ -2343,21 +2343,21 @@ case "$target" in
            MKSHLIB_UNFORCE_ALL=''
        ;;
     esac
     ;;
 
 *-darwin*)
     MKSHLIB='$(CXX) $(CXXFLAGS) $(DSO_PIC_CFLAGS) $(DSO_LDOPTS) -o $@'
     MKCSHLIB='$(CC) $(CFLAGS) $(DSO_PIC_CFLAGS) $(DSO_LDOPTS) -o $@'
     MOZ_OPTIMIZE_FLAGS="-O3"
     _PEDANTIC=
-    MOZ_MEMORY=1
+    #MOZ_MEMORY=1
     CFLAGS="$CFLAGS -fno-common"
     CXXFLAGS="$CXXFLAGS -fno-common"
     DLL_SUFFIX=".dylib"
     DSO_LDOPTS=''
     STRIP="$STRIP -x -S"
     # Check whether we're targeting OS X or iOS
     AC_CACHE_CHECK(for iOS target,
                    ac_cv_ios_target,
                    [AC_TRY_COMPILE([#include <TargetConditionals.h>
 #if !(TARGET_OS_IPHONE || TARGET_IPHONE_SIMULATOR)
Blocks: 685767
oh, wow!

Steven, please don't hesitate to call on me for help here. Ping me on irc, whatever. :)
If this is memory corruption, presumably you want to run this through valgrind, right?
Marcia, could you test your STR from comment #3 on OS X 10.6 and 10.7 in 32-bit mode?  I'm not able to reproduce the crashes myself under those conditions.  But I don't have printer drivers installed for either of those OS versions, and that may be what makes the difference.

jemalloc is disabled in 32-bit mode, on all versions of OS X.
These crashes happen due to memory corruption that happens at either of the following two lines:

http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/thebes/gfxBlur.cpp#l119
http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/thebes/gfxBlur.cpp#l122

The crashes don't happen immediately -- only after a second or two.  But they don't happen if I comment out either of those lines.

More specifically, the memory corruption happens here:

http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/cairo/cairo/src/cairo-quartz-surface.c#l3039

But it doesn't happen if, instead, we use the call to _cairo_quartz_surface_mask_with_generic() on the following line:

http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/cairo/cairo/src/cairo-quartz-surface.c#l3041

Calling _cairo_quartz_surface_mask_with_generic() is (as best I can tell) a little less efficient, but functionally equivalent.

As far as I know (from the tests I mentioned in comment #25) these crashes only happen on OS X 10.5.  But I need help from others to confirm this.

I've ruled out that the crashes have anything to do with accessing deleted objects.  I did this by adding printf statements to the constructors and destructors of all the objects in play (including cairo objects).

I still don't know exactly what sort of memory corruption happens, or why.  But (as I said above) I'm reasonably confident it's confined to OS X 10.5.  So it may be an OS bug.  Or it may conceivably be a bug in how quartz cairo operates on OS X 10.5.  I don't believe we used the "mask" capability before Joe's patch for bug 685767 landed.  This would explain why we didn't have problems earlier.
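
To make the workaround idea concrete, here is a sketch of the decision only. It is not the landed patch (which is C code in cairo-quartz-surface.c and is not quoted in this bug). The function name, the `"direct"`/`"generic"` labels and the `direct_path_applicable` flag are all hypothetical. `"generic"` stands for the call to _cairo_quartz_surface_mask_with_generic(), and `"direct"` stands for the call at the line where the corruption was observed.

```python
def pick_mask_path(os_version, direct_path_applicable):
    """Choose which mask implementation to use (illustrative only).

    os_version: (major, minor) tuple, e.g. (10, 5)
    direct_path_applicable: whether the faster path could be used at all
    """
    on_leopard = tuple(os_version[:2]) == (10, 5)
    if direct_path_applicable and not on_leopard:
        return "direct"
    # On OS X 10.5, always take the slightly less efficient but
    # functionally equivalent fallback, since the other path is where
    # the memory corruption was seen.
    return "generic"


if __name__ == "__main__":
    for version in [(10, 5), (10, 6), (10, 7)]:
        print(version, pick_mask_path(version, True))
```

The point of gating on the OS version is that behavior elsewhere stays unchanged, so the workaround carries little risk on the versions where no crash has been reproduced.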
Attached patch Provisional fix (obsolete) — Splinter Review
As I explained above, this patch doesn't really get to the heart of the problem.

But if I'm right that the memory corruption only happens on OS X 10.5, which we aren't going to support for much longer, I'm pretty sure the patch is "good enough".

I've started tryserver builds, which should be available in a few hours.
Attachment #597052 - Attachment is obsolete: true
Attachment #597900 - Flags: review?(joe)
By the way, I've been trying to use libgmalloc and valgrind on my OS X 10.5.8 machine, so far with no luck.  I suspect the machine has too little RAM -- only 4GB.
Comment on attachment 597900 [details] [diff] [review]
Provisional fix

Review of attachment 597900 [details] [diff] [review]:
-----------------------------------------------------------------

Please also add this as a Cairo patch to gfx/cairo.
Attachment #597900 - Flags: review?(joe) → review+
Comment on attachment 597900 [details] [diff] [review]
Provisional fix

Landed on mozilla-inbound:
http://hg.mozilla.org/integration/mozilla-inbound/rev/9466529cdbc0

> Please also add this as a Cairo patch to gfx/cairo.

Done.
What I landed (carrying forward Joe's r+).

I'll wait a few days for others to test, and then (presuming there aren't any problems) seek aurora and beta branch approval.

Marcia, you're probably in the best position to test :-)
Attachment #597900 - Attachment is obsolete: true
Attachment #598258 - Flags: review+
Testing using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0 (b3), I am not able to reproduce the bug. I will try 10.6 next. Confirming that I was testing in 32-bit mode.

(In reply to Steven Michaud from comment #25)
> Marcia, could you test your STR from comment #3 on OS X 10.6 and 10.7 in
> 32-bit mode?  I'm not able to reproduce the crashes myself under those
> conditions.  But I don't have printer drivers installed for either of those
> OS versions, and that may be what makes the difference.
> 
> jemalloc is disabled in 32-bit mode, on all versions of OS X.
https://hg.mozilla.org/mozilla-central/rev/9466529cdbc0
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
Is this fix considered low-risk enough to nominate for Aurora/Beta approval? Or does it make more sense to back out bug 685767? Thanks!
> Is this fix considered low-risk enough to nominate for Aurora/Beta approval?

I think it is.  It only changes behavior on OS X 10.5.  And even there it only causes a slight increase in RAM usage and (probably) no change in performance.

It'd be good to hear from others who know more about Cairo than I do, though.
Marcia, could you check with today's mozilla-central nightly to confirm that your STR no longer works with it?
(In reply to Steven Michaud from comment #36)
> It'd be good to hear from others who know more about Cairo than I do, though.

I've sent email to Joe and Jeff to get their feedback.
This is relatively safe. I'd be OK with shipping it in a beta and aurora. We already have to use the fallback path in many cases, so using it for more cases should continue to be safe.
Attachment #598258 - Flags: approval-mozilla-beta?
Attachment #598258 - Flags: approval-mozilla-aurora?
Comment on attachment 598258 [details] [diff] [review]
Patch with copy in gfx/cairo

[Triage Comment]
Comfortable taking this low-risk fix, please land today 2/27/12 in preparation for go-to-build on 2/28/12
Attachment #598258 - Flags: approval-mozilla-beta?
Attachment #598258 - Flags: approval-mozilla-beta+
Attachment #598258 - Flags: approval-mozilla-aurora?
Attachment #598258 - Flags: approval-mozilla-aurora+
See comment 3 for steps when verifying this fix.
Whiteboard: [qa+]
Verified fixed using the STR in Comment 3. I tested with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:11.0) Gecko/20100101 Firefox/11.0 which is the Beta 5 build. No crash observed with saving the PDF and opening a new tab.
Depends on: 737509
I have verified this on:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:12.0) Gecko/20100101 Firefox/12.0 beta 2

Firefox didn't crash using the steps from comment 3.

Setting resolution to Verified Fixed.
Status: RESOLVED → VERIFIED
Whiteboard: [qa+] → [qa!]
Blocks: 533577