Closed Bug 721663 Opened 8 years ago Closed 8 years ago
Crash in unpackImageRow @ CGAccessSessionGetBytes on Mac OS X 10.5 while printing or previewing
It's a new crash signature that first appeared in 12.0a1/20111223 and 11.0a2/20111230. It's the #4 top crasher in 11.0a2 on Mac OS X. Every comment talks about printing or previewing.

Signature: CGAccessSessionGetBytes
UUID: 63a79621-4dc0-48e1-9fb7-ed7762120125
Date Processed: 2012-01-25 09:45:18
Uptime: 13
Last Crash: 38 seconds before submission
Install Age: 13 seconds since version was first installed.
Install Time: 2012-01-25 09:44:54
Product: Firefox
Version: 11.0a2
Build ID: 20120124042008
Release Channel: aurora
OS: Mac OS X
OS Version: 10.5.8 9L31a
Build Architecture: x86
Build Architecture Info: family 6 model 23 stepping 6
Crash Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address: 0x1c9a4000
App Notes: AdapterVendorID: 0x10de, AdapterDeviceID: 0x 863
EMCheckCompatibility: True

Frame Module Signature Source
0 @0xffff08a0
1 CoreGraphics CGAccessSessionGetBytes
2 libPDFRIP.A.dylib unpackImageRow
3 libPDFRIP.A.dylib PDFImageEmitData
4 libPDFRIP.A.dylib imageRefEmitDefinition
5 libPDFRIP.A.dylib PDFImageEmitDefinition
6 libPDFRIP.A.dylib emitImageDefinition
7 CoreFoundation CFSetApplyFunction

More reports at: https://crash-stats.mozilla.com/report/list?signature=CGAccessSessionGetBytes
It's the #1 top crasher on Mac OS X in 11.0b1. There are a few crashes on Mac OS X 10.6.
Adding the qawanted and regressionwindow-wanted keywords to do some exploratory testing around Printing/Previewing printing on OS X. See the comments at https://crash-stats.mozilla.com/report/list?query_search=signature&query_type=contains&reason_type=contains&range_value=3&range_unit=weeks&hang_type=any&process_type=any&signature=CGAccessSessionGetBytes for more leads. Also tracking for FF11.
Here is a set of STR on 10.5 using the most recent 11 beta: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:11.0) Gecko/20100101 Firefox/11.0

STR:
1. Load http://www.psdbox.com/tutorials/new-manga-effect-2011-photoshop-tutorial/
2. Go to the File menu: File->Print->Save as PDF.
3. Save the file.
4. Immediately after that, click on the new tab button next to the tab opened in Step 1.
5. Crash.

https://crash-stats.mozilla.com/report/index/bp-0ba65bfe-6c8e-4426-b6e1-40cdb2120211
Here are correlations:

CGAccessSessionGetBytes|EXC_BAD_ACCESS / KERN_INVALID_ADDRESS (57 crashes)
100% (57/57) vs. 17% (86/498) libPDFRIP.A.dylib
100% (57/57) vs. 19% (95/498) PrintingCocoaPDEs
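For anyone reading the correlation lines above: as I understand the report format (an assumption, it isn't spelled out in this bug), the first ratio is the share of crashes with this signature that had the module loaded, and the second is the share across all crashes in the sample. A minimal sketch of that arithmetic, with a hypothetical helper name `share`:

```python
def share(hits: int, total: int) -> int:
    """Return hits/total as a whole-number percentage."""
    return round(100 * hits / total)


# (module, hits with this signature, crashes with this signature,
#  hits across all crashes, all crashes) -- numbers copied from the report above
rows = [
    ("libPDFRIP.A.dylib", 57, 57, 86, 498),
    ("PrintingCocoaPDEs", 57, 57, 95, 498),
]

for module, sig_hits, sig_total, all_hits, all_total in rows:
    print(
        f"{share(sig_hits, sig_total)}% ({sig_hits}/{sig_total}) vs. "
        f"{share(all_hits, all_total)}% ({all_hits}/{all_total}) {module}"
    )
```

Both modules are loaded in every crash with this signature but in fewer than a fifth of crashes overall, which is why the printing code path stands out.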
Testing with today's mozilla-central nightly and Marcia's STR from comment #3 (on OS X 10.5.8), I get a completely different error: The nightly crashes, and then so does crashreporter. Then I get the following Apple crash report:

Process: crashreporter
Path: /Users/smichaud/Desktop/FirefoxNightly 2012-02-13.app/Contents/MacOS/crashreporter.app/Contents/MacOS/crashreporter
Identifier: crashreporter
Version: ??? (???)
Code Type: X86 (Native)
Parent Process: firefox

Interval Since Last Report: 84 sec
Crashes Since Last Report: 1
Per-App Interval Since Last Report: 0 sec
Per-App Crashes Since Last Report: 1

Date/Time: 2012-02-13 10:26:40.583 -0600
OS Version: Mac OS X 10.5.8 (9L30)
Report Version: 6
Anonymous UUID: C9512736-EC85-4CD1-B209-AB5833DEE8E2

Exception Type: EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000002, 0x0000000000000000
Crashed Thread: 0

Dyld Error Message:
Library not loaded: /usr/lib/libcrypto.0.9.8.dylib
Referenced from: /Users/smichaud/Desktop/FirefoxNightly 2012-02-13.app/Contents/MacOS/crashreporter.app/Contents/MacOS/crashreporter
Reason: image not found

Ted, any idea what's going on here? This is very likely a different bug, unrelated to the one Marcia reported.
(Following up comment #5) The OS X 10.5.8 version of libcrypto.dylib is 0.9.7. So it looks like Breakpad is broken on OS X 10.5.8 in current nightlies. I'll open a new bug.
Marcia filed bug 721160 on that already.
I can reproduce this in the 2012-01-19 trunk nightly: bp-253266fa-f4ba-4b16-a228-d10f92120213 I'll look for a regression range. It's pointless testing with trunk nightlies dated from 2012-01-20 through 2012-02-13, because they all have bug 721160. But bug 721160 should be fixed in tomorrow's mozilla-central nightly.
Here's the regression range I found: firefox-2012-11-21-09-23-46-mozilla-central firefox-2012-11-22-03-09-49-mozilla-central I can't tell which patch in this range might have triggered these crashes. Interestingly, though, jemalloc was backed out for OS X 10.5 in this range (bug 702250). If this does turn out to be what triggered these crashes, jemalloc most likely masked the real bug.
Marcia, please see if you can reproduce my regression range.
Steven: Happy to oblige, but I am having trouble figuring out which builds the ones in comment 9 correspond to. I cannot find builds with that exact ID in the directory.

(In reply to Steven Michaud from comment #10)
> Marcia, please see if you can reproduce my regression range.
> Interestingly, though, jemalloc was backed out for OS X 10.5 in this
> range (bug 702250). If this does turn out to be what triggered
> these crashes, jemalloc most likely masked the real bug.

Yes, turning jemalloc back on (on OS X 10.5) does seem to "fix" these crashes. Which isn't good news, because doing that isn't feasible, and because jemalloc's masking of this bug's real cause makes it much harder to find.

By the way, all the gdb stack traces I've been able to get of these crashes are corrupt, and basically useless. (This is of course with a non-symbol-stripped build.) This is likely to turn out to be a memory corruption bug.
(In reply to comment #11) Here are the two builds I mentioned in comment #9. The first one doesn't crash (for me). The second one does. ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-21-09-23-46-mozilla-central/firefox-11.0a1.en-US.mac.dmg ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-22-03-09-49-mozilla-central/firefox-11.0a1.en-US.mac.dmg
> Which isn't good news, because doing that isn't feasible, and because
> jemalloc's masking of this bug's real cause makes it much harder to
> find.

There recently was another bug that jemalloc masked -- bug 700835. And (something I'd forgotten) you can set the NO_MAC_JEMALLOC to turn off jemalloc on the Mac. So it won't be so hard, after all, to find this bug's true regression range. I'll be working on that tomorrow.
The NO_MAC_JEMALLOC environment variable.
For my own future reference: In order to set the NO_MAC_JEMALLOC environment variable when you double-click an app, add the following to its Info.plist:

<key>LSEnvironment</key>
<dict>
  <key>NO_MAC_JEMALLOC</key>
  <string>1</string>
</dict>
(Following up comment #16) Though Apple documents this capability, it's widely (and correctly) reported not to work at all. Fortunately there's another way: Create a ~/.MacOSX/ directory, and (if you don't have one already) add an environment.plist file to it. Or edit it if you do have one. Then log out and back in again.
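In case it helps anyone repeating this, here is a rough sketch of creating or updating that file from a script. It assumes environment.plist is a flat dictionary of string keys and values; the helper name `write_environment_plist` is made up for illustration, and `plistlib` is Python's standard property-list module.

```python
import plistlib
from pathlib import Path


def write_environment_plist(home: Path, variables: dict) -> Path:
    """Create or update <home>/.MacOSX/environment.plist with the given variables."""
    target_dir = home / ".MacOSX"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "environment.plist"

    # Keep whatever variables are already defined, then add or overwrite ours.
    existing = {}
    if target.exists():
        with target.open("rb") as fh:
            existing = plistlib.load(fh)
    existing.update(variables)

    with target.open("wb") as fh:
        plistlib.dump(existing, fh)
    return target


# Example call (writes into the real home directory, so it is left commented out):
# write_environment_plist(Path.home(), {"NO_MAC_JEMALLOC": "1"})
```

You still need to log out and back in afterwards for the variable to be picked up.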
(Following up comment #17) Unfortunately, though, mozilla-central nightlies crash on startup on OS X 10.5.8 with NO_MAC_JEMALLOC set from when jemalloc was enabled on 10.5 (http://hg.mozilla.org/mozilla-central/rev/c4e4af6b7ae4, bug 694335) until it was disabled again (bug 702250). So (apparently) NO_MAC_JEMALLOC will be of no help finding this bug's true regression range.
I see the same thing on my lab machine - the second build crashes with my STR but the first one does not. Thanks for hunting down the regression range.

(In reply to Steven Michaud from comment #13)
> (In reply to comment #11)
>
> Here are the two builds I mentioned in comment #9. The first one doesn't
> crash (for me). The second one does.
>
> ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-21-09-23-46-mozilla-central/firefox-11.0a1.en-US.mac.dmg
> ftp://ftp.mozilla.org/pub/firefox/nightly/2011/11/2011-11-22-03-09-49-mozilla-central/firefox-11.0a1.en-US.mac.dmg
I'm still trying to get the "true" regression range (unmasked by jemalloc). Since I can't use nightlies, it's quite slow. But I've now managed to narrow the range to somewhere between these two landings (inclusive):

http://hg.mozilla.org/mozilla-central/rev/9ae1d4f44b8b
email@example.com Mon Nov 14 20:38:46 2011 -0800
Doug Turner - Bug 690201 - dead code - mLastDrawEvent never used. r=mbrubeck

http://hg.mozilla.org/mozilla-central/rev/24c8d04f6174
firstname.lastname@example.org Mon Nov 14 04:35:37 2011 -0800
Ed Morley - Merge mozilla-central and mozilla-inbound

I'll keep working at it until I've identified the single patch that triggered this bug's crashes.
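Narrowing a range like this amounts to a binary search over the changesets between a known-good and a known-bad revision. The sketch below is only an illustration of that idea, not the procedure actually used here: `first_bad` and the `crashes` callback are hypothetical, with `crashes(rev)` standing in for "build that revision with jemalloc disabled and run the STR".

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], crashes: Callable[[str], bool]) -> str:
    """Return the first crashing revision.

    Assumes revisions are in landing order, revisions[0] is known good,
    revisions[-1] is known bad, and the crash never goes away once introduced.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if crashes(revisions[mid]):
            bad = mid   # the regression landed at or before mid
        else:
            good = mid  # the regression landed after mid
    return revisions[bad]


# Placeholder revision names; real use would list the changeset hashes in the range.
revs = [f"r{i}" for i in range(10)]
print(first_bad(revs, lambda rev: int(rev[1:]) >= 6))
```

Each step needs a full build, which is why this is slow compared with testing nightlies, but the number of builds grows only with the logarithm of the range size.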
Here's the patch that triggered these crashes:

Bug 685767 - Factor blurring out into its own class, and use it from gfxAlphaBoxBlur. r=mattwoodrow
author Joe Drew <email@example.com>
Mon Nov 14 17:29:28 2011 +1300 (at Mon Nov 14 17:29:28 2011 +1300)
http://hg.mozilla.org/mozilla-central/rev/6ae6d3beeaf4

Tomorrow I'll try to figure out why.
I had to disable jemalloc while testing. Here's how I did it:

diff --git a/configure.in b/configure.in
--- a/configure.in
+++ b/configure.in
@@ -2343,21 +2343,21 @@ case "$target" in
         MKSHLIB_UNFORCE_ALL=''
         ;;
     esac
     ;;
 
 *-darwin*)
     MKSHLIB='$(CXX) $(CXXFLAGS) $(DSO_PIC_CFLAGS) $(DSO_LDOPTS) -o $@'
     MKCSHLIB='$(CC) $(CFLAGS) $(DSO_PIC_CFLAGS) $(DSO_LDOPTS) -o $@'
     MOZ_OPTIMIZE_FLAGS="-O3"
     _PEDANTIC=
-    MOZ_MEMORY=1
+    #MOZ_MEMORY=1
     CFLAGS="$CFLAGS -fno-common"
     CXXFLAGS="$CXXFLAGS -fno-common"
     DLL_SUFFIX=".dylib"
     DSO_LDOPTS=''
     STRIP="$STRIP -x -S"
     # Check whether we're targeting OS X or iOS
     AC_CACHE_CHECK(for iOS target,
                    ac_cv_ios_target,
                    [AC_TRY_COMPILE([#include <TargetConditionals.h>
 #if !(TARGET_OS_IPHONE || TARGET_IPHONE_SIMULATOR)
oh, wow! Steven, please don't hesitate to call on me for help here. Ping me on irc, whatever. :)
If this is memory corruption, presumably you want to run this through valgrind, right?
Marcia, could you test your STR from comment #3 on OS X 10.6 and 10.7 in 32-bit mode? I'm not able to reproduce the crashes myself under those conditions. But I don't have printer drivers installed for either of those OS versions, and that may be what makes the difference. jemalloc is disabled in 32-bit mode, on all versions of OS X.
These crashes happen due to memory corruption that happens at either of the following two lines:

http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/thebes/gfxBlur.cpp#l119
http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/thebes/gfxBlur.cpp#l122

The crashes don't happen immediately -- only after a second or two. But they don't happen if I comment out either of those lines.

More specifically, the memory corruption happens here:

http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/cairo/cairo/src/cairo-quartz-surface.c#l3039

But it doesn't happen if, instead, we use the call to _cairo_quartz_surface_mask_with_generic() on the following line:

http://hg.mozilla.org/mozilla-central/annotate/6989376471f7/gfx/cairo/cairo/src/cairo-quartz-surface.c#l3041

Calling _cairo_quartz_surface_mask_with_generic() is (as best I can tell) a little less efficient, but functionally equivalent.

As far as I know (from the tests I mentioned in comment #25) these crashes only happen on OS X 10.5. But I need help from others to confirm this.

I've ruled out that the crashes have anything to do with accessing deleted objects. I did this by adding printf statements to the constructors and destructors of all the objects in play (including cairo objects).

I still don't know exactly what sort of memory corruption happens, or why. But (as I said above) I'm reasonably confident it's confined to OS X 10.5. So it may be an OS bug. Or it may conceivably be a bug in how quartz cairo operates on OS X 10.5.

I don't believe we used the "mask" capability before Joe's patch for bug 685767 landed. This would explain why we didn't have problems earlier.
As I explained above, this patch doesn't really get to the heart of the problem. But if I'm right that the memory corruption only happens on OS X 10.5, which we aren't going to support for much longer, I'm pretty sure the patch is "good enough". I've started tryserver builds, which should be available in a few hours.
Here's a tryserver build made with my patch from comment #27: http://firstname.lastname@example.org/try-macosx64/firefox-13.0a1.en-US.mac.dmg There were no non-spurious test failures.
By the way, I've been trying to use libgmalloc and valgrind on my OS X 10.5.8 machine, so far with no luck. I suspect the machine has too little RAM -- only 4GB.
Comment on attachment 597900: Provisional fix

Review of attachment 597900:

Please also add this as a Cairo patch to gfx/cairo.
Attachment #597900 - Flags: review?(joe) → review+
Comment on attachment 597900: Provisional fix

Landed on mozilla-inbound: http://hg.mozilla.org/integration/mozilla-inbound/rev/9466529cdbc0

> Please also add this as a Cairo patch to gfx/cairo.

Done.
What I landed (carrying forward Joe's r+). I'll wait a few days for others to test, and then (presuming there aren't any problems) seek aurora and beta branch approval. Marcia, you're probably in the best position to test :-)
Testing using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0 (b3), I am not able to reproduce the bug. I will try 10.6 next. Confirming that I was testing in 32-bit mode.

(In reply to Steven Michaud from comment #25)
> Marcia, could you test your STR from comment #3 on OS X 10.6 and 10.7 in
> 32-bit mode? I'm not able to reproduce the crashes myself under those
> conditions. But I don't have printer drivers installed for either of those
> OS versions, and that may be what makes the difference.
>
> jemalloc is disabled in 32-bit mode, on all versions of OS X.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
Is this fix considered low-risk enough to nominate for Aurora/Beta approval? Or does it make more sense to back out bug 685767? Thanks!
> Is this fix considered low-risk enough to nominate for Aurora/Beta approval?

I think it is. It only changes behavior on OS X 10.5. And even there it only causes a slight increase in RAM usage and (probably) no change in performance. It'd be good to hear from others who know more about Cairo than I do, though.
Marcia, could you check with today's mozilla-central nightly to confirm that your STR no longer works with it?
(In reply to Steven Michaud from comment #36) > It'd be good to hear from others who know more about Cairo than I do, though. I've sent email to Joe and Jeff to get their feedback.
This is relatively safe. I'd be OK with shipping it in a beta and aurora. We already have to use the fallback path in many cases, so using it for more cases should continue to be safe.
Comment on attachment 598258: Patch with copy in gfx/cairo

[Triage Comment] Comfortable taking this low-risk fix, please land today 2/27/12 in preparation for go-to-build on 2/28/12
Comment on attachment 598258: Patch with copy in gfx/cairo

Landed on mozilla-aurora: http://hg.mozilla.org/releases/mozilla-aurora/rev/154b852a9952
Comment on attachment 598258: Patch with copy in gfx/cairo

Landed on mozilla-beta: http://hg.mozilla.org/releases/mozilla-beta/rev/1dc6cd15f683
See comment 3 for steps when verifying this fix.
Verified fixed using the STR in Comment 3. I tested with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:11.0) Gecko/20100101 Firefox/11.0 which is the Beta 5 build. No crash observed with saving the PDF and opening a new tab.
I have verified this on: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:12.0) Gecko/20100101 Firefox/12.0 beta 2. Firefox didn't crash using the steps from comment 3. Setting resolution to Verified Fixed.