Closed Bug 804606 Opened 12 years ago Closed 12 years ago

[adbe 3373629] OSX Flash player crash in F_1290421835 @ CGSConvertRGBX8888toRGBA8888 with Intel GMA 950/X3100 GPUs

Categories

(Core Graveyard :: Plug-ins, defect)

Version: 18 Branch
Platform: All
OS: macOS
Type: defect
Priority: Not set
Severity: critical

Tracking

VERIFIED FIXED
mozilla20
Tracking Status
firefox17 --- unaffected
firefox18 + verified
firefox19 + verified
firefox20 --- verified
b2g18 --- fixed

People

(Reporter: kairo, Assigned: smichaud)

Attachments

(11 files, 6 obsolete files)

1.88 KB, patch (smichaud: review+)
822 bytes, text/plain
4.35 KB, patch
70.93 KB, text/plain
615 bytes, text/html
4.98 KB, text/plain
19.98 KB, patch
1.58 KB, text/plain
35.05 KB, patch
14.17 KB, patch (BenWa: review+)
17.98 KB, patch

This bug was filed from the Socorro interface and is report bp-5f5feb3f-5fcc-4a1c-90ad-3f7422121023. This crash in OSX Flash Player (almost exclusively on Mac OS X 10.6.8 10K549, but also seen once on Mac OS X 10.7.0 11A511) started happening on 2012-10-19 and has been rising since then; see https://crash-stats.mozilla.com/report/list?signature=CGSConvertRGBX8888toRGBA8888 for more reports.
It's the #1 top crasher in 18.0a2 and #15 in 19.0a1 on Mac. It first appeared in 18.0a2/20121019 and 19.0a1/20121017. The regression ranges:
* Aurora: http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=0d094ad653e8&tochange=435f85f6a54a
* trunk (tentative): http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8f145599e4bf&tochange=dac5700acf8b
It might be a regression from bug 626245.
Component: Flash (Adobe) → Graphics
Keywords: regression, topcrash
Product: Plugins → Core
Summary: OSX Flash player crash in CGSConvertRGBX8888toRGBA8888 → OSX Flash player crash in F_1290421835 @ CGSConvertRGBX8888toRGBA8888
Version: unspecified → 18 Branch
That seems likely.
Assignee: nobody → matspal
Blocks: 626245
This is a plugin process crash. The "module CoreGraphics" at the top is an OSX system library, it's unrelated to the Core/Graphics bugzilla component. That said, bug 626245 does seem the most likely cause in the given range. It's hard to fix this without STR (unless we want to back out 626245).
Assignee: matspal → nobody
Component: Graphics → Plug-ins
Maybe the Flash object's window is sized to 0,0 or something and that's causing problems? But I thought we already allowed that to happen.
We could also try not doing widget configuration for plugins at all on Mac. I'm not sure if that's needed. But that sounds too scary for Aurora.
I see these going back at least to FF 14.0: https://crash-stats.mozilla.com/report/index/c5356c25-d0fc-45c9-9dee-144f42120928 Don't know why the volume has increased recently ... if it has.
Keywords: regression
Version: 18 Branch → unspecified
As this appears to be an old bug (and probably a Flash bug), we probably shouldn't be tracking it.
> If there has been a sudden increase, it definitely started more than a month ago. I take this back, partly. There are 36 of these crashes (over the last 4 weeks) in FF 15.0.1 (and 13 in FF 16.0.1). But there are also 41 in FF "18.0a2", even though the number of users for "18.0a2" is far less than for 15.0.1 or 16.0.1. So the number of crashes per user does seem to have gone up recently fairly dramatically. I don't know why there aren't any crashes listed for "FF 17.0a" -- I suspect that's a problem with Socorro.
> I don't know why there aren't any crashes listed for "FF 17.0a" -- I suspect that's a problem with Socorro.
Or a problem with Socorro's data.
Another thing: I wonder if many of the recent crashes (on FF 17 and 18) aren't from a very small number of users ... possibly even a single user. Can somebody check?
(In reply to Steven Michaud from comment #11)
> I wonder if many of the recent crashes (on FF 17 and 18) aren't from a very small number of users ... possibly even a single user. Can somebody check?
I've already checked. In the first Aurora build, it has been hit by three different users (different Build Architecture Info and Device ID):
https://crash-stats.mozilla.com/report/list?version=Firefox%3A18.0a2&range_value=4&range_unit=weeks&signature=CGSConvertRGBX8888toRGBA8888
https://crash-stats.mozilla.com/report/list?version=Firefox%3A19.0a1&range_value=4&range_unit=weeks&signature=CGSConvertRGBX8888toRGBA8888
That said, it is correlated with Intel GMA 950 (laptop) and X3100 (desktop) GPUs (see https://en.wikipedia.org/wiki/Comparison_of_Intel_graphics_processing_units).
Keywords: regression
Summary: OSX Flash player crash in F_1290421835 @ CGSConvertRGBX8888toRGBA8888 → OSX Flash player crash in F_1290421835 @ CGSConvertRGBX8888toRGBA8888 with Intel GMA 950/X3100 GPUs
Version: unspecified → 18 Branch
Removing needURLs; we don't get any URLs for these types of crashes.
Keywords: needURLs
(In reply to Mats Palmgren [:mats] from comment #3)
> This is a plugin process crash. The "module CoreGraphics" at the top is an OSX system library, it's unrelated to the Core/Graphics bugzilla component.
I wouldn't be so sure. After all, bug 626245 belongs to Layout, and this bug correlates with one set of GPUs.
(In reply to Steven Michaud from comment #9)
> There are 36 of these crashes (over the last 4 weeks) in FF 15.0.1 (and 13 in FF 16.0.1). But there are also 41 in FF "18.0a2", even though the number of users for "18.0a2" is far less than for 15.0.1 or 16.0.1. So the number of crashes per user does seem to have gone up recently fairly dramatically.
Yes, that's what brought me to report it in the first place. Look at the "Reports" tab of https://crash-stats.mozilla.com/report/list?query_search=signature&query_type=contains&reason_type=contains&range_value=4&range_unit=weeks&hang_type=any&process_type=any&signature=CGSConvertRGBX8888toRGBA8888 and you'll see that it started appearing at a much higher frequency on 18.0a2 on October 19th, as I stated in comment #0 (and on 19.0a1 it goes back to the 17th, as Scoobidiver mentions in comment #1).
> I don't know why there aren't any crashes listed for "FF 17.0a" -- I suspect that's a problem with Socorro.
No. (I'm guessing you mean any 17 version, as there is no "17.0a" version exactly.) This is probably just because, before the recent regression, it happened so rarely that we only ran into it on release, where we have enough active installations to trigger even rare bugs.
(In reply to Steven Michaud from comment #11)
> I wonder if many of the recent crashes (on FF 17 and 18) aren't from a very small number of users ... possibly even a single user. Can somebody check?
I routinely check that as well as we can before even reporting such a bug. There are a ton of different installation times, and those are the only real clue we have as to whether crashes come from the same or different users. It looks like there's a larger collection of users hitting this. I originally assumed this was a bug in Flash, but given that the rise seems to match the landing of bug 626245, we at least triggered it with a code change on our side.
Robert, I see now that you were more careful than I first realized (and I agree with almost everything you said in your followup comments). But this would have been clearer had you mentioned at the start that this is in fact an old bug whose frequency has jumped recently, possibly triggered by something we did. Given the small number of crashes on the 19 branch, and the possibility that any one of them might in fact be an "old" crash (not one that we triggered), I don't think we have enough data to find a regression range on that branch. And in fact I found one crash that happened on 2012-10-16: https://crash-stats.mozilla.com/report/index/baa4bcb0-9405-4d34-80a4-8665f2121017 The regression range on the 18 branch seems reasonably well established, though. We have 8 crashes in the 20121019042010 build and about 20 crashes in the 20121020042013 build, but no crashes in prior builds.
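A minimal sketch of the per-build reasoning above, assuming the crash counts have already been tallied per build ID from Socorro. The function name `regression_window` is hypothetical; the counts are the ones quoted in the comment, with "earlier builds" standing in for the prior builds that showed no crashes and "about 20" entered as 20.

```python
def regression_window(builds):
    """Given (build_id, crash_count) pairs ordered oldest to newest,
    return (last_clean_build, first_crashing_build).

    Either element may be None: no clean build precedes the first crash,
    or no build crashed at all.
    """
    last_clean = None
    for build_id, crashes in builds:
        if crashes > 0:
            return last_clean, build_id
        last_clean = build_id
    return last_clean, None


# Aurora (18) branch counts as quoted in the comment.
aurora = [
    ("earlier builds", 0),    # no crashes in any prior build
    ("20121019042010", 8),
    ("20121020042013", 20),   # "about 20"
]
print(regression_window(aurora))  # ('earlier builds', '20121019042010')
```

With sparse data, as on the 19 branch, one stray "old" crash in an early build would move the boundary, which is why the window is only trusted on the 18 branch.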
(In reply to comment #12)
> it's correlated to Intel GMA 950 (laptop) and X3100 (desktop) GPUs (see https://en.wikipedia.org/wiki/Comparison_of_Intel_graphics_processing_units).
Are you talking about these?
60% (6/10) vs. 13% (7/52) AppleIntelGMAX3100GLDriver
40% (4/10) vs. 15% (8/52) AppleIntelGMA950GLDriver
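The two percentages in each row can be reproduced from the raw counts. A small sketch, assuming the first figure is the share of crashes with this signature that have the driver module loaded and the second is the share in the baseline set they are compared against; the function name `correlation_row` is hypothetical.

```python
from fractions import Fraction


def correlation_row(sig_hits, sig_total, base_hits, base_total):
    """Return (signature %, baseline %, ratio) for one correlation row.

    Percentages are rounded to whole numbers, as in the report; the ratio
    is kept exact to show how strongly the module is over-represented.
    """
    sig = Fraction(sig_hits, sig_total)
    base = Fraction(base_hits, base_total)
    return round(100 * sig), round(100 * base), sig / base


x3100 = correlation_row(6, 10, 7, 52)
gma950 = correlation_row(4, 10, 8, 52)
print(x3100[:2], float(x3100[2]))    # (60, 13), ratio about 4.46
print(gma950[:2], float(gma950[2]))  # (40, 15), ratio 2.6
```

With only 10 crashes on the signature side, a single report moves the first figure by 10 points, so these rows are suggestive rather than conclusive.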
(In reply to Steven Michaud from comment #16)
> And in fact I found one crash that happens on 2012-10-16:
> https://crash-stats.mozilla.com/report/index/baa4bcb0-9405-4d34-80a4-8665f2121017
It happened in a Nightly UX build, and I don't know exactly how that branch relates to m-c. Anyway, it's consistent with a regression from bug 626245, which first landed in the build from October 15.
(In reply to Steven Michaud from comment #17)
> Are you talking about these?
> 60% (6/10) vs. 13% (7/52) AppleIntelGMAX3100GLDriver
> 40% (4/10) vs. 15% (8/52) AppleIntelGMA950GLDriver
My statement was based on a manual check of the App Notes in all post-spike crash reports that contain vendor and device IDs. The correlation reports are another source, though they are restricted to one day.
>> And in fact I found one crash that happens on 2012-10-16:
>> https://crash-stats.mozilla.com/report/index/baa4bcb0-9405-4d34-80a4-8665f2121017
>
> It happened in Nightly UX whose I don't know the exact scope wrt m-c. Anyway, it's consistent with a regression from bug 626245 that first landed in the build from October 15.
I hadn't noticed this was in a "nightly-ux" build. Thanks for pointing that out. And yes, the patches for bug 626245 did land on mozilla-central on 2012-10-14 (see bug 626245 comment #84).
This bug's crashes take place when the OS calls a "CFRunLoopSource" that's been created by the Flash plugin (using a call to CFRunLoopSourceCreate()). This is analogous to how we use nsAppShell::ProcessGeckoEvents() in widget/cocoa/nsAppShell.mm. From the disparate crash addresses, I suspect the crashes happen when internal Flash code (the perform target of its CFRunLoopSource) tries to pass a deleted object to a call to CGContextDrawImage(). Whatever this deleted object is, it's most likely an internal Flash object. So this is most likely a Flash bug, whatever we may have done recently to increase its frequency.
There's some reason to think that the (Flash plugin) code that crashes isn't part of the "normal" code path that Flash uses to draw its plugins: CGContextDrawImage() never gets called from the plugin-container process when running any of the following Flash "movies": http://mirrors.creativecommons.org/getcreative/ http://mirrors.creativecommons.org/reticulum_rex/ http://flashnifties.com/products/flash-media-gallery/ (Tested with the current Flash plugin (11.4.402.287) on OS X 10.6.8, not using either of the graphics drivers listed in comment #17. Unfortunately I don't have a machine with either of those kinds of graphics hardware.)
(Following up comment #22) Actually CGContextDrawImage() gets called a lot if you uncheck "Enable Hardware Acceleration" in "Adobe Flash Player Settings". So (apparently) these crashes only happen if you turn off Flash's support for hardware acceleration. Here's what the stacks look like in gdb:
#0 0x00007fff89e65a64 in CGContextDrawImage ()
#1 0x0000000118ddd956 in unregister_ShockwaveFlash ()
#2 0x0000000118ddc01e in unregister_ShockwaveFlash ()
#3 0x00007fff8443f910 in CABackingStoreUpdate ()
#4 0x00007fff8443ebaf in -[CALayer _display] ()
#5 0x00007fff843fd1f3 in CALayerDisplayIfNeeded ()
#6 0x00007fff843fc66c in CA::Context::commit_transaction ()
#7 0x00007fff843fc2c2 in CA::Transaction::commit ()
#8 0x00007fff89143b07 in __CFRunLoopDoObservers ()
#9 0x00007fff8911edaf in CFRunLoopRunSpecific ()
#10 0x00007fff83e1d7ee in RunCurrentEventLoopInMode ()
#11 0x00007fff83e1d5f3 in ReceiveNextEventCommon ()
#12 0x00007fff83e1d4ac in BlockUntilNextEventMatchingListInMode ()
#13 0x00007fff80b08eb2 in _DPSNextEvent ()
#14 0x00007fff80b08801 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#15 0x00007fff80ace68f in -[NSApplication run] ()
#16 0x00000001010576f3 in base::MessagePumpNSApplication::DoRun ()
#17 0x000000010105715a in base::MessagePumpCFRunLoopBase::Run ()
#18 0x0000000101048cae in MessageLoop::Run ()
#19 0x0000000100020449 in XRE_InitChildProcess ()
#20 0x0000000100000f1d in main ()
(Following up comment #23) Which, interestingly, doesn't really match the crash stacks. Sigh.
What versions of Flash are we currently blocking (on the 18 and 19 branches, and on OS X 10.6.8)? I'll try to find an unblocked version whose calls to CGContextDrawImage match the crash stacks.
(In reply to Steven Michaud from comment #25)
> What versions of Flash are we currently blocking (on the 18 and 19 branches, and on OS X 10.6.8)?
Adobe Flash 10.2.* and lower. See https://addons.mozilla.org/firefox/blocked/p160
We plan to block more versions in 17.0 and above (might that fix the issue?). See bug 803152.
(Following up comment #25)
> I'll try to find an unblocked version whose calls to CGContextDrawImage match the crash stacks.
I wasn't able to find one. I tested back to version 10.3.183.25 (the most recent 10.3 version available at http://helpx.adobe.com/flash-player/kb/archived-flash-player-versions.html), on OS X 10.6.8 with today's mozilla-central nightly. Where available (i.e. with all the 11 versions) I tested with a 64-bit variant of the plugin. In all cases I had to turn off Flash's support for hardware acceleration to see any calls to CGContextDrawImage in "normal" drawing (e.g. without bringing up the Flash context menu). In all cases the stacks for these calls matched the one in comment #23, not the crash stacks. (Specifically, all the stacks in my tests contained __CFRunLoopDoObservers, not __CFRunLoopDoSources0.) There doesn't seem to be much point in testing further back. Almost all the crashes on the 18 and 19 branches are in 64-bit mode, and only the 11 versions have a 64-bit variant. So it seems that these crashes happen on a code path that's only exercised when the Flash plugin runs on the video hardware mentioned in comment #17. Without being able to test that code path, there's no way to tell whether or not you need to turn off Flash's support for hardware acceleration to see calls to CGContextDrawImage, or to see the crashes.
It's entirely possible that the stack trace you are seeing in the debugger doesn't match what Breakpad is seeing, as its stackwalker sometimes gets things wrong: the minidump we send does not contain all the memory that a debugger can access, so the stackwalker can go astray. You are able to reproduce a crash; what about trying to find a way to fix that one and seeing whether the fix helps this as well?
> You are able to reproduce a crash No, I'm not. I've just been trying to find out as much as possible about the code path on which the crashes sometimes happen. So far I haven't had much luck. Pretty much the only conclusions I've been able to draw are what I say in comment #20.
> It's entirely possible that the stack trace you are seeing in the debugger doesn't match what Breakpad is seeing, as sometimes its stackwalker gets things wrong
The "F..." symbols are added by Socorro, presumably from information that Adobe has provided us. None of those symbols is actually exported by the Flash plugin; you can tell by running "nm -pam" on its constituent parts.
And yes, Breakpad sometimes does get things wrong. But usually it leaves things out (for example, the bottom of the stack). It's highly unlikely it would mistake __CFRunLoopDoSources0 for __CFRunLoopDoObservers.
(In reply to Steven Michaud from comment #29)
> > You are able to reproduce a crash
>
> No, I'm not. I've just been trying to find out as much as possible about the code path on which the crashes sometimes happen.
Oh, sorry. I thought your stacks were crashes; I guess they're just breakpoints. :) And Scoobidiver, please don't lecture me on Flash symbols and what Breakpad gets. I have been working full-time on crash investigations for more than 18 months now; I know what we usually get. ;-) And, by the way, Breakpad can produce totally wrong stacks at times, with completely wrong frames, though those are seen more often on Windows than on Mac or Linux. I wouldn't guess that's the case here; I just wanted to throw it out there as a possibility. That said, Steven, yes, I guess you need a machine with one of those Intel cards/chips to reproduce this.
Gah, sorry, I somehow thought Scoobidiver was talking here, I'm sorry, I completely got things wrong. Still, Steven, I know about our obfuscated Flash symbols, I've been dealing with all this often enough. And I've seen cases where the stackwalker brings up completely useless stacks. I guess the ones here are good, though. Everything else stands, of course.
> Steven, I know about our obfuscated Flash symbols, I've been dealing with all this often enough.
I frankly didn't know about them until yesterday, when I couldn't find them in the Flash plugin with "nm -pam" :-)
Attached patch: stab in the dark
This might possibly help; I don't know. With this patch, on Mac ApplyPluginGeometryUpdates gets called during a timer event (via the refresh driver). Keeping the nsView changes for plugins away from paint events might help address this bug.
Attachment #678852 - Flags: review?(smichaud)
Steven, you might want to test this patch a bit, since I don't have a Mac here :-)
Comment on attachment 678852: stab in the dark
This is worth a try. But we probably won't know whether it has any effect until it's been on the aurora branch for at least a few days.
This patch stops ApplyPluginGeometryUpdates() from ever being called on -[ChildView viewWillDraw], which actually isn't a (native) paint event. The native paint event is -[ChildView drawRect:], and -[ChildView viewWillDraw] is called (by the OS) just before it calls -[ChildView drawRect:]. The OS assumes that the NSWindow/NSView hierarchy won't change "too much" between these two calls. And we've gotten in trouble in the past because nsView::WillPaintWindow() breaks some of these assumptions. I hacked around this with my patch for bug 550392. I'd prefer nsView::WillPaintWindow() not to make any changes to the NSWindow/NSView hierarchy, but I suspect that's not feasible. Your patch reduces the number of changes it can make, which might help here.
But fiddling with what happens (or doesn't happen) during calls to nsView::WillPaintWindow() is less likely to fix this bug than it was to fix bug 550392: Bug 550392 happened when the OS was preparing to call -[ChildView drawRect:] on parts of the NSView hierarchy. But since this bug's crashes happen when the OS calls a Flash plugin run loop source (and since we can't yet reproduce the code path on which this happens), we have no way of telling whether or not what happens in the run loop source is connected in some way with what happens when the OS calls -[NSView drawRect:] (or -[NSView viewWillDraw]).
I'll wait to r+ this patch until I've had a chance to test it for a day or two.
Assignee: nobody → roc
Comment on attachment 678852: stab in the dark
I've now played around with this patch on both OS X 10.6.8 and 10.7.5, mostly using sites with Flash, Silverlight or Java plugins. I didn't see any problems.
Attachment #678852 - Flags: review?(smichaud) → review+
Another reason to doubt that this patch will help:
> But since this bug's crashes happen when the OS calls a Flash plugin run loop source (and since we can't yet reproduce the code path on which this happens), we have no way of telling whether or not what happens in the run loop source is connected in some way with what happens when the OS calls -[NSView drawRect:] (or -[NSView viewWillDraw]).
The OS's calls to -[NSView viewWillDraw] and -[NSView drawRect:] both happen on the same "trip" through the main thread's run loop (on the same call to RunCurrentEventLoopInMode(), which is why my hack for bug 550392 works). But we don't know if this is also true of the OS's call to the Flash plugin's run loop source (to __CFRunLoopDoSources0() and above). If the call to the Flash plugin's run loop source happens on another (subsequent) call to RunCurrentEventLoopInMode(), the OS will have had a chance to delete objects in the autorelease pool, and the Gecko event loop might also have spun again. So what we avoided in our call to nsView::WillPaintWindow() might still happen before the Flash plugin's run loop source gets a chance to run.
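The concern above can be illustrated with a toy model. This is not Cocoa code: `RunLoopModel`, `Obj` and `scenario` are hypothetical stand-ins in which each pass runs the callbacks queued before it started and then drains an autorelease-style pool, marking its objects freed. An object prepared and used on the same pass is safe; the same object used from a callback that only runs on a later pass is already gone. Whether Flash's run loop source actually runs on a later pass is exactly what the comment says is unknown.

```python
class Obj:
    """Stand-in for an autoreleased object; tracks whether it was freed."""

    def __init__(self, name):
        self.name = name
        self.freed = False

    def draw(self):
        if self.freed:
            raise RuntimeError(f"use after free: {self.name}")
        return f"drew {self.name}"


class RunLoopModel:
    """Each run_pass() runs the callbacks queued so far, then drains the pool."""

    def __init__(self):
        self.pending = []
        self.pool = []

    def autorelease(self, obj):
        self.pool.append(obj)
        return obj

    def schedule(self, callback):
        self.pending.append(callback)

    def run_pass(self):
        callbacks, self.pending = self.pending, []
        results = [cb() for cb in callbacks]
        for obj in self.pool:      # end of pass: the pool is drained
            obj.freed = True
        self.pool.clear()
        return results


def scenario(use_on_next_pass):
    loop = RunLoopModel()
    state = {}

    def prepare():                 # stands in for state set up on one pass
        state["img"] = loop.autorelease(Obj("image"))
        return "prepared"

    def use():                     # stands in for the plugin's run loop source
        return state["img"].draw()

    loop.schedule(prepare)
    if use_on_next_pass:
        loop.run_pass()            # pool drained: "image" is now freed
    loop.schedule(use)
    return loop.run_pass()
```

`scenario(False)` returns `['prepared', 'drew image']`, while `scenario(True)` raises `RuntimeError` because the object was freed when the first pass ended.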
Crash Signature: [@ CGSConvertRGBX8888toRGBA8888] → [@ CGSConvertRGBX8888toRGBA8888] [@ hang | CGSConvertRGBX8888toRGBA8888]
This isn't terribly high volume currently: 60 crashes in the last week across all versions. Most of the crashes seem to be on 10.6.8; I see only one on 10.7, and the other 2 are using 10.6.3.
(In reply to Marcia Knous [:marcia] from comment #39) > This isn't terribly high volume currently I disagree. It's still #2 top crasher in 18.0a2 and #13 in 19.0a1 on Mac.
Removing the crash signature caused by bug 810241.
Crash Signature: [@ CGSConvertRGBX8888toRGBA8888] [@ hang | CGSConvertRGBX8888toRGBA8888] → [@ CGSConvertRGBX8888toRGBA8888]
FlashPlayer uses CoreAnimation to render its image to the window by default. When the GPU is blacklisted due to driver bugs with CoreAnimation (as the Intel GMA950 and X3100 are), or when the user explicitly disables hardware acceleration through the settings UI, FlashPlayer falls back to using the less efficient CoreGraphics API. That's where the CGContextDrawImage() call comes from. Does Firefox (which is using CoreAnimation) have any expectation that the plugin must be using CoreAnimation? Is this a crash that is rising only with recent Firefox builds, or one that is rising in general? The CoreGraphics rendering path has not changed sourcewise in FlashPlayer in a long time. If it's rising in general, I would suspect some new revision of some popular content is triggering a latent bug. Are there any URLs that are known to reproduce it?
> FlashPlayer users CoreAnimation to render its image to the window by default. When the gpu is blacklisted due to driver bugs with CoreAnimation (as the Intel GMA950 and x3100 are), or when the user explicitly disables hardware acceleration through the settings UI, then FlashPlayer will fall back to using the less efficient CoreGraphics api.
This is at least partly false: Testing with Flash 11.5.502.110 in current trunk code (on OS X 10.7.5), Flash still uses Invalidating Core Animation when you explicitly disable hardware acceleration.
> Not sure if Firefox (which is using CoreAnimation) has any expectation that the plugin must be using CoreAnimation?
Firefox still (of course) supports CoreGraphics. And a while ago I tested current Flash in current trunk code using CoreGraphics (in a build I'd altered to tell plugins it didn't support either CoreAnimation or Invalidating CoreAnimation). I didn't see any problems. But I didn't test if it's possible in such a build to reproduce the code path on which these crashes have been happening. I'll do that now.
> Is this a crash that is rising only with recent Firefox builds?
This is an old bug that's been around for years. But yes, its frequency has risen in recent FF builds. See comment #1 and comment #16. On the Aurora branch the crashes start with the 20121019042010 build. On the trunk branch we don't have enough crashes to tell when they started.
> Are there any URLS that are known to reproduce?
No. This bug isn't reproducible.
> But I didn't test if it's possible in such a build to reproduce the code path on which these crashes have been happening.
Yes, it is!
Breakpoint 1, 0x00007fff8851bcad in CGContextDrawImage ()
(gdb) bt
#0 0x00007fff8851bcad in CGContextDrawImage ()
#1 0x000000010e2eb7e5 in unregister_ShockwaveFlash ()
#2 0x000000010e2f9c42 in NP_Initialize ()
#3 0x000000010e26d1be in FlashPlayer_11_5_502_110_FlashPlayer ()
#4 0x0000000101c73f37 in mozilla::plugins::PluginInstanceChild::AnswerNPP_HandleEvent (this=0x10a2bc000, event=@0x7fff5fbfc828, handled=0x7fff5fbfc826) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/dom/plugins/ipc/PluginInstanceChild.cpp:823
#5 0x0000000101c7451b in mozilla::plugins::PluginInstanceChild::CGDraw (this=0x10a2bc000, ref=0x10c12b680, aUpdateRect=<value temporarily unavailable, due to optimizations>) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/dom/plugins/ipc/PluginInstanceChild.cpp:956
#6 0x0000000101c74448 in CallCGDraw (ref=0x10c12b680, aPluginInstance=0x10a2bc000, aUpdateRect=<value temporarily unavailable, due to optimizations>) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/dom/plugins/ipc/PluginInstanceChild.cpp:936
#7 0x0000000101c995bb in -[CGBridgeLayer drawInContext:] (self=0x10a2ebd80, _cmd=0x7fff865ebdbe, aCGContext=0x10c12b680) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/dom/plugins/ipc/PluginUtilsOSX.mm:53
#8 0x00007fff91643225 in CABackingStoreUpdate_ ()
#9 0x00007fff9164213a in CA::Layer::display_ ()
#10 0x0000000101c99758 in mozilla::plugins::PluginUtilsOSX::Repaint (caLayer=0x10a2ebd80, aRect=@0x7fff5fbfcd58) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/dom/plugins/ipc/PluginUtilsOSX.mm:78
#11 0x0000000101c772f4 in mozilla::plugins::PluginInstanceChild::ShowPluginFrame (this=0x10a2bc000) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/dom/plugins/ipc/PluginInstanceChild.cpp:3553
#12 0x0000000101c77742 in mozilla::plugins::PluginInstanceChild::InvalidateRectDelayed (this=0x10a2bc000) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/dom/plugins/ipc/PluginInstanceChild.cpp:3771
#13 0x0000000101c7b823 in DispatchToMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)()> (obj=0x10a2bc000, method={ptr = 4324816608, ptr = 0}, arg=@0x10a20b720) at tuple.h:383
#14 0x0000000101c7b71e in RunnableMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)(), Tuple0>::Run (this=0x10a20b700) at task.h:307
#15 0x00000001021646a4 in MessageLoop::RunTask (this=0x7fff5fbfe618, task=0x10a20b700) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_loop.cc:333
#16 0x0000000102164b9f in MessageLoop::DeferOrRunPendingTask (this=0x7fff5fbfe618, pending_task=@0x7fff5fbfcef8) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_loop.cc:341
#17 0x0000000102164d5a in MessageLoop::DoWork (this=0x7fff5fbfe618) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_loop.cc:441
#18 0x00000001021dfb34 in base::MessagePumpCFRunLoopBase::RunWork (this=0x10a247160) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_pump_mac.mm:291
#19 0x00000001021df2ed in base::MessagePumpCFRunLoopBase::RunWorkSource (info=0x10a247160) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_pump_mac.mm:269
#20 0x00007fff85a904f1 in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ ()
#21 0x00007fff85a8fd5d in __CFRunLoopDoSources0 ()
#22 0x00007fff85ab6b49 in __CFRunLoopRun ()
#23 0x00007fff85ab6486 in CFRunLoopRunSpecific ()
#24 0x00007fff8b2602bf in RunCurrentEventLoopInMode ()
#25 0x00007fff8b26756d in ReceiveNextEventCommon ()
#26 0x00007fff8b2673fa in BlockUntilNextEventMatchingListInMode ()
#27 0x00007fff85c5b779 in _DPSNextEvent ()
#28 0x00007fff85c5b07d in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#29 0x00007fff85c579b9 in -[NSApplication run] ()
#30 0x00000001021e0545 in base::MessagePumpNSApplication::DoRun (this=0x10a247160, delegate=0x7fff5fbfe618) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_pump_mac.mm:677
#31 0x00000001021df94f in base::MessagePumpCFRunLoopBase::Run (this=0x10a247160, delegate=0x7fff5fbfe618) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_pump_mac.mm:213
#32 0x00000001021645a5 in MessageLoop::RunInternal (this=0x7fff5fbfe618) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_loop.cc:215
#33 0x00000001021644d5 in MessageLoop::RunHandler (this=0x7fff5fbfe618) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_loop.cc:208
#34 0x000000010216447d in MessageLoop::Run (this=0x7fff5fbfe618) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/chromium/src/base/message_loop.cc:182
#35 0x00000001000384cc in XRE_InitChildProcess (aArgc=4, aArgv=0x7fff5fbffa10, aProcess=GeckoProcessType_Plugin) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:485
#36 0x0000000100000e9c in main (argc=7, argv=0x7fff5fbffa10) at /usr/local/src/Mozilla/bugzilla804606/mozilla-central/ipc/app/MozillaRuntimeMain.cpp:48
(gdb)
I don't yet know what this teaches us. But over the next few days (as other bugs permit) I'll attempt to glean what I can from it.
(Following up comment #44) Note that the above stack was taken on OS X 10.7.5. It may be slightly different on OS X 10.6.8 (where all the crashes have been happening).
(In reply to Steven Michaud from comment #43)
> > FlashPlayer users CoreAnimation to render its image to the window by default. When the gpu is blacklisted due to driver bugs with CoreAnimation (as the Intel GMA950 and x3100 are), or when the user explicitly disables hardware acceleration through the settings UI, then FlashPlayer will fall back to using the less efficient CoreGraphics api.
>
> This is at least partly false: Testing with Flash 11.5.502.110 in current trunk code (on OS X 10.7.5), Flash still uses Invalidating Core Animation when you explicitly disable hardware acceleration.
Sorry, I was imprecise. I'm a manager, not an engineer :) ICA is still used, but we use CG rather than OpenGL to present the frame to the screen. You can use CG with CA; it's just that the OpenGL-based presentation API is faster.
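Taken together with comment #43, the behaviour described here can be summarized as a small decision function. This is a sketch of the thread's description, not Adobe's code: `FlashEnv` and `flash_render_path` are hypothetical names, and it assumes the browser offers Invalidating Core Animation, as current Firefox does.

```python
from dataclasses import dataclass


@dataclass
class FlashEnv:
    gpu_blacklisted: bool    # e.g. Intel GMA 950 / X3100, per this thread
    hw_accel_enabled: bool   # "Enable Hardware Acceleration" in Flash settings


def flash_render_path(env):
    """Return (drawing model, frame presentation API) as described above."""
    drawing_model = "Invalidating Core Animation"   # used in either case
    if env.gpu_blacklisted or not env.hw_accel_enabled:
        return drawing_model, "CoreGraphics"        # the CGContextDrawImage() path
    return drawing_model, "OpenGL"                  # the faster default path
```

On the GPUs implicated in this bug, the sketch always selects the CoreGraphics presentation path, which is consistent with CGContextDrawImage() appearing in the crash stacks.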
I'm no longer so sure the stack from comment #44 matches our crash stacks. I'll keep digging over the next few days. Sigh.
(In reply to comment #46)
> ICA is still used, but we use CG to do the presentation of the frame to the screen, rather than OpenGL. You can use CG with CA, its just that the OGL based presentation api is faster.
Thanks for the info. This makes a lot more sense. Which means that we're no closer to figuring out how to reproduce the code path on which this bug's crashes happen :-(
Target Milestone: --- → mozilla19
Did my patch here have any effect?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #51) > Did my patch here have any effect? No. See https://crash-stats.mozilla.com/report/list?version=Firefox%3A19.0a1&range_value=4&range_unit=weeks&signature=CGSConvertRGBX8888toRGBA8888
(In reply to comment #13)
> Removing needURLs, we don't get any URLs for these types of crashes.
That's puzzling. Does anyone know why not? Also, does anyone know why there aren't any comments associated with these crashes? Is it just that, because they're plugin crashes, users don't see the Breakpad UI?
(In reply to Steven Michaud from comment #53)
> Is it just that, because they're plugin crashes, users don't see the Breakpad UI?
Yes, you only get to see the comment box on browser crashes.
Chris Nuuja: does Flash use its NSView's size and/or position anywhere in its drawing code? E.g. to size its drawing buffers? Steven Michaud: one thing you could try is to log the NSView size/position changes we send to the plugin, alongside the draw events we send to the plugin and the SetWindow calls (both window rect and clip rect) we send to the plugin, in builds before and after bug 626245, and see if there are any obvious differences in timing or in the values we send. I wonder if we're using a zero size in more places and causing something to crash for that reason.
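The comparison suggested above could start as simply as counting zero-sized geometry per event type in logs from a build before and a build after bug 626245. A sketch, assuming the logging has already been reduced to records of a hypothetical `GeometryEvent` shape (event kind, width, height); clip rects and positions could be handled the same way.

```python
from collections import Counter
from typing import NamedTuple


class GeometryEvent(NamedTuple):
    kind: str    # hypothetical labels such as "nsview", "setwindow", "draw"
    width: int
    height: int


def zero_size_counts(events):
    """Count events of each kind whose width or height is zero."""
    return Counter(e.kind for e in events if e.width == 0 or e.height == 0)


def compare_builds(before, after):
    """Map each event kind to (zero-size count before, zero-size count after)."""
    b, a = zero_size_counts(before), zero_size_counts(after)
    return {kind: (b[kind], a[kind]) for kind in sorted(set(b) | set(a))}
```

A kind whose count rises from zero in the "before" log would be the first place to look when testing the zero-size hypothesis.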
Flags: needinfo?(cnuuja)
Anyone know of a better disassembler than gdb? One that does a better job tracing cross-references within the code of FlashPlayer-10.6? I've tried "Hopper Disassembler" (http://www.hopperapp.com/), and it does appear to be better at this than gdb. But it's not good enough. I've been able to use Hopper to identify the internal method from which CGContextDrawImage() is called when this bug's crashes happen (at least I think so). But it won't tell me this method's callers (probably because it's called through one or more levels of indirection). By the way, the method I found is at offset 0x56d590 in the 64-bit part of FlashPlayer-10.6 in the current version (11.5.502.110) of the Flash plugin.
By the way, it'd probably help a great deal if I could get access to the raw offsets in FlashPlayer-10.6 in Socorro's stacks, instead of the obfuscated F_... symbols that it displays. And no, these aren't in the "raw dump".
(Following up comment #58) Better still, it'd be nice to have the algorithm that Socorro uses to generate the F_... symbols. (No, I haven't been able to figure out how 0x56d590 gets translated into F_1290421835____________________________________.)
Socorro doesn't generate the F symbols, Adobe does it to obfuscate their actual symbols.
Do we have any record of the raw addresses (for Socorro stacks) in parts of the Flash plugin like FlashPlayer-10.6?
Sorry for the delay, I've been on vacation. We don't allocate the buffers based on the NSView size/position, but garbage values for the size/position could well cause us to crash. Not sure what Steven is after regarding who calls CGContextDrawImage(). Is this based on crash dump analysis? Or do we have a reproducible case? Knowing the call stack within Flash is probably not useful if the problem is due to incorrect NSView size/position values. If we knew what the size/position was at the time of the crash and what they should be, we could force the incorrect values in the debugger on the Flash side and attempt to reproduce the crash that way.
Flags: needinfo?(cnuuja)
> Not sure what Steven is after regarding who calls > CGContextDrawImage(). Is this based on crash dump analysis? Yes, partly. I'm still trying (as I've been for weeks) to learn as much as possible about the code path on which the crashes happen. With better disassemblers, I may be able to learn more. > or do we have a reproducible case? No we don't. (And if we did it'd have been mentioned earlier in this bug.) > Knowing the call stack within Flash is probably not useful if the > problem is due to incorrect NSView size/position values. It's just a guess that these crashes have anything to do with NSView sizes and positions.
I don't think we'd have garbage sizes/positions, but the size might be zero at slightly different times to before. (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #55) > Steven Michaud: one thing you could try is to log the NSView size/position > changes we send to the plugin, alongside the draw events we send to the > plugin and the SetWindow calls (both window rect and clip rect) we send to > the plugin, in builds before and after bug 626245, and see if there are any > obvious differences in timing or in the values we send. Steven, can you do this?
(In reply to Steven Michaud from comment #59)
> Better still, it'd be nice to have the algorithm that Socorro uses to generate the F_... symbols.

Socorro doesn't generate them, Adobe does. They give us obfuscated symbol files. Maybe we need to work with Adobe there. Benjamin is already in contact with them about some Windows Flash stuff, so I'm CCing him.
(Bah, and between a load of bugmail and trying to go fast through all of them I of course didn't spot the comments from Benjamin and Chris Nuuja of Adobe, which make my comment #65 completely superfluous...)
>> Steven Michaud: one thing you could try is to log the NSView size/position changes we send to the plugin, alongside the draw events we send to the plugin and the SetWindow calls (both window rect and clip rect) we send to the plugin, in builds before and after bug 626245, and see if there are any obvious differences in timing or in the values we send.
>
> Steven, can you do this?

I frankly think this is very unlikely to help. But I'll try to find time for it later this week, or early next week.
(In reply to comment #65) > They give us obfuscated symbol files. These files, if I can see them, should tell me what I need to know. For example they should show what raw address corresponds to F_1290421835____________________________________.
For what it's worth, here are the raw addresses corresponding to the obfuscated F_... symbols in this bug's x86_64 mode crash stacks (at least for the current version of the Flash plugin, which is 11.5.502.110). I found them by using gdb to get the CFRunLoopSourceContext.perform target for one of the Flash plugin's calls to CFRunLoopSourceCreate(), then looking for "calls" at the appropriate offsets of this method and those called by it, down the stack chain. The names "sub_xxxxxx()" contain the hex offsets of these methods in the x86_64 part of the FlashPlayer-10.6 binary. sub_4ff740() and sub_56d590() aren't normally called, at least on OS X 10.7.5, even when Flash hardware acceleration is turned off. But sub_4ff940() is. I'll see what more I can find out, especially on OS X 10.6.8. (For the record, I also did this for the i386 mode crash stacks. I'll post my results if anyone is interested.)
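The "sub_xxxxxx()" naming above is simply a function's hex offset within the x86_64 slice of FlashPlayer-10.6. As a minimal sketch (the helper names and the example load address are hypothetical), converting between an absolute address seen in a debugger and such a name is a subtraction or addition against the slice's load address:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers: convert between an absolute code address and a
 * disassembler-style "sub_<hex offset>" name, given the address at which
 * the binary (e.g. FlashPlayer-10.6's x86_64 slice) was loaded. */

/* Writes "sub_<offset>" into out. Returns the snprintf() result, or -1
 * if the address lies below the load base. */
int format_sub_name(uint64_t address, uint64_t load_base,
                    char *out, size_t out_len)
{
    if (address < load_base)
        return -1;
    return snprintf(out, out_len, "sub_%" PRIx64, address - load_base);
}

/* Parses "sub_<hex>" (an optional trailing "()" is accepted) and stores
 * the absolute address load_base + offset. Returns 0 on success. */
int parse_sub_name(const char *name, uint64_t load_base, uint64_t *address)
{
    if (strncmp(name, "sub_", 4) != 0)
        return -1;
    char *end = NULL;
    unsigned long long offset = strtoull(name + 4, &end, 16);
    if (end == name + 4)
        return -1; /* no hex digits after the prefix */
    if (*end != '\0' && strcmp(end, "()") != 0)
        return -1; /* trailing junk */
    *address = load_base + (uint64_t)offset;
    return 0;
}
```

Keeping names as offsets rather than absolute addresses makes them comparable across runs, since the load address can differ from one run to the next.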
This is Adobe 3373629
Summary: OSX Flash player crash in F_1290421835 @ CGSConvertRGBX8888toRGBA8888 with Intel GMA 950/X3100 GPUs → [adbe 3373629] OSX Flash player crash in F_1290421835 @ CGSConvertRGBX8888toRGBA8888 with Intel GMA 950/X3100 GPUs
I can reproduce this on an older Mac - it happens every time in Yahoo mail. Using Version: 11.5.502.110, but it also happened using 11.4.x. Running Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:18.0) Gecko/18.0 Firefox/18.0 which is the first 18 beta.
Keywords: reproducible
QA Contact: mozillamarcia.knous
> I can reproduce this on an older Mac - it happens every time in Yahoo mail. What video hardware are you using on this Mac? Is it one of those mentioned in comment #17? Is there a quick and easy way to get a Yahoo mail account? Do you by any chance have a testing account?
Marcia, please at least tell me what video hardware you're using on your "older Mac", and whether or not you've disabled Flash's hardware acceleration.
Steven: here is the information about the graphics card in this older Mac. It matches what is in comment 17. You can sign up for a Yahoo mail account on their site.

Graphics
Vendor ID: 0x8086
Device ID: 0x2a02
WebGL Renderer: Intel Inc. -- Intel GMA X3100 OpenGL Engine -- 2.0 APPLE-1.6.36
GPU Accelerated Windows: 0
AzureBackend: quartz

In this instance Flash hardware acceleration is pref'd on.
Thanks Marcia. Next week I'll try pretending to the Flash plugin that I have one of the "bad" kinds of graphics cards (presuming I can figure out how Flash determines what graphics hardware it's working with). > it happens every time in Yahoo mail Please expand on this.
Please also confirm whether or not you have Flash-specific hardware acceleration turned on or off. You can see this by right-clicking on a Flash movie (while it's running) and choosing "Settings".
(Following up comment #76) Sorry, please ignore that question. You've already told me the answer: > In this instances Flash hardware acceleration is pref'd on.
When the main Yahoo mail page loads, each time I look at the top of the page there already seems to be a notification that the Flash plugin has crashed. From what I have seen, I don't think anything in particular triggers the crash, but I will look into it further.

(In reply to Steven Michaud from comment #75)
> Thanks Marcia. Next week I'll try pretending to the Flash plugin that I have one of the "bad" kinds of graphics cards (presuming I can figure out how Flash determines what graphics hardware it's working with).
>
> > it happens every time in Yahoo mail
>
> Please expand on this.
Hooray Marcia! Now we need to get a developer in front of your Mac to try to figure out what's going on ... hopefully to figure out what the offending parameters to CGContextDrawImage are, at least. I'd also be interested in finding out what the plugin's latest SetWindow was, and what size and position its NSView are. jschoenick is in Mountain View. Perhaps he can help?
roc: It's actually my mother's Mac, which is not located in Mountain View. I will only be near the machine today, so I can probably help troubleshoot remotely if you let me know what you need. This crash has moved up in Beta, and in the advanced Socorro query it is now the #11 top crash overall (not just on Mac).
Backing out bug 626245 would fix this crash.
I've closed this bug on our side. If you need additional action from us, please let me know.
> I've closed this bug on our side. What does this mean? That you've fixed it?
No, it means that it's our bug and Adobe doesn't need to track it.
We shall see.
(Following up comment #75) > Next week I'll try pretending to the Flash plugin that I have one of > the "bad" kinds of graphics cards (presuming I can figure out how > Flash determines what graphics hardware it's working with). I've got the Flash plugin partially fooled into thinking it's running on the Intel X3100 graphics card (by hooking CGLDescribeRenderer() and altering its return value). But this makes the Flash plugin fall back to using the CoreGraphics drawing model (and makes it call CGContextDrawImage() in stacks identical to the one in comment #23, which doesn't match the crash stacks). So there's probably some other method I also need to hook. I'll continue working on this tomorrow.
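The spoofing trick described above amounts to interposing on a renderer-query function: call the real implementation, then rewrite the answer for one property. A minimal, platform-neutral model in C follows. It is not the actual spoof patch; `real_describe()` stands in for CGLDescribeRenderer(), and the property codes and renderer IDs are placeholders rather than real CGL constants.

```c
#include <stdbool.h>

/* Placeholder property codes and renderer IDs (NOT real CGL values). */
enum { PROP_RENDERER_ID = 1, PROP_OTHER = 2 };
#define FAKE_REAL_RENDERER_ID    0x1000 /* placeholder: the actual GPU   */
#define FAKE_SPOOFED_RENDERER_ID 0x2000 /* placeholder: the "X3100" GPU  */

typedef int (*describe_fn)(int prop, int *value);

/* Stand-in for the real OS query; 0 means success. */
static int real_describe(int prop, int *value)
{
    *value = (prop == PROP_RENDERER_ID) ? FAKE_REAL_RENDERER_ID : 42;
    return 0;
}

/* The host controls which implementation the hook forwards to. */
static describe_fn original_describe = real_describe;
static bool spoof_enabled = true;

/* The hook: forward the call, then alter only the renderer-ID answer. */
int hooked_describe(int prop, int *value)
{
    int err = original_describe(prop, value);
    if (err == 0 && spoof_enabled && prop == PROP_RENDERER_ID)
        *value = FAKE_SPOOFED_RENDERER_ID;
    return err;
}
```

Only the renderer-ID query is altered; every other property passes through unchanged.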
(In reply to Scoobidiver from comment #81) > Backing out bug 626245 would fix this crash. We can't do that. That's a really bad bug. (In reply to Steven Michaud from comment #86) > I've got the Flash plugin partially fooled into thinking it's running > on the Intel X3100 graphics card (by hooking CGLDescribeRenderer() and > altering its return value). I don't think this is a good use of your time. Let's just buy Marcia's mom's machine or one just like it.
Or find another one. I made a post to dev-platform to see if anyone listening there has one.
> I don't think this is a good use of your time. I think I'm the best judge of that. > Let's just buy Marcia's mom's machine or one just like it. Best of all would be if either the Intel GMA 950 or the Intel X3100 can be extracted and installed in another computer. Then we could either buy the card alone, or buy the computer and extract the card, and send it to me (or to whoever's going to be working on this bug).
We have plenty of Mac Minis with those specs. Please ping me for credentials so you can get access to mine here in the MV office where I was able to reproduce this Flash crash. This machine is running 10.6.8, Flash 11.4.402.265. Loading Yahoo! mail does it. https://crash-stats.mozilla.com/report/index/bp-bea4aad3-018a-4b02-a987-e78392121207
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #87)
> (In reply to Scoobidiver from comment #81)
> > Backing out bug 626245 would fix this crash.
> We can't do that. That's a really bad bug.

Crashing is worse (this bug and bug 816896 account for about 30% of Mac crashes), and you could back it out in 18.0 only, while it's not yet too late to land a big change. That will give you six more weeks to investigate.
(In reply to Scoobidiver from comment #92) > Crashing is worse (this bug and bug 816896 accounts for about 30% of Mac > crashes) and you can back it out only in 18.0 before it's too late to land a > big change. That will give you six more weeks to investigate. Bug 626245 is terribly ugly on Windows, which has a lot more users. It's also a regression in 18 (related to DLBI). And backing it out correctly would be very difficult at this point.
Crash Signature: [@ CGSConvertRGBX8888toRGBA8888] → [@ CGSConvertRGBX8888toRGBA8888] [@ sseCGSConvertXXXX8888Mask]
FYI my wife's MacBook has an X3100 if you need any more poking.
Thanks, Ted. I have access to Juan's machine, and that should be enough. > I've got the Flash plugin partially fooled into thinking it's running > on the Intel X3100 graphics card (by hooking CGLDescribeRenderer() and > altering its return value). What I've seen on Juan's machine tells me my spoofing was completely successful (I see exactly the same results on Juan's machine without spoofing as I do on mine with spoofing). So it's quite likely we'll soon be able to reproduce these crashes on any graphics hardware. I should know in the next few hours.
I'm able to reproduce these crashes (though the top's different) on OS X 10.7.5 with my spoofed build. More details later. For now I'm just commenting to save the Breakpad crash id, which is: bp-2cf2379d-5315-48db-a97b-805a42121207
(In reply to Steven Michaud from comment #96) > bp-2cf2379d-5315-48db-a97b-805a42121207 This one is bug 816896.
Keywords: testcase-wanted
Here's my spoof patch. And here's a tryserver build made from it: http://ftp-scl3.mozilla.com/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-751f743c4d1a/try-macosx64/firefox-20.0a1.en-US.mac.dmg It will actually spoof Intel X3100 hardware to any plugin (that uses CGLDescribeRenderer() to detect the hardware).
Attachment #689784 - Attachment description: Patch to spoof Intel X3100 harware to Flash plugin (*not* a fix) → Patch to spoof Intel X3100 hardware to Flash plugin (*not* a fix)
Crash stack from my spoof build running on OS X 10.6.8: bp-1b36ce67-44f2-485a-8edb-9a2832121207

Note that these crashes are virtually 100% reproducible with the following steps:

1) Visit http://mail.yahoo.com/
2) Log in to your account
3) Wait 10-15 seconds

At the top of the browser page (below the page chrome) you should see a yellow tab telling you that "the Adobe Flash plugin has crashed".
This is a Flash bug. The proof follows. I've already thought of ways we might work around it (which I'll try out next week). But it'd be really nice for Adobe to acknowledge this bug and fix it themselves.

In CoreGraphics drawing mode (which the Flash plugin uses when it detects Intel GMA 950 or Intel X3100 graphics hardware), a CGContextRef is provided to the plugin during a call to NPP_HandleEvent(NPCocoaEventDrawRect). This CGContextRef is only guaranteed to be valid during that call, so using it at any other time is inviting trouble. It turns out this bug's crashes (and those of bug 816896) happen when a CGContextRef is accessed (by Flash code) outside of any call to NPP_HandleEvent(NPCocoaEventDrawRect). Furthermore, the particular CGContextRef that's accessed is associated with a plugin instance that's just been deleted!

My log pretty much speaks for itself, so I won't do a detailed commentary. Note, though, that if you try to reproduce what I've done, you'll get a more intelligible log if you turn off jemalloc (jemalloc loves to allocate new objects of a particular type (like a CGContextRef) at the address of a same-type object that's just been deleted). To do this, run FF from the command line, and beforehand enter "export NO_MAC_JEMALLOC=1". (The crashes happen with or without jemalloc.)

In my next comment I'll post the debugging patch that I used to generate this log.

I still don't know why the patch for bug 626245 triggered such a huge increase in the frequency of these crashes. But I suspect that it somehow caused plugin resources to be deleted more quickly.
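The contract described above (a CGContextRef that is only valid for the duration of NPP_HandleEvent(NPCocoaEventDrawRect)) can be illustrated with a toy model. This is a hypothetical sketch, not Firefox or Flash code: a generation counter stands in for the host recycling the context's backing store after a draw event, and the safe alternative for a self-driven render loop is to request another draw event (in NPAPI terms, a call to NPN_InvalidateRect()) rather than draw through a cached context.

```c
#include <stdbool.h>
#include <stddef.h>

/* All names are hypothetical; this only models the contract. */
typedef struct {
    unsigned generation; /* bumped whenever the backing store is swapped */
} ToyContext;

typedef struct {
    const ToyContext *ctx;    /* what a misbehaving plugin might cache   */
    unsigned seen_generation; /* generation observed during the event    */
    bool needs_redraw;        /* what a well-behaved plugin caches instead */
    int draws;
} ToyPlugin;

/* Host side: deliver a draw event; drawing inside it is always safe. */
void host_draw_event(ToyContext *ctx, ToyPlugin *p)
{
    p->ctx = ctx;
    p->seen_generation = ctx->generation;
    p->needs_redraw = false;
    p->draws++;
    /* After the call the host may swap the backing store; this model
     * always does, to make the hazard visible. */
    ctx->generation++;
}

/* Plugin timer: drawing through the cached context would only be safe
 * if its data had not been swapped since the last draw event. */
bool plugin_timer_can_draw_directly(const ToyPlugin *p)
{
    return p->ctx != NULL && p->ctx->generation == p->seen_generation;
}

/* Safe alternative: ask the host for another draw event. */
void plugin_timer_request_redraw(ToyPlugin *p)
{
    p->needs_redraw = true; /* host would then call host_draw_event() */
}
```

In this model a timer-driven frame never draws directly; it only sets `needs_redraw`, and the actual drawing happens inside the next draw event.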
(Following up comment #100)

The CGContextRef accessed by the crashing call to CGContextDrawImage() doesn't itself seem to have been deleted. This is shown (I think) by the fact that a call on it to CGContextGetType() succeeds (returns the correct value and doesn't crash). Instead I think the data to which it points has been deleted. That a CGContextRef is of type 4 means that it is a bitmap context (created by a call to CGBitmapContextCreate() or CGBitmapContextCreateWithData()).
I've started a tryserver build, which should be available in a few hours.
Just noticed that my log (towards the end) shows Firefox calling Flash's NPP_HandleEvent(NPCocoaEventDrawRect) with what must be a (somehow) "bad" CGContextRef (one associated with a plugin instance that's just been deleted). So though this is clearly a Flash bug (since Flash uses the CGContextRef outside of any call to NPP_HandleEvent(NPCocoaEventDrawRect)), we appear also to be doing something wrong. I'll look more closely into this next week.
Mozilla-central seems to have gotten busted (by patches for bug 769288). So my tryserver build may take longer than I expected.
(In reply to Steven Michaud from comment #103) > Just noticed that my log (towards the end) shows Firefox calling > Flash's NPP_HandleEvent(NPCocoaEventDrawRect) with what must be a > (somehow) "bad" CGContextRef (one associated with a plugin instance > that's just been deleted). Are you sure that's not a new CGContextRef allocated at the same address?
It's definitely the same CGContextRef. But now I also know that's not a problem. I'll give a fuller explanation on Monday.
(Following up comment #107)

In CoreGraphics drawing mode (when the plugin is running out-of-process) we use a CALayer subclass (CGBridgeLayer) to implement it. When -[CGBridgeLayer display] is called (via -[CGBridgeLayer displayIfNeeded]), this triggers a call from the OS to -[CGBridgeLayer drawInContext:(CGContextRef)], which (indirectly) calls NPP_HandleEvent(NPCocoaEventDrawRect) in the plugin.

In normal practice, at least, -[CGBridgeLayer drawInContext:(CGContextRef)] is always called with the same CGContextRef, even after one plugin has been destroyed and another instantiated. Here's how this happens.

In the process of setting up for each call to -[CGBridgeLayer drawInContext:], the OS calls CAGetCachedCGBitmapContext() (a non-exported method in the QuartzCore framework). The first time this method is called, CGBitmapContextCreate() is called and the result is stored to the head of context_list (__ZL12context_list, another unexported symbol), which appears to be a linked list. Subsequent calls to CAGetCachedCGBitmapContext() always (at least under normal circumstances) return this same "cached" CGContextRef.

Puzzling, isn't it? Each new plugin will (in principle) require a different-sized bitmap context, which should (in principle) require a new CGContextRef. CAGetCachedCGBitmapContext() gets around this problem by calling CGBitmapContextSetData() on the "cached" CGContextRef, each time with new data. CGBitmapContextSetData() is an undocumented method whose signature, as best I can tell, is CGBitmapContextSetData(CGContextRef c, unsigned int bitsPerRow, unsigned int bytesPerComponent, unsigned int width, unsigned int height, void *data). I've been trying in gdb to find out when the "data" gets deleted, so far without any luck.

There's also a CAReleaseCachedCGContext(), which gets called after each call to -[CGBridgeLayer drawInContext:]. This doesn't appear to release the "data". And of course it doesn't normally release the "cached" CGContextRef.
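A toy model of this caching pattern may help explain the earlier observation that CGContextGetType() still succeeds on the context used in the crashing call. The sketch is hypothetical: its function names echo CAGetCachedCGBitmapContext() and CGBitmapContextSetData(), but those real functions are undocumented and their exact behavior is unknown. It shows one long-lived context object whose pixel buffer is swapped on every draw, so a stale reference still passes a type check while its data is gone.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a bitmap context object. */
typedef struct {
    int type;            /* stays "bitmap" for the object's whole life */
    size_t width, height;
    unsigned char *data; /* swapped on every draw */
} ToyBitmapContext;

enum { TOY_BITMAP_CONTEXT_TYPE = 4 };

static ToyBitmapContext *cached_context; /* created once, then reused */

/* Return the cached context, attaching a fresh buffer sized for this draw. */
ToyBitmapContext *toy_get_cached_context(size_t width, size_t height)
{
    if (cached_context == NULL) {
        cached_context = calloc(1, sizeof *cached_context);
        if (cached_context == NULL)
            return NULL;
        cached_context->type = TOY_BITMAP_CONTEXT_TYPE;
    }
    unsigned char *buf = calloc(width * height, 4); /* 4 bytes per pixel */
    if (buf == NULL)
        return NULL;
    cached_context->width = width;
    cached_context->height = height;
    cached_context->data = buf; /* the "SetData" step */
    return cached_context;
}

/* After the draw the buffer goes away, but the context object does not.
 * Here the pointer is cleared so the staleness is observable; in the
 * real crash it would presumably just dangle. */
void toy_release_cached_context(ToyBitmapContext *ctx)
{
    free(ctx->data);
    ctx->data = NULL;
}
```

Anyone holding the context pointer between draws sees the same object and the same type, but not the pixels it was given during the draw.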
I'm happy to see we're (or Steven is) understanding better and better what's going on here. Do we have any clue yet where the actual bug is and who needs to solve it (and/or how)? And if that's Apple or Adobe, if we can work around the problem reasonably?
> Do we have any clue yet where the actual bug is It's quite clear this is Adobe's bug. See comment #100 and comment #103. I'm reasonably confident I'll be able to find a workaround, but that may take a few days.
(In reply to Steven Michaud from comment #110) > It's quite clear this is Adobe's bug. See comment #100 and comment #103. Thanks, wasn't sure if the new insights changed the view of that. Jeromie, can you pick this up again on your side so we get a fix in future Flash versions? > I'm reasonably confident I'll be able to find a workaround, but that may > take a few days. OK, sounds good! Maybe we get both that and a long-term fix from Adobe! :)
We don't think this is a Flash bug; that context should remain valid between calls. The player drives its own update/rendering loop, as well as responding to requests from the browser from NPP_HandleEvent. Updates do not necessarily originate from the browser host. We need to render outside of NPP_HandleEvent.

We update the screen according to the content's authored framerate, as well as when ActionScript requests a screen update programmatically. It's fairly common for content to change its framerate dynamically. In addition, video updates at the frequency the media is encoded at (24fps, 30fps, 60fps). To maintain timing/sync we need to manage our top-level event loop ourselves; we can't be driven by NPP_HandleEvent.

"This CGContextRef is only guaranteed to be valid during that call, so using it at any other time is inviting trouble."

We can't find any documentation to that effect. If this is a new requirement you are suggesting, then we'll need a new API to get a valid CGContextRef outside of NPP_HandleEvent.

(In reply to Steven Michaud from comment #103)
> Just noticed that my log (towards the end) shows Firefox calling Flash's NPP_HandleEvent(NPCocoaEventDrawRect) with what must be a (somehow) "bad" CGContextRef (one associated with a plugin instance that's just been deleted).
>
> So though this is clearly a Flash bug (since Flash uses the CGContextRef outside of any call to NPP_HandleEvent(NPCocoaEventDrawRect)), we appear also to be doing something wrong. I'll look more closely into this next week.
> "This CGContextRef is only guaranteed to be valid during that call, so using it at any other time is inviting trouble." We can't find any documentation to that effect.

After a quick look, I haven't been able to turn up any documentation, either. But it's only an accident that a CGContextRef passed in a call to NPP_HandleEvent() has ever been usable outside that call -- in either CoreGraphics mode or the CoreAnimation modes. I suspect the same is true of other browsers. At some point when I have more time (which may not happen for many months), I'll try to write a test plugin to determine how other browsers behave in a similar case.

Josh, Benoit, what do you think? Do you know if this issue has ever come up before?

Fortunately this is only a real issue with the CoreGraphics drawing model. In the CoreAnimation models the browser can, in principle, sometimes call NPP_HandleEvent(NPCocoaEventDrawRect), but I think this is quite rare. And in any case it'd be completely unnecessary for the plugin to draw to that call's CGContextRef outside of the call -- in CoreAnimation mode a plugin can draw to its CALayer at any time.

In the meantime I'll continue looking for a workaround. It'll probably be quite hacky, but I don't think we have any other option. I don't think it'd be reasonable for us to try to guarantee that a CGContextRef passed with a call to NPP_HandleEvent(NPCocoaEventDrawRect) is usable outside that call.

> then we'll need a new API to get a valid CGContextRef outside of NPP_HandleEvent.

This might work. But again it'd only be necessary for CoreGraphics mode -- which most major plugin vendors no longer use, and which even Flash only uses under unusual circumstances.
> We update the screen according to the content's authored framerate, as well as when ActionScript requests a screen update programmatically. It's fairly common for content to change its framerate dynamically. In addition, video updates at the frequency the media is encoded at (24fps, 30fps, 60fps). To maintain timing/sync we need to manage our top-level event loop ourselves; we can't be driven by NPP_HandleEvent.

Chris, please give me examples of this (URLs). I want to test them in CoreGraphics and CoreAnimation modes in Firefox, and also in other browsers.
Another thing, Chris. You can see from my log from comment #100 that the Flash plugin refuses twice to draw the flashing plugin during successive calls to NPP_HandleEvent(NPCocoaEventDrawRect), before finally trying to draw it from its event loop source. Also note that this plugin is 1 pixel high and 1 pixel wide. I wonder why Flash refuses to draw this plugin during calls to NPP_HandleEvent(NPCocoaEventDrawRect). If we knew why, it might be easier to come up with a workaround.
> flashing plugin

crashing plugin
> I still don't know why the patch for bug 626245 triggered such a > huge increase in the frequency of these crashes. But I suspect that > it somehow caused plugin resources to be deleted more quickly. It's now quite clear that this bug has nothing to do with deleted plugin instances: It happens even if the 1x1 pixel plugin is the only plugin on the page. So now I have *no* idea why the patch for bug 626245 increased the frequency of these crashes, and am beginning to wonder *if* it did so.
> and am beginning to wonder *if* it did so I'm now bisecting to find out for sure.
Just tried FF 17.0.1 with my patch from comment #102. This bug's crash doesn't happen. But, though Flash also uses CoreGraphics with FF 17.0.1 and my spoofing code, it never calls CGContextDrawImage() from its event loop source (i.e. from outside a call to NPP_HandleEvent(NPCocoaEventDrawRect)).
>> and am beginning to wonder *if* it did so
>
> I'm now bisecting to find out for sure.

Yes, it turns out that the increase in frequency of these crashes was triggered by one of the patches for bug 626245:

http://hg.mozilla.org/mozilla-central/rev/be09855c0f5c
Bug 626245. Part 4: Compute plugin widget geometry updates via the refresh driver's painting, and defer actual widget updates until we've just composited the window. r=mats
author Robert O'Callahan <robert@ocallahan.org>
Sun Oct 07 02:03:23 2012 +1300 (at Sun Oct 07 02:03:23 2012 +1300)

I'll try to figure out why it was this patch, in particular, that did it.
Hmm, does bug 785348 fix this by any chance? (just landed on m-c)
> Hmm, does bug 785348 fix this by any chance? (just landed on m-c) Nope :-(
Attached file Reduced testcase (obsolete) —
Here's a reduced testcase. This still (of course) requires one of my spoofed builds (unless you're lucky/unlucky enough to have Intel GMA 950 or X3100 hardware). I'll try to reduce it still further.
This is as far as I can reduce it. Some interesting facts:

1) The plugin needs to be in an absolute-positioned div, and the div's top needs to be a fairly large number of pixels above the top of the page. Get rid of the div, or stop it being absolutely positioned, or make its top a small enough number of pixels above the top of the page, and the bug doesn't happen. This is puzzling -- Flash apparently knows about the div, but how? (I'd guess Flash learns this from NPNVPluginElementNPObject, NPNVWindowNPObject and/or NPNVdocumentOrigin.)

2) The 'width' and 'height' of the plugin need to be quite small (either in percentages or in numbers of pixels). Make them big enough and the bug doesn't happen, even if the plugin doesn't display.

When the crash doesn't happen, most of the time all calls from the Flash plugin to CGContextDrawImage() happen inside calls to NPP_HandleEvent(NPCocoaEventDrawRect). But if you get 'width' and 'height' just right (small but not too small), you get a single call to CGContextDrawImage() from Flash's run loop source (i.e. from outside any call to NPP_HandleEvent()) that *doesn't* crash.

CGContextDrawImage() is definitely crashier when called from outside NPP_HandleEvent(). But the fact that it doesn't always crash suggests that this bug's crashes aren't a simple case of accessing deleted memory.
Attachment #690933 - Attachment is obsolete: true
Comment on attachment 691016
Testcase, further reduced

> http://mirrors.creativecommons.org/reticulum_rex/cc.remixculture.101906.swf

I put this in my testcase for two reasons:

1) It's publicly accessible.
2) It has sound, so it's easier to tell whether or not it's crashed.
When a plugin instance is positioned offscreen, Firefox tells the plugin to stop painting by sending an NPP_SetWindow with a clipRect of 0,0,0,0. Did you keep a log of the SetWindow calls in this case, to see whether this is being sent?
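On the plugin side, honoring this signal might look like the following minimal sketch. It is hypothetical: Flash's actual logic is not known, and `ToyNPRect` merely mirrors the field order of NPAPI's NPRect (top, left, bottom, right) so the example compiles without npapi.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors NPRect's field layout; redeclared to stay self-contained. */
typedef struct {
    uint16_t top, left, bottom, right;
} ToyNPRect;

/* A clip rect with no area (e.g. 0,0,0,0) is the host's way of saying
 * the instance is not visible and should not paint. */
bool clip_rect_is_empty(const ToyNPRect *clip)
{
    return clip->right <= clip->left || clip->bottom <= clip->top;
}

/* Hypothetical gate for a self-driven render loop: paint only while the
 * most recent NPP_SetWindow() clip rect has area. */
bool plugin_should_paint(const ToyNPRect *last_clip)
{
    return last_clip != NULL && !clip_rect_is_empty(last_clip);
}
```

A plugin that drives its own frame timing would check a gate like this before each frame, rather than only at draw events.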
> Did you keep a log of the SetWindow calls in this case, to see if this is > being sent? Oops, I forgot to do that. I'll do it, then post a revised version of my patch from comment #102.
Here's another crash trace, one that logs calls to NPP_SetWindow(). I made it using my testcase from comment #125. Very interestingly, the calls to NPP_SetWindow() from PluginInstanceChild::UpdateWindowAttributes() and PluginInstanceChild::EnsureCurrentBuffer() pass different information. The call from UpdateWindowAttributes() passes the "correct" clipRect (0,0,0,0). But the calls from EnsureCurrentBuffer() pass an "incorrect" clipRect (0,0,200,5). And both calls from EnsureCurrentBuffer() happen after the one from UpdateWindowAttributes(). This could be the key to the problem! I'll try my new logging patch against FF 17.0.1.
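The inconsistency spotted above by reading the log can also be checked mechanically. A minimal sketch, assuming a hypothetical record format (caller name plus clipRect in NPRect field order); the values used to exercise it are illustrative, not taken from the attached log. It flags the first call that gives the plugin a non-empty clipRect right after a different caller has given it an empty one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one logged NPP_SetWindow() call. */
typedef struct {
    const char *caller;                /* e.g. "UpdateWindowAttributes" */
    uint16_t top, left, bottom, right; /* clipRect, NPRect field order  */
} SetWindowRecord;

static bool record_clip_empty(const SetWindowRecord *r)
{
    return r->right <= r->left || r->bottom <= r->top;
}

/* Returns the index of the first call that re-exposes the plugin after a
 * *different* caller hid it, or -1 if there is no such call. */
long find_conflicting_setwindow(const SetWindowRecord *log, size_t n)
{
    const char *hidden_by = NULL;
    for (size_t i = 0; i < n; i++) {
        if (record_clip_empty(&log[i])) {
            hidden_by = log[i].caller;
        } else if (hidden_by != NULL && strcmp(hidden_by, log[i].caller) != 0) {
            return (long)i;
        } else {
            hidden_by = NULL; /* visible, with no conflicting hide pending */
        }
    }
    return -1;
}
```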
Here's the patch I used to make the log from comment #129. I've started a tryserver build.
Attachment #689784 - Attachment is obsolete: true
Attachment #689784 - Attachment is obsolete: false
Attachment #690004 - Attachment is obsolete: true
> This could be the key to the problem! No such luck. FF 17.0.1 also (sometimes) sends the wrong clipRect.
An engineer here at Adobe tried to reproduce the crash on http://mirrors.creativecommons.org/reticulum_rex/cc.remixculture.101906.swf and could not. He tried both disabling HW acceleration in the Flash settings panel and using a custom build where FlashPlayer always thinks it's on an Intel X3100 (we actually have a method internally called HasIntelGMA9xxOrX31xx). Neither attempt reproduced the crash. He even tried release builds (i.e. not debuggable builds) on the theory that it was timing-related. We would like to help with additional info, but until we can reproduce it, it's difficult.
Chris, I can't give you any more information than is already in this bug. In order to see these crashes on graphics hardware other than Intel GMA 950 or X3100, you will (of course) need my spoof patch. Have you tried loading my testcase from comment #125? For me it repros these crashes 100% of the time.
Also note that you should test with the latest Flash version. With older Flash versions, FF forces the user to click on Flash plugins to make them run. This behavior stops this bug's crashes from being reproducible, since they only happen with invisible plugins.
Attached patch Fix (obsolete)
Here's a fix for this bug (really a workaround). It actually makes the CGContextRef passed with a call (to the Flash plugin) to NPP_HandleEvent(NPCocoaEventDrawRect) "usable" outside that call, though in a very narrow sense -- it just doesn't trigger a crash. I've limited my fix to the Flash plugin, and to CoreGraphics mode. Tomorrow I'll further limit it to versions of OS X no higher than 10.8.X (or perhaps 10.7.X). We might in principle have continued trying to figure out how to persuade Flash not to try to access our CGContextRef outside of any call to NPP_HandleEvent(NPCocoaEventDrawRect) -- as it was generally doing before the patch for bug 626245. But my patch's approach is much simpler and more efficient. I've started a tryserver build, which should be available in a few hours. It will (unfortunately) only be useful for people who have Intel GMA 950 or X3100 graphics hardware. So I've also started two more tryserver builds -- one with my fix plus my spoofing patch, and the other with my fix plus my spoofing patch plus my logging patch. I'll post those patches in my next two comments.
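The scope described above (Flash only, CoreGraphics drawing model only) reduces to a simple predicate. This is a hedged sketch of such a gate, not the actual patch. `PluginInfoSketch`, `DrawingModelSketch` and `ShouldApplyCGContextQuirk` are hypothetical stand-ins for whatever plugin-identification and drawing-model state the real code checks. NPAPI itself names these models with constants such as NPDrawingModelCoreGraphics and NPDrawingModelCoreAnimation.

```cpp
// Hypothetical drawing-model enum standing in for NPAPI's
// NPDrawingModelCoreGraphics / NPDrawingModelCoreAnimation / etc.
enum class DrawingModelSketch {
  CoreGraphics,
  CoreAnimation,
  InvalidatingCoreAnimation
};

// Hypothetical summary of the plugin instance being painted.
struct PluginInfoSketch {
  bool isFlash;              // true if the instance is the Flash plugin
  DrawingModelSketch model;  // drawing model the plugin negotiated
};

// The workaround applies only to Flash running in CoreGraphics mode.
constexpr bool ShouldApplyCGContextQuirk(PluginInfoSketch p) {
  return p.isFlash && p.model == DrawingModelSketch::CoreGraphics;
}

static_assert(ShouldApplyCGContextQuirk(
                  PluginInfoSketch{true, DrawingModelSketch::CoreGraphics}),
              "Flash + CoreGraphics gets the quirk");
```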
Assignee: roc → smichaud
Status: NEW → ASSIGNED
I've tested my patches with spoofing on OS X 10.6.8, 10.7.5 and 10.8.2, without seeing any problems. Tomorrow I'll test the fix without spoofing on Juan's machine. I'll post the tryserver builds as soon as they're available.
Scoobidiver, I've got a question for you (or for anyone who knows quite a lot about getting information from Socorro): This bug's crashes are reproducible on OS X 10.8.2 with my spoofed build from comment #98. But I'd expect the number to be very small, because Intel GMA 950 and X3100 hardware is quite old, and the machines that have it are therefore unlikely to be running Mountain Lion (or even to be able to run Mountain Lion). Are you able to check? Note that the signatures are quite different on OS X 10.8 than they are on either 10.6 or 10.7. Here are some examples (from testing with my spoofed build):

bp-0416505f-cff8-4abe-a937-946ff2121212
bp-4abf9026-cd0a-460a-a147-5eb102121212
bp-3a13f533-6482-4266-aef7-6d0fb2121212

The reason I ask is that my fix makes assumptions about undocumented behavior of the OS. So we should probably limit it to those versions of OS X where we know it works. The question is whether or not we should include 10.8 inside the limit. My patch works just fine on OS X 10.6.X, 10.7.X and 10.8.X. Knowing Apple's past behavior with respect to undocumented APIs that are used by their own apps and frameworks, it's unlikely Apple will do anything to break my patch even in future major releases. But there *is* a chance: Apple sometimes does make major changes to their undocumented APIs in new major releases. (For example, I generally had to make a new release of my Java Embedding Plugin, which used lots of undocumented APIs, every time Apple did a new major release of OS X.)
The different signatures for the crashes listed in comment 143 are very likely because Breakpad lacks the symbol information for that version of CoreGraphics, so it's just giving you the offset signature. Of crashes with the actual signature "CGSConvertRGB...", 97% are with MacOS 10.6 and 3% are with MacOS 10.7 (you can see this on the signature summary page here: https://crash-stats.mozilla.com/report/list?signature=CGSConvertRGBX8888toRGBA8888). Your three reports are the only three for those offsets specifically. The other crashes with "CoreGraphics@someoffset" in the signature (this link: https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&platform=mac&range_value=2&range_unit=weeks&date=12%2F13%2F2012+17%3A45%3A27&query_search=signature&query_type=contains&query=CoreGraphics&reason=&build_id=&process_type=any&hang_type=any&do_query=1 ) all seem to be from 10.6 and 10.7. But since adoption of 10.8 is pretty small in general, we may not be getting enough data to be sure.
Thanks, Benjamin! Did you include the crash stacks from bug 816896 in your analysis? That bug's really a dup of this one, as can be seen from comment #96 and comment #97.
(In reply to Steven Michaud from comment #143)
> This bug's crashes are reproducible on OS X 10.8.2 with my spoofed build
> from comment #98. But I'd expect the number to be very small, because Intel
> GMA 950 and X3100 hardware is quite old, and the machines that have it are
> therefore unlikely to be running Mountain Lion (or even to be able to run
> Mountain Lion).

For the last four weeks, there have been no crashes on Mac OS X 10.8 for the signatures of this bug and bug 816896 (see https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&platform=mac&range_value=4&range_unit=weeks&query_search=signature&query_type=startswith&query=CoreGraphics&process_type=plugin&hang_type=crash&plugin_field=filename&plugin_query_type=exact&do_query=1) except those you pointed out in comment 143. In addition, OS X 10.8 doesn't support Intel GMA 950 or X3100 integrated graphics.
Attached patch Fix rev2
Thanks, Scoobidiver! I've gone ahead and limited my patch to OS X 10.7.X and below.
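Limiting the workaround to OS X 10.7.X and below comes down to a plain version comparison. This is a minimal sketch. It assumes the caller already has the OS X major and minor version numbers (this bug doesn't show how Gecko obtains them). `IsOSXAtMost10_7` is a hypothetical helper name.

```cpp
// Returns true for OS X 10.7.x and anything older (10.6.x, 10.5.x, ...),
// false for 10.8 (Mountain Lion) and later.
constexpr bool IsOSXAtMost10_7(int major, int minor) {
  return major < 10 || (major == 10 && minor <= 7);
}

static_assert(IsOSXAtMost10_7(10, 6), "Snow Leopard is covered");
static_assert(!IsOSXAtMost10_7(10, 8), "Mountain Lion is excluded");
```

Keeping the OS check as its own predicate means the limit could later be widened (for example, to include 10.8) without touching the workaround itself.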
Attachment #691659 - Attachment is obsolete: true
Attachment #691885 - Flags: review?(bgirard)
Crash Signature: [@ CGSConvertRGBX8888toRGBA8888] [@ sseCGSConvertXXXX8888Mask] → [@ CGSConvertRGBX8888toRGBA8888] [@ sseCGSConvertXXXX8888Mask] [@ CoreGraphics@0x3167c0]
Comment on attachment 691885 [details] [diff] [review]
Fix rev2

Review of attachment 691885 [details] [diff] [review]:

It's a shame having to add a Flash quirk rather than getting a proper fix for this. I feel like a better workaround would be to keep a global CGContextRef clipped and sized 1x1 (or even 0x0 if possible) and send a dummy event after the real paint. I'm assuming here that Flash tries to access the last CGContextRef it's seen. The patch here is fine. I think we could find a more elegant solution, but the downside of waiting is too high.
Attachment #691885 - Flags: review?(bgirard) → review+
> I fell like a better work around would be to keep a global
> CGContextRef clipped and sized 1x1 (or even 0x0 if possible) and
> send a dummy event after the real paint.

I think that's too likely to confuse Flash. My workaround is definitely a hack. But it's a very safe one, and has the strong advantage that it doesn't change the information we send to Flash. A real fix for this bug can only come from Adobe.
Whiteboard: [leave open]
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Steven Michaud from comment #152)
> Landed on mozilla-inbound:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/fefbcfe3575a

If this is low risk, please nominate it for uplift to aurora/beta so that we can get it into our fifth beta, which we're going to build tomorrow.
Comment on attachment 691885 [details] [diff] [review]
Fix rev2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Flash bug made significantly worse by patches for bug 626245.
User impact if declined: This bug accounts for about 30% of Mac crashes
Testing completed (on m-c, etc.): Some hand testing by myself, none yet by others.
Risk to taking this patch (and alternatives if risky): Low to moderate risk (with more testing I'd have said "low risk")
String or UUID changes made by this patch: none
Attachment #691885 - Flags: approval-mozilla-beta?
Attachment #691885 - Flags: approval-mozilla-aurora?
Juanb, Marcia: can we please get some testing around this, so we have as much feedback as possible before we take this in for beta 5, considering the risk noted in comment 155? Thanks!
Attachment #691885 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Keywords: qawanted, verifyme
Juan and Marcia, in your testing please test that my patch fixes the bug, and also that it doesn't cause other problems. For the second part, I'd say it's enough to test a few of your favorite Flash sites. If it's not convenient to do some of this testing on Intel GMA 950 or X3100 hardware, I'd say it's fine to use my spoofed build from comment #149. Do at least test on that hardware to confirm the bug is fixed, though. (I've already tested on Juan's machine, and could no longer reproduce the bug.)
I tested with the two tryserver builds in Comment 148 and the Yahoo mail crash is no longer there. Using Flash Version: 11.5.502.136 and using a mac mini that has GMA 950. Will test nightly tomorrow when the fix is in, or perhaps a daily tinderbox build if I have time today.
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:19.0) Gecko/20121218 Firefox/19.0

I tried to reproduce the Flash plugin crash on the latest Aurora build but had no success: the Flash plugin didn't crash. I tried the steps from comment #99. I also loaded the testcase from comment #126, watched videos on YouTube (even from a playlist), played Flash games on Facebook and other sites, and visited Flash sites.

Machine: Mac mini, 1.5 GHz Intel Core Solo, Intel GMA 950

Graphics (from about:support):
Device ID: 0x27a2
GPU Accelerated Windows: 1/1 OpenGL
Vendor ID: 0x8086
WebGL Renderer: Intel Inc. -- Intel GMA 950 OpenGL Engine
AzureCanvasBackend: quartz
AzureContentBackend: none
AzureFallbackCanvasBackend: none
Thanks Mihaela! It's *very* strange this hasn't yet landed on the trunk. If need be I'll land it myself.
> It's *very* strange this hasn't yet landed on the trunk. If need be I'll land > it myself. Oops. Missed comment #153.
Comment on attachment 691885 [details] [diff] [review]
Fix rev2

Approving for beta, considering the testing we have had recently. Requesting that QA do as much testing as possible around this bug before the beta 5 release.
Attachment #691885 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:18.0) Gecko/20100101 Firefox/18.0 - beta 5 Build id 20121219074241 Verified also on Firefox 18 beta 5, as described in comment #160.
Verified fixed for all Desktop branches.
Status: RESOLVED → VERIFIED
Keywords: qawanted, verifyme
As best I can tell, this bug's crashes have completely disappeared from Socorro on the 19 and 20 branches (including those originally for bug 816896). I see none in builds dated later than 2012-12-17.
Target Milestone: mozilla19 → mozilla20
Product: Core → Core Graveyard