Closed Bug 637367 Opened 14 years ago Closed 14 years ago

Firefox 4.0b12 crash [@ DrawPlugin ]

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- Macaw+
status2.0 --- .1-fixed

People

(Reporter: marcia, Assigned: smichaud)

Details

(Keywords: crash, topcrash, Whiteboard: [approved-patches-landed])

Crash Data

Attachments

(3 files, 3 obsolete files)

Seen while reviewing crash stats. http://tinyurl.com/4atn9ct links to the crashes which are all Mac. So far fairly low volume with 98 crashes since we shipped on Friday and ranks as #2 top Mac crash. Frame Module Signature [Expand] Source 0 XUL DrawPlugin 1 XUL mozilla::layers::ImageLayerOGL::RenderLayer gfx/layers/opengl/ImageLayerOGL.cpp:473 2 XUL mozilla::layers::ContainerLayerOGL::RenderLayer gfx/layers/opengl/ContainerLayerOGL.cpp:297 3 XUL mozilla::layers::ContainerLayerOGL::RenderLayer gfx/layers/opengl/ContainerLayerOGL.cpp:297 4 XUL mozilla::layers::LayerManagerOGL::Render gfx/layers/opengl/LayerManagerOGL.cpp:611 5 XUL mozilla::layers::LayerManagerOGL::EndTransaction gfx/layers/opengl/LayerManagerOGL.cpp:427 6 XUL mozilla::layers::LayerManagerOGL::EndEmptyTransaction gfx/layers/opengl/LayerManagerOGL.cpp:401 7 XUL PresShell::Paint layout/base/nsPresShell.cpp:6173 8 XUL nsViewManager::RenderViews view/src/nsViewManager.cpp:458 9 XUL nsViewManager::Refresh view/src/nsViewManager.cpp:424 10 XUL nsViewManager::DispatchEvent view/src/nsViewManager.cpp:925 11 XUL HandleEvent view/src/nsView.cpp:161 12 XUL nsChildView::DispatchEvent widget/src/cocoa/nsChildView.mm:1807 13 XUL nsChildView::DispatchWindowEvent widget/src/cocoa/nsChildView.mm:1817 14 XUL -[ChildView drawRect:inContext:] widget/src/cocoa/nsChildView.mm:2874 15 XUL -[ChildView drawRect:] widget/src/cocoa/nsChildView.mm:2780 16 AppKit AppKit@0x100cf8 17 AppKit AppKit@0xfbf6a 18 AppKit AppKit@0x73c5ff 19 AppKit AppKit@0xfb81f 20 Foundation Foundation@0x16bd9 21 AppKit AppKit@0x801aff 22 AppKit AppKit@0xfe4ce 23 libSystem.B.dylib libSystem.B.dylib@0xa2d8 24 libSystem.B.dylib libSystem.B.dylib@0xa2d8 25 AppKit AppKit@0x753197 26 AppKit AppKit@0x2379a 27 CoreFoundation CoreFoundation@0xbc24 28 CoreFoundation CoreFoundation@0x104c7 29 CoreFoundation CoreFoundation@0xfcd6 30 CoreFoundation CoreFoundation@0xfb2e 31 CoreFoundation CoreFoundation@0x24814 32 CoreFoundation CoreFoundation@0x24688 33 Foundation Foundation@0x14dbb 34 AppKit AppKit@0xfee59 35 libSystem.B.dylib libSystem.B.dylib@0xa2d8 36 AppKit AppKit@0x753197 37 AppKit AppKit@0x2379a 38 CoreFoundation CoreFoundation@0xbc24 39 CoreFoundation CoreFoundation@0x13e1a7 40 CoreFoundation CoreFoundation@0xfcd6
It is #1 top crasher on Mac OS X in 4.0b12.
blocking2.0: --- → ?
Keywords: topcrash
Almost assuredly blocking; bsmedberg/roc/josh - can you see if this is obviously caused by a change in b12?
Discussed in triage today - bsmedberg speculatively assigned this to matt, I'm just making it so.
Assignee: nobody → matt.woodrow+bugzilla
blocking2.0: ? → final+
Whiteboard: [hardblocker]
Attached patch Add a few null checks — — Splinter Review
I haven't been able to reproduce the crash, so taking a guess here. The crash address in the linked report is 0x50, in a debug build (x64) mInstanceOwner is 0x60 bytes into nsObjectFrame. Assuming this 16 byte difference is due to debugging code, then it's likely that aObjectFrame was null and we are crashing inside UpdateImageLayer with |this| == null.
Attachment #516009 - Flags: review?(roc)
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
mostly seems to affect 10.6.3 and beyond with highest numbers on 10.6.6. os verions mar1-3 1 Mac OS X 10.6.3 10D573 2 Mac OS X 10.6.4 10F569 4 Mac OS X 10.6.5 10H574 147 Mac OS X 10.6.6 10J567 1 Mac OS X 10.7.0 11A390
That means: our users keep up to date, and this only affects OOPP.
not to much in comments. a couple that indicate clicking on links at google analytics
there is a tiny residual crash on 3.[0-6].x with this signature, and the new volume regression seems to have gone undetected in 4.0.x builds until beta 12 was released. here are counts of all the reports with version an build id info since the beginning of the year. 1 3.0.15 2009101600 1 3.0.5 2008120121 1 3.0.7 2009021906 1 3.6 20100115132715 7 3.6.13 20101203074432 298 4.0b12 20110222205430 2 4.0b13pre 20110302030357
(In reply to comment #10) > That means: our users keep up to date, and this only affects OOPP. actually about 1/2 our mac users stay up-to-date if crash counts across mac users are an indication of the distribution of versions in use. crash counts for mar1-2 18071 Mac OS X 10.6 9445 Mac OS X 10.5 7143 Mac OS X 10.4 67 Mac OS X 10.7 1 Mac OS X 10.9
I tried to repro this with my Google Analytics account but no luck so far. I emailed someone from Mozilla who crashed and asked if they could give me STR.
What is the volume of this crash now?
About 100 crashes per day, it's the #1/#2 topcrash *on mac*, ~70 overall. It seems to correlate pretty highly with having multiple cores, but that may be an accident. I'm currently wondering if something within the rendering stack is being re-entered by plugin code, which modifies the DOM at an unsafe time during painting. If so, there are two possibilities: 1) We are resolving RPC races incorrectly. This has caused topcrashes in the past, but we haven't made any changes to the race resolution or IPDL protocols involved with mac drawing in any of the recent changes. 2) The plugin is just making unsafe calls. There's nothing we can do about this except implement async painting on mac as we do on other platforms.
> What is the volume of this crash now? It was happening only in Beta 12, after the patch landing, it happens in nightlies: 20110302: 2 20110301: 0 (patch) 20110228: 0 20110227: 0 20110226: 0
Drivers decided to move this to .x based on previous comments. Please renominate if it spikes up to many many more crashes.
blocking2.0: final+ → .x+
Whiteboard: [hardblocker] → [hardblocker][approved-patches-landed]
Whiteboard: [hardblocker][approved-patches-landed] → [approved-patches-landed]
IMO, this should probably hardblock the final release of 4.0 and not be deferred to a .x release. I encounter it in the first RC1 candidate build around once every-other-day on my Mac (dual core). It is a rather annoying, invasive issue that probably shouldn't be inflicted on uses in a general release.
Nick, we haven't been able to reproduce this. What are you doing at the time of the crash? Are you navigating to or away from a page with plugins? Any particular page?
A couple of crash stacks show these crashes taking place at the folllowing line: http://hg.mozilla.org/releases/mozilla-2.0/annotate/5f8f494a4c29/layout/generic/nsObjectFrame.cpp#l5870 So we should try null-checking mInstance at the top of nsPluginInstanceOwner::DoCocoaEventDrawRect(). Or maybe we should null-check both mInstance and mObjectFrame, as nsPluginInstanceOwner::Paint() does. https://crash-stats.mozilla.com/report/index/f447eada-9345-4931-b3e5-32e642110311 https://crash-stats.mozilla.com/report/index/001e6ef3-7f69-4ea4-b83f-2c12b2110311
Steven, can you put up your proposed patch today? I'd like to get something landed on trunk to verify that it fixes the crashes so we can potentially include it in a RC2 if that happens.
Assignee: matt.woodrow+bugzilla → smichaud
No problem. It should take me another hour or so.
Attached patch Potential fix — — Splinter Review
Attachment #518830 - Flags: review?(benjamin)
Attachment #518830 - Flags: review?(benjamin) → review+
Comment on attachment 518830 [details] [diff] [review] Potential fix I'd like to get this trivial null-check on trunk now so that we can check crash-stats over the weekend for possible inclusion in RC2, if that happens.
Attachment #518830 - Flags: approval2.0?
Attachment #518830 - Flags: approval2.0? → approval2.0+
Comment on attachment 518830 [details] [diff] [review] Potential fix https://crash-stats.mozilla.com/report/index/b4f1ed60-15ab-46e3-a8cf-874f62110313 Oops, my patch doesn't seem to have fixed these crashes. As this stack shows, it seems that nsPluginInstanceOwner::DoCocoaEventDrawRect() is being called on a deleted nsPluginInstanceOwner instance. I'm not sure what we can do about this (without being able to reproduce the crashes), though I'll take a closer look tomorrow ... if no-one else beats me to it.
Attached patch Another potential fix (obsolete) — — Splinter Review
> As this stack shows, it seems that > nsPluginInstanceOwner::DoCocoaEventDrawRect() is being called on a > deleted nsPluginInstanceOwner instance. Looking through the code, I don't see how an nsObjectFrame object could have a non-zero mInstanceOwner that's also invalid (since mInstanceOwner holds a strong reference). So now I think it's the nsObjectFrame object itself that's been deleted. Here's a patch that should, I think, prevent nsObjectFrame::UpdateImageLayer() from being called (from DrawPlugin and MacIOSurfaceImageOGL::Update()) on a deleted nsObjectFrame object.
Attachment #519289 - Flags: review?(benjamin)
Attachment #519289 - Flags: review?(benjamin) → review?(roc)
Comment on attachment 519289 [details] [diff] [review] Another potential fix I've thought of a better way to do this. New patch coming up.
Attachment #519289 - Attachment is obsolete: true
Attachment #519289 - Flags: review?(roc)
Attached patch Another potential fix, rev1 (clean up) (obsolete) — — Splinter Review
Attachment #519429 - Flags: review?(roc)
Why do you need the DestroyCallback? A simpler way to do this would probably be to let the DrawPlugin callback data be a *strong* reference to the nsPluginInstanceOwner. Then if DrawPlugin gets called after the nsObjectFrame has been destroyed, its mObjectFrame will be null and we can bail out.
> Why do you need the DestroyCallback? I was afraid there might be several calls to the UpdateCallback from the MacIOSurfaceImage created in nsPluginInstanceOwner::SetCurrentImage(), or none at all. In that case I wouldn't be able to delete the reference passed by SetUpdateCallback() in DrawPlugin(). This is true regardless of what kind of strong reference I pass in SetUpdateCallback() (whether it's an nsWeakFrame or an nsPluginInstanceOwner). Now that I look again, I see that SetCurrentImage() is called from nsObjectFrame::UpdateImageLayer() (which creates a new MacIOSurfaceImage). Which I guess means there will be at most one call to the UpdateCallback from any MacIOSurfaceImage. But in principle it might still be possible that the UpdateCallback will never be called -- in which case the strong reference passed in SetUpdateCallback() would be leaked. Is it guaranteed that there's exactly one call to the UpdateCallback from every MacIOSurfaceImage on which SetUpdateCallback() is called?
> A simpler way to do this would probably be to let the DrawPlugin > callback data be a *strong* reference to the nsPluginInstanceOwner. I could do this, but I need to know where I can release this strong reference. So I need to know your response to my previous comment.
Oh right, so you do need the destroy callback. Sorry! The other part of what I said still holds, use a strong pointer to the nsPluginInstanceOwner instead of an nsWeakFrame. Instead of adding a destroy callback, the best approach might be to add a LayerUserDataSet to Image so we can have user-data on Images. Then the strong reference can be held by a LayerUserData attached to the Image and will automatically be removed when the Image dies. But I'll understand if you don't want to make that change.
(and I don't have any other need for Image user-data right now.)
> Instead of adding a destroy callback, the best approach might be to > add a LayerUserDataSet to Image so we can have user-data on > Images. Then the strong reference can be held by a LayerUserData > attached to the Image and will automatically be removed when the > Image dies. But I'll understand if you don't want to make that > change. I'm completely unfamiliar with that code, so it should probably be someone else who does this. > The other part of what I said still holds, use a strong pointer to > the nsPluginInstanceOwner instead of an nsWeakFrame Will do. New patch coming up.
Attachment #519429 - Attachment is obsolete: true
Attachment #519518 - Flags: review?(roc)
Attachment #519429 - Flags: review?(roc)
Comment on attachment 519518 [details] [diff] [review] Another potential fix, rev2 (follow Roc's suggestion) This looks good, but in nsPluginInstanceOwner::PrepareToStop, please clear out the callbacks on the image and unref 'this', to make sure we don't leak.
> This looks good, but in nsPluginInstanceOwner::PrepareToStop, please > clear out the callbacks on the image and unref 'this', to make sure > we don't leak. I don't understand. Won't this bring back the original bug? Won't OnDestroyImage() always be called from the MacIOSurfaceImageOGL destructor, and won't this prevent any leaks?
Maybe I should call aContainer->SetCurrentImage(nsnull) from PrepareToStop()?
Oops, I see this is already happening.
(In reply to comment #40) > Maybe I should call aContainer->SetCurrentImage(nsnull) from PrepareToStop()? That won't remove the callbacks on the image though, if there happens to be a reference to the image keeping it alive. (In reply to comment #39) > > This looks good, but in nsPluginInstanceOwner::PrepareToStop, please > > clear out the callbacks on the image and unref 'this', to make sure > > we don't leak. > > I don't understand. Won't this bring back the original bug? No, because if the callbacks have been cleared, we won't try to use a dangling nsObjectFrame pointer. > Won't OnDestroyImage() always be called from the MacIOSurfaceImageOGL > destructor, and won't this prevent any leaks? Not if there's a cycle involving the image and the pluginInstanceOwner. (I actually don't think there is, but I want to make sure it doesn't matter.)
OK. Come to think of it, clearing the image's update callback in nsPluginInstanceOwner::PrepareToStop() should be enough by itself to fix these crashes: It should stop DrawPlugin ever being called after the nsObjectFrame it wants to reference has been deleted. I can get rid of the destroy callback and my other changes. I'll prepare a new patch that does only this. How does that sound to you?
(Following up comment #43) Now I see why my suggestion won't work -- there might have been many other images created besides the one that happens to be current when PrepareToStop() is called. I'll just add calls to clear the callbacks in PrepareToStop() to my previous patch.
How's this?
Attachment #519518 - Attachment is obsolete: true
Attachment #519548 - Flags: review?(roc)
Attachment #519518 - Flags: review?(roc)
- oglImage->SetCallback(&DrawPlugin, mObjectFrame); + NS_ADDREF_THIS(); + oglImage->SetUpdateCallback(&DrawPlugin, this); + oglImage->SetDestroyCallback(&OnDestroyImage); Ooh, I just realized that we need to be sure to only addref once per image here. So only do this block if the image doesn't already have a destroy-callback.
+ oglImage->SetUpdateCallback(nsnull, nsnull); + oglImage->SetDestroyCallback(nsnull); + // If we have a current image here, its destructor hasn't yet been + // called, so OnDestroyImage() can't yet have been called. So we need + // to do ourselves what OnDestroyImage() would have done. + NS_RELEASE_THIS(); And it's probably worth checking here and only doing this block if there is a destroy-callback set. Then the invariant is --- if there's a destroy-callback, we need to release 'this'.
> Ooh, I just realized that we need to be sure to only addref once per > image here. That's already taken care of -- every call to nsIPluginInstance_MOZILLA_2_0_BRANCH::GetImage() creates a new image. > And it's probably worth checking here and only doing this block if > there is a destroy-callback set. I assume that the only way for an image to have been created and made current is via a call to nsPluginInstanceOwner::SetCurrentImage(). So in every case a destroy-callback would have already been set.
Comment on attachment 519548 [details] [diff] [review] Another potential fix, rev3 (follow Roc's suggestion) ok!
Attachment #519548 - Flags: review?(roc) → review+
Comment on attachment 519548 [details] [diff] [review] Another potential fix, rev3 (follow Roc's suggestion) > ok! Thanks. Whew :-) I'd like to get this on the trunk reasonably soon, so that we can find out if it does actually fix this bug -- which is after all the #1 topcrasher on the Mac. The patch isn't trivial. But it's quite simple, and should be low-risk.
Attachment #519548 - Flags: approval2.0?
>> Ooh, I just realized that we need to be sure to only addref once >> per image here. > > That's already taken care of -- every call to > nsIPluginInstance_MOZILLA_2_0_BRANCH::GetImage() creates a new > image. I've added a comment to this effect to nsPluginInstanceOwner::SetCurrentImage(), which I'll include when I land my patch.
Nick: If you can give us the info in Comment 20 as well as the flash version you are using, that would be great. Our QA team has not yet been able to reproduce the crash. (In reply to comment #19) > IMO, this should probably hardblock the final release of 4.0 and not be > deferred to a .x release. I encounter it in the first RC1 candidate build > around once every-other-day on my Mac (dual core). It is a rather annoying, > invasive issue that probably shouldn't be inflicted on uses in a general > release.
Also exactly what steps you're doing at the time of the crash: opening a new tab, closing a tab, are there multiple youtube videos playing, or whatever is actually happening.
If anyone does figure out how to reproduce these crashes, here's a tryserver build to test with, made using my rev3 patch (the one Roc r+ed): http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/smichaud@pobox.com-0183b48015a3/tryserver-macosx64/firefox-4.0b13pre.en-US.mac.dmg
I believe I now have some STR this bug, but so far I have only been able to generate a crash once using the latest trunk: 1. Download and install http://www.adobe.com/products/contribute/. The key seems to be Adobe Contribute Toolbar 6.0 true {01A8CA0A-4C96-465b-A49B-65C46FAD54F9} which gets installed as an addon into Firefox as part of this install. You will have to register and install the trial version. 2. Click on this link: http://consumerist.com/2011/03/debtor-turns-entire-suv-into-anti-bofa-collage.html https://crash-stats.mozilla.com/report/index/bp-72f232c1-a1a6-4fd6-b844-44cb02110316 is my report.
The second time I crashed I got a different stack, which is the one from Bug 636707. In this case I clicked on a link on consumerist.com in the "Related posts" section to generate the crash - https://crash-stats.mozilla.com/report/index/bp-49aa6d03-b11a-4462-a829-e96212110316.
I just reproduced the crash on a different machine. This time my steps were similar to Comment 55, except I did the following: 1. Load http://consumerist.com 2. In the Adobe Contribute toolbar, select "Edit in Contribute" - I had the program running in the background at the time. 3. Click on a "Related Posts" link. 4. Crash.
Can we get Steven's patch on the trunk? That way we can see if at least the trunk crashes go away. I still haven't been able to reproduce this crash consistently.
Comment on attachment 519548 [details] [diff] [review] Another potential fix, rev3 (follow Roc's suggestion) a=beltzner to land on mozilla-central to see if it repairs the issue
Attachment #519548 - Flags: approval2.0? → approval2.0+
Comment on attachment 519548 [details] [diff] [review] Another potential fix, rev3 (follow Roc's suggestion) Landed on trunk: http://hg.mozilla.org/mozilla-central/rev/4da037fe27b0 It should take 2-3 days to tell whether or not this helps.
Comment on attachment 519548 [details] [diff] [review] Another potential fix, rev3 (follow Roc's suggestion) I also needed to land a breakage patch :-( http://hg.mozilla.org/mozilla-central/rev/59f8f339653f
(Following up comment #61) For Windows and Linux.
Of late I had more luck crashing after installing Test Pilot 1.1 and simply clicking links on http://consumerist.com/. Using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b13pre) Gecko/20110318 Firefox/4.0b13pre. https://crash-stats.mozilla.com/report/index/bp-d0c28bde-aeca-4052-9419-dfe402110318 is one of the reports. Adding Jono to the bug.
There haven't been any of these crashes since my latest patch landed. So it looks like this bug is fixed!
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
I just hit this crash using Firefox 4.0 final. Is this patch supposed to be in the final release?
No, it did not make it.
I would like to try and get this in the first .x release if possible. For a Mac only crash, it's pretty high up on the overall list.
I'd be happy to land my latest patch on the 2.0.x branch (or its equivalent) -- once that's been created, and I have approval.
Steven, Marcia -- have we tested Steven's patch on a profile with Test Pilot 1.1 installed?
I haven't.
This is in the top 15 for 4.0 even if it's only a mac crash. We have fixed it so I would like to see it in 4.0.1.
Technically my rev3 patch (the one that fixed this bug) already has approval2.0+, and there exists a mozilla-2.0 branch that seems live (http://hg.mozilla.org/releases/mozilla-2.0/). Should I just go ahead and land my patch there?
blocking2.0: .x+ → Macaw
Approved for mozilla2.0 repository, a=dveditz for release-drivers
I hit what looks like this same crash today with FF4 watching a video on youtube: https://crash-stats.mozilla.com/report/index/0dcb1b52-4be5-49d0-b9d1-0b15e2110428 Do you want a new bug or reopen this?
Benjamin tells me on irc that this is expected for 4.0 vs. 4.0.1, please ignore.
Crash Signature: [@ DrawPlugin ]
I see a similar crash on Firefox 5 beta in 64 bit mode. https://crash-stats.mozilla.com/report/index/bp-9a14c30b-1a52-4c08-a571-233df2110613
(In reply to comment #77) Your crash is unrelated (though it's true both problems happen under a call to nsObjectFrame.cpp's DrawPlugin()). Your bug is actually two bugs: 1) A plugin process appears to have hung or crashed. 2) The browser process has crashed while attempting to time out a failed call to the plugin process. To find related bugs, you need to search on "WriteMinidump" in the summary or in a comment. Just now I've found a bunch of seemingly related bugs with "WriteMinidump" in the summary, but I don't know if any of them is actually your bug. Take a look for yourself, and decide whether you want to open a new bug.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: