Closed Bug 788436 Opened 12 years ago Closed 12 years ago

Firefox 17 crash in nsGfxScrollFrameInner::ScrollToImpl

Categories

(Core :: Layout, defect)

17 Branch
x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla18
Tracking Status
firefox17 + verified
firefox18 + verified

People

(Reporter: scoobidiver, Assigned: tnikkel)

References

Details

(4 keywords)

Crash Data

Attachments

(1 file)

There's a spike in crashes from 18.0a1/20120831 making it #1 top crasher on Mac with 35% of all crashes. The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=706174d31a02&tochange=fcc533f691e9
It might be a regression from bug 786672 or bug 674477.

Some comments say:
"Ateempt two-finger scroll"
"HuffingtonPost front page!"

Signature 	nsGfxScrollFrameInner::ScrollToImpl More Reports Search
UUID	df4e7dcd-3a84-461d-ac4b-be07c2120905
Date Processed	2012-09-05 00:58:13
Uptime	1702
Last Crash	28.4 minutes before submission
Install Age	30.3 minutes since version was first installed.
Install Time	2012-09-05 00:27:45
Product	Firefox
Version	18.0a1
Build ID	20120904030512
Release Channel	nightly
OS	Mac OS X
OS Version	10.6.8 10K549
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 14 stepping 8
Crash Reason	EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash Address	0x0
App Notes 	
AdapterVendorID: 0x1002, AdapterDeviceID: 0x71c5
EMCheckCompatibility	True
Adapter Vendor ID	0x1002
Adapter Device ID	0x71c5

Frame 	Module 	Signature 	Source
0 	XUL 	nsGfxScrollFrameInner::ScrollToImpl 	nsGfxScrollFrame.cpp:2098
1 	XUL 	nsGfxScrollFrameInner::AsyncScrollCallback 	nsGfxScrollFrame.cpp:1676 

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsGfxScrollFrameInner%3A%3AScrollToImpl
https://crash-stats.mozilla.com/report/list?signature=nsGfxScrollFrameInner%3A%3AAsyncScrollCallback
Severity: normal → critical
Crash Signature: [@ nsGfxScrollFrameInner::ScrollToImpl] [@ nsGfxScrollFrameInner::AsyncScrollCallback] → [@ nsGfxScrollFrameInner::ScrollToImpl] [@ nsGfxScrollFrameInner::AsyncScrollCallback] [@ @0x0 | nsGfxScrollFrameInner::AsyncScrollCallback] [@ nsGfxScrollFrameInner::ScrollToWithOrigin] [@ @0x0 | nsGfxScrollFrameInner::ScrollToWithOrigin]
Any chance this bug is related to (dupe of) bug 753251? If so, we've been seeing this for a while and it looks like there's a new avenue of investigation here.
Yeah, bug 753251 look like possible, since the suspicious code is plugin stuff here.
I have no idea. I believe that my patch for bug 674477 touches only safe objects...
Yeah, I think so too. This is something else.
David, I doubt that  https://hg.mozilla.org/mozilla-central/rev/1bc40cb5b27c is responsible for this.  That patch only added code in the case of touch events, while these crashes are occurring on OSX.
Assignee: dzbarsky → nobody
Keywords: reproducible
I tried scrolling around on that page in the 9/10 nightly on my 10.6.8 machine and I didn't get a crash.
(In reply to Bill McCloskey (:billm) from comment #9)
> I'm running a 9/10 nightly on MacOS 10.6.8. I get a reproducible crash when
> doing any scrolling on this page:
Does it happen in Safe Mode?
It stops happening if I disable Flash. I'm running Flash version 11.4.402.265.
Hmm, I have the same version of Flash too (and it's enabled).
We AddScrollPositionListener/RemoveScrollPositionListener in nsPluginInstanceOwner::SetFrame. It looks like the nsPluginInstanceOwner is dieing after calling AddScrollPositionListener without calling RemoveScrollPositionListener for some reason.
If you need to test out a hypothesis or something, I can run a try build. This bug is super annoying. I basically crash when scrolling any page with Flash content on it.
Bill, what other plugins do you have installed?
There's a fairly strong correlation with the "DivX Plus Web Player HTML5" extension:

43% (6/14) vs.  20% (16/80) {23fcfd51-4958-4f00-80a3-ae97e717ed8b}
http://www.systemlookup.com/FF_Extensions/15-DivX_Plus_Web_Player_HTML5.html
The nsGfxScrollFrameInner::ScrollToImpl crashes occur in large numbers (over the last week) even in FF 14.0.1!  So if they're all the same crash, the spike isn't due to a change in our code.
I made a fresh profile and disabled every plugin except Flash. The crash still happens. Can having a plugin installed but disabled affect anything?

Here's another page where I crash every time when scrolling:
http://support.mozilla.org/en-US/kb/troubleshoot-firefox-issues-using-safe-mode
That page doesn't even have plugins. Do you have accessibility enabled in about:support?
(In reply to Timothy Nikkel (:tn) from comment #20)
> That page doesn't even have plugins. Do you have accessibility enabled in
> about:support?

Weird. I have zeroes for the accessibility fields. Here's the about:support data from my fresh profile:



  Application Basics

        Name
        Firefox

        Version
        18.0a1

        User Agent
        Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:18.0) Gecko/18.0 Firefox/18.0

        Profile Folder

          Show in Finder

        Enabled Plugins

          about:plugins

        Build Configuration

          about:buildconfig

        Crash Reports

          about:crashes

        Memory Use

          about:memory

  Extensions

        Name

        Version

        Enabled

        ID

  Important Modified Preferences

      Name

      Value

        browser.cache.disk.capacity
        1048576

        browser.cache.disk.smart_size.first_run
        false

        browser.cache.disk.smart_size.use_old_max
        false

        browser.cache.disk.smart_size_cached_value
        358400

        browser.places.smartBookmarksVersion
        4

        browser.startup.homepage_override.buildID
        20120911140351

        browser.startup.homepage_override.mstone
        18.0a1

        extensions.lastAppVersion
        18.0a1

        network.cookie.prefsMigrated
        true

        places.history.expiration.transient_current_max_pages
        53688

        privacy.sanitize.migrateFx3Prefs
        true

  Graphics

        Vendor ID
        0x1002

        Device ID
        0x71c5

        WebGL Renderer
        ATI Technologies Inc. -- ATI Radeon X1600 OpenGL Engine -- 2.0 ATI-1.6.36

        GPU Accelerated Windows
        0. Blocked for your graphics card because of unresolved driver issues.

        AzureCanvasBackend
        quartz

        AzureFallbackCanvasBackend
        none

        AzureContentBackend
        none

  JavaScript

        Incremental GC
        1

  Accessibility

        Activated
        0

        Prevent Accessibility
        0

  Library Versions

        Expected minimum version

        Version in use

        NSPR
        4.9.2
        4.9.2

        NSS
        3.13.6.0 Basic ECC
        3.13.6.0 Basic ECC

        NSS Util
        3.13.6.0
        3.13.6.0

        NSS SSL
        3.13.6.0 Basic ECC
        3.13.6.0 Basic ECC

        NSS S/MIME
        3.13.6.0 Basic ECC
        3.13.6.0 Basic ECC
(In reply to Steven Michaud from comment #18)
> The nsGfxScrollFrameInner::ScrollToImpl crashes occur in large numbers (over
> the last week) even in FF 14.0.1!  So if they're all the same crash, the
> spike isn't due to a change in our code.
I compared their ratios in each channel:
* 15.0.1:  1.5%
* 16.0b2:  1.0%
* 17.0a2: 32.5%  <-- drop support for OS X 10.5
* 18.0a1: 20.0%
The spike was uplifted to Aurora.
Based on comment 10, I don't think dropping the support for OS X 10.5 is fully responsible for the spikes. There's another cause.
No longer blocks: 780692
Summary: Firefox 18 crash in nsGfxScrollFrameInner::ScrollToImpl → Firefox 17 crash in nsGfxScrollFrameInner::ScrollToImpl
Version: 18 Branch → 17 Branch
I ripped out the plugin scroll position listener code to try to narrow down why Bill is crashing and made this try build http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-0db479a5eedb/try-macosx64/
(In reply to Timothy Nikkel (:tn) from comment #23)
> I ripped out the plugin scroll position listener code to try to narrow down
> why Bill is crashing and made this try build
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-
> 0db479a5eedb/try-macosx64/

I tried both pages with the build and didn't get any crashes.
Ok thanks for testing.

I made a new try build which makes the scroll position listeners noops (but they are still registered). It will be available in about an hour at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-c932bd93b2e8/
It occurred to me that I have a pretty old Intel Core Duo Mac that can only run 32-bit apps. Tim, are you able to reproduce if you run FF in 32-bit mode?
Yes! I can reproduce in 32 bit mode!
Well at least a crash, it had a different signature.
(In reply to Timothy Nikkel (:tn) from comment #28)
> Well at least a crash, it had a different signature.

Scratch that, it has the right signature.
nsPluginHost::InstantiateEmbeddedPluginInstance creates an nsPluginInstanceOwner, its event model is carbon by default. InstantiateEmbeddedPluginInstance then calls nsPluginInstanceOwner::SetFrame which registers the scroll position listener because the event model is carbon. InstantiateEmbeddedPluginInstance then calls SetUpPluginInstance which eventually results in us getting a setEventModel from the plugin which sets us to cocoa. Now that the event model is cocoa we never unregister the scroll position listener.
Attached patch patchSplinter Review
Assignee: nobody → tnikkel
Attachment #661352 - Flags: review?(joshmoz)
Comment on attachment 661352 [details] [diff] [review]
patch

Review of attachment 661352 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch! Happy to have this fixed.

BTW - I'm planning to remove all this code (scroll bar listening in plugins) pretty soon, when I remove Carbon NPAPI support entirely. Probably in the next release cycle.

::: dom/plugins/base/nsPluginInstanceOwner.cpp
@@ +779,5 @@
>  NS_IMETHODIMP nsPluginInstanceOwner::SetEventModel(int32_t eventModel)
>  {
>  #ifdef XP_MACOSX
> +#ifndef NP_NO_CARBON
> +  bool eventModelChange = (mEventModel != static_cast<NPEventModel>(eventModel));

Maybe just save the statically-casted event model variable at the top of the method and use it all three times instead of having three casts.

@@ +3698,5 @@
> +{
> +  // Our frame is changing or going away, unregister for a scroll position listening.
> +  // It's OK to unregister when we didn't register, so don't be strict about unregistering.
> +  // Better to unregister when we didn't have to than to not unregister when we should.
> +  if (mRegisteredScrollPositionListener && GetEventModel() == NPEventModelCarbon) {

Do you really need to check the event model here? If you know that you already registered a scroll position listener then you should probably unregister regardless of what the current event model is.
Attachment #661352 - Flags: review?(joshmoz) → review+
(In reply to Josh Aas (Mozilla Corporation) from comment #32)
> BTW - I'm planning to remove all this code (scroll bar listening in plugins)
> pretty soon, when I remove Carbon NPAPI support entirely. Probably in the
> next release cycle.

Yeah, I was aware (and looking forward to it), just didn't know how soon. This patch didn't take much time so still worth it.

> Maybe just save the statically-casted event model variable at the top of the
> method and use it all three times instead of having three casts.

Ok.

> Do you really need to check the event model here? If you know that you
> already registered a scroll position listener then you should probably
> unregister regardless of what the current event model is.

Good point. I changed the patch.
Had to back this out for more bustage.
https://hg.mozilla.org/integration/mozilla-inbound/rev/645486a66baf

PluginProcessChild.cpp
../../../../dom/plugins/base/nsPluginInstanceOwner.cpp:782:16: error: unused variable 'newEventModel' [-Werror,-Wunused-variable]
  NPEventModel newEventModel = static_cast<NPEventModel>(eventModel);
               ^
1 error generated.
make[6]: *** [nsPluginInstanceOwner.o] Error 1
make[5]: *** [plugins/base_libs] Error 2
make[5]: *** Waiting for unfinished jobs....
https://hg.mozilla.org/mozilla-central/rev/eeb9dd66ce64
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Comment on attachment 661352 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): not sure, there are crashes for this on all current Firefox versions, not sure if the volume changed
User impact if declined: crashes on common pages with plugins when using 32-bit version of Firefox on Mac.
Testing completed (on m-c, etc.): will be in the next nightly
Risk to taking this patch (and alternatives if risky): this patch should be safe, if there are any regressions we should notice them quickly on nightly
String or UUID changes made by this patch: none

requesting this now, although I'd like this to survive at least a day on nightlies before landing on aurora
Attachment #661352 - Flags: approval-mozilla-aurora?
Comment on attachment 661352 [details] [diff] [review]
patch

[Triage Comment]
Top crash fix, I can confirm that 15/16 are unaffected.
Attachment #661352 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
When I went to apply this patch to aurora I found out that the code had changed. Turns out that bug 598401 introduced the bug that this patch fixed by checking the event model before unregistering instead of just unregistering unconditionally. Bug 598401 is only on nightly so it makes no sense to uplift my patch anywhere.

The crashes on <= 17 must be a different issue.
Blocks: 598401
(In reply to Timothy Nikkel (:tn) from comment #41)
> The crashes on <= 17 must be a different issue.
The spike is gone after 17.0a2/20120920. The working range is:
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=eb69897f7d24&tochange=c039da3793f7
I think bug 790856 may have fixed it.
(In reply to Scoobidiver from comment #42)
> The spike is gone after 17.0a2/20120920.
I was mislead because of bug 793088. There are still crashes in 17.0a2/20120923 and above. If the level stays lower than previously, it will be considered as unaffected because bug 753251 tracks remaining crashes.
It still accounts for 35% of crashes in 17.0a2 over the last 3 days, so it's not bug 753251 that accounts for 1% in 15.0.1 and 1.5% in 16.0b4 for the same period.

Here are some comment in Aurora:
"Repro: https://twitter.com/MarsCuriosity Expand and collapse tweets until crash (happened twice on those with videos but also non-video tweets)."
"Trying to read an article here http://techland.time.com/2012/09/18/50-best-websites-2012/?iid=tl-main-lede#introduction-2 Has already crashed 5 or 6 times in a matter of less than 10 minutes! :~("
"trying to browse Amazon--all Amazon pages are slow to load & allow scrolling"
(In reply to Scoobidiver from comment #44)
> "Repro: https://twitter.com/MarsCuriosity Expand and collapse tweets until
> crash (happened twice on those with videos but also non-video tweets)."

It's pretty easy to reproduce following these steps, thanks!

The problem is that the plugin is in an iframe that gets removed from the document. When it gets removed we first disconnect the root view of the iframe (thus disconnecting the document) and then later we destroy all the frames in the iframe. When we destroy the frame for the plugin we get to nsPluginInstanceOwner::SetFrame with a new null frame and we remove scroll position listeners up the frame tree. But we've been disconnected from our parent document already so we can't unregister the scroll position listeners we have on scroll frames in the parent document.

The "disconnect root view then destroy frames" behaviour was introduced by bug 775965. Which explains the volume on the different versions.

Porting the patch from this bug would fix this type of crash for all plugins except Carbon plugins (but I think Carbon plugins are very rare nowadays). This is because we would unregister all scroll position listeners during initialization of the plugins when we change the event model away from Carbon. I'm not sure what a proper direct fix for this problem would be, so I think I'll just port this patch and then the Carbon crashes will be fixed when we drop support for Carbon plugins shortly.
Blocks: 775965
Crash Signature: [@ nsGfxScrollFrameInner::ScrollToImpl] [@ nsGfxScrollFrameInner::AsyncScrollCallback] [@ @0x0 | nsGfxScrollFrameInner::AsyncScrollCallback] [@ nsGfxScrollFrameInner::ScrollToWithOrigin] [@ @0x0 | nsGfxScrollFrameInner::ScrollToWithOrigin] → [@ nsGfxScrollFrameInner::ScrollToImpl] [@ @0x0 | nsGfxScrollFrameInner::ScrollToImpl] [@ nsGfxScrollFrameInner::AsyncScrollCallback] [@ @0x0 | nsGfxScrollFrameInner::AsyncScrollCallback] [@ nsGfxScrollFrameInner::ScrollToWithOrigin] [@ @0x0 | nsGfxS…
Please verify by checking Socorro.
Keywords: verifyme
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/17.0 Firefox/17.0

I verified the Socorro reports and I found 94 crash reports that are related with 2 of the signatures of this bug on Firefox 17 beta 1, Firefox 17 beta 2 and Firefox 17 beta 3.

The crash reports on Firefox 17 beta are:
[@ nsGfxScrollFrameInner::ScrollToImpl]
http://bit.ly/Svq4Rc - 17 Beta 1
http://bit.ly/Rsv3RI - 17 Beta 2
http://bit.ly/TWUsWr - 17 Beta 3

[@ @0x0 | nsGfxScrollFrameInner::ScrollToImpl]
http://bit.ly/Rvt0L8 - 17 Beta 1
http://bit.ly/SwMJvw - 17 Beta 2
http://bit.ly/PlBztE - 17 Beta 3

Is this expected in any way or should I file a separate bug?
(In reply to Simona B [QA] from comment #48)
> Is this expected in any way or should I file a separate bug?
It's bug 753251.
Verified the crash reports in Socorro for all the signatures mentioned in this bug and there are no other crash reports after the fix landed  (except for the ones from Comment 48 - Bug 753251 is filed for that matter). 

Based on Comment 49 - setting the tracking flag for Firefox 17 to VERIFIED.
QA Contact: simona.marcu
Verified in Socorro and there are no crashes on Firefox 18 beta for the signature mentioned in this bug.

Setting the tracking flag for Firefox 18 to VERIFIED.
(In reply to Simona B [QA] from comment #51)
> Verified in Socorro and there are no crashes on Firefox 18 beta for the
> signature mentioned in this bug.
There are still crashes in 18.0 Beta for this crash signature (see https://crash-stats.mozilla.com/report/list?version=Firefox:18.0&signature=nsGfxScrollFrameInner%3A%3AScrollToImpl) but it's bug 753251.
mass remove verifyme requests greater than 4 months old
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: