Closed Bug 1169457 Opened 9 years ago Closed 9 years ago

Firefox Crashes on sites with jQuery

Categories

(Core Graveyard :: Plug-ins, defect)

40 Branch
Unspecified
Linux
defect
Not set
normal

Tracking

(firefox39 unaffected, firefox40+ fixed, firefox41+ fixed)

RESOLVED FIXED
mozilla41
Tracking Status
firefox39 --- unaffected
firefox40 + fixed
firefox41 + fixed

People

(Reporter: janne, Assigned: bugzilla)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 1 obsolete file)

Steps to reproduce:
-------------------

1. Open Firefox
2. Navigate to a site using jQuery, for example imgur.com

Expected results:
-----------------

Firefox loads the page

Actual results:
---------------

Firefox crashes, the Firefox window closes

Additional Information:
-----------------------

Using Mozilla Regression, I narrowed it down to https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=77b3cc5607cd&tochange=cecb29d62dbc
This seems to be fallout from bug 1146471, maybe an instance of bug 1154871? Janne has gone so far as to build Firefox so if you need some debugging info they should be able to help.
Blocks: 1146471
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(wmccloskey)
OS: Unspecified → Linux
Version: unspecified → 40 Branch
Janne, could you post a crashdump? You can find crashdumps by going to about:crashes.

Also, can you paste the contents of about:support?
Flags: needinfo?(wmccloskey) → needinfo?(janne)
Spoke on IRC. https://crash-stats.mozilla.com/report/index/d605270d-783c-43d4-9a6f-0f4ad2150528
Crash Signature: [@ RunnableMethod<SoftwareDisplay, void (SoftwareDisplay::*)(mozilla::TimeStamp), Tuple1<mozilla::TimeStamp> >::Run()]
Group: core-security
Looks like something is going wrong with SoftwareDisplay running off the main thread.
Component: IPC → Graphics
Is this happening reliably and often or was it a one off?
Reliably, often, on every site using jQuery. I've seen it on other sites, too, but on sites like imgur.com it happens every single time. Which makes it quite annoying to use the Browser.
Flags: needinfo?(janne)
about:support would be really useful so we can see if it's related to a specific gfx card or add-on.
I recall Janne mentioning that about:support crashes. The crash report has GeForce GTX 550 Ti/PCIe/SSE2 -- 4.5.0 NVIDIA 352.09 which looks to be the current beta driver. http://www.geforce.com/drivers/results/85057
Since the regression window is so narrow, can we make a try build to bisect that further? It seems quite likely to me that the new release assert is triggering this and the stack in the crash report is likely bogus.
I'd be willing to try those builds on my sistem, that should be no problem, yes.
Component: Graphics → IPC
[Tracking Requested - why for this release]: I'm claiming jQuery is fairly commonly used :) and I know there isn't a lot of crashes at this point, but it seems it could be widespread eventually.
(In reply to Kevin Brosnan [:kbrosnan] from comment #3)
> https://crash-stats.mozilla.com/report/index/d605270d-783c-43d4-9a6f-0f4ad2150528

The main thread (thread 0) in that crash is interesting — if I'm reading it right, script is doing something that causes plugins to be reloaded, so we wind up in the actor destruction path and start processing a crash dump, and there's a hash table that's being touched when the crash happens?

As for the crash itself... did someone schedule a task to call a method on a NULL object, maybe?  It would be nice if we could find out where that task came from, but I don't know if those bits are in the minidump.
It seems odd that this crash stack would only happen with sites on jquery. The vsync source doesn't really care what website is being viewed and it should basically tick independently of any site. We even check that if we're on the main thread, we explicitly post the task to the vsync thread, not the main thread. Comment 12 might be interesting, in that the SoftwareDisplay would be NULL and the task is running on a NULL object, but the object lifetime should be the whole firefox process. Curious.
To test comment 9, can you please try this:

1) Go to about:config
2) type in gfx.vsync
3) You should see 3 preferences. gfx.vsync.refreshdriver, gfx.vsync.hw-vsync.enabled, and gfx.vsync.compositor. Can you set all 3 to false.
4) Restart firefox.

Try a site with jquery again. Does it still consistently crash? If so, can you please paste that crash report link? Thanks!
Flags: needinfo?(janne)
Yes, still consistently crashes. Crashreport: https://crash-stats.mozilla.com/report/index/f6fac17b-819e-49f2-944a-fc0de2150530
Flags: needinfo?(janne)
(In reply to janne from comment #15)
> Yes, still consistently crashes. Crashreport:
> https://crash-stats.mozilla.com/report/index/f6fac17b-819e-49f2-944a-fc0de2150530

Null pointer dereference of a variable that's (debug only) asserted to be non-null, now at a different place but still in something to do with NPAPI plugin instance creation/destruction, and this time it's the main thread that crashes instead of some other thread.  Also, this one is a content process crash.

I'm not sure if comment #4 is really what's going on here.
(In reply to Jed Davis [:jld] {UTC-7} from comment #16)
> (In reply to janne from comment #15)
> > Yes, still consistently crashes. Crashreport:
> > https://crash-stats.mozilla.com/report/index/f6fac17b-819e-49f2-944a-fc0de2150530
> 
> Null pointer dereference of a variable that's (debug only) asserted to be
> non-null, now at a different place but still in something to do with NPAPI
> plugin instance creation/destruction, and this time it's the main thread
> that crashes instead of some other thread.  Also, this one is a content
> process crash.
> 
> I'm not sure if comment #4 is really what's going on here.

I think comment 4 is ruled out from comment 15. With the preferences disabled from comment 14, the SoftwareDisplay is never used and it's still crashing regularly.
I'll try disabling all plugins etc next, but as it also crashed with a freshly installed Firefox with no enabled plugins or Add-ons, I doubt that that is the issue.
Okay, sorry for all this trouble – the issue only appears if Shockwave Flash is installed and enabled.
I still wonder why it happens, though.
And it does not appear if one uses pepper-fresh (the port of the Chrome version of Flash) instead of Adobe Flash. Sorry for all the trouble :/
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
I'm not sure we should resolve this just yet. It maybe that the combination of jQuery and Flash is triggering a bad code path in Firefox, and that is now crashing because of the MOZ_ASSERT -> MOZ_RELEASE_ASSERT change. The "thread 0" stack Jed pointed out fits with this since it also deals with some plugin stuff.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
It seems like bug 1146471 is probably causing the plugin process to die. That could be a bug in Flash or our bug. When the plugin process dies, it seems like we don't handle it very well. In the second crash, we're trying to tear down the plugin while it's starting up and that seems to be causing problems. Aaron, could you take a look at the crash in comment 15? That one might not be hard to fix.

The first crash is weirder. It's too bad we don't have any crash stacks for the plugin process itself. Maybe if we fix these other bugs, we can get some.
(In reply to Bill McCloskey (:billm) from comment #22)
> It seems like bug 1146471 is probably causing the plugin process to die.
> That could be a bug in Flash or our bug. When the plugin process dies, it
> seems like we don't handle it very well. In the second crash, we're trying
> to tear down the plugin while it's starting up and that seems to be causing
> problems. Aaron, could you take a look at the crash in comment 15? That one
> might not be hard to fix.

Looks like plugin instantiation failed and we hit the crash when we tried to clean up the associated nsJSNPRuntime. Adding a nullptr check would be a quick-n-dirty way to wallpaper over this, but I am concerned because this is actually guarded by an assertion. Looking that the code it's not obvious how that call stack could occur with that pointer being null -- it should not be possible.
Flags: needinfo?(aklotz)
Component: IPC → Plug-ins
Would it make sense to add the null check? Is there anything else we can do for this bug?
Flags: needinfo?(aklotz)
Flags: needinfo?(aklotz)
Attached patch Add null check (obsolete) — Splinter Review
I think we should add the null check but leave in the assertion so that we can catch this and examine it in debug builds. I think that this is happening because the functions that allocate these things are not perfect inverses of the things that free them.
Assignee: nobody → aklotz
Status: REOPENED → ASSIGNED
Attachment #8625608 - Flags: review?(jmathies)
Attachment #8625608 - Flags: review?(jmathies) → review+
This is marked security sensitive and affecting more than trunk, so it needs to go through the security approval process or it needs a rating.
https://wiki.mozilla.org/Security/Bug_Approval_Process
Keywords: checkin-needed
Comment on attachment 8625608 [details] [diff] [review]
Add null check

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Not easily; bug is very timing sensitive.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
They indicate that a nullptr check was added. I don't think it paints a bulls-eye on anything.

Which older supported branches are affected by this flaw?
Aurora

If not all supported branches, which bug introduced the flaw?
Unclear. The crash started happening in Aurora but the affected code is older than that.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
Should be an easy hg merge.

How likely is this patch to cause regressions; how much testing does it need?
Unlikely, just a trivial nullptr check.
Attachment #8625608 - Flags: sec-approval?
This needs a rating. I don't see anything above that even shows that this crash is exploitable. Is it?
(By rating, I mean "security rating.")
(In reply to Al Billings [:abillings] from comment #29)
> This needs a rating. I don't see anything above that even shows that this
> crash is exploitable. Is it?

I don't believe so.
Comment on attachment 8625608 [details] [diff] [review]
Add null check

Then this doesn't need sec-approval and, frankly, we normally open up these sorts of bugs instead of hiding them as security issues.
Attachment #8625608 - Flags: sec-approval?
(In reply to Al Billings [:abillings] from comment #32)
> Comment on attachment 8625608 [details] [diff] [review]
> Add null check
> 
> Then this doesn't need sec-approval and, frankly, we normally open up these
> sorts of bugs instead of hiding them as security issues.

I don't have the capability to do this. Andrew?
Flags: needinfo?(continuation)
Group: core-security
Flags: needinfo?(continuation)
https://hg.mozilla.org/mozilla-central/rev/8947bf6b866a
Status: ASSIGNED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment on attachment 8625608 [details] [diff] [review]
Add null check

Approval Request Comment
[Feature/regressing bug #]: Plugin JSNP runtime
[User impact if declined]: Potential crashes if the timing is correct
[Describe test coverage new/current, TreeHerder]: Plugin test suite on mc
[Risks and why]: None, trivial patch adding null check
[String/UUID change made/needed]: None
Attachment #8625608 - Flags: approval-mozilla-aurora?
Adding a tracking flag for FF40 and FF41 and the keyword "crash".
Aaron, I will most likely approve this patch for uplift to Aurora. I was wondering if you have had a chance to verify the fix. Please let me know.
Flags: needinfo?(aklotz)
Comment on attachment 8625608 [details] [diff] [review]
Add null check

The patch looks good and is making the code more safe by adding a null check. It has been in moz-central for a week. Approving for uplift to Aurora.
Attachment #8625608 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Ritu Kothari (:ritu) from comment #38)
> Aaron, I will most likely approve this patch for uplift to Aurora. I was
> wondering if you have had a chance to verify the fix. Please let me know.

It's pretty much impossible to manually verify this because of the timing sensitivities.
Flags: needinfo?(aklotz)
(In reply to Ritu Kothari (:ritu) from comment #39)
> Comment on attachment 8625608 [details] [diff] [review]
> Add null check
> 
> The patch looks good and is making the code more safe by adding a null
> check. It has been in moz-central for a week. Approving for uplift to Aurora.

I don't see how this patch would actually fix the issue if it is related to sCallbackRuntime being null, since right above the check there is a call to sCallbackRuntime->GetRuntime, which would crash if said variable were null. Assuming I am not misreading, this patch be expected to not have any affect on the issue, positive or negative.

The fix that would need to be made is that sCallbackRuntime would have to have a null check and return right after the assert.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Lee Salzman [:eihrul] from comment #41)
> (In reply to Ritu Kothari (:ritu) from comment #39)
> > Comment on attachment 8625608 [details] [diff] [review]
> > Add null check
> > 
> > The patch looks good and is making the code more safe by adding a null
> > check. It has been in moz-central for a week. Approving for uplift to Aurora.
> 
> I don't see how this patch would actually fix the issue if it is related to
> sCallbackRuntime being null, since right above the check there is a call to
> sCallbackRuntime->GetRuntime, which would crash if said variable were null.
> Assuming I am not misreading, this patch be expected to not have any affect
> on the issue, positive or negative.
> 
> The fix that would need to be made is that sCallbackRuntime would have to
> have a null check and return right after the assert.

The fix I landed was in response to comment 22.
(In reply to Lee Salzman [:eihrul] from comment #41)
> (In reply to Ritu Kothari (:ritu) from comment #39)
> > Comment on attachment 8625608 [details] [diff] [review]
> > Add null check
> > 
> > The patch looks good and is making the code more safe by adding a null
> > check. It has been in moz-central for a week. Approving for uplift to Aurora.
> 
> I don't see how this patch would actually fix the issue if it is related to
> sCallbackRuntime being null, since right above the check there is a call to
> sCallbackRuntime->GetRuntime, which would crash if said variable were null.
> Assuming I am not misreading, this patch be expected to not have any affect
> on the issue, positive or negative.
> 
> The fix that would need to be made is that sCallbackRuntime would have to
> have a null check and return right after the assert.

Oh, I see what you're saying. Well, the good news is that the fix in bug 886459 rewrote the infringing code anyway, making my patch moot in 41 onward. This still needs to be fixed in Beta.
Attachment #8625608 - Attachment is obsolete: true
Attachment #8630239 - Flags: review?(jmathies)
Attachment #8630239 - Flags: review?(jmathies) → review+
Comment on attachment 8630239 [details] [diff] [review]
Add null check (beta-specific patch)

Approval Request Comment
[Feature/regressing bug #]: NPAPI Plugins
[User impact if declined]: Crashes
[Describe test coverage new/current, TreeHerder]: Plugin test suite; this patch is beta-specific and cannot land on central first.
[Risks and why]: None, trivial patch
[String/UUID change made/needed]: None
Attachment #8630239 - Flags: approval-mozilla-beta?
Comment on attachment 8630239 [details] [diff] [review]
Add null check (beta-specific patch)

Given that the fix is only needed in 40, let's get it into Beta. Beta+
Attachment #8630239 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.