Closed Bug 715757 Opened 12 years ago Closed 6 years ago

crashes under JS_AbortIfWrongThread, likely from malware or bad extensions

Categories

(Core :: JavaScript Engine, defect)

10 Branch
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox11 - ---

People

(Reporter: luke, Unassigned)

References

Details

(Keywords: crash)

Crash Data

To conservatively test the waters for making JSRuntime single-threaded, bug 650411 added a release mode safe crash to catch misuse.  This shows up in crash-stats as CrashInJS (under JS_AbortIfWrongThread) and is currently the #8 topcrash on beta (FF10).

Unfortunately the stacks are truncated on crashstats, but several of the dumps from msvc looks like:

>	mozjs.dll!CrashInJS()  Line 86
 	mozjs.dll!JS_AbortIfWrongThread(JSRuntime * rt=0x04415000)  Line 6316
 	mozjs.dll!js_NewContext()  Line 328
 	xul.dll!XPCJSContextStack::GetSafeJSContext()  Line 236
 	xul.dll!XPCPerThreadData::GetDataImpl()  Line 426
 	xul.dll!xpc_UnmarkGrayObject()  Line 181
 	xul.dll!nsXPCWrappedJSClass::DelegatedQueryInterface()  Line 661
 	xul.dll!nsXPCWrappedJS::QueryInterface()  Line 158
 	xul.dll!nsQueryReferent::operator()()  Line 88
 	xul.dll!nsCOMPtr_base::assign_from_helper()  Line 150
 	xul.dll!nsCOMPtr<nsIObserver>::nsCOMPtr<nsIObserver>()  Line 676
 	xul.dll!PrefCallback::GetObserver()
 	xul.dll!nsPrefBranch::NotifyObserver()  Line 660
 	xul.dll!pref_DoCallback()  Line 944
 	xul.dll!PREF_ClearUserPref()  Line 619
 	xul.dll!nsPrefBranch::ClearUserPref()  Line 442
 	ntdll.dll!_ZwReleaseSemaphore@12()  + 0xc bytes	
 	kernel32.dll!_ConDllInitialize@8()  + 0x44 bytes	
 	kernel32.dll!__BaseDllInitialize@12()  + 0x7f bytes	
 	0541f660()	
 	ntdll.dll!_NtTestAlert@0()  + 0xc bytes	
 	ntdll.dll!_NtContinue@8()  + 0xc bytes	
 	ntdll.dll!_LdrInitializeThunk@8()  + 0x1a bytes	
 	90909090()

For one thing, khuey says nsPrefBranch shouldn't called off the main thread at all.  For another, the base of this thread doesn't look like any of the ones mozilla creates.  khuey suggests that malware or an addon is creating the thread and identified at least two common trojans in the modules lists of several reports: bug 715744 and bug 715748.

I filed this bug to let crash-watching folk know about the issue.  JS_AbortIfWrongThread can be easily backed (by design), so that is a last resort if this ends up crashing a lot of people.  However, there is a long convoy of good stuff waiting for the assert to stick in release, so I really hope we can avoid doing this.  Hopefully, we can blocklist our way to an acceptable crash volume.

Lastly, there is (so far) one known non-malware case of an extension hitting this assert: bug 704249 (IcedTea plugin for a particular applet).
Severity: normal → critical
Crash Signature: [@ CrashInJS | JS_AbortIfWrongThread] → [@ CrashInJS] [@ CrashInJS | JS_AbortIfWrongThread]
Depends on: 715713
Keywords: crash
Version: unspecified → 10 Branch
Depends on: 715744, 715748
(In reply to Luke Wagner [:luke] from comment #0)
> JS_AbortIfWrongThread can be easily backed (by design), so that is a last
> resort if this ends up crashing a lot of people.
It's #7 top crasher in 10.0b3.
Keywords: topcrash
Is there any data for what percent of those crashes are due to malware or bad addons we can blocklist?
(In reply to Luke Wagner [:luke] from comment #2)
> Is there any data for what percent of those crashes are due to malware or
> bad addons we can blocklist?
First Beta correlations are usually generated about 16 days before the release, so it means there will be there on Jan 15th.
Bug 714041 generated correlations for 10.0b3 on Jan 11th:
http://people.mozilla.org/~rhelmer/temp/Firefox-10.0b3-correlation/20120111_Firefox_10.0-interesting-modules.txt

Mac crashes are highly correlated to the DivX Browser Plugin (see http://www.divx.com/en/software/mac) that is also in the stack and Windows ones to a Polish program, gemgecko10.dll (see http://www.gemius.com/pl/main), and an unknown DLL, BExternal.dll:
* Mac OS X:
  CrashInJS|EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE (14 crashes)
     79% (11/14) vs.   4% (14/371) DivXBrowserPlugin

* Windows:
  CrashInJS|EXCEPTION_ACCESS_VIOLATION_WRITE (227 crashes)
     55% (124/227) vs.   1% (132/19911) gemgecko10.dll (2.1.0.1)
     44% (101/227) vs.   1% (102/19911) BExternal.dll
  CrashInJS|EXCEPTION_ACCESS_VIOLATION_WRITE (227 crashes)
     51% (116/227) vs.   1% (124/19911) gemgecko@gemius.com
         34% (78/227) vs.   0% (82/19911) 10.0.0.1
          3% (6/227) vs.   0% (6/19911) 8.0.0.1
         14% (32/227) vs.   0% (36/19911) 9.0.0.1
Can we add more wrong-thread aborts so we would have more detailed coverage of what exactly is wrong like aborting early in nsPrefBranch? This way we would have more detailed crash reports plus for a a particular top-abort we would have an option for reporting error, not aborting.
(In reply to Igor Bukanov from comment #5)
Currently these aborts happen at about the earliest possible time (when creating a context or associating an existing context with a new thread).  If we added more aborts, I don't think they would be hit because we'd be hitting the current aborts first.
A recent correlation file for Beta 4 confirms the tendencies described in comment 4 for Windows (see http://people.mozilla.org/~rhelmer/temp/Firefox-10.0b4-correlation/20120117_Firefox_10.0-interesting-modules.txt):
  CrashInJS|EXCEPTION_ACCESS_VIOLATION_WRITE (230 crashes)
     58% (133/230) vs.   1% (134/12616) gemgecko10.dll (2.1.0.1)
     41% (95/230) vs.   1% (95/12616) BExternal.dll
I checked some crash reports and none contains both the two DLLs, so these two pieces of software represents 99% of crashes.
(In reply to Luke Wagner [:luke] from comment #6)
> (In reply to Igor Bukanov from comment #5)
> Currently these aborts happen at about the earliest possible time (when
> creating a context or associating an existing context with a new thread).

I meant adding checks to non-JS code to verify that it runs on the main thread.
Depends on: 721264
On Windows, it's now only #63 top crasher in 10.0.
On Mac, it's #2 top crasher in 10.0 and is still correlated to the DivX player plugin.
On Linux, it's #1 top crasher in 10.0 and is correlated to the IcedTea plugin. It might related to bug 704249.
Has anyone reached out to the DivX people?
Crash Signature: [@ CrashInJS] [@ CrashInJS | JS_AbortIfWrongThread] → [@ CrashInJS] [@ CrashInJS | JS_AbortIfWrongThread] [@ libpthread-2.13.so@0xee6d]
Depends on: 704249
CCing Kev for DivX contacts.
Would this crash fit here or should I open a new bug for that?

https://crash-stats.mozilla.com/report/index/bp-3ae24631-0e55-4295-a9ae-a882a2120214
That crash doesn't make any sense, _getstringidentifier returns early if it's not on the main thread.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #13)
> That crash doesn't make any sense, _getstringidentifier returns early if
> it's not on the main thread.

Oh, no, it doesn't, it just logs.  That seems ... weird.
(In reply to Sergio Oliveira Campos from comment #12)
> Would this crash fit here or should I open a new bug for that?
It's bug 704249.
(In reply to Scoobidiver from comment #15)
> (In reply to Sergio Oliveira Campos from comment #12)
> > Would this crash fit here or should I open a new bug for that?
> It's bug 704249.

Ok. Thanks!
This is topcrash #3 in 11.0b3 now: https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/11.0b3

Do we have any clue how to fix this?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #17)
> Do we have any clue how to fix this?

Bug 715744 would go a long way here.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #17)
> This is topcrash #3 in 11.0b3 now:
> https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/11.0b3
> 
> Do we have any clue how to fix this?

This crash signature indicates that an add-on is doing bad stuff, so it's not a bug in our code. To fix, we need to reach out to add-on authors and get them to fix their threading.
(In reply to David Mandelin from comment #19)
> This crash signature indicates that an add-on is doing bad stuff, so it's
> not a bug in our code. To fix, we need to reach out to add-on authors and
> get them to fix their threading.

Well, I don't see any strong pointers to add-ons in the correlations, but loaded libraries ("modules") speak a strong language:

Correlations for Firefox 11.0 on Windows, 2012-02-27:

Add-ons:
  CrashInJS|EXCEPTION_ACCESS_VIOLATION_WRITE (956 crashes)
     99% (945/956) vs.  79% (17813/22655) testpilot@labs.mozilla.com (Mozilla Labs - Test Pilot, https://addons.mozilla.org/addon/13661)
     15% (145/956) vs.   8% (1722/22655) ffxtlbr@babylon.com

Modules:
  CrashInJS|EXCEPTION_ACCESS_VIOLATION_WRITE (956 crashes)
    100% (955/956) vs.   4% (956/22655) BExternal.dll
     78% (744/956) vs.  26% (5931/22655) samlib.dll
     65% (617/956) vs.  19% (4323/22655) rasman.dll
     64% (615/956) vs.  19% (4296/22655) rasapi32.dll
     65% (622/956) vs.  21% (4736/22655) rtutils.dll
     89% (850/956) vs.  48% (10900/22655) ntmarta.dll
     91% (867/956) vs.  51% (11477/22655) apphelp.dll
     99% (951/956) vs.  61% (13902/22655) shdocvw.dll

BExternal.dll has a blocker request up and is known to be connected to this one already, see bug 715744 comment #6 - a strong argument for doing this for 11 still, as this signature is still topcrash #3 in 11.0b4.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #20)

> Add-ons:
>   CrashInJS|EXCEPTION_ACCESS_VIOLATION_WRITE (956 crashes)
>      99% (945/956) vs.  79% (17813/22655) testpilot@labs.mozilla.com
> (Mozilla Labs - Test Pilot, https://addons.mozilla.org/addon/13661)
>      15% (145/956) vs.   8% (1722/22655) ffxtlbr@babylon.com
> 

Does this imply that Test Pilot is doing something wrong? I just looked at the 1.2 source and it doesn't appear to be doing anything threading-related. Am I just misreading the data?
(In reply to Bobby Holley (:bholley) from comment #21)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #20)
> 
> > Add-ons:
> >   CrashInJS|EXCEPTION_ACCESS_VIOLATION_WRITE (956 crashes)
> >      99% (945/956) vs.  79% (17813/22655) testpilot@labs.mozilla.com
> > (Mozilla Labs - Test Pilot, https://addons.mozilla.org/addon/13661)
> >      15% (145/956) vs.   8% (1722/22655) ffxtlbr@babylon.com
> > 
> 
> Does this imply that Test Pilot is doing something wrong?

Very probably not. We install Test Pilot by default on beta, so it's expected to be high and almost 100% on betas. The reason why it's lower (79%) for the total of all crash reports is probably because a number of those are startup crashes happening before we load add-ons or (mostly OOM) crashes where we fail to send the add-on data at all.

The relevant info there is that the babylon toolbar is only in 15% of crashes with this signature, so blocking it won't help. Looking at modules tells us that we need to block the DLL itself.
Crash Signature: [@ CrashInJS] [@ CrashInJS | JS_AbortIfWrongThread] [@ libpthread-2.13.so@0xee6d] → [@ CrashInJS] [@ CrashInJS | JS_AbortIfWrongThread ] [@ libpthread-2.13.so@0xee6d]
No longer depends on: 721264
Depends on: 733813
We have a steep decline of those crashes on 11 and 10, I think we should keep looking for this for a few days and then can call this fixed by the Babylon update (see bug 715744 for numbers). Agreed?
[@ CrashInJS | nsXPConnect::GetSafeJSContext ] is still happening for Mac users on the current beta, so if that signature is related we should keep this open. At some point I think someone filed another bug specific to DivX and the main thread but I cannot find it at the moment.
There are no crashes in versions above 13.0.
Keywords: topcrash
Assignee: general → nobody
Closing because no crash reported since 12 weeks.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.