Open Bug 817396 Opened 8 years ago Updated 3 years ago

crash in PRMJ_Now

Categories

(Core :: JavaScript Engine, defect)

19 Branch
defect
Not set
critical

Tracking


Tracking Status
firefox18 --- unaffected
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- wontfix
firefox22 --- wontfix
firefox23 --- wontfix
firefox24 --- wontfix
firefox25 --- wontfix
firefox26 --- affected
firefox27 --- affected
firefox28 --- affected
firefox29 --- affected

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, regression, testcase-wanted)

Crash Data

With combined signatures, it's the #23 top browser crasher in 19.0a2.
It first showed up in 19.0a1/20121103 and its volume is discontinuous across builds.

Stack traces vary:
Frame 	Module 	Signature 	Source
0 		@0x33 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	BeginMarkPhase 	js/src/jsgc.cpp:3416
3 	mozjs.dll 	IncrementalCollectSlice 	js/src/jsgc.cpp:4362
4 	mozjs.dll 	GCCycle 	js/src/jsgc.cpp:4583
5 	mozjs.dll 	Collect 	js/src/jsgc.cpp:4698
6 	mozjs.dll 	js::GC 	js/src/jsgc.cpp:4723
7 	mozjs.dll 	js::ShrinkingGC 	js/src/jsfriendapi.cpp:176
...

Frame 	Module 	Signature 	Source
0 		@0x798a78 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	EndSweepPhase 	js/src/jsgc.cpp:4060
3 	xul.dll 	nsRange::ToString 	content/base/src/nsRange.cpp:2341
4 	mozjs.dll 	GCCycle 	js/src/jsgc.cpp:4569
5 	mozjs.dll 	Collect 	js/src/jsgc.cpp:4684
6 	mozjs.dll 	js::GC 	js/src/jsgc.cpp:4709
...

Frame 	Module 	Signature 	Source
0 		@0x5e110f 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	date_now 	js/src/jsdate.cpp:1261
3 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:369
4 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:2331
5 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:326
...

Frame 	Module 	Signature 	Source
0 		@0x21169e 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	js::gcstats::Statistics::endPhase 	js/src/gc/Statistics.cpp:621
3 	mozjs.dll 	IncrementalCollectSlice 	js/src/jsgc.cpp:4407
4 	mozjs.dll 	Collect 	js/src/jsgc.cpp:4684

Frame 	Module 	Signature 	Source
0 		@0x278a78 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	js::MaybeGC 	js/src/jsgc.cpp:2673
3 	xul.dll 	nsJSContext::ScriptEvaluated 	dom/base/nsJSEnvironment.cpp:2818
4 	xul.dll 	nsJSContext::CallEventHandler 	dom/base/nsJSEnvironment.cpp:1962
5 	xul.dll 	nsGlobalWindow::RunTimeoutHandler 	dom/base/nsGlobalWindow.cpp:9675
6 	xul.dll 	nsGlobalWindow::RunTimeout 	dom/base/nsGlobalWindow.cpp:9924
...

Frame 	Module 	Signature 	Source
0 		@0x5ac658 	
1 		@0x730074 	
2 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:445
3 	xul.dll 	nsIFrame::GetFrameSelection 	layout/generic/nsFrame.cpp:5595
4 	xul.dll 	nsEventTargetChainItem::HandleEventTargetChain 	content/events/src/nsEventDispatcher.cpp:339
5 	xul.dll 	nsEventTargetChainItem::HandleEventTargetChain 	content/events/src/nsEventDispatcher.cpp:371
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=PRMJ_Now%28%29
https://crash-stats.mozilla.com/report/list?signature=%400x0+|+PRMJ_Now%28%29
Looked at https://crash-stats.mozilla.com/report/index/3bcc929e-c13a-404b-b47b-851af2121203

We appear to be in the process of shutting down XPCOM, calling the xpcom-shutdown-thread notification (which happens after xpcom-shutdown). The main thread is calling RuntimeService::Observe (dom/workers)

http://hg.mozilla.org/releases/mozilla-aurora/annotate/2767c038198b/dom/workers/RuntimeService.cpp#l1370 and we appear to be in this nested event loop:
http://hg.mozilla.org/releases/mozilla-aurora/annotate/2767c038198b/dom/workers/RuntimeService.cpp#l1148

Reading tea leaves, I think the crash is actually occurring on this line: http://hg.mozilla.org/releases/mozilla-aurora/annotate/2767c038198b/js/src/prmjtime.cpp#l393

But the only way I can really see that happening is if we called PRMJ_Shutdown (via JS_Shutdown). This is called from XPCJSRuntime::~XPCJSRuntime() but it's hard to imagine that we wouldn't still have an XPCJSRuntime at this point during shutdown.
Those crashes in the GC are probably from trying to compute how long the phase took for telemetry purposes, which doesn't seem super essential at shutdown time.
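The shutdown theory above can be sketched in a few lines. This is a hypothetical illustration, not the actual prmjtime.cpp code: names like `calibration_lock`, `PRMJ_Shutdown_sketch`, and `PRMJ_Now_sketch` are stand-ins, and the null guard exists only so the sketch is runnable; the point is that if PRMJ_Shutdown has already torn down the calibration lock, a later PRMJ_Now would lock freed memory.

```cpp
#include <mutex>

// Hypothetical sketch (not the real prmjtime.cpp): if shutdown has already
// destroyed the calibration lock, a later PRMJ_Now would take a lock that
// no longer exists -- consistent with a crash near MUTEX_LOCK/MUTEX_UNLOCK.
static std::mutex* calibration_lock = new std::mutex();

void PRMJ_Shutdown_sketch() {
    delete calibration_lock;   // in theory, via JS_Shutdown at XPCOM teardown
    calibration_lock = nullptr;
}

long long PRMJ_Now_sketch() {
    if (!calibration_lock)     // guard added for the sketch only;
        return -1;             // the real code would touch freed memory here
    std::lock_guard<std::mutex> guard(*calibration_lock);
    return 0;                  // the real code reads the calibrated clock
}
```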
Bill, can you take a look at this? If you don't have time, I can try to make some headway, and you can assign it to me.
Assignee: general → wmccloskey
Summary: crash in PRMJ_Now → crash in PRMJ_Now during shutdown GC
Many of these seem not to be shutdown crashes. For example, this one happens in a worker:
https://crash-stats.mozilla.com/report/index/026ebd4a-166d-41c6-8337-0a6752121129

I also don't understand why this would have suddenly started happening. We haven't changed anything fundamental related to when we call PRMJ_Now.

It seems possible that someone else is corrupting data used by this code. In any case, I'm not sure what we could do here.
Assignee: wmccloskey → continuation
Summary: crash in PRMJ_Now during shutdown GC → crash in PRMJ_Now
In that particular crash, two different threads are *both* calling PRMJ_Now:

Crashing thread from JSCompartment::sweep
Main thread from BeginSweepingCompartmentGroup

Is there a race condition related to the PR_CallOnce initialization of PRMJ_Now that we're triggering more often?
I looked at all of the [@ PRMJ_Now() ] crashes for the last week. Pretty much all of the stacks involved GC. As Bill said, not a lot of them are at shutdown. (The ones with 0 at the front that I've looked at so far don't seem to be as GC heavy for whatever reason.)

The call from inside js::MaybeGC is in the logic that decides whether to trigger a GC, which is less superficial than the stats code.

I saw a few crash stacks where thread 1 looked like the following (but maybe that's just something that the crash reporter calls to generate its report...):

KiFastSystemCallRet
CheckMemAvailable // calling GlobalMemoryStatusEx

(That was added in Dec 2011)

https://crash-stats.mozilla.com/report/index/c2006470-2b05-49de-940e-a64b82121201
Thread 1 from comment 6 doesn't look suspicious; I'm pretty sure that is just the crash reporter thread (which is triggering the memory tracker hook code, but that doesn't look bad).

I'm going to load a minidump into MSVC and see if the stacks are different or provide more detail: I'd really like to confirm my theory that we're crashing at MUTEX_UNLOCK and not actually under GetSystemTimeAdjustment.
I was wrong. We are actually calling GetSystemTimeAdjustment:

00C13D6B  call        dword ptr ds:[51504010h]

According to the disassembler, this should be:

  100A3D6B: FF 15 10 40 26 10  call        dword ptr [__imp__GetSystemTimeAdjustment@12]

According to the memory dump from MSVC, the memory is actually:

0x00C13D6B  ff 15 10 40 50 51

According to MSVC, the correct relocated address would be

__imp__GetSystemTimeAdjustment@12	0x00dd4010
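The mismatch falls out of decoding the FF 15 ("call dword ptr [addr]") operand as a little-endian 32-bit address: the linker-emitted bytes `10 40 26 10` point at slot 0x10264010, while the dump's bytes `10 40 50 51` point at 0x51504010, matching the `ds:[51504010h]` the disassembler showed. A small helper (illustrative, not part of any tool here) makes the decode explicit:

```cpp
#include <cstdint>

// Decode the 4-byte little-endian operand of an x86 "FF 15" indirect call,
// i.e. the absolute address of the import-table slot being called through.
uint32_t DecodeCallTarget(const uint8_t bytes[6]) {
    // bytes[0..1] are the FF 15 opcode; bytes[2..5] the slot address
    return (uint32_t)bytes[2] | ((uint32_t)bytes[3] << 8) |
           ((uint32_t)bytes[4] << 16) | ((uint32_t)bytes[5] << 24);
}
```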

scoobidiver, can you check DLL correlations? This almost feels like injected DLLs mucking about, although I'm not sure.

Also, did we change anything about the Windows SDK or compiler toolchain in the regression window? The other possibility is that this is PGO-related, which could explain some of the discontinuity across builds, but I don't see anything wrong in the *binary*.

If anyone who can use a debugger is experiencing this, we can probably track it down pretty quickly using data breakpoints.
This doesn't really sound like anything I can help with, though I'll continue to poke around a little in crash-stats...
Assignee: continuation → general
With combined signatures, it's #16 top browser crasher in 19.0.2 and 20.0b6, #85 in 21.0a2, and #48 in 22.0a1.

Based on the variety of stack traces (different issues?), should we add PRMJ_Now to the prefix skiplist?
Flags: needinfo?(continuation)
Keywords: topcrash
bsmedberg is a better person to ask
Flags: needinfo?(continuation) → needinfo?(benjamin)
Given what we know about the bug, I think this is all one issue and PRMJ_Now should not be in the skip or appendlist.
Flags: needinfo?(benjamin)
Adding the Mac signature: https://crash-stats.mozilla.com/report/list?signature=IncrementalCollectSlice
Crash Signature: [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] → [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] [@ IncrementalCollectSlice ]
OS: Windows 7 → All
Hardware: x86 → All
Keywords: topcrash
Currently #20 crash on 25.0. 

There is definitely malware here, at least in the 0x0 variant of the signature. All the dumps have a pair of modules with a random name and sequential numbers, like UGQJ.006/UGQJ.007 or ELV.01/ELV.02.
__imp__GetSystemTimeAdjustment and nearby imports have been redirected, usually off into nowhere (as far as the dump can see, at least) but one time I saw an import point into one of these modules. 

I wonder what the vector for these things is. Injection? AppInit?
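One cheap triage check for this kind of redirection, sketched below under assumed numbers: a legitimate FF 15 operand should point at an IAT slot inside the image that owns it, so a slot address far outside the module's bounds is a red flag. The base/size values used in testing are illustrative, not taken from any real dump.

```cpp
#include <cstdint>

// Heuristic: an import-call operand should land inside the owning image's
// address range; anything else suggests the instruction has been patched.
bool ImportLooksRedirected(uint32_t slotAddr, uint32_t imageBase,
                           uint32_t imageSize) {
    return slotAddr < imageBase || slotAddr >= imageBase + imageSize;
}
```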
Currently #37 on v26, #21 on v27. Very low but nonzero counts on versions 28 and 29 (perhaps due to the different user base). Still correlated with the suspected malware modules.
Assignee: general → nobody
Crash Signature: [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] [@ IncrementalCollectSlice ] → [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] [@ IncrementalCollectSlice ] [@ PRMJ_Now ] [@ @0x0 | PRMJ_Now]
Some of these crashes are still present in the latest 44.0.2 release. I don't think we should close this bug until the crash signature completely disappears. Some steps to reproduce or a test case would help us reduce this crash signature.