Open Bug 817396 Opened 8 years ago Updated 3 years ago

crash in PRMJ_Now

Categories

(Core :: JavaScript Engine, defect)

19 Branch
defect
Not set
critical

Tracking


Tracking Status
firefox18 --- unaffected
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- wontfix
firefox22 --- wontfix
firefox23 --- wontfix
firefox24 --- wontfix
firefox25 --- wontfix
firefox26 --- affected
firefox27 --- affected
firefox28 --- affected
firefox29 --- affected

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, regression, testcase-wanted)

Crash Data

With combined signatures, it's the #23 top browser crasher in 19.0a2.
It first showed up in 19.0a1/20121103 and its volume is discontinuous across builds.

Stack traces vary:
Frame 	Module 	Signature 	Source
0 		@0x33 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	BeginMarkPhase 	js/src/jsgc.cpp:3416
3 	mozjs.dll 	IncrementalCollectSlice 	js/src/jsgc.cpp:4362
4 	mozjs.dll 	GCCycle 	js/src/jsgc.cpp:4583
5 	mozjs.dll 	Collect 	js/src/jsgc.cpp:4698
6 	mozjs.dll 	js::GC 	js/src/jsgc.cpp:4723
7 	mozjs.dll 	js::ShrinkingGC 	js/src/jsfriendapi.cpp:176
...

Frame 	Module 	Signature 	Source
0 		@0x798a78 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	EndSweepPhase 	js/src/jsgc.cpp:4060
3 	xul.dll 	nsRange::ToString 	content/base/src/nsRange.cpp:2341
4 	mozjs.dll 	GCCycle 	js/src/jsgc.cpp:4569
5 	mozjs.dll 	Collect 	js/src/jsgc.cpp:4684
6 	mozjs.dll 	js::GC 	js/src/jsgc.cpp:4709
...

Frame 	Module 	Signature 	Source
0 		@0x5e110f 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	date_now 	js/src/jsdate.cpp:1261
3 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:369
4 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:2331
5 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:326
...

Frame 	Module 	Signature 	Source
0 		@0x21169e 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	js::gcstats::Statistics::endPhase 	js/src/gc/Statistics.cpp:621
3 	mozjs.dll 	IncrementalCollectSlice 	js/src/jsgc.cpp:4407
4 	mozjs.dll 	Collect 	js/src/jsgc.cpp:4684

Frame 	Module 	Signature 	Source
0 		@0x278a78 	
1 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:398
2 	mozjs.dll 	js::MaybeGC 	js/src/jsgc.cpp:2673
3 	xul.dll 	nsJSContext::ScriptEvaluated 	dom/base/nsJSEnvironment.cpp:2818
4 	xul.dll 	nsJSContext::CallEventHandler 	dom/base/nsJSEnvironment.cpp:1962
5 	xul.dll 	nsGlobalWindow::RunTimeoutHandler 	dom/base/nsGlobalWindow.cpp:9675
6 	xul.dll 	nsGlobalWindow::RunTimeout 	dom/base/nsGlobalWindow.cpp:9924
...

Frame 	Module 	Signature 	Source
0 		@0x5ac658 	
1 		@0x730074 	
2 	mozjs.dll 	PRMJ_Now 	js/src/prmjtime.cpp:445
3 	xul.dll 	nsIFrame::GetFrameSelection 	layout/generic/nsFrame.cpp:5595
4 	xul.dll 	nsEventTargetChainItem::HandleEventTargetChain 	content/events/src/nsEventDispatcher.cpp:339
5 	xul.dll 	nsEventTargetChainItem::HandleEventTargetChain 	content/events/src/nsEventDispatcher.cpp:371
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=PRMJ_Now%28%29
https://crash-stats.mozilla.com/report/list?signature=%400x0+|+PRMJ_Now%28%29
Looked at https://crash-stats.mozilla.com/report/index/3bcc929e-c13a-404b-b47b-851af2121203

We appear to be in the process of shutting down XPCOM, calling the xpcom-shutdown-thread notification (which happens after xpcom-shutdown). The main thread is calling RuntimeService::Observe (dom/workers)

http://hg.mozilla.org/releases/mozilla-aurora/annotate/2767c038198b/dom/workers/RuntimeService.cpp#l1370 and we appear to be in this nested event loop:
http://hg.mozilla.org/releases/mozilla-aurora/annotate/2767c038198b/dom/workers/RuntimeService.cpp#l1148

Reading tea leaves, I think the crash is actually occurring on this line: http://hg.mozilla.org/releases/mozilla-aurora/annotate/2767c038198b/js/src/prmjtime.cpp#l393

But the only way I can really see that happening is if we called PRMJ_Shutdown (via JS_Shutdown). This is called from XPCJSRuntime::~XPCJSRuntime() but it's hard to imagine that we wouldn't still have an XPCJSRuntime at this point during shutdown.
Those crashes in the GC are probably from trying to compute how long the phase took for telemetry purposes, which doesn't seem super essential at shutdown time.
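The shutdown theory above can be sketched in a few lines. This is a hypothetical illustration, not the actual prmjtime.cpp code: names like `calibration_lock`, `PRMJ_Shutdown_sketch`, and `PRMJ_Now_sketch` are stand-ins, and the null guard exists only so the sketch is runnable; the point is that if PRMJ_Shutdown has already torn down the calibration lock, a later PRMJ_Now would lock freed memory.

```cpp
#include <mutex>

// Hypothetical sketch (not the real prmjtime.cpp): if shutdown has already
// destroyed the calibration lock, a later PRMJ_Now would take a lock that
// no longer exists -- consistent with a crash near MUTEX_LOCK/MUTEX_UNLOCK.
static std::mutex* calibration_lock = new std::mutex();

void PRMJ_Shutdown_sketch() {
    delete calibration_lock;   // in theory, via JS_Shutdown at XPCOM teardown
    calibration_lock = nullptr;
}

long long PRMJ_Now_sketch() {
    if (!calibration_lock)     // guard added for the sketch only;
        return -1;             // the real code would touch freed memory here
    std::lock_guard<std::mutex> guard(*calibration_lock);
    return 0;                  // the real code reads the calibrated clock
}
```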
Bill, can you take a look at this? If you don't have time, I can try to make some headway, and you can assign it to me.
Assignee: general → wmccloskey
Summary: crash in PRMJ_Now → crash in PRMJ_Now during shutdown GC
Many of these seem not to be shutdown crashes. For example, this one happens in a worker:
https://crash-stats.mozilla.com/report/index/026ebd4a-166d-41c6-8337-0a6752121129

I also don't understand why this would have suddenly started happening. We haven't changed anything fundamental related to when we call PRMJ_Now.

It seems possible that someone else is corrupting data used by this code. In any case, I'm not sure what we could do here.
Assignee: wmccloskey → continuation
Summary: crash in PRMJ_Now during shutdown GC → crash in PRMJ_Now
In that particular crash, two different threads are *both* calling PRMJ_Now:

Crashing thread from JSCompartment::sweep
Main thread from BeginSweepingCompartmentGroup

Is there a race condition related to the PR_CallOnce initialization of PRMJ_Now that we're triggering more often?
I looked at all of the [@ PRMJ_Now() ] crashes for the last week. Pretty much all of the stacks involved GC. As Bill said, not a lot of them are at shutdown. (The ones with 0 at the front that I've looked at so far don't seem to be as GC heavy for whatever reason.)

The call from inside js::MaybeGC is in the logic that decides whether to trigger a GC, which is less superficial than the stats code.

I saw a few crash stacks where thread 1 looked like the following (but maybe that's just something that the crash reporter calls to generate its report...):

KiFastSystemCallRet
CheckMemAvailable // calling GlobalMemoryStatusEx

(That was added in Dec 2011)

https://crash-stats.mozilla.com/report/index/c2006470-2b05-49de-940e-a64b82121201
Thread 1 from comment 6 doesn't look suspicious; I'm pretty sure that is just the crash reporter thread (which is triggering the memory tracker hook code, but that doesn't look bad).

I'm going to load a minidump into MSVC and see if the stacks are different or provide more detail: I'd really like to confirm my theory that we're crashing at MUTEX_UNLOCK and not actually under GetSystemTimeAdjustment.
I was wrong. We are actually calling GetSystemTimeAdjustment:

00C13D6B  call        dword ptr ds:[51504010h]

According to the disassembler, this should be:

  100A3D6B: FF 15 10 40 26 10  call        dword ptr [__imp__GetSystemTimeAdjustment@12]

According to the memory dump from MSVC, the memory is actually:

0x00C13D6B  ff 15 10 40 50 51

According to MSVC, the correct relocated address would be

__imp__GetSystemTimeAdjustment@12	0x00dd4010
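The mismatch falls out of decoding the FF 15 ("call dword ptr [addr]") operand as a little-endian 32-bit address: the linker-emitted bytes `10 40 26 10` point at slot 0x10264010, while the dump's bytes `10 40 50 51` point at 0x51504010, matching the `ds:[51504010h]` the disassembler showed. A small helper (illustrative, not part of any tool here) makes the decode explicit:

```cpp
#include <cstdint>

// Decode the 4-byte little-endian operand of an x86 "FF 15" indirect call,
// i.e. the absolute address of the import-table slot being called through.
uint32_t DecodeCallTarget(const uint8_t bytes[6]) {
    // bytes[0..1] are the FF 15 opcode; bytes[2..5] the slot address
    return (uint32_t)bytes[2] | ((uint32_t)bytes[3] << 8) |
           ((uint32_t)bytes[4] << 16) | ((uint32_t)bytes[5] << 24);
}
```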

scoobidiver, can you check DLL correlations? This almost feels like injected DLLs mucking about, although I'm not sure.

Also, did we change anything about the Windows SDK or compiler toolchain in the regression window? The other possibility is that this is PGO-related, which could explain some of the discontinuity across builds, but I don't see anything wrong in the *binary*.

If anyone who can use a debugger is experiencing this, we can probably track it down pretty quickly using data breakpoints.
This doesn't really sound like anything I can help with, though I'll continue to poke around a little in crash-stats...
Assignee: continuation → general
With combined signatures, it's #16 top browser crasher in 19.0.2 and 20.0b6, #85 in 21.0a2, and #48 in 22.0a1.

Based on the variety of stack traces (different issues?), should we add PRMJ_Now to the prefix skiplist?
Flags: needinfo?(continuation)
Keywords: topcrash
bsmedberg is a better person to ask
Flags: needinfo?(continuation) → needinfo?(benjamin)
Given what we know about the bug, I think this is all one issue and PRMJ_Now should not be in the skip or appendlist.
Flags: needinfo?(benjamin)
Adding the Mac signature: https://crash-stats.mozilla.com/report/list?signature=IncrementalCollectSlice
Crash Signature: [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] → [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] [@ IncrementalCollectSlice ]
OS: Windows 7 → All
Hardware: x86 → All
Keywords: topcrash
Currently #20 crash on 25.0. 

There is definitely malware here, at least in the 0x0 variant of the signature. All the dumps have a pair of modules with a random name and sequential numbers, like UGQJ.006/UGQJ.007 or ELV.01/ELV.02.
__imp__GetSystemTimeAdjustment and nearby imports have been redirected, usually off into nowhere (as far as the dump can see, at least) but one time I saw an import point into one of these modules. 

I wonder what the vector for these things is. Injection? AppInit?
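One cheap triage check for this kind of redirection, sketched below under assumed numbers: a legitimate FF 15 operand should point at an IAT slot inside the image that owns it, so a slot address far outside the module's bounds is a red flag. The base/size values used in testing are illustrative, not taken from any real dump.

```cpp
#include <cstdint>

// Heuristic: an import-call operand should land inside the owning image's
// address range; anything else suggests the instruction has been patched.
bool ImportLooksRedirected(uint32_t slotAddr, uint32_t imageBase,
                           uint32_t imageSize) {
    return slotAddr < imageBase || slotAddr >= imageBase + imageSize;
}
```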
Currently #37 on v26, #21 on v27. Very low but nonzero counts on versions 28 and 29 (perhaps due to the different user base). Still correlated with the suspected malware modules.
Assignee: general → nobody
Crash Signature: [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] [@ IncrementalCollectSlice ] → [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] [@ IncrementalCollectSlice ] [@ PRMJ_Now ] [@ @0x0 | PRMJ_Now]
Some of these crashes are still present in the latest 44.0.2 release. I don't think we should close this bug until the crash signature completely disappears. Some steps to reproduce or a test case would help us reduce this crash signature.