Closed Bug 817396 Opened 12 years ago Closed 3 years ago

crash in PRMJ_Now

Categories

(Core :: JavaScript Engine, defect)

19 Branch
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox18 --- unaffected
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- wontfix
firefox22 --- wontfix
firefox23 --- wontfix
firefox24 --- wontfix
firefox25 --- wontfix
firefox26 --- affected
firefox27 --- affected
firefox28 --- affected
firefox29 --- affected

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, regression, testcase-wanted)

Crash Data

With combined signatures, it's #23 top browser crasher in 19.0a2. It first showed up in 19.0a1/20121103 and it's discontinuous across builds. Stack trace are various: Frame Module Signature Source 0 @0x33 1 mozjs.dll PRMJ_Now js/src/prmjtime.cpp:398 2 mozjs.dll BeginMarkPhase js/src/jsgc.cpp:3416 3 mozjs.dll IncrementalCollectSlice js/src/jsgc.cpp:4362 4 mozjs.dll GCCycle js/src/jsgc.cpp:4583 5 mozjs.dll Collect js/src/jsgc.cpp:4698 6 mozjs.dll js::GC js/src/jsgc.cpp:4723 7 mozjs.dll js::ShrinkingGC js/src/jsfriendapi.cpp:176 ... Frame Module Signature Source 0 @0x798a78 1 mozjs.dll PRMJ_Now js/src/prmjtime.cpp:398 2 mozjs.dll EndSweepPhase js/src/jsgc.cpp:4060 3 xul.dll nsRange::ToString content/base/src/nsRange.cpp:2341 4 mozjs.dll GCCycle js/src/jsgc.cpp:4569 5 mozjs.dll Collect js/src/jsgc.cpp:4684 6 mozjs.dll js::GC js/src/jsgc.cpp:4709 ... Frame Module Signature Source 0 @0x5e110f 1 mozjs.dll PRMJ_Now js/src/prmjtime.cpp:398 2 mozjs.dll date_now js/src/jsdate.cpp:1261 3 mozjs.dll js::InvokeKernel js/src/jsinterp.cpp:369 4 mozjs.dll js::Interpret js/src/jsinterp.cpp:2331 5 mozjs.dll js::RunScript js/src/jsinterp.cpp:326 ... Frame Module Signature Source 0 @0x21169e 1 mozjs.dll PRMJ_Now js/src/prmjtime.cpp:398 2 mozjs.dll js::gcstats::Statistics::endPhase js/src/gc/Statistics.cpp:621 3 mozjs.dll IncrementalCollectSlice js/src/jsgc.cpp:4407 4 mozjs.dll Collect js/src/jsgc.cpp:4684 Frame Module Signature Source 0 @0x278a78 1 mozjs.dll PRMJ_Now js/src/prmjtime.cpp:398 2 mozjs.dll js::MaybeGC js/src/jsgc.cpp:2673 3 xul.dll nsJSContext::ScriptEvaluated dom/base/nsJSEnvironment.cpp:2818 4 xul.dll nsJSContext::CallEventHandler dom/base/nsJSEnvironment.cpp:1962 5 xul.dll nsGlobalWindow::RunTimeoutHandler dom/base/nsGlobalWindow.cpp:9675 6 xul.dll nsGlobalWindow::RunTimeout dom/base/nsGlobalWindow.cpp:9924 ... Frame Module Signature Source 0 @0x5ac658 1 @0x730074 2 mozjs.dll PRMJ_Now js/src/prmjtime.cpp:445 3 xul.dll nsIFrame::GetFrameSelection layout/generic/nsFrame.cpp:5595 4 xul.dll nsEventTargetChainItem::HandleEventTargetChain content/events/src/nsEventDispatcher.cpp:339 5 xul.dll nsEventTargetChainItem::HandleEventTargetChain content/events/src/nsEventDispatcher.cpp:371 ... More reports at: https://crash-stats.mozilla.com/report/list?signature=PRMJ_Now%28%29 https://crash-stats.mozilla.com/report/list?signature=%400x0+|+PRMJ_Now%28%29
Looked at https://crash-stats.mozilla.com/report/index/3bcc929e-c13a-404b-b47b-851af2121203 We appear to be in the process of shutting down XPCOM, calling the xpcom-shutdown-thread notification (which happens after xpcom-shutdown). The main thread is calling RuntimeService::Observe (dom/workers) http://hg.mozilla.org/releases/mozilla-aurora/annotate/2767c038198b/dom/workers/RuntimeService.cpp#l1370 and we appear to be in this nested event loop: http://hg.mozilla.org/releases/mozilla-aurora/annotate/2767c038198b/dom/workers/RuntimeService.cpp#l1148 Reading tealeaves, I think that the crash is actually occuring on this line: http://hg.mozilla.org/releases/mozilla-aurora/annotate/2767c038198b/js/src/prmjtime.cpp#l393 But the only way I can really see that happening is if we called PRMJ_Shutdown (via JS_Shutdown). This is called from XPCJSRuntime::~XPCJSRuntime() but it's hard to imagine that we wouldn't still have an XPCJSRuntime at this point during shutdown.
Those crashes in the GC are probably from trying to compute how long the phase took for telemetry purposes, which doesn't seem super essential at shutdown time.
Bill, can you take a look at this? If you don't have time, I can try to make some headway, and you can assign it to me.
Assignee: general → wmccloskey
Summary: crash in PRMJ_Now → crash in PRMJ_Now during shutdown GC
Many of these seem not to be shutdown crashes. For example, this one happens in a worker: https://crash-stats.mozilla.com/report/index/026ebd4a-166d-41c6-8337-0a6752121129 I also don't understand why this would have suddenly started happening. We haven't changed anything fundamental related to when we call PRMJ_Now. It seems possible that someone else is corrupting data used by this code. In any case, I'm not sure what we could do here.
Assignee: wmccloskey → continuation
Summary: crash in PRMJ_Now during shutdown GC → crash in PRMJ_Now
In that particular crash, two different threads are *both* calling PRMJ_Now: Crashing thread from JSCompartment::sweep Main thread from BeginSweepingCompartmentGroup Is there a race condition related to the PR_CallOnce initialization of PRMJ_Now that we're triggering more often?
I looked at all of the [@ PRMJ_Now() ] crashes for the last week. Pretty much all of the stacks involved GC. As Bill said, not a lot of them are at shutdown. (The ones with 0 at the front that I've looked at so far don't seem to be as GC heavy for whatever reason.) The calls from inside js::MaybeGC is in the logic that decides whether to trigger the GC, which is less superficial than the stats code. I saw a few crash stacks where thread 1 looked like the following (but maybe that's just something that the crash reporter calls to generate its report...): KiFastSystemCallRet CheckMemAvailable // calling GlobalMemoryStatusEx (That was added in Dec 2011) https://crash-stats.mozilla.com/report/index/c2006470-2b05-49de-940e-a64b82121201
Thread 1 from comment 6 doesn't look suspicious, I'm pretty sure that is just the crash reporter thread (which is triggering the memory tracker hook code, but that doesn't look bad). I'm going to load a minidump into MSVC and see if the stacks are different or provide more detail: I'd really like to confirm my theory that we're crashing at MUTEX_UNLOCK and not actually under GetSystemTimeAdjustment.
I was wrong. We are actually calling GetSystemTimeAdjustment: 00C13D6B call dword ptr ds:[51504010h] According to the disassembler, this should be: 100A3D6B: FF 15 10 40 26 10 call dword ptr [__imp__GetSystemTimeAdjust ment@12] According to the memory dump from MSVC, the memory is actually: 0x00C13D6B ff 15 10 40 50 51 According to MSVC, the correct relocated address would be __imp__GetSystemTimeAdjustment@12 0x00dd4010 scoobidiver, can you check DLL correlations? This almost feels like injected DLLs mucking about, although I'm not sure. Also, did we change anything about the Windows SDK or compiler toolchain in the regression window? The other possibility is that this is PGO-related, which could explain some of the discontinuity across builds, but I don't see anything wrong in the *binary*. If anyone who can use a debugger is experiencing this, we can probably track it down pretty quickly using data breakpoints.
This doesn't really sound like anything I can help with, though I'll continue to poke around a little in crash-stats...
Assignee: continuation → general
With combined signatures, it's #16 top browser crasher in 19.0.2 and 20.0b6, #85 in 21.0a2, and #48 in 22.0a1. Based on the variety of stack traces (different issues?), should we add PRMJ_Now to the prefix skiplist?
Flags: needinfo?(continuation)
Keywords: topcrash
bsmedberg is a better person to ask
Flags: needinfo?(continuation) → needinfo?(benjamin)
Given what we know about the bug, I think this is all one issue and PRMJ_Now should not be in the skip or appendlist.
Flags: needinfo?(benjamin)
Crash Signature: [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] → [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] [@ IncrementalCollectSlice ]
OS: Windows 7 → All
Hardware: x86 → All
Keywords: topcrash
Currently #20 crash on 25.0. There is definitely malware here, at least in the 0x0 variant of the signature. All the dumps have a pair of modules with a random name and sequential numbers, like UGQJ.006/UGQJ.007 or ELV.01/ELV.02. __imp__GetSystemTimeAdjustment and nearby imports have been redirected, usually off into nowhere (as far as the dump can see, at least) but one time I saw an import point into one of these modules. I wonder what the vector for these things is. Injection? AppInit?
Currently #37 on v26, #21 on v27. Very low but nonzero counts on versions 28 and 29 (perhaps due to the different user base). Still correlated with the suspected malware modules.
Assignee: general → nobody
Crash Signature: [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] [@ IncrementalCollectSlice ] → [@ PRMJ_Now() ] [@ @0x0 | PRMJ_Now()] [@ IncrementalCollectSlice ] [@ PRMJ_Now ] [@ @0x0 | PRMJ_Now]
Some of this crashes are still present on the latest 44.0.2 release version. I don't think we should close this bug until the crash signature completely disappears. Some steps to reproduce or a test case should help us reduce this crash signature.

Marking this as Resolved > Worksforme since there are no crashes with this signature reported in the last 6 months.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.