Startup crash in mozJSComponentLoader::GetSharedGlobal

REOPENED
Unassigned

Status

defect
P1
critical
REOPENED
2 years ago
8 months ago

People

(Reporter: philipp, Unassigned)

Tracking

(Blocks 1 bug, {crash, regression})

57 Branch

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr60 affected, firefox56 unaffected, firefox57 wontfix, firefox58 wontfix, firefox59 wontfix, firefox60 wontfix, firefox61 fix-optional, firefox62 fix-optional)

Details

(crash signature)

Attachments

(1 attachment)

Reporter

Description

2 years ago
This bug was filed from the Socorro interface and is report bp-c5858d66-e5bf-48a8-accc-8a0de0171001.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozJSComponentLoader::GetSharedGlobal(JSContext*) 	js/xpconnect/loader/mozJSComponentLoader.cpp:584
1 	xul.dll 	mozilla::ScriptPreloader::DecodeNextBatch(unsigned int) 	js/xpconnect/loader/ScriptPreloader.cpp:1006
2 	xul.dll 	mozilla::ScriptPreloader::InitCacheInternal() 	js/xpconnect/loader/ScriptPreloader.cpp:523
3 	xul.dll 	mozilla::ScriptPreloader::InitCache(nsTSubstring<char16_t> const&) 	js/xpconnect/loader/ScriptPreloader.cpp:425
4 	xul.dll 	mozilla::ScriptPreloader::GetChildSingleton() 	js/xpconnect/loader/ScriptPreloader.cpp:137
5 	xul.dll 	mozilla::ScriptPreloader::GetSingleton() 	js/xpconnect/loader/ScriptPreloader.cpp:94
6 	xul.dll 	NS_InitXPCOM2 	xpcom/build/XPCOMInit.cpp:711
7 	xul.dll 	ScopedXPCOMStartup::Initialize() 	toolkit/xre/nsAppRunner.cpp:1587
8 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4861
9 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4960
10 	xul.dll 	mozilla::BootstrapImpl::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/Bootstrap.cpp:45
11 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
12 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
13 	kernel32.dll 	BaseThreadInitThunk 	
14 	ntdll.dll 	__RtlUserThreadStart 	
15 	ntdll.dll 	_RtlUserThreadStart

this cross-platform crash signature is newly showing up in firefox 57 with "MOZ_RELEASE_ASSERT(globalObj)" that got added in bug 1381976.
Hm. This is worrying. The only reason we should expect to fail to create a global at this point is OOM, but all of these users appear to have plenty of available memory.

There's really no way to make this a non-fatal error, though. If we can't create that module global, we can't load JS components, which means we can't start the browser. The only real hope is that it might succeed when we call it a bit later to actually execute the script, rather than just to compile it.

I'll see if I can add some additional assertions to pinpoint exactly where this is failing.
Assignee: nobody → kmaglione+bmo
It looks like this is in the main process, which seems even weirder. Maybe this would have shown up as another crash before shared JSM modules?
(In reply to Andrew McCreight (PTO-ish Oct 1 - 12) [:mccr8] from comment #2)
> It looks like this is in the main process, which seems even weirder. Maybe
> this would have shown up as another crash before shared JSM modules?

Yeah, that's what I'm thinking. It's possible that this is happening now because we're creating the global earlier now, and wouldn't have happened before. But if so, I'd expect it to fail every time, not just for certain users.

Also, before the shared global changes, we would have treated this as a non-fatal error, and passed it on to whoever tried to load the component/module. But failure to load the components we load at startup causes us to abort startup. And failure to load other modules during startup generally makes the browser unusable. So we wouldn't have been in a better position.
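To illustrate the trade-off described above, here is a minimal sketch, not the actual loader code. `FakeGlobal`, `TryCreateGlobal`, `LoadModuleNonFatal` and `GetSharedGlobalOrAbort` are hypothetical names, and the `simulateFailure` flag is an assumption standing in for whatever makes global creation fail in the wild. Only the idea (handing the failure to the caller versus asserting at the point of failure) comes from this discussion.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical stand-in for a module global; not a real Gecko type.
struct FakeGlobal {
  std::string name;
};

// Hypothetical creation function. `simulateFailure` stands in for the
// unknown cause of the real failure (OOM, corrupt data, ...).
std::unique_ptr<FakeGlobal> TryCreateGlobal(bool simulateFailure) {
  if (simulateFailure) {
    return nullptr;
  }
  return std::make_unique<FakeGlobal>(FakeGlobal{"shared-module-global"});
}

// Non-fatal style: the failure is passed on to whoever asked for the module,
// and that caller decides what to do with it.
bool LoadModuleNonFatal(const std::string& url, bool simulateFailure,
                        std::string* result) {
  assert(result != nullptr);
  std::unique_ptr<FakeGlobal> global = TryCreateGlobal(simulateFailure);
  if (!global) {
    return false;
  }
  *result = url + " loaded into " + global->name;
  return true;
}

// Release-assert style: one shared global is created lazily, and a failure
// aborts the process immediately, so the crash report points at this spot.
FakeGlobal& GetSharedGlobalOrAbort(bool simulateFailure) {
  static std::unique_ptr<FakeGlobal> sharedGlobal;
  if (!sharedGlobal) {
    sharedGlobal = TryCreateGlobal(simulateFailure);
  }
  if (!sharedGlobal) {
    std::fprintf(stderr, "failed to create shared global\n");
    std::abort();
  }
  return *sharedGlobal;
}
```

In the non-fatal variant a caller that cannot proceed without the module still has to give up, which matches the point above that startup would not have been in a better position.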
Reporter

Updated

2 years ago
See Also: → 1404743
Hey Andy, looks like Kris is on PTO. Suggestions on what to do with this? Low-volume crash, but new in 57.
Flags: needinfo?(amckay)
Priority: -- → P1
I'm not on PTO, just still looking into options for debugging this.

So far, it looks like the odds are that this isn't a new issue in 57, just a new failure mode.
Flags: needinfo?(amckay)
Blocks: 1404743
Comment 7

2 years ago
mozreview-review
Comment on attachment 8916165 [details]
Bug 1404741: Don't call mozJSComponentLoader::CompilationScope during URLPreloader critical section.

https://reviewboard.mozilla.org/r/187412/#review192476

::: commit-message-ccb09:3
(Diff revision 1)
> +Bug 1404741: Don't call mozJSComponentLoader::CompilationScope during URLPreloader critical section. r?mccr8
> +
> +The URLPreloader's initialization code access the Omnijar cache off-main

micronit: accesses

::: js/xpconnect/loader/ScriptPreloader.cpp:419
(Diff revision 1)
>  
>      if (!XRE_IsParentProcess()) {
>          return Ok();
>      }
>  
> +    // Grab the compilation scope before initializing the URLPreloader, it's not

nit: this should be "because it's not" or whatever
Attachment #8916165 - Flags: review?(continuation) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/a5ab6b153cccc38a2fae62a529923f8370734c39
Bug 1404741: Don't call mozJSComponentLoader::CompilationScope during URLPreloader critical section. r=mccr8
https://hg.mozilla.org/mozilla-central/rev/a5ab6b153ccc
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment on attachment 8916165 [details]
Bug 1404741: Don't call mozJSComponentLoader::CompilationScope during URLPreloader critical section.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1381976
[User impact if declined]: This causes unpredictable startup crashes for some users, due to a race condition.
[Is this code covered by automated tests?]: It is exercised by automated tests, but there are no tests for this specific problem, since it's a race condition.
[Has the fix been verified in Nightly?]: N/A
[Needs manual test from QE? If yes, steps to reproduce]: No. This is a race condition, which shows up rarely, mostly in crashstats.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: It simply moves an operation to a slightly earlier point in startup, when it doesn't risk causing a data race with a background thread.
[String changes made/needed]: None.
Attachment #8916165 - Flags: approval-mozilla-beta?
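The reordering described in the approval request can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `FakeJarCache`, `CreateCompilationScope` and `InitInSafeOrder` are hypothetical names, and the assumption that both the main-thread call and the background thread touch the same non-thread-safe cache is inferred from the commit message rather than taken from the code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical, deliberately non-thread-safe cache (no locking).
struct FakeJarCache {
  std::vector<std::string> entries;
  void Open(const std::string& path) { entries.push_back(path); }
  std::size_t Count() const { return entries.size(); }
};

// Hypothetical main-thread operation that touches the cache as a side effect.
int CreateCompilationScope(FakeJarCache& cache) {
  cache.Open("omni.ja");
  return 1;
}

// Safe ordering: do the main-thread work that touches the cache first, and
// only then start the background thread that also uses the cache.
std::size_t InitInSafeOrder(FakeJarCache& cache) {
  // Step 1: grab the scope while no other thread exists.
  int scope = CreateCompilationScope(cache);
  assert(scope == 1);

  // Step 2: start the background work (the "critical section").
  std::size_t seenByWorker = 0;
  std::thread worker([&cache, &seenByWorker] {
    cache.Open("preload-data");
    seenByWorker = cache.Count();
  });

  // Step 3: the main thread stays away from the cache until the worker
  // has finished.
  worker.join();
  return seenByWorker;
}
```

Calling `CreateCompilationScope` between starting the worker and joining it would let two threads mutate `entries` at once, which is the kind of data race the request refers to.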
Comment on attachment 8916165 [details]
Bug 1404741: Don't call mozJSComponentLoader::CompilationScope during URLPreloader critical section.

Fix for a new crash, Beta57+
Attachment #8916165 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Reporter

Comment 13

2 years ago
the crash signature is still present in beta 7 with that patch...
Flags: needinfo?(kmaglione+bmo)
(In reply to [:philipp] from comment #13)
> the crash signature is still present in beta 7 with that patch...

Thanks. The crash stacks are different now, though, and the background threads no longer have nsZipArchive::GetItem in their stacks. So hopefully this at least fixed bug 1404743, but we still need to sort out what's causing the global creation to fail.
Status: RESOLVED → REOPENED
Flags: needinfo?(kmaglione+bmo)
Resolution: FIXED → ---
Every crash in 57b7 that I've looked at so far has ZVFORT32.DLL loaded, which seems to be part of a software suite called "Net Protector".

If I had to bet at this point, that's where I'd put my money.

Comment 16

2 years ago
Adam, it sounds like this crash might be caused by a 3rd party, would you be able to help?
Flags: needinfo?(astevenson)
Yes, will reach out and update when I hear back.
Flags: needinfo?(astevenson)
Got a response that their engineering team is taking a look at this.
Reporter

Comment 19

2 years ago
url correlations on beta do not indicate that this is correlated to particular (rare) dll modules as far as i can see:
https://crash-stats.mozilla.com/signature/?signature=mozJSComponentLoader%3A%3AGetSharedGlobal#correlations
Looking at some of these from 57 release, I don't see the ZVFORT32.DLL module in any of them. Some crashes have AVAST, and others don't appear to have any AV.
I experienced it this morning on my Linux system.
My hard drive was full. Cleaning up didn't fix the issue.

I don't have any AV or stuff like that.

As I can reproduce it every time, I am happy to help with debugging.

Examples:
bp-ac4731c8-cc44-43a8-8796-91d4b0171220
bp-f2e17192-60d3-40ce-9d54-1cbd80171220
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(continuation)
Sorry for the delay. I was making an effort not to work during the holidays. Can you still reproduce this?

If so, can you try to reproduce it under rr? If you can, we can reverse-step to find the location of the actual failure.

Also, a copy of your profile's startupCache directory would be helpful.
Flags: needinfo?(kmaglione+bmo) → needinfo?(sledru)
Flags: needinfo?(continuation)
I can still reproduce it, and rr works with it.
What do I need to do with that (i.e., how do I make it stop on the failure)?

I sent you the profile by email.
Flags: needinfo?(sledru)
I also reproduced it on Nightly in a Linux Ubuntu 16.04 environment. (It might have been related to a drive that was close to full.)
Sounds like this was reproducible? Did anything come of the rr trace in comment 23?
Reporter

Comment 26

Last year
could this issue be the same as bug 1276488? the crash graphs look fairly similar (peaks and lows fall on the same days)...
Sorry, I lost track of this last time.

I'm pretty sure this is just another startup cache/disk corruption issue.

I ran into it once when I changed a file in my local build, and a lazy source hook got called to stringify a closure with offsets that didn't make sense in the new version.

We also run into that in bug 1403348, where we get error reports when trying to execute JS files whose contents appear to be corrupt. It's still not clear whether those are a result of XDR corruption or omni jar corruption. I'm still looking into them.

We've also seen other similar reports of network failure causing errors when running Firefox from a network drive. We know that a lot of the XDR decoding crashes we see are disk access errors when accessing mmapped files. Some of those are probably from network failures. The failures to open the app jar that we've seen in a few cases (and could also be responsible for some of these crashes) may be the same issue.

(In reply to [:philipp] from comment #26)
> could this issue be the same as bug 1276488? the crash graphs look
> fairly similar (peaks and lows fall on the same days)...

Yeah, assuming this is a disk corruption or failure issue, that seems pretty likely.
Flags: needinfo?(kmaglione+bmo)

Comment 28

Last year
Is this the same bug that has been the top crasher for firefox 59.0.3 for the last couple of days?
https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=59.0.3&days=3
https://crash-stats.mozilla.com/signature/?date=%3C2018-05-04T14%3A31%3A56%2B00%3A00&date=%3E%3D2018-05-01T14%3A31%3A56%2B00%3A00&product=Firefox&version=59.0.3&signature=mozJSComponentLoader%3A%3AGetSharedGlobal

All of the reports that I opened had the same stack trace. It's different from the one pasted here, but it ends in mozJSComponentLoader::GetSharedGlobal(JSContext*).
Over 1000 startup crashes in the last week on release. That's a lot of bad disks...
Still moderately high volume on release 61 (~600 crashes/week). Only 3 or so on beta 62.
I'm fairly certain at this point that this is some sort of corruption. I don't have time to work on omnijar checksums, though, and I'm pretty sure that's what we need.
Assignee: kmaglione+bmo → nobody
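As a rough idea of what an archive checksum check could look like, here is a minimal sketch. It is an assumption for illustration only: `Crc32` and `ArchiveLooksIntact` are hypothetical helpers, CRC-32 is chosen simply because it is the checksum the zip format already uses, and nothing here reflects how omnijar is actually read or where an expected value would be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320 (the zip variant).
std::uint32_t Crc32(const unsigned char* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // If the low bit is set, shift and XOR in the polynomial;
      // otherwise just shift.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical check: compare the checksum of the bytes read from disk
// against a value recorded when the archive was written.
bool ArchiveLooksIntact(const std::string& bytes, std::uint32_t expected) {
  const unsigned char* raw =
      reinterpret_cast<const unsigned char*>(bytes.data());
  return Crc32(raw, bytes.size()) == expected;
}
```

A mismatch would let startup report a corrupt install instead of failing later in an unrelated place, though a checksum over the whole file also adds read cost at startup.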
I wonder if any of this would be helped by better install integrity. I've completely busted my install when my computer crashed during an auto update of firefox. I needed a full reinstall to fix it. It is possible correlations with updates are due to this.