Crash in [@ mozilla::StaticPrefs::InitStaticPrefsFromShared]
Categories
(Core :: Preferences: Backend, defect, P1)
Tracking
Release | Tracking | Status
---|---|---
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | unaffected |
firefox69 | --- | unaffected |
firefox70 | + | fixed |
People
(Reporter: lizzard, Assigned: n.nethercote)
References
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(1 file)
This bug is for crash report bp-7155c1a1-e483-404c-84ee-facc60190814.
A few crashes appeared suddenly in the 20190812215403 build.
Top 10 frames of crashing thread:
0 xul.dll mozilla::StaticPrefs::InitStaticPrefsFromShared obj-firefox/dist/include/mozilla/StaticPrefList_dom.h:1286
1 xul.dll mozilla::ipc::SharedPreferenceDeserializer::DeserializeFromSharedMemory ipc/glue/ProcessUtils_common.cpp:176
2 xul.dll mozilla::dom::ContentProcess::Init dom/ipc/ContentProcess.cpp:174
3 xul.dll XRE_InitChildProcess toolkit/xre/nsEmbedFunctions.cpp:739
4 firefox.exe static int content_process_main ipc/contentproc/plugin-container.cpp:56
5 firefox.exe static int NS_internal_main browser/app/nsBrowserApp.cpp:267
6 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:131
7 firefox.exe static int __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:288
8 kernel32.dll BaseThreadInitThunk
9 ntdll.dll RtlUserThreadStart
Comment 1•5 years ago
Any ideas? I don't know if it's even possible to tell from these crash reports what went wrong and if there's any reason to blame IPC.
Comment 2•5 years ago
Needinfo'ing Nick because this is likely related to the latest changes to the StaticPrefs. One thing worth mentioning: these crashes happened early during content process startup, so they weren't collected properly until I landed the fix for bug 1282776. I had already seen these as unreported crashes on my machine days before (see bug 1448219 comment 6), which is one of the reasons why I speed-landed that fix.
Comment 3•5 years ago
It looks as if these stopped in the 8-17 nightly. I wonder if something got backed out?
Comment 4•5 years ago
#2 crash on the 8-21 Linux Nightly, with 16 crashes.
Assignee
Comment 5•5 years ago
The stack trace is hard to read because generated C++ code and macros are involved. The actual crash is a failure of one of the diagnostic asserts here, which means that one of the `Internals::GetSharedPrefValue()` calls is failing. I can see two possibilities:
- `pref_SharedLookup()` succeeds and then `pref->GetValue()` returns an error result, which would be caused by `WantValueKind()` failing.
- `pref_SharedLookup()` fails, which would be caused by `gSharedMap->Get()` failing.
Unfortunately I don't have a deep understanding of the pref IPC stuff. jya, do you have any ideas?
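The two failure paths can be illustrated with a heavily simplified model. The function names below mirror the identifiers mentioned in this comment, but the types and the example data are hypothetical stand-ins, not the actual Gecko implementation:

```cpp
#include <cassert>
#include <map>
#include <string>

enum class PrefKind { Bool, Int, String };

struct SharedPrefValue {
    PrefKind kind;
    bool boolVal;
};

// Stand-in for gSharedMap: the snapshot of prefs the parent process
// serialized into shared memory before spawning content processes.
static std::map<std::string, SharedPrefValue> gSharedMap = {
    {"dom.webdriver.enabled", {PrefKind::Bool, false}},
};

// Sketch of pref_SharedLookup(): fails if the pref name is absent
// from the shared snapshot (failure mode 2 above).
const SharedPrefValue* pref_SharedLookup(const std::string& aName) {
    auto it = gSharedMap.find(aName);
    return it == gSharedMap.end() ? nullptr : &it->second;
}

// Sketch of GetSharedPrefValue(): even if the lookup succeeds, the
// stored kind may not match the requested kind (failure mode 1,
// i.e. WantValueKind() failing in the real code).
bool GetSharedPrefValue(const std::string& aName, bool* aOut) {
    const SharedPrefValue* pref = pref_SharedLookup(aName);
    if (!pref) return false;                          // missing from shared map
    if (pref->kind != PrefKind::Bool) return false;   // wrong value kind
    *aOut = pref->boolVal;
    return true;
}
```

In the real code, either failure makes `NS_SUCCEEDED(rv)` false, which is exactly what the diagnostic assert in the crash signature checks.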
Comment 6•5 years ago
The assertion is that, outside the parent process, any static pref (whether `Always` or `Once`) must exist in the shared pref map, because it must have been read (and set) at least once before the shared preference map was created in the parent process.
For this assertion to be triggered, the static prefs must not have been initialised properly in the main process before the shared pref map global object was created, since here `Preferences::Internals::GetSharedPrefValue` returned false (i.e. it couldn't find the pref).
Now, I'd be keen to know what happened on August 14th such that these crashes suddenly appeared.
AFAIK, :njn, you're the only person who has touched that code around that time. Has anything been changed related to how StaticPrefs are initialised in the main process, or when they are initialised?
I've tried to grab some memory dumps found on crash-stats, but neither Visual Studio 2019 nor WinDbg gets anything of use. Do we know in which process this crash is occurring?
Comment 7•5 years ago
(In reply to Jean-Yves Avenard [:jya] from comment #6)
> Now, I'd be more keen to know on what happens on August 14th so suddenly you have those crashes.
The crashes were already happening but were not being reported until I landed the fix for bug 1282776. That's why they seem to start on that day.
> I've tried to grab some memory dump found in crash-stats, but neither Visual Studio 2019 nor WinDbg gets something of use. Do we know in which process this crash is occurring?
They're all happening in content processes.
Comment 8•5 years ago
This signature spiked again on 8-22. So far we have 1206 crashes, but only 144 installations.
(100.0% in signature vs 01.03% overall) moz_crash_reason = MOZ_DIAGNOSTIC_ASSERT(false) (NS_SUCCEEDED(rv))
Comment 9•5 years ago
When I checked my minidump files the issue seemed to affect the `dom.webdriver.enabled` entry. From what I can tell this pref seems to be unused apart from an entry in `Navigator.webidl`. See this search: https://searchfox.org/mozilla-central/search?q=dom.webdriver.enabled&path=
Could this be the cause, or am I missing something?
Assignee
Comment 10•5 years ago
I can't see anything special about `dom.webdriver.enabled`. It's possible there's a problem that affects a lot of prefs and this one just happens to be the unlucky first one.
Comment 11•5 years ago
This is still crashing extensively in the Windows builds for 8-22.
Comment 12•5 years ago
Random comment .. I wonder if this is somehow related to bug 1576454. That seems to be related to early-stage allocator crashing, and per comment 2 above, these crashes are also early-in-process and have no obvious explanation.
Assignee
Comment 13•5 years ago
> Random comment .. I wonder if this is somehow related to bug 1576454. That seems to be related to early-stage allocator crashing, and per comment 2 above, these crashes are also early-in-process and have no obvious explanation.
Bug 1576454 now has a clear explanation -- stack overflow due to too much recursion. It appears to be unrelated to this bug.
Comment 14•5 years ago
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Reporter
Comment 15•5 years ago
I would love to see this fixed for 70 but won't consider it a blocker despite the high volume since (from comment 7) they were already happening but weren't being reported. It is the top crash other than Shutdown and OOM so I think it should be a high priority.
Comment 16•5 years ago
I just hit a crash with this signature a whole bunch of times in a row, without even noticing that a crash was happening under the hood:
bp-7a051745-85cb-4313-9ff7-575820190829 8/29/19, 11:08 AM
bp-b76447a3-7154-4c95-aefe-351640190829 8/29/19, 11:08 AM
bp-88958db6-1793-4091-a8ed-60c3e0190829 8/29/19, 11:08 AM
bp-39fb396e-8dde-4d96-b00b-154f10190829 8/29/19, 11:06 AM
bp-c37f0dbd-30eb-4a23-b8ce-fe1b40190829 8/29/19, 11:06 AM
bp-260bdbbc-5da3-4d2f-871d-0062e0190829 8/29/19, 11:06 AM
bp-fcf08662-d714-4c51-86b6-347c30190829 8/29/19, 11:06 AM
bp-1c36c8d2-bcbf-4680-8eaf-05a8b0190829 8/29/19, 11:06 AM
bp-40a639e7-7052-4907-832c-5fe2e0190829 8/29/19, 11:06 AM
My STR (not sure if they're reliable) were:
(1) have a pending Nightly update, ready to install (green update arrow visible on hamburger menu)
(2) Start a separate session of Nightly, e.g. `mkdir /tmp/foo; firefox -no-remote -profile /tmp/foo`
(This triggers the update to happen underneath your existing session)
(3) Back in your main session of Firefox, Ctrl+N to open a new window.
For me, the New Window would open with the "Sorry, just one more thing we need to do" error page (indicating that it was needing to start a new content process + getting blocked from doing so due to the update that'd happened). And each time I did this (opened a new window & hit that page), I would end up with a new entry in about:crashes with this crash signature.
Comment 17•5 years ago
#1 crash for Linux for the August 28 Nightlies, with about 27% of all crashes.
Comment 18•5 years ago
Also the #4 crash on Window and #2 crash on OSX for those Nightlies.
Assignee
Comment 19•5 years ago
dholbert: thank you for the info, that's very helpful. I suspect the problem is that the main process and content processes have slightly different ideas of which prefs are defined (because they come from different binaries that define different prefs), and this triggers the diagnostic assertion failure. I will investigate some more.
Assignee
Comment 20•5 years ago
Because it's violated when updates occur, and when the violation occurs it's safe to continue, for reasons explained in the patch. This should fix a top crash.
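A rough model of the behavioral change described here. This is hypothetical code, not the actual patch: the real fix simply removes the diagnostic assert, leaving the pref at its compiled-in default when the shared lookup fails.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the shared pref snapshot created by the
// parent process. In an update race it may come from a different
// binary than the content process, so entries can be missing.
static std::map<std::string, int> gSharedMap;

// Sketch: look the pref up in the shared map; on a miss, keep the
// compiled-in default instead of crashing (the old code would have
// hit MOZ_DIAGNOSTIC_ASSERT(false) here).
int InitPrefFromShared(const std::string& aName, int aCompiledDefault) {
    auto it = gSharedMap.find(aName);
    if (it == gSharedMap.end()) {
        return aCompiledDefault;  // safe to continue after an update race
    }
    return it->second;
}
```

The key judgment in the patch is that a missing entry is benign: the content process simply behaves as if the pref were at its default value.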
Assignee
Comment 21•5 years ago
Assuming I've correctly understood what's happening...
- I've written a patch that should fix the problem.
- The good news is that all it does is disable the diagnostic assertion, which means Beta and Release are unaffected by this.
- Diagnostic assertions do affect Dev Edition, which is built from mozilla-beta, so this will need backporting to mozilla-beta.
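The channel behaviour described above follows from how `MOZ_DIAGNOSTIC_ASSERT` is gated at compile time. A simplified sketch of that gating (the real macro lives in mfbt/Assertions.h and prints diagnostics before crashing; this version just aborts):

```cpp
#include <cassert>
#include <cstdlib>

// Simplified sketch: on Nightly and Dev Edition builds the build system
// defines MOZ_DIAGNOSTIC_ASSERT_ENABLED, making the assert fatal; on
// Beta and Release the macro expands to nothing. That is why removing
// the assert only changes behaviour on Nightly and Dev Edition.
#ifdef MOZ_DIAGNOSTIC_ASSERT_ENABLED
#  define MOZ_DIAGNOSTIC_ASSERT(expr) \
    do { if (!(expr)) std::abort(); } while (0)
#else
#  define MOZ_DIAGNOSTIC_ASSERT(expr) do { } while (0)
#endif
```

With `MOZ_DIAGNOSTIC_ASSERT_ENABLED` undefined (a Beta/Release-style build), `MOZ_DIAGNOSTIC_ASSERT(false)` compiles to a no-op.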
Comment 22•5 years ago
Pushed by nnethercote@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a4c348d43116 Remove a diagnostic assertion in InitStaticPrefsFromShared(). r=jya
Comment 23•5 years ago
bugherder
Updated•5 years ago
Assignee
Comment 24•5 years ago
Here are the current stats for this crash over the last seven nightly builds:
- 20190829094151: 14
- 20190829214656: 10
- 20190830093857: 4
- 20190830215433: 0 [the fix landed in this build]
- 20190831095143: 22 [???]
- 20190831221004: 0
- 20190901094958: 0
It looks as expected except for the 22 crashes marked with "[???]". I don't know what to make of that. Nonetheless, my inclination is to wait a few days and see if the crash rate remains at 0 for subsequent builds.
Comment 25•5 years ago
bugherder uplift
Assignee
Comment 26•5 years ago
Things have improved, but crashes with this signature are still happening.
I see now there have been two types of crash happening with this signature. In the past week there have been 916 crashes.
- 313 of them involve the diagnostic assert, e.g. Windows, Linux. These ones stopped happening after this bug's patch landed, unsurprisingly.
- 603 of them do not involve the diagnostic assert, e.g. Windows, Linux, Mac. These have continued.
The non-diagnostic-assert ones have a crash address and crash reason field that is consistent with a diagnostic assert (e.g. 0x7ffd697135c7 and `EXCEPTION_BREAKPOINT` on Windows; 0 and `SIGSEGV` on Linux), but lack a "MOZ_CRASH Reason (Raw)" field. I can't see what code remaining within `InitStaticPrefsFromShared()` could cause crashes like these.
Comment 27•5 years ago
Thanks Nick. I spun off Bug 1578430 to track the continued crashes that are happening on both 70 and now 71 nightly.
Updated•5 years ago
Assignee
Comment 28•5 years ago
The crashes that dholbert experienced in comment 16 are interesting. I have annotated each one.
> bp-7a051745-85cb-4313-9ff7-575820190829 8/29/19, 11:08 AM DIAGNOSTIC_ASSERT
> bp-b76447a3-7154-4c95-aefe-351640190829 8/29/19, 11:08 AM DIAGNOSTIC_ASSERT
> bp-88958db6-1793-4091-a8ed-60c3e0190829 8/29/19, 11:08 AM DIAGNOSTIC_ASSERT
> bp-39fb396e-8dde-4d96-b00b-154f10190829 8/29/19, 11:06 AM Minimal info
> bp-c37f0dbd-30eb-4a23-b8ce-fe1b40190829 8/29/19, 11:06 AM Minimal info
> bp-260bdbbc-5da3-4d2f-871d-0062e0190829 8/29/19, 11:06 AM Minimal info
> bp-fcf08662-d714-4c51-86b6-347c30190829 8/29/19, 11:06 AM gfx crash, unrelated
> bp-1c36c8d2-bcbf-4680-8eaf-05a8b0190829 8/29/19, 11:06 AM DIAGNOSTIC_ASSERT
> bp-40a639e7-7052-4907-832c-5fe2e0190829 8/29/19, 11:06 AM DIAGNOSTIC_ASSERT
Five of them crashed at the diagnostic assert. But three of them have the "minimal info" form:
- The crash reason and address are the same as for the diagnostic assertion crash reports.
- They are missing some fields (Install age, Process type).
- They have some empty fields (Install time, Adapter Vendor ID, Adapter Device ID).
gsvelto: erahm suggested that these "minimal info" crash reports might lack an "extra" file because the crash happened very early during content process startup. Can you explain this a little, e.g. when that can happen, how crash reports are generated in that case, and whether any of the fields might be considered unreliable? Thanks.
Comment 29•5 years ago
The "minimal info" crash reports are found when we periodically scan the `Crash Reports/pending` folder and find minidumps without an .extra file attached to them. All the metadata fields should be considered unreliable: I added code to synthesize the .extra file from the currently running version of Firefox, but that might not be the version in which the crash happened (this is especially true for Nightly).
Note that we noticed those orphaned minidumps precisely because of this bug. My guess is that they might be leftovers from before I fixed the issue with .extra file generation in bug 1282776. The change that synthesizes the .extra file for those crashes happened later, in bug 1566855. If current Nightlies start generating more of those orphaned minidumps then we have a major problem with crash generation.
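The scan described above can be modelled with a small, hypothetical helper. The real scanning code lives in the crash reporter and works on actual files in the pending folder; here the file lists are passed in directly to keep the sketch self-contained:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Sketch: given the base names of the .dmp and .extra files found in
// the Crash Reports/pending folder, return the minidumps that have no
// matching metadata file -- the "orphans" whose annotation fields must
// be synthesized later and are therefore unreliable.
std::vector<std::string> OrphanedStems(const std::vector<std::string>& dmps,
                                       const std::vector<std::string>& extras) {
    std::set<std::string> haveExtra(extras.begin(), extras.end());
    std::vector<std::string> orphans;
    for (const auto& d : dmps) {
        if (haveExtra.count(d) == 0) {
            orphans.push_back(d);  // .dmp present, .extra missing
        }
    }
    return orphans;
}
```

For an orphan, every metadata field has to be reconstructed after the fact, which is why none of them can be trusted.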
Comment 30•5 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #29)
> the current running version of Firefox but that might not be the one in which the crash happened (this is especially true for nightly).

That might explain what happened when I tried investigating one of those crashes by disassembling the “matching” build: the reported offset from libxul wasn't in the right function (or even at an instruction boundary), but the offset from the function was at the right place inside a `MOZ_CRASH` (writing the value of `__LINE__` to address 0… but maybe not the right line).
Updated•5 years ago