MOZ_CRASH("Unhandlable OOM while clearing document dependent slots.") [@ nsGlobalWindowInner::InitDocumentDependentState]
Categories
(Core :: DOM: Navigation, defect, P3)
Tracking
()
People
(Reporter: tsmith, Assigned: jjaschke)
References
(Blocks 1 open bug)
Details
(Keywords: crash, testcase, Whiteboard: [tbird crash])
Crash Data
Attachments
(2 files)
390 bytes,
text/html
|
Details | |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-esr102+
|
Details | Review |
==50528==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f4152632b4d bp 0x7ffddf94b2d0 sp 0x7ffddf94b1e0 T0)
==50528==The signal is caused by a WRITE memory access.
==50528==Hint: address points to the zero page.
#0 0x7f4152632b4c in ~RefPtr /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:80:9
#1 0x7f4152632b4c in nsGlobalWindowInner::InitDocumentDependentState(JSContext*) /gecko/dom/base/nsGlobalWindowInner.cpp:1618:1
#2 0x7f415267da6c in nsGlobalWindowOuter::SetNewDocument(mozilla::dom::Document*, nsISupports*, bool, mozilla::dom::WindowGlobalChild*) /gecko/dom/base/nsGlobalWindowOuter.cpp:2381:23
#3 0x7f4157462e3b in nsDocumentViewer::InitInternal(nsIWidget*, nsISupports*, mozilla::dom::WindowGlobalChild*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, bool, bool, bool) /gecko/layout/base/nsDocumentViewer.cpp:961:22
#4 0x7f415746229a in nsDocumentViewer::Init(nsIWidget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::dom::WindowGlobalChild*) /gecko/layout/base/nsDocumentViewer.cpp:750:10
#5 0x7f4159fe7cd5 in nsDocShell::SetupNewViewer(nsIContentViewer*, mozilla::dom::WindowGlobalChild*) /gecko/docshell/base/nsDocShell.cpp:7741:7
#6 0x7f4159fe67d8 in nsDocShell::Embed(nsIContentViewer*, mozilla::dom::WindowGlobalChild*) /gecko/docshell/base/nsDocShell.cpp:5332:17
#7 0x7f4159ff40ac in nsDocShell::CreateAboutBlankContentViewer(nsIPrincipal*, nsIPrincipal*, nsIContentSecurityPolicy*, nsIURI*, bool, bool, mozilla::dom::WindowGlobalChild*) /gecko/docshell/base/nsDocShell.cpp:6350:14
#8 0x7f4159fb3a66 in nsDocShell::EnsureContentViewer() /gecko/docshell/base/nsDocShell.cpp:6177:17
#9 0x7f4159fce247 in GetDocument /gecko/docshell/base/nsDocShell.cpp:2948:3
#10 0x7f4159fce247 in non-virtual thunk to nsDocShell::GetDocument() /gecko/docshell/base/nsDocShell.cpp
#11 0x7f41526b058d in nsPIDOMWindowOuter::MaybeCreateDoc() /gecko/dom/base/nsGlobalWindowOuter.cpp:7625:45
#12 0x7f415a1130c6 in GetDoc /builds/worker/workspace/obj-build/dist/include/nsPIDOMWindow.h:812:7
#13 0x7f415a1130c6 in mozilla::a11y::DocManager::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, nsresult) /gecko/accessible/base/DocManager.cpp:220:43
#14 0x7f41512875a0 in nsDocLoader::DoFireOnStateChange(nsIWebProgress*, nsIRequest*, int&, nsresult) /gecko/uriloader/base/nsDocLoader.cpp:1377:3
#15 0x7f4151285dfc in nsDocLoader::FireOnStateChange(nsIWebProgress*, nsIRequest*, int, nsresult) /gecko/uriloader/base/nsDocLoader.cpp:1340:14
#16 0x7f41512845f2 in nsDocLoader::OnStartRequest(nsIRequest*) /gecko/uriloader/base/nsDocLoader.cpp
#17 0x7f4151284b8c in non-virtual thunk to nsDocLoader::OnStartRequest(nsIRequest*) /gecko/uriloader/base/nsDocLoader.cpp
#18 0x7f414eb34b2e in mozilla::net::nsLoadGroup::AddRequest(nsIRequest*, nsISupports*) /gecko/netwerk/base/nsLoadGroup.cpp:484:22
#19 0x7f414eaf2feb in nsBaseChannel::AsyncOpen(nsIStreamListener*) /gecko/netwerk/base/nsBaseChannel.cpp:723:31
#20 0x7f41512954d8 in nsURILoader::OpenURI(nsIChannel*, unsigned int, nsIInterfaceRequestor*) /gecko/uriloader/base/nsURILoader.cpp:730:17
#21 0x7f415a0116d1 in nsDocShell::OpenInitializedChannel(nsIChannel*, nsIURILoader*, unsigned int) /gecko/docshell/base/nsDocShell.cpp:9795:20
#22 0x7f415a00abf5 in nsDocShell::DoURILoad(nsDocShellLoadState*, nsIDocShell**, nsIRequest**) /gecko/docshell/base/nsDocShell.cpp:9610:8
#23 0x7f4159f823cb in nsDocShell::InternalLoad(nsDocShellLoadState*, nsIDocShell**, nsIRequest**) /gecko/docshell/base/nsDocShell.cpp:8807:8
#24 0x7f4159fb57d1 in nsDocShell::LoadURI(nsDocShellLoadState*, bool) /gecko/docshell/base/nsDocShell.cpp:808:10
#25 0x7f4152b0e364 in nsFrameLoader::ReallyStartLoadingInternal() /gecko/dom/base/nsFrameLoader.cpp:684:23
#26 0x7f4152b0d76e in nsFrameLoader::ReallyStartLoading() /gecko/dom/base/nsFrameLoader.cpp:555:17
#27 0x7f4152864276 in mozilla::dom::Document::MaybeInitializeFinalizeFrameLoaders() /gecko/dom/base/Document.cpp:8503:13
#28 0x7f41529235e4 in applyImpl<mozilla::dom::Document, void (mozilla::dom::Document::*)()> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1185:12
#29 0x7f41529235e4 in apply<mozilla::dom::Document, void (mozilla::dom::Document::*)()> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1191:12
#30 0x7f41529235e4 in mozilla::detail::RunnableMethodImpl<mozilla::dom::Document*, void (mozilla::dom::Document::*)(), true, (mozilla::RunnableKind)0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1237:13
#31 0x7f415259e703 in nsContentUtils::RemoveScriptBlocker() /gecko/dom/base/nsContentUtils.cpp:5344:15
#32 0x7f4152854792 in mozilla::dom::Document::EndUpdate() /gecko/dom/base/Document.cpp:7090:3
#33 0x7f415250e3b6 in mozAutoDocUpdate::~mozAutoDocUpdate() /gecko/dom/base/mozAutoDocUpdate.h:34:18
#34 0x7f4152b4d9bf in nsINode::ReplaceOrInsertBefore(bool, nsINode*, nsINode*, mozilla::ErrorResult&) /gecko/dom/base/nsINode.cpp:2698:1
#35 0x7f415328d2e0 in InsertBefore /gecko/dom/base/nsINode.h:1971:12
#36 0x7f415328d2e0 in AppendChild /gecko/dom/base/nsINode.h:1974:12
#37 0x7f415328d2e0 in mozilla::dom::Node_Binding::appendChild(JSContext*, JS::Handle<JSObject*>, void*, JSJitMethodCallArgs const&) /builds/worker/workspace/obj-build/dom/bindings/NodeBinding.cpp:989:60
#38 0x7f40c506e552 (<unknown module>)
Reporter | ||
Comment 1•5 years ago
|
||
A Pernosco session is available here: https://pernos.co/debug/RaWTyMDDNECMEIZiQ44pPA/index.html
Reporter | ||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•4 years ago
|
Comment 2•4 years ago
|
||
bp-caeeeef2-d80c-4255-947e-6ebc50201229 Richard's crash (one of many)
But perhaps more interesting is Firefox's significant increase in crash rate in late October, in the range of 82.0 - 82.0.2
https://crash-stats.mozilla.org/signature/?signature=nsGlobalWindowInner%3A%3AInitDocumentDependentState&date=%3E%3D2020-10-01T13%3A08%3A00.000Z&date=%3C2020-11-01T13%3A08%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#graphs (more apparent in this longer term graph https://crash-stats.mozilla.org/signature/?signature=nsGlobalWindowInner%3A%3AInitDocumentDependentState&date=%3E%3D2020-08-18T13%3A15%3A00.000Z&date=%3C2021-02-18T13%3A15%3A00.000Z#graphs )
82.0.2 windows bp-cb10864e-e5ef-4f41-945b-2eaf40201101
82.0.2 Mac bp-d82301a8-e6c6-4a73-a485-fd3340201101
86.0rc1 bp-223785f5-e20e-459f-a927-309e30210218
Comment 3•4 years ago
|
||
Nika is most familiar with this part of the code, so I'll defer to Nika to review these Thunderbird crashes.
Comment 4•4 years ago
|
||
Very odd that I just crashed firefox bp-15215314-3c3e-4f0a-86e2-dfb550210219. I was AFK, so I have no idea why it crashed. The tab was https://mcafee-total-protection-mtp10us.en.softonic.com/windows?utm_source=wakeup and I had restarted Firefox 10 hours before, so not open very long. My average crash rate is <1 per month, so I don't have a crashy system
Comment 5•4 years ago
|
||
(In reply to Tyson Smith [:tsmith] from comment #1)
A Pernosco session is available here: https://pernos.co/debug/RaWTyMDDNECMEIZiQ44pPA/index.html
The crash recorded in this pernosco trace is an over-recursion error which happened to occur within InitDocumentDependentState
. Looking at the pernosco trace, it appears as though the JS code is repeatedly calling o.parentNode.removeNode(o)
from within a DOMNodeRemoved
mutation event listener until it fails to execute the callbacks, perhaps due to the recursion limit, and then calling window.open()
which eventually tries to create a new document, and ends up crashing as it cannot create the Document
object prototype with this deep of a native stack.
It's unlikely that the code in the pernosco trace is exactly the same as the code from crashes in the wild, but it may be the case that the crashes are caused by overrecursion errors in the wild being caught.
It may help diagnose issues like this in the future to include information about the depth of the main thread native stack in crash metadata, and note when the stack is close to tripping js overrecursion protection.
Comment 6•4 years ago
|
||
¡Hola y'all!
My significant other Firefox crashed like https://crash-stats.mozilla.org/report/index/f9b6148e-9a43-4580-9a5f-e9a8a0210605
Updating flags per https://crash-stats.mozilla.org/signature/?product=Firefox&signature=nsGlobalWindowInner%3A%3AInitDocumentDependentState FWIW
¡Gracias!
Alex
Comment 7•4 years ago
|
||
Fairly high crash rate; ~1100/week. Nika's analysis is likely correct, and I suspect as she indicates that this scenario or one very like it is happening in the wild (including to me on a nightly)
![]() |
||
Comment 8•3 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #7)
Fairly high crash rate; ~1100/week. Nika's analysis is likely correct, and I suspect as she indicates that this scenario or one very like it is happening in the wild (including to me on a nightly)
FWIW, crash ping telemetry shows this to be the #7 top crash in content in 92.
Updated•3 years ago
|
Comment 9•3 years ago
|
||
The crash volume is still concerning.
Andreas, can you please put this into your queue? I think we'd revisit this a bit and figure out a plan for it.
Comment 10•3 years ago
|
||
Note that the crash from the testcase looks to be caused by a stack overflow, but there might actually be other crashes here with the same top-level signature but with a different cause mixed in.
Comment 11•3 years ago
|
||
With the testcase and Nika's finding we can see that one reason for this could be overrecursion. But looking at bunch of callstacks on crash-stats it doesn't seem to be the whole story. So this bug probably captures way to many OOM crashes to be just one thing. And overrecursion is possibly something we can deal with, but in the face of OOM it is a bit hard to do something else than crashing.
So I wonder if this should really be an S3 as it was to begin with. Otherwise we probably need a better way to distinguish what's actually causing the OOM.
Tentatively setting this to S3 again, that ok Hsin-Yi?
Comment 12•3 years ago
|
||
(In reply to Andreas Farre [:farre] [Out until 2022-10-16] from comment #11)
With the testcase and Nika's finding we can see that one reason for this could be overrecursion. But looking at bunch of callstacks on crash-stats it doesn't seem to be the whole story. So this bug probably captures way to many OOM crashes to be just one thing. And overrecursion is possibly something we can deal with, but in the face of OOM it is a bit hard to do something else than crashing.
So I wonder if this should really be an S3 as it was to begin with. Otherwise we probably need a better way to distinguish what's actually causing the OOM.
Tentatively setting this to S3 again, that ok Hsin-Yi?
No objection, thanks for taking another look!
Comment 13•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on beta
- Top 10 content process crashes on release
:hsivonen, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 14•2 years ago
|
||
Nika, we check for the recursion limit a couple of stack frames earlier: https://searchfox.org/mozilla-central/rev/6f9fdc5c3b164a46004c98a5baaf55b05e2ad329/dom/base/nsGlobalWindowOuter.cpp#2118
Could we extend the check to take some kind of "this close or closer to the limit" parameter and then size the parameter to cover the depth that we need for this stuff to succeed?
Comment 15•2 years ago
|
||
That might work as a potential approach, with some allowance for how conservative we need to be. From the pernosco trace in comment 1, it looks like we'd need at least ~20k bytes of leeway (probably more) to avoid the assertion check which failed. The current check appears to only check for ~8k bytes of leeway on top of JS::StackForUntrustedScript
(https://searchfox.org/mozilla-central/rev/b1e5f2c7c96be36974262551978d54f457db2cae/js/public/friend/StackLimits.h#309).
Comment 16•2 years ago
|
||
Thanks. Considering the top crash nature (according to the bot) and the existence of a path forward, let's triage this as S2.
Comment 17•2 years ago
|
||
Comment 18•2 years ago
|
||
Let's see if we have tests that recurse so close to the limit that a larger gap between the point of check and the end of the stack would make them fail:
https://treeherder.mozilla.org/jobs?repo=try&revision=5f62602828246cda34f4b9c62217442ac97eced6
Comment 19•2 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 20•2 years ago
|
||
(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #19)
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
This might be caused by whatever made the testcase not trigger the stack limit for me locally.
Updated•2 years ago
|
Comment 21•2 years ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 22•2 years ago
|
||
I don't think we really need bug 1593704, bug 1405521, this one, and who knows how many others for the same basic issue. This one at least has a patch that is in progress, at least.
Comment 23•2 years ago
|
||
Hsin-Yi, do you think there's somebody who could work on this patch while Henri is out? The basic idea looks simple, and the patch looked mostly finished. It would be good if we could fail in a gentler way. This issue seems fairly common.
Comment 24•2 years ago
|
||
Thank you for bringing it up, Andrew. Jan is willing to take a look at where Henri left and resume the patch landing.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 25•2 years ago
|
||
Comment 26•2 years ago
|
||
bugherder |
Updated•2 years ago
|
Comment 27•2 years ago
|
||
The patch landed in nightly and beta is affected.
:jjaschke, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox112
towontfix
.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 28•2 years ago
|
||
Crashes mostly happen on older versions (102 esr and such). I don't think an uplift is necessary.
Comment 29•2 years ago
|
||
Given the crash rate on ESR, I think this might be worth an uplift to that branch still. What do you think, Jan?
Assignee | ||
Comment 30•2 years ago
|
||
Fine by me. (I'm on PTO, would you mind taking care of that?)
Comment 31•2 years ago
|
||
Comment on attachment 9297774 [details]
Bug 1645865 - Increase the distance to stack size limit when setting new document on a window. r=nika,tcampbell
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: crashing fairly often on ESR
- User impact if declined: some crashes
- Fix Landed on Version: 113
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): The main risk is that we will fail earlier in some cases that would have been fine, causing some pages to fail to load or whatever.
Updated•2 years ago
|
Comment 32•2 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #31)
- Why is the change risky/not risky? (and alternatives if risky): The main risk is that we will fail earlier in some cases that would have been fine, causing some pages to fail to load or whatever.
This patch will shift crashes like the one in the bug, where we randomly fail in a weird way like in a crash, to a more controlled failure where some JS throws an exception. So the more controlled failure is probably better, but we're going to do it a bit more often, I'd expect. I don't think we've seen any reports of issues, though, but it would probably be subtle.
Comment 33•2 years ago
|
||
Comment on attachment 9297774 [details]
Bug 1645865 - Increase the distance to stack size limit when setting new document on a window. r=nika,tcampbell
Approved for 102.11esr.
Comment 34•2 years ago
|
||
bugherder uplift |
Description
•