Crash in nsGlobalWindow::ClearDocumentDependentSlots: MOZ_CRASH(Unhandlable OOM while clearing document dependent slots.)
Categories
(Core :: DOM: Core & HTML, defect, P3)
Tracking
| Release | Tracking | Status |
|---|---|---|
| firefox-esr102 | --- | wontfix |
| firefox-esr115 | --- | affected |
| firefox57 | --- | wontfix |
| firefox58 | --- | wontfix |
| firefox59 | --- | wontfix |
| firefox60 | --- | wontfix |
| firefox92 | --- | wontfix |
| firefox93 | --- | wontfix |
| firefox105 | --- | wontfix |
| firefox106 | --- | wontfix |
| firefox107 | --- | wontfix |
| firefox125 | --- | affected |
| firefox126 | --- | affected |
| firefox127 | --- | affected |
People
(Reporter: n.nethercote, Unassigned)
References
Details
(Keywords: crash, topcrash)
Crash Data
Comment 1•8 years ago
Comment 2•8 years ago
Comment 3•8 years ago
Comment 4•8 years ago
Updated•8 years ago
Comment 5•8 years ago
Updated•8 years ago
Comment 7•8 years ago
Updated•8 years ago
Comment 8•7 years ago
Comment 9•7 years ago
Comment 10•7 years ago
Updated•6 years ago
Comment 11•6 years ago
While trying to get an rr trace for bug 1593704 I kept triggering this issue. I created a Pernosco session which can be found here: https://pernos.co/debug/HeNG0Imk-tsJryLVWyqD2g/index.html
Updated•5 years ago
Comment 13•5 years ago
(In reply to Tyson Smith [:tsmith] from comment #11)
While trying to get an rr trace for bug 1593704 I kept triggering this issue. I created a Pernosco session which can be found here: https://pernos.co/debug/HeNG0Imk-tsJryLVWyqD2g/index.html
I happened to resurrect the Pernosco trace. It seems that GetOrCreateDOMReflector returns false (I added an entry to the notebook there). Due to massive inlining it is less clear (to me) whether this can really only be caused by OOM.
Comment 14•5 years ago
Here is a Pernosco session created with a -O0 build; hopefully this is more helpful. https://pernos.co/debug/wo7vFFam6FDy7kYhiopX5g/index.html
Comment 15•5 years ago
Thanks a lot! That makes it easier. It seems we are definitely not seeing an OOM here.
The low-level analysis is that on this call stack we rely on GetWrapperMaybeDead always giving us a living wrapper, which is not the case. I have not yet been able to check what might cause the wrapper to be "dead and in the process of being finalized," which the comment points out as a possible cause for it being nullptr.
Olli, does this help to understand this better?
Comment 16•5 years ago
(In reply to Jens Stutte [:jstutte] from comment #15)
I was not yet able to check, what might cause the wrapper to be "dead and in the process of being finalized."
This means that the GC has determined that the wrapper is dead, but the wrapper has not been destroyed yet (and the pointer to it has not been set to null).
Comment 17•5 years ago
The stack shows an interesting cycle of:
XMLHttpRequest_Binding::open
...
XMLHttpRequestMainThread::FireReadystatechangeEvent
...
js::RunScript
...
XMLHttpRequest_Binding::send
...
XMLHttpRequestMainThread::ResumeEventDispatching
EventTarget::DispatchEvent
...
js::RunScript
...
XMLHttpRequest_Binding::open
over and over again. So sync XHR that triggers sync XHR etc.
Eventually we are deep in that stack, in danger of hitting the JS engine's stack-overflow checks, while processing events under a sync XHR. We land in nsDocumentOpenInfo::OnStartRequest and go from there: we try to create a wrapper for the document, try to create its proto, try to define properties on it, hit the over-recursion check in CallJSAddPropertyOp, fail to add the property, and bubble up the stack failing things.
I added some notes to the Pernosco session for these bits.
Updated•4 years ago
Comment 18•3 years ago
We had a report on webcompat regarding this website:
https://www.ensonhaber.com/
https://crash-stats.mozilla.org/report/index/5d90800a-fe28-4785-833e-dd8b60220302#tab-bugzilla
https://github.com/webcompat/web-bugs/issues/100455
If it helps.
Comment 19•3 years ago
(In reply to Karl Dubost 💡 :karlcow from comment #18)
We had a report on webcompat with regards to this website
This crash is just a symptom of running out of memory. What is more interesting is what is causing the browser to use a lot of memory. You'll want a new bug for that.
Updated•3 years ago
Updated•3 years ago
Comment 20•3 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
For more information, please visit auto_nag documentation.
Updated•3 years ago
Updated•3 years ago
Updated•3 years ago
Comment 21•3 years ago
Per comment 19, I'd assume this isn't worth S2?
Comment 22•3 years ago
Comment 19 was mostly relevant to a specific site that was maybe showing this issue.
It is a reasonably common crash, so it might qualify as S2, but we have no real plan of action here. Boris landed a ton of instrumentation in 2018 to try to figure out why this is happening, but he didn't post any kind of conclusion in these bugs as far as I can see, so I guess nothing useful resulted from it.
Although looking at this now, it is possible that bug 1543537 helped here. Weird null derefs in documents are a possible symptom of that issue. I fixed it in 107, and the volume in 107 beta seems to be a lot lower than 106 beta (I think comment 20 was made when 106 was in beta). Maybe we can wait a few weeks and see if the volume on 107 continues to be low, then we could remove the top crash and mark it S3.
Comment 23•3 years ago
This got frequent with 109.0a1 20221129084032: 25-70 crashes per Nightly build. There are no crash reports for the latest Nightly (20221030214707?) so far. Push log lists nothing obvious.
Comment 24•3 years ago
The bug is linked to topcrash signatures, which match the following criteria:
- Top 5 desktop browser crashes on Mac on beta
- Top 5 desktop browser crashes on Mac on release
- Top 20 desktop browser crashes on release (startup)
- Top 20 desktop browser crashes on beta
- Top 10 desktop browser crashes on nightly
- Top 10 content process crashes on beta
- Top 10 content process crashes on release
- Top 5 desktop browser crashes on Linux on beta
- Top 5 desktop browser crashes on Linux on release
- Top 5 desktop browser crashes on Windows on release (startup)
For more information, please visit auto_nag documentation.
Comment 25•3 years ago
Hello Andrew, we got new reports and this is a current topcrash-startup. Would you please take another look? Thanks.
Comment 26•3 years ago
The set of patches I'm seeing for the build that started crashing a lot is this. That includes bug 1219128, which has already had multiple OOM issues in automation associated with it. I think we should back that patch out if the memory issues can't be resolved very quickly.
Comment 27•3 years ago
Hmm I guess it got backed out immediately, but still got marked fixed somehow, so I guess that can't be to blame?
Updated•3 years ago
Comment 28•3 years ago
(In reply to Andrew McCreight [:mccr8] from comment #26)
The set of patches I'm seeing for the build that started crashing a lot is this. That includes bug 1219128, which has already had multiple OOM issues in automation associated with it. I think we should back that patch out if the memory issues can't be resolved very quickly.
The OOM issues related to this patch are caused strictly by the test suite, which is configured to be greedy at finding OOM issues by running a loop that simulates OOM; the associated backout was caused by failing to annotate the test cases as being instrumented that way. None of these OOMs are caused by the system actually running out of memory.
Otherwise, bug 1219128 only changes how the Object and Function classes are registered in the GlobalObject, by allocating them eagerly; they would most likely be present anyway unless the global is unused.
Comment 29•3 years ago
I filed a new bug for this recent spike as it seems to involve a bunch of unrelated signatures.
Comment 30•3 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 31•3 years ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit auto_nag documentation.
Comment 32•3 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 33•2 years ago
This is a symptom of an OOM. There has been extensive investigation that hasn't found anything.
Updated•2 years ago
Comment 34•2 years ago
I don't know if it helps, but I recently stumbled over this a few times. A website that quite reliably produced this error for me is https://www.welt.de/
It's a news website. If you let it idle for a while, the site will notify you that there have been news updates. First I thought that this triggered the crash, but it doesn't. Just don't touch the website and let it sit for a while longer; most of the time it took around an hour until the tab crashed (though sometimes it doesn't happen at all). What's pretty interesting to me is that before it crashes I get spammed with "save file to" dialogs for a lot of empty .html files (I don't even know why that happens). After you save them all or cancel the downloads, you'll realize that the tab has crashed. It also once gave me a memory exception (read) with the exact same behaviour.
Comment 35•2 years ago
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit BugBot documentation.
Comment 36•2 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 37•2 years ago
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit BugBot documentation.
Comment 38•2 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 39•2 years ago
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit BugBot documentation.
Comment 40•2 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 41•2 years ago
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on release (startup)
- Top 10 content process crashes on release
For more information, please visit BugBot documentation.
Comment 42•2 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 43•1 year ago
Just had a (Google Photos) tab crash with FF 125.0.3 pointing to this signature.
I'd imagine the Mozilla crash report has all the info you need, but I got a coredump I'd be happy to provide a gdb backtrace from if wanted.
Updated•1 year ago
Comment 44•1 year ago
Got this just now on the latest Nightly on my Win11 x64 machine with 16 GB RAM: https://crash-stats.mozilla.org/report/index/fc8f4c99-f96b-495c-a9ae-bad790250212#tab-bugzilla
Comment 45•1 year ago
(In reply to Mayank Bansal from comment #44)
Got this just now on the latest Nightly on my Win11 x64 machine with 16 GB RAM: https://crash-stats.mozilla.org/report/index/fc8f4c99-f96b-495c-a9ae-bad790250212#tab-bugzilla
So the assertion claims there is a JS OOM, and indeed the (protected) annotations show JSOutOfMemory: Reported. We come through nsGlobalWindowOuter::SetNewDocument, which probably would have no good way to react to the OOM even if we passed the error down?
Note that a JS OOM does not necessarily mean that your system was completely out of memory.
Comment 48•6 months ago
Bug 1980016 does have a Pernosco session if that is helpful.
Comment 49•5 months ago
FYI this is an extremely high-volume crash. According to crash telemetry this is the highest volume content process crash excluding explicit OOMs and bad hardware. From the looks of the data in Socorro this doesn't even look like a genuine OOM as the affected machines have plenty of memory available. Are we consuming other resources? Can we gather some additional information to diagnose it?
Comment 50•5 months ago
Bug 1980016 comment 4 and this bug's comment 45 seem to match. It feels slightly surprising that the two functions named ClearCachedDocumentValue and ClearCachedPerformanceValue would cause this so often, but IIUC we just lazily create the document inside the first call to getDocument, which happens to be inside ClearCachedDocumentValue.
From bug 1980016 comment 4:
Is it possible that the DOM is holding things alive that it needn't at this point? Maybe we need to do a cycle collection when we do a last ditch GC too.
sounds like something to consider?
Comment 51•5 months ago
(In reply to Gabriele Svelto [:gsvelto] from comment #49)
Can we gather some additional information to diagnose it?
Boris spent about three months investigating this crash, adding lots of instrumentation (starting with bug 1491313 and ending with removing it in bug 1491925), so I'm not sure how fruitful that will be.
Comment 52•5 months ago
CC would likely use so much memory in the last-ditch case that we'd crash while running it.
Or, if it is JS engine memory we're running out of, then CC might help. Not sure when to run it, though.
Comment 53•5 months ago
Judging from the Pernosco session, it looks like we hit the JS memory limit (where gcMaxBytes_.value=4294967295), and GCRuntime::attemptLastDitchGC is what we currently try; it seems to be called only here in RetryTenuredAlloc.
Not sure if that would be the best moment to CC, but it looks like the last attempt one could easily make?
(In reply to Andrew McCreight [:mccr8] from comment #51)
Boris spent like 3 months investigating this crash with adding lots of instrumentation (starting with bug 1491313 and ending with removing it in bug 1491925) so I'm not sure how fruitful that will be.
Well, that was pre-Pernosco, AFAICT. The Pernosco session might also give some hint about what is actually holding that JS memory, but I don't know how to inspect that easily (or how representative it would be of all the crashes).
Comment 54•4 months ago
Is https://crash-stats.mozilla.org/report/index/bca4ce60-6393-4a99-b10f-2d04e0250827#tab-details an example? It shows plenty of virtual memory available. The user writes "i like to stream/listen to smooth jazz all day and night this happens EVERY day i have to restart my laptop every day the other streaming outlets crash 2- 3 times as much." (strangely, the crash report does not show the CPU INFO)
Comment 55•4 months ago
It shows plenty of virtual memory available.
That does not matter. What we hit here is our fixed JS memory limit (where gcMaxBytes_.value=4294967295). If that user found a reliable way to leak JS memory and hit that 4 GB limit, it could be interesting to learn more about what exactly they do, but pretty much every website can leak memory without us being able to do much.
Comment 56•4 months ago
(In reply to Wayne Mery (:wsmwk) from comment #54)
Is https://crash-stats.mozilla.org/report/index/bca4ce60-6393-4a99-b10f-2d04e0250827#tab-details an example? It shows plenty of virtual memory available. The user writes "i like to stream/listen to smooth jazz all day and night this happens EVERY day i have to restart my laptop every day the other streaming outlets crash 2- 3 times as much." (strangely, the crash report does not show the CPU INFO)
The missing CPU info is due to the fact that this is a 64-bit ARM machine and we don't print out the CPU info for those.