Crash in [@ mozilla::dom::Promise::CreateInfallible | ...] on YouTube live streams
Categories
(Core :: DOM: Streams, defect, P3)
People
(Reporter: mccr8, Assigned: saschanaz)
References
(Blocks 1 open bug)
Details
(Keywords: crash)
Crash Data
Attachments
(3 files)
48 bytes, text/x-phabricator-request | Details | Review
48 bytes, text/x-phabricator-request | phab-bot: approval-mozilla-beta+ | Details | Review
48 bytes, text/x-phabricator-request | phab-bot: approval-mozilla-release+ | Details | Review
Crash report: https://crash-stats.mozilla.org/report/index/fcdcba8f-67b1-4143-9f5c-d6ca60241009
MOZ_CRASH Reason:
MOZ_CRASH(Out of memory)
Top 10 frames:
0 xul.dll mozilla::dom::Promise::CreateInfallible(nsIGlobalObject*, mozilla::dom::Promi... dom/promise/Promise.cpp:112
1 xul.dll mozilla::dom::InputToReadableStreamAlgorithms::PullCallbackImpl(JSContext*, m... dom/streams/UnderlyingSourceCallbackHelpers.cpp:256
2 xul.dll mozilla::dom::NonAsyncInputToReadableStreamAlgorithms::PullCallbackImpl(JSCon... dom/streams/UnderlyingSourceCallbackHelpers.cpp:581
3 xul.dll mozilla::dom::UnderlyingSourceAlgorithmsWrapper::PullCallback::<lambda_8>::op... dom/streams/UnderlyingSourceCallbackHelpers.cpp:139
3 xul.dll mozilla::dom::PromisifyAlgorithm(nsIGlobalObject*, mozilla::dom::UnderlyingSo... dom/streams/StreamUtils.h:33
3 xul.dll mozilla::dom::UnderlyingSourceAlgorithmsWrapper::PullCallback(JSContext*, moz... dom/streams/UnderlyingSourceCallbackHelpers.cpp:136
4 xul.dll mozilla::dom::streams_abstract::ReadableByteStreamControllerCallPullIfNeeded(... dom/streams/ReadableByteStreamController.cpp:622
5 xul.dll mozilla::dom::streams_abstract::ReadableByteStreamControllerCallPullIfNeeded:... dom/streams/ReadableByteStreamController.cpp:640
5 xul.dll mozilla::dom::Promise::AddCallbacksWithCycleCollectedArgs<`lambda at /builds/... dom/promise/Promise-inl.h:320
5 xul.dll mozilla::dom::(anonymous namespace)::NativeThenHandler<`lambda at /builds/wor... dom/promise/Promise-inl.h:206
The spike detector reported this. This is an OOM. I don't know if we can do anything about this.
Reporter
Comment 1•1 year ago
The available memory and page file on these crashes don't seem super low, so maybe this isn't really an OOM?
Assignee
Comment 2•1 year ago
For some reason JS::NewPromiseObject failed, which we treat as OOM and crash on here. It looks like the failure can also happen via an access-denied error, though.
Interestingly, all related crash reports are from streams, even though other users exist. (Arguably the Streams code is the main user of this.)
Updated•1 year ago
Assignee
Comment 3•1 year ago
The crash report spike is seen in all channels, so maybe this is not a regression but rather a certain web service hitting the crash trigger?
Comment 4•1 year ago
(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #3)
Crash report spike seen in all channels, maybe not a regression but certain web service is hitting the crash trigger?
Yeah, it is currently our #7 top content crash. I marked it S3 for now, but once we understand how it affects people / which sites, we might need to raise it to S2. Not sure if you can do anything proactively here?
Assignee
Comment 5•1 year ago
65.85% of reports have SetUpReadableByteStreamController in the signature, and the stack frequently seems to involve FetchBody::GetBody? (Not in comment #0, though.)
Reporter
Comment 6•1 year ago
According to the numerous comments, this is frequently happening on YouTube live streams.
[Tracking Requested - why for this release]: Frequent crash when viewing YouTube live streams. This appears to be due to a site change, as it is occurring on all branches.
Reporter
Comment 7•1 year ago
There are a bunch of other signatures with CreateInfallible in them. I don't know whether they are all related. Fortunately their volume is much lower. I'll add one that had a comment about YouTube live streams.
Reporter
Comment 9•1 year ago
It doesn't look limited to any particular platform, though the volume is much lower on Android.
Updated•1 year ago
Reporter
Comment 10•1 year ago
Some comments from the crashes that have interesting details:
I was watching YouTube creator livestream and they changed their setting to remove DVR playback. Did that cause YT to crash on me?
It looks like chat was too long and caused Firefox to crash? Or perhaps it was YT trying to put in an advert?
Streaming a youtube live stream [...] with 100k+ viewers.
Reporter
Comment 11•1 year ago
One commenter was having crashes on this specific live stream which appears to have been live for more than 2 years so that would be a good one to use for testing. I'm not hitting problems with it myself.
More comments:
same tab, always just the same tab (they linked to the video I mentioned above)
adding videos to queue in YouTube
The crashes are happening with really active chats on YouTube livestreams. It's not happening on Chrome.
Some live youtube videos have been just cutting out and freezing the last 3 days. other live ones work fine.
Comment 13•1 year ago
Or maybe not. Looking at reports.
Updated•1 year ago
Comment 14•1 year ago
(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #2)
For some reason JS::NewPromiseObject failed, which we consider as OOM and crash here. Looks like the failure can happen via access denied error though.
It does look like access denied can't be thrown on this path, however: CreateWrapper calls JS::NewPromiseObject, which calls js::Promise::createSkippingExecutor, so we never traverse that path.
Comment 15•1 year ago
The bug is marked as tracked for firefox131 (release), tracked for firefox132 (beta) and tracked for firefox133 (nightly). We have limited time to fix this, the soft freeze is in 10 days. However, the bug still isn't assigned.
:jstutte, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Assignee
Comment 16•1 year ago
Per comment #11 about that stream being resource-consuming, could this be an actual OOM? Maybe something on YT made it consume more memory in general, thus causing more OOMs? I don't have a more plausible explanation than that. It's not very actionable right now; I'm inclined to add the stalled keyword.
(I didn't have luck with that video either, but maybe my machine has too much memory.)
Comment 17•1 year ago
Have we tried reaching out on the youtube-discuss list regarding this spike?
Reporter
Comment 18•1 year ago
I hit this crash myself just now, on Nightly. I had had a YouTube live stream open for a few hours. The crash report says I have almost 6GB of memory free. That was on macOS.
Comment 19•1 year ago
Arai: Can you think of any other ways we could be failing to create a promise object?
Comment 20•1 year ago
If JS::NewPromiseObject is called with executor == nullptr, almost all failure cases are just OOM. The exception is the string handling inside JS::CaptureCurrentStack, where it walks the stack frames and atomizes some UTF-8 strings in the following places; if an input string is malformed UTF-8, the atomization fails:
Rooted<JSAtom*> causeAtom(cx, AtomizeUTF8Chars(cx, cause, strlen(cause)));
cause comes from the following, and I assume it's guaranteed to be valid UTF-8:
JS::AutoSetAsyncStackForNewCalls sas(
cx, asyncStackObj, utf8Cause.get(),
The other is the location string.
It's less likely that an invalid UTF-8 string flows into filename, though; at least, I don't know about the wasm case.
bool SavedStacks::getLocation(JSContext* cx, const FrameIter& iter,
                              MutableHandle<LocationValue> locationp) {
  ...
  if (iter.isWasm()) {
    ...
    if (const char16_t* displayURL = iter.displayURL()) {
      ...
    } else {
      const char* filename = iter.filename() ? iter.filename() : "";
      locationp.setSource(AtomizeUTF8Chars(cx, filename, strlen(filename)));
      ...
      return true;
    }
  ...
  if (const char16_t* displayURL = iter.displayURL()) {
    ...
  } else {
    const char* filename = script->filename() ? script->filename() : "";
    source = AtomizeUTF8Chars(cx, filename, strlen(filename));
If this crash can somehow be reliably reproduced, it would be nice to test with the javascript.options.asyncstack pref flipped to false, so that the above stack-capturing code isn't executed.
Assignee
Comment 21•1 year ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #17)
Have we tried reaching out on the youtube-discuss list regarding this spike?
Good point, I just posted there.
Comment 22•1 year ago
Actually, the above stack capturing is enabled only when DevTools is open (or the global somehow becomes a debuggee), as long as the javascript.options.asyncstack_capture_debuggee_only pref is set to true, which is the default. If the pref is set to false for some reason, or if DevTools is open, this codepath is taken.
Comment 23•1 year ago
(In reply to Tooru Fujisawa [:arai] from comment #22)
Actually, the above stack capturing is enabled only when DevTools is open (or the global somehow becomes a debuggee), as long as the javascript.options.asyncstack_capture_debuggee_only pref is set to true, which is the default. If the pref is set to false for some reason, or if DevTools is open, this codepath is taken.
Does that mean this is not impacting users browsing normally with default settings? It seems to happen fairly often, though.
Comment 24•1 year ago
(In reply to Jens Stutte [:jstutte] from comment #23)
Does that mean this is not impacting users browsing normally with default settings? It seems to happen fairly often, though.
Yeah, the stack trace part won't affect users with default settings as long as they don't have DevTools or Browser Toolbox opened.
The other OOM path in the Promise allocation can still happen in any case.
Comment 25•1 year ago
This is a reminder regarding comment #15!
The bug is marked as tracked for firefox131 (release), tracked for firefox132 (beta) and tracked for firefox133 (nightly). We have limited time to fix this, the soft freeze is in 3 days. However, the bug still isn't assigned.
Assignee
Comment 26•1 year ago
(In reply to Tooru Fujisawa [:arai] from comment #24)
Yeah, the stack trace part won't affect users with default settings as long as they don't have DevTools or Browser Toolbox opened.
The other OOM path in the Promise allocation can still happen in any case.
Do we know of any other not-really-OOM paths like the one in comment #20? 👀
Updated•1 year ago
Assignee
Comment 27•1 year ago
We throw away the potential JS exception here; could there be an exception we could use for debugging?
Comment 28•1 year ago
(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #26)
Do we know there's any other not-really-oom path like comment #20? 👀
Other than the stack capturing, there are only the following:
- resolving the promise prototype/constructor
- allocating the promise object itself
The prototype/constructor handling happens only once, at the first access, so it's unlikely to be the case unless this crash happens immediately after loading the page. That operation allocates objects and defines properties, but nothing special happens compared to other prototypes. The promise object allocation is a regular object allocation, and nothing special happens there either.
So, if this crash happens without DevTools open and after some time has elapsed since page load, it's likely an OOM, or something going wrong inside the allocator.
FWIW, here's the call graph for fallible cases in JS::NewPromiseObject, called from Promise::CreateInfallible, when DevTools is closed:
Promise::CreateInfallible
Promise::CreateWrapper
JS::NewPromiseObject # executor = nullptr
PromiseObject::createSkippingExecutor
CreatePromiseObjectWithoutResolutionFunctions
CreatePromiseObjectInternal # proto = nullptr
# protoIsWrapped = false
# informDebugger = true
NewObjectWithClassProto<PromiseObject> # proto = nullptr
NewObjectWithClassProto # clasp = PromiseObject::class_
# proto = nullptr
# newKind = GenericObject
# objFlags = ObjectFlags()
NewObjectWithClassProto # clasp = PromiseObject::class_
# protoArg = nullptr
# allocKind = OBJECT4
# newKind = GenericObject
# objFlags = ObjectFlags()
GlobalObject::getOrCreatePrototype # key = JSProto_Promise
GlobalObject::ensureConstructor # global = ...
# key = JSProto_Promise
GlobalObject::resolveConstructor # global = ...
# key = JSProto_Promise
# IfClassIsDisabled::Throw
NewObject # clasp = PromiseObject::class_
# proto = promise prototype
# allocKind = OBJECT4
# newKind = GenericObject
# objFlags = ObjectFlags()
# allocSite = nullptr
sfink, is there anything special that can happen in NewObject?
Assignee
Comment 29•1 year ago
Thanks! Now I wonder how to prove this IS an OOM. (I suspect it is.)
As with the usual "OOM small" reports, it's not hard to find reports with plenty of Available Page File and Physical Memory (at least a few gigabytes):
- https://crash-stats.mozilla.org/report/index/04f4b0ac-acdb-4e71-a294-d7be70241023
- https://crash-stats.mozilla.org/report/index/1b713ec3-3e36-4439-9249-4c62b0241023
- https://crash-stats.mozilla.org/report/index/cfe70133-5e7f-4d6b-9a7c-854490241023
mccr8, should we be concerned?
Reporter
Comment 30•1 year ago
I'm not sure what you are asking. The crash volume here isn't great, but it isn't horrible, and it is at least steady for now. Of course, if YouTube is rolling something out, maybe that's just because it isn't fully rolled out yet.
Maybe you could add more instrumentation to figure out whether it is an OOM or not and crash in a different way?
I don't have a great understanding of all the reasons we can hit OOM crashes. The first one has a very high system memory usage percentage, and the other two are x86, so maybe there's some kind of weird address-space fragmentation issue, or the available virtual memory calculation is off.
Assignee
Comment 31•1 year ago
I'm not sure what you are asking.
That we are OOMing when there's seemingly enough memory, per other "OOM small" reports. Could this be some general memory allocation thing?
Maybe you could add more instrumentation to figure out whether it is an OOM or not and crash in a different way?
We only get nullptr from JS::NewPromiseObject, but perhaps we do get some JS exception when it becomes null? A quick code read suggests that it just returns null without throwing any exception. Maybe SpiderMonkey people have more thoughts?
Comment 32•1 year ago
(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #27)
We throw away potential JS exception here, could there be some exception we could use for debugging?
It's possible to query whether the error was OOM, like the following. The consumer of Promise::CreateWrapper in Promise::CreateInfallible can then branch on the error code, for example using a different message for MOZ_CRASH that shows whether it's OOM or not.
mPromiseObj = JS::NewPromiseObject(cx, nullptr);
if (!mPromiseObj) {
  if (JS_IsThrowingOutOfMemory(cx)) {
    aRv.Throw(NS_ERROR_OUT_OF_MEMORY);
  } else {
    // or anything that can be distinguished from NS_ERROR_UNEXPECTED,
    // which is set by AutoJSAPI::init failure
    aRv.Throw(NS_ERROR_FAILURE);
  }
  JS_ClearPendingException(cx);
  return;
}
or maybe expose the error message to the crash reason somehow?
Reporter
Comment 33•1 year ago
(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #31)
That we are OOMing when there's seemingly enough memory space in other "OOM small" reports. Could be some general memory allocation thing?
The JS engine has its own memory management system, with a heap limit, so it is possible for a JS runtime to be unable to allocate even if the system still has memory, I think.
Assignee
Updated•1 year ago
Assignee
Comment 34•1 year ago
Updated•1 year ago
Comment 35•1 year ago
Comment 36•1 year ago
bugherder
Comment 37•1 year ago
(In reply to Tooru Fujisawa [:arai] from comment #28)
sfink, is there any special thing that can happen in NewObject ?
The only special thing I see is what mccr8 was talking about, where we could fail if we exceed the max heap size.
Usually when approaching the max heap size, we'll start to thrash by doing lots and lots of entirely or almost entirely useless GCs. I have seen this happen on my own machine.
This seems like it could be what's happening here. It would be good to have the current GC heap size in the crash reports. It's a per-thread atomic, though. I filed bug 1927639.
Updated•1 year ago
Assignee
Comment 38•1 year ago
No Nightly reports from the 29 and 30 builds, meaning it was indeed OOM? Let's uplift and see.
Assignee
Comment 39•1 year ago
Original Revision: https://phabricator.services.mozilla.com/D226997
Updated•1 year ago
Comment 40•1 year ago
beta Uplift Approval Request
- User impact if declined: None, this is only to get better crash reports.
- Code covered by automated testing: no
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: None
- Risk associated with taking this patch: Low
- Explanation of risk level: Only the crash behavior changes
- String changes made/needed: None
- Is Android affected?: yes
Assignee
Comment 41•1 year ago
Original Revision: https://phabricator.services.mozilla.com/D226997
Updated•1 year ago
Comment 42•1 year ago
release Uplift Approval Request
- User impact if declined: None, no user visible behavior change
- Code covered by automated testing: no
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: None
- Risk associated with taking this patch: Low
- Explanation of risk level: Only crash behavior changes
- String changes made/needed: None
- Is Android affected?: yes
Updated•1 year ago
Comment 43•1 year ago
uplift
Assignee
Comment 44•1 year ago
If this is a real OOM, I wonder whether we also observe a spike in general OOM crashes.
Assignee
Updated•1 year ago
Comment 45•1 year ago
uplift
Assignee
Comment 46•1 year ago
Not seeing any non-OOM crashes. Closing with no action. Someone who can might want to check whether the total OOM rate is increasing, though. (I don't know how to.)
Assignee
Updated•1 year ago