Closed Bug 1923864 Opened 1 year ago Closed 1 year ago

Crash in [@ mozilla::dom::Promise::CreateInfallible | ...] on YouTube live streams

Categories

(Core :: DOM: Streams, defect, P3)

Tracking

RESOLVED WONTFIX
Tracking Status
firefox-esr115 --- wontfix
firefox-esr128 --- wontfix
firefox131 + wontfix
firefox132 + fixed
firefox133 + fixed

People

(Reporter: mccr8, Assigned: saschanaz)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(3 files)

Crash report: https://crash-stats.mozilla.org/report/index/fcdcba8f-67b1-4143-9f5c-d6ca60241009

MOZ_CRASH Reason:

MOZ_CRASH(Out of memory)

Top 10 frames:

0  xul.dll  mozilla::dom::Promise::CreateInfallible(nsIGlobalObject*, mozilla::dom::Promi...  dom/promise/Promise.cpp:112
1  xul.dll  mozilla::dom::InputToReadableStreamAlgorithms::PullCallbackImpl(JSContext*, m...  dom/streams/UnderlyingSourceCallbackHelpers.cpp:256
2  xul.dll  mozilla::dom::NonAsyncInputToReadableStreamAlgorithms::PullCallbackImpl(JSCon...  dom/streams/UnderlyingSourceCallbackHelpers.cpp:581
3  xul.dll  mozilla::dom::UnderlyingSourceAlgorithmsWrapper::PullCallback::<lambda_8>::op...  dom/streams/UnderlyingSourceCallbackHelpers.cpp:139
3  xul.dll  mozilla::dom::PromisifyAlgorithm(nsIGlobalObject*, mozilla::dom::UnderlyingSo...  dom/streams/StreamUtils.h:33
3  xul.dll  mozilla::dom::UnderlyingSourceAlgorithmsWrapper::PullCallback(JSContext*, moz...  dom/streams/UnderlyingSourceCallbackHelpers.cpp:136
4  xul.dll  mozilla::dom::streams_abstract::ReadableByteStreamControllerCallPullIfNeeded(...  dom/streams/ReadableByteStreamController.cpp:622
5  xul.dll  mozilla::dom::streams_abstract::ReadableByteStreamControllerCallPullIfNeeded:...  dom/streams/ReadableByteStreamController.cpp:640
5  xul.dll  mozilla::dom::Promise::AddCallbacksWithCycleCollectedArgs<`lambda at /builds/...  dom/promise/Promise-inl.h:320
5  xul.dll  mozilla::dom::(anonymous namespace)::NativeThenHandler<`lambda at /builds/wor...  dom/promise/Promise-inl.h:206

The spike detector reported this. This is an OOM. I don't know if we can do anything about this.

The available memory and page file in these crash reports don't seem especially low, so maybe this isn't really an OOM?

For some reason JS::NewPromiseObject failed, which we treat as OOM and crash on here. It looks like the failure can also happen via an access-denied error, though.

Interestingly, all related crash reports come from Streams even though CreateInfallible has other callers. (Arguably the Streams code is its main user.)

Crash Signature: [@ mozilla::dom::Promise::CreateInfallible | mozilla::dom::InputToReadableStreamAlgorithms::PullCallbackImpl] → [@ mozilla::dom::Promise::CreateInfallible | mozilla::dom::InputToReadableStreamAlgorithms::PullCallbackImpl] [@ mozilla::dom::Promise::CreateInfallible | mozilla::dom::streams_abstract::SetUpReadableByteStreamController]

A crash report spike is seen on all channels, so maybe this isn't a regression but some web service hitting the crash trigger?

(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #3)

A crash report spike is seen on all channels, so maybe this isn't a regression but some web service hitting the crash trigger?

Yeah, it is currently our #7 top content crash. I marked it S3 for now, but if we understand how it affects people / which sites, we might need to raise it to S2. Not sure if you can do anything proactively here?

Severity: -- → S3
Flags: needinfo?(krosylight)
Priority: -- → P2

65.85% of reports have SetUpReadableByteStreamController, and the stack seems to frequently involve FetchBody::GetBody? (Not in comment #0, though.)

Flags: needinfo?(krosylight)

According to the numerous comments, this is frequently happening on YouTube live streams.

[Tracking Requested - why for this release]: Frequent crash when viewing YouTube live streams. This appears to be due to a site change, as it is occurring on all branches.

Summary: Crash in [@ mozilla::dom::Promise::CreateInfallible | mozilla::dom::InputToReadableStreamAlgorithms::PullCallbackImpl] → Crash in [@ mozilla::dom::Promise::CreateInfallible | ...] on YouTube live streams

There's a bunch of other signatures with CreateInfallible in them. I don't know if they are all related or not. Fortunately their volume is much lower. I'll add one that had a comment about YouTube live streams.

Crash Signature: [@ mozilla::dom::Promise::CreateInfallible | mozilla::dom::InputToReadableStreamAlgorithms::PullCallbackImpl] [@ mozilla::dom::Promise::CreateInfallible | mozilla::dom::streams_abstract::SetUpReadableByteStreamController] → [@ mozilla::dom::Promise::CreateInfallible | mozilla::dom::InputToReadableStreamAlgorithms::PullCallbackImpl] [@ mozilla::dom::Promise::CreateInfallible | mozilla::dom::streams_abstract::SetUpReadableByteStreamController] [@ mozilla::dom::Promise::Create…

Randell, you might be interested in this.

Flags: needinfo?(rjesup)

It doesn't look limited to any particular platform, though the volume is much lower on Android.

OS: Windows 10 → All
Hardware: Unspecified → All

Some comments from the crashes that have interesting details:

I was watching YouTube creator livestream and they changed their setting to remove DVR playback. Did that cause YT to crash on me?

It looks like chat was too long and caused Firefox to crash? Or perhaps it was YT trying to put in an advert?

Streaming a youtube live stream [...] with 100k+ viewers.

One commenter was having crashes on this specific live stream which appears to have been live for more than 2 years so that would be a good one to use for testing. I'm not hitting problems with it myself.

More comments:

same tab, always just the same tab (they linked to the video I mentioned above)

adding videos to queue in YouTube

The crashes are happening with really active chats on YouTube livestreams. It's not happening on Chrome.

Some live youtube videos have been just cutting out and freezing the last 3 days. other live ones work fine.

~3 days would match the timeframe in the crash reports.

Flags: needinfo?(rjesup)

Or maybe not. Looking at reports.

Severity: S3 → S2

(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #2)

For some reason JS::NewPromiseObject failed, which we consider as OOM and crash here. Looks like the failure can happen via access denied error though.

It does look like you can't throw access denied on this path, however: CreateWrapper calls JS::NewPromiseObject, which calls PromiseObject::createSkippingExecutor, so we never traverse that path.

The bug is marked as tracked for firefox131 (release), tracked for firefox132 (beta) and tracked for firefox133 (nightly). We have limited time to fix this; the soft freeze is in 10 days. However, the bug still isn't assigned.

:jstutte, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(jstutte)

Given that the scenario in comment #11 is resource-consuming, could this be an actual OOM? Maybe something in YT made it consume more memory in general, thus causing more OOMs? I don't have a more plausible explanation than that. It's not very actionable right now; I'm inclined to add the stalled keyword.

(Didn't have luck with that video either, but maybe my machine has too much memory.)

Have we tried reaching out on the youtube-discuss list regarding this spike?

I hit this crash myself just now, on Nightly. I had a YouTube live stream open for a few hours. The crash report says I have almost 6GB of memory free. That was on macOS.

Arai: Can you think of any other ways we could be failing to create a promise object?

Flags: needinfo?(arai.unmht)

If JS::NewPromiseObject is called with executor == nullptr, almost all failure cases are just OOM, except for the string handling inside JS::CaptureCurrentStack, where it walks over the stack frames and atomizes some UTF-8 strings in the following places. If the input string is malformed UTF-8, the atomization fails.

https://searchfox.org/mozilla-central/rev/ec4996707f5aa721097183d6da6a9546e2fbec30/js/src/vm/SavedStacks.cpp#1584

Rooted<JSAtom*> causeAtom(cx, AtomizeUTF8Chars(cx, cause, strlen(cause)));

cause comes from the following, and I assume it's guaranteed to be valid UTF-8:

https://searchfox.org/mozilla-central/rev/ec4996707f5aa721097183d6da6a9546e2fbec30/js/xpconnect/src/XPCComponents.cpp#1818-1819

JS::AutoSetAsyncStackForNewCalls sas(
    cx, asyncStackObj, utf8Cause.get(),

The other is the location string.
It's less likely that an invalid UTF-8 string flows into the filename, though; at least I'm not sure about the wasm case.

https://searchfox.org/mozilla-central/rev/ec4996707f5aa721097183d6da6a9546e2fbec30/js/src/vm/SavedStacks.cpp#1845-1846,1859,1861,1863-1865,1874-1875,1884,1886-1888

bool SavedStacks::getLocation(JSContext* cx, const FrameIter& iter,
                              MutableHandle<LocationValue> locationp) {
...
  if (iter.isWasm()) {
...
    if (const char16_t* displayURL = iter.displayURL()) {
...
    } else {
      const char* filename = iter.filename() ? iter.filename() : "";
      locationp.setSource(AtomizeUTF8Chars(cx, filename, strlen(filename)));
...
    return true;
  }
...
    if (const char16_t* displayURL = iter.displayURL()) {
...
    } else {
      const char* filename = script->filename() ? script->filename() : "";
      source = AtomizeUTF8Chars(cx, filename, strlen(filename));

If this crash can somehow be reliably reproduced, it would be nice to test with the javascript.options.asyncstack pref flipped to false, so that the above stack-capturing code isn't executed.
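To illustrate the atomization failure mode above, here is a minimal structural UTF-8 validity check. This is only a sketch, not SpiderMonkey's actual validator (for instance, it doesn't reject overlong encodings or surrogate code points), but it shows the kind of byte sequence that would make AtomizeUTF8Chars fail:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of a structural UTF-8 check. A stray lead byte, a stray
// continuation byte, or a truncated sequence is the kind of input
// that makes atomization fail without any memory pressure at all.
// (Overlong encodings and surrogates are NOT rejected here.)
bool LooksLikeValidUtf8(const char* s, size_t len) {
  const uint8_t* p = reinterpret_cast<const uint8_t*>(s);
  size_t i = 0;
  while (i < len) {
    uint8_t b = p[i];
    size_t extra;
    if (b < 0x80) extra = 0;                 // ASCII
    else if ((b & 0xE0) == 0xC0) extra = 1;  // 2-byte lead
    else if ((b & 0xF0) == 0xE0) extra = 2;  // 3-byte lead
    else if ((b & 0xF8) == 0xF0) extra = 3;  // 4-byte lead
    else return false;                       // stray continuation / invalid lead
    if (extra > 0 && i + extra >= len) return false;  // truncated sequence
    for (size_t k = 1; k <= extra; ++k) {
      if ((p[i + k] & 0xC0) != 0x80) return false;    // bad continuation byte
    }
    i += extra + 1;
  }
  return true;
}
```

If such a string reached the atomization call on the stack-capture path, the resulting null return would be misreported by Promise::CreateInfallible as an OOM.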

Flags: needinfo?(arai.unmht)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #17)

Have we tried reaching out on the youtube-discuss list regarding this spike?

Good point, I just posted there.

Actually, the above stack capturing is enabled only when DevTools is open (or the global somehow becomes a debuggee), as long as the javascript.options.asyncstack_capture_debuggee_only pref is set to true, which is the default value.
If the pref is set to false for some reason, or if DevTools is open, this codepath is taken.
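The gating just described can be sketched as a small predicate (the struct and field names here are illustrative, not the actual SpiderMonkey ones):

```cpp
// Hypothetical model of the two prefs that gate async-stack capture.
struct StackCaptureSettings {
  bool asyncStackEnabled = true;    // javascript.options.asyncstack
  bool captureDebuggeeOnly = true;  // javascript.options.asyncstack_capture_debuggee_only
};

// Capture runs only if the feature is on AND either the debuggee-only
// restriction is off or the realm is currently a debuggee (e.g. DevTools
// or the Browser Toolbox is attached).
bool ShouldCaptureAsyncStack(const StackCaptureSettings& prefs,
                             bool realmIsDebuggee) {
  if (!prefs.asyncStackEnabled) return false;
  if (prefs.captureDebuggeeOnly && !realmIsDebuggee) return false;
  return true;
}
```

With default prefs, a user who never opens DevTools never executes the stack-capture path, which matters for deciding whether this crash can be the atomization failure at all.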

(In reply to Tooru Fujisawa [:arai] from comment #22)

Actually, the above stack capturing is enabled only when DevTools is opened (or somehow the global becomes debuggee), as long as javascript.options.asyncstack_capture_debuggee_only pref is set to true, which is the default value.
If the pref value is set to false for some reason, or if DevTools is opened, the codepath affects.

Does that mean this is not impacting users browsing normally with default settings? It seems to happen fairly often, though?

Flags: needinfo?(jstutte)

(In reply to Jens Stutte [:jstutte] from comment #23)

Does that mean this is not impacting users browsing normally with default settings? It seems to happen fairly often, though?

Yeah, the stack trace part won't affect users with default settings as long as they don't have DevTools or Browser Toolbox opened.
The other OOM path in the Promise allocation can still happen in any case.

This is a reminder regarding comment #15!

The bug is marked as tracked for firefox131 (release), tracked for firefox132 (beta) and tracked for firefox133 (nightly). We have limited time to fix this; the soft freeze is in 3 days. However, the bug still isn't assigned.

(In reply to Tooru Fujisawa [:arai] from comment #24)

Yeah, the stack trace part won't affect users with default settings as long as they don't have DevTools or Browser Toolbox opened.
The other OOM path in the Promise allocation can still happen in any case.

Do we know there's any other not-really-oom path like comment #20? 👀

Flags: needinfo?(arai.unmht)

We throw away the potential JS exception here; could there be some exception we could use for debugging?

(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #26)

Do we know there's any other not-really-oom path like comment #20? 👀

Other than the stack capturing, there are only the following:

  • resolve promise prototype/constructor
  • allocate a promise object itself

The prototype/constructor handling happens only once, at the first access.
So that's less likely the case unless this crash happens immediately after loading the page.
The operation does allocate objects and define properties, but there's nothing special compared to other prototypes.

The promise object allocation is a regular object allocation, and here too nothing special happens.

So, if this crash happens without DevTools open, and after some amount of time has elapsed since pageload, it's likely an OOM, or something going wrong inside the allocator.

fwiw, here's the call graph for fallible cases in JS::NewPromiseObject, called from Promise::CreateInfallible, when DevTools is closed:

Promise::CreateInfallible
  Promise::CreateWrapper
    JS::NewPromiseObject # executor = nullptr
      PromiseObject::createSkippingExecutor
        CreatePromiseObjectWithoutResolutionFunctions
          CreatePromiseObjectInternal # proto = nullptr
                                      # protoIsWrapped = false
                                      # informDebugger = true
            NewObjectWithClassProto<PromiseObject> # proto = nullptr
              NewObjectWithClassProto # clasp = PromiseObject::class_
                                      # proto = nullptr
                                      # newKind = GenericObject
                                      # objFlags = ObjectFlags()
                NewObjectWithClassProto # clasp = PromiseObject::class_
                                        # protoArg = nullptr
                                        # allocKind = OBJECT4
                                        # newKind = GenericObject
                                        # objFlags = ObjectFlags()
                  GlobalObject::getOrCreatePrototype # key = JSProto_Promise
                    GlobalObject::ensureConstructor # global = ...
                                                    # key = JSProto_Promise
                      GlobalObject::resolveConstructor # global = ...
                                                       # key = JSProto_Promise
                                                       # IfClassIsDisabled::Throw
                  NewObject # clasp = PromiseObject::class_
                            # proto = promise prototype
                            # allocKind = OBJECT4
                            # newKind = GenericObject
                            # objFlags = ObjectFlags()
                            # allocSite = nullptr

sfink, is there any special thing that can happen in NewObject?

Flags: needinfo?(arai.unmht) → needinfo?(sphink)

Thanks! Now I wonder how to prove this IS an OOM. (I suspect it is)

As with the usual "OOM small" reports, it's not hard to find reports with plenty of Available Page File and Physical Memory (at least a few gigabytes).

mccr8, should we be concerned?

Flags: needinfo?(continuation)

I'm not sure what you are asking. The crash volume here isn't great but it isn't horrible and it is at least steady for now. Of course if YouTube is rolling something out then maybe that's just because it isn't fully rolled out yet.

Maybe you could add more instrumentation to figure out whether it is an OOM or not and crash in a different way?

I don't have a great understanding of all of the reasons we can hit OOM crashes. The first one has a very high system memory usage percentage, and the other two are x86, so maybe there's some kind of weird address space fragmentation issue or the available virtual memory calculation is off.

Flags: needinfo?(continuation)

I'm not sure what you are asking.

That we are OOMing when there's seemingly enough memory, per other "OOM small" reports. Could this be some general memory-allocation issue?

Maybe you could add more instrumentation to figure out whether it is an OOM or not and crash in a different way?

We only get nullptr from JS::NewPromiseObject, but perhaps some JS exception is pending when it returns null? A quick code read seems to suggest that it just returns null without throwing any exception. Maybe the SpiderMonkey people have more thoughts?

(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #27)

We throw away the potential JS exception here; could there be some exception we could use for debugging?

It's possible to query whether the error was OOM, like the following.
The consumer of Promise::CreateWrapper in Promise::CreateInfallible can then branch on the error code,
for example using a different message for MOZ_CRASH that shows whether it's OOM.

  mPromiseObj = JS::NewPromiseObject(cx, nullptr);
  if (!mPromiseObj) {
    if (JS_IsThrowingOutOfMemory(cx)) {
      // The engine reported a genuine OOM.
      aRv.Throw(NS_ERROR_OUT_OF_MEMORY);
    } else {
      // Or anything that can be distinguished from the NS_ERROR_UNEXPECTED
      // set by AutoJSAPI::init failure.
      aRv.Throw(NS_ERROR_FAILURE);
    }
    JS_ClearPendingException(cx);
    return;
  }

or maybe expose the error message to the crash reason somehow?

(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #31)

That we are OOMing when there's seemingly enough memory space in other "OOM small" reports. Could be some general memory allocation thing?

The JS engine has its own memory management with a heap limit, so it is possible for a JS runtime to be unable to allocate even though the system still has memory, I think.
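A toy model of that situation (hypothetical names and numbers, not the SpiderMonkey GC): an allocator with its own budget reports "OOM" once its limit is exhausted, regardless of how much memory the OS could still provide. This is one way a Promise allocation can fail while the crash report shows gigabytes of free RAM:

```cpp
#include <cstddef>

// Sketch of an engine-local heap budget. Allocation fails against the
// engine's own limit, not against what the operating system has free.
class BudgetedHeap {
 public:
  explicit BudgetedHeap(size_t limitBytes) : mLimit(limitBytes) {}

  // Returns false (an "OOM" from the engine's point of view) when the
  // request would push usage past the configured heap limit.
  bool TryAllocate(size_t bytes) {
    if (mUsed + bytes > mLimit) return false;
    mUsed += bytes;
    return true;
  }

  size_t used() const { return mUsed; }

 private:
  size_t mLimit;
  size_t mUsed = 0;
};
```

Under such a limit, a long-lived tab (like a YouTube live stream open for hours) can exhaust the engine's budget even on a machine with ample physical memory.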

Assignee: nobody → krosylight
Status: NEW → ASSIGNED
Pushed by krosylight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/326c334fe386 Crash differently if not JS_IsThrowingOutOfMemory r=arai
See Also: → 1927639

(In reply to Tooru Fujisawa [:arai] from comment #28)

sfink, is there any special thing that can happen in NewObject ?

The only special thing I see is what mccr8 was talking about, where we could fail if exceeding the max heap size.

Usually when approaching the max heap size, we'll start to thrash by doing lots and lots of entirely or almost entirely useless GCs. I have seen this happen on my own machine.

This seems like it could be what's happening here. It would be good to have the current GC heap size in the crash reports. It's a per-thread atomic, though. I filed bug 1927639.

Flags: needinfo?(sphink)

No Nightly reports from the 29 and 30 builds, meaning it was indeed OOM? Let's uplift and see.

Attachment #9434359 - Flags: approval-mozilla-beta?

beta Uplift Approval Request

  • User impact if declined: None, this is only to get better crash reports.
  • Code covered by automated testing: no
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: None
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Only the crash behavior changes
  • String changes made/needed: None
  • Is Android affected?: yes
Attachment #9434360 - Flags: approval-mozilla-release?

release Uplift Approval Request

  • User impact if declined: None, no user visible behavior change
  • Code covered by automated testing: no
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: None
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Only crash behavior changes
  • String changes made/needed: None
  • Is Android affected?: yes
Attachment #9434359 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

If this is a real OOM, I wonder whether we also observe a spike in general OOM crashes.

Crash Signature: mozilla::dom::Promise::CreateInfallible | mozilla::dom::ReadableStreamDefaultReader::Read ] → mozilla::dom::Promise::CreateInfallible | mozilla::dom::ReadableStreamDefaultReader::Read ] [@ OOM | unknown | NS_ABORT_OOM | mozilla::dom::Promise::CreateInfallible | mozilla::dom::streams_abstract::SetUpReadableByteStreamController] [@ OOM | unknown | …
Attachment #9434360 - Flags: approval-mozilla-release? → approval-mozilla-release+

Not seeing any non-OOM crashes. Closing with no action. Someone who can might want to check whether the total OOM rate is increasing, though. (I don't know how to.)

Severity: S2 → S4
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Keywords: leave-open
Priority: P2 → P3
Resolution: --- → WONTFIX
No longer blocks: 1931717
Depends on: 1931717
Blocks: 1931717
No longer depends on: 1931717
See Also: → 1934986
