Closed Bug 1595603 Opened 5 months ago Closed 3 months ago

Crash in [@ OOM | large | NS_ABORT_OOM | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | mozilla::dom::HTMLMediaElement::DispatchAsyncEvent]

Categories

(Core :: Audio/Video: Playback, defect, P3)

70 Branch
x86_64
All
defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox-esr68 --- unaffected
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- wontfix
firefox73 --- fixed
firefox74 --- fixed

People

(Reporter: philipp, Assigned: alwu)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug is for crash report bp-07132137-9bba-45b0-82c8-19e520191111.

Top 10 frames of crashing thread:

0 xul.dll NS_ABORT_OOM xpcom/base/nsDebugImpl.cpp:604
1 xul.dll nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator> xpcom/ds/nsTArray-inl.h:136
2 xul.dll class nsTString<char16_t>* nsTArray_Impl<nsTString<char16_t>, nsTArrayInfallibleAllocator>::AppendElement<const nsTLiteralString<char16_t>&, nsTArrayInfallibleAllocator> xpcom/ds/nsTArray.h:2460
3 xul.dll mozilla::dom::HTMLMediaElement::DispatchAsyncEvent dom/html/HTMLMediaElement.cpp:5952
4 xul.dll mozilla::dom::HTMLMediaElement::SeekCompleted dom/html/HTMLMediaElement.cpp:5281
5 xul.dll void mozilla::MediaDecoder::OnSeekResolved dom/media/MediaDecoder.cpp:830
6 xul.dll void mozilla::MozPromise<bool, bool, 0>::ThenValue<mozilla::MediaDecoder*, void  xpcom/threads/MozPromise.h:597
7 xul.dll mozilla::MozPromise<bool, bool, 1>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:402
8 xul.dll mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run xpcom/threads/TaskDispatcher.h:197
9 xul.dll nsresult mozilla::EventTargetWrapper::Runner::Run xpcom/threads/AbstractThread.cpp:113

these cross-platform tab crashes on 64bit versions of the browser are regressing in firefox 70 - oom allocation size is always reported as 2,208,301,056 bytes (2.21 GB) in the reports.

Nils, could you find an owner for this bug and see if a fix is possible in 71/72? Thanks

Flags: needinfo?(drno)

Too late for 71 as we shipped our last beta. Given that the volume is low to medium on beta, I am marking it as fix-optional for 71 in case we have a safe fix for a dot release as a ridealong.

Nathan, any guesses as to why we have so many OOMs at exactly 2,208,301,056 bytes for an array we only ever add one element at a time to? It seems bizarre.

Flags: needinfo?(nfroyd)

We're failing here:

https://hg.mozilla.org/releases/mozilla-beta/annotate/aa565b96885044b65e176acc9e3286e81eb0abe5/xpcom/ds/nsTArray-inl.h#l130

So we previously had an array of 2208301056/2 = 1104150528 bytes that we're trying to enlarge, but the larger size no longer fits into a 31-bit bitfield. This is an array of nsString, so ((2208301056/2)/24) = 46006272, or ~46M elements. We're only appending to this array when the page is in the bfcache:

https://hg.mozilla.org/releases/mozilla-beta/annotate/aa565b96885044b65e176acc9e3286e81eb0abe5/dom/html/HTMLMediaElement.cpp#l5949

So...maybe some weird page that is constantly seeking a media element, but the page isn't actually...being shown? Is that an expected thing?

I don't have an explanation for the robot-like consistency of the allocation sizes, though.

Flags: needinfo?(nfroyd)

I guess because we always add one element at a time, the size growth means that we'd always pass the breakpoint at the same value. I'm a little surprised we don't hit an actual OOM before this, but maybe on 64 bit systems you aren't going to run out of address space that easily.

(In reply to Andrew McCreight [:mccr8] from comment #5)

I guess because we always add one element at a time, the size growth means that we'd always pass the breakpoint at the same value.

That makes perfect sense, actually. So it doesn't even have to be a specific page that's doing this.

See Also: → 1603879

With Nils out, maybe Jean-Yves knows what's going on here?

Flags: needinfo?(drno) → needinfo?(jyavenard)

:alwu could you please investigate?

thank you.

Flags: needinfo?(jyavenard) → needinfo?(alwu)

Looking at when the OOM started to occur, it seems to coincide with bug 1578615, which touch that area.

Or at a guess: https://searchfox.org/mozilla-central/source/dom/html/HTMLMediaElement.cpp#6529

Here mEventDeliveryPaused could go back to true , but that wouldn't cause any queued events to be dispatched once again.

Another thought is that we know some sites will call play() in a loop while a document is paused due to autoplay policy. Could it be that sites are also seeking in a loop to determine if playback can start, but as we keep queuing those events and end up running out of memoru?

See Also: → 1578615
Assignee: nobody → alwu
Keywords: leave-open

Two parameters in SuspendOrResumeElement() are acutally the same, they are both related with IsActive(), so using one parameter is enough.

I tested this page [1] which would constantly call play() but I found that when the page went to the bf cache, the script seems stopping running as well. It's different than what I thought it would be, I need to do more testing to see what the correct behavior is when page is in the bfcache.

[1] https://alastor0325.github.io/htmltests/autoplay_tests/autoplay_test2.html

Priority: -- → P3

(In reply to Alastor Wu [:alwu] from comment #10)

As it's possible for script to call methods which can generate events while media is in the bfcache, I'm wondering if we can avoid queuing same event again if we've queued that event already?

I think that is not a good idea, because there might be a lots of different combination of events. ex. play, pause, play is not equal to play, play, pause.

For my benefit, would anyone be able to explain how a script is able to call methods while in the bfcache? My naive understanding would be that once a page is in the bfcache it would be suspended.

(In reply to Bryce Seager van Dyk (:bryce) from comment #15)

For my benefit, would anyone be able to explain how a script is able to call methods while in the bfcache? My naive understanding would be that once a page is in the bfcache it would be suspended.

When I tested this issue on Nightly, I couldn't see any media element method being triggered while media element is in the bfcache, even if the page is calling media's method all the time. But it's depending on the premise that the page would always be freezed. However, we don't sure that this premise would be 100% correct and from the crash report we indeed add pending event when media is inactive. So I guess HTMLMediaElement::IsAcitve() doesn't always reflect if media in the bfcache, but I don't know too much details about all possible scenario where media is inactive.

Therefore, I think we should not rely on that premise and can stop unnecessary seeking calls when media is inactive.

Attachment #9118773 - Attachment description: Bug 1595603 - part1 : remove duplicate parameter. → Bug 1595603 - part1 : remove duplicate parameter and rename variable
Attachment #9119047 - Attachment description: Bug 1595603 - part2 : delay seeking task when media is in the bfcache. → Bug 1595603 - part2 : delay seeking task when media is inactive

Via IRC:

bz> bryce: A page not in the bfcache could have script running that interacts with a media element in a bfcached page

I'm curious as to how such a situation arises (why would a page want to do this?), but that clarifies how such a situation could possibly take place.

(In reply to Bryce Seager van Dyk (:bryce) from comment #17)

Via IRC:

bz> bryce: A page not in the bfcache could have script running that interacts with a media element in a bfcached page

I'm curious as to how such a situation arises (why would a page want to do this?), but that clarifies how such a situation could possibly take place.

It's good to know that, thank you.

Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3dfd75a07472
part1 : remove duplicate parameter and rename variable r=bryce
https://hg.mozilla.org/integration/autoland/rev/d0d7ed8937ea
part2 : delay seeking task when media is inactive r=bryce

Looks like we haven't seen any Nightly crashes since this landed. Is there more to do still here or can we resolve this and look at uplifting?

Flags: needinfo?(alwu)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #21)

Looks like we haven't seen any Nightly crashes since this landed. Is there more to do still here or can we resolve this and look at uplifting?

Even before landing, the crash amount on Nightly is also very little, but I think it's worth to try to uplift it to beta and see how it goes.

Flags: needinfo?(alwu)

Comment on attachment 9118773 [details]
Bug 1595603 - part1 : remove duplicate parameter and rename variable

Beta/Release Uplift Approval Request

  • User impact if declined: OOM crash when media is queueing too many events.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Our change is to prevent queueing too many events, so we remove those events we think are not useful when media is inactive. As those events are not visible for the script when media is inactive, so it's not risky.
  • String changes made/needed:
Attachment #9118773 - Flags: approval-mozilla-beta?
Attachment #9119047 - Flags: approval-mozilla-beta?
Keywords: leave-open
Target Milestone: --- → mozilla74

Comment on attachment 9118773 [details]
Bug 1595603 - part1 : remove duplicate parameter and rename variable

Avoids some OOM crashes during media playback. Crash rate is looking good on Nightly so far. Approved for 73.0b5 for wider testing and feedback.

Attachment #9118773 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9119047 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

As those patches has been landed in both Nightly and Beta, mark this bug as fixed.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.