Crash in [@ OOM | large | NS_ABORT_OOM | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | mozilla::dom::HTMLMediaElement::DispatchAsyncEvent]
Categories
(Core :: Audio/Video: Playback, defect, P3)
Tracking
()
People
(Reporter: philipp, Assigned: alwu)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(2 files)
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details | Review |
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details | Review |
This bug is for crash report bp-07132137-9bba-45b0-82c8-19e520191111.
Top 10 frames of crashing thread:
0 xul.dll NS_ABORT_OOM xpcom/base/nsDebugImpl.cpp:604
1 xul.dll nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator> xpcom/ds/nsTArray-inl.h:136
2 xul.dll class nsTString<char16_t>* nsTArray_Impl<nsTString<char16_t>, nsTArrayInfallibleAllocator>::AppendElement<const nsTLiteralString<char16_t>&, nsTArrayInfallibleAllocator> xpcom/ds/nsTArray.h:2460
3 xul.dll mozilla::dom::HTMLMediaElement::DispatchAsyncEvent dom/html/HTMLMediaElement.cpp:5952
4 xul.dll mozilla::dom::HTMLMediaElement::SeekCompleted dom/html/HTMLMediaElement.cpp:5281
5 xul.dll void mozilla::MediaDecoder::OnSeekResolved dom/media/MediaDecoder.cpp:830
6 xul.dll void mozilla::MozPromise<bool, bool, 0>::ThenValue<mozilla::MediaDecoder*, void xpcom/threads/MozPromise.h:597
7 xul.dll mozilla::MozPromise<bool, bool, 1>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:402
8 xul.dll mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run xpcom/threads/TaskDispatcher.h:197
9 xul.dll nsresult mozilla::EventTargetWrapper::Runner::Run xpcom/threads/AbstractThread.cpp:113
these cross-platform tab crashes on 64bit versions of the browser are regressing in firefox 70 - oom allocation size is always reported as 2,208,301,056 bytes (2.21 GB)
in the reports.
Comment 1•5 years ago
|
||
Nils, could you find an owner for this bug and see if a fix is possible in 71/72? Thanks
Updated•5 years ago
|
Comment 2•5 years ago
|
||
Too late for 71 as we shipped our last beta. Given that the volume is low to medium on beta, I am marking it as fix-optional for 71 in case we have a safe fix for a dot release as a ridealong.
Comment 3•4 years ago
|
||
Nathan, any guesses as to why we have so many OOMs at exactly 2,208,301,056 bytes for an array we only ever add one element at a time to? It seems bizarre.
Comment 4•4 years ago
|
||
We're failing here:
So we previously had an array of 2208301056/2 = 1104150528
bytes that we're trying to enlarge, but the larger size no longer fits into a 31-bit bitfield. This is an array of nsString
, so ((2208301056/2)/24) = 46006272
, or ~46M elements. We're only appending to this array when the page is in the bfcache:
So...maybe some weird page that is constantly seeking a media element, but the page isn't actually...being shown? Is that an expected thing?
I don't have an explanation for the robot-like consistency of the allocation sizes, though.
Comment 5•4 years ago
|
||
I guess because we always add one element at a time, the size growth means that we'd always pass the breakpoint at the same value. I'm a little surprised we don't hit an actual OOM before this, but maybe on 64 bit systems you aren't going to run out of address space that easily.
Comment 6•4 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #5)
I guess because we always add one element at a time, the size growth means that we'd always pass the breakpoint at the same value.
That makes perfect sense, actually. So it doesn't even have to be a specific page that's doing this.
Comment 7•4 years ago
|
||
With Nils out, maybe Jean-Yves knows what's going on here?
Updated•4 years ago
|
Comment 8•4 years ago
|
||
:alwu could you please investigate?
thank you.
Comment 9•4 years ago
|
||
Looking at when the OOM started to occur, it seems to coincide with bug 1578615, which touch that area.
Or at a guess: https://searchfox.org/mozilla-central/source/dom/html/HTMLMediaElement.cpp#6529
Here mEventDeliveryPaused could go back to true , but that wouldn't cause any queued events to be dispatched once again.
Another thought is that we know some sites will call play() in a loop while a document is paused due to autoplay policy. Could it be that sites are also seeking in a loop to determine if playback can start, but as we keep queuing those events and end up running out of memoru?
Assignee | ||
Comment 10•4 years ago
|
||
I think this issue is unrelated with bug 1578615, because it first happened at 9/3 [1] which is almost one month earlier than the landing day of the bug 1578615.
As it's possible for script to call methods which can generate events while media is in the bfcache, I'm wondering if we can avoid queuing same event again if we've queued that event already?
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 11•4 years ago
|
||
Two parameters in SuspendOrResumeElement()
are acutally the same, they are both related with IsActive()
, so using one parameter is enough.
Assignee | ||
Comment 12•4 years ago
|
||
I tested this page [1] which would constantly call play()
but I found that when the page went to the bf cache, the script seems stopping running as well. It's different than what I thought it would be, I need to do more testing to see what the correct behavior is when page is in the bfcache.
[1] https://alastor0325.github.io/htmltests/autoplay_tests/autoplay_test2.html
Updated•4 years ago
|
Assignee | ||
Comment 13•4 years ago
•
|
||
(In reply to Alastor Wu [:alwu] from comment #10)
As it's possible for script to call methods which can generate events while media is in the bfcache, I'm wondering if we can avoid queuing same event again if we've queued that event already?
I think that is not a good idea, because there might be a lots of different combination of events. ex. play
, pause
, play
is not equal to play
, play
, pause
.
Assignee | ||
Comment 14•4 years ago
|
||
For my benefit, would anyone be able to explain how a script is able to call methods while in the bfcache? My naive understanding would be that once a page is in the bfcache it would be suspended.
Assignee | ||
Comment 16•4 years ago
|
||
(In reply to Bryce Seager van Dyk (:bryce) from comment #15)
For my benefit, would anyone be able to explain how a script is able to call methods while in the bfcache? My naive understanding would be that once a page is in the bfcache it would be suspended.
When I tested this issue on Nightly, I couldn't see any media element method being triggered while media element is in the bfcache, even if the page is calling media's method all the time. But it's depending on the premise that the page would always be freezed. However, we don't sure that this premise would be 100% correct and from the crash report we indeed add pending event when media is inactive. So I guess HTMLMediaElement::IsAcitve()
doesn't always reflect if media in the bfcache, but I don't know too much details about all possible scenario where media is inactive.
Therefore, I think we should not rely on that premise and can stop unnecessary seeking calls when media is inactive.
Updated•4 years ago
|
Updated•4 years ago
|
Via IRC:
bz> bryce: A page not in the bfcache could have script running that interacts with a media element in a bfcached page
I'm curious as to how such a situation arises (why would a page want to do this?), but that clarifies how such a situation could possibly take place.
Assignee | ||
Comment 18•4 years ago
|
||
(In reply to Bryce Seager van Dyk (:bryce) from comment #17)
Via IRC:
bz> bryce: A page not in the bfcache could have script running that interacts with a media element in a bfcached page
I'm curious as to how such a situation arises (why would a page want to do this?), but that clarifies how such a situation could possibly take place.
It's good to know that, thank you.
Comment 19•4 years ago
|
||
Pushed by alwu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3dfd75a07472 part1 : remove duplicate parameter and rename variable r=bryce https://hg.mozilla.org/integration/autoland/rev/d0d7ed8937ea part2 : delay seeking task when media is inactive r=bryce
Comment 20•4 years ago
|
||
bugherder |
Comment 21•4 years ago
|
||
Looks like we haven't seen any Nightly crashes since this landed. Is there more to do still here or can we resolve this and look at uplifting?
Assignee | ||
Comment 22•4 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #21)
Looks like we haven't seen any Nightly crashes since this landed. Is there more to do still here or can we resolve this and look at uplifting?
Even before landing, the crash amount on Nightly is also very little, but I think it's worth to try to uplift it to beta and see how it goes.
Assignee | ||
Comment 23•4 years ago
|
||
Comment on attachment 9118773 [details]
Bug 1595603 - part1 : remove duplicate parameter and rename variable
Beta/Release Uplift Approval Request
- User impact if declined: OOM crash when media is queueing too many events.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Our change is to prevent queueing too many events, so we remove those events we think are not useful when media is inactive. As those events are not visible for the script when media is inactive, so it's not risky.
- String changes made/needed:
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Comment 24•4 years ago
|
||
Comment on attachment 9118773 [details]
Bug 1595603 - part1 : remove duplicate parameter and rename variable
Avoids some OOM crashes during media playback. Crash rate is looking good on Nightly so far. Approved for 73.0b5 for wider testing and feedback.
Updated•4 years ago
|
Comment 25•4 years ago
|
||
bugherder uplift |
Assignee | ||
Comment 26•4 years ago
|
||
As those patches has been landed in both Nightly and Beta, mark this bug as fixed.
Description
•