Closed Bug 1595603 Opened 6 years ago Closed 6 years ago

Crash in [@ OOM | large | NS_ABORT_OOM | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | mozilla::dom::HTMLMediaElement::DispatchAsyncEvent]

Tracking

Status:

RESOLVED FIXED

Milestone:

mozilla74

Tracking Flags:

Flag            Tracking  Status
firefox-esr68   ---       unaffected
firefox70       ---       wontfix
firefox71       ---       wontfix
firefox72       ---       wontfix
firefox73       ---       fixed
firefox74       ---       fixed

People

(Reporter: philipp, Assigned: alwu)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

Bug 1595603 - part1 : remove duplicate parameter and rename variable (Alastor Wu [:alwu], 6 years ago; 47 bytes, text/x-phabricator-request; RyanVM: approval-mozilla-beta+)
Bug 1595603 - part2 : delay seeking task when media is inactive (Alastor Wu [:alwu], 6 years ago; 47 bytes, text/x-phabricator-request; RyanVM: approval-mozilla-beta+)

[:philipp]

Reporter

Description

6 years ago

This bug is for crash report bp-07132137-9bba-45b0-82c8-19e520191111.

Top 10 frames of crashing thread:

0 xul.dll NS_ABORT_OOM xpcom/base/nsDebugImpl.cpp:604
1 xul.dll nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator> xpcom/ds/nsTArray-inl.h:136
2 xul.dll class nsTString<char16_t>* nsTArray_Impl<nsTString<char16_t>, nsTArrayInfallibleAllocator>::AppendElement<const nsTLiteralString<char16_t>&, nsTArrayInfallibleAllocator> xpcom/ds/nsTArray.h:2460
3 xul.dll mozilla::dom::HTMLMediaElement::DispatchAsyncEvent dom/html/HTMLMediaElement.cpp:5952
4 xul.dll mozilla::dom::HTMLMediaElement::SeekCompleted dom/html/HTMLMediaElement.cpp:5281
5 xul.dll void mozilla::MediaDecoder::OnSeekResolved dom/media/MediaDecoder.cpp:830
6 xul.dll void mozilla::MozPromise<bool, bool, 0>::ThenValue<mozilla::MediaDecoder*, void  xpcom/threads/MozPromise.h:597
7 xul.dll mozilla::MozPromise<bool, bool, 1>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:402
8 xul.dll mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run xpcom/threads/TaskDispatcher.h:197
9 xul.dll nsresult mozilla::EventTargetWrapper::Runner::Run xpcom/threads/AbstractThread.cpp:113

These cross-platform tab crashes on 64-bit versions of the browser are a regression in Firefox 70. The OOM allocation size is always reported as 2,208,301,056 bytes (2.21 GB).

Pascal Chevrel (relman team) -> :pascalc

Comment 1

6 years ago

Nils, could you find an owner for this bug and see if a fix is possible in 71/72? Thanks

Flags: needinfo?(drno)

Liz Henry (:lizzard)

Updated

6 years ago

status-firefox70: affected → wontfix

Pascal Chevrel (relman team) -> :pascalc

Comment 2

6 years ago

Too late for 71 as we shipped our last beta. Given that the volume is low to medium on beta, I am marking it as fix-optional for 71 in case we have a safe fix for a dot release as a ridealong.

status-firefox71: affected → fix-optional

Andrew McCreight [:mccr8]

Comment 3

6 years ago

Nathan, any guesses as to why we have so many OOMs at exactly 2,208,301,056 bytes for an array we only ever add one element at a time to? It seems bizarre.

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 4

6 years ago

We're failing here:

https://hg.mozilla.org/releases/mozilla-beta/annotate/aa565b96885044b65e176acc9e3286e81eb0abe5/xpcom/ds/nsTArray-inl.h#l130

So we previously had an array of 2208301056/2 = 1104150528 bytes that we're trying to enlarge, but the larger size no longer fits into a 31-bit bitfield. This is an array of nsString, so ((2208301056/2)/24) = 46006272, or ~46M elements. We're only appending to this array when the page is in the bfcache:

https://hg.mozilla.org/releases/mozilla-beta/annotate/aa565b96885044b65e176acc9e3286e81eb0abe5/dom/html/HTMLMediaElement.cpp#l5949

So...maybe some weird page that is constantly seeking a media element, but the page isn't actually...being shown? Is that an expected thing?

I don't have an explanation for the robot-like consistency of the allocation sizes, though.
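The figures in the comment above can be re-derived mechanically. The sketch below only restates that arithmetic as compile-time checks; the constant names are illustrative, and the 24-byte element size is the value assumed in the comment rather than something verified here.

```cpp
#include <cassert>
#include <cstdint>

// Figures taken from the crash reports and from the comment above.
constexpr uint64_t kReportedBytes = 2208301056ULL;  // reported OOM size
constexpr uint64_t kElemSize = 24;                  // element size assumed in the comment

// Size of the array before the failed growth step: half the reported size.
constexpr uint64_t kPreviousBytes = kReportedBytes / 2;

// Approximate number of elements in that array.
constexpr uint64_t kPreviousElements = kPreviousBytes / kElemSize;

static_assert(kPreviousBytes == 1104150528ULL, "2208301056 / 2");
static_assert(kPreviousElements == 46006272ULL, "~46M elements");

// The reported byte count is above 2^31 - 1, the largest value a 31-bit
// unsigned field can hold.
static_assert(kReportedBytes > (1ULL << 31) - 1, "exceeds 31 bits");
```

Because every check is a static_assert, the block fails to compile if any of the quoted numbers are inconsistent with one another.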

Flags: needinfo?(nfroyd)

Andrew McCreight [:mccr8]

Comment 5

6 years ago

I guess because we always add one element at a time, the size growth means that we'd always pass the breakpoint at the same value. I'm a little surprised we don't hit an actual OOM before this, but maybe on 64 bit systems you aren't going to run out of address space that easily.
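Andrew's explanation can be illustrated with a toy model. The sketch below assumes a plain doubling growth policy and a simplified limit (twice the requested byte size must fit in a uint32_t). Both are assumptions made for illustration, and nsTArray's actual growth policy differs, so the failing size this model produces is not the reported 2,208,301,056 bytes. What it does show is that when elements are appended one at a time, the capacity sequence is fixed, so growth fails at the same request size on every run.

```cpp
#include <cassert>
#include <cstdint>

// Toy growth policy (assumption): double the capacity, starting from 1.
constexpr uint64_t NextCapacity(uint64_t aCapacity) {
  return aCapacity == 0 ? 1 : aCapacity * 2;
}

// Simplified size limit (assumption): twice the requested byte size must
// fit in a uint32_t.
constexpr bool RequestFits(uint64_t aCapacity, uint64_t aElemSize) {
  return aCapacity * aElemSize * 2 <= UINT32_MAX;
}

// Appends elements one at a time and returns the byte size of the first
// growth request that the limit rejects.
uint64_t FirstRejectedRequestBytes(uint64_t aElemSize) {
  uint64_t length = 0;
  uint64_t capacity = 0;
  for (;;) {
    if (length == capacity) {
      const uint64_t next = NextCapacity(capacity);
      if (!RequestFits(next, aElemSize)) {
        return next * aElemSize;
      }
      capacity = next;
    }
    ++length;  // one AppendElement() call
  }
}
```

For example, with a 24-byte element, FirstRejectedRequestBytes(24) returns 3,221,225,472 bytes (2^27 elements) every time, because nothing in the loop depends on anything other than the element size.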

Nathan Froyd [:froydnj]

Comment 6

6 years ago

(In reply to Andrew McCreight [:mccr8] from comment #5)

I guess because we always add one element at a time, the size growth means that we'd always pass the breakpoint at the same value.

That makes perfect sense, actually. So it doesn't even have to be a specific page that's doing this.

Andrew McCreight [:mccr8]

Updated

6 years ago

Comment 7

6 years ago

With Nils out, maybe Jean-Yves knows what's going on here?

Flags: needinfo?(drno) → needinfo?(jyavenard)

Julien Cristau [:jcristau]

Updated

6 years ago

status-firefox71: fix-optional → wontfix

status-firefox72: affected → wontfix

status-firefox73: --- → affected

Jean-Yves Avenard [:jya]

Comment 8

6 years ago

:alwu could you please investigate?

thank you.

Flags: needinfo?(jyavenard) → needinfo?(alwu)

Jean-Yves Avenard [:jya]

Comment 9

6 years ago

Looking at when the OOM started to occur, it seems to coincide with bug 1578615, which touched that area.

Or, at a guess: https://searchfox.org/mozilla-central/source/dom/html/HTMLMediaElement.cpp#6529

Here mEventDeliveryPaused could go back to true, but that wouldn't cause any queued events to be dispatched again.

Another thought: we know some sites call play() in a loop while a document is paused due to autoplay policy. Could it be that sites are also seeking in a loop to determine whether playback can start, and because we keep queuing those events, we end up running out of memory?
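The failure mode described above can be sketched with a hypothetical, much-simplified element that queues events while delivery is paused. ToyMediaElement and its members are illustrative assumptions that only borrow names from the stack trace (DispatchAsyncEvent, SeekCompleted); this is not Gecko's HTMLMediaElement code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a media element; not the real HTMLMediaElement.
class ToyMediaElement {
 public:
  void SetEventDeliveryPaused(bool aPaused) {
    mEventDeliveryPaused = aPaused;
    if (!aPaused) {
      // Resuming delivery flushes everything queued while paused.
      mDispatched.insert(mDispatched.end(), mPendingEvents.begin(),
                         mPendingEvents.end());
      mPendingEvents.clear();
    }
  }

  void DispatchAsyncEvent(const std::string& aName) {
    if (mEventDeliveryPaused) {
      mPendingEvents.push_back(aName);  // grows without bound while paused
      return;
    }
    mDispatched.push_back(aName);
  }

  // Simplified: each completed seek fires a single "seeked" event.
  void SeekCompleted() { DispatchAsyncEvent("seeked"); }

  std::size_t PendingCount() const { return mPendingEvents.size(); }
  std::size_t DispatchedCount() const { return mDispatched.size(); }

 private:
  bool mEventDeliveryPaused = false;
  std::vector<std::string> mPendingEvents;
  std::vector<std::string> mDispatched;
};
```

If script seeks in a loop while delivery stays paused, PendingCount() grows by one per completed seek and nothing drains the queue until delivery resumes, which matches the shape of the crash: one AppendElement per event, indefinitely.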

Comment 10

6 years ago

I think this issue is unrelated to bug 1578615, because it first appeared on 9/3 [1], almost a month before bug 1578615 landed.

As it's possible for script to call methods that generate events while media is in the bfcache, I'm wondering if we can avoid queuing the same event again if we've already queued it?

[1] https://crash-stats.mozilla.org/signature/?signature=OOM%20%7C%20large%20%7C%20NS_ABORT_OOM%20%7C%20nsTArray_base%3CT%3E%3A%3AEnsureCapacity%3CT%3E%20%7C%20nsTArray_Impl%3CT%3E%3A%3AAppendElement%3CT%3E%20%7C%20mozilla%3A%3Adom%3A%3AHTMLMediaElement%3A%3ADispatchAsyncEvent&date=%3E%3D2019-07-06T08%3A34%3A00.000Z&date=%3C2020-01-06T08%3A34%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=63

Flags: needinfo?(alwu)

Alastor Wu [:alwu]

Assignee

Updated

6 years ago

Assignee: nobody → alwu

Keywords: leave-open

Alastor Wu [:alwu]

Assignee

Comment 11

6 years ago

Attached file Bug 1595603 - part1 : remove duplicate parameter and rename variable — Details

The two parameters of SuspendOrResumeElement() are actually the same: both are derived from IsActive(), so one parameter is enough.

Alastor Wu [:alwu]

Assignee

Comment 12

6 years ago

I tested this page [1], which constantly calls play(), but found that when the page went into the bfcache, the script seemed to stop running as well. That's different from what I expected, so I need to do more testing to see what the correct behavior is when the page is in the bfcache.

[1] https://alastor0325.github.io/htmltests/autoplay_tests/autoplay_test2.html

Paul Adenot (:padenot)

Updated

6 years ago

Priority: -- → P3

Alastor Wu [:alwu]

Assignee

Comment 13

6 years ago

(In reply to Alastor Wu [:alwu] from comment #10)

As it's possible for script to call methods that generate events while media is in the bfcache, I'm wondering if we can avoid queuing the same event again if we've already queued it?

I don't think that is a good idea, because there can be many different combinations of events; e.g. play, pause, play is not equivalent to play, play, pause.
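The objection can be made concrete. The sketch below implements the proposed "skip an event that is already queued" rule as a hypothetical helper (QueueIfAbsent and QueueAll are illustrative names, not Gecko code) and shows that the two sequences from the comment collapse to the same queue.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical dedup rule from the proposal: queue an event only if an
// event with the same name is not already pending.
void QueueIfAbsent(std::vector<std::string>& aQueue, const std::string& aName) {
  if (std::find(aQueue.begin(), aQueue.end(), aName) == aQueue.end()) {
    aQueue.push_back(aName);
  }
}

// Applies the dedup rule to a whole sequence of events, in order.
std::vector<std::string> QueueAll(const std::vector<std::string>& aEvents) {
  std::vector<std::string> queue;
  for (const auto& name : aEvents) {
    QueueIfAbsent(queue, name);
  }
  return queue;
}
```

With the rule applied, both play, pause, play and play, play, pause produce the queue play, pause, even though the first sequence ends in a playing state and the second in a paused one, so script would observe the wrong final event.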

Alastor Wu [:alwu]

Assignee

Comment 14

6 years ago

Attached file Bug 1595603 - part2 : delay seeking task when media is inactive — Details

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Comment 15

6 years ago

For my benefit, would anyone be able to explain how a script is able to call methods while in the bfcache? My naive understanding would be that once a page is in the bfcache it would be suspended.

Alastor Wu [:alwu]

Assignee

Comment 16

6 years ago

(In reply to Bryce Seager van Dyk (:bryce) from comment #15)

For my benefit, would anyone be able to explain how a script is able to call methods while in the bfcache? My naive understanding would be that once a page is in the bfcache it would be suspended.

When I tested this issue on Nightly, I didn't see any media element method being called while the media element was in the bfcache, even though the page calls the media element's methods constantly. But that depends on the premise that the page is always frozen. We can't be sure that premise is 100% correct, and the crash reports show that we do add pending events while media is inactive. So I guess HTMLMediaElement::IsActive() doesn't always reflect whether the media is in the bfcache, but I don't know the details of every scenario in which media is inactive.

Therefore, I think we should not rely on that premise, and should stop unnecessary seeking calls when media is inactive.
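One way to picture stopping unnecessary seeks while media is inactive (part 2 is titled "delay seeking task when media is inactive") is the hypothetical sketch below: while the element is inactive, a seek request only records its target, and the seek runs once the element becomes active again. ToySeekingElement, RunSeek, and the choice to keep only the latest target are assumptions made for this illustration; this is not a description of the landed patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical element; names are illustrative, not Gecko's.
class ToySeekingElement {
 public:
  // Requests a seek. While inactive, only the latest target is remembered.
  void Seek(double aTime) {
    if (!mActive) {
      mDelayedSeekTarget = aTime;
      return;
    }
    RunSeek(aTime);
  }

  // On becoming active, runs the delayed seek (if any) exactly once.
  void SetActive(bool aActive) {
    mActive = aActive;
    if (mActive && mDelayedSeekTarget) {
      const double target = *mDelayedSeekTarget;
      mDelayedSeekTarget.reset();
      RunSeek(target);
    }
  }

  int SeekedEventCount() const { return mSeekedEvents; }
  double CurrentTime() const { return mCurrentTime; }

 private:
  // Performs the seek and fires one "seeked" event (modeled as a counter).
  void RunSeek(double aTime) {
    mCurrentTime = aTime;
    ++mSeekedEvents;
  }

  bool mActive = true;
  std::optional<double> mDelayedSeekTarget;
  double mCurrentTime = 0.0;
  int mSeekedEvents = 0;
};
```

Under this model, a thousand seeks issued while inactive leave no pending events at all, and reactivation produces a single seek to the last requested position, so the event queue stays bounded regardless of how often script seeks.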

Phabricator Automation

Updated

6 years ago

Attachment #9118773 - Attachment description: Bug 1595603 - part1 : remove duplicate parameter. → Bug 1595603 - part1 : remove duplicate parameter and rename variable

Phabricator Automation

Updated

6 years ago

Attachment #9119047 - Attachment description: Bug 1595603 - part2 : delay seeking task when media is in the bfcache. → Bug 1595603 - part2 : delay seeking task when media is inactive

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Comment 17

6 years ago

Via IRC:

bz> bryce: A page not in the bfcache could have script running that interacts with a media element in a bfcached page

I'm curious as to how such a situation arises (why would a page want to do this?), but that clarifies how such a situation could possibly take place.

Alastor Wu [:alwu]

Assignee

Comment 18

6 years ago

(In reply to Bryce Seager van Dyk (:bryce) from comment #17)

Via IRC:

bz> bryce: A page not in the bfcache could have script running that interacts with a media element in a bfcached page

I'm curious as to how such a situation arises (why would a page want to do this?), but that clarifies how such a situation could possibly take place.

It's good to know that, thank you.

Pulsebot

Comment 19

6 years ago

Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3dfd75a07472
part1 : remove duplicate parameter and rename variable r=bryce
https://hg.mozilla.org/integration/autoland/rev/d0d7ed8937ea
part2 : delay seeking task when media is inactive r=bryce

Bogdan Tara[:bogdan_tara | bogdant]

Comment 20

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3dfd75a07472
https://hg.mozilla.org/mozilla-central/rev/d0d7ed8937ea

Ryan VanderMeulen [:RyanVM]

Comment 21

6 years ago

Looks like we haven't seen any Nightly crashes since this landed. Is there more to do still here or can we resolve this and look at uplifting?

Flags: needinfo?(alwu)

Alastor Wu [:alwu]

Assignee

Comment 22

6 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #21)

Looks like we haven't seen any Nightly crashes since this landed. Is there more to do still here or can we resolve this and look at uplifting?

Even before landing, the crash volume on Nightly was very low, but I think it's worth trying to uplift it to Beta to see how it goes.

Flags: needinfo?(alwu)

Alastor Wu [:alwu]

Assignee

Comment 23

6 years ago

Comment on attachment 9118773 [details]
Bug 1595603 - part1 : remove duplicate parameter and rename variable

Beta/Release Uplift Approval Request

User impact if declined: OOM crash when media is queueing too many events.
Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: No
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): The change prevents queueing too many events by removing events we think are not useful while media is inactive. Since those events are not visible to script while media is inactive, it's not risky.
String changes made/needed:

Attachment #9118773 - Flags: approval-mozilla-beta?

Alastor Wu [:alwu]

Assignee

Updated

6 years ago

Attachment #9119047 - Flags: approval-mozilla-beta?

Ryan VanderMeulen [:RyanVM]

Updated

6 years ago

status-firefox74: --- → fixed

Keywords: leave-open

Target Milestone: --- → mozilla74

Ryan VanderMeulen [:RyanVM]

Comment 24

6 years ago

Comment on attachment 9118773 [details]
Bug 1595603 - part1 : remove duplicate parameter and rename variable

Avoids some OOM crashes during media playback. Crash rate is looking good on Nightly so far. Approved for 73.0b5 for wider testing and feedback.

Attachment #9118773 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Updated

6 years ago

Attachment #9119047 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Cristian Brindusan [:cbrindusan]

Comment 25

6 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/84c1d1cd152f
https://hg.mozilla.org/releases/mozilla-beta/rev/ab154f7d32b0

status-firefox73: affected → fixed

Alastor Wu [:alwu]

Assignee

Comment 26

6 years ago

As these patches have landed in both Nightly and Beta, marking this bug as fixed.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED
