Bug 1856672 (Open) · Opened 2 years ago · Updated 7 months ago

Crash in [@ mozilla::dom::TypedArray_base<T>::ProcessFixedData<T>]

Categories

(Core :: DOM: Bindings (WebIDL), defect)


Tracking


ASSIGNED
Tracking Status
firefox-esr115 --- unaffected
firefox118 --- unaffected
firefox119 + wontfix
firefox120 + wontfix
firefox121 + wontfix
firefox122 + wontfix
firefox123 + wontfix
firefox124 --- wontfix

People

(Reporter: diannaS, Assigned: sfink)

References

(Depends on 1 open bug, Regression)

Details

(Keywords: crash, regression, stalled)

Crash Data

Attachments

(1 file, 2 obsolete files)

Crash report: https://crash-stats.mozilla.org/report/index/dafaefbe-6ebc-44f4-89d3-d344b0230927

MOZ_CRASH Reason: MOZ_CRASH(small oom when moving inline data out-of-line)

Top 10 frames of crashing thread:

0  xul.dll  mozilla::dom::TypedArray_base<JS::ArrayBufferView>::ProcessFixedData<`lambda at /builds/worker/checkouts/gecko/dom/media/mediasource/SourceBuffer.cpp:732:33'> const  dom/bindings/TypedArray.h:660
1  xul.dll  mozilla::dom::SourceBuffer::PrepareAppend  dom/media/mediasource/SourceBuffer.cpp:732
1  xul.dll  mozilla::dom::SourceBuffer::AppendBuffer  dom/media/mediasource/SourceBuffer.cpp:194
2  xul.dll  mozilla::dom::SourceBuffer_Binding::appendBuffer  dom/bindings/SourceBufferBinding.cpp:891
3  xul.dll  mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>  dom/bindings/BindingUtils.cpp:3327
4  ?  @0x3d329a0a  
5  ?  @0x3d2b8a50  
6  ?  @0x3d321805  
7  ?  @0x3d2b8a50  
8  ?  @0x3d3a3a15  

Looks like a normal 32-bit OOM to me.

Component: DOM: Bindings (WebIDL) → Audio/Video

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on beta

For more information, please visit BugBot documentation.

Keywords: topcrash

This is happening with plenty of memory available and with amd64 builds.

I wonder why ProcessFixedData() was chosen over ProcessData() here.

Severity: -- → S2
Keywords: regression
OS: Windows 10 → All
Regressed by: 1690111
Hardware: x86 → All

(In reply to Karl Tomlinson (:karlt) from comment #3)

This is happening with plenty of memory available and with amd64 builds.

I wonder why ProcessFixedData() was chosen over ProcessData() here.

Probably because PrepareAppend() calls EvictData() which does all kinds of stuff, including calling TrackBuffersManager::ProcessTasks() that sounds like it might conceivably do a GC. SetReadyState() also does lots of things that smell like they could GC, at least to the analysis and me.

It is possible that the jsapi.Init(mImplObj) is failing here rather than OOMing in JS::EnsureNonInlineArrayBufferOrView(). Though I don't really know how that could be, that seems like it should only happen if the global object is gone and that shouldn't happen afaict.

Thanks. Yes, there's definitely value in picking an option for which the analysis can guarantee we're good.

PrepareAppend() uses only the length, not the data, across EvictData(). Presumably the length won't change during GC and so ProcessData() could be used after EvictData() and some other API used to get the length for EvictData().
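The reordering suggested above can be sketched with stand-in types. This is a minimal illustration, not the real Gecko API: `MockArrayBufferView`, `MockSourceBuffer`, and the member names are hypothetical, and the comments describe the constraint (no raw data pointer held across code that might GC) rather than the actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the types involved; names are illustrative.
struct MockArrayBufferView {
  std::vector<uint8_t> data;  // stands in for inline or out-of-line storage
  size_t Length() const { return data.size(); }
  // ProcessData: the callback receives a raw pointer, so it must not run
  // anything that could move or free the underlying buffer.
  template <typename F>
  auto ProcessData(F&& aFunc) const { return aFunc(data.data(), data.size()); }
};

struct MockSourceBuffer {
  size_t evictedBytes = 0;
  // EvictData may run arbitrary code (conceptually, anything that can GC),
  // so no raw pointer into the view's data may be live across this call.
  void EvictData(size_t aLength) { evictedBytes += aLength; }

  // Sketch of the reordering: read the length up front, evict using only
  // that cached length, and only then touch the data inside ProcessData.
  std::optional<std::vector<uint8_t>> PrepareAppend(
      const MockArrayBufferView& aView) {
    size_t length = aView.Length();  // safe: no data pointer held yet
    EvictData(length);               // may "GC"; we only used the length
    return aView.ProcessData([](const uint8_t* aData, size_t aLength) {
      return std::optional<std::vector<uint8_t>>(
          std::vector<uint8_t>(aData, aData + aLength));
    });
  }
};
```

The point of the ordering is that the only state carried across the potentially GC-ing `EvictData()` call is a plain `size_t`, never a pointer into the buffer.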

Just posting so that no-one else goes to the trouble of testing the same thing: this is not reproducing the crash.

This crash is also occurring (mac, linux) with plenty of available memory with Crypto.getRandomValues().

That has only the ArrayBufferView form.

All the reports I've seen have the ArrayBufferView template parameter, even though SourceBuffer.appendBuffer() has the ArrayBuffer overload, but perhaps the compiler has folded the two instantiations into one.

Component: Audio/Video → DOM: Bindings (WebIDL)

(In reply to Karl Tomlinson (:karlt) from comment #8)

PrepareAppend() uses only the length, not the data, across EvictData(). Presumably the length won't change during GC and so ProcessData() could be used after EvictData() and some other API used to get the length for EvictData().

"GC" here is unhelpfully used to stand in for "run JS code", and JS code can change the length. Currently the only way would be to detach the ArrayBuffer, but we'll be adding more ways in bug 1842773 after the dust settles here.

The bug is marked as tracked for firefox119 (beta). We have limited time to fix this, the soft freeze is in 7 days. However, the bug still isn't assigned.

:hsinyi, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(htsai)
Attached file Bug 1856672 - Remove more remnants of old API (obsolete) —
Assignee: nobody → sphink
Status: NEW → ASSIGNED

Comment on attachment 9358146 [details]
Bug 1856672 - Remove more remnants of old API

Revision D190845 was moved to bug 1858533. Setting attachment 9358146 [details] to obsolete.

Attachment #9358146 - Attachment is obsolete: true

Sorry, attached a patch for the wrong bug. I still don't know what's going on with this one.

Assignee: sphink → nobody
Status: ASSIGNED → NEW

(In reply to Steve Fink [:sfink] [:s:] from comment #6)

It is possible that the jsapi.Init(mImplObj) is failing here rather than OOMing in JS::EnsureNonInlineArrayBufferOrView(). Though I don't really know how that could be, that seems like it should only happen if the global object is gone and that shouldn't happen afaict.

We could try to separate the two calls to get better stacks, though it does look unlikely that the Init is failing.
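Separating the two failure points could look roughly like the following. This is a hedged sketch with mock types, not the real binding code: the enum, struct names, and return-code style are all illustrative (the real code would presumably use two distinct MOZ_CRASH reasons rather than return values, so that crash reports distinguish the cases).

```cpp
#include <cassert>

// Hypothetical: split the single crash site into two, so a crash report
// tells us whether jsapi.Init() or the ensure-non-inline step failed.
enum class FixedDataFailure { None, InitFailed, EnsureNonInlineFailed };

struct MockJSAPI {
  bool canInit;
  bool Init() { return canInit; }
};

struct MockView {
  bool canMoveOutOfLine;
  bool EnsureNonInline() { return canMoveOutOfLine; }
};

FixedDataFailure ProcessFixedDataDiagnostic(MockJSAPI& aJsapi,
                                            MockView& aView) {
  if (!aJsapi.Init()) {
    // Real code might MOZ_CRASH("ProcessFixedData: jsapi.Init failed")
    return FixedDataFailure::InitFailed;
  }
  if (!aView.EnsureNonInline()) {
    // Real code crashes with "small oom when moving inline data out-of-line"
    return FixedDataFailure::EnsureNonInlineFailed;
  }
  return FixedDataFailure::None;
}
```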

Flags: needinfo?(peterv)

(In reply to BugBot [:suhaib / :marco/ :calixte] from comment #13)

The bug is marked as tracked for firefox119 (beta). We have limited time to fix this, the soft freeze is in 7 days. However, the bug still isn't assigned.

:hsinyi, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

I am following this to ensure it gets the right attention and actions. We are actively investigating the issue. We will assign someone once we know more about what's going on.

Flags: needinfo?(htsai)

:hsinyi, we are in RC week for Fx119 and go-live is on 2023-10-24, but we're monitoring this for a fix that we could uplift in a dot release.

This is a reminder regarding comment #13!

The bug is marked as tracked for firefox119 (beta). We have limited time to fix this, the soft freeze is in a day. However, the bug still isn't assigned.

Peter, I am assigning this to you for now since you've been actively investigating it.

Once we have more ideas about the root cause, we may reassign and ask for help from other teams.

Assignee: nobody → peterv
Flags: needinfo?(htsai)
Assignee: peterv → sphink
Status: NEW → ASSIGNED
Attached file Bug 1856672 - Diagnostics for OOM (?) crash (obsolete) —

When looking at this again, I noticed another way this could fail that doesn't involve OOM: if the supposed ArrayBufferView object passed in is of the wrong type, then it will throw a JS exception and return false.

If that is what is going on, then it seems like we should be handling errors here somehow instead of just (safely) crashing. Unfortunately, ProcessFixedData doesn't have a way to handle errors that originate in the infrastructure code itself, and it forwards the return value through from the given lambda, so the body of ProcessFixedData doesn't know what sort of error value it could return.

We could perhaps add another parameter giving an error value? In this case, the return value is RefPtr<MediaByteBuffer> if I'm looking at the right thing, and it is immediately bool-checked. Although the caller just returns on error here; we'd probably really want to use the ErrorResult& aRv instead.

If necessary, instead of an error return value, we could have a whole additional lambda passed in to handle the error case.

What I mean by "the wrong type" above is anything that doesn't resolve to ArrayBufferObject or ArrayBufferViewObject when passed through CheckedUnwrapStatic.
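The error-value idea above can be sketched as follows. This is a hypothetical illustration, not the real `ProcessFixedData` signature: the function name, the `infraOk` stand-in for "jsapi.Init and the non-inline conversion both succeeded", and the plain `std::vector` argument are all assumptions made so the snippet is self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <vector>

// Sketch: let the caller supply the value to return when the infrastructure
// itself fails, since the real ProcessFixedData only forwards the lambda's
// return value and has no error value of its own.
template <typename Processor, typename Ret>
Ret ProcessFixedDataWithError(bool aInfraOk,
                              const std::vector<uint8_t>& aData,
                              Processor&& aProcessor, Ret aErrorValue) {
  if (!aInfraOk) {
    // e.g. jsapi.Init() failed or moving the data out-of-line failed:
    // hand back the caller-supplied error value instead of crashing.
    return aErrorValue;
  }
  return aProcessor(aData.data(), aData.size());
}
```

The alternative floated above, passing a whole error-handling lambda, would replace `aErrorValue` with a callable invoked on the failure path, which also gives the caller a place to set an ErrorResult.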

Bleh. Automation accidentally reassigned this to me. I put a diagnostic patch up for review, but we might want to deal with the type issue first instead?

Assignee: sphink → peterv

It's not looking likely for 119, but do we expect something for Fx120?
Setting Fx120 to tracking+

Flags: needinfo?(peterv)

peterv - do you want me to land the diagnostics patch, or would you rather come up with a mechanism for error handling first (or instead)?

Crash Signature: [@ mozilla::dom::TypedArray_base<T>::ProcessFixedData<T>] → [@ mozilla::dom::TypedArray_base<T>::ProcessFixedData<T>] [@ mozilla::dom::TypedArray_base<T>::ProcessFixedData]

(In reply to Steve Fink [:sfink] [:s:] from comment #24)

When looking at this again, I noticed another way this could fail that doesn't involve OOM: if the supposed ArrayBufferView object passed in is of the wrong type, then it will throw a JS exception and return false.

I don't understand how that could happen for the crash in comment 0. The argument conversion code in the binding calls ArrayBufferView_base::Init, which eventually makes us end up calling JS::ArrayBufferView::unwrap. JS::ArrayBufferView::unwrap should be doing the necessary checks, how can that end up with an object of the wrong type? The binding code will just immediately throw an exception if JS::ArrayBufferView::unwrap returns false.

Flags: needinfo?(peterv) → needinfo?(sphink)

Also, I've been digging into why this only seems to be happening for ArrayBufferView. Would it be possible that ArrayBufferViewObject::ensureBufferObject is returning nullptr? I'm not sure when that could happen, but it would cause ArrayBufferViewObject::ensureNonInline to return false.

Depends on: 1865987
Depends on: 1866744
Attachment #9360498 - Attachment is obsolete: true

(I think peterv is right in comment 23, but he's also landing a bunch of new diagnostics that should pinpoint this problem precisely.)

Flags: needinfo?(sphink)
Depends on: 1867941
Depends on: 1868654

Ni'ing peterv to provide insights from the debugging added in bug 1867941 once available.

Flags: needinfo?(peterv)

We're waiting for crashes on nightly.

Flags: needinfo?(peterv)

Hi Peter, any updates now that we've got some newer crash reports?

Flags: needinfo?(peterv)

:mccr8 as triage owner, do you have any insight here? See Comment 33

Flags: needinfo?(continuation)

Here is a crash stats search for this signature with a facet on reasons. The most recent ones are "We did run out of memory!" but there are some older ones that are different. I'll message Peter and see if he has any ideas.

Flags: needinfo?(continuation)
Depends on: 1876107

We're still trying to diagnose what exactly is going on. Landed another diagnostics patch.

Flags: needinfo?(peterv)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

:peterv, the volume seems to have gone down, but this still isn't fixed. Have the diagnostic patches helped uncover the issue here?

Flags: needinfo?(peterv)

Nope, Steve and I are still trying to understand the additional data that we have.

Flags: needinfo?(peterv)

We added diagnostic patches and have been looking into additional data from crash reports for months, but have yet to figure out a clear path forward. I am marking this stalled for now.

Keywords: stalled
Assignee: peterv → sphink
