Crash in [@ mozilla::dom::TypedArray_base<T>::ProcessFixedData<T>]
Categories: Core :: DOM: Bindings (WebIDL), defect
People: Reporter: diannaS; Assigned: sfink
References: Depends on 1 open bug; Regression
Keywords: crash, regression, stalled
Attachments: 1 file (1.94 KB, text/html), 2 obsolete files
Crash report: https://crash-stats.mozilla.org/report/index/dafaefbe-6ebc-44f4-89d3-d344b0230927
MOZ_CRASH Reason: MOZ_CRASH(small oom when moving inline data out-of-line)
Top 10 frames of crashing thread:
0 xul.dll mozilla::dom::TypedArray_base<JS::ArrayBufferView>::ProcessFixedData<`lambda at /builds/worker/checkouts/gecko/dom/media/mediasource/SourceBuffer.cpp:732:33'> const dom/bindings/TypedArray.h:660
1 xul.dll mozilla::dom::SourceBuffer::PrepareAppend dom/media/mediasource/SourceBuffer.cpp:732
1 xul.dll mozilla::dom::SourceBuffer::AppendBuffer dom/media/mediasource/SourceBuffer.cpp:194
2 xul.dll mozilla::dom::SourceBuffer_Binding::appendBuffer dom/bindings/SourceBufferBinding.cpp:891
3 xul.dll mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions> dom/bindings/BindingUtils.cpp:3327
4 ? @0x3d329a0a
5 ? @0x3d2b8a50
6 ? @0x3d321805
7 ? @0x3d2b8a50
8 ? @0x3d3a3a15
Comment 1•2 years ago
Looks like a normal 32-bit OOM to me.
Comment 2•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
For more information, please visit BugBot documentation.
Comment 3•2 years ago
This is happening with plenty of memory available and with amd64 builds. I wonder why ProcessFixedData() was chosen over ProcessData() here.
Comment 5•2 years ago (Assignee)
(In reply to Karl Tomlinson (:karlt) from comment #3)
> This is happening with plenty of memory available and with amd64 builds. I wonder why ProcessFixedData() was chosen over ProcessData() here.

Probably because PrepareAppend() calls EvictData(), which does all kinds of stuff, including calling TrackBuffersManager::ProcessTasks(), which sounds like it might conceivably do a GC. SetReadyState() also does lots of things that smell like they could GC, at least to the analysis and me.
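To make that concrete, here is a rough sketch of the shape being discussed. The lambda signature, the Span element type, and the EvictData()/AppendElements() details are illustrative assumptions, not the actual SourceBuffer.cpp code.

// Hypothetical paraphrase of SourceBuffer::PrepareAppend() around line 732: the
// lambda given to ProcessFixedData() spans EvictData() (and possibly
// SetReadyState()), either of which might end up running JS or triggering a GC,
// which is why the view's data is "fixed" (moved out-of-line) up front.
RefPtr<MediaByteBuffer> buffer = aData.ProcessFixedData(
    [&](const Span<const uint8_t>& aFixedSpan) -> RefPtr<MediaByteBuffer> {
      // Only the length is needed on this side of EvictData()...
      if (!EvictData(aFixedSpan.Length())) {  // may run tasks, JS, or a GC
        return nullptr;
      }
      // ...and the bytes themselves are only copied afterwards.
      RefPtr<MediaByteBuffer> data = new MediaByteBuffer();
      if (!data->AppendElements(aFixedSpan, fallible)) {
        return nullptr;
      }
      return data;
    });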
Comment 6•2 years ago (Assignee)
It is possible that the jsapi.Init(mImplObj) is failing here rather than OOMing in JS::EnsureNonInlineArrayBufferOrView(). Though I don't really know how that could be; that seems like it should only happen if the global object is gone, and that shouldn't happen afaict.
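For reference, the two calls in question sit right next to each other inside ProcessFixedData(). Roughly, and paraphrasing from the crash reason rather than quoting TypedArray.h, both failures funnel into the same MOZ_CRASH, which is why the report alone cannot tell them apart:

dom::AutoJSAPI jsapi;
if (!jsapi.Init(mImplObj) ||                                        // global/reflector problem
    !JS::EnsureNonInlineArrayBufferOrView(jsapi.cx(), mImplObj)) {  // OOM moving data out-of-line
  MOZ_CRASH("small oom when moving inline data out-of-line");
}
// Only after both succeed is the caller's lambda invoked with stable data.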
Comment 8•2 years ago
Thanks. Yes, there's definitely value in picking an option for which the analysis can guarantee we're good.
PrepareAppend() uses only the length, not the data, across EvictData(). Presumably the length won't change during GC, so ProcessData() could be used after EvictData(), with some other API used to get the length for EvictData().
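A hypothetical restructuring along those lines might look like the following; the Length() accessor and the exact ProcessData() callback shape are assumptions for illustration, not the real API.

// Sketch only: read the length first, run EvictData() while no data pointer is
// live, then use ProcessData() purely for the copy, where the analysis can see
// that no GC can run.
const size_t length = aData.Length();   // length-only query, no fixed data needed
if (!EvictData(length)) {               // free to run JS or GC here
  return nullptr;
}
RefPtr<MediaByteBuffer> data = aData.ProcessData(
    [](const Span<const uint8_t>& aSpan, JS::AutoCheckCannotGC&&) {
      RefPtr<MediaByteBuffer> copied = new MediaByteBuffer();
      if (!copied->AppendElements(aSpan, fallible)) {
        return RefPtr<MediaByteBuffer>();  // OOM while copying
      }
      return copied;
    });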
Comment 9•2 years ago
Just posting so that no-one else goes to the trouble of testing the same thing.
This is not producing the crash.
Comment 10•2 years ago
This crash is also occurring (mac, linux) with plenty of available memory with Crypto.getRandomValues(). That has only the ArrayBufferView form.
All the reports I've seen have the ArrayBufferView template parameter, even though SourceBuffer.appendBuffer() has the ArrayBuffer overload, but perhaps the compiler has folded the two instantiations into one.
Comment 12•2 years ago (Assignee)
(In reply to Karl Tomlinson (:karlt) from comment #8)
> PrepareAppend() uses only the length, not the data, across EvictData(). Presumably the length won't change during GC and so ProcessData() could be used after EvictData() and some other API used to get the length for EvictData().
"GC" here is unhelpfully used to stand in for "run JS code", and JS code can change the length. Currently the only way would be to detach the ArrayBuffer, but we'll be adding more ways in bug 1842773 after the dust settles here.
Comment 13•2 years ago
The bug is marked as tracked for firefox119 (beta). We have limited time to fix this, the soft freeze is in 7 days. However, the bug still isn't assigned.
:hsinyi, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Comment 14•2 years ago (Assignee)
Comment 15•2 years ago
Comment on attachment 9358146 [details]
Bug 1856672 - Remove more remnants of old API
Revision D190845 was moved to bug 1858533. Setting attachment 9358146 [details] to obsolete.
Comment 16•2 years ago (Assignee)
Sorry, attached a patch for the wrong bug. I still don't know what's going on with this one.
Comment 17•2 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #6)
> It is possible that the jsapi.Init(mImplObj) is failing here rather than OOMing in JS::EnsureNonInlineArrayBufferOrView(). Though I don't really know how that could be, that seems like it should only happen if the global object is gone and that shouldn't happen afaict.

We could try to separate the two calls to get better stacks, though it does look unlikely that the Init is failing.
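One possible way to split them for better diagnostics (sketch only; the first crash string below is made up for illustration):

dom::AutoJSAPI jsapi;
if (!jsapi.Init(mImplObj)) {
  // Hypothetical reason string, distinct from the existing one.
  MOZ_CRASH("AutoJSAPI::Init failed before moving inline data out-of-line");
}
if (!JS::EnsureNonInlineArrayBufferOrView(jsapi.cx(), mImplObj)) {
  MOZ_CRASH("small oom when moving inline data out-of-line");
}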
Comment 18•2 years ago
(In reply to BugBot [:suhaib / :marco/ :calixte] from comment #13)
> The bug is marked as tracked for firefox119 (beta). We have limited time to fix this, the soft freeze is in 7 days. However, the bug still isn't assigned.
> :hsinyi, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.
> For more information, please visit BugBot documentation.

I am following this to ensure it gets the right attention and actions. We are actively investigating the issue and will assign it once we know more about what's going on.
Comment 19•2 years ago
:hsinyi, we are in RC week for Fx119 and go-live is on 2023-10-24; we're still monitoring this for a fix that we could uplift in a dot release.
Comment 20•2 years ago
This is a reminder regarding comment #13!
The bug is marked as tracked for firefox119 (beta). We have limited time to fix this, the soft freeze is in a day. However, the bug still isn't assigned.
Comment 21•2 years ago
Peter, I am assigning this to you for now since you've been actively investigating it.
Once we have more ideas about the root cause, we may reassign and ask for help from other teams.
Comment 23•2 years ago (Assignee)
Comment 24•2 years ago (Assignee)
When looking at this again, I noticed another way this could fail that doesn't involve OOM: if the supposed ArrayBufferView object passed in is of the wrong type, then it will throw a JS exception and return false.
If that is what is going on, then it seems like we should be handling errors here somehow instead of just (safely) crashing. Unfortunately, ProcessFixedData doesn't have a way to handle errors that originate in the infrastructure code itself; it forwards the return value through from the given lambda, so the body of ProcessFixedData doesn't know what sort of error value it could return.
We could perhaps add another parameter giving an error value? In this case, the return value is RefPtr<MediaByteBuffer>, if I'm looking at the right thing, and it is immediately bool-checked. Although the caller just returns on error in this case; we'd really probably want to use the ErrorResult& aRv.
If necessary, instead of an error return value, we could have a whole additional lambda passed in to handle the error case.
What I mean by "the wrong type" above is anything that doesn't resolve to ArrayBufferObject or ArrayBufferViewObject when passed through CheckedUnwrapStatic.
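A sketch of the "error value parameter" idea; the signature is hypothetical, and GetFixedData() is a stand-in name for however the fixed span is obtained today:

// Hypothetical variant of ProcessFixedData() that returns a caller-supplied
// error value instead of crashing when the infrastructure itself fails. The
// error value must have the same type as the lambda's return value.
template <typename Processor, typename ErrorValue>
auto ProcessFixedData(Processor&& aProcessor, ErrorValue aErrorValue) const {
  dom::AutoJSAPI jsapi;
  if (!jsapi.Init(mImplObj) ||
      !JS::EnsureNonInlineArrayBufferOrView(jsapi.cx(), mImplObj)) {
    return aErrorValue;  // e.g. a null RefPtr<MediaByteBuffer> at this call site
  }
  return aProcessor(GetFixedData());  // stand-in for the existing span creation
}

The SourceBuffer caller could then pass a null RefPtr<MediaByteBuffer> as the error value and report the failure on its ErrorResult& aRv instead of crashing.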
Comment 25•2 years ago (Assignee)
Bleh. Automation accidentally reassigned this to me. I put a diagnostic patch up for review, but we might want to deal with the type issue first instead?
Comment 26•2 years ago
It's not looking likely for 119, but do we expect something for Fx120?
Setting Fx120 to tracking+
Comment 27•2 years ago (Assignee)
peterv - do you want me to land the diagnostics patch, or would you rather come up with a mechanism for error handling first (or instead)?
Comment 28•2 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #24)
> When looking at this again, I noticed another way this could fail that doesn't involve OOM: if the supposed ArrayBufferView object passed in is of the wrong type, then it will throw a JS exception and return false.

I don't understand how that could happen for the crash in comment 0. The argument conversion code in the binding calls ArrayBufferView_base::Init, which eventually makes us end up calling JS::ArrayBufferView::unwrap. JS::ArrayBufferView::unwrap should be doing the necessary checks, so how can that end up with an object of the wrong type? The binding code will just immediately throw an exception if JS::ArrayBufferView::unwrap returns false.
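For context, a heavily simplified sketch of the conversion path being described; the real generated code and the exact throw call differ, and ThrowTypeError() below is a placeholder:

ArrayBufferView arg0;
if (!args[0].isObject() || !arg0.Init(&args[0].toObject())) {
  // JS::ArrayBufferView::unwrap() (reached via Init()) rejected the value: the
  // generated binding throws and returns false right here, so a wrongly-typed
  // argument never reaches SourceBuffer::AppendBuffer(), let alone
  // ProcessFixedData().
  ThrowTypeError(cx);  // placeholder for the real error-reporting call
  return false;
}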
Comment 29•2 years ago
Also, I've been digging into why this only seems to be happening for ArrayBufferView. Would it be possible that ArrayBufferViewObject::ensureBufferObject is returning nullptr? I'm not sure when that could happen, but it would cause ArrayBufferViewObject::ensureNonInline to return false.
Comment 30•2 years ago (Assignee)
(I think peterv is right in comment 23, but he's also landing a bunch of new diagnostics that should pinpoint this problem precisely.)
Comment 31•2 years ago
Ni'ing peterv to provide insights from the debugging added in bug 1867941 once available.
Comment 33•2 years ago
Hi Peter, any updates now that we've got some newer crash reports?
Comment 34•2 years ago
:mccr8, as triage owner, do you have any insight here? See comment 33.
Comment 35•2 years ago
Here is a crash stats search for this signature with a facet on reasons. The most recent ones are "We did run out of memory!" but there are some older ones that are different. I'll message Peter and see if he has any ideas.
Comment 36•2 years ago
We're still trying to diagnose what exactly is going on. Landed another diagnostics patch.
Comment 37•2 years ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 38•2 years ago (Reporter)
:peterv, the volume seems to have gone down, but this still isn't fixed. Have the diagnostic patches helped uncover the issue here?
Comment 39•2 years ago
Nope, Steve and I are still trying to understand the additional data that we have.
Comment 40•1 year ago
We added diagnostic patches and have been looking into additional data from crash reports for months, but have yet to figure out a clear path forward. I am marking this stalled for now.