Crash in [@ mozilla::AudioDecoderInputTrack::EnsureTimeStretcher]
Categories
(Core :: Audio/Video, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox-esr128 | --- | affected |
firefox135 | --- | affected |
firefox136 | --- | affected |
firefox137 | --- | affected |
People
(Reporter: aryx, Assigned: shravanrn, NeedInfo)
Details
(Keywords: crash)
Crash Data
Attachments
(2 files)
Not a new signature. 174 content crashes from 58 installs of Firefox for Android v135.
Code touched for version 128 in bug 1894110 which downgraded the handling.
Crash report: https://crash-stats.mozilla.org/report/index/da1fd81a-a271-4477-9d4e-c0b0a0250217
MOZ_CRASH Reason:
MOZ_RELEASE_ASSERT(mTimeStretcher->Init())
Top 10 frames:
0 libxul.so mozilla::AudioDecoderInputTrack::EnsureTimeStretcher() dom/media/mediasink/AudioDecoderInputTrack.cpp:625
1 libxul.so mozilla::AudioDecoderInputTrack::AppendTimeStretchedDataToSegment(long, mozil... dom/media/mediasink/AudioDecoderInputTrack.cpp:424
2 libxul.so mozilla::AudioDecoderInputTrack::AppendBufferedDataToOutput(long) dom/media/mediasink/AudioDecoderInputTrack.cpp:390
3 libxul.so mozilla::AudioDecoderInputTrack::ProcessInput(long, long, unsigned int) dom/media/mediasink/AudioDecoderInputTrack.cpp:343
4 libxul.so mozilla::MediaTrackGraphImpl::ProduceDataForTracksBlockByBlock(unsigned int, ... dom/media/MediaTrackGraph.cpp:1294
5 libxul.so mozilla::MediaTrackGraphImpl::Process(mozilla::MixerCallbackReceiver*) dom/media/MediaTrackGraph.cpp:1517
6 libxul.so mozilla::MediaTrackGraphImpl::OneIterationImpl(long, long, mozilla::MixerCall... dom/media/MediaTrackGraph.cpp:1678
7 libxul.so mozilla::GraphRunner::Run() dom/media/GraphRunner.cpp:144
8 libxul.so nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1153
8 libxul.so NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:480
Assignee | ||
Comment 1•1 month ago
•
|
||
This looks like a memory fragmentation issue. The error seems to come up in two spots
-
Windows 32-bit machines --- not a lot we can do here. But we do need or more graceful failure channel than crashing because we can't change audio speed :(
-
Android 64-bit machines --- these are machines with 39-bit virtual address spaces. So it is still a little limited compared to the 48-bit virtual address spaces in desktops, but intuition says there should be enough to run firefox without running into fragmentation issues.
In analyzing this bug with glandium, we realize we probably want multiple patches that would make this sort of bug easier to diagnose and fix in general. In doing this, we may eliminate this bug as well. Each of these action items probably need to be tracked in their own bug.
-
The assert comes from this line of the
AudioDecoderInputTrack::EnsureTimeStretcher
function and tries to create an rlbox sandbox . Ideally if the sandbox creation fails, we should be able to return an nsresult and fail gracefully, but unfortunately,AudioDecoderInputTrack::EnsureTimeStretcher
and at least a couple of layers of functions above it don't have any way to bail out with an nsresult. I think we need some suggestions from the media team on whether it is feasible to change this API to allow for graceful failure, or if this is code that is not written to handle graceful failure and needs to crash -
RLBox memory maps in /proc/pid are currently untagged. It would be helpful if these are tagged with both something like "rlbox wasm2c - expat" indicating both that this is an rlbox wasm memory as well as the type of sandbox that is being used for
-
The process's memory maps indicate that there are a lot of 8GB RLBox virtual memory regions (as expected) interspaced by 4GB fragmented memory that isn't usable by RLBox. It is possible that there is a bug in rlbox's aligned allocation code that is resulting in this pattern --- so this needs to be checked. If this is not a bug in rlbox's aligned allocation, then for some reason this is how the OS is creating allocations. This can be circumvented if rlbox is modified to use mmap hints and then carefully choosing hints to avoid fragmentation, but this would require more complication to the aligned allocation code which I would like to avoid if possible.
-
There seem to be a lot of "active" RLBox memories in this region. I'm seeing at least 39 sandboxes. It's not super clear which sandboxes are being created (point 2 will help with this) and why there are so many active sandboxes. The best guess is probably expat since XML parsing is the most commonly run rlboxed workload. One possibility is that the rlbox-sandbox-pool used for expat and for woff2. This pool is supposed to dispose of unused sandboxes, but perhaps this crash happens due to a bug somewhere in there? Sandbox pool link and expat use of pool link
Comment 2•21 days ago
|
||
(In reply to Shravan Narayan from comment #1)
- The assert comes from this line of the
AudioDecoderInputTrack::EnsureTimeStretcher
function and tries to create an rlbox sandbox . Ideally if the sandbox creation fails, we should be able to return an nsresult and fail gracefully, but unfortunately,AudioDecoderInputTrack::EnsureTimeStretcher
and at least a couple of layers of functions above it don't have any way to bail out with an nsresult. I think we need some suggestions from the media team on whether it is feasible to change this API to allow for graceful failure, or if this is code that is not written to handle graceful failure and needs to crash
Sure, but it means something won't work, and it will lead to unexpected behaviour. We can do this if it's better than just crashing. Please file and NI me if you need us to do this.
- There seem to be a lot of "active" RLBox memories in this region. I'm seeing at least 39 sandboxes. It's not super clear which sandboxes are being created (point 2 will help with this) and why there are so many active sandboxes. The best guess is probably expat since XML parsing is the most commonly run rlboxed workload. One possibility is that the rlbox-sandbox-pool used for expat and for woff2. This pool is supposed to dispose of unused sandboxes, but perhaps this crash happens due to a bug somewhere in there? Sandbox pool link and expat use of pool link
If you create 39 media elements and time stretch them, and connect them to an AudioContext
we'll run this: https://searchfox.org/mozilla-central/source/dom/media/mediasink/AudioDecoderInputTrack.cpp#623 39 times, and I don't know if this leads to 39 sandboxes.
But it's trivial to do in a web page.
Updated•21 days ago
|
Assignee | ||
Comment 3•16 days ago
•
|
||
Ah gotcha. Thanks for the info. Yup, invoking the ensuretimestretcher function 39 times will create 39 sandboxes if each invocation is on a different object
I'll try to follow-up and try to find a repro of this.
Sure, but it means something won't work, and it will lead to unexpected behaviour. We can do this if it's better than just crashing. Please file and NI me if you need us to do this.
I think it would makes sense to show a broken media icon but not crash the tab. If this sounds like reasonable behavior to you, this would be good to set up via nsresult. I can then modify the sandbox creation failure to return an error codes and fail gracefully.
Comment 4•15 days ago
|
||
I'm told by :pehrsons
than the right way to do this is to inject the TimeStretcher
(that shouldn't have to reallocate) around here, that is main thread: https://searchfox.org/mozilla-central/rev/9c395b6371eaea0d15f9c8a37889022be350cf0b/dom/media/mediasink/AudioDecoderInputTrack.cpp#33, and then passed across runnable until its current use.
I lack the time to do it though, sorry.
Assignee | ||
Comment 5•3 days ago
|
||
Ok, I am making some progress on this issue now.
- RLBox memory maps in /proc/pid are currently untagged. It would be helpful if these are tagged with both something like "rlbox wasm2c - expat" indicating both that this is an rlbox wasm memory as well as the type of sandbox that is being used for
I have added this support in rlbox_wasm2c_sandbox and will submit a phab shortly.
- The process's memory maps indicate that there are a lot of 8GB RLBox virtual memory regions (as expected) interspaced by 4GB fragmented memory that isn't usable by RLBox. It is possible that there is a bug in rlbox's aligned allocation code that is resulting in this pattern
This has also been fixed. This is not a "bug" in rlbox_wasm2c_sandbox's aligned allocation, but rather an artifact of how it does the aligning. Wasm has 8GB allocations, and rlbox_wasm2c requires this to be aligned to 4GB address. So it creates a 12GB allocation and finds the first location that is 4GB is aligned in this space (and has 8GB size). The remaining space is discarded/unmapped. However, when done sequentially this will result in unmapping of 4GB chunks which become unusable for future sandboxes. Instead, RLBox should create an 8GB allocation aligned to 8GB to avoid internal fragmentation.
- Ideally if the sandbox creation fails, we should be able to return an nsresult and fail gracefully, but unfortunately, AudioDecoderInputTrack::EnsureTimeStretcher and at least a couple of layers of functions above it don't have any way to bail out with an nsresult.
@padenot: I understand that the more complicated fix with injecting TimeStretcher won't be possible due to time constraints. However, would a fix of being able to return an "nsresult" from SoundTouch (which would then either not stretch the media or show a broken media symbol) be possible? (or would this also take too much time given your current commitments?)
Assignee | ||
Comment 6•3 days ago
|
||
Updated•3 days ago
|
Assignee | ||
Comment 7•3 days ago
|
||
Description
•