This looks like a memory fragmentation issue. The error seems to come up in two places:

- Windows 32-bit machines: there is not a lot we can do here, but we do need a more graceful failure channel than crashing just because we can't change the audio speed :(
- Android 64-bit machines: these machines have 39-bit virtual address spaces. That is still somewhat limited compared to the 48-bit virtual address spaces on desktops, but intuition says it should be enough to run Firefox without running into fragmentation issues.

In analyzing this bug with glandium, we realized we probably want multiple patches that would make this sort of bug easier to diagnose and fix in general. In doing this, we may eliminate this bug as well. Each of these action items probably needs to be tracked in its own bug.

1. The assert comes from [this line](https://searchfox.org/mozilla-central/source/dom/media/mediasink/AudioDecoderInputTrack.cpp#625), which is part of the `AudioDecoderInputTrack::EnsureTimeStretcher` function and tries to create an rlbox sandbox. Ideally, if the sandbox creation fails, we should be able to return an nsresult and fail gracefully. Unfortunately, `AudioDecoderInputTrack::EnsureTimeStretcher` and at least a couple of layers of functions above it don't have any way to bail out with an nsresult. I think we need some suggestions from the media team on whether it is feasible to change this API to allow for graceful failure, or whether this code is not written to handle graceful failure and needs to crash.
2. RLBox memory maps in /proc/pid are currently untagged. It would be helpful if these were tagged with something like "rlbox wasm2c - expat", indicating both that this is an rlbox wasm memory and the type of sandbox it is being used for.
3. The process's memory maps indicate that there are a lot of 8GB RLBox virtual memory regions (as expected) interspersed with 4GB fragments of memory that aren't usable by RLBox. It is possible that a bug in rlbox's aligned allocation code is producing this pattern, so this needs to be checked. If this is not a bug in rlbox's aligned allocation, then for some reason this is how the OS is placing allocations. This could be circumvented by modifying rlbox to use mmap hints and carefully choosing the hints to avoid fragmentation, but that would add complexity to the aligned allocation code, which I would like to avoid if possible.
4. There seem to be a lot of "active" RLBox memories in this region; I'm seeing at least 39 sandboxes. It's not clear which sandboxes are being created (point 2 will help with this) or why there are so many active sandboxes. The best guess is probably expat, since XML parsing is the most commonly run rlboxed workload. One possible culprit is the rlbox-sandbox-pool used for expat and for woff2. This pool is supposed to dispose of unused sandboxes, but perhaps this crash happens due to a bug somewhere in there? See the [sandbox pool](https://searchfox.org/mozilla-central/source/xpcom/base/RLBoxSandboxPool.h) and the [expat use of the pool](https://searchfox.org/mozilla-central/source/parser/htmlparser/nsExpatDriver.h#218).
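To make the graceful-failure idea in item 1 concrete, here is a minimal sketch of what "report the error instead of asserting" could look like. Everything in it is hypothetical: `Status` stands in for nsresult, and `Sandbox`, `TryCreateSandbox`, and `Track` are illustrative names, not the actual media code. Whether the real call chain can be changed this way is the open question for the media team.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for nsresult, for illustration only.
enum class Status { Ok, SandboxUnavailable };

// Hypothetical sandbox handle.
struct Sandbox {};

// Hypothetical factory: returns nullptr instead of asserting when the
// address-space reservation cannot be made. The flag simulates the outcome.
std::unique_ptr<Sandbox> TryCreateSandbox(bool reservationSucceeds) {
  if (!reservationSucceeds) {
    return nullptr;
  }
  return std::make_unique<Sandbox>();
}

class Track {
 public:
  // Reports failure to the caller instead of crashing.
  Status EnsureTimeStretcher(bool reservationSucceeds) {
    if (mSandbox) {
      return Status::Ok;  // Already created earlier.
    }
    mSandbox = TryCreateSandbox(reservationSucceeds);
    return mSandbox ? Status::Ok : Status::SandboxUnavailable;
  }

  // Caller-side policy (an assumption): if time stretching is unavailable,
  // keep the current rate rather than aborting the process.
  double SetPlaybackRate(double requested, bool reservationSucceeds) {
    if (EnsureTimeStretcher(reservationSucceeds) != Status::Ok) {
      return mRate;
    }
    mRate = requested;
    return mRate;
  }

 private:
  std::unique_ptr<Sandbox> mSandbox;
  double mRate = 1.0;
};
```

The point of the sketch is only that the failure has to surface somewhere a policy decision can be made; here that policy is "keep playing at the old speed".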
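For item 2, one way the tagging might be done on Linux and Android is the `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ...)` call, which labels an anonymous mapping so that it appears as `[anon:<name>]` in /proc/pid/maps. This is a sketch under assumptions: the helper name `TagAnonymousMapping` is made up, mainline Linux only supports this from 5.17 with `CONFIG_ANON_VMA_NAME` enabled, and I have not checked where in rlbox such a call would belong.

```cpp
#include <sys/mman.h>
#include <sys/prctl.h>
#include <cassert>
#include <cstddef>

// Fallback definitions for older headers; values match the Linux UAPI.
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

// Hypothetical helper: try to label an anonymous mapping. Returns false if
// the kernel rejects the request (for example, when naming is unsupported).
// Failure is harmless; the mapping simply stays untagged.
bool TagAnonymousMapping(void* addr, std::size_t length, const char* name) {
  assert(name != nullptr);
  return prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
               reinterpret_cast<unsigned long>(addr),
               static_cast<unsigned long>(length),
               reinterpret_cast<unsigned long>(name)) == 0;
}
```

A caller would pass the base and size of the sandbox reservation plus a string literal such as `"rlbox wasm2c - expat"`. Using a literal keeps the pointer valid for the life of the process, which sidesteps any question about whether a given kernel copies the string.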
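For reference when checking item 3, this is the generic over-allocate-and-trim technique for getting an aligned reservation out of `mmap`. It is a sketch, not rlbox's code: `ReserveAligned` is a hypothetical name, and it assumes `size` and `alignment` are multiples of the page size and that `alignment` is a power of two.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reserve `size` bytes of address space whose base is a
// multiple of `alignment`. Returns nullptr on failure.
void* ReserveAligned(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

  const std::size_t padded = size + alignment;
  if (padded < size) {
    return nullptr;  // Arithmetic overflow.
  }

  // Reserve more than needed so an aligned base is guaranteed to fit.
  void* raw = mmap(nullptr, padded, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (raw == MAP_FAILED) {
    return nullptr;
  }

  const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw);
  const std::uintptr_t aligned =
      (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
  const std::size_t head = aligned - start;
  const std::size_t tail = padded - head - size;

  // Give back the unaligned slack before and after the aligned region.
  if (head != 0) {
    munmap(raw, head);
  }
  if (tail != 0) {
    munmap(reinterpret_cast<void*>(aligned + size), tail);
  }
  return reinterpret_cast<void*>(aligned);
}
```

With this scheme the head and tail that get released always add up to exactly `alignment` bytes. If the 8GB reservations were requested with a 4GB alignment (an assumption I have not verified), each allocation would leave 4GB of slack next to it, which is at least suggestive of the pattern in the maps. Those holes are too small for another 8GB reservation but can be picked up by smaller allocations.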
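As a rough sanity check on items 3 and 4, the numbers above can be put into a back-of-the-envelope budget. The assumptions are mine: the full 39-bit range is treated as available to user space, there are exactly 39 sandboxes (the comment says "at least"), and each one is accompanied by one unusable 4GB gap.

```cpp
#include <cstdint>

constexpr std::uint64_t kGiB = 1ull << 30;

// Assumption: the whole 39-bit range is available to user space.
constexpr std::uint64_t kAddressSpace = 1ull << 39;
constexpr std::uint64_t kSandboxReservation = 8 * kGiB;  // From the maps.
constexpr std::uint64_t kUnusableGap = 4 * kGiB;         // Assumed one per sandbox.
constexpr std::uint64_t kSandboxCount = 39;              // "At least 39".

constexpr std::uint64_t kReserved = kSandboxCount * kSandboxReservation;
constexpr std::uint64_t kConsumed =
    kSandboxCount * (kSandboxReservation + kUnusableGap);
constexpr std::uint64_t kLeftOver = kAddressSpace - kConsumed;

static_assert(kAddressSpace == 512 * kGiB, "39 bits is 512 GiB");
static_assert(kReserved == 312 * kGiB, "sandboxes alone");
static_assert(kConsumed == 468 * kGiB, "sandboxes plus gaps");
static_assert(kLeftOver == 44 * kGiB, "remaining for everything else");
```

Under those assumptions only about 44GB would remain for everything else in the process, and a new sandbox needs 8GB of that to be contiguous. That would be consistent with running out of room well before the address space looks "full" on paper.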
Bug 1948620 Comment 1 Edit History