AudioContext::DecodeAudioData fails with too many concurrent decode jobs
Categories
(Core :: Web Audio, defect, P3)
People
(Reporter: rpsalmi, Assigned: jya)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(6 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:75.0) Gecko/20100101 Firefox/75.0
Steps to reproduce:
Try to decode multiple (around 60) audio buffers concurrently. For example, as with the following JS:
const ctx = new AudioContext();
const N = 80;
fetch("audio.wav").then((r) => r.arrayBuffer()).then((ab) => {
  for (let i = 0; i < N; i++) {
    ctx.decodeAudioData(ab.slice(0))
      .catch((e) => {
        document.body.innerHTML += e + "<br>";
      });
  }
});
I tried this with a ~1MB WAV file:
https://ollpu.fi/decodeAudioData_demo.html
https://ollpu.fi/audio.wav
A similar scenario worked fine on a Mac, so this is most likely Linux specific.
Actual results:
Some or all of the audio decode jobs fail. Error messages "The buffer passed to decodeAudioData contains invalid content which cannot be decoded successfully." are emitted to the console. The catch clause is executed, and error messages "EncodingError: The given encoding is not supported." are shown on the page.
In addition, a lot of IPC-related errors are logged in the terminal, which makes me believe some message queues are overflowing. Using MOZ_LOG="MediaDecoder:5" didn't seem to add anything insightful. I've attached the log file.
Expected results:
No error messages and successful decoding of all files. If all jobs can't be processed immediately, the jobs should wait instead of failing.
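As a page-side mitigation (not a fix for the underlying bug), a page could throttle its own decode concurrency. A minimal sketch; `runLimited` is a hypothetical helper written for this illustration, not a web API:

```javascript
// Hypothetical helper: run `tasks` (functions returning promises) with at
// most `limit` of them in flight at once, preserving result order.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // synchronous, so each index is claimed exactly once
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, worker)
  );
  return results;
}

// Usage against the demo above (assumes `ctx` and `ab` are already set up;
// each call gets its own copy, since decodeAudioData detaches the buffer):
// const tasks = Array.from({ length: 80 },
//   () => () => ctx.decodeAudioData(ab.slice(0)));
// const buffers = await runLimited(tasks, 8);
```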
Reporter | Comment 1 • 5 years ago
Reporter | Comment 2 • 5 years ago
Can also confirm that the issue exists on version 77.0.1.
Comment 3 • 5 years ago
Thanks for filing this, I can reproduce it locally with decodeAudioData_demo.html.
It looks like a resource limit issue, specifically we're running out of file descriptors while allocating shmem segments for RDD via this stack:
#19 0x00007ff97e68dd8b in base::SharedMemory::CreateInternal(unsigned long, bool) (this=0x7ff94730a260, size=4096, freezeable=false)
at /home/mjg/src/mozilla/ipc/chromium/src/base/shared_memory_posix.cc:246
#20 0x00007ff97dc88ebf in base::SharedMemory::Create(unsigned long)
(this=0x7ff94730a260, size=4096)
at /home/mjg/src/mozilla/ipc/chromium/src/base/shared_memory.h:70
#21 0x00007ff97dc88c4e in mozilla::ipc::SharedMemoryBasic::Create(unsigned long) (this=0x7ff94730a240, aNbytes=4096)
at /home/mjg/src/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ipc/SharedMemoryBasic_chromium.h:37
#22 0x00007ff97e7889dd in mozilla::ipc::CreateSegment(mozilla::ipc::SharedMemory::SharedMemoryType, unsigned long, unsigned long)
(aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aNBytes=768, aExtraSize=4)
at /home/mjg/src/mozilla/ipc/glue/Shmem.cpp:77
#23 0x00007ff97e7888a9 in mozilla::ipc::Shmem::Alloc(mozilla::ipc::Shmem::PrivateIPDLCaller, unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, bool) (aNBytes=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC)
at /home/mjg/src/mozilla/ipc/glue/Shmem.cpp:383
#24 0x00007ff97e76c2fe in mozilla::ipc::IToplevelProtocol::CreateSharedMemory(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, int*)
(this=0x7ff9623d9800, aSize=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aUnsafe=true, aId=0x7ff95e1c5ee4)
at /home/mjg/src/mozilla/ipc/glue/ProtocolUtils.cpp:685
#25 0x00007ff97e76c27e in mozilla::ipc::IProtocol::CreateSharedMemory(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, int*)
(this=0x7ff966683340, aSize=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aUnsafe=true, aId=0x7ff95e1c5ee4)
at /home/mjg/src/mozilla/ipc/glue/ProtocolUtils.cpp:301
#26 0x00007ff97e76d2bd in mozilla::ipc::IProtocol::AllocUnsafeShmem(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, mozilla::ipc::Shmem*)
(this=0x7ff966683340, aSize=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aOutMem=0x7ff947306010)
at /home/mjg/src/mozilla/ipc/glue/ProtocolUtils.cpp:413
#27 0x00007ff981f6fe8c in mozilla::ShmemPool::AllocateShmem<mozilla::RemoteDecoderChild>(mozilla::RemoteDecoderChild*, unsigned long, mozilla::ShmemBuffer&, mozilla::ShmemPool::AllocationPolicy) (this=0x7ff966683400,
aInstance=0x7ff966683340, aSize=768, aRes=..., aPolicy=mozilla::ShmemPool::AllocationPolicy::Unsafe)
at /home/mjg/src/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ShmemPool.h:167
#28 0x00007ff981f6dffc in mozilla::ShmemPool::Get<mozilla::RemoteDecoderChild>(mozilla::RemoteDecoderChild*, unsigned long, mozilla::ShmemPool::AllocationPolicy)
(this=0x7ff966683400, aInstance=0x7ff966683340, aSize=768, aPolicy=mozilla::ShmemPool::AllocationPolicy::Unsafe)
at /home/mjg/src/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ShmemPool.h:119
#29 0x00007ff981f66174 in mozilla::RemoteDecoderChild::Decode(nsTArray<RefPtr<mozilla::MediaRawData> > const&)
(this=0x7ff966683340, aSamples=const nsTArray<RefPtr<mozilla::MediaRawData> > & = {...}) at /home/mjg/src/mozilla/dom/media/ipc/RemoteDecoderChild.cpp:109
Disabling RDD by setting media.rdd-process.enabled to false in about:config allows decodeAudioData_demo.html to complete successfully (for the default N = 80; I imagine we may reach various limits for higher N), so it's unclear if RDD is the root cause or if we need to address this at some other level.
Comment 4 • 5 years ago
Perhaps https://bugzilla.mozilla.org/show_bug.cgi?id=1453735#c1 may be relevant here.
Comment 5 • 5 years ago
This isn't strictly RDD related. However, webaudio decodes are processed in batches to limit the number of IPC calls to RDD. The pref that controls the webaudio batch size is media.rdd-webaudio.batch.size. I tried this on my Linux machine and can repro the problem with the default pref setting (100), but it appears to work with the pref set to 60 (I just picked a lower number). Lowering this setting directly affects decode performance for webaudio, so it may take some trials to figure out a better setting.
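For local experimentation, the pref can also be set from a Firefox user.js file instead of about:config. A sketch; the value 60 is only the number tried in this comment, not a tuned recommendation:

```javascript
// user.js sketch: lower the web audio decode batch size (default is 100).
// 60 is just the value experimented with above.
user_pref("media.rdd-webaudio.batch.size", 60);
```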
Setting this pref lower can help solve the immediate issue, but it seems like there is an underlying shmem issue at work here.
For reference, the batch decoding was added in Bug 1568058 to deal with performance problems.
Comment 6 • 5 years ago
Sorry, I replied instead of editing the original comment.
Comment 7 • 5 years ago
The current file descriptor limit is 4096 per process (and on some distributions that's the hard limit), but the rest of bug 1453735 comment #1 is still accurate.
So, 100 shared memory segments in flight at once shouldn't be a problem, but if each of those decodes is using many shared memory segments at once, then it would make sense that we'd have fd exhaustion problems. In that case we'd need to see if the shared memory segments could be consolidated somehow (or cached, as GMP does), or if this could be a use case for something like Chromium's data pipes.
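The caching idea mentioned here (as GMP does) is essentially a pool: hand out recycled buffers instead of allocating a fresh one per request, so total allocations (and, for shmem, file descriptors) stay bounded. A rough JS sketch of the shape of that idea; `BufferPool` is hypothetical, not Gecko's actual ShmemPool:

```javascript
// Hypothetical buffer pool: recycle a fixed set of ArrayBuffers rather than
// allocating one per request. `capacity` bounds the steady-state allocation
// count; a fresh buffer is allocated only if the pool is exhausted.
class BufferPool {
  constructor(capacity, size) {
    this.free = Array.from({ length: capacity }, () => new ArrayBuffer(size));
    this.size = size;
  }
  acquire() {
    // Reuse a pooled buffer when available, otherwise fall back to malloc.
    return this.free.pop() ?? new ArrayBuffer(this.size);
  }
  release(buf) {
    // Return the buffer for reuse by the next acquire().
    this.free.push(buf);
  }
}
```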
Comment 8 • 5 years ago
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #7)
The current file descriptor limit is 4096 per process (and on some distributions that's the hard limit), but the rest of bug 1453735 comment #1 is still accurate.
So, 100 shared memory segments in flight at once shouldn't be a problem, but if each of those decodes is using many shared memory segments at once, then it would make sense that we'd have fd exhaustion problems. In that case we'd need to see if the shared memory segments could be consolidated somehow (or cached, as GMP does), or if this could be a use case for something like Chromium's data pipes.
Two things of note: the '100' is not requests in flight at once; it is how many samples are batched together per request. So size of segments, not number. There is a ShmemPool used by RDD as well, to keep from thrashing shmem alloc/dealloc. That could possibly be at play here, but because changing the batch size makes a difference, it doesn't seem as if the pool size is the issue.
Reporter | Comment 9 • 5 years ago
I did some quick testing with my archaic method of watch -n 0.1 "ls /proc/pid/fd | wc -l", and observed the 4096 limit being hit. With N = 20 I already got over a thousand fds, but a more precise method would be needed to see exactly how many. (I tried stracing, but there was no hint of file descriptors in the thousands; I'm probably missing something about how they're opened.)
This was in the relevant worker process, not the main process or the RDD process; those don't seem to open very many fds at all.
Comment 10 • 4 years ago
I took a moment to check on this - we're creating 80 RemoteDecoderChild objects that each will attempt to decode samples in batches of 100 (in order to optimize the number of IPC calls). Each sample is contained in a separate ShmemBuffer[1]. That means we're attempting to create 8000 ShmemBuffers in one process. We'll need to decide whether having 80 decoders working concurrently in a process is something we should support. If so, we'll need to rework how the samples are transmitted across IPC to RDD.
In streaming audio this isn't an issue, but for webaudio decodes the decoding happens all at once. Bug 1590475 added support for batch decoding (to fix bug 1568058). Reducing the batch size here could help, but it will 1) slow down the batch decodes, and 2) only work until someone tries creating more decoders in the same process.
Comment 11 • 4 years ago
I don't understand why we're creating new shmems instead of either multiplexing or using a cross process ring buffer of some sort. Is it because nobody's done it?
Comment 12 • 4 years ago
I believe we have 80 decoders running simultaneously, and each RemoteDecoderChild/RemoteDecoderParent pair needs its own set of buffers for passing raw and decoded samples between the content process and RDD. My guess is that when :jya added the ability for batch processing, running 80 decoders at the same time wasn't envisioned. Comment 7 seems to say we'll have a 4096 limit on simultaneous decodes even if we can optimize this to a single shmem per decoder child/parent pair.
Comment 13 • 4 years ago
(In reply to Michael Froman [:mjf] from comment #10)
I took a moment to check on this - we're creating 80 RemoteDecoderChild objects that each will attempt to decode samples in batches of 100 (in order to optimize the number of IPC calls). Each sample is contained in a separate ShmemBuffer[1]. That means we're attempting to create 8000 ShmemBuffers in one process.
Creating a Shmem sends an IPC message (with ID SHMEM_CREATED_MESSAGE_TYPE), which goes through essentially the same path as normal messages and presumably has similar overhead, so the batching may not be saving much.
This helps clarify something I was confused about: “samples” in this context seems to mean something like separately decodeable packets, not individual PCM samples.
--
(In reply to Paul Adenot (:padenot) from comment #11)
I don't understand why we're creating new shmems instead of either multiplexing or using a cross process ring buffer of some sort. Is it because nobody's done it?
It's been done but not generically. The WebGL remoting project has one (talk to :handyman), and I think <canvas>
remoting on Windows also has one (talk to :bobowen). There were some issues around signaling; we don't have a CrossProcessSemaphore
implementation that works on macOS.
Related work also includes Chromium's new IPC system and its data pipes; interestingly, from looking at the code, they appear to signal changes to how much of the buffer is in use simply by sending IPC messages, although their messaging is more efficient than ours.
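The single-producer/single-consumer ring buffer idea discussed above can be sketched in JS over a SharedArrayBuffer, just to show the shape; the real cross-process versions live in C++ and need platform signaling (e.g. CrossProcessSemaphore) that this sketch omits:

```javascript
// Sketch of a SPSC byte ring buffer over shared memory. The read/write
// indices live in a shared Int32Array so another thread mapping the same
// memory observes them via Atomics; one slot is kept empty to tell
// "full" apart from "empty".
class RingBuffer {
  constructor(sab, capacity) {
    this.idx = new Int32Array(sab, 0, 2);        // [0] = read, [1] = write
    this.data = new Uint8Array(sab, 8, capacity); // payload region
    this.capacity = capacity;
  }
  push(bytes) {
    const r = Atomics.load(this.idx, 0), w = Atomics.load(this.idx, 1);
    const used = (w - r + this.capacity) % this.capacity;
    if (bytes.length > this.capacity - 1 - used) return false; // would overflow
    for (let i = 0; i < bytes.length; i++) {
      this.data[(w + i) % this.capacity] = bytes[i];
    }
    Atomics.store(this.idx, 1, (w + bytes.length) % this.capacity);
    return true;
  }
  pop(n) {
    const r = Atomics.load(this.idx, 0), w = Atomics.load(this.idx, 1);
    const used = (w - r + this.capacity) % this.capacity;
    const count = Math.min(n, used);
    const out = new Uint8Array(count);
    for (let i = 0; i < count; i++) out[i] = this.data[(r + i) % this.capacity];
    Atomics.store(this.idx, 0, (r + count) % this.capacity);
    return out;
  }
}
```

The appeal over per-message shmem is that the segment is mapped once, so steady-state traffic costs no new file descriptors.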
Comment 14 • 4 years ago
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #13)
(In reply to Michael Froman [:mjf] from comment #10)
I took a moment to check on this - we're creating 80 RemoteDecoderChild objects that each will attempt to decode samples in batches of 100 (in order to optimize the number of IPC calls). Each sample is contained in a separate ShmemBuffer[1]. That means we're attempting to create 8000 ShmemBuffers in one process.
Creating a Shmem sends an IPC message (with ID SHMEM_CREATED_MESSAGE_TYPE), which goes through essentially the same path as normal messages and presumably has similar overhead, so the batching may not be saving much.
Keep in mind, we're reusing the shmems in a pool within each decoder, so we only create them once per decoder. At least at the time, the data I collected in https://bugzilla.mozilla.org/show_bug.cgi?id=1568058#c20 showed that it was a significant performance improvement.
Assignee | Comment 15 • 4 years ago
(In reply to Paul Adenot (:padenot) from comment #11)
I don't understand why we're creating new shmems instead of either multiplexing or using a cross process ring buffer of some sort. Is it because nobody's done it?
We are, kind of.
We use a single shmem pool per RemoteDecoderChild/RemoteDecoderParent pair, and shmems are re-used over and over. However, right now we create a RemoteAudioDataIPDL object for each audio block.
What we could do instead is concatenate all those audio frames together so that we use a single shmem per batch, or change the data type so that we continue to have multiple packets of audio (rather than one very large packet) that all fit within a single shmem.
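The second option described here, many packets sharing one buffer via an offset table, can be sketched roughly as follows. This is a hypothetical JS illustration of the layout idea (`packPackets`/`unpackPacket` are invented names), not the eventual IPDL types:

```javascript
// Hypothetical packing: concatenate several packets into one backing
// buffer plus an offset/length index, so a whole batch costs one
// allocation (one shmem, in the real code) instead of one per packet.
function packPackets(packets) {
  const total = packets.reduce((n, p) => n + p.length, 0);
  const buffer = new Uint8Array(total);
  const index = [];
  let offset = 0;
  for (const p of packets) {
    buffer.set(p, offset);                 // copy packet into shared region
    index.push({ offset, length: p.length });
    offset += p.length;
  }
  return { buffer, index };
}

// Recover packet i as a view into the packed buffer, without copying.
function unpackPacket(packed, i) {
  const { offset, length } = packed.index[i];
  return packed.buffer.subarray(offset, offset + length);
}
```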
Assignee | Comment 16 • 4 years ago
In bug 1663227, I have to modify how we exchange data between the parent and child, as we need to handle the lifecycle of the images differently from what we currently have.
This requires a custom IPDL object with a proper destructor that notifies the parent that the wrapped object is no longer needed.
This could give an easy basis for allocating a single shmem for all the AudioData objects at no extra cost.
Assignee | Comment 17 • 4 years ago
Going to take this one, as it's blocking me in bug 1595994.
Assignee | Comment 18 • 4 years ago
Assignee | Comment 19 • 4 years ago
These two objects can be used to pack multiple arrays of objects into a minimal number of shmems.
An ArrayOfRemoteByteBuffer will take at most a single shmem and perform a single memory allocation.
Similarly, an ArrayOfMediaRawData will pack multiple MediaRawData in at most 3 shmems (one for each array of bytes a MediaRawData contains).
They are designed to work in combination with a ShmemPool, which owns each of the allocated Shmems so they can be re-used over and over.
Depends on D91536
Assignee | Comment 20 • 4 years ago
This reduces the number of shmem allocations to 1 instead of 100 when used with webaudio.
We also fix the AudioTrimmer for remote decoders, as the trimming information was lost over the IPC serialization.
Another issue corrected is that we no longer crash if the first MediaRawData decoded has an empty size.
Depends on D91537
Assignee | Comment 21 • 4 years ago
An ArrayOfRemoteAudioData packs all its AudioData objects into a single Shmem.
This Shmem will be re-used by the remote decoder over and over.
When used with webaudio, this reduces the number of memory allocations from 100 to 1 for each remote decoder.
Depends on D91538
Assignee | Comment 22 • 4 years ago
It causes Chrome to crash too, FWIW.
Comment 23 • 4 years ago
In addition to making the test case work, those patches, applied to a Linux build with the default mozconfig (non-debug + opt), are about 25% faster [0] on https://chinmay.audio/decodethis/ than an official Nightly build, which is presumably PGOed.
[0]: Click "test all", then copy-paste this in the console to sum the last columns:
sum = 0;
document.querySelectorAll("td").forEach((a, i) => {
  if ((i % 10) == 9) {
    sum += parseInt(a.textContent);
  }
});
sum
Comment 24 • 4 years ago
Comment 25 • 4 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/fa783cf37776
https://hg.mozilla.org/mozilla-central/rev/3eba7a72e498
https://hg.mozilla.org/mozilla-central/rev/62d400bfeb95
https://hg.mozilla.org/mozilla-central/rev/d21058e4bdeb