Closed Bug 1648309 Opened 4 years ago Closed 4 years ago

AudioContext::DecodeAudioData fails with too many concurrent decode jobs

Categories

(Core :: Web Audio, defect, P3)

77 Branch
x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
83 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox82 --- wontfix
firefox83 --- fixed

People

(Reporter: rpsalmi, Assigned: jya)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(6 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:75.0) Gecko/20100101 Firefox/75.0

Steps to reproduce:

Try to decode multiple (around 60) audio buffers concurrently. For example, as with the following JS:

const ctx = new AudioContext();
const N = 80;
fetch("audio.wav").then((r) => { return r.arrayBuffer() }).then((ab) => {
  for (let i = 0; i < N; i++) {
    ctx.decodeAudioData(ab.slice(0))
    .catch((e) => {
      document.body.innerHTML += e + "<br>";
    });
  }
});

I tried this with a ~1MB WAV file:
https://ollpu.fi/decodeAudioData_demo.html
https://ollpu.fi/audio.wav

A similar scenario worked fine on a Mac, so this is most likely Linux specific.

Actual results:

Some or all of the audio decode jobs fail. Error messages "The buffer passed to decodeAudioData contains invalid content which cannot be decoded successfully." are emitted to the console. The catch clause is executed and error messages "
EncodingError: The given encoding is not supported." are shown on the page.

In addition, a lot of IPC-related errors are logged in the terminal, which makes me believe some message queues are overflowing. Using MOZ_LOG="MediaDecoder:5" didn't seem to add anything insightful. I've attached the log file.

Expected results:

No error messages and successful decoding of all files. If all jobs can't be processed immediately, the jobs should wait instead of failing.

OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Component: Untriaged → Web Audio
Product: Firefox → Core

Can also confirm that the issue exist on version 77.0.1.

Version: 75 Branch → 77 Branch

Thanks for filing this, I can reproduce it locally with decodeAudioData_demo.html.

It looks like a resource limit issue, specifically we're running out of file descriptors while allocating shmem segments for RDD via this stack:

#19 0x00007ff97e68dd8b in base::SharedMemory::CreateInternal(unsigned long, bool) (this=0x7ff94730a260, size=4096, freezeable=false)
    at /home/mjg/src/mozilla/ipc/chromium/src/base/shared_memory_posix.cc:246
#20 0x00007ff97dc88ebf in base::SharedMemory::Create(unsigned long)
    (this=0x7ff94730a260, size=4096)
    at /home/mjg/src/mozilla/ipc/chromium/src/base/shared_memory.h:70
#21 0x00007ff97dc88c4e in mozilla::ipc::SharedMemoryBasic::Create(unsigned long) (this=0x7ff94730a240, aNbytes=4096)
    at /home/mjg/src/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ipc/SharedMemoryBasic_chromium.h:37
#22 0x00007ff97e7889dd in mozilla::ipc::CreateSegment(mozilla::ipc::SharedMemory::SharedMemoryType, unsigned long, unsigned long)
    (aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aNBytes=768, aExtraSize=4)
    at /home/mjg/src/mozilla/ipc/glue/Shmem.cpp:77
#23 0x00007ff97e7888a9 in mozilla::ipc::Shmem::Alloc(mozilla::ipc::Shmem::PrivateIPDLCaller, unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, bool) (aNBytes=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC)
    at /home/mjg/src/mozilla/ipc/glue/Shmem.cpp:383
#24 0x00007ff97e76c2fe in mozilla::ipc::IToplevelProtocol::CreateSharedMemory(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, int*)
    (this=0x7ff9623d9800, aSize=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aUnsafe=true, aId=0x7ff95e1c5ee4)
    at /home/mjg/src/mozilla/ipc/glue/ProtocolUtils.cpp:685
#25 0x00007ff97e76c27e in mozilla::ipc::IProtocol::CreateSharedMemory(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, int*)
    (this=0x7ff966683340, aSize=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aUnsafe=true, aId=0x7ff95e1c5ee4)
    at /home/mjg/src/mozilla/ipc/glue/ProtocolUtils.cpp:301
#26 0x00007ff97e76d2bd in mozilla::ipc::IProtocol::AllocUnsafeShmem(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, mozilla::ipc::Shmem*)
    (this=0x7ff966683340, aSize=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aOutMem=0x7ff947306010)
    at /home/mjg/src/mozilla/ipc/glue/ProtocolUtils.cpp:413
#27 0x00007ff981f6fe8c in mozilla::ShmemPool::AllocateShmem<mozilla::RemoteDecoderChild>(mozilla::RemoteDecoderChild*, unsigned long, mozilla::ShmemBuffer&, mozilla::ShmemPool::AllocationPolicy) (this=0x7ff966683400,
    aInstance=0x7ff966683340, aSize=768, aRes=..., aPolicy=mozilla::ShmemPool::AllocationPolicy::Unsafe)
    at /home/mjg/src/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ShmemPool.h:167
#28 0x00007ff981f6dffc in mozilla::ShmemPool::Get<mozilla::RemoteDecoderChild>(mozilla::RemoteDecoderChild*, unsigned long, mozilla::ShmemPool::AllocationPolicy)
    (this=0x7ff966683400, aInstance=0x7ff966683340, aSize=768, aPolicy=mozilla::ShmemPool::AllocationPolicy::Unsafe)
    at /home/mjg/src/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ShmemPool.h:119
#29 0x00007ff981f66174 in mozilla::RemoteDecoderChild::Decode(nsTArray<RefPtr<mozilla::MediaRawData> > const&)
    (this=0x7ff966683340, aSamples=const nsTArray<RefPtr<mozilla::MediaRawData> > & = {...}) at /home/mjg/src/mozilla/dom/media/ipc/RemoteDecoderChild.cpp:109

Disabling RDD by setting media.rdd-process.enabled to false in about:config allows decodeAudioData_demo.html to complete successfully (for the default N = 80, I imagine we may reach various limits for higher N), so unclear if RDD is the root cause or if we need to address this at some other level.

Severity: -- → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(mfroman)
Priority: -- → P3

This isn't strictly RDD related. However, webaudio decodes are processed in batches to limit the number of IPC calls to RDD. The pref that controls the webaudio batch size is media.rdd-webaudio.batch.size. I tried this on my linux machine and can repro the problem with the default pref setting (100), but it appears to work with the pref set to 60 (I just picked a lower number). Lowering this setting directly affects the decode performance for webaudio, so it may require some trials to figure out a better setting.

Setting this pref lower can help solve the immediate issue, but it seems like there is an underlying shmem issue at work here.

For reference, the batch decoding was added in Bug 1568058 to deal with performance problems.

Flags: needinfo?(mfroman)

sorry - replied instead of edited the original comment

The current file descriptor limit is 4096 per process (and on some distributions that's the hard limit), but the rest of bug 1453735 comment #1 is still accurate.

So, 100 shared memory segments in flight at once shouldn't be a problem, but if each of those decodes is using many shared memory segments at once, then it would make sense that we'd have fd exhaustion problems. In that case we'd need to see if the shared memory segments could be consolidated somehow (or cached, as GMP does), or if this could be a use case for something like Chromium's data pipes.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #7)

The current file descriptor limit is 4096 per process (and on some distributions that's the hard limit), but the rest of bug 1453735 comment #1 is still accurate.

So, 100 shared memory segments in flight at once shouldn't be a problem, but if each of those decodes is using many shared memory segments at once, then it would make sense that we'd have fd exhaustion problems. In that case we'd need to see if the shared memory segments could be consolidated somehow (or cached, as GMP does), or if this could be a use case for something like Chromium's data pipes.

Two things of note: The '100' is not requests inflight at once, it is how many samples are batched together per request. So size of segments, not number. There is a ShmemPool used by RDD as well, to keep from thrashing shmem alloc/dealloc. That could possibly be at play here, but because changing the batch size makes an difference, it doesn't seem as if the pool size is the issue.

I did some quick testing with my archaic method of watch -n 0.1 "ls /proc/pid/fd | wc -l", and observed the 4096 limit being hit. With N = 20 I already got over a thousand fds, but a more precise method would be needed to see how much exactly. (Tried stracing but there was no hint of file descriptors in the thousands. I'm probably missing something about how they're opened.)
This was in the relevant worker process, not the main process or the RDD process, they don't seem to open very many fds at all.

I took a moment to check on this - we're creating 80 RemoteDecoderChild objects that each will attempt to decode samples in batches of 100 (in order to optimize the number of IPC calls). Each sample is contained in a separate ShmemBuffer[1]. That means we're attempting to create 8000 ShmemBuffers in one process. We'll need to decide whether having 80 decoders working concurrently in a process is something we should support. If so, we'll need to rework how the samples are transmitted across IPC to RDD.

In streaming audio, this isn't an issue, but for webaudio decodes, the decoding happens all at once. Bug 1590475 added support for batch decoding (to support fixing 1568058). Reducing the batch size here could help, but will 1) slow down the batch decodes; 2) only work until someone tries creating more decoders in the same process.

[1] https://searchfox.org/mozilla-central/rev/8a0745cd346f0cfb89ae71690babbf7bff706113/dom/media/ipc/RemoteDecoderChild.cpp#105-126

Regressed by: 1568058

I don't understand why we're creating new shmems instead of either multiplexing or using a cross process ring buffer of some sort. Is it because nobody's done it?

I believe we have 80 decoders running simultaneously, and each RemoteDecoderChild/RemoteDecoderParent pair needs it's own set of buffers for passing raw and decoded samples between the content process and RDD. My guess is that when :jya added the ability for batch processing, running 80 decoders at the same time wasn't envisioned. Comment 7 seems to say we'll have a 4096 limit on simultaneous decodes even if we can optimize this to each decoder child/parent pair using a single shmem.

(In reply to Michael Froman [:mjf] from comment #10)

I took a moment to check on this - we're creating 80 RemoteDecoderChild objects that each will attempt to decode samples in batches of 100 (in order to optimize the number of IPC calls). Each sample is contained in a separate ShmemBuffer[1]. That means we're attempting to create 8000 ShmemBuffers in one process.

Creating a Shmem sends an IPC message (with ID SHMEM_CREATED_MESSAGE_TYPE), which goes through essentially the same path as normal messages and presumably has similar overhead, so the batching may not be saving much.

[1] https://searchfox.org/mozilla-central/rev/8a0745cd346f0cfb89ae71690babbf7bff706113/dom/media/ipc/RemoteDecoderChild.cpp#105-126

This helps clarify something I was confused about: “samples” in this context seems to mean something like separately decodeable packets, not individual PCM samples.

--

(In reply to Paul Adenot (:padenot) from comment #11)

I don't understand why we're creating new shmems instead of either multiplexing or using a cross process ring buffer of some sort. Is it because nobody's done it?

It's been done but not generically. The WebGL remoting project has one (talk to :handyman), and I think <canvas> remoting on Windows also has one (talk to :bobowen). There were some issues around signaling; we don't have a CrossProcessSemaphore implementation that works on macOS.

Related work also includes Chromium's new IPC system and its data pipes; interestingly, from looking at the code, they appear to signal changes to how much of the buffer is in use simply by sending IPC messages, although their messaging is more efficient than ours.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #13)

(In reply to Michael Froman [:mjf] from comment #10)

I took a moment to check on this - we're creating 80 RemoteDecoderChild objects that each will attempt to decode samples in batches of 100 (in order to optimize the number of IPC calls). Each sample is contained in a separate ShmemBuffer[1]. That means we're attempting to create 8000 ShmemBuffers in one process.

Creating a Shmem sends an IPC message (with ID SHMEM_CREATED_MESSAGE_TYPE), which goes through essentially the same path as normal messages and presumably has similar overhead, so the batching may not be saving much.

Keep in mind, we're reusing the shmems in a pool within each decoder, so we only create them once for each decoder. At least at the time, the data I collected in https://bugzilla.mozilla.org/show_bug.cgi?id=1568058#c20 shows that it was a signifcant performance improvement.

Regressed by: 1590475
No longer regressed by: 1568058
Has Regression Range: --- → yes

(In reply to Paul Adenot (:padenot) from comment #11)

I don't understand why we're creating new shmems instead of either multiplexing or using a cross process ring buffer of some sort. Is it because nobody's done it?

We are, kind of.

We use a single shmem pool per RemoteDecoderChild/RemoteDecoderParent pair and shmem are re-used over and over. However, right now we will create RemoteAudioDataIPDL object for each audio block.

What we could do instead is concatenate all those audio frames together so we do use a single shmem per batch, or change the data type so that we continue to have multiple packets of audio (Rather than a very large one); they just all fit within a single shmem.

In bug 1663227, I have to modify how we exchange data between the parent and child as we need to handle the lifecycle of the images differently to what we currently have.

This requires the need of a custom IPDL object with a proper destructor that would notify the parent that the wrapped object is no longer needed.

This could give an easy basis to allocate a single shmem for all the AudioData objects at no extra cost.

See Also: → 1663227

Going to take that one as it's blocking me with bug 1595994

Assignee: nobody → jyavenard

Those two objects can be used to pack multiple array of objects into a minimal amount of shmem.

An ArrayOfRemoteByteBuffer will take at most a single shmem and perform a single memory allocation..
Similarly, an ArrayOfMediaRawData will pack multiple MediaRawData in at most 3 shmems (one for each array of bytes a MediaRawData contains).

They are designed to work in combination with a ShmemPool which will own each of the allocated Shmem and so can be re-used over and over.

Depends on D91536

The reduce the amount of shmem allocations to 1 instead of 100 when used with webaudio.

We also fix the AudioTrimmer for remote decoders as the trimming information was lost over the IPC serialization.
Another issue corrected is that we no longer crash if the first MediaRawData decoded had an empty size.

Depends on D91537

An ArrayOfRemoteAudioData pack all its AudioData objects into a single Shmem.
This Shmem will be-reused by the remote decoder over and over.
When used with webaudio, this reduces the number of memory allocation from 100 to 1 for each remote decoder.

Depends on D91538

It causes Chrome to crash too FWIW

In addition to making the test case work, those patches, applied to a linux build with default mozconfig (non-debug + opt) is about 25% faster [0] on https://chinmay.audio/decodethis/ than an official Nightly build, presumably PGOed.

[0]: Click test all then copy paste this in the console:

sum = 0; document.querySelectorAll("td").forEach((a, i) => { 
if ((i%10) == 9) {
 sum+=parseInt(a.textContent)
}}); sum

to sum the last columns.

Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fa783cf37776
P1. Add ShmemRecycleAllocator class. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/3eba7a72e498
P2. Add RemoteArrayOfByteBuffer and ArrayOfRemoteMediaRawData objects. r=padenot,mjf
https://hg.mozilla.org/integration/autoland/rev/62d400bfeb95
P3. Use ArrayOfRemoteMediaRawData inplace of an actual array. r=padenot,mjf
https://hg.mozilla.org/integration/autoland/rev/d21058e4bdeb
P4. Add ArrayOfRemoteAudioData and use it. r=mjf,padenot
Blocks: 1667979
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: