Closed Bug 1648309 Opened 5 years ago Closed 4 years ago

AudioContext::DecodeAudioData fails with too many concurrent decode jobs

Tracking

()

Status:

RESOLVED FIXED

Milestone:

83 Branch

Tracking Flags:

Tracking

Status

firefox-esr78

---

wontfix

firefox82

---

wontfix

firefox83

---

fixed

People

(Reporter: rpsalmi, Assigned: jya)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(6 files)

Log output when executing the given example 5 years ago Roope Salmi 1.04 MB, text/x-log		Details
An example WAV file that this reproduces on. 5 years ago Roope Salmi 1.10 MB, audio/x-wav		Details
Bug 1648309 - P1. Add ShmemRecycleAllocator class. r?mattwoodrow 4 years ago Jean-Yves Avenard [:jya] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1648309 - P2. Add RemoteArrayOfByteBuffer and ArrayOfRemoteMediaRawData objects. r?mattwoodrow,padenot,mjf 4 years ago Jean-Yves Avenard [:jya] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1648309 - P3. Use ArrayOfRemoteMediaRawData inplace of an actual array. r?padenot,mjf 4 years ago Jean-Yves Avenard [:jya] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1648309 - P4. Add ArrayOfRemoteAudioData and use it. r?mjf,padenot 4 years ago Jean-Yves Avenard [:jya] 47 bytes, text/x-phabricator-request		Details \| Review

Roope Salmi

Reporter

Description

•

5 years ago

Attached file Log output when executing the given example — Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:75.0) Gecko/20100101 Firefox/75.0

Steps to reproduce:

Try to decode multiple (around 60) audio buffers concurrently. For example, as with the following JS:

const ctx = new AudioContext();
const N = 80;
fetch("audio.wav").then((r) => { return r.arrayBuffer() }).then((ab) => {
  for (let i = 0; i < N; i++) {
    ctx.decodeAudioData(ab.slice(0))
    .catch((e) => {
      document.body.innerHTML += e + "<br>";
    });
  }
});

I tried this with a ~1MB WAV file:
https://ollpu.fi/decodeAudioData_demo.html
https://ollpu.fi/audio.wav

A similar scenario worked fine on a Mac, so this is most likely Linux specific.

Actual results:

Some or all of the audio decode jobs fail. Error messages "The buffer passed to decodeAudioData contains invalid content which cannot be decoded successfully." are emitted to the console. The catch clause is executed and error messages "
EncodingError: The given encoding is not supported." are shown on the page.

In addition, a lot of IPC-related errors are logged in the terminal, which makes me believe some message queues are overflowing. Using MOZ_LOG="MediaDecoder:5" didn't seem to add anything insightful. I've attached the log file.

Expected results:

No error messages and successful decoding of all files. If all jobs can't be processed immediately, the jobs should wait instead of failing.

Roope Salmi

Reporter

Comment 1

•

5 years ago

Attached audio An example WAV file that this reproduces on. — Details

Roope Salmi

Reporter

Updated

•

5 years ago

OS: Unspecified → Linux

Hardware: Unspecified → x86_64

Alice0775 White

Updated

•

5 years ago

Component: Untriaged → Web Audio

Product: Firefox → Core

Roope Salmi

Reporter

Comment 2

•

5 years ago

Can also confirm that the issue exist on version 77.0.1.

Roope Salmi

Reporter

Updated

•

5 years ago

Version: 75 Branch → 77 Branch

Matthew Gregan [:kinetik]

Comment 3

•

5 years ago

Thanks for filing this, I can reproduce it locally with decodeAudioData_demo.html.

It looks like a resource limit issue, specifically we're running out of file descriptors while allocating shmem segments for RDD via this stack:

#19 0x00007ff97e68dd8b in base::SharedMemory::CreateInternal(unsigned long, bool) (this=0x7ff94730a260, size=4096, freezeable=false)
    at /home/mjg/src/mozilla/ipc/chromium/src/base/shared_memory_posix.cc:246
#20 0x00007ff97dc88ebf in base::SharedMemory::Create(unsigned long)
    (this=0x7ff94730a260, size=4096)
    at /home/mjg/src/mozilla/ipc/chromium/src/base/shared_memory.h:70
#21 0x00007ff97dc88c4e in mozilla::ipc::SharedMemoryBasic::Create(unsigned long) (this=0x7ff94730a240, aNbytes=4096)
    at /home/mjg/src/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ipc/SharedMemoryBasic_chromium.h:37
#22 0x00007ff97e7889dd in mozilla::ipc::CreateSegment(mozilla::ipc::SharedMemory::SharedMemoryType, unsigned long, unsigned long)
    (aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aNBytes=768, aExtraSize=4)
    at /home/mjg/src/mozilla/ipc/glue/Shmem.cpp:77
#23 0x00007ff97e7888a9 in mozilla::ipc::Shmem::Alloc(mozilla::ipc::Shmem::PrivateIPDLCaller, unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, bool) (aNBytes=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC)
    at /home/mjg/src/mozilla/ipc/glue/Shmem.cpp:383
#24 0x00007ff97e76c2fe in mozilla::ipc::IToplevelProtocol::CreateSharedMemory(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, int*)
    (this=0x7ff9623d9800, aSize=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aUnsafe=true, aId=0x7ff95e1c5ee4)
    at /home/mjg/src/mozilla/ipc/glue/ProtocolUtils.cpp:685
#25 0x00007ff97e76c27e in mozilla::ipc::IProtocol::CreateSharedMemory(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, bool, int*)
    (this=0x7ff966683340, aSize=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aUnsafe=true, aId=0x7ff95e1c5ee4)
    at /home/mjg/src/mozilla/ipc/glue/ProtocolUtils.cpp:301
#26 0x00007ff97e76d2bd in mozilla::ipc::IProtocol::AllocUnsafeShmem(unsigned long, mozilla::ipc::SharedMemory::SharedMemoryType, mozilla::ipc::Shmem*)
    (this=0x7ff966683340, aSize=768, aType=mozilla::ipc::SharedMemory::TYPE_BASIC, aOutMem=0x7ff947306010)
    at /home/mjg/src/mozilla/ipc/glue/ProtocolUtils.cpp:413
#27 0x00007ff981f6fe8c in mozilla::ShmemPool::AllocateShmem<mozilla::RemoteDecoderChild>(mozilla::RemoteDecoderChild*, unsigned long, mozilla::ShmemBuffer&, mozilla::ShmemPool::AllocationPolicy) (this=0x7ff966683400,
    aInstance=0x7ff966683340, aSize=768, aRes=..., aPolicy=mozilla::ShmemPool::AllocationPolicy::Unsafe)
    at /home/mjg/src/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ShmemPool.h:167
#28 0x00007ff981f6dffc in mozilla::ShmemPool::Get<mozilla::RemoteDecoderChild>(mozilla::RemoteDecoderChild*, unsigned long, mozilla::ShmemPool::AllocationPolicy)
    (this=0x7ff966683400, aInstance=0x7ff966683340, aSize=768, aPolicy=mozilla::ShmemPool::AllocationPolicy::Unsafe)
    at /home/mjg/src/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ShmemPool.h:119
#29 0x00007ff981f66174 in mozilla::RemoteDecoderChild::Decode(nsTArray<RefPtr<mozilla::MediaRawData> > const&)
    (this=0x7ff966683340, aSamples=const nsTArray<RefPtr<mozilla::MediaRawData> > & = {...}) at /home/mjg/src/mozilla/dom/media/ipc/RemoteDecoderChild.cpp:109

Disabling RDD by setting media.rdd-process.enabled to false in about:config allows decodeAudioData_demo.html to complete successfully (for the default N = 80, I imagine we may reach various limits for higher N), so unclear if RDD is the root cause or if we need to address this at some other level.

Severity: -- → S3

Status: UNCONFIRMED → NEW

Ever confirmed: true

Flags: needinfo?(mfroman)

Priority: -- → P3

Karl Tomlinson (:karlt)

Comment 4

•

5 years ago

Perhaps https://bugzilla.mozilla.org/show_bug.cgi?id=1453735#c1 may be relevant here.

Comment 5

•

5 years ago

•

Edited

This isn't strictly RDD related. However, webaudio decodes are processed in batches to limit the number of IPC calls to RDD. The pref that controls the webaudio batch size is media.rdd-webaudio.batch.size. I tried this on my linux machine and can repro the problem with the default pref setting (100), but it appears to work with the pref set to 60 (I just picked a lower number). Lowering this setting directly affects the decode performance for webaudio, so it may require some trials to figure out a better setting.

Setting this pref lower can help solve the immediate issue, but it seems like there is an underlying shmem issue at work here.

For reference, the batch decoding was added in Bug 1568058 to deal with performance problems.

Flags: needinfo?(mfroman)

Michael Froman [:mjf]

Comment 6

•

5 years ago

•

Edited

sorry - replied instead of edited the original comment

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 7

•

5 years ago

The current file descriptor limit is 4096 per process (and on some distributions that's the hard limit), but the rest of bug 1453735 comment #1 is still accurate.

So, 100 shared memory segments in flight at once shouldn't be a problem, but if each of those decodes is using many shared memory segments at once, then it would make sense that we'd have fd exhaustion problems. In that case we'd need to see if the shared memory segments could be consolidated somehow (or cached, as GMP does), or if this could be a use case for something like Chromium's data pipes.

Michael Froman [:mjf]

Comment 8

•

5 years ago

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #7)

The current file descriptor limit is 4096 per process (and on some distributions that's the hard limit), but the rest of bug 1453735 comment #1 is still accurate.

So, 100 shared memory segments in flight at once shouldn't be a problem, but if each of those decodes is using many shared memory segments at once, then it would make sense that we'd have fd exhaustion problems. In that case we'd need to see if the shared memory segments could be consolidated somehow (or cached, as GMP does), or if this could be a use case for something like Chromium's data pipes.

Two things of note: The '100' is not requests inflight at once, it is how many samples are batched together per request. So size of segments, not number. There is a ShmemPool used by RDD as well, to keep from thrashing shmem alloc/dealloc. That could possibly be at play here, but because changing the batch size makes an difference, it doesn't seem as if the pool size is the issue.

Roope Salmi

Reporter

Comment 9

•

5 years ago

I did some quick testing with my archaic method of watch -n 0.1 "ls /proc/pid/fd | wc -l", and observed the 4096 limit being hit. With N = 20 I already got over a thousand fds, but a more precise method would be needed to see how much exactly. (Tried stracing but there was no hint of file descriptors in the thousands. I'm probably missing something about how they're opened.)
This was in the relevant worker process, not the main process or the RDD process, they don't seem to open very many fds at all.

Michael Froman [:mjf]

Comment 10

•

4 years ago

I took a moment to check on this - we're creating 80 RemoteDecoderChild objects that each will attempt to decode samples in batches of 100 (in order to optimize the number of IPC calls). Each sample is contained in a separate ShmemBuffer[1]. That means we're attempting to create 8000 ShmemBuffers in one process. We'll need to decide whether having 80 decoders working concurrently in a process is something we should support. If so, we'll need to rework how the samples are transmitted across IPC to RDD.

In streaming audio, this isn't an issue, but for webaudio decodes, the decoding happens all at once. Bug 1590475 added support for batch decoding (to support fixing 1568058). Reducing the batch size here could help, but will 1) slow down the batch decodes; 2) only work until someone tries creating more decoders in the same process.

[1] https://searchfox.org/mozilla-central/rev/8a0745cd346f0cfb89ae71690babbf7bff706113/dom/media/ipc/RemoteDecoderChild.cpp#105-126

Michael Froman [:mjf]

Updated

•

4 years ago

Regressed by: 1568058

Paul Adenot (:padenot)

Comment 11

•

4 years ago

I don't understand why we're creating new shmems instead of either multiplexing or using a cross process ring buffer of some sort. Is it because nobody's done it?

BugBot [:suhaib / :marco/ :calixte]

Updated

•

4 years ago

Keywords: regression

Michael Froman [:mjf]

Comment 12

•

4 years ago

I believe we have 80 decoders running simultaneously, and each RemoteDecoderChild/RemoteDecoderParent pair needs it's own set of buffers for passing raw and decoded samples between the content process and RDD. My guess is that when :jya added the ability for batch processing, running 80 decoders at the same time wasn't envisioned. Comment 7 seems to say we'll have a 4096 limit on simultaneous decodes even if we can optimize this to each decoder child/parent pair using a single shmem.

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 13

•

4 years ago

(In reply to Michael Froman [:mjf] from comment #10)

I took a moment to check on this - we're creating 80 RemoteDecoderChild objects that each will attempt to decode samples in batches of 100 (in order to optimize the number of IPC calls). Each sample is contained in a separate ShmemBuffer[1]. That means we're attempting to create 8000 ShmemBuffers in one process.

Creating a Shmem sends an IPC message (with ID SHMEM_CREATED_MESSAGE_TYPE), which goes through essentially the same path as normal messages and presumably has similar overhead, so the batching may not be saving much.

[1] https://searchfox.org/mozilla-central/rev/8a0745cd346f0cfb89ae71690babbf7bff706113/dom/media/ipc/RemoteDecoderChild.cpp#105-126

This helps clarify something I was confused about: “samples” in this context seems to mean something like separately decodeable packets, not individual PCM samples.

--

(In reply to Paul Adenot (:padenot) from comment #11)

I don't understand why we're creating new shmems instead of either multiplexing or using a cross process ring buffer of some sort. Is it because nobody's done it?

It's been done but not generically. The WebGL remoting project has one (talk to :handyman), and I think <canvas> remoting on Windows also has one (talk to :bobowen). There were some issues around signaling; we don't have a CrossProcessSemaphore implementation that works on macOS.

Related work also includes Chromium's new IPC system and its data pipes; interestingly, from looking at the code, they appear to signal changes to how much of the buffer is in use simply by sending IPC messages, although their messaging is more efficient than ours.

Michael Froman [:mjf]

Comment 14

•

4 years ago

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #13)

(In reply to Michael Froman [:mjf] from comment #10)

I took a moment to check on this - we're creating 80 RemoteDecoderChild objects that each will attempt to decode samples in batches of 100 (in order to optimize the number of IPC calls). Each sample is contained in a separate ShmemBuffer[1]. That means we're attempting to create 8000 ShmemBuffers in one process.

Creating a Shmem sends an IPC message (with ID SHMEM_CREATED_MESSAGE_TYPE), which goes through essentially the same path as normal messages and presumably has similar overhead, so the batching may not be saving much.

Keep in mind, we're reusing the shmems in a pool within each decoder, so we only create them once for each decoder. At least at the time, the data I collected in https://bugzilla.mozilla.org/show_bug.cgi?id=1568058#c20 shows that it was a signifcant performance improvement.

Michael Froman [:mjf]

Updated

•

4 years ago

Regressed by: 1590475
No longer regressed by: 1568058

BMO Automation

Updated

•

4 years ago

Has Regression Range: --- → yes

Jean-Yves Avenard [:jya]

Assignee

Comment 15

•

4 years ago

(In reply to Paul Adenot (:padenot) from comment #11)

I don't understand why we're creating new shmems instead of either multiplexing or using a cross process ring buffer of some sort. Is it because nobody's done it?

We are, kind of.

We use a single shmem pool per RemoteDecoderChild/RemoteDecoderParent pair and shmem are re-used over and over. However, right now we will create RemoteAudioDataIPDL object for each audio block.

What we could do instead is concatenate all those audio frames together so we do use a single shmem per batch, or change the data type so that we continue to have multiple packets of audio (Rather than a very large one); they just all fit within a single shmem.

Jean-Yves Avenard [:jya]

Assignee

Comment 16

•

4 years ago

In bug 1663227, I have to modify how we exchange data between the parent and child as we need to handle the lifecycle of the images differently to what we currently have.

This requires the need of a custom IPDL object with a proper destructor that would notify the parent that the wrapped object is no longer needed.

This could give an easy basis to allocate a single shmem for all the AudioData objects at no extra cost.

Comment 17

•

4 years ago

Going to take that one as it's blocking me with bug 1595994

Assignee: nobody → jyavenard

Jean-Yves Avenard [:jya]

Assignee

Comment 18

•

4 years ago

Attached file Bug 1648309 - P1. Add ShmemRecycleAllocator class. r?mattwoodrow — Details

Jean-Yves Avenard [:jya]

Assignee

Comment 19

•

4 years ago

Attached file Bug 1648309 - P2. Add RemoteArrayOfByteBuffer and ArrayOfRemoteMediaRawData objects. r?mattwoodrow,padenot,mjf — Details

Those two objects can be used to pack multiple array of objects into a minimal amount of shmem.

An ArrayOfRemoteByteBuffer will take at most a single shmem and perform a single memory allocation..
Similarly, an ArrayOfMediaRawData will pack multiple MediaRawData in at most 3 shmems (one for each array of bytes a MediaRawData contains).

They are designed to work in combination with a ShmemPool which will own each of the allocated Shmem and so can be re-used over and over.

Depends on D91536

Jean-Yves Avenard [:jya]

Assignee

Comment 20

•

4 years ago

Attached file Bug 1648309 - P3. Use ArrayOfRemoteMediaRawData inplace of an actual array. r?padenot,mjf — Details

The reduce the amount of shmem allocations to 1 instead of 100 when used with webaudio.

We also fix the AudioTrimmer for remote decoders as the trimming information was lost over the IPC serialization.
Another issue corrected is that we no longer crash if the first MediaRawData decoded had an empty size.

Depends on D91537

Jean-Yves Avenard [:jya]

Assignee

Comment 21

•

4 years ago

Attached file Bug 1648309 - P4. Add ArrayOfRemoteAudioData and use it. r?mjf,padenot — Details

An ArrayOfRemoteAudioData pack all its AudioData objects into a single Shmem.
This Shmem will be-reused by the remote decoder over and over.
When used with webaudio, this reduces the number of memory allocation from 100 to 1 for each remote decoder.

Depends on D91538

Jean-Yves Avenard [:jya]

Assignee

Comment 22

•

4 years ago

It causes Chrome to crash too FWIW

Paul Adenot (:padenot)

Comment 23

•

4 years ago

In addition to making the test case work, those patches, applied to a linux build with default mozconfig (non-debug + opt) is about 25% faster [0] on https://chinmay.audio/decodethis/ than an official Nightly build, presumably PGOed.

[0]: Click test all then copy paste this in the console:

sum = 0; document.querySelectorAll("td").forEach((a, i) => { 
if ((i%10) == 9) {
 sum+=parseInt(a.textContent)
}}); sum

to sum the last columns.

Pulsebot

Comment 24

•

4 years ago

Pushed by jyavenard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fa783cf37776 P1. Add ShmemRecycleAllocator class. r=mattwoodrow https://hg.mozilla.org/integration/autoland/rev/3eba7a72e498 P2. Add RemoteArrayOfByteBuffer and ArrayOfRemoteMediaRawData objects. r=padenot,mjf https://hg.mozilla.org/integration/autoland/rev/62d400bfeb95 P3. Use ArrayOfRemoteMediaRawData inplace of an actual array. r=padenot,mjf https://hg.mozilla.org/integration/autoland/rev/d21058e4bdeb P4. Add ArrayOfRemoteAudioData and use it. r=mjf,padenot

Jean-Yves Avenard [:jya]

Assignee

Updated

•

4 years ago

Blocks: 1667979

Dorel Luca [:dluca]

Comment 25

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/fa783cf37776
https://hg.mozilla.org/mozilla-central/rev/3eba7a72e498
https://hg.mozilla.org/mozilla-central/rev/62d400bfeb95
https://hg.mozilla.org/mozilla-central/rev/d21058e4bdeb

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox83: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 83 Branch

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

status-firefox82: --- → wontfix

status-firefox-esr78: --- → wontfix

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Updated

•

4 years ago

Depends on: 1710149

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Updated

•

4 years ago

No longer depends on: 1710149

You need to log in before you can comment on or make changes to this bug.