Closed Bug 1523974 Opened 5 years ago Closed 5 years ago

Mark background tabs that are playing media as "active" for the Process Priority Manager

Categories

(Core :: DOM: Content Processes, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: mconley, Assigned: mconley)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files, 1 obsolete file)

In the event that background tabs are playing audio, assuming that audio is being decoded in the same process as that background tab (which I think still happens these days), I think lowering the priority of those processes might result in greater chances for stuttering and lag.

We should probably ensure that any processes that are playing media stay "active".

This is a complex issue, and requires breaking down the current and future setups that Firefox can use to playback audio, and also to categorize what type of things is being played back and why. I think there are three categories we care about:

Media playback can mean to simply listen to a media stream that has an audible content.

This works like this in Gecko:

  • A thread fills a buffer of compressed data, this can be:
    • xhr or fetch calls getting compressed data from an endpoint (this touches the main thread)
    • HTTP streaming, via a URL (this is off-main-thread)
  • Another thread takes this compressed data and decodes it into decoded buffers. This can be:
    • In the same process, in software
    • In another (dedicated, called Remote Data Decoder process) process
  • This data ends up in a queue. The size of this queue is configurable, it's a trade-off between latency, resilience against load and memory usage, roughly
  • Every now and then (in an isochronous fashion), a super high-priority thread (the highest on the system, even using different scheduler altogether sometimes) comes and get some data off this queue, sometimes processes things a bit, and hands the data to the system for play back.

This is the simple case of watching a movie or listening to a music track on youtube or spotify. We can buffer a lot ahead, there is no real real-time constraints: the content is there and we can grab a lot of it, and not glitch even if the threads are not being scheduled often enough and for long enough. We can lower the priority of most things, and it will work as long as the queues are big enough. There is however quite a lot of thread hops throughout this process. Lowering thread priorities of some threads will widen the standard deviation of a typical response time of those threads, so we must take care of bumping the buffer length, or we risk data under-runs.

However, sometimes, the buffer lengths are not in our control, sometimes, the app itself tries to have very low latency, while using high-latency constructs (say, a Facebook or Twitch live stream), and Firefox does not know (for now), that a particular site is trying to do this. This is hard, and had us not do any type of setTimeout backtround-tab throttling in the past for any tab that is doing anything with audio: setTimeout were only running once a second, and glitch were very frequent.

Playback can mean real time media playback:

This uses an AudioContext to process the audio in various ways, and requires tight scheduling on the main thread and worker processes. It's more or less the same setup as above, although the output latency is extremely small so when things starts going bad, it's extremely bad much faster. This is SoundCloud for example.

Media playback can be a WebRTC call. In this case, it's we have hard-real-time constraints:

  • WebCam and microphone must be recorded, encoded and sent out as fast as possible
  • Audible content must be received, decoded and played out as fast as possible

This involves a variety of processes and threads: some for networking, some for decoding, some for capturing the webcam, some for playing back audio, etc. This type of application has a very low tolerance for scheduling issues, and consumes lots of resources (encoding X times a video, decoding X times multiple videos, mixing a few audio streams with effects, recording a microphone while processing it for noise cancellation and echo cancellation, encoding and sending it out, etc.). It's however rather common to switch to other tabs during a call.

A safe policy would be to not touch the priority of a process that has an output stream opened. If it's not good enough, maybe we want to figure out something smarter.

cc'ing gsvelto, who I believe has an active interest in this area as well.

I noted this on the distribution list, but my understanding is that we we stop decoding video frames in background videos after X seconds, while the audio, which requires fewer CPU resources, continues.

Bug 1224973 is the one that enabled this functionality.

I've double-checked the code in the process priority manager and it still assigns a higher-than-background priority to tabs that are holding "cpu" or "high-priority" wake-locks [1]. Unfortunately this doesn't apply anymore to tabs playing audio or video, those use "audio-playing" [2] and "video-playing" [3] wake-locks. That code should be changed to reflect that and the old wake-lock topics should be removed since they're unused (at least from what I can tell). Once that's done we'll then need to change the mapping for the PROCESS_BACKGROUND_PERCEIVABLE_PRIORITY in the HAL [4]. We might use Windows' BELOW_NORMAL_PRIORITY_CLASS for background tabs that are playing music and IDLE_PRIORITY_CLASS for those that aren't.

Once this is done we might try using the PROCESS_MODE_BACKGROUND_BEGIN/PROCESS_MODE_BACKGROUND_END class [5] for background tabs. Compared to IDLE_PRIORITY_CLASS it lowers both CPU and I/O resource usage of the process. IDLE_PRIORITY_CLASS only lowers CPU priority.

[1] https://searchfox.org/mozilla-central/rev/78cd247b5d7a08832f87d786541d3e2204842e8e/dom/ipc/ProcessPriorityManager.cpp#732
[2] https://searchfox.org/mozilla-central/rev/78cd247b5d7a08832f87d786541d3e2204842e8e/dom/html/HTMLMediaElement.cpp#3808
[3] https://searchfox.org/mozilla-central/rev/78cd247b5d7a08832f87d786541d3e2204842e8e/dom/html/HTMLVideoElement.cpp#296
[4] https://searchfox.org/mozilla-central/rev/78cd247b5d7a08832f87d786541d3e2204842e8e/hal/windows/WindowsProcessPriority.cpp#27
[5] https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-setpriorityclass

It looks like this is being seen in the wild (bug 1524193).

(In reply to Aaron Klotz [:aklotz] from comment #6)

It looks like this is being seen in the wild (bug 1524193).

It's not surprising... With DOM timers throttling (as Paul mentioned in comment 1) over the years several times we have made changes to throttle more aggressively that didn't take audio playback into consideration assuming things will hopefully work out well and every time users ran into issues on sites/machines we didn't test. :-)

(In reply to Gabriele Svelto [:gsvelto] from comment #5)

That all sounds generally reasonable... so is it fair to say that, since the wake-lock topics aren't being used, there's currently nothing directly informing the ProcessPriorityManager that a tab has media playing, and that we'd have to build that in?

Flags: needinfo?(gsvelto)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #8)

(In reply to Gabriele Svelto [:gsvelto] from comment #5)

That all sounds generally reasonable... so is it fair to say that, since the wake-lock topics aren't being used, there's currently nothing directly informing the ProcessPriorityManager that a tab has media playing, and that we'd have to build that in?

Yes, since it has been unused for almost three years its tabs-playing-audio detection has bit-rotted. Fortunately fixing it should be trivial.

Flags: needinfo?(gsvelto)
Blocks: 1524193

(In reply to Gabriele Svelto [:gsvelto] from comment #9)

Fortunately fixing it should be trivial.

Well, that's good to hear. :)

From a preliminary scan of the wake lock and process priority manager code, it looks like we set the wake locks for audio and video in the content process, and map that to the window which requested the wake lock.

I see us going pretty deep into Hal once we register that wake lock, but I don't see anything sending a message up to the parent to actually put the lock in up there (presumably, the parent needs to know about the wake locks for them to have any effect).

Is that plumbing just missing? Or am I not seeing it?

Flags: needinfo?(gsvelto)

I think I'm wrong in comment 10 - I think there is glue proxying wakelock information up to the parent:

https://searchfox.org/mozilla-central/rev/e62b311b2e0e02633afda2004ba4056b09fbcaa4/hal/sandbox/SandboxHal.cpp#314

So we might have the pieces we need here already, and we'll need to update the Process Priority Manager to handle those media locks.

I've just double-checked and the wakelocks are being properly proxy'd to the parent process. We just need to wire it up. I can write a patch for this on Monday if it's not super urgent (it's getting late over here).

Flags: needinfo?(gsvelto)

I've got a patch that seems to do the job. Coming up.

Assignee: nobody → mconley

I didn't tear out the high-priority WakeLock support from ProcessPriorityManager, since it looks like there's more in ProcessPriorityManager that cares about high-priority processes (like mHighPriorityChildIDs). For "cpu", I noticed that Android seems to reference it: https://searchfox.org/mozilla-central/rev/e62b311b2e0e02633afda2004ba4056b09fbcaa4/mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoAppShell.java#578

So I went for the simple thing, and just added video and audio WakeLock support. I imagine we can think about removing the others in a follow-up.

I went with BELOW_NORMAL_PRIORITY_CLASS for PROCESS_PRIORITY_BACKGROUND_PERCEIVABLE as per your recommendation.

Do we know the interaction between the Multimedia Class Scheduler Service-per thread priority settings (that is essential for any type of audio work) and SetPriorityClass process-wide priority setting?

In particular, Firefox always uses the class "Audio" for the audio processing thread. Not doing this results in unbearable artifacts even when the machine is very fast and idle.

How do we plan to validate what we're doing here? In general, saturating all CPU/disks/memory bandwidth with a special program (stress on osx and Linux for example, I don't know on Windows), playing back something in Firefox and listening for glitches is good enough to detect issues, but I can provide more advanced tools that we built to do regression testing and real-time code validation if need be.

Flags: needinfo?(mconley)

The patches are looking good, I believe that we can tear out the "high-priority" wakelock support because it's only use was to lower the priority of all processes which had not requested it in Firefox OS (ironically it was introduced as a workaround for an Android bug that was later solved, but we never removed it). The cpu wakelock seems to be still supported on Fennec but I'm not sure if it's used. I'll open separate bugs to investigate.

(In reply to Paul Adenot (:padenot) from comment #17)

Do we know the interaction between the [Multimedia Class Scheduler Service]-per thread priority settings (that is essential for any type of audio work) and SetPriorityClass process-wide priority setting?

I don't, and the documentation doesn't make it super clear, but I can make an educated guess. It seems as if the Multimedia Class Scheduler Service is ultimately a way of setting thread priority though, and the Windows scheduler seems to compute which thread gets the next task by looking at both thread and process priority.

This would suggest that a background tab with BELOW_NORMAL_PRIORITY_CLASS using the MCSS priority of "Audio" would have less priority than a tab with NORMAL_PRIORITY_CLASS and an MCSS priority of "Audio".

(In reply to Paul Adenot (:padenot) from comment #17)

How do we plan to validate what we're doing here? In general, saturating all CPU/disks/memory bandwidth with a special program (stress on osx and Linux for example, I don't know on Windows), playing back something in Firefox and listening for glitches is good enough to detect issues, but I can provide more advanced tools that we built to do regression testing and real-time code validation if need be.

I think testing out some of these tools on the reference hardware and ensuring similar playback characteristics sounds like a fine idea. How can I get my hands on them?

Flags: needinfo?(mconley)

ni? for comment 19.

Flags: needinfo?(padenot)

Hey gsvelto - I may have spotted a bug in my patches here... correct me if I'm wrong, but suppose we have an <audio> element and a <video> element playing in a tab, and that tab goes into the background.

The patches will put that background tab into BELOW_NORMAL_PRIORITY_CLASS. So far so good.

Now suppose the video ends, and it relinquishes its audio and video locks.

From what I can tell, an update will be broadcast up to the parent to update the wakelock states... the parent will hear that there's no video wakelock, but still an audio wakelock.

If the notification of the video wakelock going away comes second, won't my patch for ParticularProcessPriorityManager::Notify set mHoldsPlayingMediaWakeLock to false, and we'll set the background tab to PROCESS_PRIORITY_BACKGROUND?

If so, maybe I shouldn't have mHoldsPlayingMediaWakeLock bool that's the || of audio / video locks, but individual bools for audio and video.

Or is this not a problem for reasons I don't see?

Flags: needinfo?(gsvelto)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #21)

Hey gsvelto - I may have spotted a bug in my patches here... correct me if I'm wrong, but suppose we have an <audio> element and a <video> element playing in a tab, and that tab goes into the background.
[...]
If the notification of the video wakelock going away comes second, won't my patch for ParticularProcessPriorityManager::Notify set mHoldsPlayingMediaWakeLock to false, and we'll set the background tab to PROCESS_PRIORITY_BACKGROUND?

Now that you make me think about it I guess it's possible... I'm not 100% but better be safe than sorry and split the logic to use two booleans.

BTW we had tests that covered the PPM logic but I can't find them anymore so I guess they're gone. I'll try to dig them up and see if I can restore them; they would be useful to cover this kind of potential corner-cases.

Flags: needinfo?(gsvelto)

Okay, thanks - I've switched to a pair of bools then. I've also addressed your review comment. Does this still look okay to land?

Flags: needinfo?(gsvelto)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #23)

Okay, thanks - I've switched to a pair of bools then. I've also addressed your review comment. Does this still look okay to land?

Yep! Keeping my fingers crossed about BELOW_NORMAL_PRIORITY_CLASS being enough to fix the audio glitches.

Flags: needinfo?(gsvelto)

I'm using a build without these patches, and it's rather easy to reproduce these glitches when the process priority manager is enabled, and also programmatically detect them, on an XPS15 with a Core i7 7700HQ, by loading the CPU and memory a bit, and without stressing the IOs.

I'm dusting off a machine that's more like something one of our users would use to have another look, and I'm preparing a little patch and instructions on how measure this kind of things.

Flags: needinfo?(padenot)

This is pretty rough but gives good data.

To use this, apply the patch, and run Firefox (in opt mode, etc.), like so:

MOZ_DISABLE_CONTENT_SANDBOX=1 MOZ_LOG=MSGTracing:5,raw,sync MOZ_LOG_FILE=tracing.log ./mach run

and then do something that does audio but only once (the code is using globals for ergonomics). For example, this is one of the scenarios I've tested:

  • Run Firefox with the command above
  • Open a YouTube video that is a music video (kind of important, we don't want a video that has silence in it, because we would miss some glitches, a music video is often good enough)
  • Load the machine (I open tabs, run load testing tools, etc.)
  • After some time, close Firefox
  • Find the files named "tracing.log.child*", there is one that contains lots of data (it's json)
  • Load it into https://padenot.github.io/msg-load-analyzer/, this will compute a bunch of data, show graphs. If the curve crosses 1, the audio thread blew its budget.

The "Load" is a metric that is the ratio of the time it took to do the rendering of a buffer of audio, to the length of this buffer of audio. For example, if it takes more than 10ms to render 10ms of audio, there's going to be trouble.

When doing playback, we have lots of headroom (there is a buffer queue in the system, which means that if we miss even a couple deadline, it's not going to glitch), however some sites use the Web Audio API to do their output. When this is the case, this safety net does not exist, and it's easier to glitch under load.

A good example of a website that uses the Web Audio API for output is SoundCloud. Youtube can be use as a test website for HTMLMediaElement.

Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4a12df5a18f0
Make ProcessPriorityManager pay attention to video and audio playing WakeLocks. r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/f77433e6354c
Map PROCESS_PRIORITY_BACKGROUND_PERCEIVABLE to BELOW_NORMAL_PRIORITY_CLASS on Windows. r=gsvelto

Reopening this to not scatter the information and measurements everywhere.

Mike, I can run some measurements here if you want, did you have time to gather some data yourself ?

Status: RESOLVED → REOPENED
Flags: needinfo?(mconley)
Resolution: FIXED → ---

(In reply to Paul Adenot (:padenot) from comment #30)

Mike, I can run some measurements here if you want, did you have time to gather some data yourself ?

I haven't gotten to it yet - if you have a few moments, that'd be very helpful! Thank you!

Flags: needinfo?(mconley) → needinfo?(padenot)

FYI I'm going to finish cleaning up the other wakelocks in bug 1358005.

Blocks: 1525000

We can close this, a better behaviour has been landed on autoland in https://bugzilla.mozilla.org/show_bug.cgi?id=1524193.

Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Flags: needinfo?(padenot)
Resolution: --- → FIXED
Attachment #9041463 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: