Closed Bug 1338011 Opened 3 years ago Closed 3 years ago

Add software decode with hardware layers as an intermediate fallback for compositor process crashes

Categories

(Core :: Audio/Video: Playback, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: ajones, Assigned: gerald)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files, 3 obsolete files)

Steps to reproduce:

* Play accelerated video (e.g. H.264 on YouTube)
* Kill compositor process

Expected results:

Fall back to accelerated layers but software decode

Actual results:

Software layers and software decode
There's a button for killing the compositor process in about:support along with the process's PID
Comment on attachment 8835837 [details]
Bug 1338011 - Adds an intermediate fallback when the GPU process crashes

https://reviewboard.mozilla.org/r/111412/#review114044

Clearing r? to see another patch, but this is a good idea.

As a follow-up (either separate bug or additional patch in this one), it'd be nice to also have a way to query Telemetry, to see how often a session lands in the "GPU process has restarted without video decoding" state. We could correlate that with the # of process launches to estimate how often we were able to preserve layers acceleration, while preventing crashes. Maybe just a boolean histogram that is set when we disable remote decoding?

::: dom/media/platforms/wmf/WMFDecoderModule.cpp:62
(Diff revision 2)
> +  } else if (XRE_IsGPUProcess()) {
> +    // Always allow DXVA in the GPU process.
> +    sDXVAEnabled = true;
> +  } else {
> +    // Never allow DXVA in the UI process.
> +    sDXVAEnabled = false;

This will disable DXVA for non-e10s browsers. If that's not intentional, you can do use:

sDXVAEnabled = mozilla::BrowserTabsRemoteAutostart()

::: gfx/ipc/GPUProcessManager.cpp:387
(Diff revision 2)
>  
> +  if (mNumProcessAttempts > uint32_t(gfxPrefs::GPUProcessMaxRestartsWithDecoder())) {
> +    char disableMessage[64];
> +    SprintfLiteral(disableMessage, "VideoDecoding on GPU process disabled after %d attempts",
> +                   mNumProcessAttempts);
> +    DisableVideoDecodingOnGPUProcess(disableMessage);

This pref only makes sense if MaxRestartsWithDecoder < MaxRestarts. It's worth commenting that somewhere.

::: gfx/thebes/gfxPlatform.cpp:2243
(Diff revision 2)
>  
>    if (gfxPrefs::GPUProcessForceEnabled()) {
>      gpuProc.UserForceEnable("User force-enabled via pref");
>    }
>  
> +  gpuVideoDecode.InitOrUpdate(gpuProc.IsEnabled(), gpuProc.GetValue(), "Setting from GPU Process feature");

I think it would be simpler to just use a boolean member variable in GPUProcessManager. The config value isn't used anywhere else.

If the intent was to get it printing in about:support automatically, instead I would remove these InitOrUpdate/ForceDisable calls, and do a single conditional EnableByDefault() if gpuProc.IsEnabled() is true at the end.
Attachment #8835837 - Flags: review?(dvander)
Comment on attachment 8835837 [details]
Bug 1338011 - Adds an intermediate fallback when the GPU process crashes

https://reviewboard.mozilla.org/r/111412/#review114044

> I think it would be simpler to just use a boolean member variable in GPUProcessManager. The config value isn't used anywhere else.
> 
> If the intent was to get it printing in about:support automatically, instead I would remove these InitOrUpdate/ForceDisable calls, and do a single conditional EnableByDefault() if gpuProc.IsEnabled() is true at the end.

Do you think it would be acceptable to not print it in about:support? If so I'm pretty happy for it to just be a boolean flag.
That's fine. Since you have a gfxCriticalNote in there the user will see it anyway.
Attachment #8835837 - Attachment is obsolete: true
Attachment #8837841 - Flags: review?(benjamin)
Comment on attachment 8837840 [details]
Bug 1338011 - Adds an intermediate fallback when the GPU process crashes

https://reviewboard.mozilla.org/r/112840/#review114400

::: gfx/ipc/GPUProcessManager.cpp:380
(Diff revision 1)
>      char disableMessage[64];
>      SprintfLiteral(disableMessage, "GPU process disabled after %d attempts",
>                     mNumProcessAttempts);
>      DisableGPUProcess(disableMessage);
> +  } else if (mNumProcessAttempts > uint32_t(gfxPrefs::GPUProcessMaxRestartsWithDecoder())
> +             && !gfxPlatform::GetPlatform()->HardwareVideoDecodingBlacklisted()) {

I don't fully understand the "HardwareVideoDecodingBlacklisted" check. Is the goal to preserve remote decoding if we don't think the device was blacklisted? If so, it's odd the GPU process would be using a blacklisted feature in the first place.

::: gfx/ipc/GPUProcessManager.cpp:798
(Diff revision 1)
>  void
>  GPUProcessManager::CreateContentVideoDecoderManager(base::ProcessId aOtherProcess,
>                                                      ipc::Endpoint<dom::PVideoDecoderManagerChild>* aOutEndpoint)
>  {
> -  if (!mGPUChild || !MediaPrefs::PDMUseGPUDecoder()) {
> +  if (!mGPUChild
> +      || !MediaPrefs::PDMUseGPUDecoder()

nit: || should be on the previous line. (same for the && earlier.)
Comment on attachment 8837840 [details]
Bug 1338011 - Adds an intermediate fallback when the GPU process crashes

https://reviewboard.mozilla.org/r/112840/#review114628

::: gfx/ipc/GPUProcessManager.cpp:380
(Diff revision 1)
>      char disableMessage[64];
>      SprintfLiteral(disableMessage, "GPU process disabled after %d attempts",
>                     mNumProcessAttempts);
>      DisableGPUProcess(disableMessage);
> +  } else if (mNumProcessAttempts > uint32_t(gfxPrefs::GPUProcessMaxRestartsWithDecoder())
> +             && !gfxPlatform::GetPlatform()->HardwareVideoDecodingBlacklisted()) {

Sorry, this was a check for the telemetry stuff that I accidentally amended to this commit.

::: gfx/ipc/GPUProcessManager.cpp:798
(Diff revision 1)
>  void
>  GPUProcessManager::CreateContentVideoDecoderManager(base::ProcessId aOtherProcess,
>                                                      ipc::Endpoint<dom::PVideoDecoderManagerChild>* aOutEndpoint)
>  {
> -  if (!mGPUChild || !MediaPrefs::PDMUseGPUDecoder()) {
> +  if (!mGPUChild
> +      || !MediaPrefs::PDMUseGPUDecoder()

Sorry, I had a comment on a previous PR telling me to do it this way but I double checked the style guide and you're right.
Comment on attachment 8837840 [details]
Bug 1338011 - Adds an intermediate fallback when the GPU process crashes

https://reviewboard.mozilla.org/r/112840/#review115640

::: gfx/thebes/gfxPlatform.cpp:2342
(Diff revision 2)
>  
>    // gfxFeature is not usable in the GPU process, so we use gfxVars to transmit this feature
>    gfxVars::SetUseWebRender(gfxConfig::IsEnabled(Feature::WEBRENDER));
>  }
>  
> +bool gfxPlatform::HardwareVideoDecodingBlacklisted() {

This (and the declaration in gfxPlatform.h) don't seem to be used anywhere.
Attachment #8837840 - Flags: review?(dvander) → review+
Comment on attachment 8837841 [details]
Bug 1338011 - Adds some telemetry probes feedback?bsmedberg

https://reviewboard.mozilla.org/r/112842/#review115646

This might be a little heavyweight for the data we want, since we have to enumerate all the transitions. I think just a normal enumerated histogram is fine, with something like 0=decoding disabled and 1=gpu-process disabled (and leaving room for more enums in the future). FORCED_DEVICE_RESET_REASON is a good example.
Attachment #8837841 - Flags: review?(dvander)
Comment on attachment 8837840 [details]
Bug 1338011 - Adds an intermediate fallback when the GPU process crashes

https://reviewboard.mozilla.org/r/112840/#review115704

::: gfx/thebes/gfxPlatform.cpp:2342
(Diff revision 2)
>  
>    // gfxFeature is not usable in the GPU process, so we use gfxVars to transmit this feature
>    gfxVars::SetUseWebRender(gfxConfig::IsEnabled(Feature::WEBRENDER));
>  }
>  
> +bool gfxPlatform::HardwareVideoDecodingBlacklisted() {

This was used to determine the transition type. Seeing as we don't care about it anymore, I've removed it.
Attachment #8837841 - Flags: review?(benjamin)
Comment on attachment 8837841 [details]
Bug 1338011 - Adds some telemetry probes feedback?bsmedberg

https://reviewboard.mozilla.org/r/112842/#review115772

r=me with that fixed.

::: gfx/ipc/GPUProcessManager.cpp:50
(Diff revision 3)
>  namespace mozilla {
>  namespace gfx {
>  
>  using namespace mozilla::layers;
>  
> +enum GPUProcessFallbackType

This should be "enum class GPUProcessFallbackType : uint32_t". Then drop the "GPUProcess" prefix.
Attachment #8837841 - Flags: review?(dvander) → review+
Attachment #8837841 - Flags: review?(benjamin)
Is this part of the test plan for the GPU process?
Who is responsible for monitoring this data?
Do we have any automated tests whether this telemetry is working correctly? That is typically required for expires=never measures, since it's common to accidentally break this telemetry during refactoring.
Flags: needinfo?(jay.harris)
Jay has just finished his internship, so I'll take over this bug.
Assignee: jay.harris → gsquelart
Flags: needinfo?(gsquelart)
Comment on attachment 8837841 [details]
Bug 1338011 - Adds some telemetry probes feedback?bsmedberg

https://reviewboard.mozilla.org/r/112842/#review119278

::: toolkit/components/telemetry/Histograms.json:1004
(Diff revision 5)
>      "high": 64000,
>      "n_buckets": 100,
>      "releaseChannelCollection": "opt-out",
>      "description": "GPU process initialization (excluding XPCOM and fork time) time in milliseconds"
>    },
> +  "GPU_PROCESS_CRASH_FALLBACKS": {

It doesn't appear that there is any automated test which verifies that this telemetry is being recorded correctly. Typically that is required for expires: never telemetry, because it's common/easy to accidentally break this kind of telemetry over time. Please at least file a followup bug to add automation for this.

Who is responsible for monitoring this data? Note that automated alerts will not be useful in this case, nor t.m.o, since those only operate on prerelease/opt-in datasets. Is this something that we should have as part of project war room for permanent monitoring as part of release health?
Attachment #8837841 - Flags: review?(benjamin) → review+
Flags: needinfo?(jay.harris)
Flags: needinfo?(gsquelart)
Flags: needinfo?(cpearce)
Working on it now.

Following comment 23, I will change the histogram expiry to 60 so we can revisit it later on, and I will open a separate bug to see if we can create an automated test...
Flags: needinfo?(cpearce)
Comment on attachment 8858472 [details]
Bug 1338011 - Adds an intermediate fallback when the GPU process crashes -

Just a rebase.
Carrying r+ from comment 13.
Attachment #8858472 - Flags: review?(dvander) → review+
Comment on attachment 8858473 [details]
Bug 1338011 - Adds some telemetry probes - f=bsmedberg

Rebase, and changing histogram expiry from "never" to 60.
Carrying r+ from comment 18 and data-peer f+ from comment 23.
Attachment #8858473 - Flags: review?(dvander)
Attachment #8858473 - Flags: review+
Attachment #8858473 - Flags: feedback+
Comment on attachment 8837840 [details]
Bug 1338011 - Adds an intermediate fallback when the GPU process crashes

Replaced by attachment 8858472 [details].
Attachment #8837840 - Attachment is obsolete: true
Comment on attachment 8837841 [details]
Bug 1338011 - Adds some telemetry probes feedback?bsmedberg

Replaced by attachment 8858473 [details].
Attachment #8837841 - Attachment is obsolete: true
Comment on attachment 8858472 [details]
Bug 1338011 - Adds an intermediate fallback when the GPU process crashes -

https://reviewboard.mozilla.org/r/130432/#review133126

r+ from comment 13.
Comment on attachment 8858473 [details]
Bug 1338011 - Adds some telemetry probes - f=bsmedberg

https://reviewboard.mozilla.org/r/130434/#review133128

r+ from comment 18.
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a373d6ae67f5
Adds an intermediate fallback when the GPU process crashes - r=gerald
https://hg.mozilla.org/integration/autoland/rev/1b87f77672fd
Adds some telemetry probes - f=bsmedberg r=gerald
https://hg.mozilla.org/mozilla-central/rev/a373d6ae67f5
https://hg.mozilla.org/mozilla-central/rev/1b87f77672fd
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Blocks: 1356837
Depends on: 1370079
Blocks: 1372070
Depends on: 1388309
You need to log in before you can comment on or make changes to this bug.