Closed Bug 1907135 Opened 1 year ago Closed 1 year ago

We seem to fall back to software WebRender on Android if the GPU process is killed too many times

Categories

(Core :: Graphics: WebRender, defect)

All
Android
defect

Tracking

()

RESOLVED FIXED
130 Branch
Tracking Status
firefox130 --- fixed

People

(Reporter: mstange, Assigned: jnicol)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

I noticed sluggish scrolling on Android, took a profile, and noticed that Firefox Nightly was using Software WebRender on my phone. This is a Samsung Galaxy S21 so hardware WebRender should always be available.

about:support says:

GPU_PROCESS:
default: available,
WEBRENDER:
default: available,
runtime: unavailable, Failed to create remote compositor, Blocklisted; failure code FEATURE_FAILURE_GPU_PROCESS_ERROR

Could it be that Android has killed the GPU process a few times without killing the parent process? If that's the case, I think we should adjust our behavior. The GPU process being killed by Android is a pretty normal thing to happen, and on this phone there's no good reason to use software webrender.

Attached file about:support
Failure Log
(#0) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#254): CP+[GFX1-]: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#255) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#256) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#257) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#258) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#259) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#260) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#261) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#262) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#263) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#264) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#265) Error: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#266): CP+[GFX1-]: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#267): CP+[GFX1-]: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
(#268): CP+[GFX1-]: CompositorBridgeChild receives IPC close with reason=AbnormalShutdown
Flags: needinfo?(jnicol)

Yeah this isn't great. And seems to be reasonably widespread: See recently bug 1903484 which was also originally noticed as a performance issue. And the surprisingly high volume of bug 1903810, which only occurs after the GPU process has been disabled.

I have made some attempts to avoid this in the past, eg reseting the unstable launch counter [here}(https://searchfox.org/mozilla-central/rev/1cc09ebc0b32f69fd30057582671b0811f2cb9c8/gfx/ipc/GPUProcessManager.cpp#258) after a successful launch. And always counting launches whilst in the background as stable here, to avoid repeated launches and kills whilst in the background from resulting in it being disabled.

But clearly those are not enough, or are not working as intended. I think we probably need to add some better logging so that about:support can give more helpful information

Assignee: nobody → jnicol
Severity: -- → S2
Flags: needinfo?(jnicol)

(Whoops, commented on wrong bug)

This is trivial to reproduce if you have set the "Background process limit" to "No background processes" in system developer options. Then simply open and minimise the app a dozen times.

(This makes the GPU process be killed a second after minimising the app, but the parent and a content process remain alive. I don't understand why this is so, I originally thought I should set the limit to 2 background processes for this behaviour, but that's not important here)

This occurs because the process is being relaunched within layers.gpu-process.stable.min-uptime-ms (4 minutes) of the previous launch, meaning we treat it as unstable here.

In bug 1869777 I previously made a change to treat the process as stable if launched whilst in background, but a) it's the state at time of kill rather than launch that is important, and b) that forgot about the uptime element of being declared stable, meaning short-lived processes could still be treated as unstable even if launched whilst in the background.

Instead of deciding whether the previous process was stable or not when the next process is being launched, we should do it when the process is actually lost. That will allow us to check the foreground/background state at that time, and make sure we treat the process as stable if in the background.

The question remains whether this is the cause of frequent SWGL fallback we are seeing in the wild? I can certainly believe it's common for people to use the browser in frequent short bursts of less than 4 minutes. However, clearly most users will not have set a background process limit in developer options. On the other hand, perhaps certain OEMs are aggressive enough about killing background processes that the behaviour is similar enough to reproduce.

Rather than determine whether the previous GPU process was stable or
unstable when launching the subsequent process, do so immediately
following the process loss. This allows us to check the application's
foreground/background state at the time of the loss, and ensure that
we do not treat the process as unstable if we are killed whilst in the
background.

This is desireable because background process kills are expected
behavior on Android, and not in any way indicitive of the GPU process
being unstable. Treating them as unstable was leading to fallback to
software rendering and eventually the gpu process being disabled. This
patch is attempting to prevent that from occuring.

Previously added code that treated GPU processes that were launched
whilst in the background as stable can be removed, as it is no longer
required with this change.

Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e43af486bd02 Do not treat GPU process loss as unstable whilst application is in background. r=aosmond
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 130 Branch
Blocks: 1912944
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: