Closed Bug 1706656 Opened 4 years ago Closed 3 years ago

SurfaceAllocator is not compatible with isolated process

Categories

(GeckoView Graveyard :: Sandboxing, task, P2)

Unspecified
All

Tracking

(firefox110 wontfix, firefox111 fixed)

RESOLVED FIXED
111 Branch
Tracking Status
firefox110 --- wontfix
firefox111 --- fixed

People

(Reporter: owlish, Assigned: jnicol)

References

(Blocks 4 open bugs)

Details

(Whiteboard: [sandboxing] [geckoview:2022q3])

Attachments

(1 file)

Affected tests: MediaElementTest and MediaSessionTest#fullscreenVideoElementMetadata.

Exception:

32142-32220/org.mozilla.geckoview.test:tab33 W/SurfaceAllocator: Failed to acquire GeckoSurface
    java.lang.SecurityException: Isolated process not allowed to call bindService
        at android.os.Parcel.readException(Parcel.java:1683)
        at android.os.Parcel.readException(Parcel.java:1636)
        at android.app.ActivityManagerProxy.bindService(ActivityManagerNative.java:4374)
        at android.app.ContextImpl.bindServiceCommon(ContextImpl.java:1491)
        at android.app.ContextImpl.bindService(ContextImpl.java:1450)
        at android.content.ContextWrapper.bindService(ContextWrapper.java:636)
        at org.mozilla.gecko.gfx.SurfaceAllocator.ensureConnection(SurfaceAllocator.java:36)
        at org.mozilla.gecko.gfx.SurfaceAllocator.acquireSurface(SurfaceAllocator.java:45)
Whiteboard: [sandboxing][geckoview:m90?]
Whiteboard: [sandboxing][geckoview:m90?] → [sandboxing]
Severity: -- → S3
Priority: -- → P2
Whiteboard: [sandboxing] → [sandboxing][geckoview:m90?]
Whiteboard: [sandboxing][geckoview:m90?] → [sandboxing][geckoview:m91?]
Whiteboard: [sandboxing][geckoview:m91?] → [sandboxing]
Whiteboard: [sandboxing] → [sandboxing] [geckoview:2022q3?]
Whiteboard: [sandboxing] [geckoview:2022q3?] → [sandboxing] [geckoview:2022q3]

I wonder if this is still a problem now with all the changes that jnicol has been making for the GPU process.

I don't think we call bindService any more, so it could well be solved.

Alternatively on more recent android versions we can use AHardwareBuffers instead of SurfaceTextures for video frames. I'm not sure if isolated processes are a new thing or not.

Moving isolated process bugs to the new GeckoView::Sandboxing component.

Component: General → Sandboxing

Jamie, the GeckoView team is starting work to support isolated process mode in Q3. Do you have cycles to work on this Android bug in Q3 or Q4? Or is this bug something the GeckoView team should own?

You can enable isolated processes locally for testing by adding ac_add_options --enable-isolated-process to your mozconfig file.

EDIT:

Looks like you might be able to just set the isolatedProcess flag for each process here without needing to change your mozconfig:

https://searchfox.org/mozilla-central/rev/d5edb4a4538657b7d691a41c00e6796a19ade6e7/mobile/android/geckoview/src/main/AndroidManifest.xml#50,57,64,71,78,85

In that case, maybe we enable and ship isolated process mode for the GPU process before enabling it for the parent and content process.

Flags: needinfo?(jnicol)

This bug seems to be about isolated content processes using the SurfaceAllocator. I believe in the past SurfaceAllocator, GeckoSurface, GeckoSurfaceTexture and co have been owned by geckoview/mobile, though they certainly overlap with gfx and media. Nowadays though I'm probably one of the people most familiar with it, so I should be able to find time to work on this bug this half.

FWIW my recent changes to the SurfaceAllocator for the GPU process do indeed seem to have fixed the error message from comment 0 like Agi and I thought it might. However, we now run in to a different error immediately afterwards.

For the latter part of your question, about isolating the GPU process. I'm not sure whether isolated processes are in fact allowed to access the GPU. This comment seems to indicate not. In which case isolating the GPU process is a complete non-starter. Additionally, it would mean the graphics team will need to implement out-of-process webgl on Android for isolated content processes to be shippable. I can try to confirm whether access to the GPU is indeed limited.

Flags: needinfo?(jnicol)
Blocks: 1709948

Jamie, is this related to the OOP WebGL?

Flags: needinfo?(jnicol)

OOP webgl might be a prerequisite for shipping isolated content processes. It depends on whether isolated processes are able to use the GPU

Flags: needinfo?(jnicol)

I have been doing some testing of isolated content processes on a variety of Android versions.

On all versions tested (5.0+), once the SurfaceAllocator is working webgl works fine too. So it appears isolated processes are able to access the GPU, meaning out-of-process WebGL is not a blocker for isolated content processes. And in theory we should one day be able to isolate the GPU process too.

On Android >= 8.0, the SurfaceAllocator already works correctly. On versions lower than that, we hang when attempting to create a SurfaceTexture here. (This is what is causing the test failure on the emulator, which runs Android 7).

I see lots of messages saying ServiceManager: Waiting for service SurfaceFlinger... in the logcat. So it appears we are unable to create SurfaceTextures in isolated processes on Android versions < 8.0. In this case, we only need to create one as a dummy so we can call the Surface constructor. We can work around this by using reflection to call the hidden constructor that does not require a SurfaceTexture instance.

However, we do actually need to create a SurfaceTexture here so that we can readback the contents in the content process, eg for using a video frame as texture data in webgl. Perhaps we can live without that feature on older Android versions, or perhaps we therefore only want to ship isolated processes on 8+. Chris, were we planning to ship isolated content processes on all Android versions? Are we able to only ship to certain versions? There is no way I can see around this limitation unfortunately.

Additionally, the GPU process definitely needs to be able to create SurfaceTextures, so we would definitely only be able to isolate that on 8+. I'll also note that when testing isolated content processes on a few physical devices I ran in to a number of crashes long before I could get to testing this code, not sure whether these are known or not. And attempting to play a video always crashes on any android version whether device or emulator, with the exception: java.lang.SecurityException: Isolated process not allowed to call bindService. (Presumably when attempting to start the media process.)

Flags: needinfo?(cpeterson)

Chris, were we planning to ship isolated content processes on all Android versions? Are we able to only ship to certain versions? There is no way I can see around this limitation unfortunately.

Olivia, I see that we currently declare processes to be android:isolatedProcess="false" in a manifest file. Can we delay the decision of whether a process should be isolated or not to runtime, so we can check whether the user is on Android 8+? Or is there a way to specify a minimum Android version in the manifest file?

https://searchfox.org/mozilla-central/rev/9b8ebf06145feeccf34dc126cf45b07e86556392/mobile/android/geckoview/src/main/AndroidManifest.xml#51,58,65,72,79,86

Flags: needinfo?(cpeterson) → needinfo?(ohall)

I'm going to tentatively say that we might be able to use a resource value to change android:isolatedProcess="@bool/isolatedProcessIsSupported". I'd like to double check if that works first. Basing this idea off of this StackOverflow thread.

# res/values/bool.xml
<resources>
    <bool name="isolatedProcessIsSupported">false</bool>
</resources>

# Turn on isolated process for Android 8.0 +
# res/values-v26/bool.xml
<resources>
    <bool name="isolatedProcessIsSupported">true</bool>
</resources>

Here is where the build system turns on isolated processes for the content processes. If the above works, we will need to also adjust that to turn on/off isolated processes on the res/values-v26/bool.xml file.

Going to leave on the needinfo as a reminder to check if a resource value will work or not.

Olivia, I see that we currently declare processes to be android:isolatedProcess="false" in a manifest file. Can we delay the decision of whether a process should be isolated or not to runtime, so we can check whether the user is on Android 8+? Or is there a way to specify a minimum Android version in the manifest file?

It looks like using a resource value to change android:isolatedProcess in the manifest file based on device API could work as a solution.

(Checked by turning on an isolated process test that would fail if isolated processes were enabled and pass if not enabled, set the above changes, and ran the test in API 24, 25, 26, 31. Test passed/failed as expected if isolated processes were in fact off/on - pass on 24, 25 and fail as expected on 26, 31.)

I think bug 1719762 to update the CI emulator would also become relevant for testing isolated processes (mozemulator right now is 24).

Just mentioning some additional implementation considerations for reference - If we go this direction, we would also need to adjust the build flag of MOZ_ANDROID_CONTENT_SERVICE_ISOLATED_PROCESS to turn on/off isolated processes in the values-26/bool.xml file (add .jinja template layer?) or similar, update a static pref that uses that flag to work as expected, and update Enviroment.java.

Flags: needinfo?(ohall)

Thank you for testing that, Olivia! It is very reassuring to know that we have that option.

We may not need it, however, for now at least. I will attach a patch here shortly that will allow the surface allocator to work, at the expense of breaking video data upload to webgl (only when using isolated processes), as mentioned in comment 8.

This will allow MediaSessionTest#fullscreenVideoElementMetadata to progress further, though it still fails: java.lang.SecurityException: Isolated process not allowed to call bindService. But importantly the SurfaceAllocator will no longer be blocking work on other issues. Meanwhile I can try to find an alternate solution to the video data webgl issue, and depending on the success of that, once closer to shipping, we can decide whether to ship to all versions or not.

Due to a bug on versions of Android prior to 8.0, attempting to create
a SurfaceTexture from an isolated process results in an infinite
hang. We must therefore avoid doing this.

We currently create a dummy SurfaceTexture in order to call the
Surface constructor when instantiating a GeckoSurface from a
Parcel. This patch avoids this by making GeckoSurface own a Surface
rather than inherit from one.

Additionally, for every Surface allocated we create a corresponding
"sync" SurfaceTexture in the content processes. This is used when the
texture data is required in the content process, eg using video as a
WebGL texture upload source. This patch prevents us allocating such
SurfaceTextures. If out-of-process WebGL is enabled, video data can
still be used as a texture upload source as WebGL resides in the GPU
process. However, if out-of-process WebGL is disabled the texture
upload will fail.

Prior to shipping isolated content processes on affected Android
versions, we must therefore ensure we have an alternative solution to
using video as a WebGL texture upload source, or that out-of-process
WebGL can be relied upon.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9759c11fa52f Avoid creating SurfaceTextures in isolated content processes on old android versions. r=geckoview-reviewers,ohall

Backed out changeset 9759c11fa52f (Bug 1706656) for causing gv-junit geckoview failures.
Backout link
Push with failures <--> gv-junit-e10s-single
Failure Log

Flags: needinfo?(jnicol)
Blocks: 1810736

A couple tests used to pass because while one thread hung indefinitely trying to start SurfaceFlinger to create a SurfaceTexture, the test passed in the meantime. With this patch, we avoid creating the SurfaceTexture then run in to the Isolated process not allowed to call bindService crash above which fails the test. I've updated the tests to be skipped when running with isolated process, and linked to bug 1810736.

Flags: needinfo?(jnicol)
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ab36547f819e Avoid creating SurfaceTextures in isolated content processes on old android versions. r=geckoview-reviewers,ohall
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 111 Branch
Blocks: 1649110
See Also: → 1811929
Regressions: 1811929
See Also: 1811929
Product: GeckoView → GeckoView Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: