Closed Bug 1902077 Opened 8 months ago Closed 8 months ago

Fenix starts video timer on android that never terminates

Categories

(GeckoView :: Media, defect)

defect

Tracking

(Performance Impact:high, relnote-firefox 127+, firefox127+ fixed, firefox128+ fixed, firefox129+ fixed)

RESOLVED FIXED
129 Branch
Performance Impact high
Tracking Status
relnote-firefox --- 127+
firefox127 + fixed
firefox128 + fixed
firefox129 + fixed

People

(Reporter: bas.schouten, Assigned: jhlin)

References

(Blocks 1 open bug)

Details

Attachments

(5 files)

This leads a flawed Android OS battery stats calculator to attribute a lot of fake battery usage to Firefox.

Steps to reproduce:

  • Have Firefox installed.
  • Force stop. (App Info -> Force Stop, i.e. a full force stop)
  • Wait for an hour or two.

I've verified using adb shell dumpsys media.resource_manager that after watching a couple of videos and terminating the Firefox process, the Firefox media process will continue things like:

Client:
        Id: -5476158183275902000
        Name: c2.exynos.h264.decoder
        Resources:
          non-secure-codec/hw-video-codec:[]:1
          battery/hw-video-codec:[]:1

These will not go away until a device restart as far as I can tell.

Blocks: 1894804

More specifically, this works for me:

  1. Restart phone
  2. adb shell dumpsys media.resource_manager shows no activity
  3. Start firefox on a blank page and go to https://www.vimeo.com/
  4. Once the video on the front page (not logged in) starts adb shell dumpsys media.resource_manager shows an h264 decoder
  5. Press the app view button and swipe app up to close
  6. Long press firefox -> App Info -> Force Stop
  7. adb shell dumpsys media.resource_manager shows the codec still there
  8. Video number in batterystats --usage goes up
Attached file Resource Manager Log

Relevant bit from the resource manager, after killing Firefox.

I tried the steps above on a pixel 6 and s21 with android 13 and couldn't get it to reproduce.
However, I tried it on a s24 ultra with android 14 and got the same results as Bas.

I added the logs I got with the s24 ultra.

I have Firefox Beta installed on my Samsung Galaxy S23. I think for me it's not only an issue of attribution but my battery really does drain faster. I also noticed that using Samsung's battery saving mode seems to "fix" the battery drain. Maybe this information is somehow useful.

jhlin has been able to reproduce this in the emulator and has noticed the following in the logcat

06-14 10:58:00.504   437  3189 W ResourceManagerService: addResource called with untrusted pid 12284 or uid 10190, using calling pid 12705, uid 10190
Assignee: nobody → jolin

(In reply to kirill.rakhman from comment #5)

I have Firefox Beta installed on my Samsung Galaxy S23. I think for me it's not only an issue of attribution but my battery really does drain faster. I also noticed that using Samsung's battery saving mode seems to "fix" the battery drain. Maybe this information is somehow useful.

There should be a way to verify if this is true or a placebo effect if you have access to adb. (I felt the same and concluded it was probably not true on my Pixel)

Use adb shell dumpsys batterystats --usage, it will have an estimated usage and an actual usage. For me the difference between the two was roughly equal to the 'video' attribution of Firefox. Suggesting at least that part of the reporting was 'not real'.

3 processes involved:

  • tab/content process(:tab*), where the video element is run. It calls MediaCodec API remoted in media process via binder/service.
  • media process(:media) runs the Android MediaCodec API that calls media resource manager service (also via binder).
  • mediaserver process, an Android system process where the resource manager is run.

Now, some findings (on a Pixel 7/Android 14 emulator):

First, client/resource info in media.resource_manager:

    Pid: 10806
    Priority: <unknown>
      Client:
        Id: -5476376626430055712
        Name: c2.goldfish.h264.decoder
        Resources:
          non-secure-codec/video-codec:[]:1
          battery/video-codec:[]:1

Adding/Removing the above info is done by Android framework code, as shown in the event log (captured in anther test session):

    06-14 15:12:34 removeResource(pid 4796, uid 10192 clientId -5476376626430051536)
    06-14 15:12:34 removeResource(pid 4796, uid 10192 clientId -5476376626430051536)
    06-14 15:12:34 markClientForPendingRemoval(pid 4796, clientId -5476376626430051536)
    06-14 15:12:33 addResource(pid 4796, uid 10192 clientId -5476376626430051536, resources battery/video-codec:[]:1 )
    06-14 15:12:33 addResource(pid 4796, uid 10192 clientId -5476376626430051536, resources graphic-memory/unspecified:[]:0 )
    06-14 15:12:33 addResource(pid 4796, uid 10192 clientId -5476376626430051536, resources non-secure-codec/video-codec:[]:1 )

The implementation gets caller pid from binder IPC state, so the event log shows tab/content process ID, but there is a hidden service permission that the resource manager checks, and when fails, pid will be replaced with media process ID, as the 'untrusted pid' message shown in comment 7, hence the different pids in client info and event log. Android system media services always show matching pids because they have the special permission and can use private API.

Second, the persistent client/resource info issue:
Even though pid seems problematic as shown above, the resource manager behaves as expected when the codec lifecycle is managed properly (i.e., create/release via MediaCodec API) because it translates pid automatically.
However, if the codec is not released explicitly, things could go wrong. The resource manager uses the binder death technique to clean the client/resource info when client process is killed, but it doesn't check and replace the pid in this case. (Again, system media services work as expected because of the special permission.)
We also use the same way to handle 'tab/content process is killed unexpectedly' situation. Unfortunately, when closing the app via Recents screen UI, Android actually kills the processes in a "undesired" order:

06-14 16:19:08.308   562   581 I ActivityManager: Killing 9595:org.mozilla.geckoview_example:gpu/u0a194 (adj 905): remove task
06-14 16:19:08.308   562   581 I ActivityManager: Killing 9862:org.mozilla.geckoview_example:media/u0a194 (adj 925): remove task
06-14 16:19:08.308   562   581 I ActivityManager: Killing 9561:org.mozilla.geckoview_example:tab16/u0a194 (adj 905): remove task
06-14 16:19:08.308   562   581 I ActivityManager: Killing 9515:org.mozilla.geckoview_example/u0a194 (adj 915): remove task

Since tab/content process is killed after media process, binderDied() wasn't called. Some device/Android version combinations might behave differently. I guess that's probably why Marc couldn't reproduce the issue earlier.

Android document mentions Service.onTaskRemoved() is the way to handle this "remove task" situation, but so far I have not been able to make it work. Will keep trying.

(In reply to John Lin [:jhlin][:jolin] from comment #9)

The implementation gets caller pid from binder IPC state, so the event log shows tab/content process ID,

Why is the IPC caller the tab process? Doesn't that mean that we're calling some Android media function from the tab process?

Flags: needinfo?(jolin)

(In reply to Markus Stange [:mstange] from comment #10)

(In reply to John Lin [:jhlin][:jolin] from comment #9)

The implementation gets caller pid from binder IPC state, so the event log shows tab/content process ID,

Why is the IPC caller the tab process? Doesn't that mean that we're calling some Android media function from the tab process?

Not directly. All Android media functions are called in media process, and tab process calls into media process through binder.

I've tried other emulators with different Android versions, so far only Android 14 has this issue. Android 12 and 13, while also pass tab process pid to resource manager, will use media process pid in overridePid() and remove the resouce info successfully.

Flags: needinfo?(jolin)

I see... I think we have to look for a way to set things up in such a way that AIBinder_getCallingPid() will return the media process pid and not the tab process pid.

I don't think cleaning things up ourselves is an option. Android can kill any process at any time, so our cleanup would never run reliably.

(In reply to Markus Stange [:mstange] from comment #12)

I see... I think we have to look for a way to set things up in such a way that AIBinder_getCallingPid() will return the media process pid and not the tab process pid.

I tried to access the hidden method using Java reflection but got NoSuchMethodException.

I'll see if moving MediaCodec calls out of the binder thread can fix this.

I don't think cleaning things up ourselves is an option. Android can kill any process at any time, so our cleanup would never run reliably.

Good point. I couldn't get OnTaskRemoved() to work anyway.

Attachment #9408483 - Attachment mime type: application/octet-stream → text/plain

John, can you try running the code which calls MediaCodec.createByCodecName asynchronously, outside the Binder.execTransact call? Then AIBinder_getCallingPid might return the media process's pid.

I also looked for a way to avoid getting into the branch that calls AIBinder_getCallingPid, by pre-initializing mPid, but it seems the only way to pre-initialize mPid is to call MediaCodec.createByCodecNameForClient which is a system-only API that requires special permissions.

Flags: needinfo?(jolin)

(In reply to Markus Stange [:mstange] from comment #15)

John, can you try running the code [...] asynchronously

Heh, this might not be as easy as I initially envisioned; the calling side expects a synchronous return value and it's not clear where the asynchronicity should be introduced.

But Jeff just had another idea: We can maybe just wrap our call to createByCodecName with a call to Binder.clearCallingIdentity. He's testing this now.

Here's what I'm trying:

diff --git a/mobile/android/geckoview/src/main/java/org/mozilla/gecko/media/LollipopAsyncCodec.java b/mobile/android/geckoview/src/main/java/org/mozilla/gecko/media/LollipopAsyncCodec.java
index aaf8810bbb05..e55d52698b52 100644
--- a/mobile/android/geckoview/src/main/java/org/mozilla/gecko/media/LollipopAsyncCodec.java
+++ b/mobile/android/geckoview/src/main/java/org/mozilla/gecko/media/LollipopAsyncCodec.java
@@ -8,6 +8,7 @@ import android.media.MediaCodec;
 import android.media.MediaCodecInfo.CodecCapabilities;
 import android.media.MediaCrypto;
 import android.media.MediaFormat;
+import android.os.Binder;
 import android.os.Build;
 import android.os.Bundle;
 import android.os.Handler;
@@ -134,7 +135,9 @@ import org.mozilla.gecko.util.HardwareCodecCapabilityUtils;
   }

   /* package */ LollipopAsyncCodec(final String name) throws IOException {
-    mCodec = MediaCodec.createByCodecName(name);
+          long token = Binder.clearCallingIdentity();
+          mCodec = MediaCodec.createByCodecName(name);
+          Binder.restoreCallingIdentity(token);
   }

   @Override

Android is getting confused about which process is owning a media resource;
in different places it stores either our media process's pid or our tab process's pid.
Then, when our processes shut down (or rather when they killed by Android), this
confusion leads Android to do improper cleanup of its video resource accounting.
This causes it to keep increasing the "video" time for Firefox (in
adb shell dumpsys batterystats --usage) even while no Firefox processes
are running. This in turn causes the "Battery" UI in Android's system preferences to
show large values for Firefox.

To work around this attribution error, we try to ensure that Android tracks the
same PID in all the places.
Specifically, the problem is that the constructor of
android::MediaCodec::ResourceManagerServiceProxy::ResourceManagerServiceProxy
is getting the owning PID from AIBinder_getCallingPid(). The Firefox media process
calls MediaCodec.createByCodecName under a binder transaction from the tab process,
specifically while the tab process is calling ICodec$Stub$Proxy.configure inside
CodecProxy.init.

This fix makes it so that AIBinder_getCallingPid() returns the media process's PID,
by wrapping the call to MediaCodec.createByCodecName in calls to
Binder.clearCallingIdentity / Binder.restoreCallingIdentity. The Android
documentation says the following about clearCallingIdentity:

This can be useful if, while handling an incoming call, you will be calling
on interfaces of other objects that may be local to your process and need to
do permission checks on the calls coming into them (so they will check the
permission of your own local process, and not whatever process originally
called you).

This matches what we're trying to do here quite well.


Firefox media process stack showing MediaCodec.createByCodecName getting into
AIBinder_getCallingPid():

AIBinder_getCallingPid
android::MediaCodec::ResourceManagerServiceProxy::ResourceManagerServiceProxy(int, unsigned int, std::__1::shared_ptr<aidl::android::media::IResourceManagerClient> const&, android::MediaCodec*)[libstagefright.so]
android::MediaCodec::MediaCodec(android::sp<android::ALooper> const&, int, unsigned int, std::__1::function<android::sp<android::CodecBase>(android::AString const&, char const*)>, std::__1::function<int (android::AString const&, android::sp<android::MediaCodecInfo>*)>)
android::MediaCodec::CreateByComponentName(android::sp<android::ALooper> const&, android::AString const&, int*, int, unsigned int)
android::JMediaCodec::JMediaCodec(_JNIEnv*, _jobject*, char const*, bool, bool, int, int)
android_media_MediaCodec_native_setup(_JNIEnv*, _jobject*, _jstring*, unsigned char, unsigned char, int, int)
android.media.MediaCodec.<init>
android.media.MediaCodec.<init>
android.media.MediaCodec.createByCodecName
org.mozilla.gecko.media.LollipopAsyncCodec.<init>

Firefox tab process stack showing ICodec$Stub$Proxy.configure binder call:

_ioctl
ioctl
android::IPCThreadState::talkWithDriver(bool)
android::IPCThreadState::waitForResponse(android::Parcel*, int*)
android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)
android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)
android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)
art_quick_generic_jni_trampoline
art_quick_invoke_stub
android.os.BinderProxy.transact
org.mozilla.gecko.media.ICodec$Stub$Proxy.configure
org.mozilla.gecko.media.CodecProxy.init
org.mozilla.gecko.media.RemoteManager.createCodec
org.mozilla.gecko.media.CodecProxy.create
mozilla::java::CodecProxy::Create
mozilla::RemoteVideoDecoder::Init()
Pushed by mstange@themasta.com: https://hg.mozilla.org/integration/autoland/rev/0dbb55b9b4d9 Work around a battery usage misattribution bug in Android 14. r=jolin,geckoview-reviewers,owlish

This patch fixes the issue in local testing on Android 14, and video still works.
We haven't done any testing on Android 13 yet.

Flags: needinfo?(jolin)

Backed out for causing lint failures in LollipopAsyncCodec.java

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/mobile/android/geckoview/src/main/java/org/mozilla/gecko/media/LollipopAsyncCodec.java:144:10 | Variable 'token' should be declared final. (com.puppycrawl.tools.checkstyle.checks.coding.FinalLocalVariableCheck)
Flags: needinfo?(jolin)
Flags: needinfo?(jolin) → needinfo?(mstange.moz)

I've updated the patch to add the final.
Try push before that change: https://treeherder.mozilla.org/jobs?repo=try&revision=68c8e480baa6650bdffa261a06b3c884897f988d (thanks Kaya)

Flags: needinfo?(mstange.moz)
Pushed by mstange@themasta.com: https://hg.mozilla.org/integration/autoland/rev/b888fe39379b Work around a battery usage misattribution bug in Android 14. r=jolin,geckoview-reviewers,owlish
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → 129 Branch

Markus, given the patch looks relatively simple and given the high impact on perceived performance, is this something you think we should get into the 127.0.2 dot release that is scheduled to happen on Monday?
If you want to do it, we'll need to request uplift today (both to mozilla-beta and mozilla-release) so that it's in tomorrow's beta.

Flags: needinfo?(mstange.moz)

Yes, the plan is to try to uplift to release. I talked with dmeehan yesterday and we can request uplift as late as Friday. I'll request beta uplift now.

Flags: needinfo?(mstange.moz)

Comment on attachment 9408514 [details]
Bug 1902077 - Work around a battery usage misattribution bug in Android 14. r=jolin

Beta/Release Uplift Approval Request

  • User impact if declined: We get hit with a battery usage misattribution bug in Android 14 that can make it seem like Firefox is continually consuming battery even when it's not running.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): We're not certain what behaviour changes this could cause on all of the different Android versions and devices. It's possible that this could break video playback in some circumstances.

We've done some manual testing on Android 14 and 13 and everything seems fine. This code is triggered anytime there's video playback so at least it's well covered.

  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9408514 - Flags: approval-mozilla-beta?

Release Note Request (optional, but appreciated)
[Why is this notable]: It fixes a reporting error that many users are complaining about and that made us look worse than we are.
[Affects Firefox for Android]: yes, only
[Suggested wording]: Fix battery usage reporting in Firefox for Android: Firefox no longer triggers a bug in Android 14 which was causing Android to report massively inflated battery usage on certain devices when Firefox was not running. You may need to restart your phone for the fix to take effect.
[Links (documentation, blog post, etc)]: n/a

relnote-firefox: --- → ?

I've tested this change on Android 13. Video worked and there were no leaks of decoders when looking at adb shell dumpsys media.resource_manager

I also tested this with DRM video and that works too.

Comment on attachment 9408514 [details]
Bug 1902077 - Work around a battery usage misattribution bug in Android 14. r=jolin

Approved for 128.0b6.

Attachment #9408514 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

jnicol reports that video still works on Android 5.1

Comment on attachment 9408514 [details]
Bug 1902077 - Work around a battery usage misattribution bug in Android 14. r=jolin

Beta/Release Uplift Approval Request

  • User impact if declined: We get hit with a battery usage misattribution bug in Android 14 that can make it seem like Firefox is continually consuming battery even when it's not running.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): We're not certain what behaviour changes this could cause on all of the different Android versions and devices. It's possible that this could break video playback in some circumstances, given the testing that we've done this doesn't seem super likely.

We've done some manual testing on Android 14, 13, 5.1 and everything seems fine. This code is triggered anytime there's video playback so at least it's well covered.

  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9408514 - Flags: approval-mozilla-release?

Comment on attachment 9408514 [details]
Bug 1902077 - Work around a battery usage misattribution bug in Android 14. r=jolin

Approved for 127.0.2, thanks.

Attachment #9408514 - Flags: approval-mozilla-release? → approval-mozilla-release+
Component: Audio/Video → Media
Product: Core → GeckoView
Duplicate of this bug: 1893242
Duplicate of this bug: 1891950
No longer blocks: 1891950
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: