1810550 - Tabs use "excessive" CPU when application is backgrounded

Reporter

Description

2 years ago

From github: https://github.com/mozilla-mobile/fenix/issues/28452.

Steps to reproduce

STR: Open Firefox Beta and/or Nightly. Open 3-4 tabs. Leave for a few hours. Firefox gets killed by ActivityManager due to "excessive CPU". Hav

I'd like to re-open #23140, I see this issue quite a lot on Nightly and on Beta, several times per day, usually when the phone is locked and in my pocket. I have tried:- logging out of Firefox sync; making the app "unrestricted" in battery monitor; comparing Beta with Nightly. Nothing really makes much difference. Sometimes a tab process gets killed and I lose half my open tabs, sometimes the main process gets killed and Firefox cold launches when I switch to it.

I had Beta and Nightly opened simultaneously on my phone, each with 3-4 tabs open. They were both killed at different times this morning, typically when the phone was at a low level of activity as seen in the logs below

Expected behaviour

Firefox should not get killed due to excessive CPU

Actual behaviour

Several times per day Firefox parent or one tab process gets killed, resulting in loss of half my open tabs or complete closing of the app and re-launching

Device name

Galaxy M32

Android version

Android 13

Firefox release type

Firefox Beta

Firefox version

109.0b5

Device logs
adb logcat | grep "excessive"

01-08 08:07:54.764  1425  1476 I ActivityManager: Killing 16364:org.mozilla.firefox_beta/u0a295 (adj 920): excessive cpu 9340 during 300151 dur=1123666 limit=2
...
01-08 10:17:11.017  1425  1476 I ActivityManager: Killing 9589:org.mozilla.fenix:tab20/u0a275 (adj 900): excessive cpu 6940 during 300143 dur=928426 limit=2
Investigating the Beta kill in more detail, there is nothing exciting in the log beforehand, the phone was at a very low level of activity:-
adb logcat | grep "08:07:5"
 
01-08 08:07:50.117  1425  1425 D SGM:GameManager: identifyGamePackage. org.mozilla.fenix, mCurrentUserId: 0, callerUserId: 0, callingMethodInfo: com.android.server.ssrm.fgapps.GameAppUtils.isGame(GameAppUtils.java:84)
01-08 08:07:50.117  1425  1425 D SGM:PkgDataHelper: getGamePkgData(). org.mozilla.fenix
01-08 08:07:50.119  1425  1425 D SGM:GameManager: identifyGamePackage. org.mozilla.fenix, mCurrentUserId: 0, callerUserId: 0, callingMethodInfo: com.android.server.ssrm.fgapps.GameAppUtils.isGame(GameAppUtils.java:84)
01-08 08:07:50.119  1425  1425 D SGM:PkgDataHelper: getGamePkgData(). org.mozilla.fenix
01-08 08:07:51.156  3780  4631 I SDHMS:D : SIOP:: AP:175(283,100) BAT:288(288,0) USB:0(0,0) CHG:276(276,0) PA:283(283,0) BLK:0(0,0) SUBBAT:0(0,0) LRP:301(301,0) LRP2:286(286,0) LRF:301(301) LRB:300(300) LRF2:286(286) LRB2:284(284) AP2:286(286) CHG2:283(283) VAP:275(275) 
01-08 08:07:53.020  1425  1502 D PowerManagerService: UserActivityStateListenerState: 0
01-08 08:07:54.148 12030 12030 I wpa_supplicant: Heartbeat 6576
01-08 08:07:54.764  1425  1476 I ActivityManager: Killing 16364:org.mozilla.firefox_beta/u0a295 (adj 920): excessive cpu 9340 during 300151 dur=1123666 limit=2
Additional information

No response

┆Issue is synchronized with this Jira Task

Change performed by the Move to Bugzilla add-on.

Christian Sadilek [:csadilek]

Reporter

Updated

2 years ago

Blocks: 1752594

Christian Sadilek [:csadilek]

Reporter

Comment 1

2 years ago

Edited

Great summary in this comment on Github: https://github.com/mozilla-mobile/fenix/issues/28452#issuecomment-1382712532

The attached profile (https://share.firefox.dev/3kguBfB) shows CPU usage of the second content process while the app is in the background. There are frequent GC spikes that stand out to me and could be interpreted by the OS as excessive. The last spike is at 80% CPU.

No longer blocks: 1752594

Christian Sadilek [:csadilek]

Reporter

Updated

2 years ago

Blocks: 1752594

Mark

Comment 2

2 years ago

I also filed bug 1810370 against Focus. Similar but not identical behavior, probably the same root cause so if you want to close as duplicate go ahead.

Christian Sadilek [:csadilek]

Reporter

Comment 3

2 years ago

> I also filed bug 1810370 against Focus. Similar but not identical behavior, probably the same root cause so if you want to close as duplicate go ahead.

Thanks, let's keep tracking the Focus issue too.

Christian Sadilek [:csadilek]

Reporter

Comment 4

2 years ago

The other activity on the content processes from the profile above comes from the page's setTimeout calls. These calls don't seem to pass the excessive CPU threshold though.

Full profile with app in background: https://share.firefox.dev/3kguBfB
Android detection of excessive CPU: https://cs.android.com/android/platform/superproject/+/android-11.0.0_r48:frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java;l=17702

Ted, thanks for taking a first look with me! Is there anything we could do to optimize / improve GC scheduling here when the app (and/or tab) is in the background? For example, can we spread out the work of major GCs in this case?

Flags: needinfo?(tcampbell)

Steve Fink [:sfink] [:s:]

Comment 5

2 years ago

First thoughts:

It seems like the main options here are (1) break up the GC slices when in the background and (2) be less eager to GC when in the background. I'm not sure what exactly "in the background" should mean, though.

I'm going to make up some terminology for a distinction we probably ought to have:

frame-idle: When driven by the refresh driver, this is the short time between individual frames where we can do a small amount of work without missing the next refresh.
app-idle: the app (or a page) is in the foreground but there's nothing happening. No animations, no updates, no input events, no long processing even if it isn't visible to the user.
system-idle: the app is backgrounded. There may be setTimeout events firing and triggering JS code to run, possibly allocating memory.

These are not completely independent and may not be the right way to break things down, but they're better than a simple boolean "idle".

I think the behavior we would want is:

a) When in the foreground, we try to do as much GC work in frame-idle and app-idle time as possible.
b) When app-idle starts, be fairly aggressive about doing a last major GC.
c) When system-idle starts, be fairly aggressive about doing a last major GC. (We can go system-idle without ever being app-idle iiuc.)
d) After that GC (if it was found to be needed), be conservative about triggering GC.
e) If a GC is triggered during system-idle time, make it incremental with slice budgets and gap lengths set to stay comfortably below the excessive CPU thresholds.

Currently, the closest thing I know of that we have to system-idle is that if there has been no GC allocation at all, we will abort any GC that tries to start. But if setTimeouts are running and doing any allocation at all, this check will be defeated. So right now system-idle is treated the same as app-idle. Which means (a) says to set low thresholds and (e) says to set high thresholds, and since we don't really implement system-idle, (a) wins.

I was initially thinking that (e) would be the first thing to try, but having looked at the code linked above and the log messages, I'm skeptical. If I'm reading that correctly, the CPU usage threshold is calculated over the power check interval, which seems to be logged as about 300000ms or 5min. If the excessive CPU limit is 2% as it was in the log messages above, that means we get 6sec of CPU time allowed every 5min. The major GC in the posted profile consumed 379ms, which is 0.1% of the power check interval. The other 1.9% must have been from the setTimeout processing (and spinning things up and down to allow handling them).

I don't see how stretching out the major GC is really going to help. In order to drop that 379ms per interval appreciably, it would need to be stretched across several power check intervals, which means that we'd need to make the overall major GC take at least half an hour or so. Especially if we don't know whether we're system-idle vs app-idle, then that seems like a bad idea. When app-idle, we won't have finished the GC once things start happening again. And given that the GC was responsible for only 6.3% of the allowed CPU time (379ms of 6000ms allowed), that's very little gain at a high cost. (Note that the cost would not be high if we were detecting system-idle, but the gain is still low, at least based on the sample profile with a 379ms GC.)
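
For reference, the arithmetic in the two paragraphs above can be reproduced with a short Python sketch. It assumes, as read from the linked ActivityManagerService code, that the limit is a percentage of the power check interval. The function and constant names are made up for illustration and do not come from Android or Gecko.

```python
# Back-of-the-envelope check of the CPU budget numbers discussed above.
# Assumption: "limit" is a percentage of one power check interval.


def allowed_cpu_ms(interval_ms: float, limit_percent: float) -> float:
    """CPU time (ms) a process may use in one interval before hitting the limit."""
    return interval_ms * limit_percent / 100.0


def share_percent(part_ms: float, whole_ms: float) -> float:
    """part_ms expressed as a percentage of whole_ms."""
    return 100.0 * part_ms / whole_ms


INTERVAL_MS = 300_000  # ~5 min, from the "during 300151" log field
LIMIT_PERCENT = 2      # "limit=2" in the log
GC_MS = 379            # major GC duration in the posted profile

budget_ms = allowed_cpu_ms(INTERVAL_MS, LIMIT_PERCENT)
print(f"allowed CPU per interval: {budget_ms:.0f} ms")
print(f"GC share of the interval: {share_percent(GC_MS, INTERVAL_MS):.2f}%")
print(f"GC share of the allowed budget: {share_percent(GC_MS, budget_ms):.1f}%")
```

With these inputs the GC accounts for only a small slice of the budget, which is why stretching it out buys so little.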

It seems better to shoot for (c): when backgrounding, be fairly aggressive about running one last GC, then raise thresholds to try to prevent GCing at all while in the background. This does increase the risk of GCing immediately after the app is resumed again, but I think we already deal with that reasonably well: we'll wait to start the GC until frame-idle or app-idle time unless a lot of allocation happens quickly, and only if there is an extended non-idle period will we do work when the user might notice it.

This will require detecting system-idle time and taking it into account. I'm not entirely sure of the right way of doing that. I think it needs to make it into TaskController and/or its IdleTaskManager. I don't know if that should be done in the callback for Android's onStop() and onResume() events? If onStop() is called, is it ok to do some extra processing (a final GC if the lower thresholds want) and not have it count against the initial power check interval CPU threshold? (It'll at least be the highest threshold of the four, specifically it looks like it's mConstants.POWER_CHECK_MAX_CPU_1.)

Bas, what's the right way of handling this information in TaskController?

Who could do the Android-side plumbing?

Flags: needinfo?(tcampbell) → needinfo?(bas)

Steve Fink [:sfink] [:s:]

Comment 6

2 years ago

I should also mention that at least for this specific profile, fixing GC probably wouldn't prevent it from being killed. That would require slowing down setTimeout callbacks or something.

Also, based on skimming through Android docs, it seems like it would be useful to have profiler markers for lifecycle callbacks: onStop, onPause, onResume, etc. (So for example I could select the range from an onStop to the end and see what our notion of the CPU usage was.) It could be useful for analyzing things like this. I don't know where the callbacks for that sort of thing are, though; in the profile, searching for "stop" and "resume" returned nothing relevant.

Christian Sadilek [:csadilek]

Reporter

Comment 7

2 years ago

Edited

Thanks, Steve, for looking into this and the summary!

> Who could do the Android-side plumbing?

The Android team can work on / help with this if we all agree that it's worth doing. Based on your summary, I agree that (c) would be worth trying.

> Also, based on skimming through Android docs, it seems like it would be useful to have profiler markers for lifecycle callbacks: onStop, onPause, onResume, etc.

Yes, we can add those. We currently only have a marker for onStart. I filed bug 1811133 for this.

> I should also mention that at least for this specific profile, fixing GC probably wouldn't prevent it from being killed. That would require slowing down setTimeout callbacks or something.

Yes, I think that's right for this particular page. The GC work would also affect otherwise "well-behaved" sites though, IIUC. Of course, it would be worse for pages with lots of work triggered by timeouts. Throttling timeouts seems to already be supported though, and I wonder if we could "simply" configure this more aggressively on Android? Andreas, I found your old post about this [1] and bug 1377766. Does that sound reasonable to you? I am also wondering if this functionality is active / working correctly on Android, since the timeouts seem not to be affected in the profile above.

[1] https://groups.google.com/g/mozilla.dev.platform/c/hcEqovQrBts

Paul Bone [:pbone]

Comment 8

2 years ago

The "first" profile (in the github bug): https://share.firefox.dev/3kgBQEp doesn't even have any GCMajor events. So I also doubt that GC is a major contributor here.

My other thoughts.

The github issue describes two symptoms, though this is not clear from comment 0: 1. the process gets killed, 2. CPU usage is high until the tab is viewed.
I don't see especially high CPU usage in the profiles, yeah it's active and there are many wakeups (I would really like close to zero wakeups if my phone is idle), but it doesn't seem to match the description. I'm not sure what's happening here, maybe I'm reading the profiles wrong?
I think the high CPU usage before the tab is viewed is a more interesting observation. I'd check with some DOM folk but I don't know who specifically.

Steve Fink [:sfink] [:s:]

Comment 9

2 years ago

Looking at the other profile and the comment in the github issue that Paul points to, I agree with his conclusions. The issue filed doesn't seem to be GC related after all.

Our GC scheduling still seems wrong, and may be contributing to similar problems. But both of the profiles here are due to Fenix tab processes doing too much non-GC work before the tab is viewed (and after it is backgrounded?)

Christian Sadilek [:csadilek]

Reporter

Comment 10

2 years ago

> Looking at the other profile and the comment in the github issue that Paul points to, I agree with his conclusions.

The other (first) profile is with Fenix in the foreground though, where different (less restrictive) rules apply. Here specifically we're trying to pinpoint all CPU activity that occurs on processes while the app is in the background, which get penalized by the OS. And GC contributes to that, and so do the timeouts on this page.

Mark

Comment 11

2 years ago

I think there is substantial CPU load which is detected by about:processes and adb shell top but maybe not by the profiler. I could be wrong of course.

I think ActivityManager killing in the background is also a symptom of the bug and not the bug itself. I think the bug is just that new tabs which are loaded but not yet viewed use substantial CPU indefinitely until they get viewed. about:processes and adb shell top show the CPU load, maybe the profiler does not.

I have simpler STR, maybe have a go at these and see what you think.

1. Close all tabs. Kill & relaunch Fenix.
2. Open about:processes in tab 1.
3. Open www.bestbuy.com in tab 2.
4. Long press the Best Buy logo and open another www.bestbuy.com in a new background tab. Do not switch to the new tab so it remains unviewed.
5. Switch back to about:processes. When everything has settled down Fenix is still using substantial CPU, I get ~40% for the main process and ~20% for one of the content processes. That CPU load stays indefinitely and is due to the unviewed Best Buy tab.
6. Switch to the unviewed Best Buy tab to view it and then switch straight back to about:processes. CPU load on Main drops to ~5% and on the content process to <<1%. The act of viewing stops the excessive CPU load.

adb shell top shows similar CPU load, and shows that it continues even when Fenix app is in the Android background.

I think it's a small problem when Fenix is in the Android foreground, it wastes some battery. But it's a bigger problem when Fenix is in the Android background because ActivityManager dislikes apps using CPU in the background. I don't understand the rules which ActivityManager applies so I don't know exactly what leads to a kill. It takes many minutes or hours before it kills.

Mark

Comment 12

2 years ago

More clarity... Profiles seem to show intermittent cpu spikes on just a content process. That is not what I see using top or about:processes. Instead I see constant moderate cpu due to unviewed background tabs on content AND main process. Viewing the tab(s) stops the cpu load. I think it's real, a couple of unviewed background tabs makes my phone warm even when fenix is backgrounded, the phone is locked and in my pocket.

Kayacan Kaya [:kaya]

Comment 13

2 years ago

Great investigations! Jumping in to provide more context around how ActivityManager behaves:
From this link, it can be seen that at every power check step the allowed cpu percentage is set to a different constant. These constants are as follows (see lines 396-411): ActivityManagerConstants.java#396

The values of those constants are (in %):

private static final int DEFAULT_POWER_CHECK_MAX_CPU_1 = 25; 
private static final int DEFAULT_POWER_CHECK_MAX_CPU_2 = 25; 
private static final int DEFAULT_POWER_CHECK_MAX_CPU_3 = 10; 
private static final int DEFAULT_POWER_CHECK_MAX_CPU_4 = 2;

So, after the 3rd power check (15 mins), we are actually limited to only use 2% of the CPU. That's why in the first three power check intervals, we have more than 6 secs of CPU usage limit.
Therefore the following condition is valid only for the 4th interval, not for every 5 mins:

> If I'm reading that correctly, the CPU usage threshold is calculated over the power check interval, which seems to be logged as about 300000ms or 5min. If the excessive CPU limit is 2% as it was in the log messages above, that means we get 6sec of CPU time allowed every 5min.

Building on Christian's suggestion to throttle more aggressively on Android, maybe we can feed this gradually decreasing percentage limit and the power check interval value into our throttling algorithm, adjusting the execution budget so that timeouts are delayed more aggressively.
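
As a rough illustration of the stepped limits, here is a small Python sketch. It is an approximation under stated assumptions, not a copy of the Android logic: it assumes the limit is picked from how long the process has been under check (the dur= field in the kill lines), one step per elapsed interval, and that the interval is 300000 ms. The function names are hypothetical.

```python
# Defaults quoted above from ActivityManagerConstants (values in %).
DEFAULT_POWER_CHECK_MAX_CPU = (25, 25, 10, 2)
# Assumed interval length, taken from the "during ~300000" log field.
POWER_CHECK_INTERVAL_MS = 300_000


def cpu_limit_percent(dur_ms: int) -> int:
    """Limit for a process that has been under check for dur_ms.

    Assumption: one step per elapsed interval, staying at the last
    (strictest) value once all steps are used up.
    """
    step = dur_ms // POWER_CHECK_INTERVAL_MS
    last = len(DEFAULT_POWER_CHECK_MAX_CPU) - 1
    return DEFAULT_POWER_CHECK_MAX_CPU[min(step, last)]


def allowed_cpu_ms(dur_ms: int) -> float:
    """CPU time allowed within one interval at the current step."""
    return POWER_CHECK_INTERVAL_MS * cpu_limit_percent(dur_ms) / 100.0


# dur= values taken from kill lines quoted elsewhere in this bug.
for dur in (762_507, 1_123_666):
    print(f"dur={dur}: limit {cpu_limit_percent(dur)}%, "
          f"{allowed_cpu_ms(dur) / 1000:.0f} s of CPU per interval")
```

Under these assumptions the sketch reproduces the limit= values seen in the logs (10 for dur=762507, 2 for dur=1123666), so a throttling budget derived this way would tighten in the same steps as the OS does.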

Mark

Comment 14

2 years ago

(an unimportant but fun post, well I enjoyed it anyway)

@kaya. Nice! Thanks for the info!

4x www.bestbuy.com tabs loaded in the background and not viewed, Fenix put in the background at 16:30

top says 10-15% CPU for the main process and for each content process. There are occasional spikes up to ~30%. Stays constant “forever”:-

Tasks: 4 total,   0 running,   4 sleeping,   0 stopped,   0 zombie
Mem:  5788952K total,  5364172K used,   424780K free,     3012K buffers
Swap:  4194300K total,  2253296K used,  1941004K free,  1090608K cached
800%cpu  35%user   1%nice  31%sys 733%idle   0%iow   0%irq   0%sirq   0%host
PID USER         PR  NI VIRT  RES  SHR S[%CPU] %MEM     TIME+ ARGS            
25089 u0_a275      20   0  27G 318M 147M S 15.0   5.6   3:04.95 org.mozilla.fen+
25211 u0_a275      20   0 9.1G 377M  69M S 13.0   6.6   1:54.04 org.mozilla.fen+
26698 u0_a275      20   0 9.0G 345M  66M S 10.0   6.0   1:30.46 org.mozilla.fen+
25274 u0_a275      20   0 7.5G 128M  90M S  0.0   2.2   0:04.71 org.mozilla.fen+

After 25 minutes those processes all failed the 10% CPU test and the whole lot were killed simultaneously:-

01-19 16:55:06.866  1647  1695 I ActivityManager: Killing 18183:org.mozilla.fenix/u0a275 (adj 900): excessive cpu 34010 during 300001 dur=762507 limit=10
01-19 16:55:06.870  1647  1695 I ActivityManager: Killing 19358:org.mozilla.fenix:tab24/u0a275 (adj 920): excessive cpu 34670 during 300001 dur=762507 limit=10
01-19 16:55:06.871  1647  1695 I ActivityManager: Killing 22392:org.mozilla.fenix:tab15/u0a275 (adj 920): excessive cpu 33230 during 300001 dur=762507 limit=10

Second try,

1x bestbuy.com tab loaded in background and not viewed, Fenix put in background at 17:22

top says 6-9% for main and one content process, other is idle at <1%, as expected

It actually took just over 90 minutes. But then the main process and one content process got killed for exceeding the 2% limit. Pretty much as expected only it took a long time.

01-19 18:56:07.035  1647  1695 I ActivityManager: Killing 25089:org.mozilla.fenix/u0a275 (adj 900): excessive cpu 13230 during 300001 dur=4343161 limit=2
01-19 18:56:07.041  1647  1695 I ActivityManager: Killing 27377:org.mozilla.fenix:tab0/u0a275 (adj 910): excessive cpu 11490 during 300001 dur=4343161 limit=2
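
For anyone who wants to check their own logs, here is a small Python sketch that pulls the numbers out of these kill lines and computes the CPU percentage. It assumes the number after "excessive cpu" is CPU time in ms and the number after "during" is the measurement window in ms; that reading is consistent with the limits shown, but it is an interpretation rather than documented behaviour. The regex and function name are made up for this example.

```python
import re

# Matches the ActivityManager kill lines as they appear in this bug.
KILL_RE = re.compile(
    r"Killing (?P<pid>\d+):(?P<proc>\S+) \(adj (?P<adj>\d+)\): "
    r"excessive cpu (?P<cpu_ms>\d+) during (?P<during_ms>\d+) "
    r"dur=(?P<dur_ms>\d+) limit=(?P<limit>\d+)"
)


def parse_kill_line(line):
    """Return a dict of fields plus the derived CPU percentage, or None."""
    match = KILL_RE.search(line)
    if match is None:
        return None
    info = {"proc": match.group("proc")}
    for key in ("pid", "adj", "cpu_ms", "during_ms", "dur_ms", "limit"):
        info[key] = int(match.group(key))
    # Assumed meaning: CPU ms used within the "during" window.
    info["cpu_percent"] = 100.0 * info["cpu_ms"] / info["during_ms"]
    return info


line = (
    "01-19 18:56:07.035  1647  1695 I ActivityManager: Killing "
    "25089:org.mozilla.fenix/u0a275 (adj 900): excessive cpu 13230 "
    "during 300001 dur=4343161 limit=2"
)
info = parse_kill_line(line)
print(f"{info['proc']}: {info['cpu_percent']:.1f}% CPU vs limit {info['limit']}%")
```

It can be fed from `adb logcat | grep "excessive"` one line at a time.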

Florian Quèze [:florian]

Comment 15

2 years ago

(In reply to Christian Sadilek [:csadilek] from comment #1)

> The attached profile (https://share.firefox.dev/3kguBfB) shows CPU usage of the second content process while the app is in the background.

I'm surprised when looking at this profile to see how often we execute setTimeout callbacks despite the process being in the "background" priority. On Firefox Desktop when a page is in a background tab, its timers are throttled and can't execute more than once per second. Here some timers seem to run every 100ms.
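
To put numbers on the difference, here is a toy Python model of a minimum-delay clamp. It is not Gecko's TimeoutManager, only a sketch that assumes a repeating timer whose delay is raised to a minimum value when throttled; the names are hypothetical.

```python
def effective_delay_ms(requested_ms: int, min_delay_ms: int = 0) -> int:
    """Delay actually used once a minimum-delay clamp is applied."""
    # Guard against a zero delay so the division below is always valid.
    return max(requested_ms, min_delay_ms, 1)


def callbacks_per_window(requested_ms: int, window_ms: int, min_delay_ms: int = 0) -> int:
    """How many times a repeating timer fires within window_ms."""
    return window_ms // effective_delay_ms(requested_ms, min_delay_ms)


WINDOW_MS = 300_000  # one ~5 min power check interval

unthrottled = callbacks_per_window(100, WINDOW_MS)
throttled = callbacks_per_window(100, WINDOW_MS, min_delay_ms=1000)
print(f"100 ms timer, unthrottled: {unthrottled} callbacks per window")
print(f"100 ms timer, clamped to 1 s: {throttled} callbacks per window")
```

A single 100 ms timer drops from 3000 wakeups to 300 per window with the 1 s clamp, before any budget throttling is considered.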

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 16

2 years ago

janv, perhaps you know what all the "Runnable - StorageNotifierService::Broadcast - priority: Normal (4)" tasks are in the profile in comment 15. There are lots of them.

Flags: needinfo?(jvarga)

Julien Wajsberg [:julienw]

Updated

2 years ago

Performance Impact: --- → ?

Kayacan Kaya [:kaya]

Comment 17

2 years ago

Referring to comments 7 and 15, I did some investigation to see what our current state on Firefox Android is. Here are some of my findings:
After testing several use cases in the app (e.g. creating tabs, moving the app between foreground and background, switching between tabs), I can confirm that the app process' foreground/background state is independent from the tab processes' foreground/background states (which is currently a bug on Android and is different from how the desktop app behaves). The timer throttling logic is also independent from the processes' priorities, but it is dependent on the docshell's active/inactive state.
When I was testing the app (I will exclude the steps I took to keep the ticket clean, but no specific set of steps is necessary to reproduce what I observed), unfortunately I was not able to get a throttled TimeDuration w.r.t. the execution budget in TimeoutManager::MinSchedulingDelay. This is because TimeoutManager::BudgetThrottlingEnabled mostly returns false. And in the cases when it returns true, the (mExecutionBudget < TimeDuration()) condition is not satisfied, as the execution budget gets a value of 50000000.

Considering this current state, Andreas, I would like to raise a couple of questions to you referencing to this ticket:
1- How can we change the timer throttling algorithm to make it work for Android? Would it be possible to change it in a way that it looks for processes' priorities instead of the docshell's active/inactive state?
2- If it is not possible to adjust the algorithm to make it work on Android as well, do you have any other alternative ideas to consider for the Android client?
3- There is also the OcclusionStateChanged event which was used to indicate that the app is in the background, could we reuse that to make things work?
Thanks in advance!

Flags: needinfo?(afarre)

Andreas Farre [:farre]

Comment 18

2 years ago

First thing: there are two setTimeout throttling mechanisms. The throttle to once every 1 second and the budget throttling. Both of these only occur when the tab is in the background (i.e. the active/inactive state). Both should of course work for Android.

Let me see if I understand this correctly, you suggest that the active/inactive state for browsing contexts on Android should be a dud, and not have any meaning? Or that it already doesn't mean anything, and we should instead have process priority co-opt the meaning of the active/inactive state? I think neither is a good idea. I think we need to get the browsing context active/inactive state working for Android (if it isn't working that is).

I'm not sure I understand what it means to look at process priorities here. Are we talking about content processes? In that case I think it also wouldn't be ideal. If we have a 100 tabs open, with pages in the same process (but not reachable through window.opener et al), we'd not be able to throttle the 99 tabs that are in the background.

I think that the algorithm is working for Android, or at least that it should work for Android. If the Android client hasn't got a way to communicate to browsing contexts which contexts are active, then that is the thing that we need to fix.
I think this already happens. RecomputeAppWindowVisibility() is called on OcclusionStateChanged.

So what could be wrong here? The execution budget feels very fishy to me. 50000000 ms, at a regeneration rate of 1 ms every 100 ms, means that it took 57 days(!) to get it as high as 50000000. But that's really beside the point, because the execution budget is clamped to not allow it to go over the maximum allowed execution budget.
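
The 57 days figure can be checked with a few lines of Python. This only redoes the arithmetic from the paragraph above, treating the reported value as milliseconds and the regeneration rate as 1 ms of budget per 100 ms of wall-clock time; the function name is made up.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def days_to_accumulate(budget_ms: float, regen_ms: float = 1, per_ms: float = 100) -> float:
    """Wall-clock days needed to regenerate budget_ms from zero.

    regen_ms of budget is gained for every per_ms of wall-clock time.
    """
    wall_clock_ms = budget_ms * per_ms / regen_ms
    return wall_clock_ms / MS_PER_DAY


print(f"{days_to_accumulate(50_000_000):.1f} days")
```

A value that large cannot come from regeneration alone, which fits the suspicion that the reported number is not a real accumulated budget.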

Are you sure that throttling (both the 1s one and budget throttling) isn't working for actual background tabs? And that it's only for background app that it's broken?

Does this reproduce in GeckoView? If it does I can have a quick look at it and maybe I can find any potential bugs.

Flags: needinfo?(afarre) → needinfo?(kkaya)

Mark

Comment 19

2 years ago

Yes, it would be worth checking GeckoView. Focus behaves differently from Fenix which might be a clue.

With Fenix I think throttling is only triggered on tab transition from foreground to background. Throttling is not triggered by app transition from foreground to background. With Focus I think throttling is never triggered.

Steve Fink [:sfink] [:s:]

Updated

2 years ago

Comment 20

2 years ago

The bug related to the setTimeout callbacks on Android is filed in the following ticket with a short description about why it happens: https://bugzilla.mozilla.org/show_bug.cgi?id=1815015

Flags: needinfo?(kkaya)

Chris Peterson [:cpeterson]

Updated

2 years ago

Severity: -- → S3

Joe Walker [:jwalker]

Updated

2 years ago

Keywords: requested-mobile

Bas Schouten (:bas.schouten)

Comment 21

2 years ago

Olli would probably have the most expertise in this direction. If we want to somehow implement this in TaskController directly we could, but I think this can be handled through the IdleTaskManager (or another TaskManager).

Flags: needinfo?(bas) → needinfo?(smaug)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 22

2 years ago

Oh, we mark processes being background even though the tab is still foreground, but the app as whole is in background?
(just based on the profile)

florian, or anyone, do you know how that is implemented on Android?

Flags: needinfo?(florian)

Florian Quèze [:florian]

Comment 23

2 years ago

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #22)

> Oh, we mark processes being background even though the tab is still foreground, but the app as whole is in background?

Actually, we don't, you can check the process priority markers. (or I'm confused and there's a new profile I haven't looked at)

> florian, or anyone, do you know how that is implemented on Android?

Here's a comment I wrote on Slack that I think answers:

Investigating this a bit, with a profile that includes IPCs https://share.firefox.dev/3R4rEuI, the timers are not throttled, but completely suspended. It's triggered by the PContent::Msg_CommitBrowsingContextTransaction IPC, and there's a call to nsGlobalWindowInner::Suspend with this stack:
nsGlobalWindowInner::Suspend(bool) dom/base/nsGlobalWindowInner.cpp
mozilla::dom::BrowsingContextGroup::UpdateToplevelsSuspendedIfNeeded() docshell/base/BrowsingContextGroup.cpp
mozilla::dom::BrowsingContext::DidSet(...) docshell/base/BrowsingContext.cpp
mozilla::dom::syncedcontext::Transaction<mozilla::dom::BrowsingContext>::Apply(...) const docshell/base/SyncedContextInlines.h
...
mozilla::dom::ContentChild::RecvCommitBrowsingContextTransaction(...) dom/ipc/ContentChild.cpp
mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) ipc/ipdl/PContentChild.cpp
PContent::Msg_CommitBrowsingContextTransaction

Flags: needinfo?(florian)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 24

2 years ago

But the profile in comment one only ever sets priority to Background.

Florian Quèze [:florian]

Comment 25

2 years ago

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #24)

> But the profile in comment one only ever sets priority to Background.

That profile, according to https://github.com/mozilla-mobile/fenix/issues/28452#issuecomment-1382712532, opens tabs in the background. These tabs are never put in the foreground, so maybe we miss the opportunity to suspend the timers there? I have not been able to reproduce myself the exact behavior in that profile, where we end up with processes in the background priority, but firing timers very often. I think kaya managed to reproduce.

Kayacan Kaya [:kaya]

Comment 26

2 years ago

In the case of navigating between the tabs, the timers -as Florian pointed out- seem to be completely suspended. However, in the scope of this ticket where we are opening links in a new tab and not switching to the newly created tabs, the case is a bit different. The newly created [and not switched] tabs get FOREGROUND ProcessPriority and fire non-stop setTimeout callbacks though they are not visible. This isBackground check is buggy on Android and needs to be improved by us sending OcclusionStateChangeEvent that would mark the newly created tab as background in terms of docshell's active/passive state.

I may also provide some related info for the implementation of ProcessPriority on Android:
Not sure what you exactly mean by "marking processes being background" but I will answer this w.r.t. the "ProcessPriority" attribute.

When app is on the foreground and we have only one tab open, the ProcessPriority of that tab is FOREGROUND.
When we take the app to background (e.g. locking the screen, pressing the home button etc.), the tab process' ProcessPriority is still FOREGROUND.
When the app is on the foreground and we create a new tab from tab's tray and switch to the newly created tab:
-- the newly created tab gets FOREGROUND ProcessPriority
-- 3secs after [navigating to the new tab], the previously visible tab gets BACKGROUND ProcessPriority.
When the app is taken to background again [with two tab/content processes one having FOREGROUND and other one having BACKGROUND ProcessPriority], nothing changes for the tab processes' ProcessPriority fields. They stay as they are.
In simple terms: app's foreground/background state does not have any impact on what the tab process' priorities get. Only the navigation between the tabs affect the tab process' priority assignments.
There is a bug filed on Android to reflect this buggy behavior of process priority assignment.
Note: There is a mapping of ProcessPriority field from cpp to java modules as implemented here.
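
The observed behaviour above can be summarized as a toy state model in Python. This is only a restatement of the observations in this comment, not the GeckoView implementation: the class and method names are invented, and the roughly 3 second delay before the previous tab is demoted is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

FOREGROUND = "FOREGROUND"
BACKGROUND = "BACKGROUND"


@dataclass
class TabPriorityModel:
    """Toy model of the ProcessPriority assignments observed on Android."""

    priorities: Dict[str, str] = field(default_factory=dict)
    selected: Optional[str] = None
    app_in_foreground: bool = True

    def open_and_switch(self, tab: str) -> None:
        """New tab opened from the tabs tray and switched to."""
        previous = self.selected
        self.priorities[tab] = FOREGROUND
        self.selected = tab
        if previous is not None:
            # Observed roughly 3 s after the switch; the delay is not modelled.
            self.priorities[previous] = BACKGROUND

    def open_in_background(self, tab: str) -> None:
        """Link opened in a new tab that is never viewed."""
        # Observed: the unviewed tab still gets FOREGROUND priority.
        self.priorities[tab] = FOREGROUND

    def set_app_foreground(self, in_foreground: bool) -> None:
        """App moved to the Android foreground or background."""
        # Observed: tab process priorities are left untouched.
        self.app_in_foreground = in_foreground


model = TabPriorityModel()
model.open_and_switch("tab0")
model.open_in_background("tab1")
model.set_app_foreground(False)
print(model.priorities)
```

In this model both tabs are still FOREGROUND after the app is backgrounded, which is the situation where timers keep firing unthrottled.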

Jan Varga [:janv]

Comment 27

2 years ago

Edited

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #16)

> janv, perhaps you know what all the "Runnable - StorageNotifierService::Broadcast - priority: Normal (4)" tasks are in the profile in comment 15. There are lots of them.

StorageNotifierService::Broadcast is used to fire storage events. If there are too many of them, the page is probably calling localStorage.setItem a lot.
https://cdn.quantummetric.com/qscripts/quantum-telegraph.js shows up in the profile and contains localStorage.setItem calls

Flags: needinfo?(jvarga)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 28

2 years ago

The background check is not buggy ;) Android doesn't seem to tell that the page is in background, so the platform code can't do much to that.

About the process priority, the profile in comment 1 shows that there are ProcessPriority markers, so something is moving a child process to the background. Though, using that information for setTimeout wouldn't be useful, since the same process may have tabs in background and foreground, and relying on ProcessPriority would not catch that case.

Flags: needinfo?(smaug)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 29

2 years ago

Edited

Btw, kaya, just say if you think the platform side is missing some helper code here which would make it easier for Fenix to tell which tab is in background. Desktop frontend seems to rely on things like https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/toolkit/content/widgets/browser-custom-element.js#422-423
On the content process side it leads to https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/docshell/base/BrowsingContext.cpp#2688,2732 and then https://searchfox.org/mozilla-central/rev/66aa740e65a343659a7446b890145781f1b6a344/docshell/base/nsDocShell.cpp#4948

Flags: needinfo?(kkaya)

Kayacan Kaya [:kaya]

Comment 30

•

2 years ago

Hi Olli, on Android we are missing the OcclusionStateChanged event that is fired on desktop (Windows and macOS). Some of the methods you mentioned are covered in this ticket as well. I am currently working on integrating the window occlusion state tracker API on Android too. We had a discussion with Andreas [:farre] about this event, which eventually sets the correct docshell states that are returned here. By sending the OcclusionStateChanged event on Android, we will set the docshell's active/passive state correctly, which would help us throttle the timers when a link is opened in a new tab that is not switched to.

Flags: needinfo?(kkaya)

Bas Schouten (:bas.schouten)

Comment 31

•

2 years ago

The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: Android
[x] Causes severe resource usage
[x] Able to reproduce locally
[x] Bug affects multiple sites

Performance Impact: ? → medium

Keywords: perf:resource-use

Mark

Comment 32

•

1 year ago

Not sure if bug 1815015 was supposed to fix this? I can't comment on bug 1815015, it's closed

I still have problems in Nightly Build 20159475555, 2023-04-29T04:06:49

Open 7 background tabs on bestbuy.com at 08:08, leave phone for an hour:-

04-30 08:53:31.257 1611 1660 I ActivityManager: Killing 19651:org.mozilla.fenix:tab17/u0a284 (adj 700): excessive cpu 7100 during 300410 dur=936948 limit=2
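
For anyone comparing kill lines like the one above, here is a small sketch that pulls out the numbers. The interpretation is an assumption, not something confirmed in this bug: it treats `excessive cpu` as CPU milliseconds used, `during` as the measurement window in milliseconds, and `limit` as a percentage. The function name `parse_excessive_cpu_kill` is made up for this example.

```python
import re

# Matches the ActivityManager "excessive cpu" kill lines quoted in this bug.
KILL_RE = re.compile(
    r"Killing (?P<pid>\d+):(?P<process>\S+)/\S+ \(adj (?P<adj>\d+)\): "
    r"excessive cpu (?P<cpu_ms>\d+) during (?P<window_ms>\d+) "
    r"dur=(?P<dur_ms>\d+) limit=(?P<limit_pct>\d+)"
)


def parse_excessive_cpu_kill(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = KILL_RE.search(line)
    if match is None:
        return None
    cpu_ms = int(match.group("cpu_ms"))
    window_ms = int(match.group("window_ms"))
    limit_pct = int(match.group("limit_pct"))
    # Assumption: CPU share = CPU time used / length of the measurement window.
    cpu_pct = 100 * cpu_ms / window_ms
    return {
        "pid": int(match.group("pid")),
        "process": match.group("process"),
        "adj": int(match.group("adj")),
        "cpu_ms": cpu_ms,
        "window_ms": window_ms,
        "limit_pct": limit_pct,
        "cpu_pct": cpu_pct,
        "over_limit": cpu_pct >= limit_pct,
    }


line = (
    "04-30 08:53:31.257 1611 1660 I ActivityManager: Killing "
    "19651:org.mozilla.fenix:tab17/u0a284 (adj 700): excessive cpu 7100 "
    "during 300410 dur=936948 limit=2"
)
kill = parse_excessive_cpu_kill(line)
```

Under that reading, the tab process used roughly 2.4% CPU over a window of about five minutes, just above the 2% limit.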

Maybe things are better, but Fenix still isn't stable for me with a bunch of background tabs.

https://share.firefox.dev/3Nl11C8

Mark

Comment 33

•

1 year ago

Could be a red herring?

I had 1x foreground tab (ie I viewed it) & 7x background tabs (ie unviewed) all open on bestbuy.com, with Fenix in the Android background. Perhaps the problem was actually the single foreground tab? IIUC bug 1815015 might only fix the background tab issue, the foreground tab might still be using CPU?

Kayacan Kaya [:kaya]

Comment 34

•

1 year ago

Hi Mark, thank you for checking in!
A definitive fix for this issue is tricky, as the purpose of all the fixes is basically to let the processes live longer. At the end of the day, however, they cannot guarantee that the processes will not be killed by the OS. The fix in 1815015 improves background tabs by reducing their CPU consumption. The size of the reduction depends on the website's script and how frequently it was triggering setTimeout callbacks. With that fix, the average CPU reduction is around 7% for a single backgrounded tab (test case: CPU values collected at 10 s intervals over 100 seconds for 5 different backgrounded websites). The reduction ranges from 2.5% to 25% across the webpages in the test case. In your case, I would expect a better result with that fix, but I cannot guarantee that your process will still be alive after an hour.

We only have 2 content processes that hold all the tabs you've opened. This means the CPU consumption of all the tabs in a process adds up to the CPU consumption of that process. Taking the average 7% reduction into account (it could be less or more in your case), the fix in 1815015 should have reduced the CPU consumption by ~25% for each process (p1: 4 background tabs, p2: 1 foreground + 3 background tabs). However, from the profile you sent, it still looks like they are consuming enough CPU for the OS to notice and for ActivityManager to kill them.
I would expect your foreground tab to be on the safe side when the app is sent to the background. Before the fix in 1815015, however, the background tabs kept consuming CPU consistently even though the app was backgrounded; that should now be resolved as well (most of the spikes in your profile are the throttled timeout callbacks, which are expected).
I will inspect the profile you linked once more to see whether there are any other points that could still be improved.
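
The per-process estimate above can be reproduced with simple arithmetic. This sketch assumes the per-tab reductions add up linearly within a process and that only background tabs benefit; the helper name `estimated_process_reduction` is made up for illustration, and the 7% figure is the average quoted above (individual sites ranged from 2.5% to 25%).

```python
# Average CPU reduction per backgrounded tab, as quoted above (percent).
AVG_REDUCTION_PER_BACKGROUND_TAB = 7.0


def estimated_process_reduction(background_tabs, per_tab=AVG_REDUCTION_PER_BACKGROUND_TAB):
    """Estimated reduction for one content process.

    Assumption: reductions from individual background tabs simply add up.
    """
    return background_tabs * per_tab


# p1 holds 4 background tabs; p2 holds 1 foreground + 3 background tabs.
p1_reduction = estimated_process_reduction(4)
p2_reduction = estimated_process_reduction(3)
mean_reduction = (p1_reduction + p2_reduction) / 2
```

With these inputs the two processes come out at 28% and 21%, which averages to 24.5% and matches the rough ~25% figure.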

So, all in all, I expect that we will do better in general with the fix in 1815015 (I would be concerned otherwise), and I know that this ticket will not have a definitive fix. The goal is to reduce CPU consumption and let the processes live as long as we can, in the hope that the user comes back in time and prioritizes the app once again so that the OS does not kill it. This means there will always be room for improvement (and room for the OS to kill the processes). As for next steps and further improvements to CPU consumption, we will link tickets here after running our investigations.

Mark

Comment 35

•

1 year ago

Thanks for the detailed & interesting reply. I do think that it is better in everyday use; my test was quite artificial & perhaps unduly severe. Keep up the good work & thanks!

Jeff Boek [:boek]

Updated

•

1 year ago

Priority: -- → P2

John Thompson

Comment 36

•

1 year ago

Hi,
Great what you're doing here.
I just dropped in with a comment on high CPU usage (PC / Windows 10, I know this thread is about Android but the following is likely applicable cross-platform).
I was having problems with a particular site that I use a lot. Every time I opened a page from that site, CPU would jump to 70%, then settle down at 40-50% for about a minute or so before easing off. As soon as I loaded another page from the same site, or even just moved in the same page, it would start again. Didn't matter if the page was in a tab not displayed, or without focus or even in a minimized window.
I cleared the site data accumulated over three years for that particular site (132 cookies, 156 MB of storage) and the problem seems to be solved.
Maybe users could be warned when this kind of situation arises? With suitable disclaimers re losing unsaved data etc. of course.

Mark

Comment 37

•

10 months ago

Any more progress on this? It's better, but I am still getting perhaps one excessive CPU kill per day even with only 4 tabs open.
Perhaps the next big win will be Fission? Is Fission stable enough to be worth enabling it? (I use Nightly)
Thanks all!

Mark

Comment 38

•

10 months ago

I tried Fission, it improves things still further. Tabs (almost) never get killed through "Excessive CPU". If I use my phone gently to keep memory pressure moderate, I can keep the same 6-8 Fenix tabs loaded in memory for days. Fission is surprisingly usable too.

I've got to congratulate the Mozilla team, Fenix Nightly has become noticeably faster and much more stable for me during 2023. A year ago bringing a Fenix tab to the foreground was miserable, it was quite likely to reload the tab or even the whole browser. But now that almost never happens, for me anyway. Maybe you have more work to do for phones with small amounts of RAM.

I still think the Fenix UI sucks compared to Fennec, but as a basic "tabbed browser" Fenix is now really good.

Patricia Lawless

Updated

•

6 months ago

Blocks: perf-android

Frank Doty [:fdoty]

Updated

•

5 months ago

Blocks: 1894804

Chris Peterson [:cpeterson]

Comment 39

•

5 months ago

Clearing Priority so we can reprioritize these bugs relative to our ux-fun-2024 bugs.

Priority: P2 → --

Markus Stange [:mstange]

Updated

•

4 months ago

Depends on: 1859846