Closed Bug 1917184 Opened 1 year ago Closed 8 months ago

MemoryTelemetry::GatherReports should not run in idle processes

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
136 Branch
Performance Impact medium
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- wontfix
firefox130 --- wontfix
firefox131 --- wontfix
firefox132 --- wontfix
firefox134 --- wontfix
firefox135 --- fixed
firefox136 --- fixed

People

(Reporter: florian, Assigned: pbone)

References

(Blocks 3 open bugs, Regression)

Details

(Keywords: perf:resource-use, regression)

Attachments

(5 files, 1 obsolete file)

Here is a 10 minute subset of a power profile I took over the night on my Macbook: https://share.firefox.dev/3z94fDY (If you are curious the full profile is at https://share.firefox.dev/4dPX96F ; I was testing my patches for bug 1914862).

In the profile, we see that every content process (including the preallocated ones that should be completely inactive):

  • starts every minute a MemoryTelemetry::GatherReports runnable (off a 59,999ms timer that we can see on the timer thread),
  • the main thread MemoryTelemetry::GatherReports runnable spawns a new StreamTrans thread, which also runs a MemoryTelemetry::GatherReports runnable.
  • both the main thread and the StreamTrans MemoryTelemetry::GatherReports runnables save data to Telemetry Histograms.
  • saving any data in Telemetry starts a timer, which triggers an IPC to the parent process after 2 seconds (also causing a wake-up of the parent process)
  • after 30s of idleness, the StreamTrans thread is shutdown (also waking up the main thread).

So that's at least 3 (more likely significantly more) process wake-ups per minute per content process.

In the profile, my Firefox had a single tab, but 6 content processes (1 for my tab, 1 webextension, 1 priviledged about, 3 preallocated), so these MemoryTelemetry::GatherReports runnables caused at least 18 process wake-ups per minute. The cost of these wake-ups is easily visible in the shape of the power tracks for child processes (ignore the power track in the parent, the excessive power use there is due to bug 1735408).

Paul, from reading bug 1894582 I'm not sure if this more frequent memory telemetry collection was meant to be temporary (if so, given it has already been in for multiple cycles, maybe we already have the data we need?) or permanent. If it was meant to be permanent, I think it should really be based on activity instead of on timers. From my reading of bug 1894582 it seems using timers was meant to avoid over-reporting for busy processes and avoid under-reporting of low-activity processes. I think both of these could be achieved without timer:

  • To avoid over-reporting, you could save a timestamp of the last report, and ignore new opportunities to make a report if the saved timestamp is recent enough.
  • To avoid under-reporting, you could save the same memory report multiple times at once (ie. call TelemetryHistogram.Add multiple times in a row) if you notice the last report is old.
Duplicate of this bug: 1917185

Set release status flags based on info from the regressing bug 1894582

:pbone, since you are the author of the regressor, bug 1894582, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(pbone)

Thanks Florian, I will definitly look at this.

Assignee: nobody → pbone
Status: NEW → ASSIGNED
Flags: needinfo?(pbone)

The severity field is not set for this bug.
:nika, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(nika)
Severity: -- → S3
Flags: needinfo?(nika)

Hi Paul,

Is there any progress on this bug? This issue seems to be a significant contributor for why we consume 2x more power and cpu time vs chrome during the background-resource test.. Here is another profile of this specific test case showing a lot of cycles spent in GatherReports: https://share.firefox.dev/3VxnVJW

Blocks: perf-android
Performance Impact: --- → medium
Flags: needinfo?(pbone)

Here are results from disabling the memory reporter. We see a 50% improvement in the background CPU time.

Okay. I'm sorry I dropped this. It's now my priority.

Flags: needinfo?(pbone)

We see a 50% improvement in the background CPU time.

This could be a significant improvement on Android background tab kills, so rising to S2.

Severity: S3 → S2
Priority: -- → P2
Attachment #9430902 - Attachment is obsolete: true
Attachment #9430901 - Attachment description: WIP: Bug 1917184 - Don't collect memory telemetry in preallocated processes → Bug 1917184 - Don't collect memory telemetry in preallocated processes r=mccr8

This patch adds a sliding window of recent calls to Poke() that is used to
determine if the process is idle. This avoids running the memory telemetry
when the process should be idle.

Attachment #9444496 - Attachment description: Bug 1917184 - Fix timer comparison r=mccr8 → Bug 1917184 - Fix inverted comparision operator when checking cooldown r=mccr8
Attachment #9444496 - Attachment description: Bug 1917184 - Fix inverted comparision operator when checking cooldown r=mccr8 → Bug 1917184 - Fix inverted comparison operator when checking cooldown r=mccr8
Pushed by pbone@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7cefcb3f6e31 Use a constant rather than a magic number r=mccr8 https://hg.mozilla.org/integration/autoland/rev/cfe4a3109ecf Change kTelemetryInterval's unit to seconds r=mccr8 https://hg.mozilla.org/integration/autoland/rev/a2b1914cd7be Don't collect memory telemetry in preallocated processes r=mccr8 https://hg.mozilla.org/integration/autoland/rev/3d05cf9721c0 Fix inverted comparison operator when checking cooldown r=mccr8 https://hg.mozilla.org/integration/autoland/rev/25d2eab62bf9 Detect "idleness" more accurately r=mccr8

The patch landed in nightly and beta is affected.
:pbone, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox135 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(pbone)

Comment on attachment 9444497 [details]
Bug 1917184 - Detect "idleness" more accurately r=mccr8

Beta/Release Uplift Approval Request

  • User impact if declined/Reason for urgency: Firefox continues to process and send this telemetry even when idle, this wakes the CPU more frequently than necessary which can consume battery power.
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Other patches on this bug.
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is not risky, it's a change to some telemetry code that a user won't notice.
  • String changes made/needed: None.
  • Is Android affected?: Yes
Flags: needinfo?(pbone)
Attachment #9444497 - Flags: approval-mozilla-beta?

Can we confirm that this has led to the expected improvements on Nightly builds?

Flags: needinfo?(florian)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)

Can we confirm that this has led to the expected improvements on Nightly builds?

If I compare the behavior of the privileged content process in the profile from comment 0 (https://share.firefox.dev/429Wrhj) and in a long profile captured today (https://share.firefox.dev/4habFap), there's a pretty clear improvement. In the first profile, the MemoryTelemetry::GatherReports timer fired every minute. In today's profile, it fired once in 3 hours.

Today, no timer fired in 3 hours in the preallocated content processes: https://share.firefox.dev/4237tVL In the profile from comment 0 the timers fired every minute in preallocated processes: https://share.firefox.dev/3Pv0c9D

The profile I captured today doesn't have any non-empty content process (I was profiling about:blank), and was on Windows (vs Mac for the profile from comment 0) but I think that's enough to confirm a dramatic improvement.

Flags: needinfo?(florian)

Comment on attachment 9444497 [details]
Bug 1917184 - Detect "idleness" more accurately r=mccr8

Thanks for the extra info, glad to hear it's making such a noticeable difference already! Approved for 135.0b4.

Attachment #9444497 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: