Closed Bug 1822483 Opened 1 year ago Closed 1 year ago

Power tracks stop after a few hours on Intel Macs

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

RESOLVED FIXED
113 Branch
Tracking Status
firefox113 --- fixed

People

(Reporter: florian, Assigned: florian)

References

Details

Attachments

(1 file)

Example profile: https://share.firefox.dev/3JDrclg

The track that stops first is always the "CPU package" one, which contains the higher value. The "CPU cores" track stops later. And after leaving the profiling running an entire week-end, I only have the "iGPU" track left.

This behavior seems to indicate we have a value that exceeds the maximum possible value of a variable.

The check at https://searchfox.org/mozilla-central/rev/47aea2f603cc18144afcedbd604a418f11e90f9b/tools/profiler/core/PowerCounters-mac-amd64.cpp#184 ensures we only capture a new sample if it's value is higher than the previous one, so that likely explains why the track stops, if somehow the counter went back to zero.

I'm not sure where things overflow. We have mSample that is a double at https://searchfox.org/mozilla-central/rev/47aea2f603cc18144afcedbd604a418f11e90f9b/tools/profiler/core/PowerCounters-mac-amd64.cpp#191 and the raw counters are uint64_t at https://searchfox.org/mozilla-central/source/tools/profiler/core/PowerCounters-mac-amd64.cpp#97-112

We seem to overflow after using a bit more than 3W for about 24h.

Severity: -- → S3
Priority: -- → P2

I got a long profile last night from a 2018 macbook pro: https://share.firefox.dev/3zdeDXR where I added debug logging.

There are 3 interesting events in the profile:

  • after about 13,000s, the "CPU package" track stops updating. Here is what I have in the log:
pkg: 4294944464, pp0: 2225911504, pp1: 1486256680, ddr: 505918927
pkg: 78, pp0: 2225914945, pp1: 1486273631, ddr: 505921229
  AddSample failed: mSample: 262142,606445, aSample: 0,004761
  • about 25,000s (~7h) later, there's a little glitch on the "CPU package" track:
pkg: 4294935371, pp0: 3794278915, pp1: 2887882738, ddr: 1602795378
pkg: 4294956320, pp0: 3794282243, pp1: 2887897686, ddr: 1602797550
pkg: 7661, pp0: 3794285788, pp1: 2887910164, ddr: 1602799880

This explains the little glitch as 4294956320 > 4294944464 (which was the last value recorded 7h before).

  • The "Power: iGPU" track stops after about 38,200s:
pkg: 36554547, pp0: 3030930415, pp1: 4294963938, ddr: 1083658241
pkg: 36578830, pp0: 3030934437, pp1: 14209, ddr: 1083660720
  AddSample failed: mSample: 262143,795044, aSample: 0,867249

4294944464 = 0b11111111111111111010011011010000
4294956320 = 0b11111111111111111101010100100000
4294963938 = 0b11111111111111111111001011100010

So it seems despite being stored in uint64_t fields of the pkg_energy_statistics_t structure, the pkg_energy and pp1_energy fields are uint32_t values.

Looking at the source code of the kernel (https://github.com/apple/darwin-xnu/blob/2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32/osfmk/i386/Diagnostics.c#L252-L260) it seems two 32 bit values are read and put together into a 64bit value, so I'm not sure why the high part of the 64 bit value is always zeros for us.

Assignee: nobody → florian
Status: NEW → ASSIGNED

(In reply to Florian Quèze [:florian] from comment #1)

I got a long profile last night from a 2018 macbook pro: https://share.firefox.dev/3zdeDXR where I added debug logging.

Here's an even longer profile (more than 21h) from the same machine with the patch applied: https://share.firefox.dev/3zjewtH, showing that it does fix the bug.

Pushed by fqueze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e6edaff6f405
Handle wraparounds in RAPL power counters on Intel Macs, r=dlrobertson.
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: