Closed Bug 1736513 Opened 3 years ago Closed 2 years ago

CPU utilization recording can have too-high overhead

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

RESOLVED FIXED
95 Branch
Tracking Status
firefox95 --- fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Thanks to dthayer, here's a profile taken by external tool, while Firefox was running the Gecko Profiler: https://share.firefox.dev/3n4b0gp
In the SamplerThread, most of the time is spent in QueryThreadCycleTime, which is used to retrieve the number of cycles that a thread has spent on the CPU.
This overhead is way too high. I always assumed that that function was quick.

Now, there is extra code that may call this function in a loop, until it can be tightly tied with a timestamp: https://searchfox.org/mozilla-central/rev/36aa22c7ea92bd3cf7910774004fff7e63341cf5/tools/profiler/core/platform.cpp#3285-3299
Basically, it takes a timestamp, runs QueryThreadCycleTime, takes another timestamp, and repeats until the two timestamps are "close enough" (determined by early benchmarking to guage what the possible maximum speed probably is).
If the computer gets really busy, it's possible that the SamplerThread will be interrupted a lot, in which case it could spent a lot of time in that loop.

There should be at least a maximum number of loops allowed.

And I should better measure the overhead of these functions, ideally on most platforms, and on slow hardware.
And look again at the actual impact of not running this loop (i.e., only recording one timestamp and one CPU measurement.)

External tool: https://github.com/jrmuizel/etw-profiler/tree/main/etw-gecko

GetRunningTimesWithTightTimestamp was running the native get-running-times functions in a loop until it could run it within a "fast enough" time (based on an initial quick benchmark); This was to associate a timestamp with CPU measurements as close as possible, to get a more reliable CPU utilization graph at the end.
But this loop condition meant that if the system was under high stress, the loop could potentially re-run many times, wasting way more resources than the resulting "tightness" warrants.

So instead of repeating until "fast enough", we repeat at most once, and take the best of both measurements.
In the worst case, CPU measurements and timestamps may be a bit far apart, but it just means that a spike of activity may be reported in the next time slice instead, but the cumulative amount of work will be correct overall.

Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a3a6b2e8e578
Repeat thread-CPU-measurement at most once - r=canaltinova
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 95 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: