CPU utilization recording can have too-high overhead
Categories
(Core :: Gecko Profiler, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox95 | --- | fixed |
People
(Reporter: mozbugz, Assigned: mozbugz)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
Thanks to dthayer, here's a profile taken by external tool, while Firefox was running the Gecko Profiler: https://share.firefox.dev/3n4b0gp
In the SamplerThread, most of the time is spent in QueryThreadCycleTime
, which is used to retrieve the number of cycles that a thread has spent on the CPU.
This overhead is way too high. I always assumed that that function was quick.
Now, there is extra code that may call this function in a loop, until it can be tightly tied with a timestamp: https://searchfox.org/mozilla-central/rev/36aa22c7ea92bd3cf7910774004fff7e63341cf5/tools/profiler/core/platform.cpp#3285-3299
Basically, it takes a timestamp, runs QueryThreadCycleTime
, takes another timestamp, and repeats until the two timestamps are "close enough" (determined by early benchmarking to guage what the possible maximum speed probably is).
If the computer gets really busy, it's possible that the SamplerThread will be interrupted a lot, in which case it could spent a lot of time in that loop.
There should be at least a maximum number of loops allowed.
And I should better measure the overhead of these functions, ideally on most platforms, and on slow hardware.
And look again at the actual impact of not running this loop (i.e., only recording one timestamp and one CPU measurement.)
External tool: https://github.com/jrmuizel/etw-profiler/tree/main/etw-gecko
Assignee | ||
Comment 1•2 years ago
|
||
GetRunningTimesWithTightTimestamp
was running the native get-running-times functions in a loop until it could run it within a "fast enough" time (based on an initial quick benchmark); This was to associate a timestamp with CPU measurements as close as possible, to get a more reliable CPU utilization graph at the end.
But this loop condition meant that if the system was under high stress, the loop could potentially re-run many times, wasting way more resources than the resulting "tightness" warrants.
So instead of repeating until "fast enough", we repeat at most once, and take the best of both measurements.
In the worst case, CPU measurements and timestamps may be a bit far apart, but it just means that a spike of activity may be reported in the next time slice instead, but the cumulative amount of work will be correct overall.
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a3a6b2e8e578 Repeat thread-CPU-measurement at most once - r=canaltinova
Comment 3•2 years ago
|
||
bugherder |
Description
•