Can CPU use data help reduce the overhead of profiling (many) idle threads?
Categories
(Core :: Gecko Profiler, task, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox90 | --- | fixed |
People
(Reporter: florian, Assigned: mozbugz)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
While trying to capture a profile of many threads of a mostly idle process for bug 1689164, I noticed in about:processes that the SamplerThread
was using about 16% of a core per process, including in processes where almost all the threads were idle (<1% CPU use for the entire process).
I later tried to reduce the list of sampled threads, and the CPU use of the SamplerThread
went down to about 4%.
This was on Mac, I don't know of the overhead is comparable on other platforms. My attempt to profile with Instruments failed (it crashed).
Would it be possible to use the CPU utilization data that we are now recording to dramatically reduce the overhead of profiling threads that are completely idle? (0 CPU cycle used since the previous sample)
If it's possible, it could enable using the Firefox Profiler to profile all threads with a reasonable amount of overhead (currently some engineers switch to Linux perf
for that use case).
Assignee | ||
Comment 1•5 years ago
|
||
We should definitely test if runningTimes.GetThreadCPUDelta() == Some(0)
there, and duplicate stacks in this case.
Now I'm not certain that this would help that much with the overhead, since most places where the CPU is idle should hopefully be marked with AUTO_PROFILER_THREAD_SLEEP
, and already do the stack duplication instead of a full stack walk.
I'm afraid this duplication work may be the main cause of overhead -- to be measured; bug 1633572 should help with this.
But regardless, we should try to implement this bug here anyway.
In the "worst" case, it may not help much with the overhead, but the cost is minimal and it could give useful insights into AUTO_PROFILER_THREAD_SLEEP
, e.g., to prove that it works, maybe find spots where it could be added, etc.
Reporter | ||
Comment 2•5 years ago
|
||
(In reply to Gerald Squelart [:gerald] (he/him) from comment #1)
since most places where the CPU is idle should hopefully be marked with
AUTO_PROFILER_THREAD_SLEEP
, and already do the stack duplication instead of a full stack walk.
Unless I'm missing something, AUTO_PROFILER_THREAD_SLEEP
is called from C++, so I suspect none of the Rust threads use it (similarly, they don't have label frames setting the idle category).
Assignee | ||
Comment 3•5 years ago
|
||
Good point about Rust threads (unless they call C++ waiting functions that we've instrumented, but I doubt it), and there could be other C++ functions that miss AUTO_PROFILER_THREAD_SLEEP
.
Anyway, we should implement this bug in any case, then measure the benefits (and see which one of us was too optimistic/pessimistic!)
And then this could point at spots where we could add AUTO_PROFILER_THREAD_SLEEP
s, because I believe they would overlap and extend ranges with 0% CPU, so they'd still be useful.
Assignee | ||
Comment 4•4 years ago
|
||
I did a quick experiment with adding runningTimesDiff.GetThreadCPUDelta() == Some(uint64_t(0)) ||
there, and I found effectively no occurrences where it helped!
I.e., every time the CPU use was zero, the thread had already been marked asleep. Great job to whoever added all these PROFILER_THREAD_SLEEP
s! (A few did happen when starting the profiler, but nothing after that.)
On the other hand, there were occasions where the thread was marked asleep, but there was some CPU activity. I'm guessing it can happen due to OS work that happens while handling mutexes, IOs, etc.
This shows that we can't rely on CPU measurements alone, and we should keep using PROFILER_THREAD_SLEEP
to lower the profiler overhead.
But it may still be good to add the CPU check, because it's cheap, and it could still catch areas without PROFILER_THREAD_SLEEP
(that I didn't hit in my short tests), which may happen more frequently when we start adding Rust threads, or even unregistered threads.
Assignee | ||
Comment 5•4 years ago
|
||
In cases where a thread is not doing anything, but hasn't been marked as asleep with AUTO_PROFILER_THREAD_SLEEP
, we can still duplicate the previous sample if we know that zero CPU activity happened.
Updated•4 years ago
|
Comment 7•4 years ago
|
||
bugherder |
Description
•