Closed Bug 1689325 Opened 5 years ago Closed 4 years ago

Can CPU use data help reduce the overhead of profiling (many) idle threads?

Categories

(Core :: Gecko Profiler, task, P2)

task

Tracking

()

RESOLVED FIXED
90 Branch
Tracking Status
firefox90 --- fixed

People

(Reporter: florian, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

While trying to capture a profile of many threads of a mostly idle process for bug 1689164, I noticed in about:processes that the SamplerThread was using about 16% of a core per process, including in processes where almost all the threads were idle (<1% CPU use for the entire process).

I later tried to reduce the list of sampled threads, and the CPU use of the SamplerThread went down to about 4%.

This was on Mac, I don't know of the overhead is comparable on other platforms. My attempt to profile with Instruments failed (it crashed).

Would it be possible to use the CPU utilization data that we are now recording to dramatically reduce the overhead of profiling threads that are completely idle? (0 CPU cycle used since the previous sample)

If it's possible, it could enable using the Firefox Profiler to profile all threads with a reasonable amount of overhead (currently some engineers switch to Linux perf for that use case).

We should definitely test if runningTimes.GetThreadCPUDelta() == Some(0) there, and duplicate stacks in this case.

Now I'm not certain that this would help that much with the overhead, since most places where the CPU is idle should hopefully be marked with AUTO_PROFILER_THREAD_SLEEP, and already do the stack duplication instead of a full stack walk.
I'm afraid this duplication work may be the main cause of overhead -- to be measured; bug 1633572 should help with this.

But regardless, we should try to implement this bug here anyway.
In the "worst" case, it may not help much with the overhead, but the cost is minimal and it could give useful insights into AUTO_PROFILER_THREAD_SLEEP, e.g., to prove that it works, maybe find spots where it could be added, etc.

Severity: -- → N/A
Priority: -- → P2

(In reply to Gerald Squelart [:gerald] (he/him) from comment #1)

since most places where the CPU is idle should hopefully be marked with AUTO_PROFILER_THREAD_SLEEP, and already do the stack duplication instead of a full stack walk.

Unless I'm missing something, AUTO_PROFILER_THREAD_SLEEP is called from C++, so I suspect none of the Rust threads use it (similarly, they don't have label frames setting the idle category).

Good point about Rust threads (unless they call C++ waiting functions that we've instrumented, but I doubt it), and there could be other C++ functions that miss AUTO_PROFILER_THREAD_SLEEP.

Anyway, we should implement this bug in any case, then measure the benefits (and see which one of us was too optimistic/pessimistic!)

And then this could point at spots where we could add AUTO_PROFILER_THREAD_SLEEPs, because I believe they would overlap and extend ranges with 0% CPU, so they'd still be useful.

I did a quick experiment with adding runningTimesDiff.GetThreadCPUDelta() == Some(uint64_t(0)) || there, and I found effectively no occurrences where it helped!
I.e., every time the CPU use was zero, the thread had already been marked asleep. Great job to whoever added all these PROFILER_THREAD_SLEEPs! (A few did happen when starting the profiler, but nothing after that.)

On the other hand, there were occasions where the thread was marked asleep, but there was some CPU activity. I'm guessing it can happen due to OS work that happens while handling mutexes, IOs, etc.
This shows that we can't rely on CPU measurements alone, and we should keep using PROFILER_THREAD_SLEEP to lower the profiler overhead.

But it may still be good to add the CPU check, because it's cheap, and it could still catch areas without PROFILER_THREAD_SLEEP (that I didn't hit in my short tests), which may happen more frequently when we start adding Rust threads, or even unregistered threads.

Try with patch.

In cases where a thread is not doing anything, but hasn't been marked as asleep with AUTO_PROFILER_THREAD_SLEEP, we can still duplicate the previous sample if we know that zero CPU activity happened.

Assignee: nobody → gsquelart
Status: NEW → ASSIGNED
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4e86cb83f1e1 Also use CPU to determine if a thread is asleep - r=canaltinova
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: