Open Bug 1757350 Opened 3 years ago Updated 3 years ago

Reduce overhead of profiler::AllocCallback and FreeCallback

Categories: Core :: Gecko Profiler, task, P2
People: Reporter: mozbugz, Unassigned
References: Blocks 1 open bug

Details

Spawned from bug 1745591 comment 3:

I did some local instrumentation and profiling of this: https://share.firefox.dev/3t3ddM4
It seems like running the profiler slows things down by more than 2x. The time for a single call to new_ct_font_with_variations goes from around 3-4ms to 9-12ms (with mozilla::profiler::AllocCallback being mostly to blame).

This is quite a big jump!
Zooming in on zones of activity in WRWorker threads, one third of samples are in atomic_fetch_add, inside ProfilerCounterTotal::Add called by profiler::AllocCallback and profiler::FreeCallback.
So even though profiler counters use relaxed atomics, they can still have a visible overhead when many threads perform lots of operations around the same time.
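For illustration, here is a simplified sketch of the shared-counter pattern in play (not the actual Gecko code, names are made up): every allocation and deallocation on every thread performs a relaxed fetch_add on the same atomic, so even without fences the cache line holding the counter has to bounce between cores under heavy multi-threaded allocation.

  #include <atomic>
  #include <cstddef>
  #include <cstdint>

  // Simplified stand-in for a shared profiler counter: every thread does a
  // relaxed fetch_add on the same atomic, so the cache line holding `count`
  // must be owned exclusively by one core at a time.
  struct SharedCounterSketch {
    std::atomic<std::int64_t> count{0};

    void Add(std::int64_t aDelta) {
      // Relaxed ordering avoids fences, but this is still a read-modify-write
      // on shared memory, which is what shows up as atomic_fetch_add samples.
      count.fetch_add(aDelta, std::memory_order_relaxed);
    }
  };

  SharedCounterSketch gMallocCounterSketch;

  // Hypothetical hook shapes, roughly what the allocation callbacks do:
  void OnAllocSketch(std::size_t aSize) { gMallocCounterSketch.Add(std::int64_t(aSize)); }
  void OnFreeSketch(std::size_t aSize)  { gMallocCounterSketch.Add(-std::int64_t(aSize)); }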

I can think of two possible ways to help:

  1. In these (hopefully rare) cases where the profiler's memory interception functions have a big impact, there could be a way to prevent them from running: either an environment variable for our skilled users, or a friendlier option in about:profiling.
    AND/OR
  2. Remove the contentious shared atomics by using atomic operations on thread-specific numbers accessed through thread-local storage (TLS). (A rough sketch follows this list.)
    The periodic sampling part would have to read all of these.
    This should make individual operations faster, thanks to the minimal contention: only the owning thread would perform the addition, and the sampler thread would read it from time to time. There is a cost to perform TLS accesses, to be measured on all platforms.

Looking again at the profile of WRWorker activity, a further 22% of samples are in an atomic load in mozilla::profiler::ThreadIntercept::ThreadIntercept (which prevents re-entering the interception routines); I'm not sure we could avoid those with option 2 alone.
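For context, a hedged sketch of what such a re-entry guard typically amounts to (simplified names, not the actual Gecko implementation): a global flag saying whether the memory hooks are active is loaded on every intercepted malloc/free, and a per-thread "busy" flag stops the hooks from recording the allocations they make themselves.

  #include <atomic>

  std::atomic<bool> gMemoryHooksEnabled{false};  // toggled at profiler start/stop
  thread_local bool tInsideHook = false;         // set while a hook is running

  class ThreadInterceptSketch {
   public:
    ThreadInterceptSketch() {
      // This load happens on every intercepted malloc/free while the hooks
      // are installed; it is where the ~22% of samples mentioned above land.
      mActive =
          gMemoryHooksEnabled.load(std::memory_order_relaxed) && !tInsideHook;
      if (mActive) {
        tInsideHook = true;  // block recursive interception from this thread
      }
    }
    ~ThreadInterceptSketch() {
      if (mActive) {
        tInsideHook = false;
      }
    }
    explicit operator bool() const { return mActive; }

   private:
    bool mActive = false;
  };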

And there's another 10% in atomic_fetch_sub inside PHC's MaybePageAlloc, so it's not just the profiler adding some overhead!

One more bit of information: Looking at the json data (in the js console, it's in the profile variable), I can see that the number of memory operations is in the low hundreds per sample (every ~1ms) most of the time, but during these busy multi-threaded periods it climbs to around 10,000 per sample!
This adds to the evidence that inter-thread atomic contention is much more visible in these cases.
