Closed Bug 1338902 Opened 7 years ago Closed 7 years ago

Lock used by TelemetryHistogram::Accumulate shows up in the performance profiles

Categories

(Toolkit :: Telemetry, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: smaug, Unassigned)

References

Details

Could we consider some lock-free approach for telemetry? At least when used in main thread?
Summary: Lock used by TelemetryHistogram::Accumulate show up in the performance profiles → Lock used by TelemetryHistogram::Accumulate shows up in the performance profiles
The lock doesn't show up very badly, but still feels bad to slow down performance because of telemetry probes.
The assumption was that locks don't cost significant resources compared to all the other work we are doing.
Is this showing up from a specific hot path, e.g. some code accumulating from a busy loop?

Julian, can you fill in with what you found on perf re using locks?
Flags: needinfo?(jseward)
Olli, can you show some more details about the costs you are
seeing?
Flags: needinfo?(bugs)
See also bug 1338891. Was this the main reason for Telemetry showing up in perf profiles?
No, bug 1338891 has very little to do with this. Bug 1338891 is way worse.

I need to get new profiles for this, but in principle I was surprised to see those locks in reasonable hot code paths.
(In reply to Georg Fritzsche [:gfritzsche] from comment #2)
> Julian, can you fill in with what you found on perf re using locks?

The general assumption was that a lock-unlock cycle that is
uncontended (as it should be, else there are other problems)
would involve two atomic memory transactions -- one for the
lock, one for the unlock.  In the worst case, if these happen
at the main memory level then we have two round trip times to
main memory, around 400 cycles total. If -- as seems more
likely -- they happen in the L3 cache, then the cost is more
like 100 cycles total.

I profiled x86_64 "lock; addq" instructions against memory some
time in the distant past and found the costs to be even lower
(~ 12 cycles?) if the location is in a core-specific (L2, L1)
cache and no other core tries to access it.  I think this is
possible because of how the MESI protocol family works.

Which is why I ask Olli: what exactly are you seeing in your
profiles?
Flags: needinfo?(jseward)
Now that Bug 1338891 is fixed, I guess I'm fine closing this, but using locks in hot code paths is no-no. So, another reason to be careful where to use telemetry.
Flags: needinfo?(bugs)
We'll have make the performance considerations more clear in the documentation, i filed bug 1339019 on that.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.