Closed
Bug 1338902
Opened 7 years ago
Closed 7 years ago
Lock used by TelemetryHistogram::Accumulate shows up in the performance profiles
Categories
(Toolkit :: Telemetry, defect)
RESOLVED
WORKSFORME
People
(Reporter: smaug, Unassigned)
Could we consider some lock-free approach for telemetry? At least when used in main thread?
Reporter
Updated•7 years ago
Summary: Lock used by TelemetryHistogram::Accumulate show up in the performance profiles → Lock used by TelemetryHistogram::Accumulate shows up in the performance profiles
Reporter
Comment 1•7 years ago
The lock doesn't show up very badly in the profiles, but it still feels wrong to slow down performance because of telemetry probes.
Comment 2•7 years ago
The assumption was that locks don't cost significant resources compared to all the other work we are doing. Is this showing up from a specific hot path, e.g. some code accumulating from a busy loop? Julian, can you fill in what you found about the performance of using locks?
Flags: needinfo?(jseward)
Comment 3•7 years ago
Olli, can you show some more details about the costs you are seeing?
Flags: needinfo?(bugs)
Comment 4•7 years ago
See also bug 1338891. Was this the main reason for Telemetry showing up in perf profiles?
Reporter
Comment 5•7 years ago
No, bug 1338891 has very little to do with this; bug 1338891 is way worse. I need to get new profiles for this, but in principle I was surprised to see those locks in reasonably hot code paths.
Comment 6•7 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #2)
> Julian, can you fill in with what you found on perf re using locks?

The general assumption was that a lock-unlock cycle that is uncontended (as it should be, else there are other problems) would involve two atomic memory transactions: one for the lock, one for the unlock.

In the worst case, if these happen at the main memory level, then we have two round trip times to main memory, around 400 cycles total. If, as seems more likely, they happen in the L3 cache, then the cost is more like 100 cycles total. I profiled x86_64 "lock; addq" instructions against memory some time in the distant past and found the costs to be even lower (~12 cycles?) if the location is in a core-specific (L1, L2) cache and no other core tries to access it. I think this is possible because of how the MESI protocol family works.

Which is why I ask Olli: what exactly are you seeing in your profiles?
Flags: needinfo?(jseward)
Reporter
Comment 7•7 years ago
Now that bug 1338891 is fixed, I'm fine with closing this, but using locks in hot code paths is a no-no. So, another reason to be careful about where we use telemetry.
Flags: needinfo?(bugs)
Comment 8•7 years ago
We'll have to make the performance considerations clearer in the documentation; I filed bug 1339019 on that.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME