Closed
Bug 1338902
Opened 7 years ago
Closed 7 years ago
Lock used by TelemetryHistogram::Accumulate shows up in the performance profiles
Categories
(Toolkit :: Telemetry, defect)
RESOLVED
WORKSFORME
People
(Reporter: smaug, Unassigned)
Could we consider some lock-free approach for telemetry? At least when used in main thread?
Reporter
Updated•7 years ago
Summary: Lock used by TelemetryHistogram::Accumulate show up in the performance profiles → Lock used by TelemetryHistogram::Accumulate shows up in the performance profiles
Reporter
Comment 1•7 years ago
The lock doesn't show up very badly in the profiles, but it still feels wrong to slow down performance because of telemetry probes.
Comment 2•7 years ago
The assumption was that locks don't cost significant resources compared to all the other work we are doing. Is this showing up from a specific hot path, e.g. some code accumulating from a busy loop? Julian, can you fill in what you found about the performance of using locks?
Flags: needinfo?(jseward)
Comment 3•7 years ago
Olli, can you show some more details about the costs you are seeing?
Flags: needinfo?(bugs)
Comment 4•7 years ago
See also bug 1338891. Was this the main reason for Telemetry showing up in perf profiles?
Reporter
Comment 5•7 years ago
No, bug 1338891 has very little to do with this; bug 1338891 is way worse. I need to get new profiles for this, but in principle I was surprised to see those locks in reasonably hot code paths.
Comment 6•7 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #2)
> Julian, can you fill in with what you found on perf re using locks?

The general assumption was that a lock-unlock cycle that is uncontended (as it should be, else there are other problems) would involve two atomic memory transactions: one for the lock, one for the unlock.

In the worst case, if these happen at the main memory level, then we have two round trip times to main memory, around 400 cycles total. If, as seems more likely, they happen in the L3 cache, then the cost is more like 100 cycles total. I profiled x86_64 "lock; addq" instructions against memory some time in the distant past and found the costs to be even lower (~12 cycles?) if the location is in a core-specific (L1, L2) cache and no other core tries to access it. I think this is possible because of how the MESI protocol family works.

Which is why I ask Olli: what exactly are you seeing in your profiles?
Flags: needinfo?(jseward)
Reporter
Comment 7•7 years ago
Now that bug 1338891 is fixed, I'm fine with closing this, but using locks in hot code paths is a no-no. So, another reason to be careful about where we use telemetry.
Flags: needinfo?(bugs)
Comment 8•7 years ago
We'll have to make the performance considerations clearer in the documentation; I filed bug 1339019 on that.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME