Closed Bug 1243241 Opened 8 years ago Closed 8 years ago

nsPerformanceStats.cpp assertion failure: cyclesDelta <= totalCyclesDelta

Categories

(Toolkit :: Performance Monitoring, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: pipcet, Assigned: Yoric)

References

Details

Attachments

(1 file)

Debug build on a pretty standard x86_64 Linux system (with a multicore CPU). Very often, but not always, I run into this assertion failure:

Assertion failure: cyclesDelta <= totalCyclesDelta, at .../gecko-dev/toolkit/components/perfmonitoring/nsPerformanceStats.cpp:1208

The relevant function is nsPerformanceStatsService::CommitGroup. Setting a breakpoint produces values of 3910254 for totalCyclesDelta and 49217537 for cyclesDelta.

At first blush, this appears to be a problem of non-synchronized time stamp counters between CPU cores. Restricting Firefox to a single CPU core using cpusets appears to avoid the problem.

Dmesg provides the following Linux kernel messages:

[    0.687177] TSC synchronization [CPU#0 -> CPU#1]:
[    0.687333] Measured 316247764 cycles TSC warp between CPUs, turning off TSC clock.
[    0.687392] tsc: Marking TSC unstable due to check_tsc_sync_source failed

So it seems that at least on this system, monotonicity of rdtsc results isn't guaranteed, and right now that's making debug builds unusable. I'd suggest simply discarding such data, but maybe it would be better to detect this condition and disable TSC-based performance monitoring on systems which are broken in this way.
Moving to Toolkit, nsPerformanceStats is not part of DevTools.
Component: Developer Tools: Performance Tools (Profiler/Timeline) → Performance Monitoring
Product: Firefox → Toolkit
Flags: needinfo?(dteller)
pipcet: Normally, we do not rely upon the monotonicity of RDTSC. I'll triple-check that code.
Flags: needinfo?(dteller)
Flags: needinfo?(dteller)
We assume that the total number of cycles spent executing JS code
during an event is equal to the number of cycles in the "top group",
i.e. a group to which everything belongs. While this is true in
theory, RDTSC is actually non-monotonic, so we can end up with fewer
cycles reported for the top group than for some groups whose execution
was actually shorter. When we end up in this situation, groups with
more cycles than the top group will be reported as using more CPU than
was actually used.

This patch fixes the situation by defining the total number of cycles
as the largest number of cycles reported for all groups. In most
cases, this will be the number of cycles in the top group. In all
cases, this ensures that the amount of CPU used by a group is bounded
by the actual amount of CPU used by the thread.

Note that this fix requires us to walk twice the list of performance
groups executed during the tick, instead of once. It should be
possible to keep this down to a single walk, but I suspect that the
code would be more complex and I am not sure that this would improve
performance.

Review commit: https://reviewboard.mozilla.org/r/32759/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/32759/
Attachment #8713156 - Flags: review?(nfroyd)
Ok, so there *was* a case in which we assumed that RDTSC is monotonic.

I attach a fix. I realize that there is a second solution, which would consist in us in adding a trivial layer on top of RDTSC to keep it artificially monotonic during each tick. I don't know which fix is best.
Assignee: nobody → dteller
Flags: needinfo?(dteller)
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 8713156 [details]
MozReview Request: Bug 1243241 - Make RDTSC monotonic;r?jandem

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32759/diff/1-2/
Attachment #8713156 - Attachment description: MozReview Request: Bug 1243241 - Don't assume that the top group's number of cycles is always reported as being larger than each group's number of cycles;r?froydnj → MozReview Request: Bug 1243241 - Make RDTSC monotonic;r?jandem
Attachment #8713156 - Flags: review?(nfroyd) → review?(jdemooij)
Actually, this second fix, which operates directly on top of RDTSC, should be simpler and, hopefully, faster.
Comment on attachment 8713156 [details]
MozReview Request: Bug 1243241 - Make RDTSC monotonic;r?jandem

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32759/diff/2-3/
> Ok, so there *was* a case in which we assumed that RDTSC is monotonic.

I think I understand now (several AutoStopWatches with the same iteration number can be alive at the same time, and the cycle difference that the outer stopwatch, which acquired the top group, sees is smaller than the difference seen by an inner stopwatch which never acquired the top group). Thanks for the patch! I'm testing it now.

This is somewhat tangential, but right now ::CommitGroup doesn't appear to commit anything if the time stamp counter is constant (when MOZ_HAVE_RDTSC isn't defined, in particular). Shouldn't we attempt to update mTicks, at least?

As for the actual bug, it turns out that on this CPU, the TSC is actually writable, using MSR 0x10, so it should be possible to de-synchronize the TSC even more for further testing (or synchronize it for the old code to work). There's also a RDTSCP instruction which combines getCycles and getCPU() atomically. And since the TSC increases at a constant frequency rather than scaling with power management, we should be able to do without the "isSameCPU" check in AutoStopwatch::exit, too. IOW, I think there is still room for improvement in this code to get a more accurate view of who's hogging the CPU.
(In reply to pipcet from comment #8)
> > Ok, so there *was* a case in which we assumed that RDTSC is monotonic.
> This is somewhat tangential, but right now ::CommitGroup doesn't appear to
> commit anything if the time stamp counter is constant (when MOZ_HAVE_RDTSC
> isn't defined, in particular). Shouldn't we attempt to update mTicks, at
> least?

Sure, we could do that. I didn't do it because we haven't found a use for mTicks yet, but it certainly won't hurt. Do you wish to write a patch?

> As for the actual bug, it turns out that on this CPU, the TSC is actually
> writable, using MSR 0x10, so it should be possible to de-synchronize the TSC
> even more for further testing (or synchronize it for the old code to work).
> There's also a RDTSCP instruction which combines getCycles and getCPU()
> atomically.

Oh, I wasn't aware of that instruction. Good to know.

> And since the TSC increases at a constant frequency rather than
> scaling with power management, we should be able to do without the
> "isSameCPU" check in AutoStopwatch::exit, too. IOW, I think there is still
> room for improvement in this code to get a more accurate view of who's
> hogging the CPU.

I'm quite open to suggestions and patches. In particular, I'm not very happy about isSameCPU and not just because of the race condition.

However, if TSC is increasing at the same frequency on all your CPUs/cores, how did you end up in the situation that caused the assertion failure?
> However, if TSC is increasing at the same frequency on all your CPUs/cores, how did you end up in the situation that caused the assertion failure?

The frequency was synchronized, but the counter values weren't.  I assume they started out at different values as the CPUs booted, and the BIOS failed to synchronize them afterwards.

> In particular, I'm not very happy about isSameCPU

I think it should be safe to drop it entirely now getCycles() is monotonic. I think the standard case (on Linux, at least) is for all TSCs to be synchronized and running at the same speed, so it's probably not worth it to do more than not fail catastrophically on broken systems such as mine.

However, to be honest I'm not sure it's a good idea to have mozilla code use the time stamp counter directly at all, at least on recent Linux systems.  The Linux kernel already provides for a syscall-less interface to gettimeofday() and clock_gettime(); it's slightly more expensive because it actually provides correct results rather than failing silently when rescheduling happens between two RDTSC calls, but I don't think the difference would be measurable.
Comment on attachment 8713156 [details]
MozReview Request: Bug 1243241 - Make RDTSC monotonic;r?jandem

https://reviewboard.mozilla.org/r/32759/#review29957
Attachment #8713156 - Flags: review?(jdemooij) → review+
(In reply to pipcet from comment #10)
> > In particular, I'm not very happy about isSameCPU
> 
> I think it should be safe to drop it entirely now getCycles() is monotonic.
> I think the standard case (on Linux, at least) is for all TSCs to be
> synchronized and running at the same speed, so it's probably not worth it to
> do more than not fail catastrophically on broken systems such as mine.

Ah, I remember RDTSCP, btw. I didn't use it because 1/ it required a rather recent architecture; 2/ I couldn't find any information on the result of the cpuid part (besides the fact that it's causing previous instructions to be flushed – I don't care either way for this specific use); 3/ it is my understanding that the cpuid part is deactivated on some machines. All in all, it seemed to complicate my life for no benefit.

Now, the real reason for `isSameCPU()` is just to lower a bit the noise level.

> However, to be honest I'm not sure it's a good idea to have mozilla code use
> the time stamp counter directly at all, at least on recent Linux systems. 
> The Linux kernel already provides for a syscall-less interface to
> gettimeofday() and clock_gettime(); it's slightly more expensive because it
> actually provides correct results rather than failing silently when
> rescheduling happens between two RDTSC calls, but I don't think the
> difference would be measurable.

Well, we used `gettimeofday()` in a previous version and the impact was actually measurable. Also, we had other issues with OS clocks either not being monotonic or not being precise enough (in particular, with Windows, we had reproducible cases in which the OS clock had a resolution worse than 200ms – yes, that's two-hundred milliseconds). We can work that out, but sill end up with enough uncertainty that we need to perform the same kind of post-processing as with RDTSC, so there is no real simplification here.

Note that we do not use RDTSC directly as a measure of time. We use `getrusage` to give us the CPU time spent by the process during the interval, then RDTSC to subdivide this interval between the execution of all JS compartments. So `getrusage` bounds the error we could attain with RDTSC.
https://hg.mozilla.org/mozilla-central/rev/82dc7ec9db67
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: