Closed Bug 1703410 Opened 3 years ago Closed 3 years ago

Profiling should not affect the timer resolution

Categories

(Core :: Gecko Profiler, enhancement, P2)

Unspecified
Windows
enhancement

Tracking

()

RESOLVED FIXED
90 Branch
Tracking Status
firefox90 --- fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

References

Details

Attachments

(1 file)

"No Periodic Sampling" means only that: No stacks are periodically captured from a number of threads. The main benefit is that threads are never suspended by the profiler, which considerably lowers the overhead.

However the loop in the thread sampler is still running:

  • In the main process it clears expired exit profiles from other processes.
  • It still collects counters, which are atomic; the only one currently is for memory usage.
  • It can collect CPU utilization without stopping threads.
  • It runs the buffer's FulfillChunkRequests(), to allocate new chunks and destroy/recycle old ones.

One issue with this periodic loop, is that on Windows, the timer resolution is adjusted from its system default (~16ms) down to the requested interval (if less than 10ms). This is done so that the sampler loop has a chance to actually reach the expected rate.
But an important side effect is that it affects all timers in the application, which can sometimes cause strange things, e.g.: Some animation may stutter, but then when the developer tries to use the profiler to understand it, the profiler changes the timer resolution, which "fixes" the stutter if it was based on a short-fuse timer!

Some potential fixes:

  • Don't change the timer resolution in no-sampling mode, and/or automatically extend the interval. (This would affect the other things happening in the loop, but I think it would be acceptable, they are not as time-sensitive as sampling thread stacks.)
  • Document this timer resolution issue.
  • In about:profiling, note that "No Periodic Sampling" still runs a loop that's affected by the "Sampling interval".
  • In about:profiling, rename "Sampling interval" to show it's not only about sampling.
  • Update the MOZ_PROFILER_STARTUP_INTERVAL documentation.
  • In the code, rename SamplerThread and other related identifiers to show their other responsibilities.

Recent thought: Maybe we should not change the timer resolution even when periodic sampling is running?

We would do this to avoid impacting how Firefox timers (and other events?) behave when the profiler is running, so that we get more correct measurements of what happens when the profiler is not running.

The potential issue is that the profiler sampler loop itself could have bigger gaps between samplings.
But my gut feel at the moment is that this may not be that bad:

  • If any thread is actively running, I believe that the OS will respond to timers just as quickly, so there shouldn't be any problem there.
  • If all threads are blocked waiting on something, then timers may be delayed until the OS comes back to the process (up to 16ms later) and starts servicing events. In this case the profiler loop could indeed be delayed, but it means that if it had run before it would have sampled waiting threads, which were probably already sampled before and will be sampled again, and the final visual result should be the same. So I don't think this would have given us more information.

But in the end it's possible that overall we would capture fewer samples of waiting stacks, which could artificially make people think that Firefox was busier that it really was. 🤔
The front-end display may need to be more obvious about sample gaps -- There are already some discussions about that on github.

Quick attempt:

At a glance I don't see obvious differences. The average sample interval is very close (e.g., parent processes: 1.2945ms/sample vs 1.2950).

NI: Florian and Markus, could you please have a look and see if you can find any issues with the second profile compared to the first?
And Markus, I believe you have a better understanding on the subject of timer precision, what do you think of this whole idea?

And by extension, should we consider going even further and try to remove the sampler timer when all threads are in a waiting state? -- But that would be a separate bug, and much more complex to implement, if even possible.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(florian)

I don't have a strong opinion on this, but we should be prepared for changes in behavior that only occur on some versions of Windows or only on specific hardware. For example, I find it hard to believe that the sampling accuracy wouldn't be affected when we still get reports of extremely different setTimeout performance when the profiler is running, for example most recently in bug 1683560 comment 8.

Flags: needinfo?(mstange.moz)

(In reply to Gerald Squelart [:gerald] (he/him) from comment #1)

Recent thought: Maybe we should not change the timer resolution even when periodic sampling is running?

We would do this to avoid impacting how Firefox timers (and other events?) behave when the profiler is running, so that we get more correct measurements of what happens when the profiler is not running.

I agree that having the browser behavior when profiling match the behavior when not profiling is something desirable.

Quick attempt:

At a glance I don't see obvious differences. The average sample interval is very close (e.g., parent processes: 1.2945ms/sample vs 1.2950).

NI: Florian and Markus, could you please have a look and see if you can find any issues with the second profile compared to the first?

I can't see any meaningful difference between the 2 profiles.

And by extension, should we consider going even further and try to remove the sampler timer when all threads are in a waiting state? -- But that would be a separate bug, and much more complex to implement, if even possible.

Interesting idea! And indeed, that would be for another bug.

Flags: needinfo?(florian)

Thank you both.
I think the advantage of having Firefox behave closer to the real thing outweigh the risk of some longer sampling gaps, so I'll make the change.

Assignee: nobody → gsquelart
Summary: No Periodic Sampling mode should not affect the timer resolution → Profiling should not affect the timer resolution

On Windows, the profiler changes the timer resolution when the sampling interval is less than 10ms.
However this change affects all of Firefox, which can change other behaviors, sometimes causing confusion when refresh-rate issues magically disappear when the profiler is running.

So now by default the profiler will not change the timer resolution. This should rarely affect the profiler's effective sampling rate, unless all threads in a process are in a waiting state, in which case there is nothing new to sample anyway! The front-end is investigating ways to make sampling gaps more obvious, see https://github.com/firefox-devtools/profiler/issues/2962

If some power users really need the added sampling precision in some profiling sessions, the timer resolution change may be requested by setting the environment variable "PROFILER_ADJUST_TIMER_RESOLUTION" to any value.

Is that only for "nostacksampling" or is that for all cases?

If it's for all cases, I'd suggest to use a feature rather than the env variable. Indeed it would be more discoverable, we can have a UI around it, maybe even enable it automatically when some conditions are met (for example I'm thinking of interval < 1ms ? Not sure that's relevant though...).

Tell me what you think :-)

(In reply to Julien Wajsberg [:julienw] from comment #6)

Is that only for "nostacksampling" or is that for all cases?

For all cases.

If it's for all cases, I'd suggest to use a feature rather than the env variable. Indeed it would be more discoverable, we can have a UI around it,

See comments 1 to 3, neither Markus nor myself could see a significant difference in profiles with this change, so I think we would have a hard time coming up with a feature description that makes sense to users.

In general, I would like us to remove features that don't make a lot of sense, or aren't useful, so that the few that remain are more likely to be noticed.

maybe even enable it automatically when some conditions are met (for example I'm thinking of interval < 1ms ? Not sure that's relevant though...).

That's a question I wanted to ask in the review on phabricator. I think forcing high precision timers for intervals < 1ms would make sense, but I think the minimum supported interval on Windows is 2ms (I'm not sure if that's still true). If that's still true, we can forget this idea.

(In reply to Florian Quèze [:florian] from comment #7)

See comments 1 to 3, neither Markus nor myself could see a significant difference in profiles with this change, so I think we would have a hard time coming up with a feature description that makes sense to users.

Note there may be different experiences on different Windows versions (Windows 7 vs 8 vs 10). Have you tried older versions? How much do we want to support the older ones?

maybe even enable it automatically when some conditions are met (for example I'm thinking of interval < 1ms ? Not sure that's relevant though...).

That's a question I wanted to ask in the review on phabricator. I think forcing high precision timers for intervals < 1ms would make sense, but I think the minimum supported interval on Windows is 2ms (I'm not sure if that's still true). If that's still true, we can forget this idea.

IIRC the issue with <2ms timers on Windows is fixed in recent Windows versions.

Also it's possible we switch to some internal way of dealing with small intervals with high overhead, and we weren't using the high precision interval for this (but maybe we should?).

(In reply to Julien Wajsberg [:julienw] from comment #6)

Is that only for "nostacksampling" or is that for all cases?

Initially this bug was only for "nostacksampling", but I changed since comment 1 to all cases.

If it's for all cases, I'd suggest to use a feature rather than the env variable. Indeed it would be more discoverable, we can have a UI around it,

I actually want to keep it hidden!
I keep the code only in case something goes wrong and we need it back, or if someone complains about the change and then they can switch it back.
After a while, either there won't be any issues and we could get rid of it for good, or there are issues common enough to make it more discoverable and add UI for it (as you suggested).

maybe even enable it automatically when some conditions are met (for example I'm thinking of interval < 1ms ? Not sure that's relevant though...).

AND

(In reply to Florian Quèze [:florian] from comment #7)

That's a question I wanted to ask in the review on phabricator. I think forcing high precision timers for intervals < 1ms would make sense, but I think the minimum supported interval on Windows is 2ms (I'm not sure if that's still true). If that's still true, we can forget this idea.

I don't believe that smaller intervals make the timer resolution more important, for the same reasons I highlighted in comment 1, basically:

  • If any thread is running, timers will be serviced ASAP, so sampling should still run at the requested rate, when it's actually important to capture stacks of busy software.
  • If no threads are running, then it doesn't matter than we're not sampling at a super fast rate, since we would by definition capture exactly the same stacks (because nothing is running).

And yes, the minimum effective rate on Windows is 2ms. Though if you request < 1ms, the inter-sampling wait is done with a busy loop, which wouldn't be affected by the timer precision anyway.

(In reply to Julien Wajsberg [:julienw] from comment #8)

Note there may be different experiences on different Windows versions (Windows 7 vs 8 vs 10). Have you tried older versions? How much do we want to support the older ones?

I only tried on Windows 10 on my beefy desktop.
I'd need to find an old Windows to test... (Or install a VM.)

IIRC the issue with <2ms timers on Windows is fixed in recent Windows versions.

I still see an effective rate of 2ms when I request 1ms on the latest Windows 10. Where did you read this?

Also it's possible we switch to some internal way of dealing with small intervals with high overhead, and we weren't using the high precision interval for this (but maybe we should?).

Yes, we run a busy loop for intervals < 1ms. Or did you mean something else?

(In reply to Gerald Squelart [:gerald] (he/him) from comment #9)

I still see an effective rate of 2ms when I request 1ms on the latest Windows 10. Where did you read this?

I'm probably wrong :-) maybe I was especially confused with the high precision timer...

Also it's possible we switch to some internal way of dealing with small intervals with high overhead, and we weren't using the high precision interval for this (but maybe we should?).

Yes, we run a busy loop for intervals < 1ms. Or did you mean something else?

Nope I meant exactly that :)

Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/da7e47e55f3f
Only change Windows timer resolution if PROFILER_ADJUST_TIMER_RESOLUTION is set - r=florian
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: