Closed Bug 1476775 Opened Last year Closed Last year

Add a fixed-time-window sampling mode for continuous profiling

Categories

(Core :: Gecko Profiler, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- fixed

People

(Reporter: mstange, Assigned: canova)

References

Details

Attachments

(3 files, 2 obsolete files)

At the moment, the profiler has a fixed buffer size per process, which it uses as a circular buffer by continuously discarding old entries as new entries get added.

This enables the "continuous profiling" use case: The user can enable the profiler and leave it running for the entire duration of their browsing session, and opportunistically capture profiles whenever (/just after) they encounter a performance problem.

However, a fixed buffer size does not result in a fixed time duration: Based on the number of threads in a process, and the depth + complexity of the sampled stacks, the actual time duration that is captured in a buffer of a fixed size can vary wildly. This causes multiple problems:

 - It's hard to know what size to pick for the profile buffer, so people usually pick something large that's definitely "big enough". However, this will often be too much: Profiles can be extremely large and capture time windows that are very long. This makes such profiles hard slow to work with, because capturing long profiles takes a long time during JSON serialization, and because large profiles place too much load on the UI which needs to display (unnecessarily) large amounts of data. It's also harder to find the relevant part of a large profile; often, you have to start the analysis by zooming in multiple times to the very end of the profile.

 - Every process has its own buffer, so in a profile with multiple processes, the different processes usually cover different periods of time. Such profiles can be very confusing to analyze: you might have one process which has been idle and is able to fit multiple minutes of idleness into its buffer, and one process which had lots of activity within the last 20 seconds and can only fit those last 20 seconds into its buffer. In such a profile, the idle process will determine the time scale that perf.html will use to display the profile with, and the busy process will show a quick burst of activity at the very end of this time range, with the rest of the time range being marked as "no data in buffer".

I think the experience of the "continuous profiling" mode would be much better if it allowed you to specify the size of the window *as a time duration in seconds*.

So in this bug I'm proposing the following:
 - Add an API to nsIProfiler that allows starting the profiler with a fixexd time window, e.g. by adding a new method called nsIProfiler.StartProfilerWithFixedTimeWindow() or by adding an argument to the existing StartProfiler() method.
 - Also add environment variables for this setting so that it can be inherited into child processes and specified for startup profiling.
 - In the sampling loop, after recording a sample in the buffer, discard any samples from the buffer that are older than the window length.

The dynamic buffer sizing work from bug 1476757 will automatically adjust the memory consumption of the profiler buffer to the right amount.
Comment on attachment 8993128 [details]
Bug 1476775 - Proof of concept: Discard samples that are older than 20 seconds.

https://reviewboard.mozilla.org/r/257952/#review264846


Code analysis found 3 defects in this patch:
 - 3 defects found by clang-tidy

You can run this analysis locally with:
 - `./mach static-analysis check path/to/file.cpp` (C/C++)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: tools/profiler/core/ProfileBufferEntry.cpp:1287
(Diff revision 1)
> +    // - We skip range Pause, Resume, CollectionStart, Marker, and CollectionEnd
> +    //   entries between samples.
> +    while (e.Has()) {
> +      if (e.Get().IsThreadId()) {
> +        break;
> +      } else {

Warning: Do not use 'else' after 'break' [clang-tidy: readability-else-after-return]

      } else {
        ^~~~~~

::: tools/profiler/core/ProfileBufferEntry.cpp:1287
(Diff revision 1)
> +    // - We skip range Pause, Resume, CollectionStart, Marker, and CollectionEnd
> +    //   entries between samples.
> +    while (e.Has()) {
> +      if (e.Get().IsThreadId()) {
> +        break;
> +      } else {

Warning: Do not use 'else' after 'break' [clang-tidy: readability-else-after-return]

      } else {
        ^~~~~~

::: tools/profiler/core/ProfileBufferEntry.cpp:1287
(Diff revision 1)
> +    // - We skip range Pause, Resume, CollectionStart, Marker, and CollectionEnd
> +    //   entries between samples.
> +    while (e.Has()) {
> +      if (e.Get().IsThreadId()) {
> +        break;
> +      } else {

Warning: Do not use 'else' after 'break' [clang-tidy: readability-else-after-return]

      } else {
        ^~~~~~
Assignee: nobody → canaltinova
Status: NEW → ASSIGNED
Priority: -- → P1
I haven't looked at the patches in detail yet but I don't like the fact that the maximum duration argument is mandatory. I think it should be optional. I don't know how exactly the API should look, though. It would be best if we could keep existing calls to nsIProfiler::StartProfiler working, but it's not a requirement.
Attachment #9010295 - Attachment is obsolete: true
I changed the API and made the duration parameter optional. Therefore we don't need to change talos test suite anymore.
Attachment #9010290 - Attachment is obsolete: true
Attachment #9010291 - Attachment description: Bug 1476775 - Part 2: Change the debugger public API and webextensions debugger API → Bug 1476775 - Part 1: Discard samples that are older than given duration r?mstange
Attachment #9010293 - Attachment description: Bug 1476775 - Part 3: Change the profiler usage in devtools after API change → Bug 1476775 - Part 2: Change the profiler usage in devtools after API change
Pushed by canaltinova@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/9f21792d4ae6
Part 1: Discard samples that are older than given duration r=mstange
https://hg.mozilla.org/integration/autoland/rev/bed93ebb313b
Part 2: Change the profiler usage in devtools after API change r=julienw,gregtatum
https://hg.mozilla.org/mozilla-central/rev/9f21792d4ae6
https://hg.mozilla.org/mozilla-central/rev/bed93ebb313b
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Depends on: 1515197
You need to log in before you can comment on or make changes to this bug.