Closed Bug 1635522 Opened 4 years ago Closed 2 years ago

Consider always running Raptor jobs with gecko profiler and "nostacksampling" feature enabled

Categories

(Testing :: Raptor, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: whimboo, Unassigned)

References

(Depends on 2 open bugs, Blocks 2 open bugs)

Details

(Whiteboard: [fxp])

Currently Raptor jobs in CI are running with the gecko profiler disabled and enabled:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=rap

The profiler jobs are within the prof job group. None of the results are actually getting submitted to perfherder due to the overhead of the profiler. As such there is no way to compare actual results and how much the profiler affects the performance metrics. To enable submission we would have to enable:

https://dxr.mozilla.org/mozilla-central/rev/34d71b4a00863e7615463592662dfe362c4a993e/testing/raptor/raptor/output.py#214-217

For try jobs that would be fine as Greg mentioned to me. Otherwise there is still the perfherder-data.json artifact for each job, which can be used to extract the numbers.

So if we want to have a low-overhead profiler running even for normal Raptor jobs, a feature list like js,leaf,stackwalk,nostacksampling should do it. This will record markers and the JS stack only for those, but will not do interval based stack sampling.

It might be helpful to compare the metrics between a normal Raptor job, one with the full profiler, and another one with the low-overhead profiler. If the latter isn't adding overhead or additional noise to the page loads, maybe it could be enabled by default.

Depends on: 1545386

So it looks like we went through a lot of pain to prevent gecko-profiler data from getting out but with the extra-options split we have available to us now we could enable perfherder-data submission by adding the gecko_profile extra option to the results here: https://dxr.mozilla.org/mozilla-central/source/testing/raptor/raptor/results.py#198-207

Similar to what this patch does: https://phabricator.services.mozilla.com/D73275

There are a few things disabled when gecko-profiling is active though, so we would need to undo those changes as well:

  1. https://dxr.mozilla.org/mozilla-central/source/testing/raptor/raptor/results.py#232,660
  2. https://dxr.mozilla.org/mozilla-central/source/testing/raptor/raptor/output.py#215

This sounds like a really neat use case so I think it would make sense for us to start collecting this data everywhere we run it if we can. :davehunt, what do you think?

Flags: needinfo?(dave.hunt)

(In reply to Greg Mierzwinski [:sparky] from comment #1)

This sounds like a really neat use case so I think it would make sense for us to start collecting this data everywhere we run it if we can. :davehunt, what do you think?

Yes, I like this idea. Could you file a bug and mark it as a dependency?

Flags: needinfo?(dave.hunt)
Severity: -- → S3

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #2)

Yes, I like this idea. Could you file a bug and mark it as a dependency?

Actually, is this a dupe of bug 1631784?

Flags: needinfo?(gmierz2)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #3)

Actually, is this a dupe of bug 1631784?

No, this only prevents submission of metrics for the profiler jobs. But this bug is for running the profiler with no stack sampling for all Raptor jobs.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #4)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #3)

Actually, is this a dupe of bug 1631784?

No, this only prevents submission of metrics for the profiler jobs. But this bug is for running the profiler with no stack sampling for all Raptor jobs.

If you look at the patch in that bug, I believe it's not preventing submission but in fact adding gecko_profile to extraOptions as described in comment 1.

Sorry, it should have been enables and not prevents.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #6)

Sorry, it should have been enables and not prevents.

I wasn't clear in comment 3. I'm not suggesting that this bug is a dupe, I meant the bug I was requesting in comment 2 is the dupe.

That bug is only for browsertime profiling jobs since they are submitting the data as if it was non-profiled data which gives us a bi-modal distribution. Once that patch lands, we will still have to enable the gecko-profile perfherder data to be output (steps are detailed in comment #1).

Flags: needinfo?(gmierz2)
Depends on: 1635983
Depends on: 1640169
Whiteboard: [perf:workflow]
Version: Version 3 → unspecified

I would be very interested in knowing whether marker-only mode could be enabled by default without perturbing the results too much. Bug 1702310 makes it easier to run, but if it were always enabled then I wouldn't need that bug.

Bug 1700700 also fixes a problem in the gecko_profile tag added to the perfdata, specifically that at least right now the schema validator doesn't want underscores in the name (I renamed it to gecko-profile there, but perhaps the schema should be updated instead?)

I've experimented with this in the past months. It looked like there were a large number of differences in the perfherder results when we enabled the profiler by default. I suspect that it's because there are number of changes profiler does in the firefox runtime when it's enabled. We were okay with enabling them if the numbers were consistents across different tests and platforms. For example if we were getting a 15% hit on all test or on a specific platform. But numbers were very unpredictable.
Some of the reasons include, windows timer resolution change we do when we enable the profiler and the profiler button visual change that happens when profiler is started and stopped. But they weren't the only ones as they continued to be a large difference even after solving them.

So instead of enabling the profiler in the main browsertime run, we've decided to add an additional run with the profiler enabled. This way, we won't affect the performance results of the main run but still get a profile after. This work is done in Bug 1786400. Closing this one as wontfix in favor of it.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
See Also: → 1786400
Whiteboard: [perf:workflow] → [fxp]
You need to log in before you can comment on or make changes to this bug.