Closed Bug 1754823 Opened 3 years ago Closed 2 years ago

Improve user experience of triggering performance tests via try perf selector

Categories

(Testing :: Performance, task, P2)

task

Tracking

(firefox108 fixed)

RESOLVED FIXED
108 Branch
Tracking Status
firefox108 --- fixed

People

(Reporter: sparky, Assigned: sparky)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [fxp])

Attachments

(9 files, 5 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

Identifying suitable performance tests and running them in our CI is currently confusing, meaning that this is under utilised. We see frustration when engineers are attempting to run tests in CI, and there are likely other engineers who give up early or have been put off from past experience. This is harmful to promoting a culture of performance at Mozilla.

One suggestion is that instead of listing all tasks in try chooser, we could have have logical collections (page load, benchmark, essential, etc.)

Blocks: dev-pain
Assignee: nobody → gmierz2
Depends on: 1677559

I recently used try to get page load perf results for essentially the first time – since my perspective is fairly fresh, feel free to ask me for feedback on the design. From my history as a product developer, I think the experience should be simple enough that I come with a question like, "How did my change impact cold start up on fenix?" and I'll be able to get the results without asking for help.

At a high level, there are two major roadblocks I hit:

  1. figuring out which tests to select is hard. "logical collections (page load, benchmark, essential, etc.)" seems like it'd get most of the way there, at least for page load
  2. I don't know how to interpret the results, e.g. what's the difference between SpeedIndex, ContentfulSpeedIndex, and PerceptualSpeedIndex? Do I need to retrigger to get confident results? etc. I provided this feedback to the perfherder compare google survey so I won't go into details here.

(In reply to Michael Comella (:mcomella) [needinfo or I won't see it] from comment #1)

  1. I don't know how to interpret the results, e.g. what's the difference between SpeedIndex, ContentfulSpeedIndex, and PerceptualSpeedIndex? Do I need to retrigger to get confident results? etc. I provided this feedback to the perfherder compare google survey so I won't go into details here.

Thanks for your input :mcomella!

Regarding your second point, I think we can target a good portion of that with some new perftest guidelines we are thinking of making. There's no bugs or documentation available about this yet.

Blocks: 1772541
Summary: Improve user experience of triggering performance tests via try chooser → Improve user experience of triggering performance tests via try perf selector

This patch adds a new try selector specifically for performance tests. We make use of the compare selector as well as the fuzzy, and again selectors (used within the compare one) to do this.

The code starts off by building the variant combinations (e.g. fission+live-sites+profiling), then uses these to build up the rest of the categories. The variants are hidden by default and only available with something like --variants profiling. The same is true for apps, where chrome tests are hidden by default, and for platforms where android is hidden by default. There are flags developers can use to enable these if needed. All of this is done to prevent excessive try runs as these categories make it easier to do so.

The fuzzy selector was also modified to handle displaying the categories, and then parsing those categories for the tasks requested.

This patch adds a capture_log flag that will allow a user to capture the logs produced when running the push_to_try methods. Currently, we use subprocess.check_call which causes hg to hang when we attempt to redirect and gather stdout. Using subprocess.run is better as it lets us capture the logs, but the log output is very slow and can easily lead people to believe that hg is hanging when it's not. This results in corrupted repos. Using Popen, the logs are output slower than check_call, but faster than run so you know something is happening. The speed at which these logs are printed is also why I have this log capturing behind a flag.

This functionality will be used in the ./mach try perf selector to capture the child revisions produced for the try task config file changes and allow us to produce a PerfCompare link to provide the user.

This patch modifies the Compare selector to gather the logs when perf_categories are supplied. This argument will hold the categories to display in the Fuzzy selector and is currently the only thing that dictates if logs should be collected. Also, this simplifies the process of using --show-all in the perf selector since we still want to capture logs but we don't want to display any categories. A log processor was added to capture the logs and parse the revisions. Furthermore, the run method was moved into the CompareParser to make it's usage clearer in the perf selector.

Depends on D155981

This patch changes the fuzzy selector to show perf-specific categories when they are available (provided from the perf selector). When these are provided, only they will be shown, and the actual tasks will be hidden. After the user selects some categories, we run a set of queries specified by that category to select the tasks to run.

Depends on D154515

Blocks: 1523972

:ahal, regarding this patch (https://bugzilla.mozilla.org/attachment.cgi?id=9292284), the suggestion of using iter to get the logs didn't work either. I've been trying a few other things and combinations but I can't get it to work any better/faster than what I already have here.

I spent some time looking at the mercurial source code to see what is going on there when the ui.* methods are called for logging and I noticed that the logs become "protected" and I think this is why I'm having a hard time capturing them: https://www.mercurial-scm.org/repo/hg-stable/file/tip/mercurial/utils/procutil.py#l426

The code suggests that I can get this output from stderr, but there's nothing being output there (or nothing that I can catch).

What do you think about this?

Flags: needinfo?(ahal)

Sorry, I have to admit.. this is going a bit over my head. I guess your original patch is good enough then :)

Flags: needinfo?(ahal)

No problem, thanks! I'll reduce the duplication as you requested.

This patch moves the fzf utility methods to a separate file so that they can be used by other try choosers such as the perf chooser. At the same time, helper methods for two aspects of the mach try fuzzy run method (setting up the tasks, and building the base command) are added but the code in the run method is left untouched.

Attachment #9292285 - Attachment is obsolete: true

This patch moves the fzf utility methods to a separate file so that they can be used by other try choosers such as the perf chooser. At the same time, helper methods for two aspects of the mach try fuzzy run method (setting up the tasks, and building the base command) are added but the code in the run method is left untouched.

Attachment #9299496 - Attachment is obsolete: true
Pushed by gmierz2@outlook.com: https://hg.mozilla.org/integration/autoland/rev/f88c3eed72f7 Move tryselect fzf utils to separate file. r=ahal
Keywords: leave-open
See Also: → 1797180
Attachment #9292286 - Attachment is obsolete: true
Attachment #9289653 - Attachment is obsolete: true
Attachment #9292287 - Attachment is obsolete: true

This patch adds the basics for the perf selector: file, categorizations, selector configuration, and CLI arguments.

Depends on D155980

This patch adds a method for expanding the categories. This method takes all the configuration details from the PerfParser (variants, apps, platforms, etc.) and combines them to create the categories that people will be seeing and selecting from. It creates the queries, and checks for any restrictions among them to provide selections that work. See the method for more details on all of this.

Depends on D160413

This patch adds two methods for getting/selecting tasks. The first one (get_tasks) is used to either query fzf directly, or to use interactively for user selections. The second (get_perf_tasks) makes use of the first to query for categories (through the user) and then query for the selected tasks. It also performs all the necesary intersections/unions for the queries.

These two are separated because we need a simple way to allow the user to look and select the raw tasks instead of categories if needed. This way the user can still get a comparison link at the end instead of going to mcah try fuzzy. See the next patches.

Depends on D160414

This patch adds a method that allows us to push to try. It performs two pushes similar to the compare selector, except in this case we're also capturing the logs in all cases so that we can provide a Perfherder URL afterwards. At the same time, this patch adds a LogProcessor that can capture these logs and parse them for the revisions.

Depends on D160416

This patch adds the run methods for the perf selector as well as the entry point for the mach command. It also produces the Perfherder URL at the end.

Depends on D160417

This patch adds two parametrized tests (totalling 18 tests) for the perf selector. The first set of 15 tests are for testing the category expansions under different configurations. The last 3 tests are for testing the run methods in the three configurations that we offer: standard, dry runs, and uncategorized test selection (or --show-all).

Depends on D160418

Keywords: leave-open
Pushed by gmierz2@outlook.com: https://hg.mozilla.org/integration/autoland/rev/df816f6bad3f Add a flag to allow capturing the push-to-try logs. r=ahal,AlexandruIonescu https://hg.mozilla.org/integration/autoland/rev/ff5e60523368 Add a new perf selector skeleton file. r=perftest-reviewers,AlexandruIonescu https://hg.mozilla.org/integration/autoland/rev/6e94007a5fe8 Add a method for expanding the perf categories. r=perftest-reviewers,AlexandruIonescu https://hg.mozilla.org/integration/autoland/rev/3c18a3f66f89 Add methods for getting tasks for the perf selector. r=perftest-reviewers,AlexandruIonescu https://hg.mozilla.org/integration/autoland/rev/d57d61c31656 Refactor the CompareParser for the perf selector. r=AlexandruIonescu https://hg.mozilla.org/integration/autoland/rev/cac9a015e794 Add a push-to-try method to the perf selector. r=perftest-reviewers,AlexandruIonescu https://hg.mozilla.org/integration/autoland/rev/461ee4c95c53 Add the run methods to the perf selector. r=ahal https://hg.mozilla.org/integration/autoland/rev/0a6ce361ed41 Add tests for the perf selector. r=ahal
Blocks: 1799113
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: