Closed Bug 1459621 Opened Last year Closed 10 months ago

Support (or document) mach try syntax for gecko profiling

Categories

(Testing :: Talos, defect)

Version 3
defect
Not set

Tracking

(firefox65 fixed)

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: gregtatum, Assigned: rwood)

References

(Blocks 1 open bug)

Details

(Whiteboard: [PI:June])

Attachments

(1 file)

Looking at the online docs for gecko profiling: https://wiki.mozilla.org/Buildbot/Talos/Profiling

They do not list any `mach try` syntax. The only listing is the following:

try: -b o -p macosx64,win32,win64 -u none -t all[10.6,10.8,Windows XP,Windows 7,Windows 8] mozharness: --geckoProfile

Another developer was asking me how to make it work with mach try, and I did not know. So either:

1) We need docs on how to do this.
2) We need to enable gecko profiling on Talos runs through `mach try`.
Whiteboard: [PI:May]
:ahal, does 'mach try' support the --geckoProfile flag? I know that 'mach try fuzzy --geckoProfile' does work. Maybe we should just update the documentation to indicate to use 'mach try fuzzy --geckoProfile'.
Flags: needinfo?(ahal)
Whiteboard: [PI:May] → [PI:June]
I'm pretty sure there is a way to do it with try syntax, I think implementing it for fuzzy was getting it to parity with try syntax. But I'm all for official docs pushing people away from try syntax. So even if it is possible to do, +1 to the idea of suggesting |mach try fuzzy --talos-profile| in the docs instead. I'd love if new developers didn't even know try syntax exists :).

Aside: --geckoProfile is an alias to --talos-profile for backwards compatibility, but it is a bad name as it doesn't make it clear that it only impacts talos jobs. So if we're updating docs anyway, let's also change --geckoProfile to --talos-profile.
Flags: needinfo?(ahal)
(In reply to Andrew Halberstadt [:ahal] from comment #2)

> Aside: --geckoProfile is an alias to --talos-profile for backwards
> compatibility, but it is a bad name as it doesn't make it clear that it only
> impacts talos jobs. So if we're updating docs anyway, let's also change
> --geckoProfile to --talos-profile.

Good to know, thanks - except the ./mach talos-test command itself doesn't except '--talos-profile', it only accepts '--geckoProfile'. So I think we should just keep it consistent and use --geckoProfile IMO.
Assignee: nobody → rwood
Status: NEW → ASSIGNED
Updated https://wiki.mozilla.org/Buildbot/Talos/Profiling accordingly.
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
I think we also want to use `./mach try <try syntax>`, at least this is how I do it and I've seen others doing it.

IMO this issue isn't closed until "mach try" supports --geckoProfile too.

See how mach try supports the "normal" try syntax:

$ ./mach try -b o -p macosx64 -u none
[master 80abdb29091a] try: -b o -p macosx64 -u none
Bundling 1 changesets in 0.00s
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 0 changes to 0 files (+1 heads)
remote: recorded push in pushlog
remote: 
remote: View your change here:
remote:   https://hg.mozilla.org/try/rev/ccbfbba4e0af706f055de34532d3940f12092106
remote: 
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=ccbfbba4e0af706f055de34532d3940f12092106
remote: recorded changegroup in replication log in 0.110s
To hg::ssh://hg.mozilla.org/try
 + d53081dec09a...80abdb29091a HEAD -> branches/default/tip (forced update)

And how it doesn't once we add mozharness:

$ ./mach try -b o -p macosx64 -u none -t damp-e10s mozharness: --geckoProfile
usage: mach [-h] [-b BUILDS] [-p PLATFORMS] [-u TESTS] [-t TALOS] [-j JOBS]
            [--tag TAGS] [--and] [--no-artifact] [-v] [--detect-paths]
            [-m [MESSAGE]] [--no-push] [--closed-tree] [--save SAVE]
            [--preset PRESET] [--list-presets] [--edit-presets]
            [--chemspill-prio] [--all-emails] [--upload-xdbs] [--interactive]
            [--setenv SETENV] [--artifact] [-e]
            [--rebuild-talos REBUILD_TALOS] [-f] [--rebuild REBUILD]
            [--no-retry] [--failure-emails]
            [paths [paths ...]]
mach: error: unrecognized arguments for try syntax: '--geckoProfile'


I think this should be supported.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I do't thin the command line would know how to interpret:

'/mach try -b o -p macosx64 -u none -t damp-e10s mozharness: --geckoProfile'

To be more exact: 'mozharness: --geckoProfile'

As command line arguments would search for - or -- interpreting 'mozharness: --geckoProfile' as one argument and value would get complicated.

Should the mach command line be able to interpret this syntax?
If yes do we have an example of a similar implementation?
Flags: needinfo?(rwood)
Flags: needinfo?(jmaher)
I would prefer if we could have:
./mach talos-test -a tart --geckoProfile


Oddly enough I just ran that locally and it worked bringing up a web browser to show me the profiles (of course symbolication was broken).

:rwood can you run that locally and get a profile?
:bebe, how about for you?
Flags: needinfo?(jmaher) → needinfo?(bebe)
./mach talos-test -a tart --geckoProfile - works for me and generates a profile

But, from my understanding this bug is about running:
'/mach try -b o -p macosx64 -u none -t damp-e10s mozharness: --geckoProfile'

This command returns an error because it does not know how to interpret: mozharness: --geckoProfile in the command line.
(In reply to u555083 from comment #8)
> ./mach talos-test -a tart --geckoProfile - works for me and generates a
> profile
> 
> But, from my understanding this bug is about running:
> '/mach try -b o -p macosx64 -u none -t damp-e10s mozharness: --geckoProfile'
> 
> This command returns an error because it does not know how to interpret:
> mozharness: --geckoProfile in the command line.
Flags: needinfo?(bebe)
so the confusion on my part was locally vs try, sorry for the randomization in this bug.

we have specific talos tests for profiling, for example:
talos-tp6-e10s -> talos-tp6-profiling-e10s

this can be run via ./mach try if you use the (--full) option:
./mach try fuzzy -q '!pgo !qr linux64 talos-tp6 profile' --full

I think we can edit the documents to be more useful to outline the above as well as the ability to retrigger a known job easily with job actions in treeherder (2 clicks)

Raptor will need to conform to this and when we can get geckoProfiling to work in raptor, then we can ensure it matches the needs here.

:bebe, can you update the documentation on the wiki to match what is above?
Flags: needinfo?(rwood) → needinfo?(bebe)
Just noting |mach try fuzzy| supports --talos-profile (which is the same as --geckoProfile in |mach talos|):
./mach try fuzzy -q "'talos" --talos-profile
thanks :ahal, that indeed does work.  Should we prevent that to remove confusion?
Sorry prevent what? The talos-profile tasks from showing up in fzf? If their only purpose is to allow running profiles on try, then I think they can probably be removed.
oh, my bad.  We run the profile jobs on mozilla-central, but maybe we don't need them anymore.  :rwood, what do you think? 

:florian do you think there is still value to collecting a gecko profile on every talos test for each mozilla-central push?  It is fairly easy to retrigger and get a profile now, or push a special ./mach push with --talos-profile
Flags: needinfo?(rwood)
Flags: needinfo?(florian)
So is it possible now to select a completed non-profiling talos job on treeherder and retrigger it adding gecko profiling? If that's the case yes IMO we should remove the talos profiling suite on central. Btw, how do you retrigger and add profiling? That should def be added to our talos (and in the future raptor) profiling wiki.
Flags: needinfo?(rwood)
ok, lets see what :florian has to say
Flags: needinfo?(bebe)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #14)

> :florian do you think there is still value to collecting a gecko profile on
> every talos test for each mozilla-central push?  It is fairly easy to
> retrigger and get a profile now, or push a special ./mach push with
> --talos-profile

I think most Talos tests run the same thing several times (eg 10 times), and then take an average to produce the final number that will be reported.

The Talos runs with profiling enabled do the same thing, but they output no numbers (because their values are skewed by the profiling overhead), and instead upload a zip file containing a profile of each execution of the test (eg. 10 profiles of the same thing). These 10 profiles are often more confusing than useful, as we currently have no good way to compare them, and the engineers looking at these profiles tend to just randomly pick one out of the 10, and ignore the others.

What I would prefer if it's possible is that in the same job, we run the test 10 times to produce numbers, and an 11th time with the profiler enabled, to upload a single profile.

This would eliminate the difference between "Talos with profiling" and "Talos without profiling", as all Talos runs would generate a profile. If we can move to something like this, I don't think profiling should be limited to mozilla-central. All talos runs on try and integration branches should have profiles. How feasible is this?


Assuming we have to keep the current model for some reason. I'm afraid there are few people actually looking at Talos profiles, so it's probably hard to justify running specific jobs to produce them all the time. If we stop producing these profiles automatically, we should make it super easy to retrigger a Talos job with profiling enabled. I'm afraid it's currently pretty hard (if not impossible) to do it without asking information from someone who has already done it before.
Flags: needinfo?(florian)
this is a good suggestion.  Currently in profiling mode, we do not run the test for all cycles, usually just 1 or 2 cycles so we are not making an impossible profile to investigate.  For all regression bugs we add before/after links to profiles in the bug so the person who caused the regression can use the profiles to investigate.

I think either way we should delete the profile specific jobs and scheduling on m-c.
:bebe, would you want to remove the talos-profiling specific jobs:
https://searchfox.org/mozilla-central/search?q=-profiling&path=taskcluster
Flags: needinfo?(bebe)
I removed all talos-profiling specific jobs
Flags: needinfo?(bebe)
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/42412209a322
Support (or document) mach try syntax for gecko profiling r=jmaher jmaher
https://hg.mozilla.org/mozilla-central/rev/42412209a322
Status: REOPENED → RESOLVED
Closed: Last year10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
(In reply to Pulsebot from comment #22)
> Pushed by jmaher@mozilla.com:
> https://hg.mozilla.org/integration/autoland/rev/42412209a322
> Support (or document) mach try syntax for gecko profiling r=jmaher jmaher

Next time, could you please adjust the commit message so that it matches what the patch does? Thanks!
(In reply to Florian Quèze [:florian] from comment #17)

> What I would prefer if it's possible is that in the same job, we run the
> test 10 times to produce numbers, and an 11th time with the profiler
> enabled, to upload a single profile.
> 
> This would eliminate the difference between "Talos with profiling" and
> "Talos without profiling", as all Talos runs would generate a profile.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #18)
> this is a good suggestion.

Is there a bug on file to track this?
Flags: needinfo?(jmaher)
thanks for following up, here is bug 1505495 :)
Flags: needinfo?(jmaher)
You need to log in before you can comment on or make changes to this bug.