Record system resource utilization in Perfherder

RESOLVED FIXED

Status

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: gps, Unassigned)

Tracking

(Blocks 1 bug)

unspecified
Dependency tree / graph

Firefox Tracking Flags

(firefox49 fixed)

Details

Attachments

(3 attachments)

+++ This bug was initially created as a clone of Bug #1271077 +++

Bug 1271077 is surfacing system resource usage via TinderboxPrint. This is a follow-up to record the data in Perfherder, if wlach lets us :)
THIS PATCH IS NOT COMPLETE AND IS NOT READY TO LAND.

Submitting to wlach for early feedback so I know if I'm on the
right track.

This commit teaches the resource monitor in mozharness to emit
Perfherder data for system metrics and step times. This will
allow us to see when the timing or resource characteristics
of jobs in automation changes.

The wonkiest part of this patch is likely the mechanism to
define the Perfherder "test" names. We don't appear to have
an identifier in mozharness suitable for distinguishing
between job types. e.g. the "desktop_unittest.py" script is
responsible for running a few dozen jobs. So we invent code
for creating an identifier from the script config options.
I /think/ Treeherder will automatically assign the
project/branch, platform, and build type, which is why these
aren't included in the identifier.

Review commit: https://reviewboard.mozilla.org/r/52303/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/52303/
Attachment #8751959 - Flags: review?(wlachance)
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52303/diff/1-2/
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52303/diff/2-3/
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

https://reviewboard.mozilla.org/r/52303/#review49509

This is looking pretty good! We should test this on treeherder stage for a while (at least a few weeks) before enabling on master. Let me think about how to do that in a reasonable way.

::: testing/mozharness/scripts/desktop_unittest.py:198
(Diff revision 3)
> +            perfherder_parts.append(c['this_chunk'])
> +        if c['e10s']:
> +            perfherder_parts.append('e10s')
> +
> +        self.resource_monitor_perfherder_name = '.'.join(perfherder_parts)
> +

You should run this through the perfherder schema, if possible. There's an example of this with talos:

http://mxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/talos.py#350
Attachment #8751959 - Flags: review?(wlachance)
Having the latest schema available seems like a good thing. This is a
direct copy of schemas/performance-artifact.json from
https://github.com/mozilla/treeherder.git at commit
7bed1b22ceb01e3e71536fa1c4ecd14ddc87e803.

Review commit: https://reviewboard.mozilla.org/r/53354/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/53354/
Attachment #8753566 - Flags: review?(wlachance)
Attachment #8751959 - Flags: review?(wlachance)
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52303/diff/3-4/
https://reviewboard.mozilla.org/r/52303/#review49509

Something else I noticed is that Treeherder failed to parse the logs on the original Try push. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=52754ceac78c0673b3f4499415b858ddcb5d75be. Perhaps that is due to an unknown suite name in the PERFHERDER_DATA?
https://reviewboard.mozilla.org/r/52303/#review50160

::: testing/mozharness/mozharness/base/python.py:619
(Diff revision 4)
> +                schema_path = os.path.join(self.query_abs_dirs()['abs_work_dir'],
> +                                           'tests', 'talos', 'treeherder-schemas',
> +                                           'performance-artifact.json')

This doesn't work because the JSON file appears to be coming from the test archive. We'll probably want to move the schema file into mozharness so all mozharness jobs have access to it.
Comment on attachment 8753566 [details]
Bug 1272176 - Synchronize performance artifact schema with upstream;

https://reviewboard.mozilla.org/r/53354/#review50366

Yup, this is a no-brainer
Attachment #8753566 - Flags: review?(wlachance) → review+
Attachment #8751959 - Flags: review?(wlachance)
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

https://reviewboard.mozilla.org/r/52303/#review50370

This all looks fine to me. I'd like to see some samples of what PERFHERDER_DATA looks like in the try runs before giving an r+.

I'd also like to test this on treeherder stage (treeherder.allizom.org) for a while before enabling this on production. I filed a bug https://bugzilla.mozilla.org/show_bug.cgi?id=1273983 to make this sort of easier, but I can hack something in so we don't have to gate landing this patch on that.
Currently, only Talos accesses this file. An uncoming commit will add
a non-Talos consumer. Enable all mozharness consumers to access the
file by including it in the mozharness directory (previously it was
part of the Talos test archive).

Review commit: https://reviewboard.mozilla.org/r/53638/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/53638/
Attachment #8753566 - Attachment description: MozReview Request: Bug 1272176 - Synchronize performance artifact schema with upstream; r?wlach → MozReview Request: Bug 1272176 - Synchronize performance artifact schema with upstream; r=wlach
Attachment #8751959 - Flags: review?(wlachance)
Comment on attachment 8753566 [details]
Bug 1272176 - Synchronize performance artifact schema with upstream;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/53354/diff/1-2/
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52303/diff/4-5/
https://reviewboard.mozilla.org/r/52303/#review49509

That's odd -- it clearly did parse the logs because I see all sorts of job metadata in there. I think this is just a treeherder error. Presently perfherder just ignores unknown suite names, so that shouldn't be a problem.
Attachment #8751959 - Flags: review?(wlachance)
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

https://reviewboard.mozilla.org/r/52303/#review50424

I don't see anything wrong with this, but we should figure out why it's failing on try before landing.
https://reviewboard.mozilla.org/r/52303/#review50424

Schema failure on the last push. New patch coming shortly.
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52303/diff/5-6/
Attachment #8751959 - Flags: review?(wlachance)
Attachment #8751959 - Flags: review?(wlachance)
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

https://reviewboard.mozilla.org/r/52303/#review50744

I'm seeing some errors validating the schema on try (even though it claims to be green??) e.g. https://public-artifacts.taskcluster.net/XxeXH77bToOtSEYipNa1AQ/0/public/logs/live_backing.log (search for perfherder)

I also should have mentioned this before, but we can specify "e10s" as an extra property of the suite (via extraOptions), which will probably look a little bit nicer in the UI. You can see an example of this in the e10s talos jobs (again, search for perfherder):

http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-linux64/1463679999/mozilla-inbound_ubuntu64_hw_test-dromaeojs-e10s-bm105-tests1-linux-build4330.txt.gz
Comment on attachment 8753993 [details]
Bug 1272176 - Move performance artifact schema into mozharness directory;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/53638/diff/1-2/
Attachment #8753993 - Attachment description: MozReview Request: Bug 1272176 - Move performance artifact schema into mozharness directory; w?wlach → MozReview Request: Bug 1272176 - Move performance artifact schema into mozharness directory; r?wlach
Attachment #8753993 - Flags: review?(wlachance)
Attachment #8751959 - Flags: review?(wlachance)
Comment on attachment 8753566 [details]
Bug 1272176 - Synchronize performance artifact schema with upstream;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/53354/diff/2-3/
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52303/diff/6-7/
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52303/diff/7-8/
Comment on attachment 8753993 [details]
Bug 1272176 - Move performance artifact schema into mozharness directory;

https://reviewboard.mozilla.org/r/53638/#review51532
Attachment #8753993 - Flags: review?(wlachance) → review+
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

https://reviewboard.mozilla.org/r/52303/#review51534

The code looks fine (didn't check the try runs) except for one thing which I mentioned but forgot to open an issue for.

::: testing/mozharness/scripts/desktop_unittest.py:200
(Diff revision 8)
> +            perfherder_parts.append(c['this_chunk'])
> +
> +        if c['e10s']:
> +            perfherder_options.append('e10s')
> +        else:
> +            perfherder_options.append('no-e10s')

I mentioned this a few days ago but forgot to "open an issue".

You can specify 'e10s' as extraOptions for a suite, which I think would be better than putting it in the suite name. You can see an example of this in the e10s talos jobs:

http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-linux64/1463679999/mozilla-inbound_ubuntu64_hw_test-dromaeojs-e10s-bm105-tests1-linux-build4330.txt.gz

Just omit "no-e10s" entirely, I think.
Attachment #8751959 - Flags: review?(wlachance)
https://reviewboard.mozilla.org/r/52303/#review51534

> I mentioned this a few days ago but forgot to "open an issue".
> 
> You can specify 'e10s' as extraOptions for a suite, which I think would be better than putting it in the suite name. You can see an example of this in the e10s talos jobs:
> 
> http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-linux64/1463679999/mozilla-inbound_ubuntu64_hw_test-dromaeojs-e10s-bm105-tests1-linux-build4330.txt.gz
> 
> Just omit "no-e10s" entirely, I think.

Uh, isn't this what this patch is doing now? That was changed in revision 8 IIRC.
Comment on attachment 8753993 [details]
Bug 1272176 - Move performance artifact schema into mozharness directory;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/53638/diff/2-3/
Attachment #8753993 - Attachment description: MozReview Request: Bug 1272176 - Move performance artifact schema into mozharness directory; r?wlach → Bug 1272176 - Move performance artifact schema into mozharness directory;
Attachment #8753566 - Attachment description: MozReview Request: Bug 1272176 - Synchronize performance artifact schema with upstream; r=wlach → Bug 1272176 - Synchronize performance artifact schema with upstream;
Attachment #8751959 - Attachment description: MozReview Request: Bug 1272176 - Emit Perfherder data for system resource utilization; r?wlach → Bug 1272176 - Emit Perfherder data for system resource utilization;
Attachment #8751959 - Flags: review?(wlachance)
Attachment #8751959 - Flags: review?(gps)
Comment on attachment 8753566 [details]
Bug 1272176 - Synchronize performance artifact schema with upstream;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/53354/diff/3-4/
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52303/diff/8-9/
https://reviewboard.mozilla.org/r/52303/#review51534

> Uh, isn't this what this patch is doing now? That was changed in revision 8 IIRC.

Sorry, I guess I misread your changes. This does look right now (haven't checked the output). But you should still remove the "no-e10s" option, that's redundant.
Attachment #8751959 - Flags: review?(wlachance) → review+
Comment on attachment 8751959 [details]
Bug 1272176 - Emit Perfherder data for system resource utilization;

https://reviewboard.mozilla.org/r/52303/#review54120

Let's land this! Note that ingestion of this won't happen until we add a performance framework for this type of data in perfherder. Let's figure out how to do that seperately, on stage only at first.
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/901aee13bd76
Move performance artifact schema into mozharness directory; r=wlach
https://hg.mozilla.org/integration/mozilla-inbound/rev/82fb632252de
Synchronize performance artifact schema with upstream; r=wlach
https://hg.mozilla.org/integration/mozilla-inbound/rev/7bff68b26943
Emit Perfherder data for system resource utilization; r=wlach
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b1961339522
Import mozharness to avoid missing symbol; r=me
Attachment #8751959 - Flags: review?(gps)
You need to log in before you can comment on or make changes to this bug.