Closed Bug 1222549 Opened 4 years ago Closed 4 years ago

Submit build metrics to perfherder

Categories

(Firefox Build System :: General, defect)

defect
Not set

Tracking

(firefox45 fixed)

RESOLVED FIXED
mozilla45
Tracking Status
firefox45 --- fixed

People

(Reporter: chmanchester, Assigned: chmanchester)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

With bug 1149164 this should be pretty straightforward, because we already have the data in mozharness.

We're already logging this to grafana, but it would be nice to have this be public, and avoid some of the submission hiccups we've had there.

wlach pointed to https://github.com/mozilla/treeherder/blob/master/schemas/performance-artifact.json for the format we need.

I'd be interested in seeing the duration of each build step for each checkin to start, but there's a lot we could be tracking (and a lot is already available in mozharness!).
I suspect the best way of structuring this is as a suite called "buildtime". That way you can have an overall value for tracking/alerting/sheriffing that could then be broken down into sub-steps for when you want to know the details. We actually already do this for Android APK/installer size, see e.g.:

https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,4eb0cde5431ee9aeb5eb14512ddb3da6d4702cf0,1]&selected=[mozilla-inbound,4eb0cde5431ee9aeb5eb14512ddb3da6d4702cf0,22616,16936152]

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&newProject=mozilla-inbound&originalRevision=2d2e821fa20d&newRevision=ff1dc3e22d19&originalSignature=4eb0cde5431ee9aeb5eb14512ddb3da6d4702cf0&newSignature=4eb0cde5431ee9aeb5eb14512ddb3da6d4702cf0

You can see how this works by looking at the job json we submit:

{
    "framework": {
        "name": "build_metrics"
    },
    "suites": [
        {
            "name": "installer size",
            "subtests": [
                {
                    "name": "omni.ja",
                    "value": 9767974
                },
                {
                    "name": "classes.dex",
                    "value": 5083044
                },
                {
                    "name": "libxul.so",
                    "value": 23732321
                }
            ],
            "value": 42138676
        }
    ]
}
Slight digression: Another thing it might be interesting to look at is ccache utilisation over time, and how our changes to the build system (unified builds come into mind) are affecting hits and misses.  I'm not sure if this is feasible to test.
I took a stab at this in [1], but my additions caused treeherder's log parsing to fail. Will, any suggestions on where I can go to debug such a failure? The line I ended up with was:

16:17:24     INFO - PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"subtests": [{"name": "libxul.so", "value": 95296519}], "name": "installer size", "value": 56539813}, {"subtests": [{"name": "export", "value": 184.7016899585724}, {"name": "compile", "value": 534.3188898563385}, {"name": "misc", "value": 13.340939998626709}, {"name": "libs", "value": 35.77689480781555}, {"name": "tools", "value": 0.6390438079833984}, {"name": "package-tests", "value": 29.12372899055481}, {"name": "buildsymbols", "value": 239.42571711540222}, {"name": "package", "value": 62.132347106933594}, {"name": "l10n-check", "value": 64.08258700370789}, {"name": "upload", "value": 178.7302930355072}], "name": "build times", "value": 1461.152214050293}]}

Which looks as I expect.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=e91db8a9501b
Flags: needinfo?(wlachance)
(In reply to Chris Manchester [:chmanchester] from comment #3)
> I took a stab at this in [1], but my additions caused treeherder's log
> parsing to fail. Will, any suggestions on where I can go to debug such a
> failure? The line I ended up with was:
> 
> 16:17:24     INFO - PERFHERDER_DATA: {"framework": {"name":
> "build_metrics"}, "suites": [{"subtests": [{"name": "libxul.so", "value":
> 95296519}], "name": "installer size", "value": 56539813}, {"subtests":
> [{"name": "export", "value": 184.7016899585724}, {"name": "compile",
> "value": 534.3188898563385}, {"name": "misc", "value": 13.340939998626709},
> {"name": "libs", "value": 35.77689480781555}, {"name": "tools", "value":
> 0.6390438079833984}, {"name": "package-tests", "value": 29.12372899055481},
> {"name": "buildsymbols", "value": 239.42571711540222}, {"name": "package",
> "value": 62.132347106933594}, {"name": "l10n-check", "value":
> 64.08258700370789}, {"name": "upload", "value": 178.7302930355072}], "name":
> "build times", "value": 1461.152214050293}]}
> 
> Which looks as I expect.
> 
> [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=e91db8a9501b

Yeah, we were truncating the perfherder lines in the log until today accidentally (see bug 1222588).

If you look on stage (where a fix has been deployed) you'll see that the log parsing passed:

https://treeherder.allizom.org/#/jobs?repo=try&revision=e91db8a9501b

Here's the data plotted:

https://treeherder.allizom.org/perf.html#/graphs?series=[try,6ca71a468a248a1db822a99e150f7df891981d3f,1]

This looks great, excited to see more!
Flags: needinfo?(wlachance)
Excellent, thank you. I did another push to take a look at the compare view, and sanity checked all the numbers against what we see in grafana, and everything checks out.
Bug 1222549 - Add build-metrics submission to perfherder. r=wlach,jlund

This adds "basic" build time metrics to data already submitted to perfherder
from mozharness: the overall build time, and the duration of each build step
as a subtest.
Attachment #8686739 - Flags: review?(wlachance)
Attachment #8686739 - Flags: review?(jlund)
Comment on attachment 8686739 [details]
MozReview Request: Bug 1222549 - Don't give up logging any stats when failing to submit to influxdb. r=jlund

https://reviewboard.mozilla.org/r/25063/#review22569

This mostly looks ok to me, from the point of view of ingesting things via perfherder. I'm less familiar with the overall architecture of mozharness and whether this method of data collection makes sense, so I'll leave that part of the review to jlund.

::: testing/mozharness/mozharness/base/python.py:755
(Diff revision 1)
> +                    'value': tier.get('duration'),

Why are you using .get() here? I don't think we want to set a None value here if there is no value. In that case I'd give 0.0, probably.
Attachment #8686739 - Flags: review?(wlachance)
https://reviewboard.mozilla.org/r/25063/#review22569

> Why are you using .get() here? I don't think we want to set a None value here if there is no value. In that case I'd give 0.0, probably.

This is in line with the surrounding code, which looks like it's intended to fail softly if something goes wrong. I guess we should not log the data at all if the build fails in a way that results in a malformed build resources file.
https://reviewboard.mozilla.org/r/25063/#review22569

> This is in line with the surrounding code, which looks like it's intended to fail softly if something goes wrong. I guess we should not log the data at all if the build fails in a way that results in a malformed build resources file.

Yeah, if you could make it so we didn't have a PERFHERDER line for that case it would be good. :)
Comment on attachment 8686739 [details]
MozReview Request: Bug 1222549 - Don't give up logging any stats when failing to submit to influxdb. r=jlund

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25063/diff/1-2/
Attachment #8686739 - Flags: review?(wlachance)
Attachment #8686739 - Flags: review?(wlachance)
Comment on attachment 8686739 [details]
MozReview Request: Bug 1222549 - Don't give up logging any stats when failing to submit to influxdb. r=jlund

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25063/diff/2-3/
Comment on attachment 8686739 [details]
MozReview Request: Bug 1222549 - Don't give up logging any stats when failing to submit to influxdb. r=jlund

https://reviewboard.mozilla.org/r/25063/#review22583

Looks good now!
Attachment #8686739 - Flags: review+
Attachment #8686739 - Flags: review?(jlund) → review+
Comment on attachment 8686739 [details]
MozReview Request: Bug 1222549 - Don't give up logging any stats when failing to submit to influxdb. r=jlund

https://reviewboard.mozilla.org/r/25063/#review22733

lgtm

::: testing/mozharness/mozharness/base/python.py:755
(Diff revision 2)
> +                    self.build_metrics_summary = None

so if 'duration' doesn't exist in every tier, we ensure that self.build_metrics_summary is set back to None?
https://reviewboard.mozilla.org/r/25063/#review22733

> so if 'duration' doesn't exist in every tier, we ensure that self.build_metrics_summary is set back to None?

Right, this is a little cautious so we don't end up submitting incomplete stats to perfherder. Hopefully this isn't going to be an issue, but if it does happen I'll investigate.
It looks like there's something wrong with how we're generating the summary on Windows, I'll take a look tomorrow.
Keywords: leave-open
Depends on: 1225568
So bug 1225568 is dealt with, but we still don't have full data for windows dep builds. I'll see if I can debug on a loaner.
Depends on: 1226698
I wasn't able to reproduce the exact issue on a loaner, but I came across one failure mode I'll post a patch for shortly.
Comment on attachment 8686739 [details]
MozReview Request: Bug 1222549 - Don't give up logging any stats when failing to submit to influxdb. r=jlund

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25063/diff/2-3/
Attachment #8686739 - Attachment description: MozReview Request: Bug 1222549 - Add build-metrics submission to perfherder. r=wlach,jlund → MozReview Request: Bug 1222549 - Don't give up logging any stats when failing to submit to influxdb. r=jlund
Attachment #8686739 - Flags: review?(jlund)
Attachment #8686739 - Flags: review+
Attachment #8686739 - Flags: review?(jlund) → review+
Comment on attachment 8686739 [details]
MozReview Request: Bug 1222549 - Don't give up logging any stats when failing to submit to influxdb. r=jlund

https://reviewboard.mozilla.org/r/25063/#review23657
Keywords: leave-open
Depends on: 1229275
https://hg.mozilla.org/mozilla-central/rev/2387bdf52937
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.