Closed Bug 1361436 Opened 3 years ago Closed 3 years ago
"summary" build metrics measuring `make check`
https://hg.mozilla.org/mozilla-central/rev/4e61e69a383ca79a9b84c78eb2eb53e5958333e3 (part of bug 1304508) broke the build metrics reporting for the "summary" metric. This metric is essentially end-to-end time. When we switched from a raw `make check` to `mach build check`, that meant that mach's metrics reporting for `mach build` kicked in and started reporting the `mach build check` execution time in the "summary" metric. There is a quick hack to fix this. The more involved hack is to run the "check" target during the build itself. But only on TaskCluster, since there is an optimization in buildbot to run "check" after some artifacts have been uploaded so tests start executing sooner.
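A minimal sketch of the reporting problem described above, assuming the fix is to record the "summary" time as soon as the build finishes, before "check" runs. This is not the actual patch and not mach's real API: `run_build_then_check`, the `metrics` dict, and the injectable `clock` are all hypothetical names used for illustration.

```python
import time


def run_build_then_check(build, check, clock=time.monotonic):
    """Run build, record the summary metric, then run check.

    All names here are hypothetical; this is not mach's real API.
    """
    start = clock()
    build()
    after_build = clock()
    # Emit the end-to-end "summary" time right after the build so the
    # later check step cannot overwrite it.
    metrics = {"summary": after_build - start}
    check()
    # The check time is kept under its own key instead of "summary".
    metrics["check"] = clock() - after_build
    return metrics
```

Passing the clock in as a parameter is only there so the timing can be exercised without running a real build.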
ted, mshal, KWierso: I'd like your opinion on merging "check" into the main `mach build` invocation.

According to the numbers at https://treeherder.mozilla.org/perf.html#/alerts?id=6298, it appears `make check` is now ~1 minute on Linux, ~2 minutes on Windows, ~3 minutes on OS X. We've moved the bulk of work out of `make check` already. But there's still more we can do. And, as we move to TC, we don't have the BB hack of uploading artifacts to trigger test jobs before running `make check`. So in TC it doesn't make sense to continue running `make check` separately.

I'm really tempted to do away with `make check` being its own invocation from mozharness and to move it into the main `mach build`. If we did so:

* It could be run concurrently with other automation tasks, such as test packaging. This could result in a net speed-up of the overall build task.
* We'd lose the BB speed-up hack (which is going away with the move off BB anyway).
* We may lose --keep-going during `make check`, failing the build and preventing tests from running if anything in `make check` barfed. This means that intermittent failures would short circuit tests. I'm not sure if this would be a problem in practice. Although we may be able to hack around this with $(MAKEFLAGS) adjustment or something.
I'm not going to bother consolidating `make check` into the "build" mozharness action at this time. Too much of a rabbit hole.
(In reply to Gregory Szorc [:gps] from comment #1)
> * We may lose --keep-going during `make check`, failing the build and
> preventing tests from running if anything in `make check` barfed. This means
> that intermittent failures would short circuit tests. I'm not sure if this
> would be a problem in practice. Although we may be able to hack around this
> with $(MAKEFLAGS) adjustment or something.

We do a similar thing for l10n-check since it is the only post-automation step that is not parallel-safe: https://dxr.mozilla.org/mozilla-central/rev/a748acbebbde373a88868dc02910fb2bc5e6a023/build/moz-automation.mk#72

So we could just do a 'AUTOMATION_EXTRA_CMDLINE-check = -k' and set a MOZ_AUTOMATION_CHECK = 1 somewhere to wire it up to the existing automation rules.

I agree that once we don't have buildbot doing a sendchange in between the build & check, there's no reason to have check run by mozharness, though I think we should try to finish moving things out to separate tasks if we can. Maybe a good project for a work-week sprint?
Comment on attachment 8863906
Bug 1361436 - Emit build stats immediately after build;
https://reviewboard.mozilla.org/r/135632/#review138930

It sucks that you have to fix this in so many places, but OK!
Attachment #8863906 - Flags: review?(ted) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/2877aa1e02c8 Emit build stats immediately after build; r=ted
I see a build times regression which seems to point to this revision:

== Change summary for alert #6334 (as of May 03 2017 18:06 UTC) ==

Regressions:

4340%  build times summary linux32 pgo taskcluster-c4.4xlarge           64.67 -> 2,871.10
4035%  build times summary linux64 pgo taskcluster-c4.4xlarge           59.92 -> 2,477.21
3441%  build times summary windows2012-64 pgo taskcluster-c4.4xlarge    122.92 -> 4,352.14
3233%  build times summary windows2012-32 pgo taskcluster-c4.4xlarge    121.10 -> 4,036.21
2612%  build times summary windows8-64 opt buildbot-c4.4xlarge          75.67 -> 2,052.01
2185%  build times summary windows8-64 debug buildbot-c4.4xlarge        88.63 -> 2,025.43
2036%  build times summary windowsxp debug buildbot-c4.4xlarge          85.97 -> 1,836.04
1989%  build times summary linux64-stylo opt taskcluster-c4.4xlarge     60.08 -> 1,254.94
1949%  build times summary windowsxp opt buildbot-c4.4xlarge            93.70 -> 1,919.83
1735%  build times summary osx-10-7 debug buildbot-unknown              166.44 -> 3,054.15
1716%  build times summary osx-10-7 opt buildbot-unknown                163.72 -> 2,972.67
1215%  build times summary linux64 asan taskcluster-c4.4xlarge          81.73 -> 1,074.73
1025%  build times summary linux32 debug taskcluster-c4.4xlarge         65.92 -> 741.53
 982%  build times summary windows2012-64 opt taskcluster-c4.4xlarge    120.87 -> 1,308.11
 982%  build times summary linux64-stylo debug taskcluster-c4.4xlarge   66.59 -> 720.18
 944%  build times summary windows2012-32 opt taskcluster-c4.4xlarge    119.54 -> 1,248.29
 899%  build times summary linux32 opt taskcluster-c4.4xlarge           65.67 -> 656.31
 895%  build times summary linux64 opt taskcluster-c4.4xlarge           64.00 -> 636.48
 882%  build times summary linux64 debug taskcluster-c4.4xlarge         68.18 -> 669.30
 873%  build times summary windows2012-64 debug taskcluster-c4.4xlarge  120.96 -> 1,176.59
 856%  build times summary windows2012-32 debug taskcluster-c4.4xlarge  122.34 -> 1,169.61

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=6334

not sure if it is really related or coincidence
Those alerts were expected. We broke the reporting for a few days and this alert is the values going back to their expected point.
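For reference, the percentages in the alert are consistent with a plain relative change between the old and new "summary" values. The sketch below assumes that formula; `percent_change` is a hypothetical helper, not Perfherder code, and because the alert shows rounded values, a recomputed figure may be off by a point on some rows.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


# Values copied from the alert: (platform, old seconds, new seconds).
rows = [
    ("linux32 pgo", 64.67, 2871.10),
    ("linux64 debug", 68.18, 669.30),
]

for platform, old, new in rows:
    print(f"{platform}: {percent_change(old, new):.0f}%")
```

Read this way, the old values of roughly 1 to 3 minutes match the `make check` durations quoted earlier in the bug, and the new values are the full build times being reported again.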