Open Bug 1432996 Opened 2 years ago Updated 2 years ago
Track code size and uncompressed installer size on perfherder
I and others have been landing various patches to decrease the code size impact of Rust code. We've had some big wins - in the hundreds of kilobytes - and I'd like to be sure that we don't unwittingly regress these wins in the future. One problem with tracking size in the status quo is that we only track installer size, and the measurements for that are noisy. For example, in the graph in , we see a big drop around January 24th, corresponding to a patch nox landed. But we also see a large increase in the 4 days prior, and it's hard to pin that down to any specific commit. Glandium and I believe that the noise may be due to the compression involved in creating the installer. As such, it would be useful to also track the uncompressed text segment size, as well as the size of the uncompressed installer assets. We can then set up alerts on those graphs, and make sure that any large increases are understood and warranted.  https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1299710,1,2
Thoughts Ted? Is this something that your team could take?
https://hg.mozilla.org/mozilla-central/file/32b850fa28ae/testing/mozharness/mozharness/mozilla/building/buildbase.py#l1914 is where the code for this lives. Although mozharness probably isn't in a position to track code sizes easily. If we emit the proper PERFHERDER_DATA line, the build system itself could emit the metrics. This is definitely territory of the build system.
Actually, I told bholley in email to needinfo Ted so he could give us a history lesson of these metrics in case that is useful. But at this point, it might be safe to assume history and flawed: we should record whatever metrics people find useful. Our current best mechanism for reporting metrics is PERFHERDER_DATA lines in task output. So that's where we should start.
Tracking libxul PGO/opt sizes would definitely be useful, that's a pretty good proxy for RSS regressions as well.
A long time ago we had a tool called `codesighs` that did these sorts of measurements: https://wiki.mozilla.org/Codesighs It got broken by toolchain updates that no longer produced the data it needed, so we gave up on it and didn't have a replacement. One nice feature we had in codesighs was that it would produce a diff against the previous build, so you could see what individual changes wound up doing to code size. This is a little harder now that we aren't using a single builder per-platform, but we could probably just upload profile data as an artifact and build a simple "compare two builds" tool that consumed the artifacts. There are tools like bloaty now that seem to fit this niche: https://github.com/google/bloaty/ and cargo-bloat, which is very Rust-centric right now but might be easier to add Windows support to: https://github.com/RazrFalcon/cargo-bloat/ I think that figuring out the specific tooling for this task is out of scope for the build team (although we're certainly happy to advise). If other folks can provide a tool + set of commands that would produce the necessary data we can absolutely work out the nitty-gritty of wiring it up in CI and getting data into perfherder etc. Someone on njn's team (like froydnj or glandium) might be a good fit for doing the first steps.
Makes sense. Nathan, is this something that can go on your or your team's roadmap?
(In reply to Bobby Holley (:bholley) from comment #6) > Makes sense. Nathan, is this something that can go on your or your team's > roadmap? Sure, I can put it on the roadmap. Probably for Q2 or Q3...though maybe sooner, since apparently people are landing large changes that completely undo the effects of nox's work: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1299710,1,2&zoom=1516953879055.2825,1517011541417.6904,62597014.92537314,63343283.58208955&selected=mozilla-inbound,1299710,299934,402685860,2 :(
(In reply to Nathan Froyd [:froydnj] from comment #7) > (In reply to Bobby Holley (:bholley) from comment #6) > > Makes sense. Nathan, is this something that can go on your or your team's > > roadmap? > > Sure, I can put it on the roadmap. Probably for Q2 or Q3...though maybe > sooner, since apparently people are landing large changes that completely > undo the effects of nox's work: Geeze, that's terrible. Joel, while we're waiting for this bug (which should give us less noisy measurements on codesize), can we at least set up an alert for large regressions in installer size like the one above?
we generate alerts for installer size already: https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#2000 we can change the configuration, but right now we alert on a 1% regression which would be 628K vs the ~400K seen in the above link. possibly make it an absolute limit of >250K, although this would apply to all flavors of builds and that might give more false alerts.
I think an absolute value of 250k for installer size change is totally reasonable for alerts, especially if the alert mechanism already does some filtering for noise.
Can we try that? We can always re-evaluate if we get a lot of false alerts.
I would vote for lowering the limit to 100k, personally.
(In reply to Nathan Froyd [:froydnj] from comment #12) > I would vote for lowering the limit to 100k, personally. SGTM
:igoldan, want to write up a patch for this? I am happy to review
Flags: needinfo?(jmaher) → needinfo?(igoldan)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #14) > :igoldan, want to write up a patch for this? I am happy to review I would like to. My first question: what code parses the logs with that PERFHERDER_DATA dump? Is it a daemon configured for that? Also, I noticed we have configurations for our alert signatures recorded in the database (performance_signature table). Do we have some config file used for filling the performance_signature table?
all your questions are automatically handled inside of perfherder/treeherder- that is out of the scope of this bug- we just need to adjust https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#2000 to be absolute at 100K
It turns out that we are recording raw sizes of things like libxul.so and omni.ja, e.g. for Android: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1544596,1,2&series=mozilla-inbound,1544595,1,2&series=mozilla-inbound,1544597,1,2&series=mozilla-inbound,1544594,1,2 and libxul.so for Linux x86-64 (omni.ja to be recorded in bug 1442378): https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1299711,1,2&series=mozilla-inbound,1383432,1,2 Those are currently structured as subtests of the installer size test. Joel, are we able to get alerts on subtests? If not, should we look into promoting the size tests for individual artifacts to "top-level" tests?
we do not alert on subtests- my only concern with adding more top level tests is that they get multiplied across many configs (opt/pgo/nightly/etc.) and branches- this has a high cost for sheriffing and often the sheriff is asked to do a lot of additional work in the bugs when filing them. A few options: 1) have a page that shows the other metrics you care about in an easier to see format 2) have higher thresholds for the other metrics when you make them top level 3) sheriff your own alerts, or more realistically be the original person to nag patch authors- perf sheriff files a bug and needinfo's the build team to take ownership. 4) other ideas?
Ionut is already chasing installer size regressions. Making these metrics more visible will just make his job easier. I'm fine if you want to just keep the alerts on installer size, and make these less-noisy graphs available for easy reference in order to provide more certainty on which changeset triggered the regression.
:igoldan, what are your thoughts on this?
(In reply to Joel Maher (:jmaher) (UTC-5) from comment #20) > :igoldan, what are your thoughts on this? :bholley's proposal is reasonable: I do need more visibility on these regressions. No need for extra alerts, but have the ability to quickly point to a subtest graph, as I still can't figure out how to do that from Perfherder. Releasing more top level tests worries me because of the volume of work. I'm not sure what its magnitude is. I believe the tests' results are affected by compression. If that's the case, this will increase the number of false alerts. Installer size has a low, fine threshold and seems to be catching things in time.
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #21) > Releasing more top level tests worries me because of the volume of work. I'm > not sure what its magnitude is. > I believe the tests' results are affected by compression. If that's the > case, this will increase the number of false alerts. To be clear, the advantage of the metrics discussed here is that they are _not_ affected by compression, whereas installer size (what we're currently tracking) _is_ affected by compression. That said, it seems reasonable to expose graphs for the subtests, and then keep an eye on them while tracking regressions in installer size, which is the metric that actually matters to users.
You need to log in before you can comment on or make changes to this bug.