Open
Bug 1432996
Opened 6 years ago
Updated 2 years ago
Track code size and uncompressed installer size on perfherder
Categories
(Firefox Build System :: General, enhancement)
Tracking
(Not tracked)
NEW
People
(Reporter: bholley, Unassigned)
References
Details
I and others have been landing various patches to decrease the code size impact of Rust code. We've had some big wins - in the hundreds of kilobytes - and I'd like to be sure that we don't unwittingly regress these wins in the future.

One problem with tracking size in the status quo is that we only track installer size, and the measurements for that are noisy. For example, in the graph in [1], we see a big drop around January 24th, corresponding to a patch nox landed. But we also see a large increase in the 4 days prior, and it's hard to pin that down to any specific commit. Glandium and I believe that the noise may be due to the compression involved in creating the installer.

As such, it would be useful to also track the uncompressed text segment size, as well as the size of the uncompressed installer assets. We can then set up alerts on those graphs, and make sure that any large increases are understood and warranted.

[1] https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1299710,1,2
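The "uncompressed text segment size" measurement proposed above could be scripted along these lines. This is a hedged sketch, not existing mozilla-central code: it shells out to binutils `size`, whose `-A` (SysV-style) output lists one section per line, and the function names are illustrative only.

```python
import re
import subprocess

def parse_text_size(size_output):
    """Parse `size -A <lib>` output and return the .text section
    size in bytes, or None if no .text line is found."""
    for line in size_output.splitlines():
        # SysV-style lines look like: ".text   63123456   4194304"
        m = re.match(r"\s*\.text\s+(\d+)", line)
        if m:
            return int(m.group(1))
    return None

def text_segment_size(lib_path):
    """Invoke binutils `size` on a shared library (e.g. libxul.so)
    and return its uncompressed .text size in bytes."""
    out = subprocess.run(["size", "-A", lib_path],
                         capture_output=True, text=True,
                         check=True).stdout
    return parse_text_size(out)
```

Because the section size comes straight from the binary's headers, this number is unaffected by installer compression, which is the point of tracking it separately.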
Reporter
Comment 1•6 years ago
Thoughts Ted? Is this something that your team could take?
Flags: needinfo?(ted)
Comment 2•6 years ago
https://hg.mozilla.org/mozilla-central/file/32b850fa28ae/testing/mozharness/mozharness/mozilla/building/buildbase.py#l1914 is where the code for this lives, although mozharness probably isn't in a position to track code sizes easily. Since Treeherder picks up any properly formatted PERFHERDER_DATA line in the log, the build system itself could emit the metrics. This is definitely build system territory.
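For illustration, a PERFHERDER_DATA line could be assembled like this minimal sketch. The suite/subtest field names follow Perfherder's general log format, but the exact layout and the `build_metrics` framework name here are assumptions to check against the current Treeherder documentation:

```python
import json

def build_perfherder_line(subtest_sizes, installer_bytes):
    """Build a PERFHERDER_DATA log line that Treeherder's log parser
    can ingest. `subtest_sizes` maps artifact names (e.g. "libxul.so")
    to byte counts; `installer_bytes` is the top-level suite value."""
    payload = {
        "framework": {"name": "build_metrics"},
        "suites": [{
            "name": "installer size",
            "value": installer_bytes,
            "subtests": [{"name": name, "value": size}
                         for name, size in sorted(subtest_sizes.items())],
        }],
    }
    return "PERFHERDER_DATA: " + json.dumps(payload)

# A task would simply print the line to stdout for the log parser:
# print(build_perfherder_line({"libxul.so": 63_000_000}, 62_800_000))
```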
Flags: needinfo?(ted)
Comment 3•6 years ago
Actually, I told bholley in email to needinfo Ted so he could give us a history lesson on these metrics in case that is useful. But at this point, it might be safe to assume the history is flawed: we should record whatever metrics people find useful. Our current best mechanism for reporting metrics is PERFHERDER_DATA lines in task output, so that's where we should start.
Flags: needinfo?(ted)
Comment 4•6 years ago
Tracking libxul PGO/opt sizes would definitely be useful, that's a pretty good proxy for RSS regressions as well.
Comment 5•6 years ago
A long time ago we had a tool called `codesighs` that did these sorts of measurements: https://wiki.mozilla.org/Codesighs It got broken by toolchain updates that no longer produced the data it needed, so we gave up on it and didn't have a replacement.

One nice feature we had in codesighs was that it would produce a diff against the previous build, so you could see what individual changes wound up doing to code size. This is a little harder now that we aren't using a single builder per platform, but we could probably just upload profile data as an artifact and build a simple "compare two builds" tool that consumed the artifacts.

There are tools like bloaty now that seem to fit this niche: https://github.com/google/bloaty/ and cargo-bloat, which is very Rust-centric right now but might be easier to add Windows support to: https://github.com/RazrFalcon/cargo-bloat/

I think that figuring out the specific tooling for this task is out of scope for the build team (although we're certainly happy to advise). If other folks can provide a tool + set of commands that would produce the necessary data, we can absolutely work out the nitty-gritty of wiring it up in CI and getting data into Perfherder etc. Someone on njn's team (like froydnj or glandium) might be a good fit for doing the first steps.
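The core of a hypothetical "compare two builds" tool reduces to a small diff over per-symbol (or per-section) size maps, however those maps are produced (bloaty, `nm --size-sort`, etc.). A sketch, with all names illustrative:

```python
def diff_sizes(old, new):
    """Given {symbol_or_section: bytes} maps from two builds, return
    (name, delta) pairs for every entry whose size changed - including
    entries that only exist in one build - sorted by absolute delta,
    largest first, so the biggest code-size movers surface at the top."""
    deltas = {}
    for name in set(old) | set(new):
        delta = new.get(name, 0) - old.get(name, 0)
        if delta:
            deltas[name] = delta
    return sorted(deltas.items(), key=lambda kv: -abs(kv[1]))
```

In CI, each build would upload its size map as an artifact, and the comparison tool would fetch two artifacts and run something like `diff_sizes` over them.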
Flags: needinfo?(ted)
Reporter
Comment 6•6 years ago
Makes sense. Nathan, is this something that can go on your or your team's roadmap?
Flags: needinfo?(nfroyd)
Comment 7•6 years ago
(In reply to Bobby Holley (:bholley) from comment #6)
> Makes sense. Nathan, is this something that can go on your or your team's
> roadmap?

Sure, I can put it on the roadmap. Probably for Q2 or Q3...though maybe sooner, since apparently people are landing large changes that completely undo the effects of nox's work:

https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1299710,1,2&zoom=1516953879055.2825,1517011541417.6904,62597014.92537314,63343283.58208955&selected=mozilla-inbound,1299710,299934,402685860,2

:(
Flags: needinfo?(nfroyd)
Reporter
Comment 8•6 years ago
(In reply to Nathan Froyd [:froydnj] from comment #7)
> (In reply to Bobby Holley (:bholley) from comment #6)
> > Makes sense. Nathan, is this something that can go on your or your team's
> > roadmap?
>
> Sure, I can put it on the roadmap. Probably for Q2 or Q3...though maybe
> sooner, since apparently people are landing large changes that completely
> undo the effects of nox's work:

Geeze, that's terrible.

Joel, while we're waiting for this bug (which should give us less noisy measurements on codesize), can we at least set up an alert for large regressions in installer size like the one above?
Flags: needinfo?(jmaher)
Comment 9•6 years ago
We generate alerts for installer size already: https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#2000

We can change the configuration, but right now we alert on a 1% regression, which would be 628K vs. the ~400K seen in the above link. We could possibly make it an absolute limit of >250K, although this would apply to all flavors of builds, and that might give more false alerts.
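The two alerting policies being weighed here (the existing 1% relative rule vs. an absolute byte limit) can be expressed as a tiny sketch. `should_alert` is a hypothetical name, not the actual mozharness code; the numbers in the comments just mirror the ones discussed above:

```python
def should_alert(prev_bytes, new_bytes,
                 relative_threshold=0.01, absolute_limit=None):
    """Decide whether an installer-size change warrants an alert.
    With absolute_limit set, fire when growth exceeds that many
    bytes; otherwise fall back to the relative (percentage) rule."""
    growth = new_bytes - prev_bytes
    if absolute_limit is not None:
        return growth > absolute_limit
    return growth > prev_bytes * relative_threshold

# On a ~62.8 MB installer, 1% is ~628K, so a ~400K regression slips
# through the relative rule but trips a 250K absolute limit:
# should_alert(62_800_000, 63_200_000)                          -> False
# should_alert(62_800_000, 63_100_000, absolute_limit=250*1024) -> True
```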
Flags: needinfo?(jmaher)
Reporter
Comment 10•6 years ago
I think an absolute value of 250k for installer size change is totally reasonable for alerts, especially if the alert mechanism already does some filtering for noise.
Reporter
Comment 11•6 years ago
Can we try that? We can always re-evaluate if we get a lot of false alerts.
Flags: needinfo?(jmaher)
Comment 12•6 years ago
I would vote for lowering the limit to 100k, personally.
Reporter
Comment 13•6 years ago
(In reply to Nathan Froyd [:froydnj] from comment #12)
> I would vote for lowering the limit to 100k, personally.

SGTM
Comment 14•6 years ago
:igoldan, want to write up a patch for this? I am happy to review
Flags: needinfo?(jmaher) → needinfo?(igoldan)
Comment 15•6 years ago
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #14)
> :igoldan, want to write up a patch for this? I am happy to review

I would like to. My first question: what code parses the logs with that PERFHERDER_DATA dump? Is it a daemon configured for that?

Also, I noticed we have configurations for our alert signatures recorded in the database (performance_signature table). Do we have some config file used for filling the performance_signature table?
Flags: needinfo?(igoldan)
Comment 16•6 years ago
All your questions are handled automatically inside of Perfherder/Treeherder; that is out of the scope of this bug. We just need to adjust https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#2000 to use an absolute threshold of 100K.
Comment 17•6 years ago
It turns out that we are recording raw sizes of things like libxul.so and omni.ja, e.g. for Android:

https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1544596,1,2&series=mozilla-inbound,1544595,1,2&series=mozilla-inbound,1544597,1,2&series=mozilla-inbound,1544594,1,2

and libxul.so for Linux x86-64 (omni.ja to be recorded in bug 1442378):

https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1299711,1,2&series=mozilla-inbound,1383432,1,2

Those are currently structured as subtests of the installer size test. Joel, are we able to get alerts on subtests? If not, should we look into promoting the size tests for individual artifacts to "top-level" tests?
Flags: needinfo?(jmaher)
Comment 18•6 years ago
We do not alert on subtests. My only concern with adding more top-level tests is that they get multiplied across many configs (opt/pgo/nightly/etc.) and branches; this has a high cost for sheriffing, and often the sheriff is asked to do a lot of additional work in the bugs when filing them.

A few options:
1) have a page that shows the other metrics you care about in an easier-to-see format
2) have higher thresholds for the other metrics when you make them top level
3) sheriff your own alerts, or more realistically be the original person to nag patch authors - the perf sheriff files a bug and needinfos the build team to take ownership
4) other ideas?
Flags: needinfo?(jmaher)
Reporter
Comment 19•6 years ago
Ionut is already chasing installer size regressions, and making these metrics more visible will just make his job easier. I'm fine if you want to keep the alerts on installer size only, and make these less noisy graphs available for easy reference, in order to provide more certainty about which changeset triggered a regression.
Comment 21•6 years ago
(In reply to Joel Maher (:jmaher) (UTC-5) from comment #20)
> :igoldan, what are your thoughts on this?

:bholley's proposal is reasonable: I do need more visibility on these regressions. No need for extra alerts, but I want the ability to quickly point to a subtest graph, as I still can't figure out how to do that from Perfherder.

Releasing more top-level tests worries me because of the volume of work; I'm not sure what its magnitude is. I believe the tests' results are affected by compression, and if that's the case, this will increase the number of false alerts. Installer size has a low, fine threshold and seems to be catching things in time.
Flags: needinfo?(igoldan)
Reporter
Comment 22•6 years ago
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #21)
> Releasing more top level tests worries me because of the volume of work. I'm
> not sure what its magnitude is.
> I believe the tests' results are affected by compression. If that's the
> case, this will increase the number of false alerts.

To be clear, the advantage of the metrics discussed here is that they are _not_ affected by compression, whereas installer size (what we're currently tracking) _is_ affected by compression.

That said, it seems reasonable to expose graphs for the subtests, and then keep an eye on them while tracking regressions in installer size, which is the metric that actually matters to users.
Updated•6 years ago
Product: Core → Firefox Build System
Updated•2 years ago
Severity: normal → S3