1432996 - Track code size and uncompressed installer size on perfherder

Reporter

Description

•

7 years ago

I and others have been landing various patches to decrease the code size impact of Rust code. We've had some big wins - in the hundreds of kilobytes - and I'd like to be sure that we don't unwittingly regress these wins in the future. One problem with tracking size in the status quo is that we only track installer size, and the measurements for that are noisy. For example, in the graph in [1], we see a big drop around January 24th, corresponding to a patch nox landed. But we also see a large increase in the 4 days prior, and it's hard to pin that down to any specific commit. Glandium and I believe that the noise may be due to the compression involved in creating the installer. As such, it would be useful to also track the uncompressed text segment size, as well as the size of the uncompressed installer assets. We can then set up alerts on those graphs, and make sure that any large increases are understood and warranted. [1] https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1299710,1,2

Bobby Holley (:bholley)

Reporter

Comment 1

•

7 years ago

Thoughts Ted? Is this something that your team could take?

Flags: needinfo?(ted)

Gregory Szorc [:gps]

Comment 2

•

7 years ago

https://hg.mozilla.org/mozilla-central/file/32b850fa28ae/testing/mozharness/mozharness/mozilla/building/buildbase.py#l1914 is where the code for this lives. Although mozharness probably isn't in a position to track code sizes easily. If we emit the proper PERFHERDER_DATA line, the build system itself could emit the metrics. This is definitely territory of the build system.

Flags: needinfo?(ted)

Gregory Szorc [:gps]

Comment 3

•

7 years ago

Actually, I told bholley in email to needinfo Ted so he could give us a history lesson of these metrics in case that is useful. But at this point, it might be safe to assume history and flawed: we should record whatever metrics people find useful. Our current best mechanism for reporting metrics is PERFHERDER_DATA lines in task output. So that's where we should start.

Flags: needinfo?(ted)

Eric Rahm [:erahm]

Comment 4

•

7 years ago

Tracking libxul PGO/opt sizes would definitely be useful, that's a pretty good proxy for RSS regressions as well.

(not currently active) Ted Mielczarek

Comment 5

•

7 years ago

A long time ago we had a tool called `codesighs` that did these sorts of measurements: https://wiki.mozilla.org/Codesighs It got broken by toolchain updates that no longer produced the data it needed, so we gave up on it and didn't have a replacement. One nice feature we had in codesighs was that it would produce a diff against the previous build, so you could see what individual changes wound up doing to code size. This is a little harder now that we aren't using a single builder per-platform, but we could probably just upload profile data as an artifact and build a simple "compare two builds" tool that consumed the artifacts. There are tools like bloaty now that seem to fit this niche: https://github.com/google/bloaty/ and cargo-bloat, which is very Rust-centric right now but might be easier to add Windows support to: https://github.com/RazrFalcon/cargo-bloat/ I think that figuring out the specific tooling for this task is out of scope for the build team (although we're certainly happy to advise). If other folks can provide a tool + set of commands that would produce the necessary data we can absolutely work out the nitty-gritty of wiring it up in CI and getting data into perfherder etc. Someone on njn's team (like froydnj or glandium) might be a good fit for doing the first steps.

Flags: needinfo?(ted)

Bobby Holley (:bholley)

Reporter

Comment 6

•

7 years ago

Makes sense. Nathan, is this something that can go on your or your team's roadmap?

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 7

•

7 years ago

(In reply to Bobby Holley (:bholley) from comment #6) > Makes sense. Nathan, is this something that can go on your or your team's > roadmap? Sure, I can put it on the roadmap. Probably for Q2 or Q3...though maybe sooner, since apparently people are landing large changes that completely undo the effects of nox's work: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1299710,1,2&zoom=1516953879055.2825,1517011541417.6904,62597014.92537314,63343283.58208955&selected=mozilla-inbound,1299710,299934,402685860,2 :(

Flags: needinfo?(nfroyd)

Bobby Holley (:bholley)

Reporter

Comment 8

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #7) > (In reply to Bobby Holley (:bholley) from comment #6) > > Makes sense. Nathan, is this something that can go on your or your team's > > roadmap? > > Sure, I can put it on the roadmap. Probably for Q2 or Q3...though maybe > sooner, since apparently people are landing large changes that completely > undo the effects of nox's work: Geeze, that's terrible. Joel, while we're waiting for this bug (which should give us less noisy measurements on codesize), can we at least set up an alert for large regressions in installer size like the one above?

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 9

•

7 years ago

we generate alerts for installer size already: https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#2000 we can change the configuration, but right now we alert on a 1% regression which would be 628K vs the ~400K seen in the above link. possibly make it an absolute limit of >250K, although this would apply to all flavors of builds and that might give more false alerts.

Flags: needinfo?(jmaher)

Bobby Holley (:bholley)

Reporter

Comment 10

•

7 years ago

I think an absolute value of 250k for installer size change is totally reasonable for alerts, especially if the alert mechanism already does some filtering for noise.

Bobby Holley (:bholley)

Reporter

Comment 11

•

7 years ago

Can we try that? We can always re-evaluate if we get a lot of false alerts.

Flags: needinfo?(jmaher)

Nathan Froyd [:froydnj]

Comment 12

•

7 years ago

I would vote for lowering the limit to 100k, personally.

Bobby Holley (:bholley)

Reporter

Comment 13

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #12) > I would vote for lowering the limit to 100k, personally. SGTM

Joel Maher ( :jmaher ) (UTC -8)

Comment 14

•

7 years ago

:igoldan, want to write up a patch for this? I am happy to review

Flags: needinfo?(jmaher) → needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 15

•

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #14) > :igoldan, want to write up a patch for this? I am happy to review I would like to. My first question: what code parses the logs with that PERFHERDER_DATA dump? Is it a daemon configured for that? Also, I noticed we have configurations for our alert signatures recorded in the database (performance_signature table). Do we have some config file used for filling the performance_signature table?

Flags: needinfo?(igoldan)

Joel Maher ( :jmaher ) (UTC -8)

Comment 16

•

7 years ago

all your questions are automatically handled inside of perfherder/treeherder- that is out of the scope of this bug- we just need to adjust https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#2000 to be absolute at 100K

Ionuț Goldan [:igoldan]

Updated

•

7 years ago

Updated

•

7 years ago

Depends on: 1434587

Ionuț Goldan [:igoldan]

Updated

•

7 years ago

Updated

•

7 years ago

Depends on: 1442378

Nathan Froyd [:froydnj]

Comment 17

•

7 years ago

It turns out that we are recording raw sizes of things like libxul.so and omni.ja, e.g. for Android: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1544596,1,2&series=mozilla-inbound,1544595,1,2&series=mozilla-inbound,1544597,1,2&series=mozilla-inbound,1544594,1,2 and libxul.so for Linux x86-64 (omni.ja to be recorded in bug 1442378): https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1299711,1,2&series=mozilla-inbound,1383432,1,2 Those are currently structured as subtests of the installer size test. Joel, are we able to get alerts on subtests? If not, should we look into promoting the size tests for individual artifacts to "top-level" tests?

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 18

•

7 years ago

we do not alert on subtests- my only concern with adding more top level tests is that they get multiplied across many configs (opt/pgo/nightly/etc.) and branches- this has a high cost for sheriffing and often the sheriff is asked to do a lot of additional work in the bugs when filing them. A few options: 1) have a page that shows the other metrics you care about in an easier to see format 2) have higher thresholds for the other metrics when you make them top level 3) sheriff your own alerts, or more realistically be the original person to nag patch authors- perf sheriff files a bug and needinfo's the build team to take ownership. 4) other ideas?

Flags: needinfo?(jmaher)

Bobby Holley (:bholley)

Reporter

Comment 19

•

7 years ago

Ionut is already chasing installer size regressions. Making these metrics more visible will just make his job easier. I'm fine if you want to just keep the alerts on installer size, and make these less-noisy graphs available for easy reference in order to provide more certainty on which changeset triggered the regression.

Joel Maher ( :jmaher ) (UTC -8)

Comment 20

•

7 years ago

:igoldan, what are your thoughts on this?

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 21

•

7 years ago

(In reply to Joel Maher (:jmaher) (UTC-5) from comment #20) > :igoldan, what are your thoughts on this? :bholley's proposal is reasonable: I do need more visibility on these regressions. No need for extra alerts, but have the ability to quickly point to a subtest graph, as I still can't figure out how to do that from Perfherder. Releasing more top level tests worries me because of the volume of work. I'm not sure what its magnitude is. I believe the tests' results are affected by compression. If that's the case, this will increase the number of false alerts. Installer size has a low, fine threshold and seems to be catching things in time.

Flags: needinfo?(igoldan)

Bobby Holley (:bholley)

Reporter

Comment 22

•

7 years ago

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #21) > Releasing more top level tests worries me because of the volume of work. I'm > not sure what its magnitude is. > I believe the tests' results are affected by compression. If that's the > case, this will increase the number of false alerts. To be clear, the advantage of the metrics discussed here is that they are _not_ affected by compression, whereas installer size (what we're currently tracking) _is_ affected by compression. That said, it seems reasonable to expose graphs for the subtests, and then keep an eye on them while tracking regressions in installer size, which is the metric that actually matters to users.

BMO Automation

Updated

•

7 years ago

Product: Core → Firefox Build System

BMO Automation

Updated

•

3 years ago

Severity: normal → S3