Closed Bug 1463296 Opened 2 years ago Closed 2 years ago

Report interesting section sizes in build metrics

Categories

(Firefox Build System :: General, enhancement)

3 Branch
enhancement
Not set

Tracking

(firefox62 fixed)

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: erahm, Assigned: erahm)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 2 obsolete files)

It would be useful to start tracking section sizes again with our build metrics in order to identify improvements in reducing the amount of non-sharable data.

The initial plan is to just use a filtered output from `size`. For linux my current proposal is to limit output to:
  - .rodata
  - .data
  - .data.rel.ro
  - .bss
  - .text
Example perf data forma that conforms with perfherder:

> {'name': 'XUL section sizes',
>  'shouldAlert': False,
>  'subtests': [{'name': '.rodata', 'value': 19371280},
>               {'name': '.data', 'value': 302160},
>               {'name': '.data.rel.ro', 'value': 3896872},
>               {'name': '.bss', 'value': 487120},
>               {'name': '.text', 'value': 55957711}],
>  'value': 100}

We could then add additional binaries we care about (such as NSS, avcodecs we ship, etc).
For mac we have the following sections, it's not super clear which ones we care about:

> .text                      73378672        7168
> __TEXT.__stubs                11772    73385840
> __TEXT.__stub_helper          19576    73397612
> .const                      4096437    73417216
> .cstring                    2736941    77513664
> __TEXT.__ustring              81172    80250608
> __TEXT.__gcc_except_tab      107688    80331780
> __TEXT.__objc_methname        34586    80439468
> __TEXT.__objc_classname        1747    80474054
> __TEXT.__objc_methtype        38800    80475801
> .eh_frame                   9224528    80514608
> __DATA.__nl_symbol_ptr           16    89739264
> __DATA.__got                  23072    89739280
> __DATA.__la_symbol_ptr        15696    89762352
> .mod_init_func                  784    89778048
> .const_data                15233672    89778832
> .cfstring                     12416   105012504
> __DATA.__objc_classlist         520   105024920
> __DATA.__objc_catlist            88   105025440
> __DATA.__objc_protolist         136   105025528
> __DATA.__objc_imageinfo           8   105025664
> __DATA.__objc_const           39376   105025672
> __DATA.__objc_selrefs         10344   105065048
> __DATA.__objc_protorefs           8   105075392
> __DATA.__objc_classrefs        1336   105075400
> __DATA.__objc_superrefs         408   105076736
> __DATA.__objc_ivar             1032   105077144
> __DATA.__objc_data             5200   105078176
> __DATA..kPStaticModules         488   105083376
.data                        665072   105083872
> __DATA.__thread_vars            960   105748944
> __DATA.__thread_data            880   105749920
> __DATA.__thread_bss             136   105750816
> .bss                         375904   105750960
> __DATA.__common               93584   106126864

I'm guessing:
  - .bss
  - .data
  - .const_data
  - .const
  - .text
If the total number of sections is small (say just a few dozen), my attitude is "why not track all of them." We could also consider only enabling perfherder reporting if the section size is larger than a threshold. And we could lump all "other" sections together so we have that size reported as well.

Also, when it comes time to author a patch, while the existing code for binary size reporting is in mozharness (in buildbase.py IIRC), we're trying to wean off mozharness. My vote would be to emit this as part of the build system proper rather than accrue more technical debt in mozharness.
(In reply to Gregory Szorc [:gps] from comment #3)
> If the total number of sections is small (say just a few dozen), my attitude
> is "why not track all of them." We could also consider only enabling
> perfherder reporting if the section size is larger than a threshold. And we
> could lump all "other" sections together so we have that size reported as
> well.
> 
> Also, when it comes time to author a patch, while the existing code for
> binary size reporting is in mozharness (in buildbase.py IIRC), we're trying
> to wean off mozharness. My vote would be to emit this as part of the build
> system proper rather than accrue more technical debt in mozharness.

FWIW I already implemented it in buildbase.py, but I don't really care where it goes as long as I don't have to think about hooking up custom perfherder stuff. Do you have a specific example of emitting perf statistics in the 'build system proper' that you could point me at?
This is my WIP, I'm not really sure how to test it. Apparently artificate builds are a no-go.
Since you implemented it before I wrote the comment, don't worry changing things: it isn't worth the scope bloat incurred on you.
This adds section size metrics in order to track finer grained improvements
and regressions in binary size. Currently it implements tracking of:
  - XUL
  - NSS
  - NSPR
  - mozavcodec
  - mozavutil

The sections tracked are limited in order to avoid too much noise:
  - .text
  - .data
  - .rodata
  - .data.rel.ro
  - .bss

Currently this is limited to measure Linux and Android builds, but can be
easily extended to support other platforms once we have a `size`-like tool
available.

Joel it looks like you've done some reviews in here before. Would you mind
taking a look?

gps it sounded like you had some opinions so feedback would be welcomed. I've
limited to just a subset of sections for now in order to avoid too much noise
for the sheriffs once we turn on alerting. Cutting off at a threshold is
interesting, but I'm worried we'd see sections come and go leading to empty
datapoints on our graphs.
Attachment #8979701 - Flags: review?(jmaher)
Attachment #8979701 - Flags: feedback?(gps)
Attachment #8979411 - Attachment is obsolete: true
Comment on attachment 8979701 [details] [diff] [review]
Track section sizes in build metrics

Review of attachment 8979701 [details] [diff] [review]:
-----------------------------------------------------------------

r+ from the perspective of this code is easy to read, is similar to other code in the same file, and looks to be using perfherder properly :)

::: testing/mozharness/mozharness/mozilla/building/buildbase.py
@@ +1663,5 @@
> +            self.info("Couldn't find `size` program")
> +            return []
> +
> +        # Call `size` and output with SysV format in decimal radix
> +        cmd = [size_prog, '-A', '-d', file]

do these options work for both size and gsize?
Attachment #8979701 - Flags: review?(jmaher) → review+
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #16)
> Comment on attachment 8979701 [details] [diff] [review]
> Track section sizes in build metrics
> 
> Review of attachment 8979701 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ from the perspective of this code is easy to read, is similar to other
> code in the same file, and looks to be using perfherder properly :)
> 
> ::: testing/mozharness/mozharness/mozilla/building/buildbase.py
> @@ +1663,5 @@
> > +            self.info("Couldn't find `size` program")
> > +            return []
> > +
> > +        # Call `size` and output with SysV format in decimal radix
> > +        cmd = [size_prog, '-A', '-d', file]
> 
> do these options work for both size and gsize?

Thanks for the quick review! `gsize` would just be a custom build of binutil's `size` that includes support for other binary formats, so yes it should be the same.
Comment on attachment 8979701 [details] [diff] [review]
Track section sizes in build metrics

Review of attachment 8979701 [details] [diff] [review]:
-----------------------------------------------------------------

This seems reasonable.

I *really* wish we could make this part of the build system proper more easily. But mozharness is definitely the easiest way to implement it these days.

Please obtain formal review from a build peer before landing this.

::: testing/mozharness/mozharness/mozilla/building/buildbase.py
@@ +1671,5 @@
> +        # <section-name> <size> <address>, ie:
> +        # .data                  302160   101053344
> +        size_section_re = re.compile(r"([\w\.]+)\s+(\d+)\s+(\d+)")
> +        sections = {}
> +        for line in StringIO(output).readlines():

You don't need StringIO here. `output` is a str/bytes and has a `.splitlines()` method.

@@ +1690,5 @@
> +            'XUL': ('libxul.so', 'xul.dll', 'XUL'),
> +            'NSS': ('libnss3.so', 'nss3.dll', 'libnss3.dylib'),
> +            'NSPR': ('libnspr4.so', 'nspr4.dll', 'libnspr4.dylib'),
> +            'avcodec': ('libmozavcodec.so', 'mozavcodec.dll', 'libmozavcodec.dylib'),
> +            'avutil': ('libmozavutil.so', 'mozavutil.dll', 'libmozavutil.dylib')

Things like this are reinventing the build system and are why code like this should live in the build system proper. Taking a step back, it's also a reason why the architecture of mozharness doesn't match the world we currently live in and is why we're trying to move away from mozharness.

Anyway, it's probably not worth your time to make this work in the build system proper. Code like this is pretty easy to port once we have some infrastructure in the build system for reporting perfherder metrics.
Attachment #8979701 - Flags: feedback?(gps) → feedback+
Nathan can you give this a build peer stamp of approval?
Attachment #8980079 - Flags: review?(nfroyd)
Attachment #8979701 - Attachment is obsolete: true
Comment on attachment 8980079 [details] [diff] [review]
Track section sizes in build metrics.  froydnj

Review of attachment 8980079 [details] [diff] [review]:
-----------------------------------------------------------------

I guess this looks fine.  I'd like to see the human-readable point below addressed, WDYT?

::: testing/mozharness/mozharness/mozilla/building/buildbase.py
@@ +1694,5 @@
> +            'avutil': ('libmozavutil.so', 'mozavutil.dll', 'libmozavutil.dylib')
> +        }
> +        # TODO(erahm): update for windows and osx. As-is we only have support
> +        # for `size` on debian which gives us linux and android.
> +        section_interests = ('.text', '.data', '.rodata', '.data.rel.ro', '.bss')

Is there any reason why we're tracking section names and not higher level concepts like "code", "read-only support" (.eh_frame and friends, better names welcome) "read-only data", "read/write data", "zero-init data", etc.?  If we want developers to actually look at these in any way, and not have somebody constantly translating what the names mean, more human-readable names would be good.

Looking into some of the size changes over the last several months suggests that .symtab/.strtab can play a significant role, maybe we should be tracking those?  Maybe that should be a followup bug, though I don't know that the data would be particularly actionable...
Attachment #8980079 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #21)
> Comment on attachment 8980079 [details] [diff] [review]
> Track section sizes in build metrics.  froydnj
> 
> Review of attachment 8980079 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I guess this looks fine.  I'd like to see the human-readable point below
> addressed, WDYT?
> 
> ::: testing/mozharness/mozharness/mozilla/building/buildbase.py
> @@ +1694,5 @@
> > +            'avutil': ('libmozavutil.so', 'mozavutil.dll', 'libmozavutil.dylib')
> > +        }
> > +        # TODO(erahm): update for windows and osx. As-is we only have support
> > +        # for `size` on debian which gives us linux and android.
> > +        section_interests = ('.text', '.data', '.rodata', '.data.rel.ro', '.bss')
> 
> Is there any reason why we're tracking section names and not higher level
> concepts like "code", "read-only support" (.eh_frame and friends, better
> names welcome) "read-only data", "read/write data", "zero-init data", etc.? 
> If we want developers to actually look at these in any way, and not have
> somebody constantly translating what the names mean, more human-readable
> names would be good.

It's hard to map all these concepts consistently across platforms. Instead it might make sense to just have a helper page that we link to once we start actually turning on alerts. There's also merit in keeping the sections separate so that it's easier to triage what exactly regressed. Was it relocations or was it writable data?

> Looking into some of the size changes over the last several months suggests
> that .symtab/.strtab can play a significant role, maybe we should be
> tracking those?  Maybe that should be a followup bug, though I don't know
> that the data would be particularly actionable...

That needs to be a follow up for when we have purpose built tool; unfortunately `size` doesn't include those sections.
erahm and I were chatting about this on IRC and I built a proof-of-concept tool in Rust that can spit out section sizes grouped by category:
https://github.com/luser/rust-size

It only supports ELF at the moment but has stubbed out branches for Mach-O/PE.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #23)
> erahm and I were chatting about this on IRC and I built a proof-of-concept
> tool in Rust that can spit out section sizes grouped by category:
> https://github.com/luser/rust-size
> 
> It only supports ELF at the moment but has stubbed out branches for
> Mach-O/PE.

Yep! That's what I was thinking of for the follow up.
Blocks: 1464501
https://hg.mozilla.org/integration/mozilla-inbound/rev/09d98fab8420c7bf2c76f2d9c3ab75eb61d89d37
Bug 1463296 - Handle if size is found but has no output. r=me CLOSED TREE
https://hg.mozilla.org/mozilla-central/rev/69db802ffc31
https://hg.mozilla.org/mozilla-central/rev/09d98fab8420
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Version: Version 3 → 3 Branch
You need to log in before you can comment on or make changes to this bug.