Closed Bug 1472496 — Opened: Last year; Closed: Last year

check_binary.py is a major bottleneck for incremental builds

Categories

(Firefox Build System :: General, defect)


Tracking

(firefox-esr60 fixed, firefox64 fixed)

RESOLVED FIXED
mozilla64

People

(Reporter: kmag, Assigned: padenot)

References

Details

Attachments

(2 files, 1 obsolete file)

My libxul.so link times have been pretty unmanageable for incremental builds lately. I thought the linker was the bottleneck, but when I looked closer, I realized that lld takes about 3 seconds, and then check_binary.py takes at least 25 seconds. For debug builds, it takes 2-3x as long.

It would probably help to split the various checks into parallel tasks, but according to the python profiler, the biggest bottlenecks are the helper apps we call, and the per-call overhead for those is ~8s, so it would still be pretty slow:

         2738706 function calls (2738670 primitive calls) in 24.352 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   24.352   24.352 <string>:1(<module>)
        1    0.000    0.000   24.352   24.352 /home/kris/code/mozilla-central/python/mozbuild/mozbuild/action/check_binary.py:318(main)
        1    0.000    0.000   24.351   24.351 /home/kris/code/mozilla-central/python/mozbuild/mozbuild/action/check_binary.py:287(checks)
        6    0.000    0.000   22.940    3.823 /home/kris/code/mozilla-central/python/mozbuild/mozbuild/util.py:933(__call__)
        3    0.004    0.001   22.940    7.647 /home/kris/code/mozilla-central/python/mozbuild/mozbuild/action/check_binary.py:52(get_output)
        3    0.000    0.000   22.818    7.606 /usr/lib64/python2.7/subprocess.py:194(check_output)
        9    0.000    0.000   22.813    2.535 /usr/lib64/python2.7/subprocess.py:122(_eintr_retry_call)
        3    0.000    0.000   22.803    7.601 /usr/lib64/python2.7/subprocess.py:452(communicate)
        5   22.803    4.561   22.803    4.561 {method 'read' of 'file' objects}
        1    0.000    0.000   11.077   11.077 /home/kris/code/mozilla-central/python/mozbuild/mozbuild/action/check_binary.py:139(check_textrel)
       74    0.000    0.000   11.077    0.150 /home/kris/code/mozilla-central/python/mozbuild/mozbuild/action/check_binary.py:67(at_least_one)
       74    0.000    0.000   11.076    0.150 /home/kris/code/mozilla-central/python/mozbuild/mozbuild/action/check_binary.py:101(iter_readelf_dynamic)
        1    0.000    0.000   11.044   11.044 /home/kris/code/mozilla-central/python/mozbuild/mozbuild/action/check_binary.py:251(check_pt_load)
        1    0.492    0.492    2.229    2.229 /home/kris/code/mozilla-central/python/mozbuild/mozbuild/action/check_binary.py:165(check_nsmodules)
  1089718    0.720    0.000    0.720    0.000 {method 'endswith' of 'str' objects}
   547616    0.141    0.000    0.141    0.000 {method 'split' of 'str' objects}
        3    0.118    0.039    0.118    0.039 {method 'splitlines' of 'str' objects}
1095186/1095176    0.056    0.000    0.056    0.000 {len}
        3    0.000    0.000    0.014    0.005 /usr/lib64/python2.7/subprocess.py:334(__init__)
        3    0.000    0.000    0.014    0.005 /usr/lib64/python2.7/subprocess.py:900(_execute_child)
        3    0.010    0.003    0.010    0.003 {posix.read}
        3    0.003    0.001    0.003    0.001 {posix.fork}
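One way the "split the various checks into parallel tasks" idea from the opening comment could be sketched: the real checks block on reading nm/readelf output, so a thread pool can overlap those waits. The check names below mirror check_binary.py, but the bodies are stand-ins, not the actual implementations.

```python
# Sketch only: the real checks shell out to nm and readelf and block on
# reading their output, so a thread pool overlaps the waits. The check
# bodies here are stand-ins for the real check_binary.py functions.
from concurrent.futures import ThreadPoolExecutor

def check_textrel(binary):
    return "TEXTREL ok"

def check_pt_load(binary):
    return "PT_LOAD ok"

def check_nsmodules(binary):
    return "NSModules ok"

def run_checks(binary):
    checks = (check_textrel, check_pt_load, check_nsmodules)
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        # Submit everything first, then collect, so the subprocess
        # waits run concurrently rather than back to back.
        futures = [pool.submit(check, binary) for check in checks]
        return [f.result() for f in futures]

print(run_checks("libxul.so"))
```

Per the profile above, though, the wall-clock win is bounded by the slowest single check (~11s), which is why this alone wouldn't make the step cheap.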


The best option I can think of at this point is to add a build flag to skip those checks when not running in automation.
I guess a corollary of the per-call overhead numbers is that it's not really worth turning off only the "expensive" checks: measured against lld link times, they're all relatively expensive?

We've always done these checks (well, not quite as completely as the Python script does). I wonder whether the Python version is just significantly more expensive, or what?

Making them dependent on --enable-release seems like a reasonable option, if an unfortunate one.
Blocks: 1470127
I'm not sure how much the migration to a python script affected the run times, but they definitely do seem to have gotten slower fairly recently, to the point that I decided to track down the cause.

Either way, I don't think that adding at least 25 seconds to each incremental build is good for developer productivity.

Maybe we could move the binary checks to a separate task/build stage, so they would still run in full builds, but not in `build binaries`?
One change that bug 1471132 made is that it now looks at the .symtab symbol table rather than the .dynsym one, and the former is much larger than the latter. And there's not really a way around that, because the start/stop symbols are not exported, so they only appear in .symtab. I guess there are two questions that follow: is nm particularly slow to display that list, and would something else (e.g. readelf or objdump) be faster? And is Python slow to read the list, and would pre-filtering with e.g. grep help?
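As a rough illustration of the pre-filtering question: keep only the lines naming the start/stop markers with a cheap substring test before doing any per-line splitting. The sample nm output and symbol names below are made up for the sketch, not taken from an actual libxul.

```python
# Hypothetical pre-filter sketch: instead of splitting every nm output
# line in Python, first discard everything but the start/stop marker
# lines. The sample output and marker names are illustrative only.
SAMPLE_NM_OUTPUT = """\
00000000032a0000 D __start_kPStaticModules
00000000032a4000 D __stop_kPStaticModules
00000000001a2b30 t some_local_symbol
00000000001a2c10 T NS_InitXPCOM2
"""

def filter_markers(nm_output, markers=("__start_", "__stop_")):
    # A cheap substring test per line avoids str.split() on the
    # hundreds of thousands of symbols a debug .symtab can contain.
    return [line for line in nm_output.splitlines()
            if any(m in line for m in markers)]

print(len(filter_markers(SAMPLE_NM_OUTPUT)))
```

This is the same effect as piping nm through grep before the Python side ever sees the list.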
Kris, can you check the timings on your machine with bug 1471132 backed out?
Flags: needinfo?(kmaglione+bmo)
So, on my machine this is actually even slower, and it's due to readelf stalling while sucking 100% CPU for some reason, which is fixed in the next version of binutils. Past that, it takes less than 2s.
Assuming this doesn't break often, let's make this test conditional on --enable-release. If the checks are made faster (a few seconds), we can make it enabled by default. But that can be done in a follow-up.

Taking bug to author patch to make conditional on --enable-release.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Here are facts to help make an informed decision:
- While the changes in bug 1471132 made the setup less fragile than it was, when things go wrong the result is either a crash at startup (best case), or no direct crash but random XPCOM components that aren't registered and initialized, meaning that things relying on them fail. Examples of components that could fail to exist because of this: nsPrefModule, nsUConvModule, nsComposerModule, nsCookieModule, etc.
- The only substantial change, from both bug 1471132 and bug 1470127, that could explain a *change* in behavior and thus performance, is the flags change on nm, making it output all symbols rather than only those exported.
- The only thing that happened to be slow when I tried to reproduce this was *readelf*, and I'm not even sure it would happen with an actual build (because I was downloading a build from automation, so using a different toolchain than what I had locally). It has been run as part of those post-linkage checks for years.
- The actually important test doesn't use readelf, it uses nm.
- Actually, now looking again at comment 0, it's clear :kmag's problem is readelf too. See how check_textrel and check_pt_load, which are the two tests using readelf, take 11s each. check_nsmodules, which is the one using nm, only takes 2s.

So, if touching this at all, rather than disable the whole thing, I'd rather disable the tests that use readelf on non --enable-release.

But I'm also not sure we should work around what, ultimately, is a problem in one specific version of binutils, by disabling tests for everybody.
(In reply to Mike Hommey [:glandium] from comment #5)
> So, on my machine this is actually even slower, and it's due to readelf
> stalling while sucking 100% CPU for some reason, and that's fixed in next
> version of binutils. Past that, it takes less than 2s.

That would probably explain why it seems to have gotten slower, yeah.

(In reply to Mike Hommey [:glandium] from comment #8)
> So, if touching this at all, rather than disable the whole thing, I'd rather
> disable the tests that use readelf on non --enable-release.
> 
> But I'm also not sure we should work around what, ultimately, is a problem
> in one specific version of binutils, by disabling tests for everybody.

I think that's probably reasonable, but I also suspect that this affects most Linux users at the moment, so perhaps we should at least disable it for users who have a problematic version.

That said, I also think that skipping this check when we don't specify --enable-release is probably reasonable, at least for `build binaries` rather than a full build. Even with a fixed version of readelf, it seems like this step will still be much slower than the actual linking.

And the latest version of binutils available for Arch (which is known for its aggressive updates) still has this issue, which means it will probably still be affecting most developers for a long time to come.
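The "disable it for users who have a problematic version" idea could be sketched by parsing the first line of `readelf --version` output. The 2.32 cutoff below is purely illustrative; the actual binutils release containing the fix would need to be confirmed.

```python
import re

def readelf_is_suspect(version_line, fixed=(2, 32)):
    # Treat anything older than the (illustrative) fixed release as
    # suspect; if the version can't be parsed at all, err on the side
    # of skipping the readelf-based checks.
    m = re.search(r"(\d+)\.(\d+)", version_line)
    if not m:
        return True
    return (int(m.group(1)), int(m.group(2))) < fixed

print(readelf_is_suspect("GNU readelf (GNU Binutils) 2.30"))
print(readelf_is_suspect("GNU readelf (GNU Binutils) 2.32"))
```

In practice the version line would come from running `readelf --version` once and caching the result, rather than parsing a literal string.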
Flags: needinfo?(kmaglione+bmo)
Comment on attachment 8989513 [details]
Bug 1472496 - Disable check_binary in developer builds;

https://reviewboard.mozilla.org/r/254550/#review262640

::: config/rules.mk:568
(Diff revision 1)
>  # 2-second granularity
>  	touch -t `date +%Y%m%d%H%M.%S -d 'now+5seconds'` pgo.relink
>  endif
>  else # !WINNT || GNU_CC
>  	$(call EXPAND_CC_OR_CXX,$@) -o $@ $(COMPUTED_CXX_LDFLAGS) $(PGO_CFLAGS) $($(notdir $@)_$(OBJS_VAR_SUFFIX)) $(RESFILE) $(WIN32_EXE_LDFLAGS) $(LDFLAGS) $(STATIC_LIBS) $(MOZ_PROGRAM_LDFLAGS) $(SHARED_LIBS) $(OS_LIBS)
> +ifndef DEVELOPER_OPTIONS

Per #c8, I think we should put this inside check_binary.py and check buildconfig.substs['DEVELOPER_OPTIONS'] to enable check_stdcxx, check_glibc, and check_textrel. Though if it's not too far off to just make them all fast enough to run all the time, I'm also fine landing this for now since we can back it out then.
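A minimal sketch of that suggestion, hedged: the check names mirror check_binary.py, but the substs dict and the wiring are assumptions standing in for buildconfig.substs, not the actual code.

```python
# Sketch of gating inside check_binary.py: run the nm-based check
# always, and the readelf-based ones only when DEVELOPER_OPTIONS is
# unset (i.e. an --enable-release style build). The substs dict stands
# in for buildconfig.substs.
def checks(target, binary, substs):
    to_run = ["check_nsmodules"]  # nm-based, cheap enough to keep
    if not substs.get("DEVELOPER_OPTIONS"):
        # readelf-based checks, only for release-style builds
        to_run += ["check_textrel", "check_pt_load"]
    return to_run

print(checks("linux", "libxul.so", {"DEVELOPER_OPTIONS": "1"}))
print(checks("linux", "libxul.so", {}))
```

The benefit over the rules.mk approach is that the policy lives next to the checks themselves rather than in the build glue.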
Attachment #8989513 - Flags: review+
Attachment #8989513 - Flags: review?(core-build-config-reviews)
The longer this goes without being fixed, the higher the chance of me accidentally pushing a patch with a hack to disable check_binary.py...
I've been preempted by other work.
Assignee: gps → nobody
Status: ASSIGNED → NEW
Is switching to llvm-readelf a viable solution?

Also note that the readelf bug affects dependentlibs.py as well, not just check_binary.py. So even if we disable the tests we still have to contend with a much slower dependentlibs.list generation (on my machine, this goes from <200ms to 8 seconds with the buggy readelf).
Attachment #9006304 - Attachment description: Bug 1472496 - Don't do the checks in check_binary.py in developer builds. r?mshal → Bug 1472496 - Don't call out to check_binary.py in developer builds. r?mshal
We can pick one of the two patches here; the numbers are in the review comments.
Comment on attachment 9012110 [details]
Bug 1472496 - Don't call out to readelf in check_binary.py. r?mshal

Michael Shal [:mshal] has approved the revision.
Attachment #9012110 - Flags: review+
Attachment #9006304 - Attachment is obsolete: true
Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1b9e4b90413f
Don't call out to readelf in check_binary.py. r=mshal
https://hg.mozilla.org/mozilla-central/rev/1b9e4b90413f
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Assignee: nobody → padenot

Comment on attachment 9012110 [details]
Bug 1472496 - Don't call out to readelf in check_binary.py. r?mshal

This is one of series of patches I am requesting uplift to esr60. Please don't uplift any if the entire series won't go. The whole series will need to go in one push.

This try run (applied on tip-of-esr60 as of an hour ago; and beginning with 'Bug 1491901 - move MK*SHLIB to moz.configure') represents the patch series. It must be applied in that order: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0659d6e265f3b624ad6fbac0c9cd7ce246094596 (If this try run doesn't complete successfully, I will investigate and figure out why)

The uplift request form is the same for all of the patch series; see https://bugzilla.mozilla.org/show_bug.cgi?id=1491901#c10

Attachment #9012110 - Flags: approval-mozilla-esr60?

Comment on attachment 9012110 [details]
Bug 1472496 - Don't call out to readelf in check_binary.py. r?mshal

approved for 60.5esr

Attachment #9012110 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+