Half the time spent during mach build faster is not spent on make

RESOLVED FIXED in Firefox 46

Status

Product: Core
Component: Build Config
Status: RESOLVED FIXED
Reported: 2 years ago
Modified: 2 months ago
People

(Reporter: glandium, Assigned: dminor)

Tracking

unspecified
mozilla46
Points:
---

Firefox Tracking Flags

(firefox46 fixed)

Details


Attachments

(1 attachment)

(Reporter)

Description

2 years ago
This is what a (stripped) log of mach build faster -v looks like on my machine:

 0:00.15 /usr/bin/make -C /home/glandium/build/obj-x86_64-unknown-linux-gnu -j4 -s backend.RecursiveMakeBackend
 0:00.28 /usr/bin/make -C faster -j4
 0:00.56 make: Entering directory '/home/glandium/build/obj-x86_64-unknown-linux-gnu/faster'
(...)
 0:01.76 make: Leaving directory '/home/glandium/build/obj-x86_64-unknown-linux-gnu/faster'
 0:02.80 Overall system resources - Wall time: 2s; CPU: 65%; Read bytes: 0; Write bytes: 3305472; Read time: 0; Write time: 836
 0:02.81 Your build was successful!

There are 3 observations to make:
- It takes .15s before we start running make backend.RecursiveMakeBackend, and that's not even counting the python startup
- It takes .28s between the time we say make -C faster is going to be executed and the time make says it started working
- It takes 1.04s(!) between make saying it's done and mach displaying the system resource summary and saying we're done.

That's 1.47s spent doing something that doesn't look like make, which is, ironically, slightly more than half the overall time.
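A rough way to put a floor under the first observation: the pre-make delay can never be smaller than the cost of starting a Python interpreter at all. A machine-dependent sketch (not part of mach) that times a bare interpreter launch:

```python
import subprocess
import sys
import time

# Time a bare interpreter startup; this is a lower bound on any
# overhead mach adds before it gets around to invoking make.
# Numbers vary widely across machines and Python versions.
t0 = time.time()
subprocess.check_call([sys.executable, "-c", "pass"])
startup = time.time() - t0
print("interpreter startup: %.3fs" % startup)
```

Note this measures a cold `python -c pass`; mach additionally imports its own modules, so its real startup overhead is strictly larger.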

Comment 1

2 years ago
Python startup overhead is obviously a problem. We're likely going to have >100ms no matter what (between mach loading, etc).

The 1.04s might be due to resource collection. I think there can be a delay between requesting that the resource collector shut down and it actually shutting down (due to a sleep). `mach build` was recently changed to start the resource collector on every build, not just full builds. We can probably use a threading.Event here to minimize the delay.

That 0.28s is somewhat intriguing. Output buffering?
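The threading.Event idea above can be sketched as follows. This is a hypothetical sampler class, not the real mozsystemmonitor code: the point is that `Event.wait(interval)` returns immediately once `set()` is called, whereas a `time.sleep(interval)` loop always sleeps out the full interval before noticing a stop request.

```python
import threading
import time

class SamplerSketch:
    """Hypothetical sketch of a poll loop driven by threading.Event
    instead of time.sleep(), so stop() interrupts the wait at once."""

    def __init__(self, interval=1.0):
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def _run(self):
        # wait() returns True as soon as stop() sets the event,
        # rather than sleeping out the full interval.
        while not self._stop.wait(self._interval):
            pass  # sample resource usage here

    def stop(self):
        self._stop.set()
        self._thread.join()

s = SamplerSketch(interval=1.0)
s.start()
t0 = time.time()
s.stop()
shutdown_delay = time.time() - t0
```

Even with a 1-second sampling interval, `stop()` here returns in milliseconds, since the worker wakes up the moment the event is set.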

Comment 2

2 years ago
CC dminor for possible relationship to resource collection.

Comment 3

2 years ago
cProfile output seems to confirm my suspicion we're waiting ~1s for a process shutdown in the resource collector.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.202    3.202 mach:10(<module>)
        1    0.000    0.000    3.202    3.202 mach:75(main)
        1    0.000    0.000    3.074    3.074 main.py:302(run)
        1    0.000    0.000    3.074    3.074 main.py:362(_run)
        1    0.000    0.000    3.071    3.071 registrar.py:58(_run_command_handler)
        1    0.000    0.000    3.061    3.061 mach_commands.py:270(build)
        3    0.000    0.000    1.949    0.650 process.py:43(run_process)
        1    0.001    0.001    1.042    1.042 building.py:237(finish)
       30    0.000    0.000    1.033    0.034 subprocess.py:473(_eintr_retry_call)
        1    0.000    0.000    1.014    1.014 mach_commands.py:628(_run_mach_artifact_install)
        1    0.000    0.000    1.013    1.013 base.py:641(_run_command_in_srcdir)
       13    0.000    0.000    1.010    0.078 subprocess.py:1379(wait)
       12    1.010    0.084    1.010    0.084 {posix.waitpid}
        1    0.000    0.000    1.004    1.004 resourcemonitor.py:266(stop)
        3    1.003    0.334    1.003    0.334 {method 'poll' of '_multiprocessing.Connection' objects}
        2    0.000    0.000    0.942    0.471 base.py:492(_run_make)
        2    0.000    0.000    0.935    0.468 base.py:644(_run_command_in_objdir)
        2    0.000    0.000    0.929    0.465 processhandler.py:799(wait)
        6    0.000    0.000    0.929    0.155 threading.py:309(wait)
        2    0.000    0.000    0.929    0.464 threading.py:911(join)
       28    0.928    0.033    0.928    0.033 {time.sleep}
        1    0.000    0.000    0.128    0.128 mach:46(get_mach)
        1    0.000    0.000    0.128    0.128 mach:33(check_and_get_mach)
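For reference, a cumulative-time table like the one above can be produced with the standard library profiler. A minimal sketch (the `work` function is a stand-in, not anything from mach):

```python
import cProfile
import io
import pstats

def work():
    # Placeholder workload; in the real case this would be the
    # mach build invocation being profiled.
    total = 0
    for i in range(1000):
        total += i
    return total

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

# Sort by cumulative time, like the output in this comment.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The `cumtime` column is what makes the 1.004s in `resourcemonitor.py:266(stop)` stand out: it includes time spent in everything that function calls, here the `Connection.poll` waits.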
(Assignee)

Comment 4

2 years ago
:gps, thanks for tracking this down. It looks like the poll interval on shutdown in mozsystemmonitor is hardcoded to 1 second. Changing it to 0.1 seconds doesn't seem to have any adverse effects. I'll prepare a patch.
(Assignee)

Comment 5

2 years ago
Created attachment 8708978 [details]
MozReview Request: Bug 1239939 - [mozsystemmonitor] Reduce poll interval on exit r?wlach

Currently we poll at an interval of 1 second when shutting down
mozsystemmonitor. This causes a noticeable shutdown delay on short
running processes. Reducing the interval to 0.1 seconds reduces this delay without
seeming to have any adverse effects.

Review commit: https://reviewboard.mozilla.org/r/31211/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/31211/
Attachment #8708978 - Flags: review?(wlachance)
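To illustrate why the interval matters, a small standalone timing sketch (not the mozsystemmonitor code itself): when no data is pending, `Connection.poll(timeout)` blocks for the full timeout, so each pass through a shutdown loop like `while self._pipe.poll(1):` can stall for a whole second.

```python
import multiprocessing
import time

# A pipe with nothing written to it: poll(timeout) has to wait out
# the entire timeout before returning False.
parent, child = multiprocessing.Pipe()

t0 = time.time()
parent.poll(1)      # no data: blocks ~1 second
one_second = time.time() - t0

t0 = time.time()
parent.poll(0.1)    # no data: blocks only ~0.1 seconds
tenth = time.time() - t0

print("poll(1): %.2fs, poll(0.1): %.2fs" % (one_second, tenth))
```

This is the ~1s the cProfile output attributed to `{method 'poll' of '_multiprocessing.Connection' objects}`.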
(Assignee)

Updated

2 years ago
Assignee: nobody → dminor
Status: NEW → ASSIGNED
Attachment #8708978 - Flags: review?(wlachance) → review+
Comment on attachment 8708978 [details]
MozReview Request: Bug 1239939 - [mozsystemmonitor] Reduce poll interval on exit r?wlach

https://reviewboard.mozilla.org/r/31211/#review28015

Comment 7

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/e9113fd6cdb8
https://reviewboard.mozilla.org/r/31211/#review28507

::: testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py:289
(Diff revision 1)
> -        while self._pipe.poll(1):
> +        while self._pipe.poll(0.1):

I'm not really sure why this code is using `poll` at all. It should just be able to call `recv` until an `EOFError` is raised:
https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Connection.recv
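The reviewer's recv-until-EOFError suggestion can be sketched like this (a standalone example, not the mozsystemmonitor code): when the writing end of the pipe closes, `recv()` drains any buffered data and then raises `EOFError`, so no timeout is needed at all.

```python
from multiprocessing import Pipe

# One-way pipe: `reader` receive-only, `writer` send-only.
reader, writer = Pipe(duplex=False)
writer.send("sample-1")
writer.send("sample-2")
writer.close()  # the monitored side is done writing

# Drain the pipe; recv() raises EOFError once the writer is closed
# and all buffered messages have been read.
samples = []
while True:
    try:
        samples.append(reader.recv())
    except EOFError:
        break
```

Compared with `poll(timeout)`, this blocks only as long as data is actually in flight, so shutdown latency doesn't depend on any polling interval.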

Comment 9

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e9113fd6cdb8
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox46: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46

Updated

2 months ago
See Also: → bug 1384062