Bug 1239939 Opened 4 years ago Closed 4 years ago

Half the time spent during mach build faster is not spent on make

Categories

(Firefox Build System :: General, defect)


Tracking

(firefox46 fixed)

RESOLVED FIXED
mozilla46

People

(Reporter: glandium, Assigned: dminor)

References

Details

Attachments

(1 file)

This is what a (stripped) log of mach build faster -v looks like on my machine:

 0:00.15 /usr/bin/make -C /home/glandium/build/obj-x86_64-unknown-linux-gnu -j4 -s backend.RecursiveMakeBackend
 0:00.28 /usr/bin/make -C faster -j4
 0:00.56 make: Entering directory '/home/glandium/build/obj-x86_64-unknown-linux-gnu/faster'
(...)
 0:01.76 make: Leaving directory '/home/glandium/build/obj-x86_64-unknown-linux-gnu/faster'
 0:02.80 Overall system resources - Wall time: 2s; CPU: 65%; Read bytes: 0; Write bytes: 3305472; Read time: 0; Write time: 836
 0:02.81 Your build was successful!

There are three observations to make:
- It takes 0.15s before we start running make backend.RecursiveMakeBackend, and that's not even counting Python startup.
- It takes 0.28s between the time we announce that make -C faster is going to be executed and make saying it has started working.
- It takes 1.04s(!) between make saying it's done and mach displaying the system resource summary and declaring the build successful.

That's 1.47s spent doing something that apparently isn't make, which is ironically slightly more than half the overall time.
Python startup overhead is obviously a problem. We're likely stuck with >100ms no matter what (between mach loading, etc.).

The 1.04s might be due to resource collection. I think there can be a delay between asking the resource collector to shut down and it actually shutting down (due to a sleep). `mach build` was recently changed to start the resource collector on every build, not just full builds. We can probably use a threading.Event here to minimize the delay.
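To illustrate the threading.Event idea: a sampling loop that sleeps via `Event.wait(interval)` instead of `time.sleep(interval)` can be woken immediately at shutdown, so stopping never has to wait out the remainder of a poll interval. This is a minimal standalone sketch, not the actual mach/mozsystemmonitor code; the `Collector` class and its names are hypothetical.

```python
import threading
import time

class Collector:
    """Hypothetical sketch: a periodic sampler whose shutdown is
    interruptible via threading.Event rather than time.sleep."""

    def __init__(self, interval=1.0):
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def _run(self):
        # Event.wait() returns True as soon as stop() sets the event,
        # ending the loop without finishing the full interval.
        while not self._stop.wait(self._interval):
            pass  # sample CPU/IO counters here

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

collector = Collector(interval=1.0)
collector.start()
time.sleep(0.05)
t0 = time.time()
collector.stop()
elapsed = time.time() - t0
# With time.sleep(1.0) in the loop, stop() could block for up to a
# second; with Event.wait() it returns almost immediately.
```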

That 0.28s is somewhat intriguing. Output buffering?
CC dminor for possible relationship to resource collection.
cProfile output seems to confirm my suspicion we're waiting ~1s for a process shutdown in the resource collector.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.202    3.202 mach:10(<module>)
        1    0.000    0.000    3.202    3.202 mach:75(main)
        1    0.000    0.000    3.074    3.074 main.py:302(run)
        1    0.000    0.000    3.074    3.074 main.py:362(_run)
        1    0.000    0.000    3.071    3.071 registrar.py:58(_run_command_handler)
        1    0.000    0.000    3.061    3.061 mach_commands.py:270(build)
        3    0.000    0.000    1.949    0.650 process.py:43(run_process)
        1    0.001    0.001    1.042    1.042 building.py:237(finish)
       30    0.000    0.000    1.033    0.034 subprocess.py:473(_eintr_retry_call)
        1    0.000    0.000    1.014    1.014 mach_commands.py:628(_run_mach_artifact_install)
        1    0.000    0.000    1.013    1.013 base.py:641(_run_command_in_srcdir)
       13    0.000    0.000    1.010    0.078 subprocess.py:1379(wait)
       12    1.010    0.084    1.010    0.084 {posix.waitpid}
        1    0.000    0.000    1.004    1.004 resourcemonitor.py:266(stop)
        3    1.003    0.334    1.003    0.334 {method 'poll' of '_multiprocessing.Connection' objects}
        2    0.000    0.000    0.942    0.471 base.py:492(_run_make)
        2    0.000    0.000    0.935    0.468 base.py:644(_run_command_in_objdir)
        2    0.000    0.000    0.929    0.465 processhandler.py:799(wait)
        6    0.000    0.000    0.929    0.155 threading.py:309(wait)
        2    0.000    0.000    0.929    0.464 threading.py:911(join)
       28    0.928    0.033    0.928    0.033 {time.sleep}
        1    0.000    0.000    0.128    0.128 mach:46(get_mach)
        1    0.000    0.000    0.128    0.128 mach:33(check_and_get_mach)
:gps, thanks for tracking this down. It looks like the poll interval on shutdown in mozsystemmonitor is hardcoded to 1 second. Changing it to 0.1 second doesn't seem to have any adverse effects. I'll prepare a patch.
Currently we poll at an interval of 1 second when shutting down
mozsystemmonitor. This causes a noticeable shutdown delay on short
running processes. Reducing the interval to 0.1 seconds reduces this delay
without any apparent adverse effects.

Review commit: https://reviewboard.mozilla.org/r/31211/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/31211/
Attachment #8708978 - Flags: review?(wlachance)
Assignee: nobody → dminor
Status: NEW → ASSIGNED
Attachment #8708978 - Flags: review?(wlachance) → review+
Comment on attachment 8708978 [details]
MozReview Request: Bug 1239939 - [mozsystemmonitor] Reduce poll interval on exit r?wlach

https://reviewboard.mozilla.org/r/31211/#review28015
https://reviewboard.mozilla.org/r/31211/#review28507

::: testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py:289
(Diff revision 1)
> -        while self._pipe.poll(1):
> +        while self._pipe.poll(0.1):

I'm not really sure why this code is using `poll` at all. It should just be able to call `recv` until an `EOFError` is raised:
https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Connection.recv
https://hg.mozilla.org/mozilla-central/rev/e9113fd6cdb8
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
See Also: → 1384062
Product: Core → Firefox Build System