Closed
Bug 883209
Opened 11 years ago
Closed 11 years ago
Record resource usage of builds
Categories
(Firefox Build System :: General, defect)
Firefox Build System
General
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla26
People
(Reporter: gps, Assigned: gps)
References
(Blocks 1 open bug)
Details
Attachments
(3 files, 6 obsolete files)
288.53 KB,
image/png
|
Details | |
16.96 KB,
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
21.37 KB,
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
I've long wanted to record system resource usage during builds. Now that we have a class in mozbase for easily capturing system resource usage, I'd like to hook it up to the build system. The end goal is to facilitate easier "monitoring" of the efficiency of the build system. We'd like to optimize for 100% CPU usage while minimizing CPU time, I/O, and memory. Having these recorded in a machine readable manner will allow us to better get a handle on historical trends. It can also help developers get an understanding on why builds are slow. e.g. "oh, my machine is spending a lot of time in I/O wait, perhaps I should get an SSD." Or, "my machine is using all available CPU, maybe I should get a faster processor."
Assignee | ||
Comment 1•11 years ago
|
||
Fun Friday patch! I hooked the system resource monitor into the BuildMonitor class. We drop markers into the resource monitor during tier traversal. When a build has finished, we write out some structured log events with high-level resource usage. These are printed to the user. e.g. --- 3:58.32 nsBrowserApp.cpp 4:01.30 390 compiler warnings present. 4:02.55 Overall system resources - Wall time: 241s; CPU: 50%; Read bytes: 3854320640; Write bytes: 4454077952; Read time: 147867; Write time: 88086 4:02.55 base tier - Wall time: 3s; CPU: 29%; Read bytes: 4902912; Write bytes: 6599168; Read time: 161; Write time: 135 4:02.55 nspr tier - Wall time: 2s; CPU: 15%; Read bytes: 7102464; Write bytes: 12748288; Read time: 164; Write time: 169 4:02.55 nss tier - Wall time: 5s; CPU: 56%; Read bytes: 74002944; Write bytes: 109625856; Read time: 2254; Write time: 1238 4:02.55 js tier - Wall time: 11s; CPU: 31%; Read bytes: 442766336; Write bytes: 920877056; Read time: 11961; Write time: 14635 4:02.56 platform tier - Wall time: 183s; CPU: 58%; Read bytes: 3206482944; Write bytes: 3268277760; Read time: 130150; Write time: 64596 4:02.56 app tier - Wall time: 10s; CPU: 32%; Read bytes: 16626176; Write bytes: 50634752; Read time: 899; Write time: 5919 Your build was successful! To take your build for a test drive, run: /Users/gps/src/firefox/obj-firefox.noindex/dist/Nightly.app/Contents/MacOS/firefox For more information on what to do now, see https://developer.mozilla.org/docs/Developer_Guide/So_You_Just_Built_Firefox --- In addition, we also write out a JSON file into the object directory containing the raw data. I'd like to provide a graph view of this data so people can easily see what systems are doing during builds. Eventually, I'd like to have this file uploaded to ftp. But that requires releng using mach, which is out of scope for this bug. There are currently pieces of data missing from the JSON file, notably info on all the phases. I may upload a v2 of this patch. Or, we can delegate it to a followup. I figured I might as well upload something that works as soon as I have it.
Attachment #763026 -
Flags: review?(ted)
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → gps
Assignee | ||
Comment 2•11 years ago
|
||
I quickly added phase times to the JSON.
Attachment #763030 -
Flags: review?(ted)
Assignee | ||
Updated•11 years ago
|
Attachment #763026 -
Attachment is obsolete: true
Attachment #763026 -
Flags: review?(ted)
Assignee | ||
Comment 3•11 years ago
|
||
I made modifications necessary to support an HTML viewer.
Assignee | ||
Updated•11 years ago
|
Attachment #763030 -
Attachment is obsolete: true
Attachment #763030 -
Flags: review?(ted)
Assignee | ||
Updated•11 years ago
|
Attachment #763202 -
Flags: review?(ted)
Assignee | ||
Comment 4•11 years ago
|
||
I implemented a (very ugly looking) HTML interface to visualize the produced resource usage JSON file. It is far from feature complete and it is very ugly. But it's a start. There is a new mach command (mach build-resource-usage) which fires up a Python HTTP server and launches Firefox against said server. If all goes well, you should see some visualizations within a second or two. Again, this is very ugly. Web development is not my strong suit. I hope once it gets committed a more talented web developer than myself will feel compelled to fix it after it causes their eyes to bleed.
Attachment #763203 -
Flags: review?(ted)
Assignee | ||
Comment 5•11 years ago
|
||
This screenshot shows what things should look like. Tufte would not approve.
Assignee | ||
Comment 6•11 years ago
|
||
I added a legend to the graph.
Attachment #763692 -
Flags: review?(ted)
Assignee | ||
Updated•11 years ago
|
Attachment #763203 -
Attachment is obsolete: true
Attachment #763203 -
Flags: review?(ted)
Assignee | ||
Comment 7•11 years ago
|
||
Comment on attachment 763202 [details] [diff] [review] Record system resource usage of builds I found some issues on Windows. Cancelling review until I have them sorted out.
Attachment #763202 -
Flags: review?(ted)
Comment 8•11 years ago
|
||
Just FWIW, the output in comment 2 is way too verbose and I don't think it'll be useful to most people. I would shorten it to just the "Overall system resources" line, and maybe print the mach command that gives you the HTML view.
Comment 9•11 years ago
|
||
Comment on attachment 763692 [details] [diff] [review] Part 2: Add HTML interface to render build resource usage Review of attachment 763692 [details] [diff] [review]: ----------------------------------------------------------------- ::: python/mozbuild/mozbuild/html_build_viewer.py @@ +21,5 @@ > + self.send_header('Content-Type', 'application/json; charset=utf-8') > + self.end_headers() > + > + keys = sorted(s.json_files.keys()) > + json.dump(keys, self.wfile) nit: application/json is supposed to have an object as the root, not a list. @@ +76,5 @@ > + self.send_header('Content-Type', ct) > + self.end_headers() > + > + with open(local_path, 'rb') as fh: > + self.wfile.write(fh.read()) Can you not punt this to some stdlib class? @@ +89,5 @@ > + > + self.doc_root = doc_root > + self.json_files = {} > + > + self.server = BaseHTTPServer.HTTPServer((address, port), HTTPHandler) Pass 0 as the port to get an arbitrary free port number. You can then use httpd.server_address to get the (real_address, port) tuple back. @@ +101,5 @@ > + self.json_files[key] = path > + > + def run(self): > + while not self.do_shutdown: > + self.server.handle_request() You don't seem to try to handle Ctrl+C anywhere, won't users wind up with a stack trace when they follow instructions? That doesn't seem very friendly. ::: python/mozbuild/mozbuild/mach_commands.py @@ +379,5 @@ > + description='Show information about system resource usage from a build.') > + @CommandArgument('--address', default='localhost', > + help='Address the HTTP server should listen on.') > + @CommandArgument('--port', type=int, default=12000, > + help='Port number the HTTP server should listen on.') These feel unnecessary. Just use localhost and default to an open port. ::: python/mozbuild/mozbuild/resources/html-build-viewer/index.html @@ +1,4 @@ > +<!-- This Source Code Form is subject to the terms of the Mozilla Public > + - License, v. 2.0. If a copy of the MPL was not distributed with this > + - file, You can obtain one at http://mozilla.org/MPL/2.0/. --> > +<!DOCTYPE html> I am not going to review this file. It's totally NPOTB anyway, so I don't think it matters.
Attachment #763692 -
Flags: review?(ted) → review+
Comment 10•11 years ago
|
||
I rebased the first patch against a recentish trunk (trivial, fortunately), and 30 seconds into the build, I get: Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 808, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 761, in run self.__target(*self.__args, **self.__kwargs) File "/home/mh/mozilla-central/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 698, in _processOutput self.processOutputLine(line.rstrip()) File "/home/mh/mozilla-central/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 663, in processOutputLine handler(line) File "/home/mh/mozilla-central/python/mach/mach/mixin/process.py", line 86, in handleLine line_handler(line) File "/home/mh/mozilla-central/python/mozbuild/mozbuild/mach_commands.py", line 235, in on_line warning, state_changed, relevant = self.monitor.on_line(line) File "/home/mh/mozilla-central/python/mozbuild/mozbuild/controller/building.py", line 175, in on_line duration = self.resources.finish_phase(phase) File "/home/mh/mozilla-central/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 294, in finish_phase assert name in self._active_phases AssertionError
Assignee | ||
Comment 11•11 years ago
|
||
Try https://hg.mozilla.org/users/gszorc_mozilla.com/gecko-patches/raw-file/default/build-resource-monitor
Comment 12•11 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #11) > Try > https://hg.mozilla.org/users/gszorc_mozilla.com/gecko-patches/raw-file/ > default/build-resource-monitor It works. I added the following to part 2 because my build machine doesn't have a browser: diff --git a/python/mozbuild/mozbuild/mach_commands.py b/python/mozbuild/mozbuild/mach_commands.py --- a/python/mozbuild/mozbuild/mach_commands.py +++ b/python/mozbuild/mozbuild/mach_commands.py @@ -407,17 +407,20 @@ class Build(MachCommandBase): if not os.path.exists(last): print('Build resources not available. If you have performed a ' 'build and receive this message, the psutil Python package ' 'likely failed to initialize properly.') return 1 server = BuildViewerServer(address, port) server.add_resource_json_file('last', last) - webbrowser.get(browser).open('http://%s:%d/' % (address, port)) + try: + webbrowser.get(browser).open('http://%s:%d/' % (address, port)) + except Exception: + print('Point a browser at http://%s:%d/' % (address, port)) print('Hit CTRL+c to stop server.') server.run() @Command('clobber', category='build', description='Clobber the tree (delete the object directory).') def clobber(self): try: self.remove_objdir()
Assignee | ||
Comment 13•11 years ago
|
||
Rebased to work with concurrent subtiers and directories. This involved moving the tier tracking into its own class. I think the resulting code is much cleaner. As part of this, I also changed the format of the BUILDSTATUS messages so downstream consumers have to keep less context. https://tbpl.mozilla.org/?tree=Try&rev=019b7a75fba7
Assignee | ||
Updated•11 years ago
|
Attachment #763202 -
Attachment is obsolete: true
Assignee | ||
Comment 14•11 years ago
|
||
Comment on attachment 790638 [details] [diff] [review] Part 1: Record system resource usage of builds; I haven't performed a polish pass on this patch yet. Throwing it up for feedback to get an idea of what you'd like to see before review/checkin.
Attachment #790638 -
Flags: feedback?(mh+mozilla)
Comment 15•11 years ago
|
||
Comment on attachment 790638 [details] [diff] [review] Part 1: Record system resource usage of builds; Review of attachment 790638 [details] [diff] [review]: ----------------------------------------------------------------- ::: config/rules.mk @@ +718,5 @@ > +# Tiers are traditionally composed of directories that are invoked either > +# once (so-called "static" directories) or 3 times with the export, libs, and > +# tools sub-tiers. > +# > +# If the TIER_$(tier)_CUSTOM variable is defined, traditional these traditional "traditional these traditional"? ::: python/mozbuild/mozbuild/controller/building.py @@ +47,5 @@ > + > + Tiers can have subtiers. Subtiers can execute sequentially or concurrently. > + > + Subtiers can have directories. Directories can execute sequentially or in > + parallel. Not sure why you used "concurrently" for subtiers and "in parallel" for directories; is there a distinction I'm missing? @@ +124,5 @@ > + if self.active_subtiers: > + st['concurrent'] = True > + > + try: > + del self.active_dirs[subtier] I'm confused to find this here @@ +194,5 @@ > + o = [] > + > + for tier, state in self.tiers.items(): > + > + t_entry = dict( Nit: no newline before this
Comment 16•11 years ago
|
||
Comment on attachment 790638 [details] [diff] [review] Part 1: Record system resource usage of builds; Review of attachment 790638 [details] [diff] [review]: ----------------------------------------------------------------- I haven't looked at the python code yet, but i figured i'd already send my first comments. ::: config/rules.mk @@ +443,5 @@ > > endef # The extra line is important here! don't delete it > > define TIER_DIR_SUBMAKE > +@echo "BUILDSTATUS TIERDIR_START $(1) $(2) $(3)" I bitrotted you, but even after updating the patch, it doesn't work on current m-i :-/ @@ +726,5 @@ > tier_$(1):: > +ifdef TIER_$(1)_CUSTOM > + $$(foreach dir,$$($$@_dirs),$$(call SUBMAKE,,$$(dir))) > +else > + @printf "BUILDSTATUS TIER_START $(1)" I'd rather see the tier munging in a separate patch, with a cover letter, because i'm not sure what the goal is here, and what it has to do with resource usage monitoring.
Assignee | ||
Comment 17•11 years ago
|
||
I've isolated the resource monitoring code. Let's see what happens with code review :) Requires patch in bug 905879 or it won't apply cleanly. https://tbpl.mozilla.org/?tree=Try&rev=a0650f17b7bc
Attachment #791067 -
Flags: review?(mh+mozilla)
Assignee | ||
Updated•11 years ago
|
Attachment #790638 -
Attachment is obsolete: true
Attachment #790638 -
Flags: feedback?(mh+mozilla)
Assignee | ||
Comment 18•11 years ago
|
||
Adding latest part 2. Ted was fine with not reviewing the HTML bits.
Attachment #791074 -
Flags: review?(mh+mozilla)
Assignee | ||
Updated•11 years ago
|
Attachment #763692 -
Attachment is obsolete: true
Comment 19•11 years ago
|
||
Comment on attachment 791067 [details] [diff] [review] Part 1: Record system resource usage of builds; Review of attachment 791067 [details] [diff] [review]: ----------------------------------------------------------------- ::: python/mozbuild/mozbuild/controller/building.py @@ +208,5 @@ > + duration=state['duration'], > + subtiers=[], > + ) > + > + add_resources_to_entry(t_entry, *resources(self._phase(tier))) You're doing this three times, and I don't see why resources() and add_resources_to_entry() have to be two separate functions. @@ +448,5 @@ > + entry['cpu_percent'] = cpu_percent > + entry['cpu_times'] = list(cpu_times) > + entry['io'] = list(io) > + > + cpu_percent, cpu_times, io = resources(None) You're using resources() and add_resources_to_entry() only once here, I think you can inline them.
Attachment #791067 -
Flags: review?(mh+mozilla) → review+
Comment 20•11 years ago
|
||
Comment on attachment 791074 [details] [diff] [review] Part 2: Add HTML interface to render build resource usage Review of attachment 791074 [details] [diff] [review]: ----------------------------------------------------------------- Haven't looked at the HTML. Note it would be nice to be able to get useful information when hovering the graph instead of the timeline below it. ::: python/mozbuild/mozbuild/html_build_viewer.py @@ +58,5 @@ > + local_path = os.path.join(root, path) > + > + # We don't normalize paths so there is technically a security > + # vulnerability for path traversal here. But, we should only run > + # locally, so we don't care. ... on the other hand, doing it takes about as many lines as the comment saying why you're not doing it... ::: python/mozbuild/mozbuild/mach_commands.py @@ +436,5 @@ > + return 1 > + > + server = BuildViewerServer(address, port) > + server.add_resource_json_file('last', last) > + webbrowser.get(browser).open_new_tab(server.url) try: webbrowser.get(browser).open_new_tab(server.url) except Exception: print('Point a browser at %s.' % server.url)
Attachment #791074 -
Flags: review?(mh+mozilla) → review+
Comment 21•11 years ago
|
||
Okay, there's a big problem with part 1: $ time ./mach build foo 0:00.10 /usr/bin/make -j8 -s foo 0:00.13 make: *** No rule to make target `foo'. Stop. real 0m1.301s user 0m0.184s sys 0m0.056s $ time make -j8 -s foo -C ../build/obj-x86_64-unknown-linux-gnu/ make: *** No rule to make target `foo'. Stop. real 0m0.038s user 0m0.032s sys 0m0.004s Without part 1 applied: $ time ./mach build foo 0:00.09 /usr/bin/make -j8 -s foo 0:00.12 make: *** No rule to make target `foo'. Stop. real 0m0.296s user 0m0.188s sys 0m0.044s This is concerning because it makes a < 1s rebuild an almost 2s one.
Assignee | ||
Comment 22•11 years ago
|
||
That 1s comes from the way mozsystemmonitor is implemented. Currently, there is a background process performing a sleep(poll_interval=1.0). When it wakes up, it checks a socket to see whether it needs to shut down. That is silly. It should poll the socket for poll_interval and detect the shutdown message immediately. I'm addressing this in bug 892342. Perhaps it would be better to only enable resource monitoring on full tree builds until that improvement lands.
Comment 23•11 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #22) > Perhaps it would be better to only enable resource monitoring on full tree > builds until that improvement lands. Agreed.
Assignee | ||
Comment 24•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb14e3a5a7c8 https://hg.mozilla.org/integration/mozilla-inbound/rev/cc7686d07acf With review feedback addressed.
Status: NEW → ASSIGNED
No longer depends on: 901601
Comment 25•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/cb14e3a5a7c8 https://hg.mozilla.org/mozilla-central/rev/cc7686d07acf
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Updated•6 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•