Closed
Bug 1384062
Opened 7 years ago
Closed 7 years ago
AssertionError: assert self._running when shutting down resource monitor
Categories
(Testing :: Mozbase, defect)
Testing
Mozbase
Tracking
(firefox56 wontfix, firefox57 fixed)
RESOLVED
FIXED
mozilla57
People
(Reporter: sa4zet, Assigned: glandium)
References
Details
Attachments
(2 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0
Build ID: 20170707114307
Steps to reproduce:
hg clone https://hg.mozilla.org/releases/mozilla-beta/ beta
export SHELL=/bin/bash
export MOZ_CURRENT_PROJECT=browser
export MOZCONFIG=./mozconfig
./mach build
mozconfig:
mk_add_options MOZ_MAKE_FLAGS="-j8"
mk_add_options MOZ_BUILD_PROJECTS="browser"
ac_add_options --enable-application=browser
ac_add_options --disable-debug
ac_add_options --disable-pulseaudio
ac_add_options --enable-alsa
ac_add_options --disable-tests
Actual results:
23:40.38 Packaging quitter@mozilla.org.xpi...
23:40.45 0 compiler warnings present.
Error running mach:
['build']
The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.
You should consider filing a bug for this issue.
If filing a bug, please include the full output of mach, including this error
message.
The details of the failure are as follows:
AssertionError
File "/root/mozilla-beta/python/mozbuild/mozbuild/mach_commands.py", line 528, in build
monitor.finish(record_usage=status==0)
File "/root/mozilla-beta/python/mozbuild/mozbuild/mach_commands.py", line 241, in __exit__
self.monitor.stop_resource_recording()
File "/root/mozilla-beta/python/mozbuild/mozbuild/controller/building.py", line 258, in stop_resource_recording
self.resources.stop()
File "/root/mozilla-beta/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 289, in stop
assert self._running
Expected results:
mach build end with success
Updated•7 years ago
|
Component: General → Build Config
Product: Release Engineering → Core
Comment 1•7 years ago
|
||
This bug is known to me. I suspect it is a dupe. But I can't find it right now.
It is a mostly harmless (but annoying) error. The underlying bug is in the resource monitor not handling shutting down when it is not running. It should probably silently fail unless the API was clearly used incorrectly.
Summary: mach build failed on beta → AssertionError: assert self._running when shutting down resource monitor
Comment 2•7 years ago
|
||
stop_resource_recording is called twice from python/mozbuild/mozbuild/controller/building.py and python/mozbuild/mozbuild/mach_commands.py.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → mh+mozilla
Component: Build Config → Mozbase
Product: Core → Testing
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•7 years ago
|
||
Comment on attachment 8897697 [details]
Bug 1384062 - Make SystemResourceMonitor.stop more resilient to errors.
Turns out the problem is way more subtle than this, and this patch actually fixes nothing.
The core problem is that there's either a bug in python or in the kernel, and either way, it's going to require weird hacks.
So, it turns out, BuildMonitor.stop_resource_recording has its own tracking, and *does* avoid calling stop again. The way it does it is:
def stop_resource_recording(self):
if self._resources_started:
self.resources.stop()
self._resources_started = False
Now, what's actually happening is that *during* self.resources.stop (so, in SystemResourceMonitor.stop), an exception is thrown (more on that later), which makes the self._resources_started = False skipped, *and* makes the BuildOutputManager context manager invoke its __exit__, which calls stop_resource_recording again, which is how we end up calling SystemResourceMonitor.stop a second time, because self._resource_started is still True.
And an exception is thrown from this line in SystemResourceMonitor.stop:
while self._pipe.poll(1.0):
yes, the exception comes from poll. But here's the twist. Even though poll failed, there's still something to receive: if I wrap the while in a try/except, and do a _pipe.recv() in the except, I do receive the 'done' message from the end of _collect.
Printing out the exception message is instructive about what python thinks is going wrong:
poll() gave POLLNVAL or POLLERR
Looking at strace is instructive about what the underlying poll system call does:
poll([{fd=7, events=POLLIN|POLLPRI}], 1, 1000) = 1 ([{fd=7, revents=POLLIN|POLLERR|POLLHUP}])
Yup, it's telling: there's stuff to read *and* an error *and* the other end hanged up.
I'm not sure the kernel is right to be returning POLLERR here, but the fact is it does, and that makes python unhappy, and there's apparently no clean way to detect this error case, because the exception's errno is None, and there's no way to get the underlying poll return values.
A reliable way to make the error happen, because it's dependent on a race between the collect thread closing the pipe and stop trying to read from it, is to add a time.sleep(1) before the loop.
That being said, I'm not sure why poll is being used at all here...
Attachment #8897697 -
Flags: review?(ahalberstadt)
Comment hidden (mozreview-request) |
Comment 6•7 years ago
|
||
mozreview-review |
Comment on attachment 8897697 [details]
Bug 1384062 - Make SystemResourceMonitor.stop more resilient to errors.
https://reviewboard.mozilla.org/r/168986/#review174522
I think this makes sense, though I'm not all that familiar with mozsystemresourcemonitor. If you have any doubts about this, please flag gps instead.
Attachment #8897697 -
Flags: review?(ahalberstadt) → review+
Comment 7•7 years ago
|
||
mozreview-review |
Comment on attachment 8897697 [details]
Bug 1384062 - Make SystemResourceMonitor.stop more resilient to errors.
https://reviewboard.mozilla.org/r/168986/#review174560
::: testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py:304
(Diff revision 2)
> - while self._pipe.poll(1.0):
> + while True:
> + try:
> - start_time, end_time, io_diff, cpu_diff, cpu_percent, virt_mem, \
> + start_time, end_time, io_diff, cpu_diff, cpu_percent, virt_mem, \
> - swap_mem = self._pipe.recv()
> + swap_mem = self._pipe.recv()
`multiprocess.Pipe.recv()` can block. That's why the `poll()` is here.
With this patch applied locally, I was able to hang a `mach build` by issuing a ^C during just the right moment.
```
0:05.32 checking for getpagesize... yes
Process Process-1:
Traceback (most recent call last):
File "/home/gps/.pyenv/versions/2.7.12/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
0:05.34 checking for gmtime_r... yes
0:05.34 Traceback (most recent call last):
0:05.34 File "/home/gps/src/firefox/configure.py", line 124, in <module>
0:05.35 sys.exit(main(sys.argv))
0:05.35 File "/home/gps/src/firefox/configure.py", line 29, in main
0:05.35 sandbox.run(os.path.join(os.path.dirname(__file__), 'moz.configure'))
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 398, in run
0:05.35 self.include_file(path)
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 389, in include_file
0:05.35 Traceback (most recent call last):
0:05.35 File "/home/gps/src/firefox/configure.py", line 124, in <module>
0:05.35 sys.exit(main(sys.argv))
0:05.35 exec_(code, self)
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/util.py", line 59, in exec_
0:05.35 File "/home/gps/src/firefox/configure.py", line 29, in main
0:05.35 sandbox.run(os.path.join(os.path.dirname(__file__), 'moz.configure'))
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 427, in run
0:05.35 exec(object, globals, locals)
0:05.35 File "/home/gps/src/firefox/moz.configure", line 7, in <module>
0:05.35 include('build/moz.configure/init.configure')
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 693, in include_impl
0:05.35 func(*args)
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 473, in _value_for
0:05.35 self.include_file(what)
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 389, in include_file
0:05.35 return self._value_for_depends(obj, need_help_dependency)
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/util.py", line 925, in method_call
0:05.35 exec_(code, self)
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/util.py", line 59, in exec_
0:05.35 exec(object, globals, locals)
0:05.35 File "/home/gps/src/firefox/build/moz.configure/init.configure", line 160, in <module>
self.run()
File "/home/gps/.pyenv/versions/2.7.12/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/gps/src/firefox/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 96, in _collect
while not pipe.poll(sleep_interval):
KeyboardInterrupt
0:05.35 @imports('distutils.sysconfig')
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 676, in decorator
0:05.35 cache[args] = self.func(instance, *args)
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 482, in _value_for_depends
0:05.35 depends = DependsFunction(self, func, dependencies, when=when)
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 95, in __init__
0:05.35 sandbox._value_for(self)
0:05.35 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 473, in _value_for
0:05.35 return obj.result(need_help_dependency)
0:05.36 File "/home/gps/src/firefox/python/mozbuild/mozbuild/util.py", line 925, in method_call
0:05.36 return self._value_for_depends(obj, need_help_dependency)
0:05.36 File "/home/gps/src/firefox/python/mozbuild/mozbuild/util.py", line 925, in method_call
0:05.36 cache[args] = self.func(instance, *args)
0:05.36 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 122, in result
0:05.36 return self._func(*resolved_args)
0:05.36 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 1001, in wrapped
0:05.36 cache[args] = self.func(instance, *args)
0:05.36 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 482, in _value_for_depends
0:05.36 return obj.result(need_help_dependency)
0:05.36 File "/home/gps/src/firefox/python/mozbuild/mozbuild/util.py", line 925, in method_call
0:05.36 return new_func(*args, **kwargs)
0:05.36 File "/home/gps/src/firefox/build/moz.configure/old.configure", line 350, in old_configure
0:05.36 ret = subprocess.call(cmd)
0:05.36 File "/home/gps/.pyenv/versions/2.7.12/lib/python2.7/subprocess.py", line 523, in call
0:05.36 cache[args] = self.func(instance, *args)
0:05.36 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 122, in result
0:05.36 return self._func(*resolved_args)
0:05.36 File "/home/gps/src/firefox/python/mozbuild/mozbuild/configure/__init__.py", line 1001, in wrapped
0:05.36 return new_func(*args, **kwargs)
0:05.36 File "/home/gps/src/firefox/build/moz.configure/init.configure", line 217, in virtualenv_python
0:05.36 sys.exit(subprocess.call([python] + sys.argv))
0:05.36 File "/home/gps/.pyenv/versions/2.7.12/lib/python2.7/subprocess.py", line 523, in call
0:05.36 return Popen(*popenargs, **kwargs).wait()
0:05.36 File "/home/gps/.pyenv/versions/2.7.12/lib/python2.7/subprocess.py", line 1392, in wait
0:05.36 pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
0:05.36 File "/home/gps/.pyenv/versions/2.7.12/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
0:05.36 return func(*args)
0:05.36 KeyboardInterrupt
0:05.36 return Popen(*popenargs, **kwargs).wait()
0:05.36 File "/home/gps/.pyenv/versions/2.7.12/lib/python2.7/subprocess.py", line 1392, in wait
0:05.36 pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
0:05.36 File "/home/gps/.pyenv/versions/2.7.12/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
0:05.36 return func(*args)
0:05.36 KeyboardInterrupt
0:05.36 client.mk:384: recipe for target 'configure' failed
0:05.36 make: *** [configure] Interrupt
```
Attachment #8897697 -
Flags: review-
Assignee | ||
Comment 8•7 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #7)
> Comment on attachment 8897697 [details]
> Bug 1384062 - Remove the poll loop from SystemResourceMonitor.stop.
>
> https://reviewboard.mozilla.org/r/168986/#review174560
>
> ::: testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py:304
> (Diff revision 2)
> > - while self._pipe.poll(1.0):
> > + while True:
> > + try:
> > - start_time, end_time, io_diff, cpu_diff, cpu_percent, virt_mem, \
> > + start_time, end_time, io_diff, cpu_diff, cpu_percent, virt_mem, \
> > - swap_mem = self._pipe.recv()
> > + swap_mem = self._pipe.recv()
>
> `multiprocess.Pipe.recv()` can block. That's why the `poll()` is here.
It doesn't matter that it blocks. It only blocks if the other end is still alive and waiting to send data.
> With this patch applied locally, I was able to hang a `mach build` by
> issuing a ^C during just the right moment.
This makes no sense, and I can't even reproduce, no matter how hard I try.
Comment hidden (mozreview-request) |
Comment 10•7 years ago
|
||
mozreview-review |
Comment on attachment 8897697 [details]
Bug 1384062 - Make SystemResourceMonitor.stop more resilient to errors.
https://reviewboard.mozilla.org/r/168986/#review175148
This looks much more resilient!
I suspect there's edge cases where we need to catch detect KeyboardInterrupt (which doesn't inherit from Exception). But this is strictly better.
Attachment #8897697 -
Flags: review?(gps) → review+
Assignee | ||
Comment 11•7 years ago
|
||
I suspect KeyboardInterrupt doesn't cause problems besides large tracebacks. First because apparently it's usually caught by the collection process, which leads to the pipe being in a non-pipe state, which is why you got the hang in comment 7, and we're now dealing with that. And if a KeyboardInterrupt happens on the main process during stop(), well, that just interrupts stop and prints a tracebacks but that should have no bad consequences anyways (at least it shouldn't leave us hanging).
Comment 12•7 years ago
|
||
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/4e9cd1d537cc
Make SystemResourceMonitor.stop more resilient to errors. r=ahal,gps
Comment 13•7 years ago
|
||
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8897697 -
Flags: review+ → review?(gps)
Comment 15•7 years ago
|
||
mozreview-review |
Comment on attachment 8897697 [details]
Bug 1384062 - Make SystemResourceMonitor.stop more resilient to errors.
https://reviewboard.mozilla.org/r/168986/#review175162
Attachment #8897697 -
Flags: review?(gps) → review+
Comment 16•7 years ago
|
||
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/8d9ae8c45dd0
Make SystemResourceMonitor.stop more resilient to errors. r=ahal,gps
Comment 17•7 years ago
|
||
bugherder |
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment 20•7 years ago
|
||
my bugs have been marked as duplicate of this bug and I still cannot build Firefox
Comment 21•7 years ago
|
||
Ran into this problem attempting to build release 56.0.
status-firefox56:
--- → ?
Updated•7 years ago
|
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•