Closed Bug 1485638 Opened 6 years ago Closed 6 years ago

[mozsystemmonitor] Build stalls just before finishing


(Testing :: Mozbase, defect)

Version 3
Not set


(firefox-esr60 fixed, firefox64 fixed)

Tracking Status
firefox-esr60 --- fixed
firefox64 --- fixed


(Reporter: valentin, Assigned: bc)


(Blocks 1 open bug)



(2 files)

For the past few weeks I've noticed that the build system sometimes stalls for a very long time just before it finishes.
If I press Ctrl-C while it is stalled, the build is effectively done (I can run it and everything), and I get this trace:

 2:04.34     Finished dev [optimized + debuginfo] target(s) in 2m 00s
 3:08.25 dependentlibs.list.stub
 3:19.85   adding: install.rdf (deflated 53%)
 3:19.87   adding: plugins/ (deflated 66%)
 3:19.90   adding: plugins/ (deflated 66%)
 3:19.92   adding: plugins/ (deflated 66%)
 3:19.94   adding: plugins/ (deflated 66%)
 3:26.77 Packaging
 3:26.87 Packaging
 3:26.95 Packaging
^CProcess Process-1:ort compile misc libs tools
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/", line 267, in _bootstrap export compile misc libs tools
  File "/usr/lib/python2.7/multiprocessing/", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/icecold/mozilla-central/testing/mozbase/mozsystemmonitor/mozsystemmonitor/", line 101, in _collect
    while not pipe.poll(sleep_interval):
 5:22.88 recipe for target 'build' failed
 5:22.88 make: *** [build] Interrupt
 5:22.88 311 compiler warnings present.
 5:23.00 ccache (direct) hit rate: 0.0%; (preprocessed) hit rate: 0.0%; miss rate: 100.0%
 5:23.00 /usr/bin/notify-send --app-name=Mozilla Build System Mozilla Build System Build failed

This happens intermittently, but I can reproduce it about once every 5 builds.
This is my .mozconfig in case that's relevant:
Platform: Linux icecold-tp 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I've seen this as well when running the raptor android tests without geckoview_example installed. I've been looking into it, and if I find a resolution I'll take this; if not, someone else can take it.
gps, ahal: I have been running into issues with mozsystemmonitor on stop when running raptor on android locally. When I hacked things to force it to use the in-tree version, the problem went away.

Bug 1272782 fixed the issue with poll() in, but it didn't change the poll in which is what valentin is hitting.

It appears that the public version doesn't have the fixes from bug 1272782.

Perhaps we can fix the other poll and release a new version 0.4 for puppetagain and pypi?
Flags: needinfo?(gps)
Flags: needinfo?(ahal)
Sounds good, let's bump it to 1.0.0 so we can move closer towards consistent SemVer in mozbase. I don't think other packages depend on this, but might as well do it.
Flags: needinfo?(gps)
Flags: needinfo?(ahal)
Oops, I misread and thought this was a simple version bump, but it looks like there's still an unfixed bug here. I'm not familiar with this code, so I'm adding the :gps needinfo back.
Flags: needinfo?(gps)
Blocks: 1486908
I haven't touched this code in ages and am busy with other work right now. I don't think I have anything constructive to add :/
Flags: needinfo?(gps)
Ok. I'll work up a patch based on your earlier work.

"The constructor should always be called with keyword arguments. "
Assignee: nobody → bob
Attachment #9010451 - Flags: review?(ahal)
Attachment #9010454 - Flags: review?(ahal)
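
A note on the quoted documentation line: `multiprocessing.Process` takes `group` as its first positional parameter, so a target passed positionally ends up in the wrong slot. The sketch below only illustrates that point. The names `collect`, `make_collector` and `positional_target_is_rejected` are hypothetical and are not taken from the attached patches.

```python
import multiprocessing


def collect(pipe, poll_interval):
    """Hypothetical worker: report the interval it was started with."""
    pipe.send(poll_interval)


def make_collector(pipe, poll_interval=1.0):
    """Build (but do not start) a Process using keyword arguments only."""
    return multiprocessing.Process(
        target=collect,
        args=(pipe,),
        kwargs={"poll_interval": poll_interval},
    )


def positional_target_is_rejected():
    """Return True if passing the target positionally is refused."""
    try:
        # `collect` lands in the `group` slot, which must be None.
        multiprocessing.Process(collect)
    except AssertionError:
        return True
    return False
```

Calling `start()` on the object returned by `make_collector` would run `collect` in a child process, and `join()` would wait for it to exit.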
(In reply to Bob Clary [:bc:] from comment #8)
> Created attachment 9010454 [details] [diff] [review]
> bug-1485638-mozsystemmonitor.patch

failed lint:

$ ./mach lint --outgoing
  92:1  error  expected 2 blank lines, found 1  E302 (flake8)

fixed locally.
Attachment #9010451 - Flags: review?(ahal) → review+
Comment on attachment 9010454 [details] [diff] [review]

Review of attachment 9010454 [details] [diff] [review]:

::: testing/mozbase/mozsystemmonitor/mozsystemmonitor/
@@ +76,5 @@
> +# multiprocessing.Pipe is not actually a pipe on at least Linux.  that
> +# has an effect on the expected outcome of reading from it when the
> +# other end of the pipe dies, leading to possibly hanging on revc()
> +# below.

This could be a docstring

@@ +87,5 @@
> +        # returns both POLLERR and POLLIN, but python doesn't tell us
> +        # about it. So assume there is something to read, and we'll
> +        # get an exception when trying to read the data.
> +        return True
> +

Need an extra space here to appease flake8 (try running mach lint)
Attachment #9010454 - Flags: review?(ahal) → review+
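
For readers following the review, here is a rough sketch of the idea in the quoted comments: treat an exception from `poll()` as "there may be something to read", then let the following read decide. It is an illustration under assumptions, not the reviewed patch. `guarded_poll`, `collect` and the `sample` callback are hypothetical names, and the handling of `EOFError`/`OSError` around `recv()` is my own addition.

```python
import multiprocessing


def guarded_poll(pipe, poll_interval=0.1):
    """Poll `pipe`, treating a polling error as "ready to read"."""
    try:
        return pipe.poll(poll_interval)
    except Exception:
        # Assume there is something to read; the following recv()
        # will raise if the other end is really gone.
        return True


def collect(pipe, sample, poll_interval=0.1):
    """Hypothetical sampling loop: sample until the pipe becomes ready."""
    samples = []
    while not guarded_poll(pipe, poll_interval):
        samples.append(sample())
    try:
        message = pipe.recv()
    except (EOFError, OSError):
        # The peer died or closed its end; stop instead of hanging.
        message = None
    return samples, message
```

With a real `multiprocessing.Pipe()`, sending any object from the other end makes `guarded_poll` return True, and `collect` returns the samples gathered so far together with that object.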
$ ./mach lint --outgoing
✖ 0 problems (0 errors, 0 warnings)

$ ./mach lint --linter flake8 --outgoing
✖ 0 problems (0 errors, 0 warnings)

$ ./mach lint --linter py2 --outgoing
✖ 0 problems (0 errors, 0 warnings)

$ ./mach lint --linter py3 --outgoing
✖ 0 problems (0 errors, 0 warnings)

I don't see the lint errors. What exactly would you like me to change?
Flags: needinfo?(ahal)
Pushed by
[mozsystemmonitor] Multiprocessing.Process should always be called with keyword arguments, r=ahal.
[mozsystemmonitor] wrap Multiprocessing.Pipe.poll in _collect as well as in SystemResourceMonitor.stop, r=ahal.
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Usually flake8 requires two spaces between module level functions, but I guess the flake8 linter isn't enabled on mozsystemresourcemonitor yet. Don't worry about it.
Flags: needinfo?(ahal)
You mean blank lines not spaces? It has two lines before and after _poll. Thanks!
Blocks: 1411358
See Also: → 1502363