Closed Bug 1485638 Opened Last year Closed Last year

[mozsystemmonitor] Build stalls just before finishing

Categories

(Testing :: Mozbase, defect)

Version 3

Tracking

(firefox-esr60 fixed, firefox64 fixed)

RESOLVED FIXED
mozilla64

People

(Reporter: valentin, Assigned: bc)

References

(Blocks 1 open bug)

Attachments

(2 files)

For the past few weeks I've noticed that the build system sometimes stalls for a very long time just before it finishes.
When I press Ctrl-C during the stall, the build is effectively done (I can run it and everything), and I get this trace:

 2:04.34     Finished dev [optimized + debuginfo] target(s) in 2m 00s
 2:04.40 libxul.so
 3:08.25 dependentlibs.list.stub
 3:19.85   adding: install.rdf (deflated 53%)
 3:19.87   adding: plugins/libnptest.so (deflated 66%)
 3:19.90   adding: plugins/libnpsecondtest.so (deflated 66%)
 3:19.92   adding: plugins/libnpthirdtest.so (deflated 66%)
 3:19.94   adding: plugins/libnpswftest.so (deflated 66%)
 3:26.77 Packaging specialpowers@mozilla.org.xpi...
 3:26.87 Packaging quitter@mozilla.org.xpi...
 3:26.95 Packaging mozscreenshots@mozilla.org.xpi...
^CProcess Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/icecold/mozilla-central/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 101, in _collect
    while not pipe.poll(sleep_interval):
KeyboardInterrupt
 5:22.88 client.mk:150: recipe for target 'build' failed
 5:22.88 make: *** [build] Interrupt
 5:22.88 311 compiler warnings present.
 5:23.00 ccache (direct) hit rate: 0.0%; (preprocessed) hit rate: 0.0%; miss rate: 100.0%
 5:23.00 /usr/bin/notify-send --app-name=Mozilla Build System Mozilla Build System Build failed

This happens intermittently, but I can reproduce it about once every five builds.
This is my .mozconfig in case that's relevant: https://gist.github.com/valenting/37752e03588d01b7279efe86dddc4a4d
Platform: Linux icecold-tp 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I've seen this as well when running the raptor android tests without geckoview_example installed. I've been looking into it; if I find a resolution I'll take this bug, but otherwise someone else is welcome to take it.
gps, ahal: I have been running into issues with mozsystemmonitor on stop when running raptor on android locally. When I hacked things to force it to use the in-tree version, the problem went away.

Bug 1272782 fixed the issue with poll() in https://searchfox.org/mozilla-central/source/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py#311, but it didn't change the poll in https://searchfox.org/mozilla-central/source/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py#101 which is what valentin is hitting.

It appears that the public version https://pypi.org/project/mozsystemmonitor/#files doesn't have the fixes from bug 1272782.

Perhaps we can fix the other poll and release a new version 0.4 for puppetagain and pypi?
Flags: needinfo?(gps)
Flags: needinfo?(ahal)
Sounds good, let's bump it to 1.0.0 so we can move closer towards consistent SemVer in mozbase. I don't think other packages depend on this, but might as well do it.
Flags: needinfo?(gps)
Flags: needinfo?(ahal)
Oops, I misread and thought this was a simple version bump, but it looks like there's still an unfixed bug here. I'm not familiar with this code, so I'm adding the :gps needinfo back.
Flags: needinfo?(gps)
Blocks: 1486908
I haven't touched this code in ages and am busy with other work right now. I don't think I have anything constructive to add :/
Flags: needinfo?(gps)
Ok. I'll work up a patch based on your earlier work.
https://docs.python.org/2.7/library/multiprocessing.html?highlight=lock#process-and-exceptions

"The constructor should always be called with keyword arguments. "
Assignee: nobody → bob
Attachment #9010451 - Flags: review?(ahal)
Attachment #9010454 - Flags: review?(ahal)
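For readers unfamiliar with the documentation line quoted above, here is a minimal sketch of what calling the multiprocessing.Process constructor with keyword arguments can look like. The names _collect_stub and make_collector are hypothetical and are not taken from the attached patch; the only point illustrated is that target and args are passed by keyword, since the first positional parameter of Process is group rather than target.

```python
import multiprocessing


def _collect_stub(pipe, poll_interval):
    """Hypothetical stand-in for a monitor's child-process loop."""
    # Wait until the parent sends something, then acknowledge and exit.
    while not pipe.poll(poll_interval):
        pass  # a real collector would sample system resources here
    pipe.recv()
    pipe.send("done")


def make_collector(child_pipe, poll_interval=0.1):
    """Build (but do not start) the child process.

    target and args are passed by keyword so the call does not rely on
    the constructor's parameter order (group comes first, not target).
    """
    return multiprocessing.Process(
        target=_collect_stub,
        args=(child_pipe, poll_interval),
    )
```

The caller would then create the pipe pair with multiprocessing.Pipe(True), pass one end to make_collector, and call start() on the returned object.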
(In reply to Bob Clary [:bc:] from comment #8)
> Created attachment 9010454 [details] [diff] [review]
> bug-1485638-mozsystemmonitor.patch

failed lint:

$ ./mach lint --outgoing
/home/bclary/mozilla/builds/inbound-taskcluster/mozilla/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py
  92:1  error  expected 2 blank lines, found 1  E302 (flake8)

fixed locally.
Attachment #9010451 - Flags: review?(ahal) → review+
Comment on attachment 9010454 [details] [diff] [review]
bug-1485638-mozsystemmonitor.patch

Review of attachment 9010454 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py
@@ +76,5 @@
>  
> +# multiprocessing.Pipe is not actually a pipe on at least Linux.  that
> +# has an effect on the expected outcome of reading from it when the
> +# other end of the pipe dies, leading to possibly hanging on revc()
> +# below.

This could be a docstring

@@ +87,5 @@
> +        # returns both POLLERR and POLLIN, but python doesn't tell us
> +        # about it. So assume there is something to read, and we'll
> +        # get an exception when trying to read the data.
> +        return True
> +

Need an extra space here to appease flake8 (try running mach lint)
Attachment #9010454 - Flags: review?(ahal) → review+
$ ./mach lint --outgoing
✖ 0 problems (0 errors, 0 warnings)

$ ./mach lint --linter flake8 --outgoing
✖ 0 problems (0 errors, 0 warnings)

$ ./mach lint --linter py2 --outgoing
✖ 0 problems (0 errors, 0 warnings)

$ ./mach lint --linter py3 --outgoing
✖ 0 problems (0 errors, 0 warnings)

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1,2,3&author=bclary@mozilla.com&fromchange=a3e4d3f9707face3ca2422556fcdf31dee4ec3ea

I don't see the lint errors. What exactly would you like me to change?
Flags: needinfo?(ahal)
Pushed by bclary@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2c7157d3325f
[mozsystemmonitor] Multiprocessing.Process should always be called with keyword arguments, r=ahal.
https://hg.mozilla.org/integration/mozilla-inbound/rev/229cfd63e3d8
[mozsystemmonitor] wrap Multiprocessing.Pipe.poll in _collect as well as in SystemResourceMonitor.stop, r=ahal.
https://hg.mozilla.org/mozilla-central/rev/2c7157d3325f
https://hg.mozilla.org/mozilla-central/rev/229cfd63e3d8
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Usually flake8 requires two spaces between module level functions, but I guess the flake8 linter isn't enabled on mozsystemresourcemonitor yet. Don't worry about it.
Flags: needinfo?(ahal)
You mean blank lines, not spaces? It has two blank lines before and after _poll. Thanks!
Blocks: 1411358
See Also: → 1502363