Closed Bug 1485638 Opened Last year Closed Last year
[mozsystemmonitor] Build stalls just before finishing
For the past few weeks I've noticed that the build system sometimes stalls for a very long time just before finishing. I sometimes Ctrl-C when it stalls. The build is effectively done at that point (I can run it and everything), and I get this trace:

 2:04.34 Finished dev [optimized + debuginfo] target(s) in 2m 00s
 2:04.40 libxul.so
 3:08.25 dependentlibs.list.stub
 3:19.85 adding: install.rdf (deflated 53%)
 3:19.87 adding: plugins/libnptest.so (deflated 66%)
 3:19.90 adding: plugins/libnpsecondtest.so (deflated 66%)
 3:19.92 adding: plugins/libnpthirdtest.so (deflated 66%)
 3:19.94 adding: plugins/libnpswftest.so (deflated 66%)
 3:26.77 Packaging firstname.lastname@example.org...
 3:26.87 Packaging email@example.com...
 3:26.95 Packaging firstname.lastname@example.org...
^CProcess Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/icecold/mozilla-central/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 101, in _collect
    while not pipe.poll(sleep_interval):
KeyboardInterrupt
 5:22.88 client.mk:150: recipe for target 'build' failed
 5:22.88 make: *** [build] Interrupt
 5:22.88 311 compiler warnings present.
 5:23.00 ccache (direct) hit rate: 0.0%; (preprocessed) hit rate: 0.0%; miss rate: 100.0%
 5:23.00 /usr/bin/notify-send --app-name=Mozilla Build System Mozilla Build System Build failed

This happens intermittently, but I can reproduce it roughly once every five builds.

This is my .mozconfig in case that's relevant: https://gist.github.com/valenting/37752e03588d01b7279efe86dddc4a4d

Platform: Linux icecold-tp 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I've seen this as well when running the raptor android tests without geckoview_example installed. I've been looking into it and if I find a resolution I'll take this but if not someone else can take it as well.
gps, ahal: I have been running into issues with mozsystemmonitor on stop when running raptor on android locally. When I hacked things to force it to use the in-tree version, the problem went away. Bug 1272782 fixed the issue with poll() in https://searchfox.org/mozilla-central/source/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py#311, but it didn't change the poll in https://searchfox.org/mozilla-central/source/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py#101 which is what valentin is hitting. It appears that the public version https://pypi.org/project/mozsystemmonitor/#files doesn't have the fixes from bug 1272782. Perhaps we can fix the other poll and release a new version 0.4 for puppetagain and pypi?
Sounds good, let's bump it to 1.0.0 so we can move closer towards consistent SemVer in mozbase. I don't think other packages depend on this, but might as well do it.
Oops, I misread and thought this was a simple version bump, but it looks like there's still an unfixed bug here. I'm not familiar with this code, so I'm adding the :gps needinfo back.
I haven't touched this code in ages and am busy with other work right now. I don't think I have anything constructive to add :/
Ok. I'll work up a patch based on your earlier work.
https://docs.python.org/2.7/library/multiprocessing.html?highlight=lock#process-and-exceptions
"The constructor should always be called with keyword arguments."
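To illustrate the documentation quote above, here is a minimal sketch of constructing a multiprocessing.Process with keyword arguments. It is not the mozsystemmonitor code: the _collect body, the make_collector helper, and the "collector" process name are hypothetical stand-ins chosen for the example.

```python
import multiprocessing


def _collect(pipe, poll_interval):
    """Hypothetical stand-in for the monitor's collection loop."""
    pipe.send(("started", poll_interval))
    pipe.close()


def make_collector(poll_interval=0.1):
    """Build (but do not start) a collector process and both pipe ends.

    make_collector is a hypothetical helper, not part of mozsystemmonitor.
    """
    parent_conn, child_conn = multiprocessing.Pipe(True)
    # Keyword arguments, as the documentation asks: the first positional
    # parameter of Process is `group`, not `target`, so passing the
    # callable positionally would not do what it appears to do.
    proc = multiprocessing.Process(
        target=_collect,
        args=(child_conn, poll_interval),
        name="collector",
    )
    return proc, parent_conn, child_conn
```

The caller would then run proc.start(), close its copy of child_conn, and read from parent_conn.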
Assignee: nobody → bob
Attachment #9010451 - Flags: review?(ahal)
(In reply to Bob Clary [:bc:] from comment #8)
> Created attachment 9010454 [details] [diff] [review]
> bug-1485638-mozsystemmonitor.patch

failed lint:

$ ./mach lint --outgoing
/home/bclary/mozilla/builds/inbound-taskcluster/mozilla/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py
  92:1  error  expected 2 blank lines, found 1  E302 (flake8)

fixed locally.
Comment on attachment 9010454 [details] [diff] [review]
bug-1485638-mozsystemmonitor.patch

Review of attachment 9010454 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py
@@ +76,5 @@
>
> +# multiprocessing.Pipe is not actually a pipe on at least Linux. that
> +# has an effect on the expected outcome of reading from it when the
> +# other end of the pipe dies, leading to possibly hanging on recv()
> +# below.

This could be a docstring

@@ +87,5 @@
> + # returns both POLLERR and POLLIN, but python doesn't tell us
> + # about it. So assume there is something to read, and we'll
> + # get an exception when trying to read the data.
> + return True
> +

Need an extra space here to appease flake8 (try running mach lint)
Attachment #9010454 - Flags: review?(ahal) → review+
$ ./mach lint --outgoing
✖ 0 problems (0 errors, 0 warnings)
$ ./mach lint --linter flake8 --outgoing
✖ 0 problems (0 errors, 0 warnings)
$ ./mach lint --linter py2 --outgoing
✖ 0 problems (0 errors, 0 warnings)
$ ./mach lint --linter py3 --outgoing
✖ 0 problems (0 errors, 0 warnings)

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1,2,email@example.com&fromchange=a3e4d3f9707face3ca2422556fcdf31dee4ec3ea

I don't see the lint errors. What exactly would you like me to change?
Pushed by firstname.lastname@example.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2c7157d3325f
[mozsystemmonitor] Multiprocessing.Process should always be called with keyword arguments, r=ahal.
https://hg.mozilla.org/integration/mozilla-inbound/rev/229cfd63e3d8
[mozsystemmonitor] wrap Multiprocessing.Pipe.poll in _collect as well as in SystemResourceMonitor.stop, r=ahal.
Usually flake8 requires two spaces between module-level functions, but I guess the flake8 linter isn't enabled on mozsystemmonitor yet. Don't worry about it.
You mean blank lines, not spaces? It has two blank lines before and after _poll. Thanks!