Stall during early `./mach build`: toolkit/library/build/libxul.so sometimes takes exactly 10 minutes to complete
Categories
(Firefox Build System :: General, defect, P3)
Tracking
(Not tracked)
People
(Reporter: manuel, Unassigned)
References
(Blocks 1 open bug)
Details
This is the ~third time it happened to me. First time was probably last week. After pressing ctrl+c and entering ./mach build again the build just works.
System: Arch Linux
0:28.61 toolkit/library/build/libxul.so
0:33.13 ./dependentlibs.list.stub
0:34.82 Packaging specialpowers@mozilla.org.xpi...
0:34.89 Packaging mozscreenshots@mozilla.org.xpi...
0:52.72 W 234 compiler warnings present.
Process Process-1:xport compile misc libs tools
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/user/dev/gecko5/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 137, in _collect
while not _poll(pipe, poll_interval=sleep_interval):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/dev/gecko5/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 101, in _poll
return pipe.poll(poll_interval)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 256, in poll
return self._poll(timeout)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 423, in _poll
r = wait([self], timeout)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 930, in wait
ready = selector.select(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
Comment 1•2 years ago
|
||
The severity field is not set for this bug.
:ahochheiden, could you have a look please?
For more information, please visit BugBot documentation.
Comment 2•2 years ago
|
||
I wasn't able to reproduce, but the comments where you're hanging seem to leave a clue: https://searchfox.org/mozilla-central/source/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py#91-108
"""Wrap multiprocessing.Pipe.poll to hide POLLERR and POLLIN
exceptions.
multiprocessing.Pipe is not actually a pipe on at least Linux.
That has an effect on the expected outcome of reading from it when
the other end of the pipe dies, leading to possibly hanging on revc()
below.
So I'm guessing something is killing resource monitor process.
| Reporter | ||
Comment 3•2 years ago
|
||
Some additional infos: It always stops after mozscreenshots@mozilla.org.xpi
0:28.61 toolkit/library/build/libxul.so
0:33.13 ./dependentlibs.list.stub
0:34.82 Packaging specialpowers@mozilla.org.xpi...
0:34.89 Packaging mozscreenshots@mozilla.org.xpi...
ctrl-c
0:52.72 W 234 compiler warnings present.
Process Process-1:xport compile misc libs tools
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/user/dev/gecko5/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 137, in _collect
while not _poll(pipe, poll_interval=sleep_interval):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/dev/gecko5/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 101, in _poll
return pipe.poll(poll_interval)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 256, in poll
return self._poll(timeout)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 423, in _poll
r = wait([self], timeout)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 930, in wait
ready = selector.select(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
| Reporter | ||
Comment 4•2 years ago
|
||
It seems like it eventually completes.
0:54.58 ./dependentlibs.list.stub
0:56.45 Packaging specialpowers@mozilla.org.xpi...
0:56.53 Packaging mozscreenshots@mozilla.org.xpi...
10:53.61 W 232 compiler warnings present.
10:53.76 W Overall system resources - Wall time: 653s; CPU: 5%; Read bytes: 108104774656; Write bytes: 12304355328; Read time: 1102410; Write time: 565229
10:53.76 W Swap in/out (MB): 341.2578125/1153.75390625
10:53.90 /usr/bin/notify-send '--app-name=Mozilla Build System' 'Mozilla Build System' 'Build complete'
To view resource usage of the build, run |mach resource-usage|.
10:53.93 Your build was successful!
To take your build for a test drive, run: |mach run|
For more information on what to do now, see https://firefox-source-docs.mozilla.org/setup/contributing_code.html
Here it took 10 min Packaging mozscreenshots@mozilla.org.xpi..., where I apparently sometimes just cancel early. Here I simply forgot that I still had a build running.
Comment 5•1 year ago
|
||
I've been encountering this issue fairly often on my machine. I've got used to Ctrl-C and restart the build.
| Reporter | ||
Comment 6•1 year ago
|
||
Full log:
0:00.52 W Clobber not needed.
0:00.83 W Adding make options from /home/user/dev/gecko4/.mozconfig
AUTOCLOBBER=1
MOZ_OBJDIR=/home/user/dev/gecko4/obj-debug-sccache
OBJDIR=/home/user/dev/gecko4/obj-debug-sccache
FOUND_MOZCONFIG=/home/user/dev/gecko4/.mozconfig
export FOUND_MOZCONFIG
Parallelism determined by memory: using 28 jobs for 28 cores based on 62.5 GiB RAM and estimated job size of 1.0 GiB
0:00.83 /usr/bin/make -f client.mk -j28 -s
0:01.14 Elapsed: 0.00s; From dist/public: Kept 0 existing; Added/updated 0; Removed 0 files and 0 directories.
0:01.15 Elapsed: 0.01s; From dist/xpi-stage: Kept 91 existing; Added/updated 0; Removed 0 files and 0 directories.
0:01.17 Elapsed: 0.00s; From dist/private: Kept 0 existing; Added/updated 0; Removed 0 files and 0 directories.
0:01.39 Elapsed: 0.25s; From _tests: Kept 1424 existing; Added/updated 0; Removed 0 files and 0 directories.
0:01.48 Elapsed: 0.33s; From dist/include: Kept 6959 existing; Added/updated 0; Removed 0 files and 0 directories.
0:01.57 Elapsed: 0.43s; From dist/bin: Kept 2916 existing; Added/updated 0; Removed 0 files and 0 directories.
0:02.40 toolkit/components/antitracking/bouncetrackingprotection
0:02.43 toolkit/library/buildid.cpp.stub
0:13.54 toolkit/library/build/libxul.so
0:19.37 ./dependentlibs.list.stub
0:21.80 Packaging specialpowers@mozilla.org.xpi...
0:21.99 Packaging mozscreenshots@mozilla.org.xpi...
10:13.59 W 184 compiler warnings present.
10:14.12 W Overall system resources - Wall time: 614s; CPU: 6%; Read bytes: 12693970944; Write bytes: 9846194176; Read time: 164485; Write time: 787700
10:14.12 /usr/bin/notify-send '--app-name=Mozilla Build System' 'Mozilla Build System' 'Build complete'
To view a profile of the build, run |mach resource-usage|.
10:14.21 Your build was successful!
To take your build for a test drive, run: |mach run|
For more information on what to do now, see https://firefox-source-docs.mozilla.org/setup/contributing_code.html
If we assume a 10min timeout then it is 0:13.54 toolkit/library/build/libxul.so taking 10min.
| Reporter | ||
Comment 7•1 year ago
|
||
@Alex, I'm also still encountering this fairly often, but don't know how to force this. Do you have any hints on how to debug this?
Comment 8•1 year ago
|
||
I think it ends up hanging in mozbuild/action/zip.py
From:
https://searchfox.org/mozilla-central/source/config/rules.mk#908
https://searchfox.org/mozilla-central/source/config/makefiles/functions.mk#36
So debugging Jarrer with breakpoints is where I'd start if I could reproduce it.
Comment 9•1 year ago
|
||
I doubt the jarrer is involved here. Those log messages are a red herring, they don't tell when things finish. You'd be better off looking at what processes are still running when it's stuck. And chances are it's mach only, which would suggest yet another bug with the resource manager.
Comment 10•1 year ago
|
||
When this happens this is the process tree I get:
bash
`-python3 ./mach build
|-gmake -f client.mk -j16 -s
|-(python)
|-python3 ./mach build
|-{python3}
|-{python3}
|-{python3}
`-{python3}
And interrupting the command with CTRL-C yields this stack trace:
Process Process-1:xport compile misc libs tools
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/gsvelto/projects/mozilla-central/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 137, in _collect
while not _poll(pipe, poll_interval=sleep_interval):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gsvelto/projects/mozilla-central/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py", line 104, in _poll
return pipe.poll(poll_interval)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 440, in _poll
r = wait([self], timeout)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 948, in wait
ready = selector.select(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
Updated•1 year ago
|
Comment 12•1 year ago
|
||
I see this a great deal, but it's not consistent. I'm confident it's a bug of some sort in the resource monitor, which really means in Python's multiprocessing library.
Comment 13•1 year ago
|
||
(In reply to Nick Alexander :nalexander [he/him] from comment #12)
I see this a great deal, but it's not consistent. I'm confident it's a bug of some sort in the resource monitor, which really means in Python's multiprocessing library.
Or somewhere in our resource monitor code, that wouldn't be the first time.
| Reporter | ||
Comment 14•1 year ago
|
||
Process tree with ps -ef --forest
/bin/bash /home/user/.local/bin/mach build
\_ python3 ./mach build
\_ [python] <defunct>
\_ python3 ./mach build
\_ /usr/bin/make -f client.mk -j28 -s
I don't know how to set breakpoints or run mach with a debugger to investigate this further. Is there an easy way to do that?
Comment 16•1 year ago
|
||
I've started seeing this fairly often in the last few months.
| Reporter | ||
Updated•1 year ago
|
Comment 17•1 year ago
|
||
This sounds similar to Bug 1499382, which I have been experiencing for quite some time. A lot more people seem to be experiencing this now though, so this may be a separate bug with similar symptoms.
Still, to the people experiencing this just now: Are you using sccache? And if so, does running sccache --stop-server in a separate terminal while the hang is happening fix the hang? If that is the case, I think these two bugs are indeed the same and we can dupe one of them.
Comment 18•1 year ago
|
||
(In reply to Malte Jürgens [:maltejur] from comment #17)
Still, to the people experiencing this just now: Are you using sccache? And if so, does running
sccache --stop-serverin a separate terminal while the hang is happening fix the hang?
Running sccache --stop-server seems to fix the hang for me.
| Reporter | ||
Comment 19•1 year ago
|
||
Running
sccache --stop-serverseems to fix the hang for me.
Same for me
$ ps -ef | rg sccache
user 56565 3584 0 11:37 ? 00:00:00 /home/user/.mozbuild/sccache/sccache
$ /home/user/.mozbuild/sccache/sccache --stop-server
Stopping sccache server...
Compile requests 2
Compile requests executed 1
Cache hits 0
Cache misses 1
Cache misses (C/C++) 1
Cache timeouts 0
Cache read errors 0
Forced recaches 0
Cache write errors 0
Compilation failures 0
Cache errors 0
Non-cacheable compilations 0
Non-cacheable calls 0
Non-compilation calls 1
Unsupported compiler calls 0
Average cache write 0.000 s
Average compiler 5.864 s
Average cache read hit 0.000 s
Failed distributed compilations 0
Cache location Local disk: "/home/user/.cache/sccache"
Use direct/preprocessor mode? yes
Version (client) 0.7.7
Cache size 10 GiB
Max cache size 10 GiB
Build completes after calling stop-server.
Description
•