"make[4]: write error: stdout" while building Firefox
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox-esr68 fixed, firefox69 fixed)
People
(Reporter: julienw, Assigned: glandium, NeedInfo)
References
Details
Attachments
(2 files)
51.70 KB,
application/x-bzip
|
Details | |
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-esr68+
|
Details | Review |
When building Firefox, I get these errors often: make[4]: write error: stdout make[3]: *** [security/sandbox/linux/broker/target] Erreur 1 Then restarting the build works. I build with `-j 60` because I use a icecc cluster. So this could be because of the high parallelism.
Comment 1•6 years ago
|
||
I am also experiencing this with scan-build: https://relman-ci.moz.tools/job/firefox-scan-build/348/ with -j20 (doesn't always happen)
Comment 2•6 years ago
|
||
I know developers in Mozilla offices with icecream clusters have used very high -j values to build Firefox in the past, so I'm surprised that would be an issue. From the error it sounds like this is an issue with make itself, so perhaps the version of make in use matters?
Comment 3•6 years ago
|
||
FWIW, make displays this error when it tries to close stdout and either detects that an existing error happened, or fails to close the stdout file: http://git.savannah.gnu.org/cgit/make.git/tree/output.c?h=4.2.1&id=9ab8af7cd8aaac587787d9df7e34758951d7c470#n495
Comment 4•6 years ago
|
||
btw, the make error could be improved :)
Reporter | ||
Comment 5•6 years ago
|
||
Looks like I'm not the only one: https://lists.gnu.org/archive/html/bug-make/2014-07/msg00035.html I'm using make 4.1 on Debian stable, so it's a bit old. But I'm sure this issue is fairly recent for me and wasn't happening some weeks ago.
Comment 6•6 years ago
|
||
Given that it sounds like this is a bug in upstream make. If there's anything we can change in the build system to work around it we'd certainly be open to that.
Comment 7•6 years ago
|
||
Even if this isn't our fault, I still have a ~20/30% failure rate on this in Debian testing...
Comment 9•5 years ago
|
||
Reopening, as, even if this isn't our fault, this is a huge pain for developers impacted by this. We have to start the build several time to get a final binary
Comment 10•5 years ago
|
||
I see the same on openSUSE distribution, particularly on a machine with 160 cores. However one can see it even on a smaller machine.
There's a link to strace of mach command:
https://drive.google.com/file/d/1bt94SWMaYOCnsZLX9PSQewjlNP5dD2fe/view?usp=sharing
and I'm attaching build log as well. I can also confirm the problem is new for me, Firefox 63.* releases were built fine.
And I googled the problem and it looks BSD is affected as well (starting from 12/2018, probably FF 64 release):
https://mail-index.netbsd.org/netbsd-users/2018/12/12/msg021781.html
Feel free to ask for details, I can do debugging on the system I have.
Comment 11•5 years ago
|
||
Note that the same happens on aarch64 and armv7l with only
MOZ_MAKE_FLAGS=-j6.
Comment 12•5 years ago
|
||
Got it, happens only when one uses '-v' for 'mach build'.
Reporter | ||
Comment 13•5 years ago
|
||
I don't use -v. But maybe it makes the problem more visible by making make output more stuff.
Comment 15•5 years ago
|
||
I can confirm that '-v' helps. Though I would like to use it...
Did anybody bisected the commit where it appeared for the first time?
Assignee | ||
Comment 16•5 years ago
|
||
This may be related to https://github.com/nodejs/node/issues/14752 , and may require us wrapping nodejs invocations.
Assignee | ||
Comment 17•5 years ago
|
||
The following patch may have fixed it for me (it's not landable, because it simply hides node.js's stderr, which is not desirable in case errors happen):
--- a/python/mozbuild/mozbuild/action/node.py
+++ b/python/mozbuild/mozbuild/action/node.py
@@ -5,6 +5,7 @@
from __future__ import absolute_import, unicode_literals, print_function
import buildconfig
+import os
import pipes
import subprocess
import sys
@@ -52,7 +53,7 @@ def execute_node_cmd(node_cmd_list):
print('Executing "{}"'.format(printable_cmd), file=sys.stderr)
sys.stderr.flush()
- output = subprocess.check_output(node_cmd_list)
+ output = subprocess.check_output(node_cmd_list, stderr=open(os.devnull, 'w'))
# Process the node script output
#
I'm not sure, though, because I have never been able to reproduce locally, and I'm not sure whether the environment where I did get this error before (which is, some specific downstream Debian builders) just got lucky this time or if the fix actually worked.
Can any of the people who had the problem can check?
Reporter | ||
Comment 18•5 years ago
|
||
Sorry, I can't try because I can't build with icecc these days because of the toolchain problem I talked about already.
That said it seemed to me that I didn't have the make problem with a recent build of icecc anymore. I think Paul may know more.
Comment 19•5 years ago
|
||
Great! Using node-12.4.0 with that upstream patch (minor conflicts resolved) fixed the problem for me. We were encountering the problem very reliably when using high make-parallelism (12-16+) and now it built without a glitch on 32 real threads.
Assignee | ||
Comment 20•5 years ago
|
||
(In reply to vcunat from comment #19)
Great! Using node-12.4.0 with that upstream patch (minor conflicts resolved) fixed the problem for me. We were encountering the problem very reliably when using high make-parallelism (12-16+) and now it built without a glitch on 32 real threads.
Would you mind trying with an older node and the patch from comment 17?
Comment 21•5 years ago
|
||
That also works for me. I tried once with the same set up, except for keeping unpatched node-10.16.0.
Assignee | ||
Comment 22•5 years ago
|
||
Can you try this new version?
--- a/python/mozbuild/mozbuild/action/node.py
+++ b/python/mozbuild/mozbuild/action/node.py
@@ -1,15 +1,16 @@
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
from __future__ import absolute_import, unicode_literals, print_function
import buildconfig
+import os
import pipes
import subprocess
import sys
SCRIPT_ALLOWLIST = [
buildconfig.topsrcdir + "/devtools/client/shared/build/build.js"
]
@@ -47,17 +48,18 @@ def execute_node_cmd(node_cmd_list):
printed to stderr instead.
"""
try:
printable_cmd = ' '.join(pipes.quote(arg) for arg in node_cmd_list)
print('Executing "{}"'.format(printable_cmd), file=sys.stderr)
sys.stderr.flush()
- output = subprocess.check_output(node_cmd_list)
+ output = subprocess.check_output(
+ node_cmd_list, stderr=os.dup(sys.stderr.fileno()))
# Process the node script output
#
# XXX Starting with an empty list means that node scripts can
# (intentionally or inadvertently) remove deps. Do we want this?
deps = []
for line in output.splitlines():
if 'dep:' in line:
Comment 23•5 years ago
|
||
This seem to work well on my linux box, but I see it rather often on my new 12 thread macbook.
Assignee | ||
Comment 24•5 years ago
•
|
||
(In reply to Paul Adenot (:padenot) from comment #23)
This seem to work well on my linux box, but I see it rather often on my new 12 thread macbook.
With either patch?
Comment 25•5 years ago
|
||
Without any patch, sorry. I'll try that next. It used to work a few week back.
Comment 26•5 years ago
|
||
I'll try to recall testing this on Monday, but dup should not be enough, from what I remember off the top of my head. (The non-blocking flag is set on file description, and dup/fork/... will get you another descriptor for the same/shared description.)
Comment 27•5 years ago
|
||
I confirmed experimentally that the dup patch does not solve it.
Assignee | ||
Comment 28•5 years ago
|
||
This works around https://github.com/nodejs/node/issues/14752, which
causes problems with make.
Assignee | ||
Updated•5 years ago
|
Comment 29•5 years ago
|
||
Pushed by mh@glandium.org: https://hg.mozilla.org/integration/autoland/rev/562655fedb9d Redirect node.js's stderr to a pipe. r=froydnj
Comment 30•5 years ago
|
||
bugherder |
Comment 31•5 years ago
|
||
Seems that it is improving the situation, I haven't seen it recently.
Many thanks!
Comment 32•5 years ago
|
||
Can this please be uplifted to the esr branch? The patch applies cleanly and fixes the error.
Assignee | ||
Comment 33•5 years ago
|
||
Comment on attachment 9074104 [details]
Bug 1500436 - Redirect node.js's stderr to a pipe.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Build failures under some conditions, due to node twiddling with its output file descriptors.
- User impact if declined: See above.
- Fix Landed on Version: 69
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Build system change that has landed a while ago.
- String or UUID changes made by this patch: N/A
Comment 35•5 years ago
|
||
Comment on attachment 9074104 [details]
Bug 1500436 - Redirect node.js's stderr to a pipe.
Low-risk build system fix to avoid random bustage. Approved for 68.3esr.
Comment 36•5 years ago
|
||
bugherder uplift |
Comment 37•4 years ago
|
||
Adding here something I just commented on https://github.com/mozilla/gecko/commit/f8e128f0fe15e6e40a6ac8ac07698649777b8fca#r36592216 -- I think the comment in the fix commit is misleading:
We could theoretically remove this later, once we require a fixed node?
I suspect this is not true, because the "fix" relies on the node process exiting successfully; it appears to not work when the node process is killed or crashes (see my comment on https://github.com/nodejs/node/issues/14752#issuecomment-569285807), and I suspect tests must continue to work even if that happens.
Thus I suspect that the added comment, which suggests that the workaround can be removed when the issue is closed, is misleading.
Updated•3 years ago
|
Description
•