Build logs are still eaten on Windows
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox-esr115 unaffected, firefox-esr128 fixed, firefox130 unaffected, firefox131 fixed, firefox132 fixed)
Version | Tracking | Status |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox-esr128 | --- | fixed |
firefox130 | --- | unaffected |
firefox131 | --- | fixed |
firefox132 | --- | fixed |
People
(Reporter: glandium, Assigned: glandium)
References
(Regression)
Details
(Keywords: regression)
Attachments
(2 files)
48 bytes, text/x-phabricator-request
48 bytes, text/x-phabricator-request (phab-bot: approval-mozilla-beta+)
This is a followup for bug 1906191. Like bug 1906191, this is made visible by rustc 1.80. There's progress, though, because now the status code is not eaten. So we can have builds that fail with nothing useful in the logs.
Assignee
Comment 1•2 months ago
(In reply to Mike Hommey [:glandium] from comment #0)
> Like bug 1906191, this is made visible by rustc 1.80.
Actually, bug 1906191 made it visible on the current rustc version... yay!
Comment 2•2 months ago
Set release status flags based on info from the regressing bug 1906191
Assignee
Comment 3•2 months ago
Well, with everything from bug 1906191 backed out, this is still happening, so something else triggered the latent problem, in much the same way rustc 1.80 did when I filed bug 1906191.
Assignee
Comment 4•2 months ago
I think I know what's going on, at least after bug 1906191. Things might have been different before bug 1906191.
mach build uses processhandler without following child processes, because some build steps start daemons. Back in the day, MSVC would start a process to handle PDB files; another example is Gradle spawning a daemon on Android builds. In that mode, we don't join the reader thread. So the reader is... still reading for a while after proc.wait() returns and continues to print things. Because it's a daemon thread, it's not auto-joined when mach terminates (and that's what's expected of it, since otherwise we'd deadlock if the build itself spawned a daemon process). So when mach terminates, the reader is killed even if it's still reading. On Windows, that apparently happens quite frequently, because reading the full output from the build (or printing it out) is slower than mach terminating.
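For illustration only, the race described above can be sketched with the standard library. This is not the mozprocess/processhandler code; `run_with_daemon_reader` and its parameters are hypothetical names, and the child is just a Python one-liner standing in for a build.

```python
import subprocess
import sys
import threading


def run_with_daemon_reader(n_lines, join_reader):
    """Run a child that prints n_lines and read its stdout on a daemon thread.

    Returns how many lines the reader had collected when this function
    returned. Hypothetical helper, not the real processhandler API.
    """
    child_code = f"for i in range({n_lines}): print('line', i)"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        text=True,
    )
    received = []

    def reader():
        # Keeps going until the pipe reaches EOF, which can be well after
        # the child process itself has exited.
        for line in proc.stdout:
            received.append(line)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    proc.wait()  # the exit status is available here...
    if join_reader:
        thread.join()  # ...but only a join guarantees all output was read
    return len(received)
```

With `join_reader=False` the count may come back short, because nothing waits for the reader; if the interpreter exited at that point, the daemon thread would simply be dropped along with any unread output. With `join_reader=True` every line arrives, at the price of blocking for as long as anything keeps the pipe open.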
We should probably have an alternative to the thread join, where we'd check whether any reading is still actively happening and, if nothing happens for some time, consider the reading done. That could mean some extra waiting when the build spawned a daemon process, though. Before bug 1906191, this was kind of covered by the loop doing a few joins with a timeout of 1 second, but that was apparently far from enough. There might also be opportunities to speed up the reading... I'll probably investigate that in a followup.
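A minimal sketch of that idea, assuming the reader thread records a timestamp whenever it reads something. `ActivityTracker`, `wait_until_idle` and the timeout values are hypothetical and are not taken from the patch that landed.

```python
import threading
import time


class ActivityTracker:
    """Remembers when the reader last made progress (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def touch(self):
        # The reader thread calls this each time it reads a chunk of output.
        with self._lock:
            self._last = time.monotonic()

    def idle_for(self):
        # Seconds elapsed since the last recorded read.
        with self._lock:
            return time.monotonic() - self._last


def wait_until_idle(thread, tracker, idle_timeout=1.0, poll=0.05):
    """Wait for the reader thread, giving up once it has gone quiet.

    Returns True if the thread finished (pipe hit EOF), or False if the
    thread is still alive but read nothing for idle_timeout seconds, e.g.
    because a daemon process spawned by the build keeps the pipe open.
    """
    while thread.is_alive():
        thread.join(poll)
        if thread.is_alive() and tracker.idle_for() >= idle_timeout:
            return False
    return True
```

The trade-off is the one noted above: when a daemon keeps the pipe open, the caller pays up to `idle_timeout` of extra waiting, but a reader that is still making progress is never cut off.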
Assignee
Comment 5•2 months ago
Assignee
Comment 7•2 months ago
Let's add this as a regression, in the end, because even though the problem existed to some extent before, bug 1906191 definitely made it significantly worse, and we should track the fix's uplift accordingly.
Comment 9•2 months ago
Set release status flags based on info from the regressing bug 1906191
Comment 10•2 months ago
bugherder
Comment 11•2 months ago
FYI, bug 1916100 describes a situation where the information about the successful build is no longer shown at the end, but is scattered prematurely through the middle of the still-ongoing recursive build of OTHER directories (?), which makes it very hard to follow.
So there the log lines are NOT EATEN completely, but are shown in very inappropriate places.
Assignee
Comment 12•2 months ago
(In reply to ISHIKAWA, Chiaki from comment #11)
> FYI, bug 1916100 describes a situation where the information about the successful build is no longer shown at the end, but is scattered prematurely through the middle of the still-ongoing recursive build of OTHER directories (?), which makes it very hard to follow.
> So there the log lines are NOT EATEN completely, but are shown in very inappropriate places.
Same root cause.
Assignee
Comment 13•2 months ago
Original Revision: https://phabricator.services.mozilla.com/D220842
Comment 14•2 months ago
beta Uplift Approval Request
- User impact if declined: Possibly truncated logs, especially on job failure
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: N/A
- Risk associated with taking this patch: Low
- Explanation of risk level: Does not affect Firefox itself
- String changes made/needed: N/A
- Is Android affected?: no
Comment 16•2 months ago
:glandium, I noticed that source-test-python-mozbuild-linux1804-64/opt-py3 fails on this try build (it fails on bug 1916125 as well). Could you take a look before we uplift this?
my try build with both patches
Comment 18•2 months ago
uplift
Comment 19•2 months ago
uplift