Closed Bug 1915540 Opened 2 months ago Closed 2 months ago

Build logs are still eaten on Windows

Categories

(Firefox Build System :: General, defect)

Tracking

(firefox-esr115 unaffected, firefox-esr128 fixed, firefox130 unaffected, firefox131 fixed, firefox132 fixed)

RESOLVED FIXED
132 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- fixed
firefox130 --- unaffected
firefox131 --- fixed
firefox132 --- fixed

People

(Reporter: glandium, Assigned: glandium)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

This is a followup for bug 1906191. Like bug 1906191, this is made visible by rustc 1.80. There's progress, though, because now the status code is not eaten. So we can have builds that fail with nothing useful in the logs.

(In reply to Mike Hommey [:glandium] from comment #0)

Like bug 1906191, this is made visible by rustc 1.80.

Actually, bug 1906191 made it visible on the current rustc version... yay!

No longer blocks: rustc-1.80
No longer depends on: 1906191
Regressed by: 1906191

Set release status flags based on info from the regressing bug 1906191

Well, with everything from bug 1906191 backed out, this is still happening, so something else triggered the latent problem, in much the same way rustc 1.80 did when I filed bug 1906191.

I think I know what's going on, at least after bug 1906191. Things might have been different before bug 1906191.

mach build uses processhandler without following child processes, because some things start daemons: back in the day, MSVC would start a process to handle PDB files, and Gradle spawns a daemon on Android builds. In that mode, we don't join the reader thread. So the reader is... still reading for a while after proc.wait() returns, and continues to print things. Because it's a daemon thread, it's not auto-joined when mach terminates (which is what's expected of it, since otherwise we'd deadlock if the build itself spawned a daemon process). So when mach terminates, the reader is terminated, even if it's still reading. On Windows, that apparently happens quite frequently, because reading the full output from the build (or printing it out) is slower than mach terminating.
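To make the race easier to picture, here is a minimal sketch of it outside of mach. It does not use processhandler; it is plain subprocess plus a daemon thread, and the function name, the child command and the per-line sleep (a stand-in for slow reading/printing) are all made up for illustration.

```python
import subprocess
import sys
import threading
import time


def run_child(num_lines=50, per_line_delay=0.002):
    """Run a child that prints num_lines lines and read them on a daemon thread.

    Returns (lines read when wait() returned, lines read after joining).
    Hypothetical helper for illustration only, not mach or mozprocess code.
    """
    child_code = f"for i in range({num_lines}): print('line', i)"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        text=True,
    )
    lines = []

    def reader():
        for line in proc.stdout:
            time.sleep(per_line_delay)  # stand-in for slow reading/printing
            lines.append(line)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    proc.wait()
    # The child has exited, but the reader may still be draining the pipe.
    # If the parent exited right here, whatever is left would be lost.
    seen_at_wait = len(lines)

    # Joining is safe here only because this child spawns no daemon that
    # would keep the pipe open.
    thread.join()
    return seen_at_wait, len(lines)


if __name__ == "__main__":
    seen, total = run_child()
    print(f"read {seen} of {total} lines by the time wait() returned")
```

The first number is usually smaller than the second, and the difference is what gets eaten if the parent exits without waiting for the reader.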

We should probably have an alternative to the thread join, where we'd check whether there's some active reading happening and, if nothing happens for some time, consider the reading done. That could mean some extra waiting when the build spawned a daemon process, though. Before bug 1906191, that was kind of covered by the loop doing a few joins with a timeout of 1 second, but that was apparently far from enough. There might also be opportunities to speed up the reading... I'll probably investigate that in a followup.
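A rough sketch of what such an inactivity-based wait could look like, assuming the reader records a timestamp every time it gets output. This is not the patch that landed; ActivityTracker and wait_until_quiet are hypothetical names, and the timeout values are arbitrary.

```python
import threading
import time


class ActivityTracker:
    """Hypothetical helper: records when the reader last saw output."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def touch(self):
        # The reader would call this each time it reads something.
        with self._lock:
            self._last = time.monotonic()

    def idle_for(self):
        with self._lock:
            return time.monotonic() - self._last


def wait_until_quiet(thread, tracker, idle_timeout=1.0, poll=0.05):
    """Wait for `thread` to finish, giving up once it has been idle too long.

    Returns True if the thread finished, False if we stopped waiting because
    no output arrived for `idle_timeout` seconds (e.g. a daemon process is
    holding the pipe open but not writing anything).
    """
    while thread.is_alive():
        thread.join(poll)
        if thread.is_alive() and tracker.idle_for() >= idle_timeout:
            return False
    return True
```

The tradeoff is the one mentioned above: when a daemon process keeps the pipe open, the wait costs up to idle_timeout extra seconds, whereas a reader that is still actively draining output gets as long as it needs.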

Duplicate of this bug: 1916100

Let's add this as a regression, in the end, because even though the problem existed to some extent before, bug 1906191 definitely made it significantly worse, and we should track the fix's uplift accordingly.

Keywords: regression
Regressed by: 1906191
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/74cfbfc5681b
Somehow wait for the reader to finish even if we can't join the reader. r=releng-reviewers,jcristau

Set release status flags based on info from the regressing bug 1906191

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 132 Branch

FYI, bug 1916100 describes a situation where the information about the successful build is no longer shown at the end, but scattered prematurely in the middle of the still ongoing recursive build of OTHER directories (?), which makes it very hard to grasp.
So there the log lines are NOT EATEN completely, but shown in very inappropriate places.

(In reply to ISHIKAWA, Chiaki from comment #11)

FYI, bug 1916100 describes a situation where the information about the successful build is no longer shown at the end, but scattered prematurely in the middle of the still ongoing recursive build of OTHER directories (?), which makes it very hard to grasp.
So there the log lines are NOT EATEN completely, but shown in very inappropriate places.

Same root cause.

Attachment #9422403 - Flags: approval-mozilla-beta?

beta Uplift Approval Request

  • User impact if declined: Possibly truncated logs, especially on job failure
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: N/A
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Does not affect Firefox itself
  • String changes made/needed: N/A
  • Is Android affected?: no

:glandium I noticed source-test-python-mozbuild-linux1804-64/opt-py3 failing on this try build (it fails on bug 1916125 as well). Could you take a look before we uplift this?
my try build with both patches

Flags: needinfo?(mh+mozilla)

That should have been fixed by bug 1916292.

Attachment #9422403 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(mh+mozilla)