Open Bug 1649416 Opened 4 years ago Updated 2 years ago

mach try auto should be able to schedule base toolchain builds

Categories

(Developer Infrastructure :: Try, defect, P3)

Tracking

(Not tracked)

REOPENED

People

(Reporter: sg, Unassigned)

References

(Blocks 1 open bug)

Details

Whenever a C++ source file is changed, mach try auto must schedule gcc/clang base toolchain builds. As discussed earlier, maybe it's enough to schedule the debug variants of these builds. Oftentimes, the gcc toolchain build seems to be the more relevant one, but that might differ depending on the kind of change.

I had two backouts (and almost a third one) in the last day that were caused by my relying on the results of mach try auto when landing patches (https://bugzilla.mozilla.org/show_bug.cgi?id=1648010#c9, https://bugzilla.mozilla.org/show_bug.cgi?id=1644379#c11).

See Also: → 1636902

I'm somewhat surprised that the behaviour of toolchain builds would differ between autoland and mach try auto. I'll admit to knowing little around how they work but will add this to our "Next up" queue and investigate.

You mentioned earlier discussion, any chance there's a link?

(In reply to Andrew Halberstadt [:ahal] from comment #1)

I'm somewhat surprised that the behaviour of toolchain builds would differ between autoland and mach try auto. I'll admit to knowing little around how they work but will add this to our "Next up" queue and investigate.

It's not "toolchain builds" in the sense of "build gcc/clang/etc." but the "base toolchain builds", i.e. build Firefox with the minimum version of the toolchains that we support.

Oh I think I see what you mean, so the linked bug disabled them on autoland except for backstop pushes (which would also disable them for |mach try auto|).

If I understand correctly, this bug is essentially asking for bug 1636902 to be backed out, or at least for something other than running them only on backstop pushes. Maybe rather than relegating them to the backstop, we could add a SCHEDULES rule for them. Though matching every CPP file would mean they run quite frequently.
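To make the idea of a file-based rule concrete, here is a minimal sketch in plain Python. It is not the SCHEDULES mechanism itself and not code from the Firefox tree: the suffix list, the task labels, and the function name `extra_tasks_for` are all assumptions chosen for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical values: neither the suffix list nor the task labels
# are taken from the actual tree configuration.
CPP_SUFFIXES = {".cpp", ".cc", ".c", ".h", ".hpp", ".mm"}
BASE_TOOLCHAIN_TASKS = [
    "base-toolchains-gcc/debug",
    "base-toolchains-clang/debug",
]


def extra_tasks_for(changed_files):
    """Return the base toolchain tasks if any changed file looks like C/C++.

    `changed_files` is an iterable of repository-relative paths.
    """
    touches_cpp = any(
        PurePosixPath(path).suffix.lower() in CPP_SUFFIXES
        for path in changed_files
    )
    # Return a copy so callers cannot mutate the module-level list.
    return list(BASE_TOOLCHAIN_TASKS) if touches_cpp else []
```

A rule like this fires on almost every C++ patch, which is exactly the frequency concern raised above.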

In the description of that bug, Marco claims that base toolchain builds only uniquely fail ~0.2 percent of the time (in other words, when they fail, another build fails as well 99.8% of the time). And here you said that you've been hit by this 3 times already.

So either there was a mistake in the data, or the stuff you tend to modify makes it way more likely for you to be bitten by those 0.2% of cases. Let's see what Marco has to say, maybe we can come up with a better compromise.
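For reference, the "uniquely fail" figure can be read as the following computation. This is a sketch of one interpretation only; how the number in bug 1636902 was actually derived is not described in this thread. The data shape (one set of failed build names per push) and the function name are assumptions.

```python
def unique_failure_rate(pushes, task):
    """Fraction of `task` failures where no other build failed on the same push.

    `pushes` is a list of sets; each set holds the names of the builds
    that failed on one push (hypothetical data shape).
    """
    failures = [failed for failed in pushes if task in failed]
    if not failures:
        return 0.0
    # A failure is "unique" when the task is the only failed build.
    unique = sum(1 for failed in failures if failed == {task})
    return unique / len(failures)
```

Under this reading, a rate of 0.002 computed over all pushes says nothing about how the rate varies per developer or per directory, which would be consistent with one person hitting the rare case repeatedly.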

Flags: needinfo?(mcastelluccio)

(In reply to Andrew Halberstadt [:ahal] from comment #1)

I'm somewhat surprised that the behaviour of toolchain builds would differ between autoland and mach try auto. I'll admit to knowing little around how they work but will add this to our "Next up" queue and investigate.

You mentioned earlier discussion, any chance there's a link?

I am not sure, but this might have been only in a private mail to Marco. I can look it up if that's helpful for the discussion here. But maybe my reference to that discussion was just misleading.

As a first step, I think we should try adding the builds to the list of tasks the selector can choose. I'll do that and then we can re-evaluate.

Flags: needinfo?(mcastelluccio)

(In reply to Marco Castelluccio [:marco] from comment #5)

As a first step, I think we should try adding the builds to the list of tasks the selector can choose. I'll do that and then we can re-evaluate.

Are you doing that on this bug, or is there another one for reference?

Flags: needinfo?(mcastelluccio)

(In reply to Simon Giesecke [:sg] [he/him] from comment #6)

(In reply to Marco Castelluccio [:marco] from comment #5)

As a first step, I think we should try adding the builds to the list of tasks the selector can choose. I'll do that and then we can re-evaluate.

Are you doing that on this bug, or is there another one for reference?

Yes, I'm going to track it here.

Assignee: nobody → mcastelluccio
Status: NEW → ASSIGNED
Flags: needinfo?(mcastelluccio)
Priority: -- → P1
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Unfortunately, the situation has not really improved (yet). Not sure if the bugbug release mentioned in #8 has been done in the meantime?

Specifically, I recently had a case where neither mach try auto nor autoland ran the base toolchain builds, and the patch was backed out only after it had reached central. See here: https://bugzilla.mozilla.org/show_bug.cgi?id=1659674#c5

Similarly, on the mach try auto run https://treeherder.mozilla.org/#/jobs?repo=try&revision=c402672e23aafc9278277040572ca9f239139ae6, no base toolchain build ran, eventually leading to a backout as well: https://bugzilla.mozilla.org/show_bug.cgi?id=1658324#c3

Status: RESOLVED → REOPENED
Flags: needinfo?(mcastelluccio)
Resolution: FIXED → ---
See Also: → 1659674

bugbug did select the base toolchain builds here, but they didn't run anyway.
Probably a bug in the m-c optimization code.

Flags: needinfo?(mcastelluccio)
Summary: mach try auto must schedule base toolchain builds → mach try auto should be able to schedule base toolchain builds

While I have seen the builds being run in some cases now, it still doesn't happen consistently. This is a recent try push that failed to run any gcc base toolchain build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e0f2c9451818f687134b7325840e5257bb677b8f

Blocks: 1668753

This try push yesterday failed again to schedule any base toolchain build: https://treeherder.mozilla.org/jobs?repo=try&revision=7f320dfa91c291e587762daec04067b78e8b4feb&selectedTaskRun=VoaF8HaGSl68k4bAdJpNTQ.0 and a backout resulted.

See Also: → 1678130

This try push didn't schedule a base toolchain build: https://treeherder.mozilla.org/jobs?repo=try&revision=589d03efaa83337c8f0e6352f7270999414ba419, resulting in a backout.

See Also: → 1730939
Severity: -- → S3
Priority: P1 → P3
Assignee: mcastelluccio → nobody
Product: Firefox Build System → Developer Infrastructure