Open Bug 1664462 Opened 5 years ago Updated 5 years ago

Don't run source-test-python-mozbuild-* (mbu) on all platforms

Categories

(Firefox Build System :: General, enhancement, P5)

Tracking

(Not tracked)

People

(Reporter: padenot, Unassigned)

References

(Blocks 1 open bug)

Details

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f9cb061516043490fb87e68661f6bfa463dce31 shows that these jobs run on many platforms and account for a large share of the push's overall CPU time.

Product: Firefox → Firefox Build System

As worded -- "Don't run source-test-python-mozbuild-* (mbu) on all platforms" -- this bug is WONTFIX, imo. The mozbuild suite tests fundamental functionality that, in my (admittedly anecdotal) experience, 1) when broken, strongly suggests other things are probably broken as well, and 2) quite frequently breaks on only one platform without affecting the others. Tests meeting these two criteria are exactly the tests that SHOULD run on all platforms for a large set of pushes, if not all of them.

These tests weren't run on your push arbitrarily; they were run because your push touches modules/libpref/init, which triggers them -- see https://searchfox.org/mozilla-central/rev/eb9d5c97927aea75f0c8e38bbc5b5d288099e687/taskcluster/ci/source-test/python.yml#372. (Full disclosure, I added this code. :) )
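For context, the triggering mechanism is taskcluster's files-changed optimization: a source-test task lists path patterns, and the task is scheduled whenever a push touches a matching file. A simplified, illustrative sketch of such an entry (the task name, platforms, and exact paths here are placeholders, not the precise in-tree definition at the link above):

```yaml
# Illustrative sketch of a taskcluster source-test entry using the
# files-changed optimization; names and paths are placeholders.
mozbuild:
    description: python/mozbuild unit tests
    platform:
        - linux1804-64/opt
        - macosx1014-64/opt
        - windows10-64/opt
    when:
        files-changed:
            - 'python/mozbuild/**'
            # Pref definitions feed build-time code generation, so
            # changes here also schedule the mozbuild suite:
            - 'modules/libpref/init/**'
```

Any push touching a file under one of those patterns gets the task scheduled on every listed platform, which is why a libpref change lit up mozbuild jobs across the board.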

If we relax the actual feature request named in this bug: the mozbuild suite is indeed pretty monolithic and could maybe stand to be split up a bit, but I'm not confident about how to do that immediately without impacting the correctness of the tests or having an unexpected negative effect on cost.

Maybe it goes without saying, but the mozbuild suite takes so long to run because of a few long-tail tests; the large majority of the suite is unit tests that take very little time to run. Addressing those long-tail tests specifically (by optimizing them, removing them, or splitting them off into another suite) is a legitimate approach as well.

What about running them on Linux normally, and more rarely only on Windows and Mac?

I would point again to something I've already said:

> The mozbuild suite tests fundamental functionality that in my (admittedly anecdotal) experience... quite frequently breaks on only one platform without affecting another

If that's the case, then running the tests on all platforms is important and something that we should continue doing.

Again, this is anecdotal evidence and I don't have numbers to back this up... but then again, nobody else in this thread has brought up anything but anecdotal evidence either, so I don't know why we should take on a "cost-saving" measure on the back of someone else's anecdotal evidence over mine.

(In reply to Ricky Stewart from comment #4)

> I would point again to something I've already said:
>
> > The mozbuild suite tests fundamental functionality that in my (admittedly anecdotal) experience... quite frequently breaks on only one platform without affecting another
>
> If that's the case, then running the tests on all platforms is important and something that we should continue doing.
>
> Again, this is anecdotal evidence and I don't have numbers to back this up... but then again, nobody else in this thread has brought up anything but anecdotal evidence either, so I don't know why we should take on a "cost-saving" measure on the back of someone else's anecdotal evidence over mine.

Just to be clear, I'm not suggesting that we stop running the tests on all platforms. I'm suggesting running them more rarely, as we've done in many other cases.
In those other cases, even though we knew we were introducing the risk of noticing a regression somewhat later than landing time, we decided to accept that risk in exchange for cost savings.
In this case, the risk of running less frequently is likely lower than in some of those other cases where we decided the risk was worth it.

Anyway, I'll check what the actual risk is by counting how often these tests fail on autoland on Windows or Mac but not on Linux. We usually do that before making a decision.

Severity: -- → S3
Priority: -- → P5

According to the data I have from autoland, these jobs fail on Windows/Mac but not on Linux very infrequently (twice out of ~900 runs).
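For scale, two platform-specific failures out of roughly 900 runs is a failure rate on the order of 0.2%; a quick sanity check of that arithmetic:

```python
# Quick arithmetic check: rate of Windows/Mac-only failures on autoland.
# The counts (2 failures, ~900 runs) come from the comment above.
failures_windows_mac_only = 2
total_runs = 900

rate = failures_windows_mac_only / total_runs
print(f"{rate:.2%}")  # prints "0.22%"
```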

Yeah, I mean, I was talking about try (not autoland), like the OP was. But that's useful data to have as well. When we make these decisions do we always only look at autoland?

(In reply to Ricky Stewart from comment #7)

> Yeah, I mean, I was talking about try (not autoland), like the OP was. But that's useful data to have as well. When we make these decisions do we always only look at autoland?

It is not always representative, but so far yes.

Another reasonable approach is bug 1638395 (no manual decisions; instead rely on the same scheduling algorithm we use for test tasks).
