Don't run source-test-python-mozbuild-* (mbu) on all platforms
Categories
(Firefox Build System :: General, enhancement, P5)
Tracking
(Not tracked)
People
(Reporter: padenot, Unassigned)
References
(Blocks 1 open bug)
Details
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f9cb061516043490fb87e68661f6bfa463dce31 shows that this suite is run on many platforms and accounts for a large share of the push's overall CPU time.
Comment 1•5 years ago
As worded -- "Don't run source-test-python-mozbuild-* (mbu) on all platforms" -- the bug is WONTFIX, imo. The mozbuild suite tests fundamental functionality that in my (admittedly anecdotal) experience 1) if it is broken, very strongly suggests other stuff is probably also broken, and 2) quite frequently breaks on only one platform without affecting another. Tests meeting these two criteria are exactly the tests that SHOULD be run on all platforms for a large set of pushes, if not all of them.
These tests weren't run on your push arbitrarily; they were run because your push touches modules/libpref/init, which triggers them -- see https://searchfox.org/mozilla-central/rev/eb9d5c97927aea75f0c8e38bbc5b5d288099e687/taskcluster/ci/source-test/python.yml#372. (Full disclosure, I added this code. :) )
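(For illustration, the trigger pattern in question looks roughly like the sketch below. The when/files-changed mechanism is the real taskcluster feature being referred to, but the task fields, platform names, and paths here are simplified and illustrative, not copied from python.yml:)

mozbuild:
    description: python/mozbuild unit tests
    platform:
        - linux1804-64/opt
        - macosx1014-64/opt
        - windows10-64/opt
    when:
        files-changed:
            # illustrative paths; the real list in python.yml is longer
            - 'python/mozbuild/**'
            - 'modules/libpref/init/**'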
If we relax the actual feature request named in this bug: the mozbuild suite is indeed pretty monolithic and could maybe stand to be split up a bit, but I'm not immediately sure how to do that without either hurting the correctness of the tests or having an unexpected negative effect on cost.
Comment 2•5 years ago
Maybe it goes without saying, but the mozbuild suite takes so long to run because of a few long-tail tests; the large majority of the suite is unit tests that don't take much time at all to run. Addressing those long-tail tests specifically (either by optimizing or removing them, or by splitting them off into another suite) is a legitimate approach as well.
Comment 3•5 years ago
What about running them on Linux as we do now, and only running them more rarely on Windows and Mac?
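(Hypothetically, one way to express that would be to key run-on-projects by platform in the task definition, along the lines of the sketch below. Whether this task's schema actually supports by-platform keying here is an assumption that hasn't been verified against taskgraph:)

run-on-projects:
    by-platform:
        # Linux: assumed intent is to keep running on every push, everywhere
        linux.*: ['all']
        # Windows/Mac: only run on mozilla-central pushes (assumption)
        default: ['mozilla-central']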
Comment 4•5 years ago
I would point again to something I've already said:
> The mozbuild suite tests fundamental functionality that in my (admittedly anecdotal) experience... quite frequently breaks on only one platform without affecting another
If that's the case, then running the tests on all platforms is important and something that we should continue doing.
Again, this is anecdotal evidence and I don't have numbers to back this up... but then again, nobody else in this thread has brought up anything but anecdotal evidence either, so I don't know why we should take on a "cost-saving" measure on the back of someone else's anecdotal evidence over mine.
Comment 5•5 years ago
(In reply to Ricky Stewart from comment #4)
> I would point again to something I've already said:
> > The mozbuild suite tests fundamental functionality that in my (admittedly anecdotal) experience... quite frequently breaks on only one platform without affecting another
> If that's the case, then running the tests on all platforms is important and something that we should continue doing.
> Again, this is anecdotal evidence and I don't have numbers to back this up... but then again, nobody else in this thread has brought up anything but anecdotal evidence either, so I don't know why we should take on a "cost-saving" measure on the back of someone else's anecdotal evidence over mine.
Just to be clear, I'm not suggesting we stop running the tests on all platforms. I'm suggesting we run them less frequently, as we have done in many other cases.
In those other cases, even though we knew we were introducing a risk of noticing a regression somewhat later than landing time, we decided to accept that risk in exchange for the cost savings.
In this case, the risk of running less frequently is likely lower than in some of those other cases where we decided the risk was worth it.
Anyway, I'll check what the actual risk is by counting how often these tests fail on Windows or Mac but not on Linux on autoland. We usually do that before making a decision.
Comment 6•5 years ago
According to the data I have from autoland, these jobs fail on Windows/Mac but not on Linux only very rarely (twice out of ~900 runs).
Comment 7•5 years ago
Yeah, I mean, I was talking about try (not autoland), like the OP was. But that's useful data to have as well. When we make these decisions do we always only look at autoland?
Comment 8•5 years ago
(In reply to Ricky Stewart from comment #7)
> Yeah, I mean, I was talking about try (not autoland), like the OP was. But that's useful data to have as well. When we make these decisions do we always only look at autoland?
It is not always representative, but so far yes.
Another reasonable approach is bug 1638395 (no manual decisions; rely on the same algorithm we rely on for test tasks).