TL;DR: If release promotion is in-tree, what do you all think of running all the update tests for a given locale in a single job?

Aside from bug 1389466, bug 1388431 was tricky to track down because the pattern matching had to be done meticulously. More precisely, looking at half a dozen log files wasn't enough to determine a pattern; I had to look at almost every single one of them. The reason: the tests for en-US are spread across several jobs, so I wrongly assumed that locales other than en-US were impacted too.

I think we can improve the readability of the test results by having one update verification job per locale. Here is how issues would surface, depending on the type of failure:

1. ONLY ONE LOCALE IS GENUINELY BROKEN
That's what happened in bug 1388431. In this case, only the en-US job would fail. Looking at one log file tells us that the update paths from 55.0b5 to 55.0rc2 are failing.

2. ONE NEW LOCALE WAS ADDED DURING THE CYCLE
This happens sometimes, for instance in bug 1347100. In this case, only the impacted locale would fail. Looking at one log file tells us that partials are missing.

3. ONE PLATFORM FAILS
I don't remember such a scenario, but let's imagine that signatures are broken on Linux only. In this case, every locale fails. Looking at a couple of log files, including en-US, shows us that Linux is failing.

4. ONE FILE HAS BEEN BADLY UPLOADED TO THE CDNs
This occurs from time to time, as in bug 1343173. In this case, one locale fails. Looking at one log file shows us which update paths failed. It shouldn't take us long to figure out that one file is the common denominator.

IMPLEMENTATION
Because of Buildbot, I don't think we can easily implement this in releasetasks. Nonetheless, once funsize is in-tree, we should have a way to track where the (partial and complete) MARs are. Then we should be able to do such a split once the whole of release promotion is also in-tree.

What do you all think? Do you see scenarios where such a split would be uncomfortable to work with?
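A minimal Python sketch of the proposed split, assuming a hypothetical `UpdateCheck` record per update path (none of these names come from releasetasks or funsize): the checks are grouped into one job per locale, and `triage` makes a rough first guess at the failure type from which locale jobs went red, following the four scenarios above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateCheck:
    """Hypothetical record for one update path to verify."""
    locale: str
    platform: str
    from_version: str
    to_version: str


def group_by_locale(checks):
    """Return {locale: [checks]}: each key would become one update verify job."""
    jobs = defaultdict(list)
    for check in checks:
        jobs[check.locale].append(check)
    return dict(jobs)


def triage(failed_locales, all_locales):
    """First guess at the failure type, based only on which locale jobs failed."""
    failed = set(failed_locales)
    if not failed:
        return "all green"
    if failed == set(all_locales):
        return "every locale failing: suspect a platform-wide issue (scenario 3)"
    if len(failed) == 1:
        locale = next(iter(failed))
        return f"only {locale} failing: read that one log (scenarios 1, 2 or 4)"
    return f"{len(failed)} locales failing: compare their logs"


checks = [
    UpdateCheck("en-US", "linux", "55.0b5", "55.0rc2"),
    UpdateCheck("en-US", "win32", "55.0b5", "55.0rc2"),
    UpdateCheck("fr", "linux", "55.0b5", "55.0rc2"),
]
jobs = group_by_locale(checks)
print(sorted(jobs))  # ['en-US', 'fr']
print(triage(["en-US"], jobs))
```

Each job then carries every platform and every "from" version for its locale, which is what makes a single red job readable on its own.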
https://github.com/mozilla/releasetasks
I think this is a good idea. Generally speaking, we're blocked on bug 1385996 for doing this, but technically we're actually blocked on bug 1259627, because the current TC Scheduler allows only 1000 tasks IIRC. If we add an update verification job per locale (~100), we might go beyond that limit, as the current graph already has 900+ tasks. However, I might actually take a shot at bug 1259627 in the next two weeks, which should unblock things at least from this perspective.
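The arithmetic behind that concern, as a quick sketch: the 1000-task cap, the 900+ existing tasks and the ~100 locale jobs are the approximate figures quoted above (the cap itself is an IIRC), and `headroom` is a hypothetical helper, not part of any Taskcluster API.

```python
def headroom(existing_tasks: int, new_tasks: int, limit: int = 1000) -> int:
    """Tasks still available under the cap; negative means the graph is over it."""
    return limit - (existing_tasks + new_tasks)


# ~100 per-locale update verify jobs on top of the 900+ tasks already in the graph.
print(headroom(900, 100))  # 0
print(headroom(950, 100))  # -50
```

Even at exactly 900 existing tasks there is no room left, so any growth in the graph pushes it over the limit.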
Splitting by locale would also increase the cache hit rate when downloading the "to" version installers. In the past, we hit some concurrency issues with the update server or FTP when parallelization was increased too much, but I think both Balrog and the CDNs would easily handle any update verify parallelization that we want to do. It's something to watch out for, though.
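To illustrate the cache point, a sketch that counts how many distinct "to" installers each job has to fetch under two groupings of the same nine hypothetical update paths (three locales, three "from" betas, one "to" release, Linux only). The consecutive chunking stands in for a locale-agnostic split and is an assumption, not how the current jobs are actually chunked.

```python
# Hypothetical update paths: (platform, locale, from_version, to_version).
LOCALES = ["en-US", "fr", "de"]
FROM_VERSIONS = ["55.0b3", "55.0b4", "55.0b5"]
TO_VERSION = "55.0rc2"

paths = [
    ("linux", locale, from_version, TO_VERSION)
    for from_version in FROM_VERSIONS
    for locale in LOCALES
]


def chunk(items, size):
    """Split items into consecutive chunks, ignoring locales."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_per_locale(items):
    """One job per locale, preserving the original order inside each job."""
    jobs = {}
    for item in items:
        jobs.setdefault(item[1], []).append(item)
    return list(jobs.values())


def installers_per_job(jobs):
    """Distinct 'to' installers (platform, locale, to_version) each job downloads."""
    return [len({(plat, loc, to) for plat, loc, _from, to in job}) for job in jobs]


print(installers_per_job(chunk(paths, 3)))        # [3, 3, 3]
print(installers_per_job(split_per_locale(paths)))  # [1, 1, 1]
```

Nine downloads shrink to three, and each per-locale job reuses its single installer for every "from" version. How many such jobs run at once remains the knob to watch against Balrog and the CDNs.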
I like this too. We sometimes get errors from Balrog and end up retrying a whole job, so it'll be interesting to see how it copes with more parallelization. The wall-clock win will be pretty nice if it works out. For Mac, we might end up waiting on hardware and not see the full advantage there.
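With one job per locale, a transient Balrog error only costs a retry of that locale's checks rather than a larger mixed chunk. A generic retry-with-exponential-backoff sketch of that idea follows; `with_retries` and `flaky_balrog_query` are hypothetical stand-ins, not existing release automation code, and the `sleep` parameter is injectable so the example runs without waiting.

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt, re-raising after the last try."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # ideally narrowed to the transient Balrog error type
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


calls = {"n": 0}


def flaky_balrog_query():
    """Hypothetical query that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient Balrog error")
    return "ok"


delays = []
result = with_retries(flaky_balrog_query, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```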
Thanks for the warning, Ben! While looking at a fix, I came across bug 1392262. This may be good enough to easily spot which locale failed.
Component: Release Automation: Other → Release Automation: Updates