Keep running MSVC builds in CI even if we switch to clang-cl

RESOLVED FIXED in Firefox 63

Status

enhancement
RESOLVED FIXED
11 months ago
4 months ago

People

(Reporter: dmajor, Assigned: dmajor)

Tracking

({in-triage})

3 Branch
mozilla63
Dependency tree / graph

Firefox Tracking Flags

(firefox63 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

11 months ago
We'll need to keep MSVC builds green at least until we're sure completely sure that clang-cl sticks (and potentially longer as a grace period for developers, maybe?).

Should we add a new automation config that's pegged to MSVC? Or since we already have too many configurations, should we re-use an existing one? Should the build run full tests?

Flagging a few people for opinions...
Flags: needinfo?(ted)
Flags: needinfo?(nfroyd)
Flags: needinfo?(gps)
(In reply to David Major [:dmajor] from comment #0)
> We'll need to keep MSVC builds green at least until we're sure completely
> sure that clang-cl sticks (and potentially longer as a grace period for
> developers, maybe?).

We should probably keep an MSVC build running in CI until we decide to drop support for MSVC. We haven't always been good about this sort of thing, and as someone pointed out recently we don't have a CI build that actually builds on macOS anymore, for example, but since we already have the builds working it's probably low-effort to make this happen.

> Should we add a new automation config that's pegged to MSVC? Or since we
> already have too many configurations, should we re-use an existing one?

I think a new configuration (that's a copy of one of our existing configurations) would be fine. We do have a ridiculous number of build types, but that doesn't seem to be changing anytime soon.

> Should the build run full tests?

If we're no longer going to be shipping builds to users with MSVC then I don't see any need to run tests on the build. There's definitely going to be a tricky period after we switch where we're not sure if it will stick and not having test coverage for MSVC builds will make backing out the change risky, but I don't think we have the capacity to run a full set of tests on that many build variants, so we'll just have to live with it.
Flags: needinfo?(ted)
We should run MSVC builds once we switch, yes.

We ought to be able to reuse our mozconfigs and whatnot; the taskcluster descriptions should be a straightforward copy-paste-rename job.

I think for a release cycle, maybe two, and also depending how close we are to ESR, we should run some tests on MSVC, maybe just to assure ourselves that we're not going to break things if we have to have an emergency backout.  Full set of tests?  Only every four hours or so?  Not sure.
Flags: needinfo?(nfroyd)

Comment 3

11 months ago
We should definitely still run the MSVC builds.

Tests is harder. It would be nice to have some coverage so if the worst case happens and we need to fall back to MSVC we haven't accumulated tons of technical debt that could jeopardize our shipping schedule.

This is all very similar to how Stylo was landed. And the concerns are nearly identical: it is a cost versus convenience trade-off. We could potentially throw money to run extra EC2 instances for test coverage. But we run some tests on physical machines and we don't have infinite resources there. So I think fully redundant test coverage is out of the question due to capacity concerns.

My initial opinion is to try to maintain some test coverage for MSVC, but only periodically (every few hours or so) and at Tier-2 (or maybe Tier-3). I'm pretty sure we can't run it at Tier-1 and not run it every build, as that would be against our sheriffing policy (it would put undue burden on sheriffs to untangle failures). Periodic non-tier-1 tests will keep our costs in check while still informing us of how many tests are failing.

I've CCd jmaher in case he wants to weigh in.
Flags: needinfo?(gps)
do we need 32 bit and 64 bit builds?  maybe we do this on one platform.  Do we need opt/debug/pgo?  maybe we do it on one config.  

I would think we run win10 (64 bit) pgo everywhere.  This is expensive, but it reduces the matrix quite a bit.  With SETA this turns into a lower volume on integration branches.

Another option is to cherry pick tests which we believe will be problematic- say jittest, web-platform-tests;  The rest of the tests can be run on every m-c push, and the other tests can be run on win10-pgo only on integration.

how long would we do this for?
(Assignee)

Comment 5

11 months ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #4)
> how long would we do this for?

I have a feeling that clang-cl is going to bounce several times before sticking for good. In the periods that we go back to MSVC, we could of course turn off the extra MSVC builders. But the total time from "the first time this extra build ever runs" to "the last time this extra build ever runs" might be a bit lengthy.
(Assignee)

Comment 6

11 months ago
Can anybody point me to where the "these configurations only run every four hours" code lives?
there is no real magic like that.  We use SETA built into treeherder and taskcluster that will skip tasks it determines as "low value" and only run them every 5th push or 90 minutes.

maybe others are thinking of something else.

Updated

11 months ago
Keywords: in-triage
(Assignee)

Comment 9

11 months ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> We do have some cron tasks for nightly builds, it's also used for the
> mochitest-valgrind tests:
> https://dxr.mozilla.org/mozilla-central/source/.cron.yml
> https://firefox-source-docs.mozilla.org/taskcluster/taskcluster/cron.html

Most of those are just builds (without tests) so I'm having trouble finding a model to work from.

mochitest-valgrind looks promising, but I'm puzzled by this build: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=b75acf9652937ce79a9bf02de843c100db0e5ec7&group_state=expanded&filter-tier=1&filter-tier=2&filter-tier=3&filter-searchStr=mochi&selectedJob=179670809 It has a 'Vg' cron job but I can't find its tests. Is this a good configuration to be copying from?
(Assignee)

Comment 10

11 months ago
What if we ran a tier2 MSVC build with full tests on every m-c push (and no others)? I'm seeing something in the ballpark of 4-6 pushes on most days.
(In reply to David Major [:dmajor] from comment #10)
> What if we ran a tier2 MSVC build with full tests on every m-c push (and no
> others)? I'm seeing something in the ballpark of 4-6 pushes on most days.

That's probably fine in terms of just making sure reverting isn't a nightmare. If it's not too hard, maybe doing the build for every push and only running the tests on central would probably be even better. If that's a pain then just running everything on central is not so bad.
(Assignee)

Comment 12

11 months ago
Posted patch msvcSplinter Review
Assignee: nobody → dmajor
Attachment #8984236 - Flags: review?(ted)
(Assignee)

Comment 13

11 months ago
> Created attachment 8984236 [details] [diff] [review]

I submitted this from the wrong tab and lost my description:

This patch adds tier-2 MSVC builds of {win32,win64}x{debug,opt} on mozilla-central only, with the same set of tests that the "regular" versions of those platforms currently have.
Comment on attachment 8984236 [details] [diff] [review]
msvc

Review of attachment 8984236 [details] [diff] [review]:
-----------------------------------------------------------------

::: taskcluster/taskgraph/loader/build_signing.py
@@ +9,5 @@
>  # XXX: This logic should rely in kind.yml. This hasn't been done in the original
>  # patch because it required some heavy changes in single_dep.
>  NON_NIGHTLY_LABELS_WHICH_SHOULD_SIGN_BUILDS = (
>      'build-win32/debug', 'build-win32/opt', 'build-win32/pgo',
> +    'build-win32-msvc/debug', 'build-win32-msvc/opt',

I doubt you need these bits. Does anything break if you leave them out?

::: taskcluster/taskgraph/transforms/signing.py
@@ +186,5 @@
>  
>  
> +def _generate_treeherder_symbol(is_nightly, build_platform):
> +    symbol = 'Ns' if is_nightly else 'Bs'
> +    if '-msvc' in build_platform:

This is janky but hopefully we'll be able to remove it after we ship with clang-cl.
Attachment #8984236 - Flags: review?(ted) → review+
(Assignee)

Comment 15

10 months ago
> >  NON_NIGHTLY_LABELS_WHICH_SHOULD_SIGN_BUILDS = (
> >      'build-win32/debug', 'build-win32/opt', 'build-win32/pgo',
> > +    'build-win32-msvc/debug', 'build-win32-msvc/opt',
> 
> I doubt you need these bits. Does anything break if you leave them out?

xpcshell, since it `requires-signed-builds`.

Comment 16

10 months ago
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/827519f9d032
Add forced-MSVC builds to CI to keep that platform green. r=ted

Comment 17

10 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/827519f9d032
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Should we have a PGO variant of this job running as well, given that those are the builds we actually ship? And historically, PGO green is one of the factors used to decide whether a commit is mergeable in CI or not.
Flags: needinfo?(dmajor)
(Assignee)

Comment 19

10 months ago
I think adding more configurations would be overkill, but it seems reasonable to replace the opt builds with pgo ones. Want to write that patch?
Flags: needinfo?(dmajor)
Blocks: 1474756
Keywords: in-triage
Version: Version 3 → 3 Branch
You need to log in before you can comment on or make changes to this bug.