Closed Bug 1206201 Opened 9 years ago Closed 8 years ago

Eliminate the buildbot-based spidermonkey jobs in favor of taskcluster

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(firefox49 fixed)

RESOLVED FIXED
Tracking Status
firefox49 --- fixed

People

(Reporter: sfink, Assigned: sfink)

References

Details

Attachments

(4 files)

I'm not sure what the state of the taskcluster builds is, since I'm not seeing them right now when I look at TH for mozilla-inbound. But I know they work, and to avoid confusion it'd be nice to switch over to the TC builds and turn off the buildbot ones.

Also, I don't remember if these builds are conditional on touching js/src anymore? They probably ought to be.

It would be nice if they would get automatically triggered on try when pushing with a standard trychooser line -- currently, that'd mean getting the linux64 ones when pushing with -p linux64, or the windows ones when pushing with -p win32. If we have to learn -p something_else, that's probably survivable.
(In reply to Steve Fink [:sfink, :s:] from comment #0)

> I'm not sure what the state of the taskcluster builds is, since I'm not
> seeing them right now when I look at TH for mozilla-inbound. But I know they
> work, and to avoid confusion it'd be nice to switch over to the TC builds
> and turn off the buildbot ones.

Unless someone else has come along and done this, I don't think this is the case. Currently they just run on try.
> 
> Also, I don't remember if these builds are conditional on touching js/src
> anymore? They probably ought to be.

They're not, you need to add a special sm-* flag to get them to run.
> 
> It would be nice if they would get automatically triggered on try when
> pushing with a standard trychooser line -- currently, that'd mean getting
> the linux64 ones when pushing with -p linux64, or the windows ones when
> pushing with -p win32. If we have to learn -p something_else, that's
> probably survivable.

See Bug 1179050
But looking at your comment in Bug 1164656#c35 it seems some additional logic is needed here though.
Attached patch UnenverbosifySplinter Review
When looking into implementing some logic for spidermonkey builds, I noticed that normalize_platform_list was rather verbose for what it's doing.

But actually, this review isn't for that trivial refactoring. normalize_platform_list() isn't very useful. If you actually *did* alias a build:

  linux64: /linux64.*/

or something, then this would end up returning ['/linux64.*/'], which would be treated as an unknown platform and ignored. The only way to get this to do anything sane would be:

  lx: linux64

and then -p lx would indeed run the linux64 stuff. But is that the intent?

I ran into this because I was contemplating doing

  linux64: /linux64|sm-.*/

so that 'linux64' would be aliased to all of the sm-* builds in addition to the linux64 build. But then I realized there's nothing that handles the alias patterns for builds, and the existing code that processes alias patterns will only work for tests.

Anyway, I guess what I'm asking is: is there any point in having this function at all? Or would the proper patch be something like

-     platforms = normalize_platform_list(aliases, jobs['flags']['builds'], args.platforms)
+     platforms = jobs['flags']['builds'] if args.platform == 'all' else args.platform

(pestering you about this because you reviewed bug 1143264.
Attachment #8741989 - Flags: review?(garndt)
Assignee: nobody → sphink
Status: NEW → ASSIGNED
The sm-* build jobs should only fire if js/src or js/public is touched. There's some unfortunate duplication here, but it seems better than complicating the data structures.

Note that builds previously did not allow 'when' clauses. Now they do. (More on why sm-* stuff are builds in the next patch.)
Attachment #8741990 - Flags: review?(dustin)
Alright, here's where I ended up. I left sm-* jobs as build jobs rather than moving them to generic (-j) jobs, because they make use of the debug vs opt build type stuff. And on reflection, any build-specific stuff is pretty likely to apply to these jobs, since they *are* builds. They may even sprout child test tasks before too long.

I considered a few possibilities. I could have made linux64 be an alias for /linux64|sm-.*/, but it would require some funky code changes (see the comment on the earlier patch in this bug). Also, it's pretty weird to be aliasing a build to itself + others.

If I want -p linux64 to trigger the sm-* stuff, then I could either have the linux64 platform specify the children (as is done in this patch), or I could have the children specify what they should be triggered by. I could go either way on that. This way looked pretty nice to me when I wrote it out in the yaml. And I couldn't come up with a decent name for how a child would specify the parent: "trigger-for-platform"? "triggered-by"? It would also require scanning through all builds to find anything that might be pointing into our initial set of platforms, which is a little funky and also opens up the question of whether it should be recursive or not.

But I'd be fine with either direction. I did notice that this is the opposite of tests, which point to their parents via allowed_build_tasks (though that's done through filenames of task yml files, which is interesting.)

The one nasty ugly awful part of this patch is that it ends up with something like

  linux64:
    platforms:
      - 'Linux64'
    extra-platforms:
      - sm-plain

In other words, it points out the nasty overloading of 'platform' that we're using. Personally, I think the right fix is to make that

  linux64:
    test-platforms:
      - 'Linux64'
    extra-platforms:
      - sm-plain

and say there are "build platforms" (sometimes referred to as "platforms" for short) and "test platforms". Build platforms are things like linux64, win32, etc. Test platforms are "Linux64", "OSX 10.6", "OSX 10.7", etc. The current state is very confusing, because the code uses variables named platforms that are different from the things listed in the 'platforms' build struct field.

But maybe that's just me. (I'm also partly to blame for not kiling off this confusion in the buildbot days, when it was also annoying me.)
Attachment #8741993 - Flags: review?(dustin)
Attachment #8741990 - Attachment is obsolete: true
Attachment #8741990 - Flags: review?(dustin)
Doh! I wrote the comment above for the wrong bug. Sorry, that previous one was for a trivial refactoring patch. This is the patch I was talking about.
Attachment #8741994 - Flags: review?(garndt)
Attachment #8741993 - Attachment is obsolete: true
Attachment #8741993 - Flags: review?(dustin)
Comment on attachment 8741990 [details] [diff] [review]
Restrict sm-* builds to things that touch js/src

Dammit, I really need to fix bzexport's behavior when you use |hg split|.
Attachment #8741990 - Flags: review?(dustin)
Attachment #8741993 - Attachment is obsolete: false
Attachment #8741993 - Flags: review?(dustin)
Attachment #8741990 - Attachment is obsolete: false
Attachment #8741989 - Flags: review?(garndt) → review+
Attachment #8741994 - Flags: review?(garndt) → review+
Attachment #8741990 - Flags: review?(dustin) → review+
Comment on attachment 8741993 [details] [diff] [review]
Trigger sm-* builds with -p linux64

> and say there are "build platforms" (sometimes referred to as "platforms"
> for short) and "test platforms". Build platforms are things like linux64,
> win32, etc. Test platforms are "Linux64", "OSX 10.6", "OSX 10.7", etc. The
> current state is very confusing, because the code uses variables named
> platforms that are different from the things listed in the 'platforms' build
> struct field.

I don't understand the distinction you're drawing here.  One is lower-case and the other is mixed-case with spaces?  Or are you getting to the idea that a single build might be tested on several (versions of) operating systems, e.g., the OS X build is tested on 10.6, 10.7 and 10.10 (or whatever)?

At any rate, I don't quite see how that relates to spidermonkey, and I don't really understand what "extra-platforms" means -- why couldn't I just merge that with "platforms"?  Why are the things listed there not platforms (even if spidermonkey is a platform, is sm-compacting a platform?)?

Would it make more sense to think of SM as build types?  So on Linux we have an opt build, a debug build, and an SM build, the latter invoking tests within the build process and thus not having separate test tasks?

I'm not horribly opposed to rubber-stamping this patch, simply because it will be easy enough to replicate in the backward-compatible part of bug 1258497, and it's pretty clear this should become its own "kind" soon after.

So I'd like to learn a bit more based on the questions above before marking r- or r+
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> Comment on attachment 8741993 [details] [diff] [review]
> Trigger sm-* builds with -p linux64
> 
> > and say there are "build platforms" (sometimes referred to as "platforms"
> > for short) and "test platforms". Build platforms are things like linux64,
> > win32, etc. Test platforms are "Linux64", "OSX 10.6", "OSX 10.7", etc. The
> > current state is very confusing, because the code uses variables named
> > platforms that are different from the things listed in the 'platforms' build
> > struct field.
> 
> I don't understand the distinction you're drawing here.  One is lower-case
> and the other is mixed-case with spaces?  Or are you getting to the idea
> that a single build might be tested on several (versions of) operating
> systems, e.g., the OS X build is tested on 10.6, 10.7 and 10.10 (or
> whatever)?

That is indeed the origin of the distinction. But you made me look more closely at the code, and... well, it's weirdness all the way down.

We tie together a build job to its test jobs through the build job's 'platforms' field. So take for example this build job definition from base_jobs.yml:

  linux64:
    platforms:
      - Linux64
    types:
      opt:
        task: tasks/builds/opt_linux64.yml
      debug:
        task: tasks/builds/dbg_linux64.yml

There are two things here, 'linux64' and 'Linux64'. In the code, 'linux64' is usually held in a variable named 'platform' (or 'platforms'), and is what is given with the -p/--platform try syntax. 'Linux64' is passed into extract_tests_from_platform as 'build_platform'. If you do 'try: -p all -t all[Linux64]', then you would build on all platforms, then the platforms with the uppercase 'Linux64' thing in their lists would trigger dependent test jobs.

Unless I'm very mistaken, I don't think that's what the buildbot try parser would do. It would associate what I'm calling a "test platform" with each test. Examples of these are the strings

  'Rev3 MacOSX Snow Leopard 10.6.2'
  'Rev3 MacOSX Leopard 10.5.8'
  'Rev3 Fedora 12x64'

When processing eg '-t all[x64]', it would match the substring "x64" against all of these test platforms. (The actual code referred to them as "pretty names", and build platforms had their own pretty names. Well, builds had a single pretty name, and tests had a list of pretty names.) It just so happens that 64-bit Linux always included the substring "x64", and neither 64-bit OSX nor Windows ever did.

Thus arises my confusion. In the buildbot world, the test restrictions applied to the tests' pretty names (which I prefer to call test platforms). In taskcluster, it seems those substrings are matched against *builds'* "pretty name" equivalent, namely the list of 'platforms' in the above YAML snippet.

I'm a little skeptical of my reading, though, since it seems to me that taskcluster would not understand '-t all[x64]' at all, since it expects the restrictions to only describe full strings not substrings. But maybe that is correct -- given that we only have linux64 taskcluster jobs, I suspect it looks up the "x64" string and finds it is not a known name, and throws it out. You get all Linux64 stuff instead. And yet, it doesn't matter, because the buildbot side still strips out the Windows and OSX builds, so everything appears to be working. Does that sound plausible?

> At any rate, I don't quite see how that relates to spidermonkey, and I don't
> really understand what "extra-platforms" means -- why couldn't I just merge
> that with "platforms"?  Why are the things listed there not platforms (even
> if spidermonkey is a platform, is sm-compacting a platform?)?

Ok, all of the above is just unfortunate cruft that shouldn't be particularly relevant in the new world. But here it starts to matter.

My only concern was that if you see:

  linux64:
    platforms:
      - Linux64
    extra-platforms:
      - sm-compacting

you will wonder "uh, what's the difference between 'platforms' and 'extra-platforms'?" Which, indeed, is exactly what you did.

So partly my confusion is because I only knew buildbot's treatment of try syntax, and I assumed the TC code faithfully matched its insanity. In that case, "Linux64" would purely be a string for matching tests against, and so the distinction between "linux64" and "Linux64" is real despite them being both called 'platform'. I wanted them called different things, and since I (incorrectly) thought "Linux64" was only for purposes of matching tests against, I thought it should be called "test-platform" (as I've always thought buildbot's test "pretty names" should be called).

So never mind that. Let me just try to get us to a point of shared understanding.

sm-compacting and sm-plain indeed are platforms currently. They have opt and/or debug 'types', just like the existing platforms, and each type has its own script. "try: -b d -p sm-plain" uses tasks/builds/sm_plaindebug.yml and "try: -b o -p sm-plain" uses tasks/builds/sm_plain.yml.

They can't really be tests, since they're not downstream of any builds. I originally intended to make them generic jobs, but given the debug vs opt distinction, that seemed weird. But maybe it's a lesser evil?

Anyway, if you accept that they are builds, then I need some way to make '-p linux64' trigger all of the sm-* platforms. Which is sort of a dependency between platforms (linux64 implies sm-compacting), but it's not a dependency as in one triggers the other, it's more like an alias. That is what is being expressed with 'extra-platforms': "if you request linux64, then automatically add in sm-compacting etc." It could be called 'extra-builds', if that's better? But adding them into the existing 'platforms' list is pretty funky, since that's not what that currently means. For one, that list is only consulted for "-t all[...]" pushes, and these aren't even tests.

> Would it make more sense to think of SM as build types?  So on Linux we have
> an opt build, a debug build, and an SM build, the latter invoking tests
> within the build process and thus not having separate test tasks?

It could, though then it would need to be opt/debug/SM-opt/SM-debug. And then would you choose them with the -b try syntax? That would require even more retraining, though it would at least be more conceptually clean/consistent.

> I'm not horribly opposed to rubber-stamping this patch, simply because it
> will be easy enough to replicate in the backward-compatible part of bug
> 1258497, and it's pretty clear this should become its own "kind" soon after.

I don't know what kinds are. And reading that bug, I see that YAML files are going away, so... uh... dang, I have no clue where that leaves us.

Ok, what I'd like:

 - the sm-* jobs are restricted by what files are touched
 - -b d -p linux64 runs the debug versions of the sm-* jobs
 - -b o -p linux64 runs the opt versions of the sm-* jobs
 - there is a syntax to request specific sm-* jobs, eg -p sm-compacting or -j sm-compacting

What's the least painful way of getting there?
Oops, meant to ni? dustin for that last.
Flags: needinfo?(dustin)
OK, I think this is "least painful" in the sense of least likely to bitrot the work I'm doing.  So, land it as-is.  I may come back later, when the taskgraph stuff is in place, and talk to you about re-implementing it :)

Thanks for the background, too - that really helps guide the rest of what I'm working on.
Flags: needinfo?(dustin)
Attachment #8741993 - Flags: review?(dustin) → review+
Depends on: 1267948
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: