Closed Bug 1284602 Opened 8 years ago Closed 8 years ago

Linux PGO TC builds should publish to 'linux64-pgo'

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla52

People

(Reporter: mshal, Assigned: mshal)

References

Details

Attachments

(1 file)

whimboo noticed that TC Linux PGO builds are using a different gecko.v2 index than the buildbot version (https://bugzilla.mozilla.org/show_bug.cgi?id=1276352#c18). We should update TC to use the same name. They are currently marked as tier-2 in Taskcluster, so the rank will be 0 until they are bumped to tier-1.
Comment on attachment 8768185 [details]
Bug 1284602 - Linux PGO Builds should publish to 'linux64-pgo';

https://reviewboard.mozilla.org/r/62498/#review59288

::: taskcluster/ci/legacy/tasks/builds/opt_linux64_pgo.yml:5
(Diff revision 1)
>  $inherits:
>    from: 'tasks/builds/base_linux64.yml'
>    variables:
> -    build_name: 'linux64-pgo'
> -    build_type: 'opt'
> +    build_name: 'linux64'
> +    build_type: 'pgo'

In bug 1281004, I've been wrestling with the difference between linux64/debug and linux64-pgo/opt (which IMHO should be linux64/pgo).  I'd like to make things more consistent, but changing things seems to be fraught with risks.

In this particular case, it doesn't look like you're changing the routes (on lines 12 and 13) at all, but are changing whatever uses the variables.

<p>How would you feel about holding on this issue until after bug 1281004 lands, and possibly the followup that refactors builds, and then taking a comprehensive approach to bringing sanity to both build and test platforms, including ramifications for task indices, treeherder, and try syntax?</p>
Attachment #8768185 - Flags: review?(dustin)
The buildbot routes on those lines will go away in bug 1279221, so I'm not concerned about keeping them up-to-date. They are currently only used by artifact builds to download artifacts, and we plan to change that to use gecko.v2. The gecko.v2 routes use the build_name and build_type parameters to generate the routes from the routes.json file, so changing these parameters affects those routes.

I would prefer to land it sooner rather than later, but if you're close to landing bug 1281004 I suppose it doesn't matter too much to me. I believe whimboo was planning to make use of the correct PGO routes, so we should check to see how this impacts him.
Flags: needinfo?(hskupin)
I actually don't have to query the index for pgo builds. I have only seen this while checking available index entries for Firefox. So really no impact for me.
Flags: needinfo?(hskupin)
Ahh, my mistake. No rush then! I can wait until after the refactoring if that's easier for you dustin.
(In reply to Michael Shal [:mshal] from comment #5)
> Ahh, my mistake. No rush then! I can wait until after the refactoring if
> that's easier for you dustin.

mshal: are we still waiting on dustin's work here?
Flags: needinfo?(mshal)
It looks like bug 1291473 just landed - I'll see if I can get an updated fix for this.
Depends on: 1291473
Flags: needinfo?(mshal)
Actually, bug 1286075 is the main piece we need here, which hasn't landed yet. I can fix this once it does.
Depends on: 1286075
:mshal, is there still a dependency on the in-tree config rewrite?
Flags: needinfo?(mshal)
dustin, should we just land the patch on this bug for now? Or will that interfere with bug 1286075?
Flags: needinfo?(mshal) → needinfo?(dustin)
If you do land it, this will be the only place where pgo is a build_type, rather than part of the build_platform.  Typically when finding such inconsistencies in bug 1286075 I've reverted things to make them consistent, with the option to change them to be both consistent and sensible later.  So can't promise I won't do the same here :)

I don't think you should make this change without changing everywhere that this occurs.  I very much think someone (you?) should do that, but that bigger fix (bug 1286086) will definitely conflict with bug 1286075.
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #11)
> I don't think you should make this change without changing everywhere that
> this occurs.  I very much think someone (you?) should do that, but that
> bigger fix (bug 1286086) will definitely conflict with bug 1286075.

Can you clarify where else this needs to be changed? The linux64-pgo is the only pgo in the build_name that I see:

./ci/legacy/tasks/builds/opt_linux64_pgo.yml:    build_name: 'linux64-pgo'
Flags: needinfo?(dustin)
linux64-asan works in a similar way.  It just seems like given comment 4 this is WONTFIX since it's just churn without actually solving bug 1286086.  But if you want to land it, it probably will only take me a few minutes to forward port it, so no big deal.
Flags: needinfo?(dustin)
I don't think this is a WONTFIX - we already have clients like mozregression that pull from 'linux64-pgo'. When the Taskcluster PGO builds are promoted to tier-1, mozregression will stop finding new builds, so we'll have the same problems that we did with 'dbg' vs 'debug'.
Attachment #8768185 - Flags: review?(dustin)
OK, we are talking about two slightly different things, I think.  You're talking about the generated gecko.v2 route, with the format

  index.gecko.v2.{project}.latest.{product}.{job-name-gecko-v2}

whereas I'm talking about the build platform and build type.  Currently we do not use "asan" or "pgo" as a build type anywhere -- it's always opt or debug.  Bug 1286086 will change that, but requires a comprehensive approach.

But the job-name-gecko-v2 is pretty much free-form text that follows no clear pattern with respect to build platform/type, so you're welcome to change that.  The task transforms have a way to specify different job names for all three route formats (buildbot, gecko.v1, gecko.v2).
Now that I look more deeply, it's free-form for buildbot and gecko.v1, but based on the build_type and build_name variables for gecko.v2:

   "{index}.gecko.v2.{project}.pushdate.{year}.{month}.{day}.{pushdate}.{build_product}.{build_name}-{build_type}",

So I guess the issue is that the *Buildbot-originated* gecko.v2 routes are wrong, as they do not include the build type?
Maybe this will make more sense after bug 1286075 lands?
(In reply to Dustin J. Mitchell [:dustin] from comment #16)
> Now that I look more deeply, it's free-form for buildbot and gecko.v1, but
> based on the build_type and build_name variables for gecko.v2:
> 
>   
> "{index}.gecko.v2.{project}.pushdate.{year}.{month}.{day}.{pushdate}.
> {build_product}.{build_name}-{build_type}",
> 
> So I guess the issue is that the *Buildbot-originated* gecko.v2 routes are
> wrong, as they do not include the build type?

From what I can tell, the original buildbot configs didn't distinguish between a "build_name" and a "build_type", it just had a platform name. This largely carried over to the mozharness scripts, but we've since added a separation between build_name & build_type in mozharness in order for both mozharness and Taskcluster to share routes.json (so mozharness back-inherited the separation from Taskcluster in a way). In mozharness, "pgo" is valid for a build_type, so the full platform name of {build_name}-{build_type} becomes "linux64-pgo", which matches what buildbot originally called it.

I don't think it's right or wrong to have the build_name/build_type split, or to have "pgo" be a build_type or not. But historically the Linux64 PGO build has been called "linux64-pgo", so that's where buildbot+mozharness currently index the builds, and Taskcluster needs to be consistent with that before it can become tier-1.

> Maybe this will make more sense after bug 1286075 lands?

It doesn't matter much to me - maybe coop or jmaher would like to see it fixed beforehand? Are you aware of something else that breaks by making the build_type something other than opt or debug?
I think it will cause the builds to go to the wrong place in treeherder, or perhaps not display at all, since they are currently represented as

kind: linux64-pgo
collections:
  opt: true

and this change would mean treeherder would get

kind: linux64
collections:
  pgo: true

I do want to get to that state eventually, just not yet.  So I think the best thing to do is to keep the treeherder stuff as it is, and just change this:

https://hg.mozilla.org/users/dmitchell_mozilla.com/mozilla-central/rev/27ede2e1a52b#l6.31
+    index:
+        product: firefox
+        job-name: linux64-pgo-opt

to

+    index:
+        product: firefox
+        job-name: linux64-pgo

to match Buildbot.
Is {job-name} going to replace {build_name}-{build_type} in routes.json? I'm fine with that, but we'll need to update mozharness as well for the remaining buildbot jobs to work.
I'm not chaging routes.json at all -- it's just there for mozharness's benefit, and in this case we want to make the taskgraph generate the routes that mozharness is generating, so there's no need to modify routes.json.
Ahh, I guess I'm confused then. How does making the job-name 'linux64-pgo' change the gecko.v2 routes to be 'linux64-pgo' instead of 'linux64-pgo-opt' if routes.json isn't also changing?
Only mozharness uses routes.json.  The task-graph generation stuff does not.
Ahh - one of the benefits of routes.json being shared between mozharness and Taskcluster is that it would help ensure that the routes were consistent between buildbot jobs and Taskcluster jobs while we are transitioning to Taskcluster. Can you address this in your design so that it remains shared? I'd prefer not to lose consistency here.
dustin and I just chatted in Vidyo. It sounds like bug 1286075 is close to being ready for review, so we'll wait until then. However, if this is the only thing blocking us getting linux64-pgo as tier-1 on Taskcluster, just let us know and we can land this in the meantime.
Attachment #8768185 - Flags: review?(dustin)
Comment on attachment 8768185 [details]
Bug 1284602 - Linux PGO Builds should publish to 'linux64-pgo';

https://reviewboard.mozilla.org/r/62498/#review79058

The patch itself looks great.  I see you're asking elsewhere about the consequences of this change for downstream users e.g., to :decoder
Attachment #8768185 - Flags: review?(dustin) → review+
Yeah, I want to fix this now before it the TC build becomes tier-1 so we can avoid a platform name change like the linux64-asan issue.
Pushed by mshal@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/50c46ebcf3c0
Linux PGO Builds should publish to 'linux64-pgo'; r=dustin
https://hg.mozilla.org/mozilla-central/rev/50c46ebcf3c0
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Product: TaskCluster → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: