Closed Bug 1253298 Opened 8 years ago Closed 8 years ago

TC Linux 64 Opt / PGO builds as Tier 2

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: selenamarie, Assigned: mtabara)

Attachments

(2 files)

This is a planning bug for tracking work related to getting Linux 64 Opt and PGO builds as Tier-2.
Assignee: nobody → mtabara
I put together a quick-and-dirty taskcluster pgo build so I could start looking at tests. I'm not sure it's completely "right", but it builds reliably: https://treeherder.mozilla.org/#/jobs?repo=try&revision=731455d9c8fc

:mtabara -- Is that helpful? You are welcome to inherit or scavenge my patch. Or, if you don't have work in progress here, want me to take this bug?
Flags: needinfo?(mtabara)
I only switched my focus fully to this bug last Friday; sorry I haven't posted a status update yet.

:gbrown - this is a huge help, thanks a lot! I have been working on understanding the context, as all of this is new to me, and trying to set up a local environment to play with the PGO build.

I expect to have some tangible progress in the coming days.
Thanks again for the help!
Flags: needinfo?(mtabara)
Depends on: 1253300
Throwing this back for review and potential check-in. It's basically gbrown's patch, for which I once again thank him! I only did some extra testing locally and on try to make sure it builds reliably.

The commit is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c2cadcffc99&selectedJob=20888989
It doesn't handle the test part, as that is going to be treated separately in bug 1253300.

One concern was whether the build-linux.sh change affects the non-PGO builds. I tested the patch against linux64-pgo and linux64 and it seems to be working fine.

Log for pgo build is here: https://public-artifacts.taskcluster.net/FAplNO47QZyqqWRRxsTh5g/0/public/logs/live_backing.log
Log for opt build is here: https://public-artifacts.taskcluster.net/FoTkNnAHSB-STb02Dk6GRg/0/public/logs/live_backing.log

If my understanding is right, I may need to follow up with a patch to promote the build to Tier 2, by specifying the tier param in the task yml description.
Attachment #8752966 - Flags: review?(gbrown)
Attachment #8752966 - Flags: feedback?(dustin)
Comment on attachment 8752966 [details] [diff] [review]
Enable PGO builds.

Review of attachment 8752966 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/taskcluster/scripts/builder/build-linux.sh
@@ +12,5 @@
>  
>  : MOZHARNESS_SCRIPT             ${MOZHARNESS_SCRIPT}
>  : MOZHARNESS_CONFIG             ${MOZHARNESS_CONFIG}
>  : MOZHARNESS_ACTIONS            ${MOZHARNESS_ACTIONS}
> +: MOZHARNESS_PGO                ${MOZHARNESS_PGO}

I'd like this to be a little more generic.  How about MOZHARNESS_OPTIONS, and put --enable-pgo in that variable?

::: testing/taskcluster/tasks/builds/opt_linux64_pgo.yml
@@ +25,5 @@
> +      groupName: Submitted by taskcluster
> +      machine:
> +        # see https://github.com/mozilla/treeherder/blob/master/ui/js/values.js
> +        platform: linux64
> +      symbol: B

Does this need some other symbol, so it's not confused with non-PGO builds?
Attachment #8752966 - Flags: feedback?(dustin) → feedback+
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> I'd like this to be a little more generic.  How about MOZHARNESS_OPTIONS,
> and put --enable-pgo in that variable?

+1 -- I like that idea.

> Does this need some other symbol, so it's not confused with non-PGO builds?

I think it is okay this way. "Linux x64 pgo" gets its own section on treeherder, and the existing buildbot pgo builds use "B".
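
As an illustration of the generic MOZHARNESS_OPTIONS idea discussed above, here is a minimal sketch of how a single space-separated variable could be expanded into command-line flags. It is written in Python for readability and is not the code from the patch; the function name `mozharness_flags` and the choice to accept options with or without a leading `--` are assumptions made for this example.

```python
import shlex


def mozharness_flags(env):
    """Expand a space-separated MOZHARNESS_OPTIONS value into a list of flags.

    `env` is any mapping of environment variables (for example os.environ).
    Hypothetical helper, for illustration only.
    """
    raw = env.get("MOZHARNESS_OPTIONS", "")
    flags = []
    for option in shlex.split(raw):
        # Accept both "enable-pgo" and "--enable-pgo" (an assumption of this sketch).
        flags.append(option if option.startswith("--") else "--" + option)
    return flags


# A PGO task would set the variable; a non-PGO task would leave it unset.
pgo_flags = mozharness_flags({"MOZHARNESS_OPTIONS": "--enable-pgo"})
opt_flags = mozharness_flags({})
```

With this shape, the non-PGO build gets an empty flag list and runs unchanged, which is the property the patch author tested for on try.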
Comment on attachment 8752966 [details] [diff] [review]
Enable PGO builds.

Review of attachment 8752966 [details] [diff] [review]:
-----------------------------------------------------------------

> If my understanding is right, I may need to follow-up with a patch to promote the build to Tier 2, by specifying the tier param in the task yml description.

I agree. Until we get the tests running and the build is proven, it should be tier 2. You should be able to add to opt_linux64_pgo.yml:

  extra:
    treeherder:
      tier: 2
Attachment #8752966 - Flags: review?(gbrown) → review+
Thanks gbrown and dustin for review & feedback. 
Did the refactoring and had a second try push - https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c262994f743

It's working fine.
The PGO build log is here: https://public-artifacts.taskcluster.net/TVJwt3KpRM25OvVfspvnhA/0/public/logs/live_backing.log
whilst the non-pgo build is here: https://public-artifacts.taskcluster.net/N_fOfW-RRJymPUyHXDA2aw/0/public/logs/live_backing.log

I pushed to MozReview against inbound, containing the tier change as well.
Comment on attachment 8753435 [details]
MozReview Request: Bug 1253298 - Enable TC Linux64 PGO builds as Tier 2. r=gbrown

https://reviewboard.mozilla.org/r/53272/#review50064

That looks fine to me. Thanks!
Attachment #8753435 - Flags: review?(gbrown) → review+
https://hg.mozilla.org/mozilla-central/rev/3e069aea556f
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Depends on: 1274022
Looking into this, the runtime here is about an hour faster than the runtime on buildbot. Why is that? Are we generating proper builds?

looking at this revision:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=d3d23c5640717bfb9c72db8e951c462685991854&filter-searchStr=pgo&selectedJob=28431651

we have:
bb: 159 minutes
*   3:09 minutes to setup and start |gmake -f client.mk|
* 145:36 minutes to do the build |Finished build step (success)|

tc: 89 minutes
*  00:48 seconds to setup and start |gmake -f client.mk|
*  79:03 minutes to do the build |Finished build step (success)|

So it seems the build step is the biggest difference. Given the roughly 70 minute difference in run times, it would be nice to know we are doing the same things, and to resolve any issues now rather than later.
Another quirk: this is posting data to perfherder, which now produces a bimodal graph:
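
To make the arithmetic behind these figures explicit, here is a small sketch that breaks the gap down by phase. It assumes the quoted values are in minutes:seconds form (so "145:36" means 145 minutes 36 seconds); the variable names are made up for this example.

```python
def to_seconds(stamp):
    """Convert an 'MM:SS' string (minutes may exceed 59) to seconds."""
    minutes, seconds = stamp.strip().split(":")
    return int(minutes) * 60 + int(seconds)


# Figures quoted in the comment above (bb = buildbot, tc = taskcluster).
bb = {"total_min": 159, "setup": "3:09", "build": "145:36"}
tc = {"total_min": 89, "setup": "00:48", "build": "79:03"}

total_gap_min = bb["total_min"] - tc["total_min"]
build_gap_s = to_seconds(bb["build"]) - to_seconds(tc["build"])
setup_gap_s = to_seconds(bb["setup"]) - to_seconds(tc["setup"])

print("total gap (min):", total_gap_min)
print("build gap (min, s):", divmod(build_gap_s, 60))
print("setup gap (min, s):", divmod(setup_gap_s, 60))
```

Under that reading, the build step accounts for about 66.5 of the 70 minutes, and setup for a little over 2 minutes.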
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,8555b3405a4d6fca6716db38c31dc94d1c4f8fe1,1,2%5D&zoom=1463435529229.1667,1463590048635.4167,4125.27885578822,9701.487034226882&selected=%5Bmozilla-inbound,8555b3405a4d6fca6716db38c31dc94d1c4f8fe1,31804,28048691,2%5D

That graph is why I am looking into this, but we should decide whether we want tc data posted differently from bb, or how else we deal with things like this.
(in the same context of making sure we're generating the same pgo builds as with buildbot)

Before we can go forward with promoting PGO builds to Tier-1 (bug 1274306), mshal also suggested, in a separate conversation, running the talos suite against the build and comparing the results to the buildbot PGO build results to see if they are reasonable.

I'm not sure I'm running them properly, but I triggered https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6b9d39e4a1d for that and will follow up.
oh, good idea- I have 5 linux boxes right now that I am testing for talos- they should be done in ~4 hours- most likely we could take my patch, make it hook up to pgo- then run the build as you see fit.

this is my patch:
https://hg.mozilla.org/try/rev/80cca75e6d5999ca2ba2840005092a6efcdc11a5

it is sloppy, and treats talos as unittests- my goal is to work on getting talos tests defined properly with the -t flag.
(In reply to Joel Maher (:jmaher) from comment #13)
> looking into this, the runtime here is about an hour faster than the runtime
> on buildbot.  Why is that?  are we generating proper builds?

One thing to keep in mind is that the buildbot Linux 64 pgo builds run on c1.xlarge while the taskcluster builds run on m3.2xlarge: different hardware characteristics may explain different run-times.
:jmaher: not sure I'm following. Two questions, if I may:

1) I am trying to run the Talos suite against the current TC Linux64 build and compare it to the Buildbot PGO build results to see if they make sense. It looks like I failed to do that in https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6b9d39e4a1d as Talos is being run against the Linux64 opt build. Can you guide me on how to do this the right way?

2) As for your patch: do you want me to take it, add the PGO enabling patch as well (http://hg.mozilla.org/integration/mozilla-inbound/raw-rev/3e069aea556f) and trigger a new linux64-pgo build in Try with talos fully enabled?

Thanks for all the help in this.
Flags: needinfo?(jmaher)
pushed to try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6874f3fa403c3bf10aafd5eb0ae82e30f836948

I will verify tests are scheduled and repush as needed
Flags: needinfo?(jmaher)
I see a lot of green in that Try push, good job :)

:jmaher - is there anything I could help you with on the Talos thing?
Flags: needinfo?(jmaher)
oh this took me a bit of focus time to get the data and validate it:
https://docs.google.com/spreadsheets/d/1e-8R6UGyrTJO4jX0RfDvwEi9QkU213E4l8MHNNr1_bw/edit?usp=sharing

overall we are good- there are 4 tests that are different, but they are different on opt as well in a similar fashion.  

I would say we are getting the same benefits of pgo in these taskcluster builds!
Flags: needinfo?(jmaher)
:jmaher - that's awesome work you've put in there, good job!

Is there anything else we need to measure or check before we go for Tier-1? (Other than the scheduling mechanism from https://bugzilla.mozilla.org/show_bug.cgi?id=1274310#c12)
from a perf perspective there is nothing wrong with these pgo builds.  I assume we need signing or l10n or something to make pgo tier-1.
(In reply to Joel Maher (:jmaher) from comment #23)
> from a perf perspective there is nothing wrong with these pgo builds.  I
> assume we need signing or l10n or something to make pgo tier-1.

We need to address nightlies in TC before we can do a full migration here.  

It's a big leap of faith to move PGO CI builds to TC as tier 1, but still rely on PGO nightlies generated in buildbot. That's why we started migration with debug builds.
See Also: → 1253314
Product: TaskCluster → Firefox Build System