Closed Bug 1342392 Opened 3 years ago Closed 2 years ago

Migrate funsize to TaskCluster graphs

Categories

(Release Engineering :: General, defect, P2)

defect

Tracking

(firefox57 fixed)

RESOLVED FIXED
Tracking Status
firefox57 --- fixed

People

(Reporter: sfraser, Assigned: sfraser)

References

Details

Attachments

(3 files, 2 obsolete files)

As long as buildbot is producing nightlies, we'll need the existing funsize scheduler and workers. However, for the builds produced in taskcluster, it would be much better if funsize's work was done in the main task graph. 

This is because:
* Currently funsize is invisible unless you know it's there
* Chain of Trust is more easily human-comprehensible with everything in the task graph


How funsize currently works:
* The funsize schedulers listen for pulse messages from signing jobs
* Using the artifacts from those jobs, and querying balrog for the previous 4 builds, submit a task graph for the partial generation
* Build partials
* Sign the results
* Update Balrog

The new taskcluster work allows us to sign things and update balrog, so those steps are now redundant. 


Steps:

* Make the funsize worker use the new signing worker, beetmover and balrogworker so that all components are using the same tools
* Add an equivalent to the funsize scheduler's logic to the decision task
  - This will need to query balrog


Questions:
Do we insert the partial generation before the signing steps? It doesn't make much sense to generate and sign a complete mar, then generate partials and sign the partials and the complete again. We could turn off the second complete mar signing, but then they might end up in balrog at different times.
Assignee: nobody → sfraser
Currently working on the decision task to find out how to add a new step
> Questions:
> Do we insert the partial generation before the signing steps? It doesn't
> make much sense to generate and sign a complete mar, then generate partials
> and sign the partials and the complete again. We could turn off the second
> complete mar signing, but then they might end up in balrog at different
> times.

I think it makes sense to sign and publish the complete/partials at the same time. Chain-of-trust can ensure that we're consuming trusted (but unsigned) complete mars for partial generation.
Done so far:
Partials task inserted between l10n repacks and signing for nightlies

Partials not generated for Android, so signing jobs there depend on the l10n repacks, and for everything else depend on the partial tasks.


Still to-do, mostly for my own notes:

Passing the artifacts
MOZ_BUILD_DATE gives us this current buildID.
balrog has "platforms": { "Linux_x86-gcc3": { "locales": { "ast": { "buildID": "20170228082425",
So we can generate expected Firefox-mozilla-central-54.0a1-linux-ast-20170227203520-20170228082425.partial.mar style filenames

transforms/nightly_l10n_partials.py needs to be told the 'from' MARs, how to fit that in with the upstream_artifacts already listed? Extra fields in the payload?

transforms_nightly_l10n_signing.py needs to be told the completes and partial.mar files in upstream_artifacts.

-

balrogworker and the partials - the 'from' field. Anything else?  Examine funsize's balrog worker to spot differences.

-

Funsize's partial generation doesn't need any scopes, looking at tasks/funsize.yml transforms/partials.py#12 remove get_signing_cert_scope - only needed for signing tasks
Remove uneeded from transforms/partials.py#121

-
 transforms/partials.py#81 rename signing_format_scope transforms/partials.py#90 change default symbol

-

Re-insert dependency kind check into task/signing.py

Update docstring in get_relevant_tasks in task/signing.py

-
Add routes to partials task definition. Which of these are still needed? 
      routes:
        - tc-treeherder-stage.{{ branch }}.{{ revision_hash }}
        - tc-treeherder.{{ branch }}.{{ revision_hash }}
        - index.funsize.v1.{{ branch }}.revision.{{ platform }}.{{ revision }}.{{ update_number }}.{{ chunk_name }}{% if subchunk %}_{{ subchunk }}{% endif %}.updates
        - index.funsize.v1.{{ branch }}.latest.{{ platform }}.{{ update_number }}.{{ chunk_name }}{% if subchunk %}_{{ subchunk }}{% endif %}.updates

-

Update the task reference from 'unsigned-repack' in transforms/nightly_l10n_signing.py#47, if it's not an android job. Check docs for significance of the string contents

-

funsize scriptworker! - need to work out how much of the existing one we can recycle.

-

Partials for en-US between build and build-signing, if nightly is set.
Hendrik's tests, would make sense if those were done on signed versions, but detecting which ones are relevant and so should generate pulse messages may be difficult once the partials generation is a few steps back
Priority: -- → P2
Attached patch funsize-in-tree-diff1.diff (obsolete) — Splinter Review
Work in progress, storing update to share the direction things are heading.

* A new 'mach' subcommand to generate the necessary information from balrog
* The decision task uses this same code to cache this data for later use in the taskgraph.
* paramters.yml can specify this file for offline use and recreation of older graphs.
* partials, partials-signing and beetmover-partials added. beetmover-partials used rather than existing beetmover-* to avoid situations where completes aren't updated if there are no partials to create.
* util.partials added for common elements

Still to do:
* How to sanely access the cache file produced in the decision task
* Adjust funsize worker to cope with upstreamArtifacts instead of a URL
* Adjust beetmover templates to cope with partial filenames & buildids from the 'extra' definition
* Balrog task
Depends on: 1394242
Depends on: 1395649
Depends on: 1396793
Depends on: 1396797
Attached patch partials2.diff (obsolete) — Splinter Review
Decision Task and mach
----------------------

* There is a new mach subcommand, `./mach partials` in the CI category, to generate the release history needed for partial generation
* The decision task fetches this release history
* The task graph generation uses this stored file, allowing for replay and customised tests


Taskgraph
---------

* Two new kinds exist, partials and partials-signing
* These have been inserted after repackage-signing.

After repackage-signing, there are now two branches in the graph:

1. repackage-signing; beetmover-repackage; balrog 
2. repackage-signing; partials; partials-signing; beetmover-partials; balrog

Path 1 already exists. Path 2 will only exist if there is a release history detailing previous releases that can be used to generate partials. If the file, or data within that file, is not present then no partials task will be generated, and so no partials-signing/beetmover-partials or balrog task.

A future improvement will be to add to the taskgraph optimisation to remove path 1 from the graph if path 2 exists for a platform/locale combination.

There's currently a conditional in the balrog kind, to give it a new label and symbol if partials are involved. This is to avoid duplicates of those in the taskgraph. We could make it a separate kind, using the same transforms, but the special case needs to be there anyway to avoid symbol duplication. Perhaps name_sanity could have a symbol_sanity counterpart?

External dependencies
---------------------

The following scriptworkers have been updated to cope with the changes:
signingscript
balrogscript
beetmoverscript

Funsize worker
--------------

This is now built each time it changes, in-tree using the docker-image kind. 

Naming
------

Yes, it's called 'partials' not 'funsize' in-tree. It's less fun, but it makes things easier to find.
Attachment #8905168 - Flags: review?(dustin)
Comment on attachment 8905168 [details] [diff] [review]
partials2.diff

Review of attachment 8905168 [details] [diff] [review]:
-----------------------------------------------------------------

Some questions and suggested changes below.  I've flagged Johan to look at the transforms, and gps for the question below about the mach commands.

::: taskcluster/docs/kinds.rst
@@ +244,5 @@
> +
> +partials
> +--------
> +Partials take the complete.mar files produced in previous steps and generate partial
> +updates between previous nightly releases and the new one.

Let's not introduce "steps" as another synonym for task/job/build :)

::: taskcluster/mach_commands.py
@@ +506,5 @@
>              sys.exit(1)
> +
> +@CommandProvider
> +class TaskClusterPartialsData(object):
> +    @Command('partials', category="ci",

I'd like to get a build peer's opinion on adding another mach command.  What's the utility of this?  It might help to refer to it in the docs at the place where it might be useful.

::: taskcluster/taskgraph/action.yml
@@ +35,4 @@
>  
>    features:
>      taskclusterProxy: true
> +    chainOfTrust: true

Why is this required?

::: taskcluster/taskgraph/decision.py
@@ +110,5 @@
>      parameters = get_decision_parameters(options)
> +
> +    release_history = populate_release_history('Firefox', parameters['project'])
> +
> +    write_artifact(parameters['release_history'], release_history)

Does this always have to happen, or just on certain branches or just when include_nightly = true?  Best to avoid querying or creating an artifact when there's no need for it.

@@ +176,5 @@
>      parameters['morph_templates'] = {}
>  
> +    # Define default build history file, to store balrog data required by
> +    # partials generation
> +    parameters['release_history'] = 'release_history.json'

How big is this data?  It might be simpler to just include it in the parameters directly. Related: is this data necessary for all builds or just those with params['nightly']?  I want to avoid the case where *all* taskgraph hacking requires two steps (download balrog data, then run taskgraph).  Ideally when this information is required it comes included in the parameters file (which the taskgraph commands will automatically download for you).  As a bonus, then there's no reason to write out an additional artifact.

In bug 1383880 I'm doing a similar thing for try_task_config -- embedding the entire content of that JSON file in the parameters.  Following that model, the call to populate_release_history would then move to get_decision_parameters.

::: taskcluster/taskgraph/partials_balrog.py
@@ +1,3 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.

What's the distinction between this file and `taskcluster/taskgraph/util/partials.py`?  Could it all live in that flie?  I've come to regret putting so much stuff in taskcluster/taskgraph when the project was young :)

@@ +13,5 @@
> +import logging
> +logger = logging.getLogger(__name__)
> +
> +
> +BALROG_API_ROOT = 'https://aus5.mozilla.org/api/v1'

I'm glad to see this is publicly available and does not require the balrogVpnProxy :)

::: taskcluster/taskgraph/transforms/beetmover_partials.py
@@ +1,5 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +"""
> +Transform the beetmover task into an actual task description.

beetmover partials task -- or is that accurate at all?

::: taskcluster/taskgraph/transforms/task.py
@@ +495,4 @@
>      'tc-L10n-Rpk': 'Localized Repackaged Repacks executed by Taskcluster',
>      'tc-BM-L10n': 'Beetmover for locales executed by Taskcluster',
>      'tc-BMR-L10n': 'Beetmover repackages for locales executed by Taskcluster',
> +    'c-Up': 'Balrog submission of updates',

What's the "c" here?  I don't see that letter in the description :)
Attachment #8905168 - Flags: review?(jlorenzo)
Attachment #8905168 - Flags: review?(gps)
Attachment #8905168 - Flags: review?(dustin)
Attachment #8905168 - Flags: review-
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> Comment on attachment 8905168 [details] [diff] [review]

> ::: taskcluster/docs/kinds.rst
> @@ +244,5 @@
> > +
> > +partials
> > +--------
> > +Partials take the complete.mar files produced in previous steps and generate partial
> > +updates between previous nightly releases and the new one.
> 
> Let's not introduce "steps" as another synonym for task/job/build :)

I'll change it
 
> ::: taskcluster/mach_commands.py
> @@ +506,5 @@
> >              sys.exit(1)
> > +
> > +@CommandProvider
> > +class TaskClusterPartialsData(object):
> > +    @Command('partials', category="ci",
> 
> I'd like to get a build peer's opinion on adding another mach command. 
> What's the utility of this?  It might help to refer to it in the docs at the
> place where it might be useful.

The short version is that it allows people to generate this artifact once, and re-use it when attempting to replay a specific event or test things without extra network connections, rather than query balrog each time and potentially get different data. They can also modify the data to experiment with different scenarios instead of changing balrog.

> ::: taskcluster/taskgraph/action.yml
> @@ +35,4 @@
> >  
> >    features:
> >      taskclusterProxy: true
> > +    chainOfTrust: true
> 
> Why is this required?

Oops, I think this is someone else's change on date, I'll filter it out.

> ::: taskcluster/taskgraph/decision.py
> @@ +110,5 @@
> >      parameters = get_decision_parameters(options)
> > +
> > +    release_history = populate_release_history('Firefox', parameters['project'])
> > +
> > +    write_artifact(parameters['release_history'], release_history)
> 
> Does this always have to happen, or just on certain branches or just when
> include_nightly = true?  Best to avoid querying or creating an artifact when
> there's no need for it.

Previous conversations have left me a bit confused as to what constitutes a nightly build, since all nightly tasks are generated in every case. If that's the authoritative switch for it, we can use that.

> @@ +176,5 @@
> >      parameters['morph_templates'] = {}
> >  
> > +    # Define default build history file, to store balrog data required by
> > +    # partials generation
> > +    parameters['release_history'] = 'release_history.json'
> 
> How big is this data? 

For central, currently 548 kilobytes. It seemed too large to put in parameters directly, or store in the environment.

> It might be simpler to just include it in the
> parameters directly. Related: is this data necessary for all builds or just
> those with params['nightly']?  I want to avoid the case where *all*
> taskgraph hacking requires two steps (download balrog data, then run
> taskgraph).  

If the file is not there, then no partials tasks will be generated, but otherwise all taskgraph commands will work as expected. It's only `./mach decision` that generates it, which I'd hoped would limit the number of times people would be unknowingly querying balrog.

I think you're right that if it can only be run on nightlies, this will improve matters.

> In bug 1383880 I'm doing a similar thing for try_task_config -- embedding
> the entire content of that JSON file in the parameters.  Following that
> model, the call to populate_release_history would then move to
> get_decision_parameters.

How big is the data there?
 
> ::: taskcluster/taskgraph/partials_balrog.py
> @@ +1,3 @@
> > +# This Source Code Form is subject to the terms of the Mozilla Public
> > +# License, v. 2.0. If a copy of the MPL was not distributed with this
> > +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> 
> What's the distinction between this file and
> `taskcluster/taskgraph/util/partials.py`?  Could it all live in that flie? 
> I've come to regret putting so much stuff in taskcluster/taskgraph when the
> project was young :)

Ah, I was following that example, in that other mach commands queried files in taskcluster/taskgraph rather than in util/. It's easily moved & combined.

> ::: taskcluster/taskgraph/transforms/beetmover_partials.py
> @@ +1,5 @@
> > +# This Source Code Form is subject to the terms of the Mozilla Public
> > +# License, v. 2.0. If a copy of the MPL was not distributed with this
> > +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> > +"""
> > +Transform the beetmover task into an actual task description.
> 
> beetmover partials task -- or is that accurate at all?

Copy/paste error, will fix.
 
> ::: taskcluster/taskgraph/transforms/task.py
> @@ +495,4 @@
> >      'tc-L10n-Rpk': 'Localized Repackaged Repacks executed by Taskcluster',
> >      'tc-BM-L10n': 'Beetmover for locales executed by Taskcluster',
> >      'tc-BMR-L10n': 'Beetmover repackages for locales executed by Taskcluster',
> > +    'c-Up': 'Balrog submission of updates',
> 
> What's the "c" here?  I don't see that letter in the description :)

Apologies, I'd meant it to be 'completes', and the 'cp-Up' to be 'completes & partials'. I've been advised to remove the 'tc-' designation if at all possible.
Hm, 548kb is pretty big, but it won't be the end of the world for a parameters file if it's only the case for nightlies.  I think the benefits of all-inputs-in-one-place outweight the costs.

Regarding the mach command, it sounds like it could be helpful indeed, especially with a bit of documentation on how to generate that parameter (maybe just mention the command in `parameters.rst`).

Regarding c-Up/cp-Up -- that makes a lot more sense.  And getting rid of tc is good :)
(In reply to Dustin J. Mitchell [:dustin] from comment #9)
> Hm, 548kb is pretty big, but it won't be the end of the world for a
> parameters file if it's only the case for nightlies.  I think the benefits
> of all-inputs-in-one-place outweight the costs.

_If_ we can only generate it for nightlies. I don't yet have a specific way of discovering that at the moment, as include_nightly only appears to be used on 'try'. 

It does feel wrong to make the parameters go from a ~600 bytes to ~480Kb, but I see the advantage - certainly quicker and more reliable than reading the artifact in again.

> Regarding the mach command, it sounds like it could be helpful indeed,
> especially with a bit of documentation on how to generate that parameter
> (maybe just mention the command in `parameters.rst`).

I've renamed the mach command `release-history` to make it reflect more accurately what it actually does.
Attached patch partials4.diffSplinter Review
* Updated the release history so that it's stored in the parameters option
* Only produce the release history when target-task-method contains 'nightly', as it seems to be the best/only way of determining the intent of the task graph.
* Renamed mach subcommand to indicate it generates a release-history, and updated it to produce out .yml that can be appended to a parameters.yml file

This does increase the size of the parameters file by about 500 kilobytes
Attachment #8899871 - Attachment is obsolete: true
Attachment #8905168 - Attachment is obsolete: true
Attachment #8905168 - Flags: review?(jlorenzo)
Attachment #8905168 - Flags: review?(gps)
Attachment #8905581 - Flags: review?(jlorenzo)
Attachment #8905581 - Flags: review?(dustin)
Attachment #8905581 - Flags: review?(gps)
Comment on attachment 8905581 [details] [diff] [review]
partials4.diff

Review of attachment 8905581 [details] [diff] [review]:
-----------------------------------------------------------------

A few general notes that don't need to be fixed here but are worth mentioning for next time: pushing to mozreview (or, soon, phabricator) allows comparison of different patch revisions, which helps with followup reviews like this.  Also, it makes microcommits a lot easier, as you can push (and re-push) a whole patch series.  In this case, microcommits might have been introducing the release-history fetching code and associated mach command; including that information in parameters to the decision task; and then adding the kinds and transforms.  A nice side-benefit to microcommits is that you can get an r+ on a few of them early on, and then focus only on the parts that still have review comments outstanding.

::: taskcluster/docs/kinds.rst
@@ +248,5 @@
> +updates between previous nightly releases and the new one.
> +
> +partials-signing
> +----------------
> +Partials-signing take the partial updates produced in Partials and sign them.

(nit) grammar seems weird here - maybe present singular "Partials-signing takes .. and signs .."? (same for the previous kind)

::: taskcluster/docs/parameters.rst
@@ +110,5 @@
> +``release_history``
> +   History of recent releases by platform and locale, used when generating
> +   partial updates for nightly releases.
> +   Suitable contents can be generated with ``mach release-history``,
> +   which will print to the console by default.

++

::: taskcluster/taskgraph/decision.py
@@ +204,5 @@
>      if options.get('target_tasks_method'):
>          parameters['target_tasks_method'] = options['target_tasks_method']
>  
> +    parameters.setdefault('release_history', dict())
> +    if 'nightly' in parameters.get('target_tasks_method', ''):

I like this, but please add a comment indicating the intent (specifically, to only do this when generating nightly taskgraphs as run from cron)
Attachment #8905581 - Flags: review?(dustin) → review+
Comment on attachment 8905581 [details] [diff] [review]
partials4.diff

Review of attachment 8905581 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not super keen on this somewhat-specific mach command in the top-level namespace (I'm pretty sensitive to new mach commands given how many there already are). But we can improve the UX in a follow-up.

::: taskcluster/mach_commands.py
@@ +506,5 @@
>              sys.exit(1)
> +
> +@CommandProvider
> +class TaskClusterPartialsData(object):
> +    @Command('release-history', category="ci",

In the scope of all mozilla-central, "release-history" could mean a lot of different things! As such, a Balrog specific "release-history" command feels out of place. I see the following options (which can be done as follow-ups because I don't want to delay this landing):

* Move the code to a Python module and use `mach python -m taskgraph.util.partials` (or similar)
* Turn "release-history" into a @SubCommand and implement `mach release-history balrog` (or similar).
Attachment #8905581 - Flags: review?(gps) → feedback+
(In reply to Gregory Szorc [:gps] from comment #13)
> Comment on attachment 8905581 [details] [diff] [review]
> 
> In the scope of all mozilla-central, "release-history" could mean a lot of
> different things! As such, a Balrog specific "release-history" command feels
> out of place. I see the following options (which can be done as follow-ups
> because I don't want to delay this landing):
> 
> * Move the code to a Python module and use `mach python -m
> taskgraph.util.partials` (or similar)
> * Turn "release-history" into a @SubCommand and implement `mach
> release-history balrog` (or similar).

Fair comment - I'd missed that aspect of naming it, and agree that it needs improving.
Comment on attachment 8905581 [details] [diff] [review]
partials4.diff

Review of attachment 8905581 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good to me overall! Thanks for the overviews.

It's nice to see a solution without a second graph! 

We may be able to optimize the time, by waiting on repackage instead of repackage-signing. I'm not 100% sure of it, I will ask Aki, to confirm.

Apart from that, I noted a few things here and there. More details below.

::: taskcluster/ci/docker-image/kind.yml
@@ +26,4 @@
>      symbol: I(agb)
>    index-task:
>      symbol: I(idx)
> +  funsize-update-generator:

Re: Naming. Should we rename the image "partial-updates-generator" or "partials-generator"?

::: taskcluster/mach_commands.py
@@ +510,5 @@
> +    @Command('release-history', category="ci",
> +             description="Query balrog for release history used by enable partials generation")
> +    @CommandArgument('-b', '--branch',
> +                     help="The project branch used in balrog, such as "
> +                          "mozilla-central, release, date")

I was a bit confused by the combination of the explanation and the examples. I wondered if this argument was about the gecko branch ("mozilla-central", "mozilla-release", "date") or the balrog channel ("nightly", "release", "nightly-date").
 
To reduce the risk of confusion, I'd s/branch/gecko_branch (or gecko-branch), from here down to get_releases()

::: taskcluster/taskgraph/decision.py
@@ +204,5 @@
>      if options.get('target_tasks_method'):
>          parameters['target_tasks_method'] = options['target_tasks_method']
>  
> +    parameters.setdefault('release_history', dict())
> +    if 'nightly' in parameters.get('target_tasks_method', ''):

In reply to:
> Does this always have to happen, or just on certain branches or just when include_nightly = true? 
 
Do we also want to generate partials per push on mozilla-beta and mozilla-release? If not, I'd suggest to add a second condition on the branch.

include_nightly is true on each push on these branches. However, we only promote very specific revisions.

::: taskcluster/taskgraph/transforms/balrog.py
@@ +66,4 @@
>  
>          attributes = copy_attributes_from_dependent_job(dep_job)
>  
> +        locale = dep_job.attributes.get('locale', 'N')

I don't think this is a good idea to set "N" as a default locale. The symbol is already defined at line 59. That may also confuse people who look at line 73 and don't see why N isn't set there too.
 
I'd suggest to rename the variable to `treeherder_job_symbol`

::: taskcluster/taskgraph/transforms/beetmover_partials.py
@@ +18,5 @@
> +transforms = TransformSequence()
> +
> +
> +def generate_upstream_artifacts(release_history, platform, locale=None):
> +    if locale == 'en-US':

The case where `locale == None` isn't handled. I guess it should be done here, right?

@@ +63,5 @@
> +        for artifact in artifact_map:
> +            extra.append({
> +                'locale': locale,
> +                'artifact_name': artifact,
> +                'buildid': artifact_map[artifact],

Nit: Additionnal look up may be avoided by `for artifact, build_id in artifact_map.iteritems()` (unless I'm missing something)

::: taskcluster/taskgraph/transforms/partials.py
@@ +22,5 @@
> +_TC_ARTIFACT_LOCATION = \
> +        'https://queue.taskcluster.net/v1/task/{task_id}/artifacts/public/build/{postfix}'
> +
> +
> +def _generate_taskcluster_prefix(task_id, postfix='', locale=None):

Nit: It might be good to expose that function in a common package. It's identical to the one defined in taskcluster/taskgraph/transforms/repackage.py

@@ +36,5 @@
> +    data = list()
> +    for filename in filenames:
> +        data.append({
> +            'type': 'file',
> +            'path': '/home/worker/artifacts/{}'.format(filename),

This path changed a few days ago in https://hg.mozilla.org/mozilla-central/rev/8cf125b4fa04. See https://dxr.mozilla.org/mozilla-central/rev/37b95547f0d27565452136d16b2df2857be840f6/taskcluster/taskgraph/transforms/repackage.py#247

On another note, windows workers don't seem to have the same path than  on macos or Linux https://dxr.mozilla.org/mozilla-central/rev/37b95547f0d27565452136d16b2df2857be840f6/taskcluster/taskgraph/transforms/repackage.py#263.

These 2 points make me wonder if we should expose generate_task_output_files(build_platform, filenames, locale=None), and call it in taskgraph/transforms/repackage.py with filenames being `(target.complete.mar,)` (and more depending on the platform)

@@ +65,5 @@
> +
> +        treeherder.setdefault('platform',
> +                              "{}/opt".format(dep_th_platform))
> +        treeherder.setdefault('kind', 'build')
> +        treeherder.setdefault('tier', 1)

Should we first land these jobs as tier 3 and bump it to tier-1 once we're sure of their stability? Same comment for partials-signing.py.

@@ +92,5 @@
> +            continue
> +
> +        signing_task = None
> +        for dependency in dependencies.keys():
> +            if 'repackage-signing' in dependency:

I might be wrong, but you might be able to wait on simply repackage. If I understand correctly, repackage-signing is about signing the containers (mars on Linux/Mac + installers on Windows). Because partials will change the content of the containers, their signatures should blow up. That's why I think waiting on repackage-signing may be unnecessary. Aki should be able to confirm (if he hadn't already).

::: taskcluster/taskgraph/transforms/partials_signing.py
@@ +70,5 @@
> +        if locale:
> +            attributes['locale'] = locale
> +            treeherder['symbol'] = 'ps({})'.format(locale)
> +
> +        platform = get_friendly_platform_name(dep_th_platform)

There are 3 different platforms in this function. I might not follow why this particular plarform has the right be simply called "platform". I know it's the friendliest one, but maybe we could rename get_friendly_platform_name() into get_balrog_platform_name()? And platform into balrog_platform?

::: taskcluster/taskgraph/transforms/task.py
@@ +510,5 @@
>      'pub': 'APK publishing',
> +    'p': 'Partial generation',
> +    'ps': 'Partials signing',
> +    'pBM': 'Beetmover for partials',
> +    'cp-Up': 'Balrog submission of updates, completes and partials',

Nice nomenclature!

::: taskcluster/taskgraph/util/partials.py
@@ +56,5 @@
> +
> +def _sanitize_platform(platform):
> +    platform = get_friendly_platform_name(platform)
> +    if platform not in BALROG_PLATFORM_MAP:
> +        return platform

What platforms are valid but not in BALROG_PLATFORM_MAP? If there's none, I think we should raise an exception here. Otherwise, I'd add a comment to list some examples

@@ +77,5 @@
> +    platform = _sanitize_platform(platform)
> +    return {k: release_history[platform][locale][k]['buildid'] for k in release_history.get(platform, {}).get(locale, {})}
> +
> +
> +def get_partials_artifacts_friendly(release_history, platform, locale, current_builid):

Nit: Unused function

@@ +102,5 @@
> +def get_releases(product, branch):
> +    """Returns a list of release names from Balrog.
> +    :param product: product name, AKA appName
> +    :param branch: branch name, e.g. mozilla-central
> +    :return: a list of release names

Nit: We could indicate the list is sorted from the most to the least recent

@@ +109,5 @@
> +    params = {
> +        "product": product,
> +        # Adding -nightly-2 (2 stands for the beginning of build ID
> +        # based on date) should filter out release and latest blobs.
> +        # This should be changed to -nightly-3 in 3000 ;)

Nice explicative comment :)

@@ +115,5 @@
> +        "names_only": True
> +    }
> +    params_str = "&".join("=".join([k, str(v)])
> +                          for k, v in params.iteritems())
> +    logger.info("Connecting to %s?%s", url, params_str)

Nit: Should we move this log to _retry_on_http_errors()? Crafting params_str sounds like too much responsibility to get_releases(), to me

@@ +141,5 @@
> +
> +        Args:
> +            product (str): capitalized product name, AKA appName, e.g. Firefox
> +            branch (str): branch name (mozilla-central)
> +            maxbuilds (int): Maximum number of historical releases to populate

Nit: maxsearch is missing

@@ +160,5 @@
> +                'platform2': {
> +                }
> +            }
> +        """
> +    last_releases = get_releases(product, branch)

Nit: I think if the function name states that releases are sorted, the loop would make total sense, at the first glance

@@ +167,5 @@
> +
> +    builds = dict()
> +    for release in last_releases[:maxsearch]:
> +        # maxbuilds in all categories, don't make any more queries
> +        full = len(builds) > 0 and all(len(builds[b][p]) >= maxbuilds for b in builds for p in builds[b])

Nit: Per the code below, I think b and p are misleading. It should be something like: `for platform in builds for locale in builds[platform]`, shouldn't it?
Attachment #8905581 - Flags: review?(jlorenzo) → review+
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #6)
> * Two new kinds exist, partials and partials-signing
> * These have been inserted after repackage-signing.
> 
>  After repackage-signing, there are now two branches in the graph:
>  
> 1. repackage-signing; beetmover-repackage; balrog 
> 2. repackage-signing; partials; partials-signing; beetmover-partials; balrog


(In reply to Johan Lorenzo [:jlorenzo] from comment #15)
> I might be wrong, but you might be able to wait on simply repackage.
> If I understand correctly, repackage-signing is about signing the
> containers (mars on Linux/Mac + installers on Windows). Because partials
> will change the content of the containers, their signatures should blow
> up. That's why I think waiting on repackage-signing may be unnecessary.
> Aki should be able to confirm (if he hadn't already).

What do you think of the order of the task, Aki? Could we make partials depend on repackage?
Flags: needinfo?(aki)
(In reply to Johan Lorenzo [:jlorenzo] from comment #15)
> 
> We may be able to optimize the time, by waiting on repackage instead of
> repackage-signing. I'm not 100% sure of it, I will ask Aki, to confirm.

Hm, I'd thought we would always want to work on signed mars. We're not modifying the contents of what's produced by repackage-signing. 
 
> Apart from that, I noted a few things here and there. More details below.
> 
> ::: taskcluster/ci/docker-image/kind.yml
> @@ +26,4 @@
> >      symbol: I(agb)
> >    index-task:
> >      symbol: I(idx)
> > +  funsize-update-generator:
> 
> Re: Naming. Should we rename the image "partial-updates-generator" or
> "partials-generator"?

Changed!
 
> ::: taskcluster/mach_commands.py
> @@ +510,5 @@
> > +    @Command('release-history', category="ci",
> > +             description="Query balrog for release history used by enable partials generation")
> > +    @CommandArgument('-b', '--branch',
> > +                     help="The project branch used in balrog, such as "
> > +                          "mozilla-central, release, date")
> 
> I was a bit confused by the combination of the explanation and the examples.
> I wondered if this argument was about the gecko branch ("mozilla-central",
> "mozilla-release", "date") or the balrog channel ("nightly", "release",
> "nightly-date").
>  
> To reduce the risk of confusion, I'd s/branch/gecko_branch (or
> gecko-branch), from here down to get_releases()

I'm not sure what the best approach here is, to make things clearer.

> ::: taskcluster/taskgraph/transforms/balrog.py
> @@ +66,4 @@
> >  
> >          attributes = copy_attributes_from_dependent_job(dep_job)
> >  
> > +        locale = dep_job.attributes.get('locale', 'N')
> 
> I don't think this is a good idea to set "N" as a default locale. The symbol
> is already defined at line 59. That may also confuse people who look at line
> 73 and don't see why N isn't set there too.
>  
> I'd suggest to rename the variable to `treeherder_job_symbol`

Changed.
 
> ::: taskcluster/taskgraph/transforms/beetmover_partials.py
> @@ +18,5 @@
> > +transforms = TransformSequence()
> > +
> > +
> > +def generate_upstream_artifacts(release_history, platform, locale=None):
> > +    if locale == 'en-US':
> 
> The case where `locale == None` isn't handled. I guess it should be done
> here, right?

Fixed! Thanks! 

> @@ +63,5 @@
> > +        for artifact in artifact_map:
> > +            extra.append({
> > +                'locale': locale,
> > +                'artifact_name': artifact,
> > +                'buildid': artifact_map[artifact],
> 
> Nit: Additionnal look up may be avoided by `for artifact, build_id in
> artifact_map.iteritems()` (unless I'm missing something)

.iteritems() is py27 only, and while that's what mach currently uses, I'd like to avoid changes later if it becomes py3 compatible. Hmm, although I do use iteritems later on.

> ::: taskcluster/taskgraph/transforms/partials.py
> @@ +22,5 @@
> > +_TC_ARTIFACT_LOCATION = \
> > +        'https://queue.taskcluster.net/v1/task/{task_id}/artifacts/public/build/{postfix}'
> > +
> > +
> > +def _generate_taskcluster_prefix(task_id, postfix='', locale=None):
> 
> Nit: It might be good to expose that function in a common package. It's
> identical to the one defined in taskcluster/taskgraph/transforms/repackage.py

I've moved it to taskcluster/taskgraph/util/taskcluster.py

> @@ +36,5 @@
> > +    data = list()
> > +    for filename in filenames:
> > +        data.append({
> > +            'type': 'file',
> > +            'path': '/home/worker/artifacts/{}'.format(filename),
> 
> This path changed a few days ago in
> https://hg.mozilla.org/mozilla-central/rev/8cf125b4fa04. See
> https://dxr.mozilla.org/mozilla-central/rev/
> 37b95547f0d27565452136d16b2df2857be840f6/taskcluster/taskgraph/transforms/
> repackage.py#247

Is this for all docker workers? The funsize docker worker has the same output path for every platform. We can update it to follow repackage's model if needed.

> On another note, windows workers don't seem to have the same path than  on
> macos or Linux
> https://dxr.mozilla.org/mozilla-central/rev/
> 37b95547f0d27565452136d16b2df2857be840f6/taskcluster/taskgraph/transforms/
> repackage.py#263.
> 
> These 2 points make me wonder if we should expose
> generate_task_output_files(build_platform, filenames, locale=None), and call
> it in taskgraph/transforms/repackage.py with filenames being
> `(target.complete.mar,)` (and more depending on the platform)

I've not modified repackage.py's artifacts

> 
> @@ +65,5 @@
> > +
> > +        treeherder.setdefault('platform',
> > +                              "{}/opt".format(dep_th_platform))
> > +        treeherder.setdefault('kind', 'build')
> > +        treeherder.setdefault('tier', 1)
> 
> Should we first land these jobs as tier 3 and bump it to tier-1 once we're
> sure of their stability? Same comment for partials-signing.py.

Sure, can do.

> @@ +92,5 @@
> > +            continue
> > +
> > +        signing_task = None
> > +        for dependency in dependencies.keys():
> > +            if 'repackage-signing' in dependency:
> 
> I might be wrong, but you might be able to wait on simply repackage. If I
> understand correctly, repackage-signing is about signing the containers
> (mars on Linux/Mac + installers on Windows). Because partials will change
> the content of the containers, their signatures should blow up. That's why I
> think waiting on repackage-signing may be unnecessary. Aki should be able to
> confirm (if he hadn't already).

partials generates new mars, and shouldn't modify the existing ones. 

> 
> ::: taskcluster/taskgraph/transforms/partials_signing.py
> @@ +70,5 @@
> > +        if locale:
> > +            attributes['locale'] = locale
> > +            treeherder['symbol'] = 'ps({})'.format(locale)
> > +
> > +        platform = get_friendly_platform_name(dep_th_platform)
> 
> There are 3 different platforms in this function. I might not follow why
> this particular plarform has the right be simply called "platform". I know
> it's the friendliest one, but maybe we could rename
> get_friendly_platform_name() into get_balrog_platform_name()? And platform
> into balrog_platform?

Done.

> ::: taskcluster/taskgraph/util/partials.py
> @@ +56,5 @@
> > +
> > +def _sanitize_platform(platform):
> > +    platform = get_friendly_platform_name(platform)
> > +    if platform not in BALROG_PLATFORM_MAP:
> > +        return platform
> 
> What platforms are valid but not in BALROG_PLATFORM_MAP? If there's none, I
> think we should raise an exception here. Otherwise, I'd add a comment to
> list some examples

I'm trying to remember, now - there were some that didn't need translating. 
 
> @@ +77,5 @@
> > +    platform = _sanitize_platform(platform)
> > +    return {k: release_history[platform][locale][k]['buildid'] for k in release_history.get(platform, {}).get(locale, {})}
> > +
> > +
> > +def get_partials_artifacts_friendly(release_history, platform, locale, current_builid):
> 
> Nit: Unused function

Fixed.

> @@ +102,5 @@
> > +def get_releases(product, branch):
> > +    """Returns a list of release names from Balrog.
> > +    :param product: product name, AKA appName
> > +    :param branch: branch name, e.g. mozilla-central
> > +    :return: a list of release names
> 
> Nit: We could indicate the list is sorted from the most to the least recent

Fixed.

> @@ +115,5 @@
> > +        "names_only": True
> > +    }
> > +    params_str = "&".join("=".join([k, str(v)])
> > +                          for k, v in params.iteritems())
> > +    logger.info("Connecting to %s?%s", url, params_str)
> 
> Nit: Should we move this log to _retry_on_http_errors()? Crafting params_str
> sounds like too much responsibility to get_releases(), to me

Fixed.

> @@ +141,5 @@
> > +
> > +        Args:
> > +            product (str): capitalized product name, AKA appName, e.g. Firefox
> > +            branch (str): branch name (mozilla-central)
> > +            maxbuilds (int): Maximum number of historical releases to populate
> 
> Nit: maxsearch is missing

Fixed.

> @@ +160,5 @@
> > +                'platform2': {
> > +                }
> > +            }
> > +        """
> > +    last_releases = get_releases(product, branch)
> 
> Nit: I think if the function name states that releases are sorted, the loop
> would make total sense, at the first glance

Fixed.

> @@ +167,5 @@
> > +
> > +    builds = dict()
> > +    for release in last_releases[:maxsearch]:
> > +        # maxbuilds in all categories, don't make any more queries
> > +        full = len(builds) > 0 and all(len(builds[b][p]) >= maxbuilds for b in builds for p in builds[b])
> 
> Nit: Per the code below, I think b and p are misleading. It should be
> something like: `for platform in builds for locale in builds[platform]`,
> shouldn't it?

Fixed.
(In reply to Johan Lorenzo [:jlorenzo] from comment #16)

> (In reply to Johan Lorenzo [:jlorenzo] from comment #15)
> > I might be wrong, but you might be able to wait on simply repackage.
> > If I understand correctly, repackage-signing is about signing the
> > containers (mars on Linux/Mac + installers on Windows). Because partials
> > will change the content of the containers, their signatures should blow
> > up. That's why I think waiting on repackage-signing may be unnecessary.
> > Aki should be able to confirm (if he hadn't already).
> 
> What do you think of the order of the task, Aki? Could we make partials
> depend on repackage?

The partials generation code checks to see whether the MAR has a valid signature, and uses it to determine which signature format to use with the partials it generates. It doesn't modify the contents of the MARs it has been given, though, so the signatures there should be intact.
(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #17)
> Is this for all docker workers? The funsize docker worker has the same
> output path for every platform. We can update it to follow repackage's model
> if needed.

You're right, I hadn't checked funsize. They both keept the old layout:
* https://dxr.mozilla.org/mozilla-central/rev/37b95547f0d27565452136d16b2df2857be840f6/taskcluster/docker/funsize-update-generator/Dockerfile#26
* https://dxr.mozilla.org/mozilla-central/rev/37b95547f0d27565452136d16b2df2857be840f6/taskcluster/docker/funsize-balrog-submitter/Dockerfile#24


> > On another note, windows workers don't seem to have the same path than  on
> > macos or Linux
> > https://dxr.mozilla.org/mozilla-central/rev/
> > 37b95547f0d27565452136d16b2df2857be840f6/taskcluster/taskgraph/transforms/
> > repackage.py#263.
> > 
> > These 2 points make me wonder if we should expose
> > generate_task_output_files(build_platform, filenames, locale=None), and call
> > it in taskgraph/transforms/repackage.py with filenames being
> > `(target.complete.mar,)` (and more depending on the platform)
> 
> I've not modified repackage.py's artifacts
> 
I'm sorry, ignore this comment, paths aren't the same anyway. I initially wanted to factorize both repackage.py's and partial.py's under the same logic.


Thanks for the explanation about the need of a signed mar!
Flags: needinfo?(aki)
Within the in-tree taskgraph code, we refer to mozilla-central, etc. as "project", not "branch".  So it should probably either be --project or --balrog-channel

Regarding .iteritems and other py3isms, those can all get mechanically fixed up as we migrate toward py3 (which is underway).  Don't lose sleep over it here.
Attached patch partials5.diffSplinter Review
Final diff, for reference
(In reply to Johan Lorenzo [:jlorenzo] from comment #16)
> (In reply to Johan Lorenzo [:jlorenzo] from comment #15)
> > I might be wrong, but you might be able to wait on simply repackage.
> > If I understand correctly, repackage-signing is about signing the
> > containers (mars on Linux/Mac + installers on Windows). Because partials
> > will change the content of the containers, their signatures should blow
> > up. That's why I think waiting on repackage-signing may be unnecessary.
> > Aki should be able to confirm (if he hadn't already).
> 
> What do you think of the order of the task, Aki? Could we make partials
> depend on repackage?

I believe you got to the right answer -- repackage-signing is required for signed mars.
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7f5ae5ff1181
Move partial update generation in-tree r=dustin,jlorenzo
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bf0bcdc7482e
Rename of docker image for partials generation CLOSED TREE
Backed out for adding .orig file and flake lint failures:

https://hg.mozilla.org/integration/mozilla-inbound/rev/0c0a373e16a6351e1309e6d0579ade47bce192b5

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=bf0bcdc7482ea84542f5500735e7f219e417c0b7&filter-resultStatus=exception&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=retry
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=130006536&repo=mozilla-inbound

TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/mach_commands.py:508:1 | expected 2 blank lines, found 1 (E302)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/partials.py:89:100 | line too long (126 > 99 characters) (E501)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/partials_signing.py:72:100 | line too long (115 > 99 characters) (E501)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/util/partials.py:6:1 | 'json' imported but unused (F401)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/util/partials.py:7:1 | 'os' imported but unused (F401)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/util/partials.py:78:100 | line too long (122 > 99 characters) (E501)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/util/partials.py:171:100 | line too long (109 > 99 characters) (E501)
Flags: needinfo?(sfraser)
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/01bf0c29331e
Move partial update generation in-tree r=dustin,jlorenzo
https://hg.mozilla.org/mozilla-central/rev/01bf0c29331e
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Pushed by kwierso@gmail.com:
https://hg.mozilla.org/mozilla-central/rev/b0e86286b173
Bump maxRunTime to 2400 to fix timeouts r=aki a=me
The original patch caused timeouts in the nightly decision task like https://treeherder.mozilla.org/logviewer.html#?job_id=130192769&repo=mozilla-central so the followup bumped the max runtime to hopefully work around that. 

With the followup, the main decision task busted likehttps://treeherder.mozilla.org/logviewer.html#?job_id=130222713&repo=mozilla-central

Both patches backed out in https://hg.mozilla.org/mozilla-central/rev/bda524beac249b64aa36016800502a34073bf35a
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looks like we need to add something like `timeout=10` to the get call in https://hg.mozilla.org/mozilla-central/rev/01bf0c29331e#l31.93 and retry on that exception.
So the maxRunTime bumping patch somehow introduced a syntax error on line 1: https://hg.mozilla.org/mozilla-central/rev/b0e86286b173#l1.5

I don't know if that caused the busted decision tasks, but it probably didn't help.
I'm currently testing the maxRunTime bumping on date, it appears to be ok there. 

The decision task that hit the run-time limit spent 29m12s doing task submission:

Tasks started at 2017-09-11T22:03:26.087452+00:00
Latest finisher was at 2017-09-11T22:32:38.246626+00:00
Total submission time: 0:29:12.159174
Task REMubS6HSjqMl4VaDG3o7g didn't finish
API call times:
count    4515.000000
mean        1.238061
std         0.304037
min         0.865413
25%         1.070677
50%         1.176302
75%         1.326842
max         5.137197

Log entries match up with the requests Response elapsed method, so I don't think recording the timing is off, there.

Experimenting with different concurrency levels in the python client doesn't appear to modify these values much, even down to single-threaded, indicating it's not the python client that's giving us ~1s call times. Python's own concurrency issues mean we can't parallelise this much further, though. With the current setup this gives us a limit of around 4500 tasks in a graph before the 30 minute maxRunTime is reached. Sadly, even with some filtering I've just added, we need an extra 792 tasks on top of the ~3900 already present, to do partials in-tree.
Flags: needinfo?(sfraser)
Depends on: 1399393
Blocks: 1399440
Blocks: 1187592
Comment on attachment 8908547 [details]
Bug 1342392 Migrate partial update generation in-tree

https://reviewboard.mozilla.org/r/180212/#review185704

In overall lgtm. There are some nits, that you may want to follow up.

::: taskcluster/docker/partial-update-generator/Makefile:2
(Diff revision 1)
>  DOCKERIO_USERNAME =$(error DOCKERIO_USERNAME should be set)
> -IMAGE_NAME = funsize-update-generator
> +IMAGE_NAME = partial-update-generator

Do we need to uplift this patch up to esr52, so we handle the rename in releasetasks without adding branch-specific logic?

::: taskcluster/taskgraph/transforms/partials.py:84
(Diff revision 1)
> +            continue
> +
> +        signing_task = None
> +        for dependency in dependencies.keys():
> +            if 'repackage-signing' in dependency:
> +                signing_task = dependency

Maybe add a `break` statement here and sort the keys in the `for` loop to make this chunk of code reproducable?

::: taskcluster/taskgraph/util/partials.py:90
(Diff revision 1)
> +    else:
> +        params_str = ''
> +    logger.info("Connecting to %s?%s", url, params_str)
> +    for _ in redo.retrier(sleeptime=5, max_sleeptime=30, attempts=10):
> +        try:
> +            req = requests.get(url, verify=verify, params=params)

Would it be better to add a small TCP timeout here? Otherwise requests may hang here without retrying.

::: taskcluster/taskgraph/util/partials.py:156
(Diff revision 1)
> +                        'buildid1': mar_url,
> +                        'buildid2': mar_url,
> +                        'buildid3': mar_url,
> +                    },
> +                    'locale2': {
> +                        'target.partial-1.mar': ('buildid1': 'mar_url'),

locale2's data structure is a bit different than lcaole1. Is it intended?
Attachment #8908547 - Flags: review?(rail) → review+
(In reply to Rail Aliiev [:rail] ⌚️ET from comment #34)
> Comment on attachment 8908547 [details]
> Bug 1342392 Migrate partial update generation in-tree
> 
> https://reviewboard.mozilla.org/r/180212/#review185704
> 
> In overall lgtm. There are some nits, that you may want to follow up.
> 
> ::: taskcluster/docker/partial-update-generator/Makefile:2
> (Diff revision 1)
> >  DOCKERIO_USERNAME =$(error DOCKERIO_USERNAME should be set)
> > -IMAGE_NAME = funsize-update-generator
> > +IMAGE_NAME = partial-update-generator
> 
> Do we need to uplift this patch up to esr52, so we handle the rename in
> releasetasks without adding branch-specific logic?

I've not yet looked at releasetasks, so I'm not sure how it handles this. I do need to talk to you about it, although, since I understand you're moving it in-tree?

> ::: taskcluster/taskgraph/transforms/partials.py:84
> (Diff revision 1)
> > +            continue
> > +
> > +        signing_task = None
> > +        for dependency in dependencies.keys():
> > +            if 'repackage-signing' in dependency:
> > +                signing_task = dependency
> 
> Maybe add a `break` statement here and sort the keys in the `for` loop to
> make this chunk of code reproducable?

There should only ever be one repackage-signing dependency, so while I can see the need if we reorganise the dependency tree, I don't think it's needed at the moment. There are other examples of this code in the task graph generation which we should consider updating if wr mskr yhid vhsnhr.

> ::: taskcluster/taskgraph/util/partials.py:90
> (Diff revision 1)
> > +    else:
> > +        params_str = ''
> > +    logger.info("Connecting to %s?%s", url, params_str)
> > +    for _ in redo.retrier(sleeptime=5, max_sleeptime=30, attempts=10):
> > +        try:
> > +            req = requests.get(url, verify=verify, params=params)
> 
> Would it be better to add a small TCP timeout here? Otherwise requests may
> hang here without retrying.

Good idea.

> ::: taskcluster/taskgraph/util/partials.py:156
> (Diff revision 1)
> > +                        'buildid1': mar_url,
> > +                        'buildid2': mar_url,
> > +                        'buildid3': mar_url,
> > +                    },
> > +                    'locale2': {
> > +                        'target.partial-1.mar': ('buildid1': 'mar_url'),
> 
> locale2's data structure is a bit different than lcaole1. Is it intended?

No, they're both meant to be dict()
Backed out for failing flake8 and py-compat jobs:

https://hg.mozilla.org/integration/autoland/rev/7dbe23e8e1ee626ff46ddfb71b26828344b60cde

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=a43387ee65d7f889aac9b2581f12a6f2512e59b6&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable

Failure log flake8: https://treeherder.mozilla.org/logviewer.html#?job_id=131717711&repo=autoland
 TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/mach_commands.py:508:1 | expected 2 blank lines, found 1 (E302)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/transforms/partials_signing.py:72:100 | line too long (115 > 99 characters) (E501)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/util/partials.py:6:1 | 'json' imported but unused (F401)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/util/partials.py:7:1 | 'os' imported but unused (F401)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/util/partials.py:78:100 | line too long (122 > 99 characters) (E501)
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/taskgraph/util/partials.py:171:100 | line too long (109 > 99 characters) (E501)

Failure log py-compat: https://treeherder.mozilla.org/logviewer.html#?job_id=131717767&repo=autoland
> 
TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/taskcluster/docker/partial-update-generator/scripts/funsize.py:1:1 | Missing from __future__ import absolute_import (require absolute_import)
Flags: needinfo?(sfraser)
Comment on attachment 8908547 [details]
Bug 1342392 Migrate partial update generation in-tree

https://reviewboard.mozilla.org/r/180212/#review185958
Comment on attachment 8908547 [details]
Bug 1342392 Migrate partial update generation in-tree

https://reviewboard.mozilla.org/r/180212/#review185974
Pushed by sfraser@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/029b8bec31c1
Migrate partial update generation in-tree r=rail
https://hg.mozilla.org/mozilla-central/rev/029b8bec31c1
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Hi! Is there a bug for turning the old jobs off? (Since it will resolve bug 1399440)
(In reply to Ed Morley [:emorley] from comment #45)
> Hi! Is there a bug for turning the old jobs off? (Since it will resolve bug
> 1399440)

There isn't a separate bug, but the old jobs won't be submitted any more once things are running in-tree, so it should fold under this one.
Flags: needinfo?(sfraser)
Ah thank you.

I still see jobs being submitted for oak - presume it just needs updating to latest mozilla-central?
(In reply to Ed Morley [:emorley] from comment #47)
> Ah thank you.
> 
> I still see jobs being submitted for oak - presume it just needs updating to
> latest mozilla-central?

Yes - I don't know who's using oak at the moment, and whether it's safe to merge in changes. I'll ask around
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.