Closed Bug 1405681 Opened 2 years ago Closed 2 years ago

Create partial updates to migrate eligible 56.0 win32 users to 56.0.1 win64

Categories

(Release Engineering :: Release Requests, enhancement)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlorenzo, Assigned: Callek)

References

Details

(Whiteboard: [releaseduty])

Yesterday on IRC, :catlee suggested creating partial updates for the win64 migration. This shrinks the update file by roughly a third: 56.0.1 complete updates are 40MB, whereas a special partial is around 26MB.
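As a quick check of the sizes quoted above, here is a minimal sketch. The figures are the approximate ones from this comment and the variable names are purely illustrative:

```python
# Approximate sizes quoted in this comment, in megabytes.
complete_mb = 40
partial_mb = 26

saved_mb = complete_mb - partial_mb        # bytes no longer downloaded
saved_fraction = saved_mb / complete_mb    # share of the complete MAR saved
shrink_factor = complete_mb / partial_mb   # how many times smaller the partial is

print("saved: {} MB ({:.0%}), shrink factor: {:.2f}x".format(
    saved_mb, saved_fraction, shrink_factor))
```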

We haven't done this in beta. Only complete updates were served. The regular procedure to manually generate partial updates[1] has to be tweaked to fit this particular case.

More technical details in the next comment.

[1] https://github.com/mozilla/releasewarrior/blob/cec16df0fca78e9f55e6b1acbf66f595abdf1400/how-tos/manually-generate-partials.md
I got partials generated in https://tools.taskcluster.net/groups/Ug31UPNRTt-bZGsoZEC5dA. 

Here's what changed compared to the doc linked in comment 0.

> 1. build/tools changes
Nothing to do here: 56.0.1 hasn't shipped, running the scripts doesn't change anything.

> 2. Update the balrog blobs and generate taskgraph
Generating the task graph is more complex than it should have been. First, the content of partials.tar.gz needed some tweaks. I modified `to_mar` to be:
> to_mar: "https://queue.taskcluster.net/v1/task/{{ complete_to_mar_task_ids_per_chunk(chunk) }}/artifacts/public/build/firefox-56.0.1.{{ locale }}.win64.complete.mar"

`complete_to_mar_task_ids_per_chunk()` is a hardcoded map from each chunk to the corresponding task ID of 56.0.1-build2. It works because no new locale has been introduced between 56.0 and 56.0.1. Otherwise, locales may have shifted between chunks, making some of them fail.
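A minimal sketch of what such a hardcoded lookup could look like. The real function body is not shown in this bug, so the structure below is an assumption. The three task IDs are the ones visible in the runme.py diff in a later comment, which appear to belong to this map; the remaining chunks are left out rather than guessed. `to_mar_url()` is a hypothetical helper that only mirrors the `to_mar` template line.

```python
# Assumed shape of the chunk -> task ID map (56.0.1-build2 l10n tasks).
# Only chunks 7-9 are visible in this bug; the others are omitted.
COMPLETE_TO_MAR_TASK_IDS = {
    7: "MS4S0i_LTFqaxM2T8rx1hw",
    8: "d_VhAiurTOer70d_loPBGA",
    9: "Gub9Dj_CRd2RqDy0GCjlag",
}

# Same URL layout as the `to_mar` value in the graph template.
QUEUE_ARTIFACT_URL = (
    "https://queue.taskcluster.net/v1/task/{task_id}"
    "/artifacts/public/build/firefox-56.0.1.{locale}.win64.complete.mar"
)


def complete_to_mar_task_ids_per_chunk(chunk):
    """Return the task ID that produced the complete MARs for `chunk`."""
    try:
        return COMPLETE_TO_MAR_TASK_IDS[chunk]
    except KeyError:
        raise ValueError("no task ID recorded for chunk {}".format(chunk))


def to_mar_url(chunk, locale):
    """Hypothetical helper: build the to_mar URL for one locale in a chunk."""
    return QUEUE_ARTIFACT_URL.format(
        task_id=complete_to_mar_task_ids_per_chunk(chunk), locale=locale
    )


print(to_mar_url(8, "fr"))
```

Failing loudly on an unknown chunk makes a shifted or missing chunk show up at graph generation time instead of as a broken download inside a task.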

I also updated the image pointer of funsize-generator. In retrospect, I should probably have updated every pointer to the docker image.

The following was needed in funsize_task_template.payload.env:

```
SHA384_SIGNING_CERT: "release_sha384"
SHA1_SIGNING_CERT: "release_sha1"
```

I inserted "[win32 to win64]" into the name of each task definition.

The latest version currently lives on buildbot-master85.bb.releng.scl3.mozilla.com:~cltbld/56.0-to-56.0.1-partials.



Here comes the fun part: runme.py in partials.tar.gz uses the old scheduler API! We can't use it anymore to submit the graphs. I used Rail's converter/submitter[1] to do so. I basically generated the graph on bm85 to a file, scp'd it onto my machine and ran Rail's command. I needed these fixes[2] to successfully submit a graph. I also needed to add a few more scopes to my TC client [3] (I plan to make a separate one in the future). The expected taskcluster config looks like the example added in [2].


[1] https://github.com/rail/graph2tasks/blob/master/src/graph2tasks/__init__.py
[2] https://github.com/rail/graph2tasks/pull/1
[3] https://tools.taskcluster.net/auth/clients/mozilla-ldap%2Fjlorenzo%40mozilla.com%2Ftctalker
https://tools.taskcluster.net/groups/Ug31UPNRTt-bZGsoZEC5dA probably failed because of an outdated beetmover docker image. To fix it:
1. ssh buildbot-master85.bb.releng.scl3.mozilla.com
2. Edit buildbot-master85.bb.releng.scl3.mozilla.com:~cltbld/56.0-to-56.0.1-partials/graph.yml.tmpl to point to the most recent images.
3. python runme.py > taskgraph.yml
4. scp taskgraph.yml onto your local machine
5. graph2tasks --taskcluster-config <(gpg -d tc_config.yml.gpg) taskgraph.yml
tc_config should contain https://github.com/rail/graph2tasks/pull/1/files#diff-4170577e5bb1043d449359812a8136f9. The TC client needs to have the scopes listed at: https://tools.taskcluster.net/auth/clients/mozilla-ldap%2Fjlorenzo%40mozilla.com%2Ftctalker
I followed the steps in comment 2, which left me with an edited graph.yml.tmpl [1].

I went to run it, but generating the graph, setting up graph2tasks and the token, and submitting took more than 15 minutes, so the submission failed.

I reran the generation and the submission, which succeeded, with a task group URL of [2].

===

That failed again. It looks like it's due to --product not being specified, so I adjusted the template and ran again. The diff is [3] and the graph is [4].


[1]
--- ./graph.yml.tmpl.bug1405681comment2	2017-10-04 10:10:22.885971543 -0700
+++ ./graph.yml.tmpl.bug1405681comment2.next	2017-10-04 12:12:13.120465733 -0700
@@ -133,11 +133,14 @@
         workerType: "funsize-balrog"
         provisionerId: "aws-provisioner-v1"
         scopes:
             - docker-worker:feature:balrogVPNProxy
         payload:
-            image: 'rail/funsize-balrog-submitter@sha256:27cb3235e09b1ba196dd5afd74569db88b368086d52c7482bcb98ac6d43c0dfb'
+            image: 
+                path: "public/image.tar.zst"
+                type: "task-image"
+                taskId: "JD0HwmhnR8WA_6tVdrLb4w"
             maxRunTime: 1800
             command:
                 - /runme.sh
 
             artifacts:
@@ -172,12 +175,12 @@
           retries: 5
           payload:
               maxRunTime: 7200
               image:
                   type: task-image
-                  path: public/image.tar
-                  taskId: "Iz7QknewT1K4cXsQwuEM0w"
+                  path: public/image.tar.zst
+                  taskId: "LImyHeTMQgekJ98oIj_tsA"
               command:
                   - /bin/bash
                   - -c
                   - >
                     wget -O mozharness.tar.bz2 https://hg.mozilla.org/{{ repo_path }}/archive/{{ mozharness_changeset }}.tar.bz2/testing/mozharness &&

<==============================================>

[2] https://tools.taskcluster.net/groups/UqWabdssQt2HCublexN6yQ

[3]

--- ./graph.yml.tmpl.bug1405681comment2.next	2017-10-04 12:12:13.120465733 -0700
+++ ./graph.yml.tmpl	2017-10-04 12:13:49.828035931 -0700
@@ -183,11 +183,11 @@
                   - /bin/bash
                   - -c
                   - >
                     wget -O mozharness.tar.bz2 https://hg.mozilla.org/{{ repo_path }}/archive/{{ mozharness_changeset }}.tar.bz2/testing/mozharness &&
                     mkdir mozharness && tar xvfj mozharness.tar.bz2 -C mozharness --strip-components 3 && cd mozharness &&
-                    python scripts/release/beet_mover.py --template configs/beetmover/partials.yml.tmpl --platform {{ platform }} --version {{ to_version }} --partial-version {{ from_version }} --artifact-subdir env {% for l in our_locales %}{{ "--locale {} ".format(l) }}{% endfor %} --taskid {{ stableSlugId('sign_{}_{}'.format(platform, chunk)) }} --build-num build{{ to_build_number }} --bucket {{ beetmover_candidates_bucket }} --no-refresh-antivirus
+                    python scripts/release/beet_mover.py --template configs/beetmover/partials.yml.tmpl --platform {{ platform }} --product {{ product }} --version {{ to_version }} --partial-version {{ from_version }} --artifact-subdir env {% for l in our_locales %}{{ "--locale {} ".format(l) }}{% endfor %} --taskid {{ stableSlugId('sign_{}_{}'.format(platform, chunk)) }} --build-num build{{ to_build_number }} --bucket {{ beetmover_candidates_bucket }} --no-refresh-antivirus
               env:
                   DUMMY_ENV_FOR_ENCRYPT: "fake"
               encryptedEnv:
                   - {{ encrypt_env_var(stableSlugId('beetmove_{}_{}'.format(platform, chunk)), now_ms, now_ms + 24 * 4 * 3600 * 1000, 'AWS_ACCESS_KEY_ID', beetmover_aws_access_key_id) }}
                   - {{ encrypt_env_var(stableSlugId('beetmove_{}_{}'.format(platform, chunk)), now_ms, now_ms + 24 * 4 * 3600 * 1000, 'AWS_SECRET_ACCESS_KEY', beetmover_aws_secret_access_key) }}

<======================================>

[4] https://tools.taskcluster.net/groups/PyP7XgzCQBWPYlz3sFjBbQ
Newest run... https://tools.taskcluster.net/groups/bwEHiJJBTG-WrFwd7l7-Nw

I landed https://hg.mozilla.org/releases/mozilla-release/rev/49568655692e579b5a9575014ebbf15f15d98c60 to help fix the beetmover issue seen in https://tools.taskcluster.net/groups/PyP7XgzCQBWPYlz3sFjBbQ/tasks/7byZZuZoRCm48lpf4qZG0Q/runs/1/artifacts

I also modified runme.py and the graph template because of two separate issues. First, en-US was trying to grab from the l10n task, so I gave it a pointer to its own task ID (from repackage-signing). Second, because en-US was in the locale list, the chunking didn't align with the l10n chunking, so I had the graph template append en-US onto the last chunk.

Those diffs:
--- ./runme.py.orig	2017-10-04 17:47:01.773442978 -0700
+++ runme.py	2017-10-04 18:08:13.018831307 -0700
@@ -43,9 +43,9 @@
     7: 'MS4S0i_LTFqaxM2T8rx1hw',
     8: 'd_VhAiurTOer70d_loPBGA',
     9: 'Gub9Dj_CRd2RqDy0GCjlag',
 }
-
+en_us_mar_task_id = 'fRPlsYotQZCSMfICJJJ6wA'
 
 def buildbot2bouncer(platform):
     return bouncer_platform_map.get(platform, platform)
 
@@ -62,9 +62,9 @@
     req.raise_for_status()
     # FIXME: mac is different!
     locales = [
         line.split()[0] for line in req.text.splitlines()
-        if not line.startswith("ja-JP-mac")
+        if not (line.startswith("ja-JP-mac") or line.startswith("en-US"))
     ]
     return locales
 
 
@@ -138,8 +138,9 @@
     "signing_class": "release-signing",
     "build_tools_repo_path": "build/tools",
     "buildbot2bouncer": buildbot2bouncer,
     "complete_to_mar_task_ids_per_chunk": complete_to_mar_task_ids_per_chunk,
+    "complete_en_us_mar_task_id": en_us_mar_task_id,
     "for_sure": True,
 }
 
 graph_repr = template.render(**template_vars)
--- ./graph.yml.tmpl.bug1405681comment2.next2	2017-10-04 17:55:56.123004470 -0700
+++ ./graph.yml.tmpl	2017-10-04 18:53:46.309557130 -0700
@@ -32,8 +32,11 @@
 {% set uv_deps = [] %}
 {% for platform in platforms %}
 {% for chunk in range(1, chunks + 1) %}
 {% set our_locales = chunkify(sorted(locales), chunk, chunks) %}
+{% if chunk == chunks %}
+{% set our_locales = our_locales + ["en-US"] %}
+{% endif %}
   -
     taskId: "{{ stableSlugId('generate_{}_{}'.format(platform, chunk)) }}"
     reruns: 5
     task:
@@ -54,9 +57,13 @@
 {% for locale in our_locales %}
                 -
                     locale: {{ locale }}
                     from_mar: "http://download.mozilla.org/?product={{ product }}-{{ from_version }}-complete&os={{ buildbot2bouncer(platform) }}&lang={{ locale }}"
+{% if locale == "en-US" %}
+                    to_mar: "https://queue.taskcluster.net/v1/task/{{ complete_en_us_mar_task_id }}/artifacts/public/build/target.complete.mar"
+{% else %}
                     to_mar: "https://queue.taskcluster.net/v1/task/{{ complete_to_mar_task_ids_per_chunk(chunk) }}/artifacts/public/build/firefox-56.0.1.{{ locale }}.win64.complete.mar"
+{% endif %}
                     platform: {{ platform }}
                     branch: {{ branch }}
                     previousVersion: "{{ from_version }}"
                     previousBuildNumber: {{ from_build_number }}
@@ -150,8 +157,10 @@
                    expires: "{{ never }}"
 
             env:
                 SIGNING_CERT: "release"
+                SHA384_SIGNING_CERT: "release_sha384"
+                SHA1_SIGNING_CERT: "release_sha1"
                 PARENT_TASK_ARTIFACTS_URL_PREFIX: "https://queue.taskcluster.net/v1/task/{{ stableSlugId('sign_{}_{}'.format(platform, chunk)) }}/artifacts/public/env"
                 BALROG_API_ROOT: "http://balrog/api"
             encryptedEnv:
                 - {{ encrypt_env_var(stableSlugId('balrog_{}_{}'.format(platform, chunk)), now_ms, now_ms + 24 * 4 * 3600 * 1000, "BALROG_USERNAME", balrog_username) }}
@@ -184,9 +193,9 @@
                   - -c
                   - >
                     wget -O mozharness.tar.bz2 https://hg.mozilla.org/{{ repo_path }}/archive/{{ mozharness_changeset }}.tar.bz2/testing/mozharness &&
                     mkdir mozharness && tar xvfj mozharness.tar.bz2 -C mozharness --strip-components 3 && cd mozharness &&
-                    python scripts/release/beet_mover.py --template configs/beetmover/partials.yml.tmpl --platform {{ platform }} --product {{ product }} --version {{ to_version }} --partial-version {{ from_version }} --artifact-subdir env {% for l in our_locales %}{{ "--locale {} ".format(l) }}{% endfor %} --taskid {{ stableSlugId('sign_{}_{}'.format(platform, chunk)) }} --build-num build{{ to_build_number }} --bucket {{ beetmover_candidates_bucket }} --no-refresh-antivirus
+                    python scripts/release/beet_mover.py --template configs/beetmover/win32_to_win64.yml.tmpl --platform {{ platform }} --product {{ product }} --version {{ to_version }} --partial-version {{ from_version }} --artifact-subdir env {% for l in our_locales %}{{ "--locale {} ".format(l) }}{% endfor %} --taskid {{ stableSlugId('sign_{}_{}'.format(platform, chunk)) }} --build-num build{{ to_build_number }} --bucket {{ beetmover_candidates_bucket }} --no-refresh-antivirus
               env:
                   DUMMY_ENV_FOR_ENCRYPT: "fake"
               encryptedEnv:
                   - {{ encrypt_env_var(stableSlugId('beetmove_{}_{}'.format(platform, chunk)), now_ms, now_ms + 24 * 4 * 3600 * 1000, 'AWS_ACCESS_KEY_ID', beetmover_aws_access_key_id) }}
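The locale handling in the diffs above can be sketched in plain Python. This is an illustration under stated assumptions, not the code that ran: `chunkify()` here is a hypothetical stand-in (contiguous, 1-indexed chunks) for the helper the template actually calls, and `parse_locales()` and `locales_for_chunk()` are illustrative names. The blank-line guard in `parse_locales()` is an addition that the original filter does not have.

```python
def parse_locales(shipped_locales_text):
    """Keep the first token of each line, skipping ja-JP-mac and en-US."""
    return [
        line.split()[0]
        for line in shipped_locales_text.splitlines()
        # Guard against blank lines, then apply the same filter as the diff.
        if line.strip() and not line.startswith(("ja-JP-mac", "en-US"))
    ]


def chunkify(items, this_chunk, total_chunks):
    """Hypothetical stand-in: return contiguous chunk `this_chunk` (1-indexed)."""
    base, extra = divmod(len(items), total_chunks)
    # The first `extra` chunks each take one additional item.
    start = (this_chunk - 1) * base + min(this_chunk - 1, extra)
    end = start + base + (1 if this_chunk <= extra else 0)
    return items[start:end]


def locales_for_chunk(locales, chunk, chunks):
    """Mirror the template: chunk the l10n locales, then add en-US last."""
    our_locales = chunkify(sorted(locales), chunk, chunks)
    if chunk == chunks:
        # en-US is kept out of the chunked list so the chunk boundaries
        # still line up with the l10n tasks; it rides along on the last chunk.
        our_locales = our_locales + ["en-US"]
    return our_locales


sample = "de\nen-US\nfr\nja win32 linux\nja-JP-mac osx\nzh-TW\n"
locales = parse_locales(sample)
for chunk in (1, 2):
    print(chunk, locales_for_chunk(locales, chunk, 2))
```

Keeping en-US out of the list that gets chunked is what preserves the alignment with the l10n chunking. The `to_mar` URL for en-US then has to come from its own task ID, as the template diff does with `complete_en_us_mar_task_id`.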
Newest run: https://tools.taskcluster.net/groups/L8w_3QXBRPeYX5NLZXeveg

The last attempt had my fat-finger of "win32_to_win64.yml.tmpl" instead of the correct "win32_to_win64_partials.yml.tmpl" in the graph for beetmover. I fixed that.

The last attempt also coincided with a balrog breakage, which was rolled back by :relud. The balrog bustage delayed the start of this newer graph.
Status update: callek and Johan, through heroic efforts, have gotten partials generated and included in the release blob under release-cdntest.

Aside from expected Update Verify failures, the last and successful partial generation graph: https://tools.taskcluster.net/groups/QKbjdC5eShWPawW8EEKGSQ


for those with access:
   * https://aus4-admin.mozilla.org/rules/659
   * https://aus4-admin.mozilla.org/releases/Firefox-56.0.1-build2-win64-migration


we are ready for QE testing on release-cdntest.


Still todo:

add 'release' rule similar to 659 for when we push to release channel at 1%
(In reply to Jordan Lund (:jlund) from comment #7)
> final verify ran into errors:
> https://public-artifacts.taskcluster.net/Oz05H2uVSm6ACipXyvKETQ/0/public/logs/live_backing.log

The errors were expected.

All failures were coming from 56.0. The only 56.0 updates that should work are win32 requests that set the flags allowing migration, so we have to rely on QE.
I'm going to call this done unless QE comes back and says there are problems.
Assignee: nobody → bugspam.Callek
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Pushed by jlorenzo@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/667f7caf4a32
Use a special beetmover template for win32 to win64. r=nthomas a=release DONTBUILD
Blocks: 1411428