Closed Bug 1361255 Opened 7 years ago Closed 7 years ago

Error stashed as MAR in 54.0b4 for "te" locale in Mac updates makes final verification fail

Categories

(Release Engineering :: Release Automation: Other, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mtabara, Assigned: mtabara)

The final beta update verification task[1] failed in 54.0b4 due to a size mismatch in the "te" locale for Mac updates. On closer inspection, it seems we somehow packaged an Internal Error response into a partial MAR (PMAR) somewhere along the way; see the file at [2]. The size was suspicious, as there is no way a partial MAR file can be just 202 bytes.

Debugging upstream to see where the problem might have occurred (funsize, signing, beetmover, etc.).

[1]: https://tools.taskcluster.net/task-inspector/#FlVunno9RIWwbB0kwvzZsQ/
[2]: http://archive.mozilla.org/pub/firefox/releases/54.0b4/update/mac/te/firefox-54.0b3-54.0b4.partial.mar
<@nthomas> bucketlister-delivery.prod.mozaws.net agrees, so bad beetmove or generation ?
12:07:09 <@nthomas> oopsie, we got an error and stashed it as the mar file - https://pastebin.mozilla.org/9020492
12:09:51 <@aki> is te new?
12:10:31 <@aki> looks like no
12:12:12 <@nthomas> generation looks ok in http://mozilla-release-logs.s3.amazonaws.com/mozilla-beta/firefox-54.0b4/build1/%5Bfunsize%5D_Update_generating_task_macosx64_chunk_9_for_54.0b3-macosx64-97sIkDioQi6F9D49ajbctw-0
12:12:23 <@nthomas> right at the very end
12:15:15 <@nthomas> artifact on https://tools.taskcluster.net/task-inspector/#97sIkDioQi6F9D49ajbctw/0 is 6414732 bytes
12:16:05 <@nthomas> beetmover log is http://mozilla-release-logs.s3.amazonaws.com/mozilla-beta/firefox-54.0b4/build1/%5Bbeetmover%5D_firefox_mozilla-beta_macosx64_locales_partials_candidates_9_10-macosx64-dTQiLyrRRjqih2WWFjpThw-0
12:17:22 <@nthomas> https://irccloud.mozilla.com/pastebin/DoosrNq0/
12:17:48 <@nthomas> I’m guessing there is an api call which got an error 282 bytes long there, which the code missed
12:20:31 <@nthomas> s/api call/download/, since we need a size and to calculate various hashes
12:23:34 <@nthomas> which beetmover code am I looking at again ? not https://github.com/mozilla-releng/beetmoverscript IIRC
12:23:45 <~mtabara> yep, not that
12:23:51 <~mtabara> that's nightly/fennec only
12:24:11 <~mtabara> https://hg.mozilla.org/mozilla-central/file/tip/testing/mozharness/scripts/release/beet_mover.py
12:24:20 <@nthomas> ah, thanks
12:25:11 <@nthomas> https://irccloud.mozilla.com/pastebin/w76UgM7c/
12:25:46 <@nthomas> ah, I forgot about signing, duh
12:25:51 <~mtabara> the error stashed as mar is pretty scary
12:26:33 <gchang|afk> mtabara: Hi, Is the error going to impact 54.0b4?
12:28:37 <~mtabara> it's only impacting "te" locale for Mac users but I'd hold on from publishing this to beta until we've understood what's happened. otherwise, we'd be serving failed partials to users, if my understanding is right. however, we still have a good chunk of hours until QE signs this off so hopefully we've solved it by then
12:34:34 — ~mtabara files 1361255 to track that
12:36:12 <@nthomas> the size of public/env/firefox-54.0b3-54.0b4.te.mac.partial.mar on https://tools.taskcluster.net/task-inspector/#uqIErLdASHCTdLKOe81CeQ/0 looks right (6414996 bytes)
12:36:53 <~mtabara> since balrog has the right size, it means funsize generation + submission must have worked so beetmover must be the culprit
12:37:03 — ~mtabara reads scrollback again on ntho.mas's findings
12:38:31 <@nthomas> that’s pretty much where I got to
12:39:00 <@nthomas> the last pastebin is from the beetmover log

01:22:58     INFO - Downloading https://queue.taskcluster.net/v1/task/uqIErLdASHCTdLKOe81CeQ/artifacts/public/env/firefox-54.0b3-54.0b4.te.mac.partial.mar to /mozharness/build/firefox-54.0b3-54.0b4.te.mac.partial.mar
01:22:58     INFO - retry: Calling _download_file with args: (), kwargs: {'url': 'https://queue.taskcluster.net/v1/task/uqIErLdASHCTdLKOe81CeQ/artifacts/public/env/firefox-54.0b3-54.0b4.te.mac.partial.mar', 'file_name': '/mozharness/build/firefox-54.0b3-54.0b4.te.mac.partial.mar'}, attempt #1
01:23:08     INFO - Downloaded 282 bytes.

tl;dr - the file in TC is good, but beetmover most likely fails to download it and stashes the error as the MAR.


12:43:55 <@nthomas> https://dxr.mozilla.org/mozilla-beta/source/testing/mozharness/mozharness/base/script.py#699 I guess, and eventually _download_file
12:44:32 <@nthomas> if we got an error without an error for http status code, and no content-length, we could get here
12:44:47 <@nthomas> kinda hard to tell what happened tbh
12:46:53 <@nthomas> the ‘content’ of the mar does look like an Amazon message, with a RequestId and all
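
One way to guard against this class of failure is to validate the download before accepting it. The following is a minimal sketch, not the actual mozharness _download_file code; validate_download, DownloadError and the parameter names are hypothetical. It rejects a non-200 status, a body whose length disagrees with Content-Length (when that header is present), and a body that does not start with the "MAR1" ID that opens a MAR file.

```python
MAR_MAGIC = b"MAR1"  # four-byte ID at the start of a MAR file


class DownloadError(Exception):
    """Raised when a downloaded body should not be trusted as a MAR."""


def validate_download(status, headers, body):
    """Return body if it looks like a real MAR, else raise DownloadError.

    Hypothetical helper: status is the HTTP status code, headers is a
    dict of response headers, body is the bytes that were read.
    """
    if status != 200:
        raise DownloadError("unexpected HTTP status %s" % status)

    # Only compare sizes when the server actually sent a Content-Length.
    expected = headers.get("Content-Length")
    if expected is not None and int(expected) != len(body):
        raise DownloadError(
            "got %d bytes, Content-Length said %s" % (len(body), expected))

    # An XML or JSON error document will not carry the MAR ID.
    if not body.startswith(MAR_MAGIC):
        raise DownloadError("body does not start with the MAR ID")

    return body
```

With a check like this, an error document served in place of the artifact would raise and could be retried, rather than being written to disk under the MAR's file name.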
So if we ship this, Firefox updates would get: 
https://aus5.mozilla.org/update/3/Firefox/53.0b2/20170427091925/Darwin_x86_64-gcc3-u-i386-x86_64/te/beta-cdntest/default/default/default/update.xml?force=1

Possible solutions:
* ignore - most likely it'd default over to the CMAR
* we could add a rule to block mac te until 53.0b5
* remove files from S3 (both candidates/releases) and rerun beetmover jobs + invalidate CDN caches via a bug filed to CloudOps

Discussing with nthomas what's to be done here.
(In reply to Mihai Tabara [:mtabara]⌚️GMT+8 from comment #2)
> So if we ship this, Firefox updates would get: 
> https://aus5.mozilla.org/update/3/Firefox/53.0b2/20170427091925/
> Darwin_x86_64-gcc3-u-i386-x86_64/te/beta-cdntest/default/default/default/
> update.xml?force=1
> 
> Possible solutions:
> * ignore - most likely it'd default over to the CMAR
> * we could add a rule to block mac te until 53.0b5
> * remove files from S3 (both candidates/releases) and rerun beetmover jobs +
> invalidate CDN caches via a bug filed to CloudOps
> 
> Discussing with nthomas what's to be done here.

Eventually we updated https://aus4-admin.mozilla.org/releases#firefox-54.0b4 and deleted the partial information for the Mac "te" locale. Users in that pool will be offered the complete MAR instead. Update verify will likely fail, but hopefully the final beta update verification will work.

mihaitabara@mozspace:[]~/Downloads$ diff Firefox-54.0b4-build1.json.backup Firefox-54.0b4-build1.json
2577,2581d2576
<                             "filesize": 6414996, 
<                             "from": "Firefox-54.0b3-build1", 
<                             "hashValue": "7f0cd71a8f038e9e8fe0ba36281bdf34f916a5a086c18b838da4157c7e8c7965271bfadb474e2866281e8e75845ab7eb6d899ebd7d44e0cb7eb2b9eecd943725"
<                         }, 
<                         {
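
For reference, the same edit could be scripted against a local copy of the blob. This is only a sketch: drop_partial is a hypothetical helper, and the nested layout it walks (platforms -> locales -> partials) is an assumption about the release blob, not something confirmed in this bug. The platform name is taken from the update URL above.

```python
def drop_partial(blob, platform, locale, from_release):
    """Remove the partial coming from from_release for one platform/locale.

    Returns the number of entries removed. The blob layout used here
    (platforms -> locales -> partials) is an assumption.
    """
    entry = blob["platforms"][platform]["locales"][locale]
    partials = entry.get("partials")
    if not partials:
        return 0  # nothing to remove

    kept = [p for p in partials if p.get("from") != from_release]
    entry["partials"] = kept
    return len(partials) - len(kept)


if __name__ == "__main__":
    # Tiny made-up blob, only to show the call; hashValue is a placeholder.
    sample = {"platforms": {"Darwin_x86_64-gcc3-u-i386-x86_64": {"locales": {
        "te": {"partials": [{"filesize": 6414996,
                             "from": "Firefox-54.0b3-build1",
                             "hashValue": "placeholder"}]}}}}}
    removed = drop_partial(sample, "Darwin_x86_64-gcc3-u-i386-x86_64",
                           "te", "Firefox-54.0b3-build1")
    print("removed %d partial(s)" % removed)
```

Keeping a backup of the original JSON, as done above, makes it easy to diff the result before uploading.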
Rerunning final verification fails because it expects a partial MAR for 54.0b3 -> 54.0b4. That's fine; we can ignore this.

https://public-artifacts.taskcluster.net/FlVunno9RIWwbB0kwvzZsQ/6/public/logs/live_backing.log
Note to self: talked to rail about this today, and it turns out :nthomas was right. The Firefox updater is smart enough to find a reason to reject the corrupt partial MAR anyway (a failing signature, size, hash, or something else). So had we taken no action, we would have reached the same end result, just on the user side rather than through the Balrog tweak we eventually made.
A possible fix is in bug 1361878.
The corrupt PMAR was not nearly as bad as initially thought: the Firefox updater will most likely find a reason to reject the partial (size, signature, SHA, etc.) and fall back to the complete MAR anyway.
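
To illustrate the size and hash part of that reasoning, here is a rough sketch of such a check. It is not the updater's actual code; partial_is_usable and choose_update are hypothetical names, and SHA-512 is assumed because the hashValue in the blob is 128 hex characters long.

```python
import hashlib


def partial_is_usable(data, expected_size, expected_sha512):
    """True only if the downloaded bytes match the advertised metadata."""
    if len(data) != expected_size:
        return False
    return hashlib.sha512(data).hexdigest() == expected_sha512


def choose_update(partial_data, expected_size, expected_sha512):
    """Pick "partial" when the partial checks out, otherwise "complete"."""
    if partial_is_usable(partial_data, expected_size, expected_sha512):
        return "partial"
    return "complete"
```

A 282-byte error body can never match an advertised size of 6414996 bytes, so a check of this kind fails on the first comparison and the complete MAR is used instead.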

Will close this for now, as the automation fix to prevent this in the future is tracked under bug 1361878.
Assignee: nobody → mtabara
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
See Also: → 1361878