Closed Bug 1179382 Opened 6 years ago Closed 6 years ago

Update from June 30 to July 1 failing

Categories

(Firefox for Android Graveyard :: General, defect)

Unspecified
Android
defect
Not set
normal

Tracking

(firefox41 unaffected, firefox42+ fixed, fennec+)

RESOLVED FIXED
Tracking Status
firefox41 --- unaffected
firefox42 + fixed
fennec + ---

People

(Reporter: kbrosnan, Assigned: mshal)

Details

(Keywords: regression)

Attachments

(1 file)

Install the June 30 nightly and try to manually update to July 1 or later build. 

STR: 
  install ftp://ftp.mozilla.org/pub/mobile/nightly/2015-06-30-03-02-04-mozilla-central-android-api-11/fennec-42.0a1.multi.android-arm.apk
  attempt to update either by the button in about:firefox or in the notification tray fails.
Automatic update fails the same way (and is how this was discovered).  Notification showed up in the tray, and tapping it did nothing except make the notification go away.
[Tracking Requested - why for this release]: updates broken

One day regression range. http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=aad95360a002&tochange=ca0fd580a9ce
tracking-fennec: --- → ?
07-01 11:25:49.415 E/UpdateService(21153): Package hash does not match
07-01 11:25:49.415 E/UpdateService(21153): Not installing update, failed verification
This looks like bug 1174581 again. Did releng change something?
Flags: needinfo?(mshal)
Not that I'm aware of... I'll try to track it down. Just to confirm, this was working at some point after bug 1174581, right? This looks like it has the right URL for the multi package, but the hash is from the en-US package.
Flags: needinfo?(mshal)
(In reply to Michael Shal [:mshal] from comment #5)
> Not that I'm aware of... I'll try to track it down. Just to confirm, this
> was working at some point after bug 1174581, right? This looks like it has
> the right URL for the multi package, but the hash is from the en-US package.

Yeah, it was working again until today.
I think what happened is the output from 'make upload' changed slightly - there is output from build/upload.py to stdout (the "Uploading /builds/..." messages), and separately there is output from post_upload.py to stderr (the "http://..." messages). The output from these is grepped to pull out urls and such, and since the output is redirected to a file, there is buffering involved. It looks like some more files were added to the upload list, which caused the stdout buffer to hit the 4k limit and dump some of its contents earlier than the output parser is expecting:

04:15:10     INFO -  Uploading /builds/slave/m-cen-and-api-11-ntly-00000000/build/src/obj-firefox/dist/fennecsys.argv: ['/usr/local/bin/post_upload.py', ...

The "sys.argv" message starts at character 4096 or so.

In short, grepping the output of "make upload" sucks. I'll try to find a better way.
Keywords: regression
tracking-fennec: ? → +
Assignee: nobody → mshal
I've disabled updates in balrog for now. I'm actively working on this.
Attached patch android-updateSplinter Review
This changes the logic a bit to hopefully work around some of the issues. We still have to rely on the MakeUploadOutputParser to get the final package URL, but we no longer use the output to determine the package filename. Instead, we get make to tell us what the name of the package is, and then look at the output of the checksum manifest to get the hash & file size, so we don't need to calculate those in mozharness.

Here is a staging log:
http://dev-master2.bb.releng.use1.mozilla.com:8064/builders/Android%20armv7%20API%2011%2B%20mozilla-central%20nightly/builds/13/steps/run_script/logs/stdio

I've filed bug 1179962 as a followup, since parsing 'make upload' output is obviously quite fickle.
Attachment #8629069 - Flags: review?(jlund)
Comment on attachment 8629069 [details] [diff] [review]
android-update

Review of attachment 8629069 [details] [diff] [review]:
-----------------------------------------------------------------

looks good. one comment

::: mozharness/mozilla/building/buildbase.py
@@ +1702,5 @@
>          self.run_command_m(upload_cmd,
>                             env=self.query_mach_build_env(multiLocale=False),
>                             cwd=objdir, halt_on_failure=True,
>                             output_parser=parser)
>          for prop in parser.matches:

I wonder if we should be validating some of these props so we know that if there is a balrog submission, the package/hash/sizes are all good. seems wrong that we allow it all to happen and then fail at the end user. This is not your design or fault but maybe something we should address since we keep running into this.

Ben any thoughts? Maybe we beef up the balrog submitter to sanity check properties?
Attachment #8629069 - Flags: review?(jlund) → review+
> Ben any thoughts? Maybe we beef up the balrog submitter to sanity check
> properties?

essentially we are not catching invalid props that cause updates to break.
Flags: needinfo?(bhearsum)
Yeah, this was something we chatted about a week or two ago. The problem of doing that in balrog is that it never actually has the file, just the URL - so it would have to download the file to verify the hash/size. And while that would help catch this class of error, it wouldn't actually verify that whatever we uploaded is actually a valid package or update file (eg: a few weeks ago when I had it uploading geckoview.apk or whatever instead of the real package). I think it'd be awesome if we could have a downstream test that runs, which actually queries balrog, and tries to download and apply the update. And then we could hook it up to try and I could actually test things in a non-insane way :)
Comment on attachment 8629069 [details] [diff] [review]
android-update

https://hg.mozilla.org/build/mozharness/rev/f8135e250428

I just bumped mozharness.json, so we'll see what happens...
Attachment #8629069 - Flags: checkin+
I think the updates from the latest nightly look good as far as the log is concerned:

04:46:41     INFO -  Data sent: {'product': u'Fennec', 'hashFunction': u'sha512', 'alias': 'null', 'schema_version': 4, 'data_version': '2', 'copyTo': '["Fennec-mozilla-central-nightly-latest"]', 'version': u'42.0a1', 'data': '{"buildID": "20150703030216", "platformVersion": "42.0a1", "displayVersion": "42.0a1", "appVersion": "42.0a1", "completes": [{"fileUrl": "http://download.cdn.mozilla.net/pub/mozilla.org/mobile/nightly/2015/07/2015-07-03-03-02-16-mozilla-central-android-api-11/fennec-42.0a1.multi.android-arm.apk", "hashValue": "6a12f38ceb4bcdd638c2667c02d9dc843e353935da5f627de91349d8889eadee735012fa6ff3997bc48ff635319fd16eca2d0be8a178a0421765d912b745664e", "from": "*", "filesize": "43563678"}]}'}

$ wget http://download.cdn.mozilla.net/pub/mozilla.org/mobile/nightly/2015/07/2015-07-03-03-02-16-mozilla-central-android-api-11/fennec-42.0a1.multi.android-arm.apk
$ sha512sum fennec-42.0a1.multi.android-arm.apk 
6a12f38ceb4bcdd638c2667c02d9dc843e353935da5f627de91349d8889eadee735012fa6ff3997bc48ff635319fd16eca2d0be8a178a0421765d912b745664e  fennec-42.0a1.multi.android-arm.apk
$ ls -l fennec-42.0a1.multi.android-arm.apk 
-rw-rw-r-- 1 mshal mshal 43563678 Jul  3 07:46 fennec-42.0a1.multi.android-arm.apk

I've enabled updates on balrog again. Can anyone confirm that it works?
(In reply to Michael Shal [:mshal] from comment #12)
> Yeah, this was something we chatted about a week or two ago. The problem of
> doing that in balrog is that it never actually has the file, just the URL -
> so it would have to download the file to verify the hash/size. And while
> that would help catch this class of error, it wouldn't actually verify that
> whatever we uploaded is actually a valid package or update file (eg: a few
> weeks ago when I had it uploading geckoview.apk or whatever instead of the
> real package). I think it'd be awesome if we could have a downstream test
> that runs, which actually queries balrog, and tries to download and apply
> the update. And then we could hook it up to try and I could actually test
> things in a non-insane way :)

The other potential problem with this is that the files don't always exist in their download location when we submit them. Eg, for releases they don't exist on the CDN until we're closer to shipping. I don't think it makes sense for the Balrog submitter to be doing a check like this -- there's not even any guarantee that you've uploaded the MAR prior to submitting data.

The way we deal with this for releases is by running "update verify" tests against a test channel (that uses the same metadata as the live channel) prior to shipping. These tests simulate doing updates from an older version to the current one, which verifies the update metadata (among other things). We've wanted to do similar for nightlies for a long time, bug 588396. Taskcluster and/or Funsize might make this easier.
Flags: needinfo?(bhearsum)
I confirmed with kbrosnan in IRC that this should be fixed. Feel free to reopen or file a new bug if that's not the case.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.