
B2G Updates: Gecko can get wiped by the updater if a FOTA update fails to apply for some reason

RESOLVED FIXED

Status

Firefox OS
General
P1
blocker
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: marshall_law, Assigned: marshall_law)

Tracking

unspecified
ARM
Gonk (Firefox OS)

Firefox Tracking Flags

(blocking-basecamp:+, firefox18 fixed, firefox19 fixed, firefox20 fixed)

Details

Attachments

(5 attachments)

(Assignee)

Description

5 years ago
When the FOTA applier fails (I don't think the reason matters), it looks like gecko and the updater get confused on the next bootup, and try applying the FOTA update as if it were a gecko update.

This causes the updater to wipe the /system/b2g directory, and only leave the wrapped update.zip, which is obviously very bad. This also apparently locks the /system partition, which fails to mount as read-only. This triggers the fail over of the system reboot, in which case the FOTA update is still ready to be applied!

I'm attaching Ben Hearsum's logs from the 11/15 FOTA update that I deduced this from. Basically the flow looks like this:

1) Gecko extracts the FOTA MAR to /sdcard/updates like it's supposed to - see update-1.log
2) Gecko reboots the device into recovery
3) The FOTA application fails because the recovery kernel can't find the sdcard partition for some reason (this seems to be a random bug?) - see recovery-1.log
4) The recovery kernel reboots back to the normal system
5) Somehow gecko tries to apply the old FOTA MAR as if it were an OTA update, wiping anything not in the MAR. Possibly the "isOSUpdate" metadata gets lost somewhere in translation - see update-2.log.
6) /system fails to remount as read-only, and the emergency measure of rebooting the system is triggered - see update-2.log
7) The FOTA update.zip is apparently still there, and applies successfully - see recovery-2.log. The original failure is probably some unagi-specific bug, but its source is (probably) irrelevant for the purposes of this bug.
8) Recovery boots us back into the normal system, but /system/b2g is hosed.
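
To make step 5 concrete, here is a minimal sketch of how a lost "isOSUpdate" flag could route a FOTA package to the Gecko applier. It is a plain JavaScript model, not the real updater code: saveLossy, saveFull, chooseApplierNaive and chooseApplierGuarded are hypothetical names, and the assumption that the flag is dropped during serialization is only one possible explanation for what the logs show.

```javascript
// Hypothetical model of step 5; none of these names come from nsUpdateService.js.

// Assumption: the update is persisted across the reboot, and this path
// forgets to write the isOSUpdate flag.
function saveLossy(update) {
  return JSON.stringify({ name: update.name, state: update.state });
}

// Persists every field, including isOSUpdate.
function saveFull(update) {
  return JSON.stringify(update);
}

// Naive dispatch: a missing flag is silently treated as "not an OS update".
function chooseApplierNaive(update) {
  return update.isOSUpdate ? "recovery" : "gecko";
}

// Guarded dispatch: a missing flag is treated as an error instead.
function chooseApplierGuarded(update) {
  if (typeof update.isOSUpdate !== "boolean") {
    return "refuse";
  }
  return update.isOSUpdate ? "recovery" : "gecko";
}

const fota = { name: "fota-update", state: "pending", isOSUpdate: true };
const afterReboot = JSON.parse(saveLossy(fota));

console.log(chooseApplierNaive(afterReboot));              // "gecko"
console.log(chooseApplierGuarded(afterReboot));            // "refuse"
console.log(chooseApplierGuarded(JSON.parse(saveFull(fota)))); // "recovery"
```

In this model the naive dispatch is what wipes /system/b2g: the FOTA package is handed to the Gecko applier because the missing flag reads as false.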
(Assignee)

Comment 1

5 years ago
Created attachment 683767 [details]
update-1.log
(Assignee)

Comment 2

5 years ago
Created attachment 683769 [details]
recovery-1.log
(Assignee)

Updated

5 years ago
Attachment #683769 - Attachment description: recovery-1 → recovery-1.log
(Assignee)

Comment 3

5 years ago
Created attachment 683770 [details]
update-2.log
(Assignee)

Comment 4

5 years ago
Created attachment 683772 [details]
recovery-2.log

Updated

5 years ago
blocking-basecamp: ? → +
(Assignee)

Comment 5

5 years ago
Created attachment 684199 [details] [diff] [review]
don't retry when FOTA application fails - v1
Attachment #684199 - Flags: review?(robert.bugzilla)
Comment on attachment 684199 [details] [diff] [review]
don't retry when FOTA application fails - v1

Hey Marshall, I'm going to be out for Thanksgiving but bbondy will be around so passing the review over to him.

Also, could you help out with getting an answer to bug 787578 comment #9? Thanks!
Attachment #684199 - Flags: review?(robert.bugzilla) → review?(netzen)
Comment on attachment 684199 [details] [diff] [review]
don't retry when FOTA application fails - v1

Review of attachment 684199 [details] [diff] [review]:
-----------------------------------------------------------------

So do you just want the update to be eventually downloaded again?
I'm a bit concerned this could always fail to apply and always keep downloading the update, wasting bandwidth.
Seems to be an improvement overall though.

::: toolkit/mozapps/update/nsUpdateService.js
@@ +1034,5 @@
> +
> +    Cc["@mozilla.org/updates/update-prompt;1"].
> +      createInstance(Ci.nsIUpdatePrompt).
> +      showUpdateError(update);
> +    writeStatusFile(getUpdatesDir(), update.state + ": " + errorCode);

might as well just use STATE_FAILED here instead of update.state

@@ +1045,5 @@
>        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_SIGNALED ||
>        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_NOPROCESSFORPID ||
>        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_NOPID ||
>        update.errorCode == WRITE_ERROR_CALLBACK_APP ||
> +      update.errorCode == FILESYSTEM_MOUNT_READWRITE_ERROR) {

Should we update the showUpdateError similar "if" condition clause to include update.errorCode == FILESYSTEM_MOUNT_READWRITE_ERROR?
Attachment #684199 - Flags: review?(netzen) → review+
(Assignee)

Comment 8

5 years ago
(In reply to Brian R. Bondy [:bbondy] from comment #7)
> So do you just want the update to be eventually downloaded again?
> I'm a bit concerned this could always fail to apply and always keep
> downloading the update, wasting bandwidth.
> Seems to be an improvement overall though.

It isn't clear to me what the best path is when a FOTA update fails to apply, other than to log and send telemetry (heh). Since we do hash verification of the FOTA MAR bits, redownloading may only be useful if there was a legitimate problem with the FOTA itself, and a replacement had been posted.
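
A rough sketch of that reasoning, assuming the client can compare the hash of the package that failed with the hash the update server currently advertises. shouldRedownload and the hashValue field on both objects are hypothetical names for illustration, not part of the patch.

```javascript
// Hypothetical helper: decide whether downloading again could plausibly help.
// failedUpdate and serverUpdate are plain objects with a hashValue string.
function shouldRedownload(failedUpdate, serverUpdate) {
  if (!serverUpdate) {
    // Nothing is on offer any more, so there is nothing to download.
    return false;
  }
  // The failed package already passed hash verification, so fetching the
  // same bits again is unlikely to change the outcome. A different hash
  // suggests a replacement has been posted.
  return serverUpdate.hashValue !== failedUpdate.hashValue;
}

console.log(shouldRedownload({ hashValue: "abc" }, { hashValue: "abc" })); // false
console.log(shouldRedownload({ hashValue: "abc" }, { hashValue: "def" })); // true
```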

> @@ +1045,5 @@
> >        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_SIGNALED ||
> >        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_NOPROCESSFORPID ||
> >        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_NOPID ||
> >        update.errorCode == WRITE_ERROR_CALLBACK_APP ||
> > +      update.errorCode == FILESYSTEM_MOUNT_READWRITE_ERROR) {
> 
> Should we update the showUpdateError similar "if" condition clause to
> include update.errorCode == FILESYSTEM_MOUNT_READWRITE_ERROR?

Probably not. This new clause is special because the flow for FOTA update errors is a little different. I wouldn't want to short-circuit an OTA update that fails because of a remount failure, when it's possible a reboot might fix the issue.
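
As a sketch of the distinction, assuming the failure handler can tell FOTA from OTA via update.isOSUpdate: statusAfterFailure is a hypothetical function, the numeric value of FILESYSTEM_MOUNT_READWRITE_ERROR is a placeholder, and the real branching in nsUpdateService.js may well be structured differently. Only the "state: errorCode" status format is taken from the patch under review.

```javascript
// Placeholder value for illustration; the real constant is defined in
// nsUpdateService.js and its value is not shown in this bug.
const FILESYSTEM_MOUNT_READWRITE_ERROR = 1001;
const STATE_PENDING = "pending";
const STATE_FAILED = "failed";

// Returns the text that would be written to the update status file.
function statusAfterFailure(update, errorCode) {
  if (update.isOSUpdate) {
    // FOTA: record the failure and do not retry.
    return STATE_FAILED + ": " + errorCode;
  }
  if (errorCode === FILESYSTEM_MOUNT_READWRITE_ERROR) {
    // OTA remount failure: stay pending, since a reboot might fix it.
    return STATE_PENDING;
  }
  return STATE_FAILED + ": " + errorCode;
}

console.log(statusAfterFailure({ isOSUpdate: true }, 7));     // "failed: 7"
console.log(statusAfterFailure({ isOSUpdate: false }, 1001)); // "pending"
```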
> Since we do hash verification of the FOTA MAR bits, redownloading may only be useful if there was a legitimate problem with the FOTA itself, and a replacement had been posted.

We should assume that can happen. It's rare, but it may happen eventually.
(In reply to Brian R. Bondy [:bbondy] from comment #9)
> > Since we do hash verification of the FOTA MAR bits, redownloading may only be useful if there was a legitimate problem with the FOTA itself, and a replacement had been posted.
> 
> We should assume that can happen, rare but it may happen eventually.
Agreed. We have also had cases where we fixed MAR files on the backend and, because the client redownloaded them, were able to update the client.
(Assignee)

Comment 11

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5c72ec4d8c5
https://hg.mozilla.org/mozilla-central/rev/c5c72ec4d8c5
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
https://hg.mozilla.org/releases/mozilla-aurora/rev/5000315b6331
https://hg.mozilla.org/releases/mozilla-beta/rev/ad300a75bcc5
status-firefox18: --- → fixed
status-firefox19: --- → fixed
status-firefox20: --- → fixed