Closed Bug 813778 Opened 12 years ago Closed 12 years ago

B2G Updates: Gecko can get wiped by the updater if a FOTA update fails to apply for some reason

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:+, firefox18 fixed, firefox19 fixed, firefox20 fixed)

RESOLVED FIXED
blocking-basecamp +
Tracking Status
firefox18 --- fixed
firefox19 --- fixed
firefox20 --- fixed

People

(Reporter: marshall, Assigned: marshall)

Details

Attachments

(5 files)

When the FOTA applier fails (I don't think the reason matters), it looks like gecko and the updater get confused on the next bootup, and try applying the FOTA update as if it were a gecko update.

This causes the updater to wipe the /system/b2g directory and leave only the wrapped update.zip, which is obviously very bad. This also apparently locks the /system partition, which then fails to remount as read-only. That failure triggers the fallback of rebooting the system, at which point the FOTA update is still ready to be applied!

I'm attaching Ben Hearsum's logs from the 11/15 FOTA update that I deduced this from. Basically the flow looks like this:

1) Gecko extracts the FOTA MAR to /sdcard/updates like it's supposed to - see update-1.log
2) Gecko reboots the device into recovery
3) The FOTA application fails because the recovery kernel can't find the sdcard partition for some reason (this seems to be a random bug?) - see recovery-1.log
4) The recovery kernel reboots back to the normal system
5) Somehow gecko tries to apply the old FOTA MAR as if it were an OTA update, wiping anything not in the MAR. Possibly the "isOSUpdate" metadata gets lost somewhere in translation - see update-2.log.
6) /system fails to remount as read-only, and the emergency measure of rebooting the system is triggered - see update-2.log
7) The FOTA update.zip is apparently still there, and applies successfully - see recovery-2.log. The earlier failure is probably some unagi-specific bug, but its source is (probably) irrelevant for the purposes of this bug.
8) Recovery boots us back into the normal system, but /system/b2g is hosed.
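
Editorial sketch of the hazard in step 5, in plain JavaScript. This is a toy model, not the updater's code: the record shape and the functions chooseApplier and chooseApplierStrict are hypothetical, and only the "isOSUpdate" name comes from the report above. It assumes the flag simply goes missing when the update record is reloaded after the reboot, which the report only suggests as a possibility.

```javascript
// Hypothetical model of step 5. Apart from the "isOSUpdate" metadata named
// in the report, none of these names are the real updater API.

// Decide which applier handles a pending update on startup.
function chooseApplier(update) {
  // If the flag was lost (undefined), a plain truthiness check falls through
  // to the Gecko (OTA) path, the one that replaces /system/b2g.
  return update.isOSUpdate ? "fota-recovery" : "gecko-ota";
}

// One possible guard (not the patch that landed): refuse to pick an applier
// when the flag is missing instead of silently defaulting to OTA.
function chooseApplierStrict(update) {
  if (typeof update.isOSUpdate !== "boolean") {
    return "refuse";
  }
  return update.isOSUpdate ? "fota-recovery" : "gecko-ota";
}

const beforeReboot = { name: "fota-update", isOSUpdate: true };
// Simulate metadata that does not survive being saved and reloaded.
const afterReboot = { name: beforeReboot.name };

console.log(chooseApplier(beforeReboot));      // fota-recovery
console.log(chooseApplier(afterReboot));       // gecko-ota
console.log(chooseApplierStrict(afterReboot)); // refuse
```

The second call shows how a dropped flag would route a FOTA package into the OTA path without any error being raised, which matches the symptom in update-2.log as described above.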
Attached file update-1.log
Attached file recovery-1.log
Attachment #683769 - Attachment description: recovery-1 → recovery-1.log
Attached file update-2.log
Attached file recovery-2.log
blocking-basecamp: ? → +
Attachment #684199 - Flags: review?(robert.bugzilla)
Comment on attachment 684199 [details] [diff] [review]
don't retry when FOTA application fails - v1

Hey Marshall, I'm going to be out for Thanksgiving but bbondy will be around so passing the review over to him.

Also, could you help out with getting an answer to bug 787578 comment #9? Thanks!
Attachment #684199 - Flags: review?(robert.bugzilla) → review?(netzen)
Comment on attachment 684199 [details] [diff] [review]
don't retry when FOTA application fails - v1

Review of attachment 684199 [details] [diff] [review]:
-----------------------------------------------------------------

So do you just want the update to be eventually downloaded again?
I'm a bit concerned this could always fail to apply and always keep downloading the update, wasting bandwidth.
Seems to be an improvement overall though.

::: toolkit/mozapps/update/nsUpdateService.js
@@ +1034,5 @@
> +
> +    Cc["@mozilla.org/updates/update-prompt;1"].
> +      createInstance(Ci.nsIUpdatePrompt).
> +      showUpdateError(update);
> +    writeStatusFile(getUpdatesDir(), update.state + ": " + errorCode);

might as well just use STATE_FAILED here instead of update.state

@@ +1045,5 @@
>        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_SIGNALED ||
>        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_NOPROCESSFORPID ||
>        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_NOPID ||
>        update.errorCode == WRITE_ERROR_CALLBACK_APP ||
> +      update.errorCode == FILESYSTEM_MOUNT_READWRITE_ERROR) {

Should we update the showUpdateError similar "if" condition clause to include update.errorCode == FILESYSTEM_MOUNT_READWRITE_ERROR?
Attachment #684199 - Flags: review?(netzen) → review+
(In reply to Brian R. Bondy [:bbondy] from comment #7)
> So do you just want the update to be eventually downloaded again?
> I'm a bit concerned this could always fail to apply and always keep
> downloading the update, wasting bandwidth.
> Seems to be an improvement overall though.

It isn't clear to me what the best path is when a FOTA update fails to apply, other than to log and send telemetry (heh). Since we do hash verification of the FOTA MAR bits, redownloading may only be useful if there was a legitimate problem with the FOTA itself, and a replacement had been posted.

> @@ +1045,5 @@
> >        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_SIGNALED ||
> >        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_NOPROCESSFORPID ||
> >        update.errorCode == WRITE_ERROR_SHARING_VIOLATION_NOPID ||
> >        update.errorCode == WRITE_ERROR_CALLBACK_APP ||
> > +      update.errorCode == FILESYSTEM_MOUNT_READWRITE_ERROR) {
> 
> Should we update the showUpdateError similar "if" condition clause to
> include update.errorCode == FILESYSTEM_MOUNT_READWRITE_ERROR?

Probably not. This new clause is special because the flow for FOTA update errors is a little different. I wouldn't want to short-circuit an OTA update that fails because of a remount failure, when it's possible a reboot might fix the issue.
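
Editorial sketch of the distinction drawn in this reply, as I read the thread: a failed FOTA application is marked failed and not retried, while an OTA update that hits one of the listed write errors (now including the remount error) stays pending so a reboot can retry it. The function nextStateAfterFailure, the string-valued error codes, and the use of an isOSUpdate check to detect the FOTA case are all assumptions for illustration. The real constants in nsUpdateService.js are numeric, and the actual patch may detect the FOTA failure differently.

```javascript
// Symbolic stand-ins for the error constants quoted in the review.
// In nsUpdateService.js these are numeric codes, not strings.
const RETRYABLE_WRITE_ERRORS = new Set([
  "WRITE_ERROR_SHARING_VIOLATION_SIGNALED",
  "WRITE_ERROR_SHARING_VIOLATION_NOPROCESSFORPID",
  "WRITE_ERROR_SHARING_VIOLATION_NOPID",
  "WRITE_ERROR_CALLBACK_APP",
  "FILESYSTEM_MOUNT_READWRITE_ERROR",
]);

// Hypothetical helper: returns the state to record after a failed apply.
function nextStateAfterFailure(update) {
  if (update.isOSUpdate) {
    // FOTA: do not retry the same package on the next boot.
    return "failed";
  }
  if (RETRYABLE_WRITE_ERRORS.has(update.errorCode)) {
    // OTA with a transient write or remount error: keep it pending,
    // since a reboot might fix the issue.
    return "pending";
  }
  return "failed";
}

console.log(nextStateAfterFailure({
  isOSUpdate: false,
  errorCode: "FILESYSTEM_MOUNT_READWRITE_ERROR",
})); // pending
console.log(nextStateAfterFailure({
  isOSUpdate: true,
  errorCode: "FILESYSTEM_MOUNT_READWRITE_ERROR",
})); // failed
```

Keeping the two branches separate is the point of the reply: folding the remount error into the error-display path for every update would also stop OTA updates that a reboot could have recovered.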
> Since we do hash verification of the FOTA MAR bits, redownloading may only be useful if there was a legitimate problem with the FOTA itself, and a replacement had been posted.

We should assume that can happen, rare but it may happen eventually.
(In reply to Brian R. Bondy [:bbondy] from comment #9)
> > Since we do hash verification of the FOTA MAR bits, redownloading may only be useful if there was a legitimate problem with the FOTA itself, and a replacement had been posted.
> 
> We should assume that can happen, rare but it may happen eventually.
Agreed. We have also had cases where we fixed MAR files on the backend and, with the client redownloading, were able to update the client.
https://hg.mozilla.org/mozilla-central/rev/c5c72ec4d8c5
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED