Closed Bug 813451 Opened 7 years ago Closed 7 years ago

Updater can get into infinite restart loop trying to apply partially-downloaded update

Categories

(Firefox OS Graveyard :: General, defect, P1, critical)

defect

Tracking

(blocking-basecamp:+, firefox18 fixed, firefox19 fixed)

RESOLVED FIXED
B2G C2 (20nov-10dec)
blocking-basecamp +
Tracking Status
firefox18 --- fixed
firefox19 --- fixed

People

(Reporter: cjones, Assigned: cjones)

References

Details

(Whiteboard: dogfooding-blocker)

Attachments

(1 file)

clee's phone got into a state where the b2g process was continually restarting to apply an update, but not applying the update; it was only partially downloaded.  Then the b2g process would start back up, spin for a few seconds, then restart to apply the partially-downloaded update.  Then ...

I believe that clee was on (or trying to update to) the 11-18 nightly.  Will paste some logcat here and see if I can repro.

clee's phone was completely hosed by this with pretty much no way it could have been recovered.  Some nontrivial AUS-fu was required to bring his phone back from the dead.  If this happens to "testers" (it's not possible to actually dogfood atm) in the field, their phones will be effectively bricked.
Edited logs follows (I have the raw ones available if anyone wants them)

[turn on phone, SIM card inserted and cellular data connection enabled]

E/GeckoConsole(  109): AUS:SVC UpdateService:_downloadUpdate
E/GeckoConsole(  109): AUS:SVC readStatusFile - status: downloading, path: /data/local/updates/0/update.status
E/GeckoConsole(  109): AUS:SVC Downloader:_selectPatch - found existing patch with state: downloading
E/GeckoConsole(  109): AUS:SVC Downloader:_selectPatch - resuming download
E/GeckoConsole(  109): AUS:SVC Downloader:downloadUpdate - downloading from http://update.boot2gecko.org/nightly/b2g_update_2012-11-19_075049.mar?build_id=20121119070826&version=18.0a2 to /data/local/updates/0/update.mar
E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - status: 2152398864, current fail: 0, max fail: 10, retryTimeout: 2000
E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - offline, register online observer: true
E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - setting state to: downloading
E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - Registering online observer
E/GeckoConsole(  109): AUS:SVC UpdateService:_registerOnlineObserver - waiting for the network to be online, then forcing another check

[more startup happens]

I/GeckoDump(  109): XXX FIXME : Got a mozContentEvent: force-update-check

[^^^ hmm ... not sure what this is.  Immaterial though.]

[back online, resume update]

E/GeckoConsole(  109): AUS:SVC UpdateService:_offlineStatusChanged - network is online, forcing another background check
E/GeckoConsole(  109): AUS:SVC UpdateService:_attemptResume - _patch.state: downloading
E/GeckoConsole(  109): AUS:SVC Downloader:onProgress - progress: 4801008/49924596
E/GeckoConsole(  109): AUS:SVC Downloader:onProgress - progress: 4809096/49924596
...
E/GeckoConsole(  109): AUS:SVC Downloader:onProgress - progress: 5094872/49924596
I/Gecko   (  109): UpdatePrompt: Timed out waiting for result, restarting
I/Gecko   (  109): UpdatePrompt: Update downloaded, restarting to apply it

[^^^ what in the bloody hell?!?]

E/GeckoConsole(  109): AUS:SVC Downloader:onProgress - progress: 4800000/49924596

[^^^ what the hell?  this makes no sense either.]

E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - original URI spec: http://update.boot2gecko.org/nightly/b2g_update_2012-11-19_075049.mar?build_id=20121119070826&version=18.0a2, final URI spec: http://update.boot2gecko.org/nightly/b2g_update_2012-11-19_075049.mar?build_id=20121119070826&version=18.0a2, status: 2147500036
E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - status: 2147500036, current fail: 0, max fail: 10, retryTimeout: 2000
E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - setting state to: downloading

[b2g process restarts, another comes up]
[and again, ...]

--------------------------------------------------

[power off phone, pull out SIM card to disable cellular data]
[boot back up]

E/GeckoConsole(  109): AUS:SVC UpdateService:_downloadUpdate
E/GeckoConsole(  109): AUS:SVC readStatusFile - status: downloading, path: /data/local/updates/0/update.status
E/GeckoConsole(  109): AUS:SVC Downloader:_selectPatch - found existing patch with state: downloading
E/GeckoConsole(  109): AUS:SVC Downloader:_selectPatch - resuming download
E/GeckoConsole(  109): AUS:SVC Downloader:downloadUpdate - downloading from http://update.boot2gecko.org/nightly/b2g_update_2012-11-19_075049.mar?build_id=20121119070826&version=18.0a2 to /data/local/updates/0/update.mar
E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - offline, register online observer: true
E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - setting state to: downloading
E/GeckoConsole(  109): AUS:SVC Downloader:onStopRequest - Registering online observer
E/GeckoConsole(  109): AUS:SVC UpdateService:_registerOnlineObserver - waiting for the network to be online, then forcing another check

[other startup stuff]

I/IdleService(  109): ReconfigureTimer: no idle or waiting observers
I/IdleService(  109): Register idle observer 47c44320 for 1 seconds
I/IdleService(  109): Register: adjusting next switch from -1 to 1 seconds
I/IdleService(  109): next timeout 1353393017138020 usec (985 msec from now)
I/IdleService(  109): SetTimerExpiryIfBefore: next timeout 1353393017138020 usec
I/IdleService(  109): reset timer expiry from 5648360249227853 usec to 1353393017148020 usec
I/IdleService(  109): Get idle time: time since reset 15 msec
I/Gecko   (  109): UpdatePrompt: Timed out waiting for result, restarting
I/Gecko   (  109): UpdatePrompt: Update downloaded, restarting to apply it
I/IdleService(  109): Remove idle observer 47c44320 (1 seconds)
I/Gonk    (  109):   no window to draw, bailing

[^^^ again, what in the bloody hell?!?  Apparently off an idle notification this time.]

[b2g process restarts, another comes up]
[and again, ...]
STR
 (1) Set up a gecko build so that an update is available for it.  (I have a patch I can share.)
 (2) Enable wifi
 (3) Choose to download update
 (4) In the middle of the download, disable wifi in system tray
 (5) Reboot

After rebooting, b2g's UpdatePrompt.js somehow gets the idea that there's an update that's been fully downloaded.  It then sets the apply-update timer and when that expires, we get into the restart loop from

I/Gecko   (  662): UpdatePrompt: Timed out waiting for result, restarting
I/Gecko   (  662): UpdatePrompt: Update downloaded, restarting to apply it
What seems to be happening is

I/GeckoDump(  471): XXX FIXME : Got a mozContentEvent: force-update-check
I/Gecko   (  471): UpdatePrompt: Forcing update check
I/Gecko   (  471): UpdatePrompt: Setting gecko.updateStatus: check-complete

What code is sending that?  Why?  I didn't initiate that ...

Then we get to

  forceUpdateCheck: function UP_forceUpdateCheck() {
    log("Forcing update check");

    // If we already have an active update available, don't try to
    // download again, just prompt for install.
    if (Services.um.activeUpdate) {
      this.setUpdateStatus("check-complete");
      this.showApplyPrompt(Services.um.activeUpdate);
      return;
    }

Whups, that doesn't look right.  We're not checking the state of the activeUpdate, which in this case is "downloading".
Reading back over bug 801987, it appears that this code was trying to work around the case where an "Download update?" prompt was shown to the user, but the user said "Later" and then force-checked.  Apparently UpdateService wouldn't notify again about the update it had already notified.

In the current UX, it's not possible to say "Later" to a download-update prompt.  This code looks all kinds of wrong besides that.  So AFAICT we can just remove it.

This code has gotten vastly more complex since the last time I was in here, so please look over this carefully.  I may be missing something.

Confirmed that with this patch the restart loop goes away and going online successfully resumes the download.  However, the download status isn't shown in the system tray after resuming the download.  But that's a separate bug ;).
Assignee: nobody → jones.chris.g
Attachment #683504 - Flags: review?(marshall)
Attachment #683504 - Flags: feedback?(fabrice)
Summary: Updater can get into infinite restart loop trying to apply partially-downloaded update (?) → Updater can get into infinite restart loop trying to apply partially-downloaded update
Comment on attachment 683504 [details] [diff] [review]
Update-available notifications don't get lost anymore, so the code from bug 801987 is no longer needed

Review of attachment 683504 [details] [diff] [review]:
-----------------------------------------------------------------

Wow, what a mess. Thanks for debugging this Chris. It isn't clear to me what code is now triggering a force check on B2G startup -- maybe the new FTE?

One quick note: We should probably make sure that if the update is downloaded, but the user hit later on the "apply" prompt, that we short-circuit back to the "apply prompt" when the force check button is clicked. That is technically different than what this code was doing, so I'd be fine filing a follow up for that.
Attachment #683504 - Flags: review?(marshall) → review+
(In reply to Marshall Culpepper [:marshall_law] from comment #5)
> Comment on attachment 683504 [details] [diff] [review]
> Update-available notifications don't get lost anymore, so the code from bug
> 801987 is no longer needed
> 
> Review of attachment 683504 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Wow, what a mess. Thanks for debugging this Chris. It isn't clear to me what
> code is now triggering a force check on B2G startup -- maybe the new FTE?

Yep...

> 
> One quick note: We should probably make sure that if the update is
> downloaded, but the user hit later on the "apply" prompt, that we
> short-circuit back to the "apply prompt" when the force check button is
> clicked. That is technically different than what this code was doing, so I'd
> be fine filing a follow up for that.

Please CC me on this bug, I may need to do a quick gaia change for this to work properly.
Duplicate of this bug: 813627
(In reply to Marshall Culpepper [:marshall_law] from comment #5)
> Wow, what a mess. Thanks for debugging this Chris. It isn't clear to me what
> code is now triggering a force check on B2G startup -- maybe the new FTE?
> 

That is definitely a bug.

> One quick note: We should probably make sure that if the update is
> downloaded, but the user hit later on the "apply" prompt, that we
> short-circuit back to the "apply prompt" when the force check button is
> clicked. That is technically different than what this code was doing, so I'd
> be fine filing a follow up for that.

s/downloaded/applied/ ?  That would make sense if we also don't get duplicate notifications on force-check.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> (In reply to Marshall Culpepper [:marshall_law] from comment #5)
> > Wow, what a mess. Thanks for debugging this Chris. It isn't clear to me what
> > code is now triggering a force check on B2G startup -- maybe the new FTE?
> > 
> 
> That is definitely a bug.

Then we should probably back out bug 811244.
Or make it not run on every startup, which is what seems to be happening now.  We were testing here way after first-run.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #10)
> Or make it not run on every startup, which is what seems to be happening
> now.  We were testing here way after first-run.

Oh nervermind then, it's not.
Attachment #683504 - Flags: feedback?(fabrice)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> (In reply to Marshall Culpepper [:marshall_law] from comment #5)
> > One quick note: We should probably make sure that if the update is
> > downloaded, but the user hit later on the "apply" prompt, that we
> > short-circuit back to the "apply prompt" when the force check button is
> > clicked. That is technically different than what this code was doing, so I'd
> > be fine filing a follow up for that.
> 
> s/downloaded/applied/ ?  That would make sense if we also don't get
> duplicate notifications on force-check.

I double checked and if I reboot while a downloaded update is being applied, there doesn't seem to be a way to trigger a request to re-apply after the reboot.
Marking for C2, given this meets the criteria of known P1/P2 blocking-basecamp+ bugs at the end of C1.
Target Milestone: --- → B2G C2 (20nov-10dec)
https://hg.mozilla.org/mozilla-central/rev/fa74a5dfd0ca
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.