[OTA] Unable to reuse Update object to trigger OTA download again after severe network error in previous download

RESOLVED FIXED in Firefox 25

Status

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: whimboo, Assigned: schien)

Tracking

({regression})

unspecified
1.1 QE5
ARM
Gonk (Firefox OS)

Firefox Tracking Flags

(blocking-b2g:leo+, firefox23 wontfix, firefox24 wontfix, firefox25 fixed, b2g18 verified, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 wontfix, b2g-v1.1hd fixed)

Details

(Whiteboard: [apps watch list])

Attachments

(2 attachments, 2 obsolete attachments)

+++ This bug was initially created as a clone of Bug #880737 +++

The issue as reported on bug 880737 is still not fixed for me, and I can consistently reproduce it on one of my local wifi networks with:

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/39607fd11f6b
Gaia   d336288e6cda1a8f974ceea41b9d7860c2367d1f
BuildID 20130706230209
Version 18.0

When you download an update and the network fails so that you get disconnected, the download stops as expected, but you are not able to continue it. You have to restart the phone to make it work again.
Adding qawanted, as Jason is working on reliably reproducing the issue on a Leo device before we block on it.
Keywords: qawanted
As filed, I can't reproduce the bug indicated here on Mozilla Guest. I can reproduce an issue where, if you have an unstable or no network during the start of the download or during the download, the download hangs and doesn't get gracefully killed. But if the network becomes reliable when the download starts, then the OTA update will download resources. A restart of the phone is not required here, only a reliable connection.
Keywords: qawanted
QA Contact: jsmith
I did file bug 891979 for the issue I found in comment 2, though.
Posted file adb log (obsolete)
This is the logcat output from adb, which shows that the download stalls at around 1.7MB and does not continue. After a while it gets stopped. Retriggering the download does nothing. As you can see, there is no Gecko updater output at all.
If you still have the unstable network, then this is expected behavior. See https://bugzilla.mozilla.org/show_bug.cgi?id=891979#c1 that explains this in detail.

The use case that would be a bug is if these STR:

1. Start downloading an OTA update
2. Enter a bad/no network state
3. Stop the download
4. Enter a good network state
5. Start downloading an OTA update again

cause the download to stall at 0.00 bytes. If you try doing step #5 on a bad network, you will eventually get the same behavior as what happens after #2, in alignment with https://bugzilla.mozilla.org/show_bug.cgi?id=891979#c1, which is expected behavior. Testing done in comment 2 reveals that this use case actually works as expected, so I do not understand what this bug is about. Are you reproducing this bug with bad networks in both cases? Only in the first case?
It doesn't matter in which network I am after step 2, the download will never start when I manually stop and restart it. No process is shown in the adb log and in the notification area it stays at 0.00 bytes.
(In reply to Henrik Skupin (:whimboo) from comment #6)
> It doesn't matter in which network I am after step 2, the download will
> never start when I manually stop and restart it. No process is shown in the
> adb log and in the notification area it stays at 0.00 bytes.

I can't reproduce that behavior. That works fine for me on a 7/10 build.
Looking at the logs, it appears that the network starts to fail, and I see the watchdog timer go off.

I/Gecko   (  107): UpdatePrompt: Download watchdog fired

A bit later, it retries:

I/Gecko   (  107): UpdatePrompt: Download - restarting download - attempt 1

Then a bit later it gets a more severe failure:

I/Gecko   (  107): *** AUS:SVC Downloader:onStopRequest - status: 2152398878, current fail: 0, max fail: 20, retryTimeout: 30000

I/Gecko   (  107): *** AUS:SVC getStatusTextFromCode - transfer error: Update server not found (check your internet connection), code: 2152398878

and then the user should have been notified of a failure (this only flashes up for a couple of seconds):

I/Gecko   (  107): UpdatePrompt: Update error, state: download-failed, errorCode: 0
I/Gecko   (  107): UpdatePrompt: Setting gecko.updateStatus: Update server not found (check your internet connection)

Now things get slightly confusing, because I see another download started sometime after this (unfortunately, the logcat doesn't have timestamps; tip: use adb logcat -v threadtime).

I suspect that this additional download may be the one that's stuck at 0 bytes.
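The raw status value in the onStopRequest log line above is an XPCOM nsresult. A minimal sketch, assuming the standard nsresult layout (bit 31 is the failure flag, bits 16 to 30 hold the module number plus an offset of 0x45, bits 0 to 15 hold the code), decodes 2152398878 as 0x804B001E: a failure in the network module (6) with code 30, which is NS_ERROR_UNKNOWN_HOST. That matches the "Update server not found" status text. The helper name decodeNsresult is hypothetical.

```javascript
// Offset that XPCOM adds to a module number when building an nsresult.
const NS_ERROR_MODULE_BASE_OFFSET = 0x45;
const NS_ERROR_MODULE_NETWORK = 6;
const NS_ERROR_UNKNOWN_HOST = 0x804b001e;

// Hypothetical helper: split an nsresult into its three fields.
function decodeNsresult(status) {
  return {
    isFailure: (status >>> 31) === 1,                                  // bit 31
    module: ((status >>> 16) & 0x7fff) - NS_ERROR_MODULE_BASE_OFFSET,  // bits 16-30
    code: status & 0xffff,                                             // bits 0-15
  };
}

const status = 2152398878; // value from the AUS:SVC onStopRequest log line
const decoded = decodeNsresult(status);

console.log("0x" + status.toString(16).toUpperCase(), decoded);
// 0x804B001E { isFailure: true, module: 6, code: 30 }
console.log(decoded.module === NS_ERROR_MODULE_NETWORK); // true
console.log(status === NS_ERROR_UNKNOWN_HOST);           // true
```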
Posted file adb log
OK, so here is another adb log with timestamps included, filtered by 'Gecko'. Here is what you can see:

1. I restarted the phone so we have fresh log data
2. I started the download of the update
3. The download stalled after about 11MB and didn't continue
4. I think two retries happened, which were also unsuccessful, and finally the download got stopped
5. I tried to re-download the update but it doesn't even start and keeps showing 0.00 bytes
6. I turned off wifi and tried to re-download the update after accepting the traffic warning
7. The download still keeps failing
8. I started Firefox and loaded http://www.google.de, which was working fine
9. I tried again to download the update via the mobile connection but it still failed

I'm not sure what those CTRL-EVENT-BSS-REMOVED messages from the wifi component are, but they were most likely stalling the download. After that, we are no longer able to continue or restart the download until a system reboot.
Attachment #773508 - Attachment is obsolete: true
Here is my STR:
1. change the value "app.update.url.override" in prefs.js to the update.xml on my server. (I use http://people.mozilla.org/~schien/update.xml )
2. reboot the phone and start downloading the update.
3. While downloading the patch, rename the mar file on server. Error prompt will show on the screen.
4. rename the mar file back to its original name.
5. Try triggering download on device.

The key point to reproduce this bug is to trigger a "non-verification failure". See http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#4073
The selected patch is not cleaned up after a non-verification failure; therefore, the Downloader is unable to handle a selected patch in an unknown state. See http://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#l3678
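The failure mode described above can be sketched with a toy model. This is NOT the real nsUpdateService.js code: only the name selectedPatch and the "download-failed" state come from this bug; the Update and Downloader classes, selectPatch(), onNonVerificationFailure(), the clearSelectedPatch flag, and the set of "known" patch states are hypothetical simplifications. The point it illustrates is that a selected patch left behind in a state the Downloader does not expect blocks any retry with the same Update object, while clearing the selection lets the retry start fresh.

```javascript
// Hypothetical, simplified model; not the real Gecko updater code.

// Patch states this toy Downloader knows how to start or resume from (assumption).
const KNOWN_STATES = new Set([null, "downloading"]);

class Update {
  constructor(patches) {
    this.patches = patches;    // e.g. [{ type: "partial", state: null }]
    this.selectedPatch = null; // set when a download begins
  }
}

class Downloader {
  // Returns the patch to download, or null if the update cannot be handled.
  selectPatch(update) {
    const selected = update.selectedPatch;
    if (selected && !KNOWN_STATES.has(selected.state)) {
      return null; // stale selected patch in an unexpected state: nothing happens
    }
    const patch = selected || update.patches[0];
    patch.state = "downloading";
    update.selectedPatch = patch;
    return patch;
  }

  // Severe (non-verification) failure, e.g. the update server was not found.
  onNonVerificationFailure(update, { clearSelectedPatch }) {
    update.selectedPatch.state = "download-failed";
    if (clearSelectedPatch) {
      // Idea of the fix: forget the failed selection so that a manual retry
      // with the same Update object starts from a clean state.
      update.selectedPatch = null;
    }
  }
}

for (const clear of [false, true]) {
  const update = new Update([{ type: "partial", state: null }]);
  const downloader = new Downloader();
  downloader.selectPatch(update); // first attempt
  downloader.onNonVerificationFailure(update, { clearSelectedPatch: clear });
  const retry = downloader.selectPatch(update); // manual retry, same object
  console.log(`clear=${clear}:`, retry ? "download restarts" : "stuck at 0 bytes");
}
// clear=false: stuck at 0 bytes
// clear=true: download restarts
```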
Assignee: nobody → schien
Attachment #773897 - Flags: review?(netzen)
Great observation. Let's hope this is the last issue I have with the OTA updater. Would you mind updating the summary so it better reflects what's broken here? You might find better wording than I would.
Status: NEW → ASSIGNED
Comment on attachment 773897 [details] [diff] [review]
remove selected patch from update request

Review of attachment 773897 [details] [diff] [review]:
-----------------------------------------------------------------

I'd rather have rstrong take this one, because I'm not sure of any side effects.
Attachment #773897 - Flags: review?(netzen) → review?(robert.bugzilla)
Comment on attachment 773897 [details] [diff] [review]
remove selected patch from update request

I should be able to get to this by no later than Tuesday
Blocking, given it's an update issue. Henrik, can you please help verify this once it lands, as you are the only one who was able to reliably reproduce this :)
blocking-b2g: leo? → leo+
(In reply to bhavana bajaj [:bajaj] from comment #16)
> Blocking given its an update issue, Henrik can you please help verify this
> once this lands as you are the only one who was reliably able to reproduce
> this :)

That's not a question. Once it has landed on b2g18, I will test the new behavior in that network environment.
(In reply to Shih-Chiang Chien [:schien] from comment #10)
> Here is my STR:
> 1. change the value "app.update.url.override" in prefs.js to the update.xml
> on my server. (I use http://people.mozilla.org/~schien/update.xml )
> 2. reboot phone and start download the update.
> 3. While downloading the patch, rename the mar file on server. Error prompt
> will show on the screen.
> 4. rename the mar file back to its original name.
> 5. Try triggering download on device.
> 
> The key point to reproduce this bug is to enter “non-verification failure".
> see http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#4073

Can I get updated links to the code from this comment and comment #11?

It seems like it should reach
http://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#l4408

Where this._update is set to null.

Are you able to reproduce this on desktop as well?
The Update object is still held by Gaia and will be passed into nsUpdateService.downloadUpdate() during a manual retry. So setting this._update to null won't reset the state of the Update object.
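Why nulling this._update is not enough can be shown with plain JavaScript reference semantics. In this hypothetical sketch, FakeUpdateService, gaiaHeldUpdate, onSevereNetworkError(), and the selectedPatchState field are illustrative stand-ins, not real Gecko or Gaia identifiers; only downloadUpdate() and _update echo names from this bug. Setting the service's field to null drops one reference, but the object Gaia still holds keeps its stale state and is handed straight back on the next retry.

```javascript
// Hypothetical stand-in for the update service; not the real nsUpdateService.
class FakeUpdateService {
  constructor() {
    this._update = null;
  }
  downloadUpdate(update) {
    this._update = update; // the service keeps its own reference
  }
  onSevereNetworkError() {
    this._update.selectedPatchState = "download-failed";
    this._update = null; // drops only the service's reference
  }
}

const service = new FakeUpdateService();
const gaiaHeldUpdate = { selectedPatchState: null }; // Gaia keeps this object

service.downloadUpdate(gaiaHeldUpdate);
service.onSevereNetworkError();

console.log(service._update);                   // null
console.log(gaiaHeldUpdate.selectedPatchState); // download-failed

// A manual retry passes the same, still-stale object back in.
service.downloadUpdate(gaiaHeldUpdate);
console.log(service._update === gaiaHeldUpdate); // true
```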

Here is the code ref for comment #11:
http://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#l3927

I don't think we can reproduce this issue on desktop, because I don't see us storing the Update object after a severe network error.
Robert, since this issue may be specific to B2G per comment #20, Triage would be willing to take an uplift here if we can get a risk evaluation from you and if Henrik can confirm that the patch fixes the issue for him.
With the info provided in comment #20, I should have time to finish the review and risk evaluation by tomorrow.
Needinfo'ing Robert to help with comment #22 before we make a blocking call on this.
Flags: needinfo?(robert.bugzilla)
Summary: [OTA] If the download of an update is stopped due to an instable network it cannot be continued until the device gets restarted → [OTA] Unable to reuse Update object to trigger OTA download again after severe network error in previous download
Comment on attachment 773897 [details] [diff] [review]
remove selected patch from update request

Regretfully, I spent a bit too much time trying to reproduce this without success. Sorry about that.

I'm OK with this for Gaia, though it would be better to fix this in Gaia itself. We do use the updates.xml file to determine which patch was selected when it failed, so please #ifdef MOZ_WIDGET_GONK and include a comment inside the ifdef stating that B2G holds a reference to the update object, with a reference to this bug.

Bhavana, the patch is fairly safe, though since this is app update, manual testing would be a very good thing.
Attachment #773897 - Flags: review?(robert.bugzilla) → review+
Updated according to the review comment; carrying r+.
Attachment #773897 - Attachment is obsolete: true
Attachment #780743 - Flags: review+
Component: Gaia::System → General
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/1027708ce5df
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Triage - Partners would like to take this given the user impact described in comment 0.
Comment 24 suggests a safe patch.
blocking-b2g: leo? → leo+
That works fine now in the environment I have spotted the problem. Thanks for the fix!