Closed Bug 791836 Opened 13 years ago Closed 13 years ago

[OTA update] Updates can just stall without uninterrupted reasons

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:+)

RESOLVED DUPLICATE of bug 791829
B2G C2 (20nov-10dec)
blocking-basecamp +

People

(Reporter: tchung, Assigned: marshall)

References

Details

(Whiteboard: [ota update])

Attachments

(2 files)

This bug precludes bug 791829. Updates can just stall and never recover. This bug is to track why the updates are stalling in the first place.

Steps to reproduce:
1. Load the 2012-09-16 EU build onto an otoro phone.
2. Force an OTA update using https://gist.github.com/7acb26790bf3dd0bd71e
3. Wait ~2 minutes and the phone will begin downloading the update.
4. Verify: you may be on a slow network, or hit a screen timeout, etc., but even though I'm connected to wifi, the updater will just stop and AUS:SVC pings no longer happen.

** Note: I did this two times. The first time it stalled around the ~45% status marker; the second, around the ~20% status marker.

Expected: Updater should never stall for any uninterrupted reason.
Actual: The updater will happily be updating, AUS:SVC is logging correctly, but then it just stalls.
blocking-basecamp: --- → ?
blocking-basecamp: ? → +
Assignee: nobody → marshall
How long are you waiting in step (4)?

This could be us going into low-power sleep and throttling/turning off wifi. Would you rather the update stall, or come back and find a dead battery? :/ That's what this is going to reduce to in general.

With throttled downloads, we'll schedule timers to download chunks of data. As long as the chunks are small enough, it makes sense to hold a wifi lock while they download. But for 10s of MB downloads, on a connection with unknown bandwidth, holding a wifi lock is really going to hurt.

This merits more investigation, though.
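The trade-off described above (hold a wifi lock only while a small chunk downloads, then release it until the next timer fires) can be sketched roughly as follows. This is an illustration only, not the updater's actual code: `plan_chunks`, `FakeWifiLock`, `download_in_chunks`, and the `fetch_range` callback are all hypothetical names, and the lock is a stand-in that merely counts how often it was held.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple


def plan_chunks(total_bytes: int, chunk_bytes: int) -> List[Tuple[int, int]]:
    """Split a download into inclusive (start, end) byte ranges."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes > 0")
    return [
        (start, min(start + chunk_bytes, total_bytes) - 1)
        for start in range(0, total_bytes, chunk_bytes)
    ]


class FakeWifiLock:
    """Hypothetical stand-in for a platform wifi lock; it only records holds."""

    def __init__(self) -> None:
        self.held = False
        self.acquisitions = 0

    @contextmanager
    def hold(self) -> Iterator[None]:
        self.held = True
        self.acquisitions += 1
        try:
            yield
        finally:
            # Always release, even if the chunk fetch raises.
            self.held = False


def download_in_chunks(
    total_bytes: int,
    chunk_bytes: int,
    lock: FakeWifiLock,
    fetch_range: Callable[[int, int], int],
) -> int:
    """Fetch each range while holding the lock; return total bytes received.

    A real scheduler would wait for a timer between chunks; that wait is
    omitted here so the sketch stays deterministic.
    """
    received = 0
    for start, end in plan_chunks(total_bytes, chunk_bytes):
        with lock.hold():
            received += fetch_range(start, end)
    return received
```

With a small `chunk_bytes` the lock is held briefly and often; with a large one it is held rarely but for an unknown length of time on a slow connection, which is the cost the comment is worried about.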
(In reply to Chris Jones [:cjones] [:warhammer] from comment #1)
> How long are you waiting in step (4)?

The first time, I waited 15 minutes before trying again. The second time, an hour? (I went to a meeting and came back.) Both times, I had to force stop because there was no automatic recovery at all.

> This could be us going into low-power sleep and throttling/turning off wifi.
> Would you rather the update stall, or come back and find a dead battery? :/
> That's what this is going to reduce to in general.
>
> With throttled downloads, we'll schedule timers to download chunks of data.
> As long as the chunks are small enough, it makes sense to hold a wifi lock
> while they download. But for 10s of MB downloads, on a connection with
> unknown bandwidth, holding a wifi lock is really going to hurt.
>
> This merits more investigation, though.
Wait, I'm not sure I understand what's stalling here. I thought it was the update download itself. Is it the apply process? Are things stalling after you press the "Restart" button to relaunch b2g with the updated bits?
Also, what does "force stop" mean in comment 2?
Depends on: 799482
Attached file logcat
(In reply to Chris Jones [:cjones] [:warhammer] from comment #3)
> Wait, I'm not sure I understand what's stalling here. I thought it was the
> update download itself. Is it the apply process? Are things stalling after
> you press the "Restart" button to relaunch b2g with the updated bits?

Sorry for the late response, but I needed to find a way to easily reproduce this. I've found that the Mozilla guest SSID in MV is flaky wifi galore, and updating against it reproduces the stall.

Attached is a logcat. Things to look for:
* The last successful AUS progress is at:
10-10 14:10:31.833: E/GeckoConsole(4048): AUS:SVC Downloader:onProgress - progress: 26002331/42924805
* Following that log line, there were 10+ minutes of idleness, with no AUS message saying it failed or anything else.
* My otoro notification has the progress meter stuck, and it won't progress (as expected).
** Screenshot of the stuck progress meter: http://i.imgur.com/i5MrZ.png

I've been able to reproduce this twice today. This is on a 10-09-2012 otoro daily build.
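For anyone digging through a similar logcat, a small script can pull out the last `AUS:SVC Downloader:onProgress` entry so the stall point is easy to spot. This is only a sketch: the regular expression is based on the single line format quoted in this comment, `parse_progress` and `last_progress` are hypothetical helper names, and the year is an assumption because logcat timestamps do not carry one.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Matches lines such as:
# 10-10 14:10:31.833: E/GeckoConsole(4048): AUS:SVC Downloader:onProgress - progress: 26002331/42924805
PROGRESS_RE = re.compile(
    r"(\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*"
    r"AUS:SVC Downloader:onProgress - progress: (\d+)/(\d+)"
)

Progress = Tuple[datetime, int, int]


def parse_progress(line: str, year: int = 2012) -> Optional[Progress]:
    """Return (timestamp, bytes_done, bytes_total), or None for other lines.

    The year is an assumption supplied by the caller; logcat omits it.
    """
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    stamp = datetime.strptime(
        f"{year}-{match.group(1)}", "%Y-%m-%d %H:%M:%S.%f"
    )
    return stamp, int(match.group(2)), int(match.group(3))


def last_progress(lines: Iterable[str], year: int = 2012) -> Optional[Progress]:
    """Return the final onProgress entry in the log, if any."""
    last = None
    for line in lines:
        parsed = parse_progress(line, year)
        if parsed is not None:
            last = parsed
    return last
```

Dividing the two byte counts from the returned tuple gives the fraction of the update that had been downloaded when progress stopped.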
Hrm, the log is interesting. It looks like after the last onProgress message, the wifi connection is reconnecting twice:

10-10 14:10:31.833: E/GeckoConsole(4048): AUS:SVC Downloader:onProgress - progress: 26002331/42924805
...
10-10 14:15:02.237: I/wpa_supplicant(4113): wlan0: Associated with 00:1a:1e:66:2c:62
10-10 14:15:02.237: I/wpa_supplicant(4113): wlan0: CTRL-EVENT-CONNECTED - Connection to 00:1a:1e:66:2c:62 completed (reauth) [id=0 id_str=]
10-10 14:15:04.710: I/wpa_supplicant(4113): wlan0: Associated with 00:1a:1e:15:3b:02
10-10 14:15:04.710: I/wpa_supplicant(4113): wlan0: CTRL-EVENT-CONNECTED - Connection to 00:1a:1e:15:3b:02 completed (reauth) [id=0 id_str=]

I'm wondering if a new socket would need to be established at this point. It's really odd that the XHR used by AUS doesn't fire a timeout error here, though.
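Since no timeout error fires here, one generic way to notice this kind of silent stall is a watchdog that tracks when the byte count last advanced. The sketch below is not how AUS or its XHR works; `StallWatchdog` is a hypothetical class, and the caller is assumed to pass in the current time (in seconds) explicitly so the logic stays easy to test.

```python
class StallWatchdog:
    """Flags a download as stalled when no new bytes arrive within timeout_s."""

    def __init__(self, timeout_s: float, now: float) -> None:
        if timeout_s <= 0:
            raise ValueError("timeout_s must be positive")
        self.timeout_s = timeout_s
        self._last_bytes = 0
        self._last_change = now

    def on_progress(self, bytes_received: int, now: float) -> None:
        """Record a progress callback; only a growing byte count resets the clock."""
        if bytes_received > self._last_bytes:
            self._last_bytes = bytes_received
            self._last_change = now

    def is_stalled(self, now: float) -> bool:
        """True once timeout_s seconds have passed without new bytes."""
        return now - self._last_change >= self.timeout_s
```

A caller could poll `is_stalled` from a timer and, when it returns true, abort the request and open a new connection, which is roughly what the manual "force an update check" workaround in this bug does by hand.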
Bumping up the priority and severity of this until we can rule it out as a dogfooding blocker. Same as with bug 799482, does being able to force updates (bug 798948) allow us to get past this?
Severity: normal → critical
Priority: -- → P1
Whiteboard: [ota update] → [ota update][dogfooding-blocker]
What is the status here? This is a critical dogfooding blocker. Please post an ETA to fix this.
tchung, we need to figure out whether this is a dogfooding blocker. It's a blocker if the stalled downloads get the device into an unrecoverable state. In comment 5, what happens to the device after the stalled download progress? Does using the force-update resume the download?
(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> tchung, we need to figure out whether this is a dogfooding blocker. It's a
> blocker if the stalled downloads get the device into an unrecoverable state.
>
> In comment 5, what happens to the device after the stalled download
> progress? Does using the force-update resume the download?

I wasn't able to reproduce the self-wifi interruption this time. Instead, I killed wpa_supplicant, saw the download fail, then re-enabled wifi and force-checked for updates. Updates started up again as a full mar download.

Logcat: https://gist.github.com/ca1e7786ff0f3597c97c
I was able to interrupt my wifi connection, but whenever I did it resulted in socket timeouts. Force-checking for an update worked every time when the socket was interrupted. I would not hold dogfooding for this. The workaround is to interrupt network connection, reboot, or wait for b2g to crash, then force an update check.
Keywords: steps-wanted
Whiteboard: [ota update][dogfooding-blocker] → [ota update]
The updates in themselves are really wonky on the otoro. I tried updating and I've seen the bar at full and the indicator showing it's still downloading, I've retried and seen it at 0 ... it seems all over the place. It's hard to tell when it's done or not due to the slow transition in the animation or what it's really doing. Is there any other refined method/logging that we can have other than what's already in logcat?
QA Contact: nhirata.bugzilla
Attached file logcat of new str
## Environment :
Unagi phone, build 20121031073004 2012-10-31 10:23:10

## Repro :
1. pull down notification
2. select update notification
3. select update to update
4. let device time out

Expected:
1) the screen timeout will not affect the download of the update, or
2) the update will continue after the device is woken

Actual:
download stops and never continues
Marking for C2, given this meets the criteria of known P1/P2 blocking-basecamp+ bugs at the end of C1.
Target Milestone: --- → B2G C2 (20nov-10dec)
Is this still a problem? There's been no activity here for nearly a month.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE