Closed Bug 791836 Opened 13 years ago Closed 13 years ago

[OTA update] Updates can just stall without uninterrupted reasons

Tracking

(blocking-basecamp:+)

Status:

RESOLVED DUPLICATE of bug 791829

Milestone:

B2G C2 (20nov-10dec)

Project Flags:

blocking-basecamp

People

(Reporter: tchung, Assigned: marshall)

References

Details

(Whiteboard: [ota update])

Attachments

(2 files)

logcat 13 years ago Tony Chung [:tchung] 569.79 KB, text/plain		Details
logcat of new str 13 years ago Naoki Hirata :nhirata (please use needinfo instead of cc) 380.55 KB, text/plain		Details

Tony Chung [:tchung]

Reporter

Description

•

13 years ago

This bug precludes bug 791829. Updates can just stall and never recover. This bug tracks why updates stall in the first place.

Steps to reproduce:
1. Load the 2012-09-16 EU build onto an otoro phone.
2. Force an OTA update using https://gist.github.com/7acb26790bf3dd0bd71e
3. Wait ~2 minutes and the phone will begin downloading the update.
4. Verify: you may be on a slow network, or hit a screen timeout, etc., but even though I'm connected to Wifi, the updater will just stop and AUS:SVC pings no longer happen.

** Note: I did this two times. The first time it stalled around the ~45% status marker; the second, around ~20%.

Expected: Updater should never stall for any uninterrupted reason.
Actual: Updater will happily be updating, AUS:SVC is logging correctly, but then it just stalls.

Marshall Culpepper [:marshall_law]

Assignee

Updated

•

13 years ago

blocking-basecamp: --- → ?

Alex Keybl [:akeybl]

Updated

•

13 years ago

blocking-basecamp: ? → +

Alex Keybl [:akeybl]

Updated

•

13 years ago

Assignee: nobody → marshall

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 1

•

13 years ago

How long are you waiting in step (4)?

This could be us going into low-power sleep and throttling/turning off wifi. Would you rather the update stall, or come back and find a dead battery? :/ That's what this is going to reduce to in general.

With throttled downloads, we'll schedule timers to download chunks of data. As long as the chunks are small enough, it makes sense to hold a wifi lock while they download. But for 10s of MB downloads, on a connection with unknown bandwidth, holding a wifi lock is really going to hurt.

This merits more investigation, though.
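The chunked approach described in this comment could be sketched roughly as follows. This is only an illustration, not the B2G updater's code: `FakeWifiLock`, `download_in_chunks`, `fetch_range`, and `CHUNK_SIZE` are hypothetical names, and the lock is a stand-in for whatever wifi/wake lock the platform provides. The point is that the lock is held only while one small chunk is in flight, so the device can sleep between chunks.

```python
from contextlib import contextmanager

CHUNK_SIZE = 256 * 1024  # hypothetical chunk size in bytes


class FakeWifiLock:
    """Stand-in for a platform wifi lock; records how often it is taken."""

    def __init__(self):
        self.held = False
        self.acquisitions = 0

    @contextmanager
    def hold(self):
        self.held = True
        self.acquisitions += 1
        try:
            yield
        finally:
            # Always release, even if the chunk fetch raises.
            self.held = False


def download_in_chunks(fetch_range, total_size, lock, chunk_size=CHUNK_SIZE):
    """Fetch total_size bytes chunk by chunk, locking only per chunk.

    fetch_range(start, end) is a hypothetical callable that returns the
    bytes in the half-open range [start, end).
    """
    received = bytearray()
    offset = 0
    while offset < total_size:
        end = min(offset + chunk_size, total_size)
        with lock.hold():
            received += fetch_range(offset, end)
        offset = end
        # A real updater would return to the event loop here and let a
        # timer schedule the next chunk; the device may sleep in between.
    return bytes(received)
```

With a bounded chunk size, the time the lock is held per chunk is bounded by chunk size divided by bandwidth, which is the property the comment is after for connections of unknown speed.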

Tony Chung [:tchung]

Reporter

Comment 2

•

13 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #1)
> How long are you waiting in step (4)?

the first time, i waited 15 minutes before trying again. the second time, an hour? (i went to a meeting and came back.) Both times, i had to force stop because there was no automatic recovery at all.

> This could be us going into low-power sleep and throttling/turning off wifi.
> Would you rather the update stall, or come back and find a dead battery? :/
> That's what this is going to reduce to in general.
>
> With throttled downloads, we'll schedule timers to download chunks of data.
> As long as the chunks are small enough, it makes sense to hold a wifi lock
> while they download. But for 10s of MB downloads, on a connection with
> unknown bandwidth, holding a wifi lock is really going to hurt.
>
> This merits more investigation, though.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 3

•

13 years ago

Wait, I'm not sure I understand what's stalling here. I thought it was the update download itself. Is it the apply process? Are things stalling after you press the "Restart" button to relaunch b2g with the updated bits?

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 4

•

13 years ago

Also, what does "force stop" mean in comment 2?

Tony Chung [:tchung]

Reporter

Updated

•

13 years ago

Depends on: 799482

Tony Chung [:tchung]

Reporter

Comment 5

•

13 years ago

Attached file logcat — Details

(In reply to Chris Jones [:cjones] [:warhammer] from comment #3)
> Wait, I'm not sure I understand what's stalling here. I thought it was the
> update download itself. Is it the apply process? Are things stalling after
> you press the "Restart" button to relaunch b2g with the updated bits?

Sorry for the late response, but I needed to find a way to easily reproduce this. I've found that updating against the Mozilla guest SSID in MV is flaky wifi galore. Attached is a logcat. Things to look for:

* The last successful AUS progress is at:
10-10 14:10:31.833: E/GeckoConsole(4048): AUS:SVC Downloader:onProgress - progress: 26002331/42924805
* Following that log, there were 10+ minutes of idleness, with no AUS message saying it failed or anything else.
* My otoro notification has the progress meter stuck, and it won't progress (as expected).
** Screenshot of the stuck progress meter: http://i.imgur.com/i5MrZ.png

I've been able to reproduce this twice today. This is on a 10-09-2012 otoro daily build.
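For anyone checking their own logcat for the same pattern, a small script along these lines might help. It is a sketch under assumptions: it relies only on the timestamp layout and the `AUS:SVC Downloader:onProgress` line format quoted in this comment, `stall_summary` and `parse_stamp` are made-up helper names, and the year is assumed because logcat stamps do not carry one.

```python
import re
from datetime import datetime

ASSUMED_YEAR = 2012  # assumption: logcat timestamps omit the year

STAMP_RE = re.compile(r"^(\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
PROGRESS_RE = re.compile(
    r"AUS:SVC Downloader:onProgress - progress: (\d+)/(\d+)"
)


def parse_stamp(line):
    """Return the line's leading timestamp as a datetime, or None."""
    m = STAMP_RE.match(line)
    if not m:
        return None
    return datetime.strptime(
        f"{ASSUMED_YEAR}-{m.group(1)}", "%Y-%m-%d %H:%M:%S.%f"
    )


def stall_summary(lines):
    """Summarize the last onProgress line and the idle time logged after it.

    Returns (percent_done, idle_seconds), or None if the log contains no
    timestamped onProgress line. idle_seconds is measured up to the last
    timestamped line of any kind.
    """
    last_progress = None  # (timestamp, bytes_done, bytes_total)
    last_stamp = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        last_stamp = stamp
        m = PROGRESS_RE.search(line)
        if m:
            last_progress = (stamp, int(m.group(1)), int(m.group(2)))
    if last_progress is None:
        return None
    stamp, done, total = last_progress
    percent = 100.0 * done / total if total else 0.0
    idle = (last_stamp - stamp).total_seconds()
    return percent, idle
```

For the progress line quoted above (26002331/42924805), the percentage works out to roughly 60.6%.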

Marshall Culpepper [:marshall_law]

Assignee

Comment 6

•

13 years ago

Hrm, the log is interesting. It looks like after the last onProgress message, the wifi connection is reconnecting twice:

10-10 14:10:31.833: E/GeckoConsole(4048): AUS:SVC Downloader:onProgress - progress: 26002331/42924805
...
10-10 14:15:02.237: I/wpa_supplicant(4113): wlan0: Associated with 00:1a:1e:66:2c:62
10-10 14:15:02.237: I/wpa_supplicant(4113): wlan0: CTRL-EVENT-CONNECTED - Connection to 00:1a:1e:66:2c:62 completed (reauth) [id=0 id_str=]
10-10 14:15:04.710: I/wpa_supplicant(4113): wlan0: Associated with 00:1a:1e:15:3b:02
10-10 14:15:04.710: I/wpa_supplicant(4113): wlan0: CTRL-EVENT-CONNECTED - Connection to 00:1a:1e:15:3b:02 completed (reauth) [id=0 id_str=]

I'm wondering if a new socket would need to be established at this point. It's really odd that the XHR used by AUS doesn't fire a timeout error here, though..
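Since no timeout fires on its own here, one possible mitigation is a progress watchdog that declares the download stalled when the byte count stops changing for too long, so the caller can abort and reopen the connection. The sketch below only illustrates that idea and is not how AUS works: `StallWatchdog`, `on_progress`, and `is_stalled` are hypothetical names, and the timeout value is left to the caller.

```python
import time


class StallWatchdog:
    """Track download progress and report when it stops advancing."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested
        self.last_bytes = -1
        self.last_change = clock()

    def on_progress(self, bytes_done):
        """Record a progress callback; only a changed count resets the timer."""
        if bytes_done != self.last_bytes:
            self.last_bytes = bytes_done
            self.last_change = self.clock()

    def is_stalled(self):
        """True once no new bytes have arrived for timeout_s seconds."""
        return self.clock() - self.last_change >= self.timeout_s
```

A caller would poll `is_stalled()` from a timer and, when it returns true, cancel the request and start a new one. Whether the new request can resume a partial download is a separate question that this sketch does not address.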

Lukas Blakk [:lsblakk] use ?needinfo

Comment 7

•

13 years ago

Bumping up the priority and severity of this until we can rule it out as a dogfooding blocker. Same as with bug 799482, does being able to force updates (bug 798948) allow us to get past this?

Severity: normal → critical

Priority: -- → P1

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

13 years ago

Whiteboard: [ota update] → [ota update][dogfooding-blocker]

Andreas Gal :gal

Comment 8

•

13 years ago

What is the status here? This is a critical dogfooding blocker. Please post an ETA to fix this.

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

13 years ago

Keywords: qawanted

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 9

•

13 years ago

tchung, we need to figure out whether this is a dogfooding blocker. It's a blocker if the stalled downloads get the device into an unrecoverable state. In comment 5, what happens to the device after the stalled download progress? Does using the force-update resume the download?

Tony Chung [:tchung]

Reporter

Comment 10

•

13 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> tchung, we need to figure out whether this is a dogfooding blocker. It's a
> blocker if the stalled downloads get the device into an unrecoverable state.
>
> In comment 5, what happens to the device after the stalled download
> progress? Does using the force-update resume the download?

I wasn't able to reproduce the self-wifi interruption this time. Instead, I killed wpa_supplicant, saw the download fail, then re-enabled wifi and force-checked for updates. Updates started up again as a full mar download.

Logcat: https://gist.github.com/ca1e7786ff0f3597c97c

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 11

•

13 years ago

I was able to interrupt my wifi connection, but whenever I did it resulted in socket timeouts. Force-checking for an update worked every time when the socket was interrupted. I would not hold dogfooding for this. The workaround is to interrupt network connection, reboot, or wait for b2g to crash, then force an update check.

Keywords: steps-wanted

Whiteboard: [ota update][dogfooding-blocker] → [ota update]

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 12

•

13 years ago

The updates in themselves are really wonky on the otoro. I tried updating and I've seen the bar at full and the indicator showing it's still downloading, I've retried and seen it at 0 ... it seems all over the place. It's hard to tell when it's done or not due to the slow transition in the animation or what it's really doing. Is there any other refined method/logging that we can have other than what's already in logcat?

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 13

•

13 years ago

er. I meant unagi not otoro in comment 12.

Alex Keybl [:akeybl]

Updated

•

13 years ago

QA Contact: nhirata.bugzilla

Naoki Hirata :nhirata (please use needinfo instead of cc)

Comment 14

•

13 years ago

Attached file logcat of new str — Details

## Environment :
Unagi phone, build 20121031073004 2012-10-31 10:23:10

## Repro :
1. pull down notification
2. select update notification
3. select update to update
4. let device time out

Expected:
1) the screen timeout will not affect the download of the update or
2) the update will continue after the device is woken

Actual:
download stops and never continues

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

•

13 years ago

Keywords: qawanted, steps-wanted

Alex Keybl [:akeybl]

Comment 15

•

13 years ago

Marking for C2, given this meets the criteria of known P1/P2 blocking-basecamp+ bugs at the end of C1.

Target Milestone: --- → B2G C2 (20nov-10dec)

Dietrich Ayala (:dietrich)

Comment 16

•

13 years ago

Is this still a problem? There's been no activity here for nearly a month.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

13 years ago

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → DUPLICATE
