Closed Bug 825598 Opened 7 years ago Closed 7 years ago

[OTA Update] Download stalls and never recovers when the network connection goes down

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:+, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed)

RESOLVED FIXED
B2G C4 (2jan on)
blocking-basecamp +
Tracking Status
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- fixed
b2g18 --- fixed

People

(Reporter: marcia, Assigned: dhylands)

References

Details

(Keywords: unagi, Whiteboard: [lame-network])

Attachments

(6 files, 2 obsolete files)

Seen while updating the stable 12-12 build to the 12-26 build.

STR:
1. Receive the system notification and start downloading the update
2. Leave the phone and let the update finish
3. Come back to black screen.
4. Pull the battery out of the phone - update has now been applied

I have seen this intermittently, and the release-drivers thread indicates Hal and Justin have seen it has well. Some have seen it performing nightly updates.

Hal's email comment:

Marcia,

I couldn't find that bug to add to it - I had a similar experience:
- started ota wifi update from 20121212 version (with no sim card inserted)
- download quickly progressed to 46.65MB (or some such - I have a screen
shot)
- download never finished (waited over 25 min)
- based on prior update issues I tried power cycling (in the past, that
gave the the notification to install)
- no notification presented
- manually checked for update - never returned any status (but buttons
still active)
- version still showed 20121212
- I gave up, and figured I'd have to have it reflashed
- I saw your email & pulled battery
- powering on phone it came up with new firmware of 20121226
Some thoughts about this:

- currently, when we interrupt the system update, it resumes without a user notification when the phone reboots. That explains why the update is applied after a reboot, and then after another reboot. I'm quite sure that pulling the battery doesn't help much and is only a coincidence.

When Bug 819548 will land (hopefully soon) then it won't resume automatically anymore, and manually checking for update should show the notification again (after the usual 30 seconds delay).

- we know the phone nearly freezes (or at least is extremely slow) when the system update is uncompressed. I don't know if there is a bug about this. This is quite long (about 1 minute IIRC).

- I'd like to have an |adb logcat| output before you try power cycling or pulling out the batterie.

Thanks !
i think i've seen updates stall, and when i was running logcat, my update.status showed failed.

anyway, i was told that a recent patch has landed on gaia--post 12/26 beta build.   see bug 825441, which sounds similar, and is requesting dev input.
Etienne, can you reproduce it ? And comment this ?
Flags: needinfo?(etienne)
(In reply to David Scravaglieri [:scravag] from comment #3)
> Etienne, can you reproduce it ? And comment this ?

Basically, what Julien says :)

> we know the phone nearly freezes (or at least is extremely slow) when the system update is uncompressed. I don't know if there is a bug about this. This is quite long (about 1 minute IIRC).

We've addressed part of the issue by displaying a clear "Uncompressing..." message to the user instead of the old UI which pretty much looked like the download stalled, but anything beyond that should be tracked on B2G:General since it's likely lower level.
Component: Gaia::System → General
Flags: needinfo?(etienne)
Just happened to me this morning.
Assignee: nobody → marshall
blocking-basecamp: ? → +
Just finished a system update and it worked well.

However I'm wondering if we should not display the dialog "do you want to install now" before uncompressing.
Attached file logcat of updater
i had to pull battery today to resume update.  

Environment: 
* updating from 12-31 build -> 1-1 nightly build, unagi
* mozilla guest network

Logcat attached, hope its helpful.
Summary: [OTA] Battery sometimes needs to pulled out in order to finish update → [OTA Update] Battery sometimes needs to pulled out in order to finish update
Here's another logcat attached.   the log doesnt say much, but essentially in this snippet of logcat timestamp, my device essentially paused at:

01-02 11:13:43.619: E/GeckoConsole(109): AUS:SVC Downloader:onProgress - progress: 17241188/51529753

restarted phone, and resumed once wifi connected:

01-02 11:14:12.949: E/GeckoConsole(109): AUS:SVC Downloader:onProgress - progress: 17338742/51529753


logcat snippet:
01-02 11:12:04.169: E/GeckoConsole(109): AUS:SVC Downloader:onProgress - progress: 15188773/51529753
01-02 11:12:04.689: I/Gecko(109): *** AUS:SVC Downloader:onProgress - progress: 15240985/51529753
01-02 11:12:04.689: E/GeckoConsole(109): AUS:SVC Downloader:onProgress - progress: 15240985/51529753
01-02 11:12:10.129: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 73, Prog: 31000000, Vers: fc37ad5c, Proc: 00000012
01-02 11:12:10.129: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 73
01-02 11:12:10.129: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 73
01-02 11:12:10.139: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 74, Prog: 31000000, Vers: fc37ad5c, Proc: 00000012
01-02 11:12:10.139: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 74
01-02 11:12:10.139: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 74
01-02 11:12:10.169: I/ONCRPC(116): Setup RPC Call for task 4015a420
01-02 11:12:10.169: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
01-02 11:12:10.169: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: aa, Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
01-02 11:12:10.169: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: aa, Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
01-02 11:12:10.169: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 75, Prog: 31000000, Vers: fc37ad5c, Proc: 00000004
01-02 11:12:10.169: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 75
01-02 11:12:10.169: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 75
01-02 11:12:10.169: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 76, Prog: 31000000, Vers: fc37ad5c, Proc: 0000000f
01-02 11:12:10.169: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 76
01-02 11:12:10.169: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 76
01-02 11:12:11.359: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 78, Prog: 31000003, Vers: cf0d42a3, Proc: 00000003
01-02 11:12:11.359: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 78
01-02 11:12:11.359: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 78
01-02 11:12:12.069: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 79, Prog: 31000000, Vers: fc37ad5c, Proc: 00000012
01-02 11:12:12.069: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 79
01-02 11:12:12.069: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 79
01-02 11:12:12.069: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 7a, Prog: 31000000, Vers: fc37ad5c, Proc: 00000012
01-02 11:12:12.069: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 7a
01-02 11:12:12.069: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 7b, Prog: 31000012, Vers: 00040001, Proc: 00000004
01-02 11:12:12.079: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 7a
01-02 11:12:12.079: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 7b
01-02 11:12:12.079: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 7b
01-02 11:12:12.089: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 7c, Prog: 31000000, Vers: fc37ad5c, Proc: 00000012
01-02 11:12:12.089: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 7c
01-02 11:12:12.089: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 7d, Prog: 31000012, Vers: 00040001, Proc: 00000004
01-02 11:12:12.089: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 7d
01-02 11:12:12.089: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 7c
01-02 11:12:12.089: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 7d
01-02 11:12:12.089: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 7e, Prog: 31000000, Vers: fc37ad5c, Proc: 00000012
01-02 11:12:12.089: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 7e
01-02 11:12:12.089: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 7e
01-02 11:12:12.109: I/ONCRPC(116): Setup RPC Call for task 4015a420
01-02 11:12:12.109: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
01-02 11:12:12.109: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: ab, Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
01-02 11:12:12.109: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 7f, Prog: 31000000, Vers: fc37ad5c, Proc: 00000004
01-02 11:12:12.109: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 7f
01-02 11:12:12.109: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 7f
01-02 11:12:12.109: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: ab, Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
01-02 11:12:12.109: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 80, Prog: 31000000, Vers: fc37ad5c, Proc: 0000000f
01-02 11:12:12.109: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 80
01-02 11:12:12.109: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 80
01-02 11:12:13.109: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.119: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000001f
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: ac, Prog: 30000012, Ver: 00040001, Proc: 0000001f
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: ac, Prog: 30000012, Ver: 00040001, Proc: 0000001f
01-02 11:12:13.119: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.119: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: ad, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: ad, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.119: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.119: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: ae, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: ae, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.119: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.119: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: af, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: af, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.119: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.119: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b0, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b0, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.119: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.119: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b1, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b1, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.119: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.119: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b2, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b2, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.119: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.119: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.119: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b3, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b3, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.129: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.129: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b4, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b4, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.129: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.129: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b5, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b5, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.129: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.129: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b6, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b6, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.129: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.129: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b7, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b7, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:13.129: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.129: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b8, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b8, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:13.129: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:13.129: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: b9, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:13.129: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: b9, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:14.499: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 83, Prog: 31000000, Vers: fc37ad5c, Proc: 00000012
01-02 11:12:14.499: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 83
01-02 11:12:14.499: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 83
01-02 11:12:14.499: I/ONCRPC(116): Setup RPC Call for task 4015a3d8
01-02 11:12:14.499: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 3000000e, Ver: 00090001, Proc: 0000000a
01-02 11:12:14.499: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: ba, Prog: 3000000e, Ver: 00090001, Proc: 0000000a
01-02 11:12:14.499: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: ba, Prog: 3000000e, Ver: 00090001, Proc: 0000000a
01-02 11:12:14.499: I/ONCRPC(116): Setup RPC Call for task 4015a3d8
01-02 11:12:14.499: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 3000000e, Ver: 00090001, Proc: 0000000a
01-02 11:12:14.499: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: bb, Prog: 3000000e, Ver: 00090001, Proc: 0000000a
01-02 11:12:14.499: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: bb, Prog: 3000000e, Ver: 00090001, Proc: 0000000a
01-02 11:12:14.529: I/ONCRPC(116): Setup RPC Call for task 4015a420
01-02 11:12:14.529: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
01-02 11:12:14.529: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: bc, Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
01-02 11:12:14.529: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: bc, Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
01-02 11:12:14.529: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 84, Prog: 31000000, Vers: fc37ad5c, Proc: 00000004
01-02 11:12:14.529: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 84
01-02 11:12:14.529: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 84
01-02 11:12:14.529: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 85, Prog: 31000000, Vers: fc37ad5c, Proc: 0000000f
01-02 11:12:14.529: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 85
01-02 11:12:14.529: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 85
01-02 11:12:15.059: I/ONCRPC(116): rpc_handle_rpc_call: for Xid: 86, Prog: 31000000, Vers: fc37ad5c, Proc: 00000012
01-02 11:12:15.059: I/ONCRPC(116): rpc_handle_rpc_call: Find Status: 0 Xid: 86
01-02 11:12:15.059: I/ONCRPC(116): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 86
01-02 11:12:15.519: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.519: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000001f
01-02 11:12:15.519: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: bd, Prog: 30000012, Ver: 00040001, Proc: 0000001f
01-02 11:12:15.519: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: bd, Prog: 30000012, Ver: 00040001, Proc: 0000001f
01-02 11:12:15.519: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.519: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.519: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: be, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.519: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: be, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.519: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.519: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.519: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: bf, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.519: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: bf, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.519: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.519: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.519: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c0, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.519: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c0, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.519: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.519: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.519: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c1, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c1, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.529: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.529: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c2, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c2, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.529: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.529: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c3, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c3, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.529: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.529: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c4, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c4, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.529: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.529: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c5, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c5, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.529: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.529: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c6, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c6, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.529: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.529: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c7, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c7, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.529: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.529: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.529: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c8, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.539: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c8, Prog: 30000012, Ver: 00040001, Proc: 0000002d
01-02 11:12:15.539: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.539: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.539: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: c9, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.539: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: c9, Prog: 30000012, Ver: 00040001, Proc: 0000002b
01-02 11:12:15.539: I/ONCRPC(116): Setup RPC Call for task 4015a4b0
01-02 11:12:15.539: I/ONCRPC(116): oncrpc_xdr_call_msg_start: Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.539: I/ONCRPC(116): xdr_std_msg_send_call: Sent Xid: ca, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:15.539: I/ONCRPC(116): xdr_std_msg_send_call: Received Reply Xid: ca, Prog: 30000012, Ver: 00040001, Proc: 00000028
01-02 11:12:37.419: E/rmt_storage(132): write_log_file: [MyTag]open: fd=-1
01-02 11:12:37.419: E/rmt_storage(132): block rmt_storage client thread
01-02 11:12:49.829: I/Gecko(109): *** AUS:SVC Downloader:onProgress - progress: 15269839/51529753
01-02 11:12:49.829: E/GeckoConsole(109): AUS:SVC Downloader:onProgress - progress: 15269839/51529753
When looking through some older bugs, I noticed that Lukas filed Bug 821462 a while back, but in her case she was mentioned that the phone was connected to USB and had USB Mass storage on.
Dave, could these problems be fixed by your work on Bug 785124 ?
Flags: needinfo?(dhylands)
(In reply to Julien Wajsberg [:julienw] from comment #1)

> - we know the phone nearly freezes (or at least is extremely slow) when the
> system update is uncompressed. I don't know if there is a bug about this.
> This is quite long (about 1 minute IIRC).

This is Bug 802423
(In reply to Julien Wajsberg [:julienw] from comment #10)
> Dave, could these problems be fixed by your work on Bug 785124 ?

bug 785124 is about putting the update.mar file on the SDCard, so I don't think that will help.
Flags: needinfo?(dhylands)
Target Milestone: --- → B2G C4 (2jan on)
Assignee: marshall → dhylands
Priority: -- → P1
is there any other data that QA can provide here?  i'm reproducing this daily now when testing in MV guest wifi network.
Can you do:

adb shell watch df /data

and see if the Free column goes down to zero (or close to zero) during the update?
Sorry, got the command messed up. It should be:

watch adb shell df /data
And for that matter, just the output of:

adb shell df

on your phone that reproduces the problem at anytime would be useful.
i have lots of free space when the download stalls.

Filesystem             Size   Used   Free   Blksize                                                   7:49
/dev                   203M   100K   203M   4096
Filesystem             Size   Used   Free   Blksize
/dev                    92M   100K    92M   4096
/mnt/asec               92M     0K    92M   4096
/mnt/obb                92M     0K    92M   4096
/system                503M   182M   321M   4096
/data                    2G   139M     2G   4096
/persist                15M     4M    11M   4096
/cache                 295M    15M   279M   4096
/mnt/sdcard              3G     1G     2G   32768
/mnt/secure/asec         3G     1G     2G   32768
MOre information via disk space.  it seems i have plenty available:

Tonys:b2g-distro tchung$ adb shell b2g-ps
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      2535  1     32348  6968  c025cf64 400e87b0 D /system/b2g/b2g
Tonys:b2g-distro tchung$ adb shell b2g-ps
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      110   1     213204 84464 ffffffff 400c2330 S /system/b2g/b2g
Cost Control     app_328   328   110   61256  21056 ffffffff 40025330 S /system/b2g/plugin-container
Homescreen       app_351   351   110   60964  23324 ffffffff 40017330 S /system/b2g/plugin-container
Settings         app_382   382   110   83856  25268 ffffffff 40078330 S /system/b2g/plugin-container
Calendar         app_395   395   110   61160  25232 ffffffff 400ce330 S /system/b2g/plugin-container
(Preallocated a  app_444   444   110   55700  17180 ffffffff 40072330 S /system/b2g/plugin-container
Tonys:b2g-distro tchung$ 

Tonys:b2g-distro tchung$ adb shell cat /proc/meminfo | grep MemFree
MemFree:          178428 kB
Tonys:b2g-distro tchung$
Duplicate of this bug: 807755
Since this seems to be caused by something on the network side, I'm thinking that one solution is to have a timer which goes off and gets reset each time we get a progress report.

If we then fail to get a progress report for some time N, we'll automatically cancel and restart the download.
Hi, Dave, so do we need to have someone works on network component to help?
Hi Khu,

If you're able to reproduce the problem, then it would be useful to figure out the root cause. I'm actually unable to reproduce the problem, so I was going to try a bandaid.
Actually, I did not see the issue in my device either. In this case, maybe we could mark this one as qawanted? 

I just want to see if there has anything I can help from here. :)
I saw this on Faramarz' phone today with a recent nightly update. It looks like the nsUpdateService's downloader "onStopRequest" is never actually being called to properly clean up this update and try starting again.

Also attaching my log here for posterity.
CCing Brian Smith, who might also be able help with triaging this
Summary: [OTA Update] Battery sometimes needs to pulled out in order to finish update → [OTA Update] Download stalls and never recovers when the network connection goes down
I hear from :dhylands that this is happening when wifi cuts out: has that generally been the case for everyone here, or are some of you seeing the update stall even on a good wifi network?  We have some long-standing necko reports of firefox stalling and/or not resuming network loads in the presence of jittery network connectivity (see bug 707744 and generally the 'lame-network' tag).

I think the plan in comment 20 (to cancel and launch a new network load if initial one seems to stall) is a good one for the b2g blocker timeline.  Dave tells me that this will use HTTP range requests, so we won't even be starting the download from scratch.
Whiteboard: [lame-network]
Attached patch WIP - Add watchdog timers (obsolete) — Splinter Review
Code (with additional debug) which implements the watchdog timers.

We need to test the firing of the timer.
STR:

You need to build for the phone with VARIANT=user (in your .userconfig). I also use NOFTU=1 in my .userconfig.

To create the update.mar file after you've got a build:

./build.sh gecko-update-full

and then run

./update.sh

You'll need a web server running on your build machine and you'll need to modify update.sh to point to the directory you'll be servibg updates from.
Component: General → Gaia
Component: Gaia → General
Any updates here? Do we expect a fix today given the fact that we have STR?
I have it all coded and am in the final stages of testing, so I should be landing today.
And sorry, the STR in comment 30 were steps to reproduce the environment to test.
Manual testing performed:

1 - Do a download that works successfully
2 - Start a download and pause/resume it manually
3 - Change the watchdog timeout to be 1 second.
    Comment out the stop/start of the watchdog timer in touchWatchdogTimer routine
    Do a download and see that it still succeeds even when being interrupted once per second.
Attachment #701101 - Attachment is obsolete: true
Attachment #702027 - Flags: review?(marshall)
Comment on attachment 702027 [details] [diff] [review]
Add a watchdog timer on downloads and automatically restart if timer expires.

Review of attachment 702027 [details] [diff] [review]:
-----------------------------------------------------------------

looks good, just some nits

::: b2g/components/UpdatePrompt.js
@@ +499,5 @@
> +    if (!this._watchdogTimer) {
> +      this._watchdogTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> +    }
> +    let watchdogTimeout = Services.prefs.getIntPref(PREF_DOWNLOAD_WATCHDOG_TIMEOUT);
> +    this._watchdogTimer.initWithCallback(function UP_watchdogFired() {

nit: can we define UP_watchdogFired as an instance function of UpdatePrompt? that will save us a function allocation here

@@ +538,5 @@
> +    }
> +
> +    if (!this._autoRestartDownload) {
> +      this.sendChromeEvent("update-download-stopped", {
> +          paused: paused

nit: 2 space indentation
Attachment #702027 - Flags: review?(marshall) → review+
Fun is we have code that allows restart of stall http transactions (stall transaction == transaction w/o any progress for some defined time) in Necko natively.  It is just disabled until pipelining is preffed on (which is now by default off).

Patrick, do you think we could try to adjust and enable the "restart in progress" particular code portions by default?
Flags: needinfo?(mcmanus)
Comment on attachment 702027 [details] [diff] [review]
Add a watchdog timer on downloads and automatically restart if timer expires.

Review of attachment 702027 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know enough about the update process to do a thorough review here, but since I'm here looking at blocker bugs these are my thoughts.  Overall the code looks good to me, though I'd prefer to avoid calling getIntPref() over and over again on each progress event.

::: b2g/components/UpdatePrompt.js
@@ +158,5 @@
>      log("Update is ready to apply, registering idle timeout of " +
>          applyIdleTimeoutSeconds + " seconds before prompting.");
>  
>      this._update = aUpdate;
> +

nit: whitespace

@@ +498,5 @@
> +  startWatchdogTimer: function UP_startWatchdogTimer() {
> +    if (!this._watchdogTimer) {
> +      this._watchdogTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> +    }
> +    let watchdogTimeout = Services.prefs.getIntPref(PREF_DOWNLOAD_WATCHDOG_TIMEOUT);

You're going to call getIntPref() on every progress events. Could this be moved to be one of the properties above, so you can use this._watchdogTimeout?

@@ +509,5 @@
> +
> +  stopWatchdogTimer: function UP_stopWatchdogTimer() {
> +    if (this._watchdogTimer) {
> +      // When we touch the timer we stop and immediately start the timer
> +      // resuing the same timer object (i.e. don't set _watchdogTimer to null)

nit: s/resuing/reusing/

@@ +516,5 @@
> +  },
> +
> +  touchWatchdogTimer: function UP_touchWatchdogTimer() {
> +    this.stopWatchdogTimer();
> +    this.startWatchdogTimer();

You could probably simplify this is a bit if startWatchdogTimer() was written to do this:

  if (this._watchdogTimer) {
    // stop and restart the timer
  }
  else {
    // start a new timer
  }

Then stopWatchdogTimer() could cancel and set to null.

Just a thought though: the code seems okay as it is.

@@ +545,3 @@
>      Services.aus.removeDownloadListener(this);
> +
> +    if (this._autoRestartDownload) {

You can make this an else clause of the above, can't you? And move the removeDownloadListener call to before or after the if/else?

@@ +546,5 @@
> +
> +    if (this._autoRestartDownload) {
> +      log("Download - autorestarting due to watchdog timeout");
> +      this._autoRestartDownload = false;
> +      this.downloadUpdate(null);

If the download stalled out because the user went out of range of wifi or whatever, what will happen here when you restart the download.  Does the error get reported appropriately?

If the restarted download makes progress, then a new timer will be started to monitor that progress.  But if the restarted download fails, then no timer is set and that is the end of the process.  What happens in that case? Nothing until the next call to showUpdateAvailable? Is that what is desired here?
Attachment #702027 - Flags: review+ → review?(marshall)
Oops, sorry for overwriting the r+ on this...
(In reply to David Flanagan [:djf] from comment #37)
> Comment on attachment 702027 [details] [diff] [review]
> Add a watchdog timer on downloads and automatically restart if timer expires.
> 
> Review of attachment 702027 [details] [diff] [review]:
> -----------------------------------------------------------------

> @@ +498,5 @@
> > +  startWatchdogTimer: function UP_startWatchdogTimer() {
> > +    if (!this._watchdogTimer) {
> > +      this._watchdogTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> > +    }
> > +    let watchdogTimeout = Services.prefs.getIntPref(PREF_DOWNLOAD_WATCHDOG_TIMEOUT);
> 
> You're going to call getIntPref() on every progress events. Could this be
> moved to be one of the properties above, so you can use
> this._watchdogTimeout?

Sounds reasonable

> @@ +516,5 @@
> > +  },
> > +
> > +  touchWatchdogTimer: function UP_touchWatchdogTimer() {
> > +    this.stopWatchdogTimer();
> > +    this.startWatchdogTimer();
> 
> You could probably simplify this is a bit if startWatchdogTimer() was
> written to do this:
> 
>   if (this._watchdogTimer) {
>     // stop and restart the timer
>   }
>   else {
>     // start a new timer
>   }
> 
> Then stopWatchdogTimer() could cancel and set to null.

I was trying to reuse the same timer object. You're allowed to call cancel and initWithCallback on the same object over and over again.

Oh - If I made touch just call start and not call stop then this would make sense.

> If the download stalled out because the user went out of range of wifi or
> whatever, what will happen here when you restart the download.  Does the
> error get reported appropriately?

I'm in the process of adding a retry count. If that gets exceeded, then it will report an error.

> If the restarted download makes progress, then a new timer will be started
> to monitor that progress.  But if the restarted download fails, then no
> timer is set and that is the end of the process.  What happens in that case?

The timer is initially started in onStartRequest, so if no progress is made, then the timer will expire.
Cleanup as per marshal and djf.
Added a max retry counter.
Attachment #702027 - Attachment is obsolete: true
Attachment #702027 - Flags: review?(marshall)
Attachment #702091 - Flags: review?(marshall)
Manual testing performed on the v2 patch was as in comment 34 with one additional test:

- Set the max retries to 1 and timeout to 1 second, commented out call to startTimer in the touchWatchdog routine.
- Started a download, observed it report an error and the download is in a state for the user to try again manually.
- Repeated until download was applied.
Comment on attachment 702091 [details] [diff] [review]
Add a watchdog timer on downloads and automatically restart if timer expires. v2

Review of attachment 702091 [details] [diff] [review]:
-----------------------------------------------------------------

::: b2g/components/UpdatePrompt.js
@@ +496,5 @@
> +
> +  _autoRestartDownload: false,
> +  _autoRestartCount: 0,
> +
> +  watchdogTimerFired: function UP_watchdogFired() {

nit: UP_watchdogTimerFired

@@ +509,5 @@
> +    } else {
> +      this._watchdogTimer = Cc["@mozilla.org/timer;1"].createInstance(Ci.nsITimer);
> +      this._watchdogTimeout = Services.prefs.getIntPref(PREF_DOWNLOAD_WATCHDOG_TIMEOUT);
> +    }
> +    this._watchdogTimer.initWithCallback(this.watchdogTimerFired.bind(this),

You can actually reuse the existing |notify| function here, and bypass using bind(this) altogether, by passing |this| directly. We shouldn't hold up the patch for this though, I'm fine creating a follow up.
Attachment #702091 - Flags: review?(marshall) → review+
Filed followup bug 830612
Inbound was closed, so I pushed to try just in case:
https://tbpl.mozilla.org/?tree=Try&rev=d99553da02c2
https://hg.mozilla.org/releases/mozilla-b2g18/rev/f2460bf17811
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Maybe it's quite late but I'm wondering if we could not have simply tweaked either the Linux default timeout or the timeout we use for these sockets. Then we could have failed earlier.

It always seems strange to me to need to do this. Is this because of TCP/IP weird behaviour on bad links (esp with mobile devices) ?
I think that this solution is really just a bandaid solution, and I think its worth a followup bug to remove it.
Blocks: 830776
> we have code that allows restart of stall http transactions (stall transaction
> == transaction w/o any progress for some defined time) in Necko natively.  It
> is just disabled until pipelining is preffed on (which is now by default off).

But pipelining is preffed on for B2G.  So why don't we get the restart logic?  Do we just need to set it to a lower timeout value?
(In reply to Jason Duell (:jduell) from comment #50)
> > we have code that allows restart of stall http transactions (stall transaction
> > == transaction w/o any progress for some defined time) in Necko natively.  It
> > is just disabled until pipelining is preffed on (which is now by default off).
> 
> But pipelining is preffed on for B2G.  So why don't we get the restart
> logic?  Do we just need to set it to a lower timeout value?

The code hozna is referring to is pipeline recovery code - so it only comes into play if a pipelined request stalls. (and if you are just downloading one file, it can't form a pipeline by itself).

That can probably be adapted, but its not a easy slam dunk. (in particular it doesn't work on ranges, partial cache entries, etc..)
Flags: needinfo?(mcmanus)
You need to log in before you can comment on or make changes to this bug.