Closed Bug 787366 Opened 8 years ago Closed 8 years ago

B2G 3G: Re-established data call after B2G process restarted

Categories

(Core :: DOM: Device Interfaces, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
blocking-basecamp +

People

(Reporter: swu, Assigned: vicamo)

Details

(Whiteboard: [LOE:M][WebAPI:P0][vivo-lab])

Attachments

(4 files, 2 obsolete files)

If we restart B2G process when there's an existing data call, the data call was not re-established.  In this situation, we got a data call response with empty ifname, and it registered to NetworkManager with this empty ifname.  This should be fixed.

{"status":0,"suggestedRetryTime":-1,"cid":"0","active":2,"type":"","ifname":"","ipaddr":"","dns":"","gw":"","ip":null,"netmask":null,"broadcast":null,"state":1,"rilMessageType":"datacallstatechange"}
I/Gecko   ( 1391): -*- RadioInterfaceLayer: Data call ID: 0, interface name: 
I/Gecko   ( 1391): -*- RadioInterfaceLayer: === dataCallStateChanged() this.state:-1, datacall.state:1
I/Gecko   ( 1391): RIL Worker: Next parcel size unknown, going to sleep.
I/Gecko   ( 1391): -*- RadioInterfaceLayer: === datacall:{"cid":"0","ifname":"","state":1,"apn":null,"ip":null,"netmask":null,"broadcast":null,"gw":"","dns":""}
I/Gecko   ( 1391): -*- NetworkManager: Evaluating whether active network needs to be changed.
I/Gecko   ( 1391): -*- NetworkManager: Network '' registered.
Per discussed with Hsin-Yi, when B2G process restarts, we should get the UNSOLICITED_RESPONSE_RADIO_STATE_CHANGED event in ril_worker.js, and it will 
call setRadioPower() reset the radio power, to make sure B2G restart handled gracefully(bug 775375).

On Otoro device, we didn't get the radio state change event, so the radio power wasn't reset.
I will take this issue to tracking.
Assignee: nobody → vliu
I got information from observing the RIL[UNSOLICITED_RESPONSE_RADIO_STATE_CHANGED] in ril_worker.js. When we restart B2G process when there's an existing data call, the flow like below.
1.
I/Gecko   (  506): RIL Worker: this._isInitialRadioState = true
I/Gecko   (  506): RIL Worker: radioState = 10
I/Gecko   (  506): RIL Worker: this.setRadioPower
2. 
I/Gecko   (  105): RIL Worker: radioState = 0
I/Gecko   (  105): RIL Worker: Radio state changed from 'null' to 'off'
3. 
I/Gecko   (  105): RIL Worker: radioState = 10
I/Gecko   (  105): RIL Worker: Radio state changed from 'off' to 'ready'
 
In the above case, we got empty ifname.

But when we first time booting up the device. Or we reboot the device(adb reboot) when there's an existing data call, the flow like below.

1.
I/Gecko   (  506): RIL Worker: this._isInitialRadioState = true
I/Gecko   (  105): RIL Worker: radioState = 1
I/Gecko   (  105): RIL Worker: this.setRadioPower
2. 
I/Gecko   (  105): RIL Worker: radioState = 0
I/Gecko   (  105): RIL Worker: Radio state changed from 'null' to 'off'
3. 
I/Gecko   (  105): RIL Worker: radioState = 10
I/Gecko   (  105): RIL Worker: Radio state changed from 'off' to 'ready'

In the above cases, we got correct ifname.

(In reply to Shian-Yow Wu from comment #1)
> Per discussed with Hsin-Yi, when B2G process restarts, we should get the
> UNSOLICITED_RESPONSE_RADIO_STATE_CHANGED event in ril_worker.js, and it will 
> call setRadioPower() reset the radio power, to make sure B2G restart handled
> gracefully(bug 775375).
>

> On Otoro device, we didn't get the radio state change event, so the radio
> power wasn't reset.

I think we still got the radio state change event when we restarted the b2g process. 

I may have some questions. Can you please give me any suggestion?

Q1: When we restart the b2g, I think rild is still alive. I am not sure what will happen when we reset it at this stage. 

Q2: For user operation experience, we normally reboot the device and not just only restart b2g process. If we can get the correct ifname after rebooting the device, do we still track on this issue?
(In reply to vliu from comment #3)
> Q1: When we restart the b2g, I think rild is still alive. I am not sure what
> will happen when we reset it at this stage. 

We expect to reset the data call state.  Please reference bug 775375, it's been tested on SGS2 platform.  I think the question is, does this approach work on all other platforms, and how to make it also work on Otoro platform?

> 
> Q2: For user operation experience, we normally reboot the device and not
> just only restart b2g process. If we can get the correct ifname after
> rebooting the device, do we still track on this issue?

Good question.  The main reason is that b2g process crash & restart will cause data call failure.  In that condition, only way to recover is to reboot the device.  So the issue should be blocking-basecamp.  Also see bug 775375 for more information.
blocking-basecamp: --- → ?
From philikon said in bug 775375, when Gecko starts up and finds the radio in anything other than "off", we should turn it off, and then depending on what ril.radio.disabled says, turn it back on or not. So I observed the flow and monitored the behavior for ril.radio.disabled/ril.radio.enabled, UNSOLICITED_DATA_CALL_LIST_CHANGED and UNSOLICITED_RESPONSE_RADIO_STATE_CHANGED event. I found the flow was different between sgs2 and otoro. Please see the attached file. In otoro, the RadioInterfaceLayer turned ril.data.enabled to true after it had received UNSOLICITED_DATA_CALL_LIST_CHANGED to list changed APN data. I think it could be the problem. Does anyone can give me any suggestion? Thanks.
(In reply to vliu from comment #5)
> Created attachment 657984 [details]
> logs to compare flow between sgs2 and otoro.
> 
> From philikon said in bug 775375, when Gecko starts up and finds the radio
> in anything other than "off", we should turn it off, and then depending on
> what ril.radio.disabled says, turn it back on or not. So I observed the flow
> and monitored the behavior for ril.radio.disabled/ril.radio.enabled,
> UNSOLICITED_DATA_CALL_LIST_CHANGED and
> UNSOLICITED_RESPONSE_RADIO_STATE_CHANGED event. I found the flow was
> different between sgs2 and otoro. Please see the attached file. In otoro,
> the RadioInterfaceLayer turned ril.data.enabled to true after it had
> received UNSOLICITED_DATA_CALL_LIST_CHANGED to list changed APN data. I
> think it could be the problem. Does anyone can give me any suggestion?
> Thanks.

Hi vliu,

From the attached log, it seems you added additional debug messages, right? If yes, could you please attach the diff? In this way, I might have clear understanding of where the log came from, and see how I can help. Thanks :)

b.t.w, I don't see the setting 'ril.radio.enabled' in RadioInterfaceLayer.js, only 'ril.radio.disabled.'
Hi Shin-Yi,

> b.t.w, I don't see the setting 'ril.radio.enabled' in RadioInterfaceLayer.js, > only 'ril.radio.disabled.'

Sorry about the wrong name and I supposed to say ril.data.enabled. I attached another log without focusing on my extra log.

1.
 
09-04 13:39:56.282 I/Gecko   (  640): -*- RadioInterfaceLayer: 'ril.radio.disabled' is now false

2.

09-04 13:39:57.303 I/Gecko   (  640): RIL Worker: Handling parcel as UNSOLICITED_DATA_CALL_LIST_CHANGED
09-04 13:39:57.313 I/Gecko   (  640): -*- RadioInterfaceLayer: We haven't read completely the APN data from the settings DB yet. Wait for that.
09-04 13:39:57.333 I/Gecko   (  640): RIL Worker: Unexpected new data call: {"status":65535,"suggestedRetryTime":-1,"cid":"0","active":0,"type":"","ifname":"","ipaddr":"","dns":"","gw":"","ip":null,"netmask":null,"broadcast":null,"state":3}

3.

09-04 13:40:01.227 I/Gecko   (  640): -*- RadioInterfaceLayer: 'ril.data.enabled' is now true


In sgs2, we have different orders.
1. Turn ril.radio.disabled to false
2. Turn ril.data.enabled to true 
3. handle UNSOLICITED_DATA_CALL_LIST_CHANGED

I am not sure if it is any matter to miss the order. Do you have any suggestion? Thanks.
Attachment #657984 - Attachment is obsolete: true
Hi Vincent, 

Thanks for the log. I just read them, and the cause of the issue seems here (see attachment #657984 [details]).
09-04 11:07:58.441 I/Gecko   (  923): -*- RadioInterfaceLayer: 'ril.data.enabled' is now true
09-04 11:07:58.441 I/Gecko   (  923): -*- RadioInterfaceLayer: RIL is not ready for data connection: Phone's not registered or doesn't have data connection.

RadioInterfaceLayer reads the 'ril.data.enabled' settings from UI and tries to setup connection when RIL is not ready. Therefore, RILNetworkInterface.connect() is not called when updateRILNetworkInterface() is the first time called. More, since the setting 'ril.data.enabled' is not changed, updateRILNetworkInterface() is not called again, either. Of cause, no further data connection anyway ;-)

So, maybe we should double check 'data connection status' and 'ril.data.enabled' setting when *data registration state* is ready. If the current connection status doesn't meet the setting, then call 'RILNetworkInterface.connect()' again.

Does it look good?
Hi Hsin-Yi,
I have a question. Even I got empty ifname, some log information still showed the data call was established.

1.
dataInfo.connected is true in updateDataConnection
newInfo.regState is 1(NETWORK_CREG_STATE_REGISTERED_HOME) in updateDataConnection

RILContentHelper: Received message 'RIL:DataInfoChanged': {"connected":true,"emergencyCallsOnly":false,"roaming":false,"network":{"rilMessageType":"operatorchange","longName":"Chungwa","shortName":"Chungwa","mcc":466,"mnc":92},"cell":{"gsmLocationAreaCode":10291,"gsmCellId":19072823},"type":"umts","signalStrength":-53,"relSignalStrength":96,"state":"registered"}

Also, I could browsing the internet without apn setting again. Got idea?
When I took a look in _processDataCallList() in ril_worker.js, I found there was a problem here. When we restart b2g process, updatedDataCall in _processDataCallList() kept the data from rild and prepared to reply to RadioInterfaceLayer. In this case, updatedDataCall.state and currentDataCall.state were different so it only updated status, active and state. But apparently these data exclude others like ifname. So this is the reason why Radio RadioInterfaceLayer gets empty ifname. When I added ifname likes the below patch, the RadioInterface can get the ifname.

--- a/dom/system/gonk/ril_worker.js
+++ b/dom/system/gonk/ril_worker.js
@@ -2760,39 +2760,43 @@ let RIL = {
         delete this.currentDataCalls[currentDataCall.callIndex];
         currentDataCall.state = GECKO_NETWORK_STATE_DISCONNECTED;
         currentDataCall.rilMessageType = "datacallstatechange";
         this.sendDOMMessage(currentDataCall);
         continue;
       }
 
       this._setDataCallGeckoState(updatedDataCall);
+
       if (updatedDataCall.state != currentDataCall.state) {
         currentDataCall.status = updatedDataCall.status;
         currentDataCall.active = updatedDataCall.active;
         currentDataCall.state = updatedDataCall.state;
+        currentDataCall.ifname = updatedDataCall.ifname;  // add by vincent liu
         currentDataCall.rilMessageType = "datacallstatechange";
         this.sendDOMMessage(currentDataCall);
       }
     }

There still have some tricky points to think. Why sgs2 doesn't have this issue? Why we don't have this issue if we use adb reboot? When I discussed with Hshin-Yi about the log, we found when ril.data.enabled turned to true and request REQUEST_SETUP_DATA_CALL, rild was not ready to connection and then fail. I will see if sgs2 have this racing issue. I am not sure the above patch is right way and maybe any of you can give me any suggestion. Thanks.
Having to restart data connections if the B2G process crashes seems like a blocker.
blocking-basecamp: ? → +
The issue was seen in Sao Paulo.  But when I back to Taipei and test again with latest Gecko on my Otoro, the data call can be re-established after B2G restart.

And the empty ifname issue was disappeared after applying patch from bug 744700.

To be confirmed again after bug 744700 landed.
Whiteboard: [WebAPI:P0]
After checking again with the Gecko code used in Sao Paulo, I cannot reproduce the issue again either.

This means the issue could be operator specific.
(In reply to Shian-Yow Wu from comment #13)
> This means the issue could be operator specific.

Still means we need to fix it.

Adding the [vivo-lab] keyword that I just invented to indicate that this a bug we should investigate when we're working with the operator.
Whiteboard: [WebAPI:P0] → [WebAPI:P0][vivo-lab]
I encountered another error with Gecko revision in these two days, it appeared from time to time when I restart b2g process.  When this happens, data call cannot be established.

==================================================================================
I/Gecko   (25253): -*- RadioInterfaceLayer: RIL is not ready for data connection: Phone's not registered or doesn't have data connection. state:registered, type:null
==================================================================================

RadioInterfaceLayer.rilContext.data.type is null in this condition.
(In reply to Shian-Yow Wu from comment #15)
> Created attachment 659441 [details]
> Log (RIL is not ready for data connection)
> 
> I encountered another error with Gecko revision in these two days, it
> appeared from time to time when I restart b2g process.  When this happens,
> data call cannot be established.
> 
> =============================================================================
> =====
> I/Gecko   (25253): -*- RadioInterfaceLayer: RIL is not ready for data
> connection: Phone's not registered or doesn't have data connection.
> state:registered, type:null
> =============================================================================
> =====
> 
> RadioInterfaceLayer.rilContext.data.type is null in this condition.

When I investigated this issue the other day, I also saw this log when I restarted b2g or reboot. But even so, I could still establish data call.  

I/Gecko   (  506): -*- RadioInterfaceLayer: RIL is not ready for data connection: Phone's not registered or doesn't have data connection.

I will sync today for otoro and tried again. BTW, I can't see "state:registered, type:null
" in my log. Did this log your extra dump? Can you give me a simple diff to show me where to add it for observation? Thanks
(In reply to vliu from comment #16)
> I will sync today for otoro and tried again. BTW, I can't see
> "state:registered, type:null
> " in my log. Did this log your extra dump? Can you give me a simple diff to
> show me where to add it for observation? Thanks

Yes, the extra log is added by me, sorry for not stated clearly.

Here's the diff:

diff --git a/dom/system/gonk/RadioInterfaceLayer.js b/dom/system/gonk/RadioInterfaceLayer.js
--- a/dom/system/gonk/RadioInterfaceLayer.js
+++ b/dom/system/gonk/RadioInterfaceLayer.js
@@ -700,17 +700,17 @@ RadioInterfaceLayer.prototype = {
       debug("We haven't read completely the APN data from the " +
             "settings DB yet. Wait for that.");
       return;
     }

     // This check avoids data call connection if the radio is not ready
     // yet after toggling off airplane mode.
     if (this.rilContext.radioState != RIL.GECKO_RADIOSTATE_READY) {
-      debug("RIL is not ready for data connection: radio's not ready");
+      debug("RIL is not ready for data connection: radio's not ready --- " + this.rilContext.radioState);
       return;
     }
Wrong patch in comment 17, this is the correct one.

@@ -738,17 +738,17 @@ debug("=== enabled:" + this.dataCallSett
     }
     let dataInfo = this.rilContext.data;
     let isRegistered =
       dataInfo.state == RIL.GECKO_MOBILE_CONNECTION_STATE_REGISTERED;
     let haveDataConnection =
       dataInfo.type != RIL.GECKO_MOBILE_CONNECTION_STATE_UNKNOWN;
     if (!isRegistered || !haveDataConnection) {
       debug("RIL is not ready for data connection: Phone's not registered " +
-            "or doesn't have data connection.");
+            "or doesn't have data connection." + " state:" + dataInfo.state + ", type:" + dataInfo.type);
       return;
     }
(In reply to Shian-Yow Wu from comment #18)
> Wrong patch in comment 17, this is the correct one.
> 
> @@ -738,17 +738,17 @@ debug("=== enabled:" + this.dataCallSett
>      }
>      let dataInfo = this.rilContext.data;
>      let isRegistered =
>        dataInfo.state == RIL.GECKO_MOBILE_CONNECTION_STATE_REGISTERED;
>      let haveDataConnection =
>        dataInfo.type != RIL.GECKO_MOBILE_CONNECTION_STATE_UNKNOWN;
>      if (!isRegistered || !haveDataConnection) {
>        debug("RIL is not ready for data connection: Phone's not registered "
> +
> -            "or doesn't have data connection.");
> +            "or doesn't have data connection." + " state:" + dataInfo.state
> + ", type:" + dataInfo.type);
>        return;
>      }

Sorry for late response. I synced to latest and added logs the same with you. But I didn't add patch on bug 744700. Based on this condition, even I had type=null in log file, I could still establish data call.

09-08 17:25:48.336 I/Gecko   (  857): -*- RadioInterfaceLayer: RIL is not ready for data connection: Phone's not registered or doesn't have data connection. state:registered, type:null

I could see the "H" on screen and surf the internet. I will try more to verify the above result. Please also see the logs for detail.
Attachment #658003 - Attachment is obsolete: true
Hi Shian,
I tried more observation and I saw two issues on 9/8 release.

1. After flashing image and started b2g, we could't set up data call.
   I found we couldn't get apn from mozSetting. After checked with Tim, he said it was a gaia bug.

2. After restart b2g, we had the following message.

09-08 17:25:48.336 I/Gecko   (  857): -*- RadioInterfaceLayer: RIL is not ready for data connection: Phone's not registered or doesn't have data connection. state:registered, type:null

I will keep working on this.
When I see this message which prevents to establish data call, making a phone call can recover it.
================================================================
I/Gecko   (  506): -*- RadioInterfaceLayer: RIL is not ready for data connection: Phone's not registered or doesn't have data connection.
================================================================
Switch owner to Vicamo Yang and he will check this issue in Brazil.
Assignee: vliu → vyang
Whiteboard: [WebAPI:P0][vivo-lab] → [LOE:M][WebAPI:P0][vivo-lab]
Attached file Success - Oct Brazil
I've tested this bug again in Brazil, and the data connection is reconnected back and web surfing works fine as expected. I think we can close this issue.
To be detailed, I'm using Gecko commit 33473ee from https://github.com/mozilla/mozilla-central and Gaia commit 2c727ca.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.