Closed
Bug 853489
Opened 12 years ago
Closed 9 years ago
Cannot connect to carrier after reboot during automated test
Categories
(Firefox OS Graveyard :: Gaia, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: bsilverberg, Assigned: vliu)
Details
Attachments
(9 files, 3 obsolete files)
We have been seeing this intermittent issue with our tests on CI, as well as locally on our phones. Sometimes, after rebooting the phone, it is unable to find a carrier signal. This then results in failures of any tests that require a carrier signal and any tests that require wifi. When we restart the b2g process between tests it does not resolve the issue, but a reboot of the phone does.
I was able to reproduce this problem by rebooting the phone and running a single test over and over again in a loop, and the problem occurred after the 45th iteration. I had RIL logging enabled while running this test and am attaching thew logcat from the test run.
Note that this logcat includes all of the passing tests, the failing test, and then a reboot of the phone which brought the cell signal back.
| Reporter | ||
Comment 1•12 years ago
|
||
I should add that what we see on the phone when this is happening is that the cell signal bars are continuously flashing, as if it's constantly looking for a signal.
| Reporter | ||
Comment 2•12 years ago
|
||
We just reproduced this on our CI as well. The job is at http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.nightly.ui.cell_data_reboot_loop/31/, and I am attaching the logcat from that run as I'm not sure whether it will be saved indefinitely on the CI server.
| Reporter | ||
Comment 3•12 years ago
|
||
Attaching the logcat from the test run that experienced this problem on CI.
Comment 4•12 years ago
|
||
I pulled Unagi-4 off our infra last night, after noticing it had failed at least a few of the carrier/data tests, and got some logcats from it, which I'll attach.
Comment 5•12 years ago
|
||
Comment 6•12 years ago
|
||
Alive, Vincent, do you have any recommendations as to how to troubleshoot this further?
(Also, this is with the Mozilla RIL -- we'll move to the commercial-vendor RIL when Legal clears it.)
Comment 7•12 years ago
|
||
Marking as critical, even though we don't yet have a cause or owner, because, well, it's not only a huge distraction and annoyance, but worse is that it's causing wasted build-time/cycles, as well as potentially being a real issue.
Severity: normal → critical
| Assignee | ||
Comment 8•12 years ago
|
||
(In reply to Stephen Donner [:stephend] from comment #5)
> Created attachment 730250 [details]
> Logcat from unagi-4, after it had run and failed data tests
From the log, ril_worker received an error parcel REQUEST_GET_IMEI.
I/Gecko (13120): RIL Worker: Parcel (size 12): 0,0,0,0,8,0,0,0,2,0,0,0
I/Gecko (13120): RIL Worker: We have at least one complete parcel.
I/Gecko (13120): RIL Worker: Solicited response for request type 38, token 8, error 2
I/Gecko (13120): RIL Worker: Handling parcel as REQUEST_GET_IMEI
I/Gecko (13120): RIL Worker: Parcel handling threw Error: Trying to read data beyond the parcel end!
The error means this parcel size value which carried in parcel is bigger than the actual parcel size and that cause the ril_worker try to read data beyond the parcel.
I tried to dump the parcel data of REQUEST_GET_IMEI from my unagi.
I/Gecko ( 109): RIL Worker: New incoming parcel of size 48
I/Gecko ( 109): RIL Worker: Parcel (size 48): 0,0,0,0,21,0,0,0,0,0,0,0,15,0,0,0,56,0,54,0,52,0,56,0,57,0,57,0,48,0,49,0,50,0,48,0,50,0,48,0,50,0,48,0,50,0,0,0
I/Gecko ( 109): RIL Worker: We have at least one complete parcel.
I/Gecko ( 109): RIL Worker: Solicited response for request type 38, token 21, error 0
I/Gecko ( 109): RIL Worker: Handling parcel as REQUEST_GET_IMEI
In Comment 6, you mentioned about commercial-vendor RIL and Mozilla RIL. I don't understand what's different between them. The rild used in my unagi was pulling the from unagi itself when it run Android. Maybe you should clarify the correct version of rild to get the correct parcel.
Comment 9•12 years ago
|
||
(In reply to Vincent Liu[:vliu] from comment #8)
> (In reply to Stephen Donner [:stephend] from comment #5)
> > Created attachment 730250 [details]
> > Logcat from unagi-4, after it had run and failed data tests
>
> From the log, ril_worker received an error parcel REQUEST_GET_IMEI.
>
> I/Gecko (13120): RIL Worker: Parcel (size 12): 0,0,0,0,8,0,0,0,2,0,0,0
> I/Gecko (13120): RIL Worker: We have at least one complete parcel.
> I/Gecko (13120): RIL Worker: Solicited response for request type 38, token
> 8, error 2
> I/Gecko (13120): RIL Worker: Handling parcel as REQUEST_GET_IMEI
> I/Gecko (13120): RIL Worker: Parcel handling threw Error: Trying to read
> data beyond the parcel end!
>
> The error means this parcel size value which carried in parcel is bigger
> than the actual parcel size and that cause the ril_worker try to read data
> beyond the parcel.
>
> I tried to dump the parcel data of REQUEST_GET_IMEI from my unagi.
>
> I/Gecko ( 109): RIL Worker: New incoming parcel of size 48
> I/Gecko ( 109): RIL Worker: Parcel (size 48):
> 0,0,0,0,21,0,0,0,0,0,0,0,15,0,0,0,56,0,54,0,52,0,56,0,57,0,57,0,48,0,49,0,50,
> 0,48,0,50,0,48,0,50,0,48,0,50,0,0,0
> I/Gecko ( 109): RIL Worker: We have at least one complete parcel.
> I/Gecko ( 109): RIL Worker: Solicited response for request type 38, token
> 21, error 0
> I/Gecko ( 109): RIL Worker: Handling parcel as REQUEST_GET_IMEI
>
> In Comment 6, you mentioned about commercial-vendor RIL and Mozilla RIL. I
> don't understand what's different between them. The rild used in my unagi
> was pulling the from unagi itself when it run Android. Maybe you should
> clarify the correct version of rild to get the correct parcel.
We have Version 6 of the Mozilla RIL, for now; I've attached a new log showing failures from a recent run, where cell-data failures caused multiple test fails.
Build was correlates to this logcat is http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.v1-train.ui/411/
| Assignee | ||
Comment 11•12 years ago
|
||
(In reply to Stephen Donner [:stephend] from comment #10)
> Created attachment 731765 [details]
> Logcat from unagi-3
>
> Build was correlates to this logcat is
> http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.v1-train.
> ui/411/
Did you still see no carrier camped on? From the log, I could see some registered info in RILContentHelper.
I/Gecko (15832): -*- RILContentHelper: Received message 'RIL:DataInfoChanged': {"connected":true,"emergencyCallsOnly":false,"roaming":false,"network":{"rilMessageType":"operatorchange","longName":"T-Mobile","shortName":"T-Mobile","mcc":310,"mnc":260},"cell":{"gsmLocationAreaCode":328,"gsmCellId":26672},"type":"edge","signalStrength":-
I will CC someone else to help to see the log file.
Comment 12•12 years ago
|
||
(In reply to Vincent Liu[:vliu] from comment #11)
> (In reply to Stephen Donner [:stephend] from comment #10)
> > Created attachment 731765 [details]
> > Logcat from unagi-3
> >
> > Build was correlates to this logcat is
> > http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.v1-train.
> > ui/411/
>
> Did you still see no carrier camped on? From the log, I could see some
> registered info in RILContentHelper.
>
> I/Gecko (15832): -*- RILContentHelper: Received message
> 'RIL:DataInfoChanged':
> {"connected":true,"emergencyCallsOnly":false,"roaming":false,"network":
> {"rilMessageType":"operatorchange","longName":"T-Mobile","shortName":"T-
> Mobile","mcc":310,"mnc":260},"cell":{"gsmLocationAreaCode":328,"gsmCellId":
> 26672},"type":"edge","signalStrength":-
>
> I will CC someone else to help to see the log file.
The log looks good to me, and everything seems work fine that voice connection got registered eventually. I wonder whether the log covers the failed case and whether it accidentally misses something. I said that because I expected seeing the line '-*- RadioInterfaceLayer: Starting RIL Worker', which is the beginning of RIL messages after rebooting or restarting b2g, in logcat but no such message there.
Unagi-3 had another failure today, which I captured here:
1) RIL log: http://dl.dropbox.com/u/725560/full-ril-log.txt
2) Radio log: http://dl.dropbox.com/u/725560/radio-logcat.txt
According to Vincent on #b2g today, cardState had an error, and then later, "Failed to read from rilproxy socket, closing..." in the 2nd log.
This particular one was from http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.nightly.ui.cell_data_loop_test/37/
Vincent, what's the best way to move forward with troubleshooting this?
Flags: needinfo?(vliu)
Comment 16•12 years ago
|
||
Perhaps the devs can look at the way our test framework enables cell data and see if there's anything different between that and how they expect it would be done?
We may be doing it too quickly or aggressively.
| Assignee | ||
Comment 17•12 years ago
|
||
(In reply to Stephen Donner [:stephend] from comment #14)
> Created attachment 734775 [details]
> Yet another logcat of cell-data failure
>
> This particular one was from
> http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.nightly.
> ui.cell_data_loop_test/37/
The log looks good to me, and it means I can see the voice connection registered. I am afraid this log is far behind the error part. Did you enlarge the duration doing reboot? How long it happens for reboot? From Comment 13, the cardState had an error. Can you try different SIM card?
From the log, I can see unagi ever wants to set up data call. Can we try to simplify the test procedure?
1. boot up the device.
2. Verify voice connection is registered.
3. Wait for a little time like 5 seconds.
4. reboot the device.
Please close all other function like disabling wifi for just in case. Maybe we can verify the stability of the above 4 steps. How do you think?
Flags: needinfo?(vliu)
| Reporter | ||
Comment 18•12 years ago
|
||
(In reply to Vincent Liu[:vliu] from comment #17)
>
> From the log, I can see unagi ever wants to set up data call. Can we try to
> simplify the test procedure?
>
> 1. boot up the device.
> 2. Verify voice connection is registered.
> 3. Wait for a little time like 5 seconds.
> 4. reboot the device.
>
> Please close all other function like disabling wifi for just in case. Maybe
> we can verify the stability of the above 4 steps. How do you think?
This is a good idea and something I've been meaning to do. I will work on this today.
| Reporter | ||
Comment 19•12 years ago
|
||
I have created a new job on our CI [1], which uses a reduced version of gaia_test which does nothing other than run a test (i.e., it doesn't do anything with cell data, wifi, etc.). The test simply checks for a carrier connection and fails if one is not found. The job does a reboot, runs this test, then reboots again, etc.
Once we get a failure from this job we can add the logcat from that to this bug.
[1] http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.nightly.ui.carrier_reboot_loop/
| Reporter | ||
Comment 20•12 years ago
|
||
We experienced a failure with the test I created and mentioned above. I will attach the logcat in the hope that it will be helpful. It seems awfully short to me. Does the logcat get reset on reboot? If not then something strange is going on with the logcat.
| Reporter | ||
Comment 21•12 years ago
|
||
This is the logcat from the test run which just reboots and checks for a carrier signal.
| Assignee | ||
Updated•12 years ago
|
Assignee: nobody → vliu
This particular logfile is from http://qa-selenium.mv.mozilla.com:8080/job/b2g.unagi.gaia.nightly.ui/187/, which was a full-run job; we'll have to continue with the more minimal one, but that hasn't yet netted us a full logcat :-(
| Reporter | ||
Comment 23•12 years ago
|
||
I just ran the full suite on my phone, trying out the branch in the pull request at https://github.com/mozilla/gaia-ui-tests/pull/647. I can see from the results that some tests requiring a carrier connection passed, and then at one point in the job they started failing. I can also see the carrier signal icon on my phone is constantly scanning, which is what I have seen before when this problem occurred.
What is different this time from all other times is that the problem occurred during a test run, as opposed to being present from the beginning of a test run. The full output of the run can be seen at https://gist.github.com/bobsilverberg/5392106, with https://gist.github.com/bobsilverberg/5392106#file-gistfile1-txt-L103 being an example of a test that passed, and https://gist.github.com/bobsilverberg/5392106#file-gistfile1-txt-L429 seeming to be the first failure that is possibly due to this problem.
| Assignee | ||
Comment 24•12 years ago
|
||
(In reply to Bob Silverberg from comment #23)
> Created attachment 737740 [details]
> Logcat from a run of the full suite during which the carrier problem occurred
>
> I just ran the full suite on my phone, trying out the branch in the pull
> request at https://github.com/mozilla/gaia-ui-tests/pull/647. I can see from
> the results that some tests requiring a carrier connection passed, and then
> at one point in the job they started failing. I can also see the carrier
> signal icon on my phone is constantly scanning, which is what I have seen
> before when this problem occurred.
>
> What is different this time from all other times is that the problem
> occurred during a test run, as opposed to being present from the beginning
> of a test run. The full output of the run can be seen at
> https://gist.github.com/bobsilverberg/5392106, with
> https://gist.github.com/bobsilverberg/5392106#file-gistfile1-txt-L103 being
> an example of a test that passed, and
> https://gist.github.com/bobsilverberg/5392106#file-gistfile1-txt-L429
> seeming to be the first failure that is possibly due to this problem.
Thanks for you log information. From the log observation, there is no any ril log in Gecko in the log file. I am afraid there is nothing I can do here. I had told Stephen how to enable ril log. Can you please work with him to enable it? Thanks.
| Reporter | ||
Comment 25•12 years ago
|
||
My apologies vliu, this log is from my own phone and I did not enable ril logging on it, as the tests I was running were not for that purpose. We do have it enabled on our phones on CI, but unfortunately I did not have it enabled on my local phone.
This logcat was from http://qa-selenium.mv.mozilla.com:8080/job/b2g.unagi.gaia.nightly.ui/248/ - anything interesting in here?
| Assignee | ||
Comment 27•12 years ago
|
||
Hi Yoshi,
From the log observation, I could see the some error log.
I/Gecko (19000): RIL Worker: Handling parcel as REQUEST_GET_IMEI
E/GeckoConsole(19000): [JavaScript Error: "[Exception... "Component returned failure code: 0x80470002 (NS_BASE_STREAM_CLOSED) [nsIOutputStream.write]" nsresult: "0x80470002 (NS_BASE_STREAM_CLOSED)" location: "JS frame :: chrome://global/content/devtools/dbg-transport.js :: DT_onOutputStreamReady :: line 94" data: no]" {file: "chrome://global/content/devtools/dbg-transport.js" line: 94}]
:
I/Gecko (19000): RIL Worker: Parcel handling threw Error: Trying to read data beyond the parcel end!
It happens on unagi device under Marionette test. Do you have any idea with this? Thanks.
Flags: needinfo?(allstars.chh)
(In reply to Vincent Liu[:vliu] from comment #27)
> E/GeckoConsole(19000): [JavaScript Error: "[Exception... "Component returned
> failure code: 0x80470002 (NS_BASE_STREAM_CLOSED) [nsIOutputStream.write]"
> nsresult: "0x80470002 (NS_BASE_STREAM_CLOSED)" location: "JS frame ::
> chrome://global/content/devtools/dbg-transport.js :: DT_onOutputStreamReady
> :: line 94" data: no]" {file:
> "chrome://global/content/devtools/dbg-transport.js" line: 94}]
> :
I have met this problem on latest m-c,
but I have no idea about this.
> I/Gecko (19000): RIL Worker: Parcel handling threw Error: Trying to read
> data beyond the parcel end!
>
This should be another problem(Unrelated to above)
Can you post more detailed log ?
I think some logs from RIL are missing.
Flags: needinfo?(allstars.chh)
Oh never mind,
I found log is in Comment 26.
From Comment 26,
it looks some thing wrong about the radio.
All requests are failed with Generic Error (errno = 2)
Can you help to provide radio log?
adb logcat -b radio
Comment 31•12 years ago
|
||
Askeing, have you ever seen this on your CI?
(In reply to Stephen Donner [:stephend] from comment #33)
> Created attachment 740638 [details]
> logcat associated with the above comment/run
:yoshi, anything interesting here?
Flags: needinfo?(allstars.chh)
Those 2 logs from Comment 33 and Comment 34 seems different from Comment 26
Comment 34 those RIL requests are succeed.
Also I didn't see any log about voice connection/operator.
And there are too many logs attached,
and seems not quite useful.
I would suggest obsolete some of them.
Vincent or Askeing (from Comment 31)
Can we reproduce this?
Flags: needinfo?(allstars.chh) → needinfo?(vliu)
Askeing said he could reproduce this and will gather the logs for us.
Flags: needinfo?(vliu)
(In reply to Stephen Donner [:stephend] from comment #5)
> Created attachment 730250 [details]
> Logcat from unagi-4, after it had run and failed data tests
line 35 :I/Gecko (13120): RIL Worker: Solicited response for request type 38, token 8, error 2
This is the same problem in Comment 26.
I mark this comment obsolete.
Attachment #730250 -
Attachment is obsolete: true
Comment on attachment 737740 [details]
Logcat from a run of the full suite during which the carrier problem occurred
Obsolete this since it doesn't contain debug message
Attachment #737740 -
Attachment is obsolete: true
| Assignee | ||
Comment 39•12 years ago
|
||
Comment on attachment 735946 [details]
Logfile from unagi-1 failed build
Obsolete it because There is no any ril messages to debug
Attachment #735946 -
Attachment is obsolete: true
(In reply to Bob Silverberg from comment #3)
> Created attachment 729952 [details]
> Logcat from test run on CI which caused carrier signal to stop functioning
>
> Attaching the logcat from the test run that experienced this problem on CI.
This log from line 1300~line 2000,
seems RIL requests related to voice/data/operator/signal are all failed.
If that's the problem you mentioned,
we probably need to forward this problem to modem side.
(In reply to Bob Silverberg from comment #0)
From line 1486 ~ 2544 , the voice/data are dead. But it seems signal strength is still working.
line 3385, voice registration okay.
3399, data registration okay
3410, operator okay,
(In reply to Stephen Donner [:stephend] from comment #10)
> Created attachment 731765 [details]
> Logcat from unagi-3
>
> Build was correlates to this logcat is
> http://qa-selenium.mv.mozilla.com:8080/view/B2G/job/b2g.unagi.gaia.v1-train.
> ui/411/
Line 2416 ~ 2525 the voice/data/signal are dead
Line 3061, signal works again.
Line 3153, voice is working
Comment 43•12 years ago
|
||
Updated•12 years ago
|
Attachment #741654 -
Attachment mime type: text/x-log → text/plain
(In reply to Yoshi Huang[:allstars.chh][:yoshi] from comment #41)
> (In reply to Bob Silverberg from comment #0)
>
> From line 1486 ~ 2544 , the voice/data are dead. But it seems signal
> strength is still working.
>
> line 3385, voice registration okay.
> 3399, data registration okay
> 3410, operator okay,
But I saw the PID of B2G is still the same (442).
This seems not the problem Stephen mentioned.
(In reply to Stephen Donner [:stephend] from comment #13)
> Unagi-3 had another failure today, which I captured here:
>
> 1) RIL log: http://dl.dropbox.com/u/725560/full-ril-log.txt
> 2) Radio log: http://dl.dropbox.com/u/725560/radio-logcat.txt
>
> According to Vincent on #b2g today, cardState had an error, and then later,
> "Failed to read from rilproxy socket, closing..." in the 2nd log.
The message "Failed to read from rilproxy socket, closing..." is normal here.
Because b2g restarts, so the socket connection is closed.
It's not an error here.
(In reply to Askeing Yen[:askeing] from comment #43)
> Created attachment 741654 [details]
> logcat_radio during run gaiatest
Thanks for Askeing for capturing this log.
It helps a lot!!!
CCing Anshul.
Anshul, can you check the log from comment 43. It's captured from 'adb logcat -b radio'
In line 48614
E/RILC ( 112): [RID 0] Fail to add ReqList entry for token id 2 [0x383130] (no buffer)
Then after this line
All RIL requests are failed with the same reason.
The ReqList entries contains a lot of
"RIL_REQUEST_RADIO_POWER (23), token id 1"
because in these tests, b2g restarts many many times.
However the 1st token is always 'cached' (or ignored?) by modem side.
Can you help to check this?
Thanks
Flags: needinfo?(anshulj)
Comment 47•12 years ago
|
||
Hey Yoshi, we haven't seen this issue on our side yet. Wondering if you see the same issue with commercial RIL. Could you please give it a test to isolate the issue in QC code or Moz's.
Flags: needinfo?(anshulj)
I think this happened on Mozilla RIL only.
But the problem is on modem side.
Because modem caches those RIL_REQUEST_RADIO_POWER requests, and never clears it.
After several rounds of tests, this makes modem cannot accept any RIL request anymore.
Hi Anshul,
Can you help to check why is ReqList never cleaned?
The error is generated from modem, not from Moz RIL.
Thanks.
Flags: needinfo?(anshulj)
Also can you also tell us why modem always keeps/queues the "RIL_REQUEST_RADIO_POWER_ON" command, without cleaning it?
Comment 51•12 years ago
|
||
This issue is reported against Ugani, and we don't support that device.
Updated•12 years ago
|
Flags: needinfo?(anshulj)
From the log (Comment 43)
Line 27603 should be the start point of this problem.
Modem still hasn't executed the request "RIL_REQUEST_RADIO_POWER",
however at this time b2g restarts(run another test),
Then b2g sends another RIL_REQUEST_RADIO_POWER, also with other requests.
And in this round,
b2g restarts too quickly, and modem still doesn't have change to execute the RADIO_POWER command. So two RIL_REQUEST_RADIO_POWER requests have been queued.
And so on,
Finally the ReqList has been full (the list should store about 20 requests) in 48614.
(In reply to Yoshi Huang[:allstars.chh][:yoshi] from comment #52)
> Finally the ReqList has been full (the list should store about 20 requests)
^^^ should be 50 I think
Comment 54•12 years ago
|
||
(In reply to Yoshi Huang[:allstars.chh][:yoshi] from comment #52)
> From the log (Comment 43)
>
> Line 27603 should be the start point of this problem.
>
> Modem still hasn't executed the request "RIL_REQUEST_RADIO_POWER",
> however at this time b2g restarts(run another test),
Should we be waiting for the request to be executed? If so, is this information exposed by any of the WebAPIs?
> Then b2g sends another RIL_REQUEST_RADIO_POWER, also with other requests.
> And in this round,
> b2g restarts too quickly, and modem still doesn't have change to execute the
> RADIO_POWER command. So two RIL_REQUEST_RADIO_POWER requests have been
> queued.
>
> And so on,
>
> Finally the ReqList has been full (the list should store about 20 requests)
> in 48614.
Is there a way for us to query or clear this queue when we stop/start the B2G process?
I've checked the logs and run the test,
and I got different discover this time.
When B2G is initialized(or restarts), RIL will send some requests to modem.
For the log provided by Askeing in Comment 43,
I found b2g keeps restarting very quickly,
so it causes B2G keeps sending requests to modem.
Line 27749, b2g restarts,
Then line 27908, b2g restarts again, notice that line 27906 the token 7 just returns,
it means B2G only sends 7 tokens.
Next, line 28080, b2g restarts again, also this time just 7 tokens are sent.
... then the same,
until B2G sending too many RIL requests and overflow modem's command queue.
So I think the problem should be why b2g keeps restarting ?
Is it crashed or caused by gaia-ui-test?
How do you think?
Thanks
Comment 56•12 years ago
|
||
:yoshi, we intentionally stop the b2g process in gaia-ui-tests. We do this to kill all apps, remove databases and restore it to a 'clean start'. This way the results of a previous test do not affect the results of the next test.
It makes the tests much much more reliable and we'd like to retain it.
We can sacrifice some test speed if we need to stop or restart b2g or the ril more softly to stop overloading the modem queue.
Is there something you think we should be waiting for before stopping or starting the process?
Hi Zac
I'll check with Askeing why b2g restarts so quickly and will update the status tomorrow.
Thanks
Askeing has provided the latest log with more information enabled.
see http://release1-qa.corp.tpe1.mozilla.com:8080/view/B2G.v1train/job/B2G.v1train.unagi.gaiatest/88/
Please see the logcat_radio.log
it's captured by 'adb logcat -b main -b radio -v time",
which means we can see log from B2G and RILD process, with timestamp enabled.
From that log, I think the problem starts from line 361524 ~ 362869.
In this test, b2g restarts in line 361524:
361524 05-06 00:05:10.439 E/RILPROXY( 110): Failed to read from rilproxy socket, closing...
The 1st RIL request is sent in line 361631
However this request is *never* processed by modem,
The last time we see ReqList in this test is line 362341
05-06 00:05:19.408 D/RILC ( 116): [RID 0] ReqList entries :
05-06 00:05:19.408 D/RILC ( 116): RIL_REQUEST_RADIO_POWER (23), token id 1
The test ends in line 362869:
362869 05-06 00:05:49.377 E/RILPROXY( 110): Failed to read from rilproxy socket, closing...
During these 30 seconds(00:05:19 ~ 00:05:49) Modem didn't process this request at all,
so we can still this request is queued in next round of test.
362946 05-06 00:05:54.422 D/RILC ( 116): [RID 0] ReqList entries :
362947 05-06 00:05:54.422 D/RILC ( 116): RIL_REQUEST_RADIO_POWER (23), token id 1
362948 05-06 00:05:54.422 D/RILC ( 116): RIL_REQUEST_RADIO_POWER (23), token id 1
And we can see Modem always queued the 1st request From now on.
So the log here shows b2g didn't restart too quickly nor crash, and provide more information(timestamp) that it's caused by modem.
And since modem team won't help to fix this problem (Comment 51),
I would suggest
1. Don't restart b2g at all
2. reboot the entire OS
3. restart rild and b2g.
How do you think?
Thanks
Comment 59•12 years ago
|
||
Hi Yoshi, thanks for your really thorough investigation here :)
Part 3 would be the preferred option here for us. If it's just a process then we can experiment with stopping and starting it at the same time we stop and start b2g. (do you know if it would be sensitive to the order of startup?)
I am quite sure we'll be able to replicate this on an Inari or other device once we try so I think we'll be able to get the Modem team's co-operation. I'll try to replicate it this week on one of those devices and take a log of it.
What do you feel is the expected safe behaviour? That the modem should clear old requests from its queue when b2g is started up? Or the RILD should detect and handle the b2g process shutting down and restarting?
(In reply to Zac C (:zac) from comment #59)
> (do you know if it would be sensitive to the order of
> startup?)
>
From init*rc, I think rild should restart first, then b2g
> I am quite sure we'll be able to replicate this on an Inari or other device
> once we try so I think we'll be able to get the Modem team's co-operation.
> I'll try to replicate it this week on one of those devices and take a log of
> it.
>
If we can reproduce this on Commercial RIL, I think we can get the modem team look into this.
> What do you feel is the expected safe behaviour? That the modem should clear
> old requests from its queue when b2g is started up? Or the RILD should
> detect and handle the b2g process shutting down and restarting?
You point also makes sense to me. But I still wonder why modem cannot process the 1st token after running many tests, if that problem can be fixed, whether rild should clean the queue is less important to me.
Comment 61•12 years ago
|
||
We are definitely seeing this on the Commercial RIL with b2g18/v1.0.1 now. I haven't managed to capture a proper log yet but it shouldn't be far away.
| Assignee | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•