Closed Bug 902203 Opened 12 years ago Closed 10 years ago

Intermittent test_ril_code_quality.py TestRILCodeQuality.test_ril_consts | TimeoutException: socket.timeout | MarionetteException: Please start a session

Categories

(Firefox OS Graveyard :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jgriffin, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(2 files, 1 obsolete file)

RyanVM https://tbpl.mozilla.org/php/getParsedLog.php?id=26231145&tree=B2g-Inbound b2g_emulator b2g-inbound opt test marionette-webapi on 2013-08-06 14:40:40 revision: 9e989262176d slave: talos-r3-fed-081 ImportError: cannot import name OrderedDict raise TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None) TEST-UNEXPECTED-FAIL | test_ril_code_quality.py TestRILCodeQuality.test_ril_worker | TimeoutException: socket.timeout TEST-UNEXPECTED-FAIL | test_ril_code_quality.py TestRILCodeQuality.test_ril_worker | MarionetteException: Please start a session Return code: 2560 Marionette exited with return code 2560: harness failures # TBPL FAILURE # This failure is distinct from bug 898047.
> This failure is distinct from bug 898047. Er, make that bug 898074.
Aus: Would you be a good person to look at this? Or is this maybe something for the RIL team?
Summary: test_ril_code_quality.py TestRILCodeQuality.test_ril_worker | TimeoutException: socket.timeout → Intermittent test_ril_code_quality.py TestRILCodeQuality.test_ril_consts | TimeoutException: socket.timeout | MarionetteException: Please start a session
Aknow, please look at this bug.
Flags: needinfo?(szchen)
This is a python-based marionette test with standard structure. class TestSomething(MarionetteTestCase): def setUp(self): MarionetteTestCase.setUp(self) def test_foo(self): # run test for 'foo' def tearDown(self): MarionetteTestCase.tearDown(self) 16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette.py", line 625, in execute_script 16:38:19 INFO - filename=os.path.basename(frame[0])) 16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette.py", line 342, in _send_message 16:38:19 ERROR - raise TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None) 16:38:19 ERROR - TEST-UNEXPECTED-FAIL | test_ril_code_quality.py TestRILCodeQuality.test_ril_consts | TimeoutException: socket.timeout 16:38:19 INFO - ====================================================================== 16:38:19 INFO - ERROR: test_ril_consts (test_ril_code_quality.TestRILCodeQuality) 16:38:19 INFO - ---------------------------------------------------------------------- 16:38:19 INFO - Traceback (most recent call last): 16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette_test.py", line 162, in run 16:38:19 INFO - self.tearDown() 16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/tests/dom/system/gonk/tests/marionette/test_ril_code_quality.py", line 341, in tearDown 16:38:19 INFO - MarionetteTestCase.tearDown(self) 16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette_test.py", line 307, in tearDown 16:38:19 INFO - self.marionette.set_context("content") 16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette.py", line 512, in set_context 16:38:19 INFO - return self._send_message('setContext', 'ok', value=context) 16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette.py", line 328, in _send_message 16:38:19 INFO - raise MarionetteException(message="Please start a session") 16:38:19 INFO - TEST-UNEXPECTED-FAIL | test_ril_code_quality.py TestRILCodeQuality.test_ril_consts | MarionetteException: Please start a session You could check marionette.py Marionette._send_message() When socket.timeout exception occurs, Marionette.session is set to None http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/marionette.py#338 Then, in MarionetteTestCase.tearDown(), it calls Marionette._send_message('setContext', 'ok', value=context). At this time, session is None, and hit the condition for new-session exception http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/marionette.py#327 Therefore, in this test case. We would always got |MarionetteException: Please start a session| if |TimeoutException: socket.timeout| occurs.
Flags: needinfo?(szchen)
Aknow, it is quite possible that this is a problem somewhere in marionette or in some interaction between marionette and b2g. It would still be greatly appreciated if you could help figure out what is going wrong here. Johnathan should be able to help as well. We are still having a lot of test issues in b2g and we need help with getting us to a better state.
Sure. I just want to mention that although the failure pattern looks different in this test, it is also because of socket timeout (Bug 898074).
Even though this is a socket.error, it looks quite distinct from bug 898074. In this case, the error is occurring during one of several execute_script calls; it seems that something bad is happening when the JS is executed.
16:31:49 INFO - test_RadioInterfaceLayer (test_ril_code_quality.TestRILCodeQuality) ... ok 16:34:53 INFO - test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ERROR For each test case, we create the socket in MarionetteClient and set its timeout to 180s when the first time calling _send_message(). http://mzl.la/19P230t From log above, the time difference between these two test cases is just about 180s. So I guess its just because the script is running so long and cause socket raise the timeout exception. The long running time in this case is expected. Below is the result when I test on local PC. TEST-START test_ril_code_quality.py test_RILContentHelper (test_ril_code_quality.TestRILCodeQuality) ... ok test_RadioInterfaceLayer (test_ril_code_quality.TestRILCodeQuality) ... ok test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ok test_ril_worker (test_ril_code_quality.TestRILCodeQuality) ... ok ---------------------------------------------------------------------- Ran 4 tests in 162.447s real 3m9.829s user 2m59.715s sys 0m15.929s Maybe we could extend the default timeout. Or is there any way I could set the timeout value from test case.
Thanks for looking into this. There's no way to extend the socket timeout from a test, but you could increase it here: http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/client.py#59
Previous fail log 14:53:44 INFO - test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ok 14:57:00 INFO - test_ril_worker (test_ril_code_quality.TestRILCodeQuality) ... ERROR 14:57:01 ERROR - raise TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None) Test runs for 3:16, and got the socket.timeout exception. (socket timeout is set to 180s) I try to extend the timeout value from 180s to 240s. https://tbpl.mozilla.org/?tree=Try&rev=030f075f0f15 Run 20 times and doesn't got the socket timeout exception for this test. From successful log, the worst running time is 3:28, which is longer than 3:16. However, this time, we get the pass result. 02:27:57 INFO - test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ok 02:31:25 INFO - test_ril_worker (test_ril_code_quality.TestRILCodeQuality) ... ok So I think that maybe we could extend the timer. It seems help for this test. Is there any concern or side effect if we change the value from 180s to 240s?
> So I think that maybe we could extend the timer. It seems help for this test. > Is there any concern or side effect if we change the value from 180s to 240s? No, I think that would be fine.
Depends on: 904490
Maybe I could add a retry mechanism in this test case when the failure is caused by socket timeout. The test case is not for functionality. So we could run it several times and report pass if we got one pass. It won't let us miss the seldom bug issue.
Assignee: nobody → szchen
Attached patch Add 3 times of retry (obsolete) — Splinter Review
Postpone one of runjs (= marionette.execute_script) call. So all of the runjs fall into the try block. We catch the TimeoutException and add at most three times of retry.
Attachment #796429 - Flags: review?(htsai)
From log, looks like no socket.timeout occurs. Therefore, I could not make sure that the retry mechanism indeed recover some fail case. However, no harm to add the retry into the test case. https://tbpl.mozilla.org/?tree=Try&rev=e9fb31ecd4dd
(In reply to Szu-Yu Chen [:aknow] from comment #18) > From log, looks like no socket.timeout occurs. > Therefore, I could not make sure that the retry mechanism indeed recover > some fail case. However, no harm to add the retry into the test case. I mean that no socket timeout is happened on this test case. Other situations are track in Bug 898074
Comment on attachment 796429 [details] [diff] [review] Add 3 times of retry Cancel the review. Current patch will not work as I expected. After catch the exception, some reset/cleanUp should be process first. Then next retry could be executed in a good condition. Reset and cleanUp part are missing in this patch. Let me check whether it is possible to control it in test case.
Attachment #796429 - Flags: review?(htsai)
Attachment #796429 - Attachment is obsolete: true
(In reply to Szu-Yu Chen [:aknow] from comment #20) > Comment on attachment 796429 [details] [diff] [review] > Add 3 times of retry > > Cancel the review. Current patch will not work as I expected. After catch > the exception, some reset/cleanUp should be process first. Then next retry > could be executed in a good condition. > > Reset and cleanUp part are missing in this patch. Let me check whether it is > possible to control it in test case. Although it is technically possible, but the fix is heavily rely on the behaviour of marionette. It's not a good resolution. So I just leave the patch here to show my attempt. After socket timeout, the marionette client is closed. Therefore, to recover the marionette environment before the retry, we have to: (1) re-init the marionette client (2) start marionette session again In my patch, I implement the above two things and simulate the timeout exception with some probability. The work could indeed recover the fail. However, it doesn't mean that it could also help on try server. Maybe at that time, something in the system is broken and recovering only (1) and (2) is not enough to let it work.
un-assign myself for comment 23
Assignee: szchen → nobody
Blocks: 936504
Inactive; closing (see bug 1180138).
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: