Closed
Bug 902203
Opened 12 years ago
Closed 10 years ago
Intermittent test_ril_code_quality.py TestRILCodeQuality.test_ril_consts | TimeoutException: socket.timeout | MarionetteException: Please start a session
Categories
(Firefox OS Graveyard :: General, defect)
Firefox OS Graveyard
General
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: jgriffin, Unassigned)
References
Details
(Keywords: intermittent-failure)
Attachments
(2 files, 1 obsolete file)
4.02 KB,
patch
|
Details | Diff | Splinter Review | |
4.15 KB,
patch
|
Details | Diff | Splinter Review |
RyanVM
https://tbpl.mozilla.org/php/getParsedLog.php?id=26231145&tree=B2g-Inbound
b2g_emulator b2g-inbound opt test marionette-webapi on 2013-08-06 14:40:40
revision: 9e989262176d
slave: talos-r3-fed-081
ImportError: cannot import name OrderedDict
raise TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None)
TEST-UNEXPECTED-FAIL | test_ril_code_quality.py TestRILCodeQuality.test_ril_worker | TimeoutException: socket.timeout
TEST-UNEXPECTED-FAIL | test_ril_code_quality.py TestRILCodeQuality.test_ril_worker | MarionetteException: Please start a session
Return code: 2560
Marionette exited with return code 2560: harness failures
# TBPL FAILURE #
This failure is distinct from bug 898047.
Reporter | ||
Comment 1•12 years ago
|
||
> This failure is distinct from bug 898047.
Er, make that bug 898074.
Aus: Would you be a good person to look at this? Or is this maybe something for the RIL team?
Updated•12 years ago
|
Summary: test_ril_code_quality.py TestRILCodeQuality.test_ril_worker | TimeoutException: socket.timeout → Intermittent test_ril_code_quality.py TestRILCodeQuality.test_ril_consts | TimeoutException: socket.timeout | MarionetteException: Please start a session
Comment 3•12 years ago
|
||
Comment 5•12 years ago
|
||
This is a python-based marionette test with standard structure.
class TestSomething(MarionetteTestCase):
def setUp(self):
MarionetteTestCase.setUp(self)
def test_foo(self):
# run test for 'foo'
def tearDown(self):
MarionetteTestCase.tearDown(self)
16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette.py", line 625, in execute_script
16:38:19 INFO - filename=os.path.basename(frame[0]))
16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette.py", line 342, in _send_message
16:38:19 ERROR - raise TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None)
16:38:19 ERROR - TEST-UNEXPECTED-FAIL | test_ril_code_quality.py TestRILCodeQuality.test_ril_consts | TimeoutException: socket.timeout
16:38:19 INFO - ======================================================================
16:38:19 INFO - ERROR: test_ril_consts (test_ril_code_quality.TestRILCodeQuality)
16:38:19 INFO - ----------------------------------------------------------------------
16:38:19 INFO - Traceback (most recent call last):
16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette_test.py", line 162, in run
16:38:19 INFO - self.tearDown()
16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/tests/dom/system/gonk/tests/marionette/test_ril_code_quality.py", line 341, in tearDown
16:38:19 INFO - MarionetteTestCase.tearDown(self)
16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette_test.py", line 307, in tearDown
16:38:19 INFO - self.marionette.set_context("content")
16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette.py", line 512, in set_context
16:38:19 INFO - return self._send_message('setContext', 'ok', value=context)
16:38:19 INFO - File "/home/cltbld/talos-slave/test/build/tests/marionette/marionette/marionette.py", line 328, in _send_message
16:38:19 INFO - raise MarionetteException(message="Please start a session")
16:38:19 INFO - TEST-UNEXPECTED-FAIL | test_ril_code_quality.py TestRILCodeQuality.test_ril_consts | MarionetteException: Please start a session
You could check marionette.py Marionette._send_message()
When socket.timeout exception occurs, Marionette.session is set to None
http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/marionette.py#338
Then, in MarionetteTestCase.tearDown(), it calls Marionette._send_message('setContext', 'ok', value=context). At this time, session is None, and hit the condition for new-session exception
http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/marionette.py#327
Therefore, in this test case. We would always got |MarionetteException: Please start a session| if |TimeoutException: socket.timeout| occurs.
Flags: needinfo?(szchen)
Aknow, it is quite possible that this is a problem somewhere in marionette or in some interaction between marionette and b2g. It would still be greatly appreciated if you could help figure out what is going wrong here. Johnathan should be able to help as well.
We are still having a lot of test issues in b2g and we need help with getting us to a better state.
Comment 7•12 years ago
|
||
Sure. I just want to mention that although the failure pattern looks different in this test, it is also because of socket timeout (Bug 898074).
Reporter | ||
Comment 8•12 years ago
|
||
Even though this is a socket.error, it looks quite distinct from bug 898074. In this case, the error is occurring during one of several execute_script calls; it seems that something bad is happening when the JS is executed.
Comment 9•12 years ago
|
||
16:31:49 INFO - test_RadioInterfaceLayer (test_ril_code_quality.TestRILCodeQuality) ... ok
16:34:53 INFO - test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ERROR
For each test case, we create the socket in MarionetteClient and set its timeout to 180s when the first time calling _send_message().
http://mzl.la/19P230t
From log above, the time difference between these two test cases is just about 180s. So I guess its just because the script is running so long and cause socket raise the timeout exception.
The long running time in this case is expected. Below is the result when I test on local PC.
TEST-START test_ril_code_quality.py
test_RILContentHelper (test_ril_code_quality.TestRILCodeQuality) ... ok
test_RadioInterfaceLayer (test_ril_code_quality.TestRILCodeQuality) ... ok
test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ok
test_ril_worker (test_ril_code_quality.TestRILCodeQuality) ... ok
----------------------------------------------------------------------
Ran 4 tests in 162.447s
real 3m9.829s
user 2m59.715s
sys 0m15.929s
Maybe we could extend the default timeout. Or is there any way I could set the timeout value from test case.
Reporter | ||
Comment 10•12 years ago
|
||
Thanks for looking into this.
There's no way to extend the socket timeout from a test, but you could increase it here: http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/client.py#59
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 12•12 years ago
|
||
Previous fail log
14:53:44 INFO - test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ok
14:57:00 INFO - test_ril_worker (test_ril_code_quality.TestRILCodeQuality) ... ERROR
14:57:01 ERROR - raise TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None)
Test runs for 3:16, and got the socket.timeout exception. (socket timeout is set to 180s)
I try to extend the timeout value from 180s to 240s.
https://tbpl.mozilla.org/?tree=Try&rev=030f075f0f15
Run 20 times and doesn't got the socket timeout exception for this test.
From successful log, the worst running time is 3:28, which is longer than 3:16. However, this time, we get the pass result.
02:27:57 INFO - test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ok
02:31:25 INFO - test_ril_worker (test_ril_code_quality.TestRILCodeQuality) ... ok
So I think that maybe we could extend the timer. It seems help for this test. Is there any concern or side effect if we change the value from 180s to 240s?
Reporter | ||
Comment 13•12 years ago
|
||
> So I think that maybe we could extend the timer. It seems help for this test.
> Is there any concern or side effect if we change the value from 180s to 240s?
No, I think that would be fine.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 16•12 years ago
|
||
Maybe I could add a retry mechanism in this test case when the failure is caused by socket timeout.
The test case is not for functionality. So we could run it several times and report pass if we got one pass. It won't let us miss the seldom bug issue.
Updated•12 years ago
|
Assignee: nobody → szchen
Comment 17•12 years ago
|
||
Postpone one of runjs (= marionette.execute_script) call. So all of the runjs fall into the try block. We catch the TimeoutException and add at most three times of retry.
Attachment #796429 -
Flags: review?(htsai)
Comment 18•12 years ago
|
||
From log, looks like no socket.timeout occurs.
Therefore, I could not make sure that the retry mechanism indeed recover some fail case. However, no harm to add the retry into the test case.
https://tbpl.mozilla.org/?tree=Try&rev=e9fb31ecd4dd
Comment 19•12 years ago
|
||
(In reply to Szu-Yu Chen [:aknow] from comment #18)
> From log, looks like no socket.timeout occurs.
> Therefore, I could not make sure that the retry mechanism indeed recover
> some fail case. However, no harm to add the retry into the test case.
I mean that no socket timeout is happened on this test case.
Other situations are track in Bug 898074
Comment 20•12 years ago
|
||
Comment on attachment 796429 [details] [diff] [review]
Add 3 times of retry
Cancel the review. Current patch will not work as I expected. After catch the exception, some reset/cleanUp should be process first. Then next retry could be executed in a good condition.
Reset and cleanUp part are missing in this patch. Let me check whether it is possible to control it in test case.
Attachment #796429 -
Flags: review?(htsai)
Comment 21•12 years ago
|
||
Comment 22•12 years ago
|
||
Attachment #796429 -
Attachment is obsolete: true
Comment 23•12 years ago
|
||
(In reply to Szu-Yu Chen [:aknow] from comment #20)
> Comment on attachment 796429 [details] [diff] [review]
> Add 3 times of retry
>
> Cancel the review. Current patch will not work as I expected. After catch
> the exception, some reset/cleanUp should be process first. Then next retry
> could be executed in a good condition.
>
> Reset and cleanUp part are missing in this patch. Let me check whether it is
> possible to control it in test case.
Although it is technically possible, but the fix is heavily rely on the behaviour of marionette. It's not a good resolution. So I just leave the patch here to show my attempt.
After socket timeout, the marionette client is closed. Therefore, to recover the marionette environment before the retry, we have to:
(1) re-init the marionette client
(2) start marionette session again
In my patch, I implement the above two things and simulate the timeout exception with some probability. The work could indeed recover the fail. However, it doesn't mean that it could also help on try server. Maybe at that time, something in the system is broken and recovering only (1) and (2) is not enough to let it work.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 32•10 years ago
|
||
Inactive; closing (see bug 1180138).
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•