Closed Bug 936504 Opened 11 years ago Closed 11 years ago

Frequent TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None) during test_ril_code_quality.py

Categories

(Firefox OS Graveyard :: RIL, defect)

defect
Not set
normal

Tracking

(firefox26 unaffected, firefox27 unaffected, firefox28 fixed, firefox-esr24 unaffected, b2g-v1.2 fixed)

RESOLVED FIXED
1.3 Sprint 5 - 11/22
Tracking Status
firefox26 --- unaffected
firefox27 --- unaffected
firefox28 --- fixed
firefox-esr24 --- unaffected
b2g-v1.2 --- fixed

People

(Reporter: jgriffin, Assigned: aknow)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 1 obsolete file)

The test test_ril_code_quality.py is frequently timing out with the exception:

raise TimeoutException(message='socket.timeout', status=ErrorCodes.TIMEOUT, stacktrace=None)
TEST-UNEXPECTED-FAIL | test_ril_code_quality.py test_ril_code_quality.TestRILCodeQuality.test_ril_worker | TimeoutException: socket.timeout
TEST-UNEXPECTED-FAIL | test_ril_code_quality.py test_ril_code_quality.TestRILCodeQuality.test_ril_worker | MarionetteException: Please start a session
Return code: 2560

log: https://tbpl.mozilla.org/php/getParsedLog.php?id=30309471&tree=B2g-Inbound

We saw a similar problem back in bug 902203.  In that case, the fix was to increase the timeout, since this test simply runs very long.
:aknow, is this the same problem as in bug 902203?  Is increasing the timeout again the right solution?
Flags: needinfo?(szchen)
(In reply to Jonathan Griffin (:jgriffin) from comment #1)
> :aknow, is this the same problem as in bug 902203?  Is increasing the
> timeout again the right solution?

Yes, it is a same problem as bug 902203. I think that we could extend the timeout again.

I Just checked a pass log on a recent push from TBPL.

Pass log
========
           12:04:56     INFO -  TEST-START test_ril_code_quality.py
38s        12:05:34     INFO -  test_RILContentHelper (test_ril_code_quality.TestRILCodeQuality) ... ok
58s        12:06:32     INFO -  test_RadioInterfaceLayer (test_ril_code_quality.TestRILCodeQuality) ... ok
45s        12:07:17     INFO -  test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ok
4min 2s    12:11:19     INFO -  test_ril_worker (test_ril_code_quality.TestRILCodeQuality) ... ok
11s        12:11:30     INFO -  test_worker_buf (test_ril_code_quality.TestRILCodeQuality) ... ok


Current fail log
================
           20:01:54     INFO -  TEST-START test_ril_code_quality.py
43s        20:02:37     INFO -  test_RILContentHelper (test_ril_code_quality.TestRILCodeQuality) ... ok
1min 12s   20:03:49     INFO -  test_RadioInterfaceLayer (test_ril_code_quality.TestRILCodeQuality) ... ok
46s        20:04:35     INFO -  test_ril_consts (test_ril_code_quality.TestRILCodeQuality) ... ok
4min 18s   20:08:53     INFO -  test_ril_worker (test_ril_code_quality.TestRILCodeQuality) ... ERROR
24s        20:09:17     INFO -  test_worker_buf (test_ril_code_quality.TestRILCodeQuality) ... ok

The running time of each sub test from fail log is longer than the one in pass log. As we could see, there is one test in pass log which runs in 4min 2s. It is closed to current socket timeout value in marionette (240s). How about extending the timeout further this time, i.e., 240s -> 360s? So it has more margin.
Flags: needinfo?(szchen)
Depends on: 937272
Increasing the socket timeout didn't help here.  When this fails, I see something like this in the log:

20:38:16     INFO -  11-12 23:30:58.698    45    45 I Gecko   : 1384317058599	Marionette	DEBUG	Got request: execute, data: {"to":"0","sessionId":"6-b2g","name":"executeScript","parameters":{"specialPowers":false,"scriptTimeout":null,"newSandbox":false,"args":[],"filename":"test_ril_code_quality.py","script":"\n    global.script = '/* Copyright 2012 Mozilla Foundation and Mozilla contributors\\n *\\n * Licensed under the Apache License, Version 2.0 (the \"License\");\\n * you may not use this file except in compliance with the License.\\n * You may obtain a copy of the License at\\n *\\n *     http://www.apache.org/licenses/LICENSE-2.0\\n *\\n * Unless required by applicable law or agreed to in writing, software\\n * distributed under the License is distributed on an \"AS IS\" BASIS,\\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\\n * See the License for the specific language governing permissions and\\n * limitations under the License.\\n */\\n\\n// Set to true to debug all RIL layers\\nthis.DEBUG_ALL = false;\\n\\n// Set individually to de
20:38:16     INFO -  11-12 23:31:02.268  3516  3516 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:07.328  3518  3518 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:12.418  3520  3520 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:17.488  3522  3522 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:22.568  3524  3524 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:27.648  3526  3526 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:32.749  3528  3528 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:37.819  3530  3530 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:41.898  3532  3532 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:46.968  3534  3534 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:52.048  3536  3536 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:31:57.148  3538  3538 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:02.218  3540  3540 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:07.318  3542  3542 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:12.398  3544  3544 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:17.458  3546  3546 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:22.568  3548  3548 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:27.679  3550  3550 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:32.769  3552  3552 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:37.819  3554  3554 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:41.928  3556  3556 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:47.018  3558  3558 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:52.088  3560  3560 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:32:57.188  3562  3562 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:02.288  3564  3564 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:07.368  3566  3566 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:12.438  3568  3568 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:17.508  3570  3570 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:22.618  3572  3572 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:27.678  3574  3574 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:32.788  3576  3576 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:37.892  3578  3578 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:41.958  3580  3580 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:47.078  3582  3582 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:52.148  3584  3584 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:33:57.248  3586  3586 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:02.328  3588  3588 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:07.448  3590  3590 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:12.519  3592  3592 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:17.619  3594  3594 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:22.688  3596  3596 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:27.818  3598  3598 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:31.938  3600  3600 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:37.018  3602  3602 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:42.108  3604  3604 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:47.178  3606  3606 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:52.273  3608  3608 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:34:57.378  3613  3613 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:35:02.448  3619  3619 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:35:07.548  3621  3621 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:35:12.618  3623  3623 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:35:17.759  3625  3625 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:35:22.838  3627  3627 I log     : OOM Message Logger Started
20:38:16     INFO -  11-12 23:35:23.138    45    45 I Gecko   : 1384317323141	Marionette	DEBUG	accepted connection on 127.0.0.1:55740

I'm not sure if this test is just sometimes *very* slow, or if the OOM Message Logger messages indicate we're running out of memory, or what.  :aknow, do you have any ideas?
On TBPL, It shows that we are now facing the OOM problem. Maybe it is related to the test fail if "OOM Message Logger" indicates the occurrence of OOM (out of memory).

The test will creates a JavaScript string contains the whole contents of a file. For example ril_worker.js, which is 425 KB now. So, the test requires some extra amount of memory. It might fails in OOM condition. Moreover, from comment 3, the line of the log above "OOM" is indeed showing the action of building that huge string.

Is "OOM Message" log always shown when the test fail?
Looks like it is not related to "OOM Message Logger". They are also shown in the pass log.
So... if it cause the problem of pass rate on try server, we would better disable the test (or just partially for some subtest).
Since this is causing so many oranges, I'd be in favor of disabling it if we can't fix it.

Can the test be broken into pieces to potentially avoid the socket timeouts?
Flags: needinfo?(szchen)
Assignee: nobody → szchen
Flags: needinfo?(szchen)
Remove one of the sub-test. It sometimes runs very slow and cause the high fail rate on TBPL.
Attachment #832726 - Flags: review?(htsai)
Attachment #832726 - Flags: review?(htsai) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/699fa8832139
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.3 Sprint 5 - 11/22
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: