Closed
Bug 1091680
Opened 10 years ago
Closed 9 years ago
Socket timeout while in findElement
Categories
(Testing Graveyard :: JSMarionette, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kgrandon, Assigned: aus)
References
Details
(Whiteboard: [affects=gaia])
User Story
08:10:53 INFO - TEST-UNEXPECTED-FAIL | null | editing an image should have exposure options 08:10:53 INFO - Error: Polling socket recv() timeout! 08:10:53 INFO - at TcpSync._readResponse (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:140:30) 08:10:53 INFO - at TcpSync.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:153:24) 08:10:53 INFO - at Object.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:457:36) 08:10:53 INFO - at Object.Client._sendCommand (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:503:19) 08:10:53 INFO - at Object.Element._sendCommand (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/element.js:49:21) 08:10:53 INFO - at Object.displayed (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/element.js:287:19) 08:10:53 INFO - at Object.Client.waitForSync (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:671:9) 08:10:53 INFO - at Object.Client.waitFor (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:650:60) 08:10:53 INFO - at Object.MarionetteHelper.waitForElement (/builds/slave/test/gaia/node_modules/marionette-helper/index.js:142:12) 08:10:53 INFO - at Object.editExposureButton (/builds/slave/test/gaia/apps/gallery/test/marionette/lib/gallery.js:151:31) 08:10:53 INFO - at Context.<anonymous> (/builds/slave/test/gaia/apps/gallery/test/marionette/edit_image_test.js:44:8) 08:10:53 INFO - at callFn (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:223:21) 08:10:53 INFO - at Test.Runnable.run (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:216:7) 08:10:53 INFO - at Runner.runTest (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:373:10) 08:10:53 INFO - at /builds/slave/test/gaia/node_modules/mocha/lib/runner.js:451:12 08:10:53 INFO - at next (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:298:14) 08:10:53 INFO - at /builds/slave/test/gaia/node_modules/mocha/lib/runner.js:308:7 08:10:53 INFO - at next (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:246:23) 08:10:53 INFO - at /builds/slave/test/gaia/node_modules/mocha/lib/runner.js:270:7 08:10:53 INFO - at done (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:185:5) 08:10:53 INFO - at callFn (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:228:7) 08:10:53 INFO - at Hook.Runnable.run (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:216:7) 08:10:53 INFO - at next (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:258:10) 08:10:53 INFO - at /builds/slave/test/gaia/node_modules/mocha/lib/runner.js:270:7 08:10:53 INFO - at done (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:185:5) 08:10:53 INFO - at /builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:199:9 08:10:53 INFO - at Object.executeHook (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:369:18) 08:10:53 INFO - at process._tickCallback (node.js:415:13) 08:10:53 INFO - TEST-END | null | editing an image should have exposure options took 21197 ms 08:11:14 INFO - TEST-UNEXPECTED-FAIL | null | editing an image "after each" hook 08:11:14 INFO - Error: Polling socket recv() timeout! 08:11:14 INFO - at TcpSync._readResponse (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:140:30) 08:11:14 INFO - at TcpSync.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:153:24) 08:11:14 INFO - at Object.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:457:36) 08:11:14 INFO - at Object.Client._sendCommand (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:503:19) 08:11:14 INFO - at Object.closeDriver (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:751:14) 08:11:14 INFO - at Object.executeHook (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:359:20) 08:11:14 INFO - at Object.Client.runHook (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:377:7) 08:11:14 INFO - at Object.destroySession [as deleteSession] (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:761:12) 08:11:14 INFO - at Context.<anonymous> (/builds/slave/test/gaia/node_modules/marionette-js-runner/lib/runtime/hostmanager.js:102:14) 08:11:14 INFO - at Hook.Runnable.run (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:196:15) 08:11:14 INFO - at next (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:258:10) 08:11:14 INFO - at Object._onImmediate (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:275:5) 08:11:14 INFO - at processImmediate [as _immediateCallback] (timers.js:330:15) 08:12:14 INFO - TEST-UNEXPECTED-FAIL | null | editing an image "after all" hook 08:12:14 INFO - Error: timeout of 60000ms exceeded 08:12:14 INFO - at null.<anonymous> (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:139:19) 08:12:14 INFO - at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)
Attachments
(3 files, 1 obsolete file)
https://tbpl.mozilla.org/php/getParsedLog.php?id=51434183&full=1&branch=b2g-inbound#error0
This also generally results in this error showing up: Error: timeout of 60000ms exceeded
Reporter | ||
Updated•10 years ago
|
User Story: (updated)
Comment 1•10 years ago
|
||
This may be a marionette bug
Updated•10 years ago
|
Status: NEW → ASSIGNED
Component: JSMarionette → Marionette
Summary: Harness Issue: Error: Polling socket recv() timeout! → Socket timeout while in findElement
Reporter | ||
Comment 2•10 years ago
|
||
We're also seeing lots of these errors, 1) Software Home Button - Update Dialog Confirm "before each" hook:
12:25:23 INFO - Error: timeout of 60000ms exceeded
12:25:23 INFO - at null.<anonymous> (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:139:19)
12:25:23 INFO - at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)
12:25:23 INFO - 2) Software Home Button - Update Dialog Confirm "after each" hook:
12:25:23 INFO - Error: Not connected. To write data you must call connect first.
12:25:23 INFO - at TcpSync.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:152:15)
Seen on: http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/b2g-inbound-linux64_gecko/1414954539/b2g-inbound_ubuntu64_vm-b2gdt_test-gaia-js-integration-3-bm113-tests1-linux64-build30.txt.gz
I'm not sure if these are the same thing.
Reporter | ||
Comment 3•10 years ago
|
||
This may be a blocker for re-enabling Gij on mainline branches as it's happening quite a lot on TBPL. I'm not really seeing this on TC, but there are different errors to resolve there still.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•10 years ago
|
Whiteboard: [affects=gaia]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 30•10 years ago
|
||
Moving this to marionette-js since this is not seen python from looking at a handful of comments.
Component: Marionette → JSMarionette
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 39•10 years ago
|
||
When trying to run marionette-js on device test on the latest marionette ( gaia commit 1913c69336f2e8f8bfd97e3179e862d325aa3a99), now it throws a different error:
make[1]: Leaving directory `/home/mozilla/B2G/gaia'
runner-service out: /home/mozilla/B2G/gaia/node_modules/marionette-socket-host
gaia-integration --buildapp=device
runner-service err: ERROR:tornado.general:Uncaught exception, closing connection.
Traceback (most recent call last):
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/pyzmq-14.4.1-py2.7-linux-x86_64.egg/zmq/eventloop/zmqstream.py", line 407, in _run_callback
callback(*args, **kwargs)
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/tornado-4.0.2-py2.7-linux-x86_64.egg/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/corredor-0.1-py2.7.egg/corredor/pattern.py", line 95, in on_recv
self.handler(data)
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/corredor-0.1-py2.7.egg/corredor/handler.py", line 41, in __call__
self.action_map[data['action']](data)
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/python/runner-service/runner_service/listener.py", line 20, in _
handler(data)
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/python/runner-service/runner_service/handlers/runner.py", line 128, in start_runner
self.runner.start()
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/mozrunner-6.5-py2.7.egg/mozrunner/base/device.py", line 68, in start
self.device.setup_profile(self.profile)
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/mozrunner-6.5-py2.7.egg/mozrunner/devices/base.py", line 113, in setup_profile
self.app_ctx.setup_profile(profile)
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/mozrunner-6.5-py2.7.egg/mozrunner/application.py", line 148, in setup_profile
self.dm.removeFile(self.remote_settings_db)
File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/mozrunner-6.5-py2.7.egg/mozrunner/application.py", line 120, in remote_settings_db
raise DMError("Could not find settings db in '%s'!" % self.remote_idb_dir)
DMError: Could not find settings db in '/data/local/storage/persistent/chrome/idb'!
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 43•10 years ago
|
||
Having the b2gdesktop stdout might help debug this, but it appears to be sucked into a black hole. Anyone know how to get at it with this test harness?
Assignee | ||
Comment 44•10 years ago
|
||
I'll start poking around, hopefully I can find something.
(In reply to Jonathan Griffin (:jgriffin) from comment #43)
> Having the b2gdesktop stdout might help debug this, but it appears to be
> sucked into a black hole. Anyone know how to get at it with this test
> harness?
I believe the env var DEBUG=b2g-desktop will show the output. Or you can also use DEBUG=* for even more logs.
Updated•10 years ago
|
Assignee: nobody → gaye
Comment 47•10 years ago
|
||
I think I know quite a bit about how to figure out what's going on here, so I'll assign myself and start to investigate.
Comment on attachment 8546281 [details] [diff] [review]
Add DEBUG=* to gaia-integration jobs,
Review of attachment 8546281 [details] [diff] [review]:
-----------------------------------------------------------------
That would do it, I've confirmed locally. The logs go to stderr not stdout, not sure if you are already capturing that.
However, the logs are very verbose, probably would end up with ~100MB logs per run.
Attachment #8546281 -
Flags: review?(jryans) → review+
Comment 49•10 years ago
|
||
That's too verbose; buildbot caps log sizes at 50MB. Would DEBUG=b2g-desktop produce more manageable logs? Or is there another DEBUG setting that would be more helpful?
Flags: needinfo?(jryans)
(In reply to Jonathan Griffin (:jgriffin) from comment #49)
> That's too verbose; buildbot caps log sizes at 50MB. Would
> DEBUG=b2g-desktop produce more manageable logs? Or is there another DEBUG
> setting that would be more helpful?
No, the full log for me was still 108MB. However, it looks like the tests are in 4 chunks on treeherder, so maybe you'd only get 27MB per chunk, and remain below the limit?
Flags: needinfo?(jryans)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 53•10 years ago
|
||
Landing on default, so I can verify this doesn't break anything on cedar:
https://hg.mozilla.org/build/mozharness/rev/194cb626604e
Assignee | ||
Comment 54•10 years ago
|
||
I've been thinking about this issue and I honestly don't think we can simply make it go away. Connections reset by peer seem to be the main reason why this problem would occur IMHO. This is somewhat normal. We should be better at handling this.
It looks by the logging output that we detect that the timeout appropriately but we don't make any attempt to reconnect in the marionette js client. I'm not super familiar with this code but, it should be possible to reconnect and reissue the last command.
sockit-to-me itself has the ability to timeout before the system tcp timeout kicks in. we need to make sure this timeout is lower than our test timeout.
Comment 55•10 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #53)
> Landing on default, so I can verify this doesn't break anything on cedar:
>
> https://hg.mozilla.org/build/mozharness/rev/194cb626604e
Feel free to use pine for this as well if needed.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 57•10 years ago
|
||
This mozharness change is running on cedar, and it would seem to confirm it isn't a Marionette problem. In all the cases I see, including e.g., https://tbpl.mozilla.org/php/getParsedLog.php?id=56158031&tree=Cedar&full=1, I see that Marionette successfully responds to a newSession message before the timeout occurs...i.e., it isn't Marionette stalling or hanging that is causing the problem.
Comment 58•10 years ago
|
||
In production: https://hg.mozilla.org/build/mozharness/rev/194cb626604e
Comment 59•10 years ago
|
||
DEBUG=* will build a debug profile if its set as part of the make script, we can export NODE_DEBUG=b2g-desktop (or *) in https://github.com/mozilla-b2g/gaia/blob/master/bin/gaia-marionette#L6 to get this output
Comment 60•10 years ago
|
||
(In reply to Dale Harvey (:daleharvey) from comment #59)
> DEBUG=* will build a debug profile if its set as part of the make script, we
> can export NODE_DEBUG=b2g-desktop (or *) in
> https://github.com/mozilla-b2g/gaia/blob/master/bin/gaia-marionette#L6 to
> get this output
This was backed out in bug 1120618.
Comment 61•10 years ago
|
||
For anyone else simultaneously investigating, I wrote a little script that you can run on a machine in a gaia directory to try to reproduce the issue and give you some (possibly useful) repro statistics. You can specify an arbitrary statistical confidence and maximum error and the script will run TEST_FILES="apps/gallery/test/marionette/edit_image_test.js" and note whether the socket timeout occurs enough times to reach your desired confidence and error.
Comment hidden (obsolete) |
Reporter | ||
Comment 63•10 years ago
|
||
Attachment #8546281 -
Attachment is obsolete: true
Reporter | ||
Comment 64•10 years ago
|
||
Reporter | ||
Comment 65•10 years ago
|
||
Here is the line in marionette-client that we are running into: https://github.com/mozilla-b2g/marionette-js-client/blob/25d1b2d4c083739bb0cd0648b548c9d153ef5859/lib/marionette/drivers/tcp-sync.js#L78
Comment hidden (obsolete) |
Reporter | ||
Comment 67•10 years ago
|
||
I landed a small logging patch in marionette-js-client which adds some logging and have published v1.5.2: https://github.com/mozilla-b2g/marionette-js-client/commit/ac7d93d72fdc97dd4bcb733be6c1757ad63755d6
Reporter | ||
Comment 68•10 years ago
|
||
Ignore comment 66. The real error we get is: marionette:tcp-sync exception when probing socket [Error: Failed to connect.] Failed to connect.
This comes from: https://github.com/mozilla-b2g/sockit-to-me/blob/0d21bab114bbe1cb145c7b02dda6ef8de1209812/src/sockit.cc#L218
Updated•10 years ago
|
Assignee: gaye → aus
Comment 69•10 years ago
|
||
this still fails in gaia commit 3efaeeb8bde3fc5a5e034c973a19e0fcff384c24. James, is there anything that you want QA team to try in order to fix this bug? This is the bug that blocks us from trying out MarionetteJS on device at the moment.
cc'ing Bhavana for info.
Comment 70•10 years ago
|
||
aus/james can you please help on this ? WE are blocked to try MJS per comment#69. No-jun will be able to help with logs or test this out again or try anything else he can help with to get this moving ,if you NI him.
Flags: needinfo?(aus)
Comment 71•10 years ago
|
||
@nkpark do you have a STR for this? We have been trying on and off to kill this one for awhile?
Flags: needinfo?(jlal) → needinfo?(npark)
Comment 72•10 years ago
|
||
For me, the str is simple (which I went over with mdas before):
- get the latest gaia from master (I do ./repo sync -d on B2G and go into the gaia tree)
- type 'make really-clean' to remove all downloaded files
- Have the latest master branch flashed into the flame device, with v18D firmware loaded
- Make sure the memory on the flame phone is set to 319MB
- connect the flame device, make sure adb devices lists it
- type 'TEST_FILES=./apps/calendar/test/marionette/day_view_test.js BUILDAPP=device make test-integration' (The name of the test does not matter, I just wanted to run the test on device
Note: I run this on my ubuntu VM (14.04)
Expected:
Test should run on device
Actual:
the connection times out
Flags: needinfo?(npark)
Assignee | ||
Comment 73•10 years ago
|
||
Bajaj, I'll be full time on this issue starting next week. In the meantime, I believe gaye also has a work-around.
:gaye, if you could comment on your work here and maybe mark that bug as a dependent that'd be super for our QA folks. :)
Flags: needinfo?(aus) → needinfo?(gaye)
Reporter | ||
Comment 74•10 years ago
|
||
We still need to fix this, but with the new retry logic, I don't think this blocks bug 1120508.
No longer blocks: 1120508
Comment 75•10 years ago
|
||
(In reply to Ghislain Aus Lacroix [:aus] from comment #73)
> Bajaj, I'll be full time on this issue starting next week. In the meantime,
> I believe gaye also has a work-around.
>
> :gaye, if you could comment on your work here and maybe mark that bug as a
> dependent that'd be super for our QA folks. :)
Hi :aus, if you want me to try out your fix, please let me know, as this is one of my higher priorities.
Thanks,
Comment 76•10 years ago
|
||
(In reply to Ghislain Aus Lacroix [:aus] from comment #73)
> Bajaj, I'll be full time on this issue starting next week. In the meantime,
> I believe gaye also has a work-around.
>
> :gaye, if you could comment on your work here and maybe mark that bug as a
> dependent that'd be super for our QA folks. :)
:aus, its been a week ;) so a friendly ping :P given how much we are blocked on this.
Flags: needinfo?(aus)
Comment 77•10 years ago
|
||
So this test suite was unhidden on TBPL since we wrote a bit of logic to retry marionette tests on failure. As far as device testing goes, I'm working on using the python harness' crash reporting utilities in marionette js which may help depending on what's going on here. I know that Aus had a patch to marionette-js-runner a while ago, but I don't know that it was downstreamed to gaia (or that it fixed the problem for good).
Flags: needinfo?(gaye)
Flags: needinfo?(aus)
Comment 78•10 years ago
|
||
2 different issues are tracked under this bug. Let's track the one upon which QA is blocked on bug 1131866.
Assignee | ||
Comment 79•9 years ago
|
||
The root cause of this issue, which is a disconnected session with bad data in the runner, has been resolved. I expect new failures exhibiting this symptom to be new issues related to the tests themselves (a bad test, with an actual failure) rather than an infrastructure or framework issue.
Updated•7 years ago
|
Product: Testing → Testing Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•