Closed Bug 1091680 Opened 10 years ago Closed 9 years ago

Socket timeout while in findElement

Categories

(Testing Graveyard :: JSMarionette, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kgrandon, Assigned: aus)

References

Details

(Whiteboard: [affects=gaia])

User Story

08:10:53     INFO -  TEST-UNEXPECTED-FAIL | null | editing an image should have exposure options
08:10:53     INFO -  Error: Polling socket recv() timeout!
08:10:53     INFO -      at TcpSync._readResponse (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:140:30)
08:10:53     INFO -      at TcpSync.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:153:24)
08:10:53     INFO -      at Object.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:457:36)
08:10:53     INFO -      at Object.Client._sendCommand (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:503:19)
08:10:53     INFO -      at Object.Element._sendCommand (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/element.js:49:21)
08:10:53     INFO -      at Object.displayed (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/element.js:287:19)
08:10:53     INFO -      at Object.Client.waitForSync (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:671:9)
08:10:53     INFO -      at Object.Client.waitFor (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:650:60)
08:10:53     INFO -      at Object.MarionetteHelper.waitForElement (/builds/slave/test/gaia/node_modules/marionette-helper/index.js:142:12)
08:10:53     INFO -      at Object.editExposureButton (/builds/slave/test/gaia/apps/gallery/test/marionette/lib/gallery.js:151:31)
08:10:53     INFO -      at Context.<anonymous> (/builds/slave/test/gaia/apps/gallery/test/marionette/edit_image_test.js:44:8)
08:10:53     INFO -      at callFn (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:223:21)
08:10:53     INFO -      at Test.Runnable.run (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:216:7)
08:10:53     INFO -      at Runner.runTest (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:373:10)
08:10:53     INFO -      at /builds/slave/test/gaia/node_modules/mocha/lib/runner.js:451:12
08:10:53     INFO -      at next (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:298:14)
08:10:53     INFO -      at /builds/slave/test/gaia/node_modules/mocha/lib/runner.js:308:7
08:10:53     INFO -      at next (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:246:23)
08:10:53     INFO -      at /builds/slave/test/gaia/node_modules/mocha/lib/runner.js:270:7
08:10:53     INFO -      at done (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:185:5)
08:10:53     INFO -      at callFn (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:228:7)
08:10:53     INFO -      at Hook.Runnable.run (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:216:7)
08:10:53     INFO -      at next (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:258:10)
08:10:53     INFO -      at /builds/slave/test/gaia/node_modules/mocha/lib/runner.js:270:7
08:10:53     INFO -      at done (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:185:5)
08:10:53     INFO -      at /builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:199:9
08:10:53     INFO -      at Object.executeHook (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:369:18)
08:10:53     INFO -      at process._tickCallback (node.js:415:13)
08:10:53     INFO -  TEST-END | null | editing an image should have exposure options took 21197 ms
08:11:14     INFO -  TEST-UNEXPECTED-FAIL | null | editing an image "after each" hook
08:11:14     INFO -  Error: Polling socket recv() timeout!
08:11:14     INFO -      at TcpSync._readResponse (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:140:30)
08:11:14     INFO -      at TcpSync.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:153:24)
08:11:14     INFO -      at Object.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:457:36)
08:11:14     INFO -      at Object.Client._sendCommand (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:503:19)
08:11:14     INFO -      at Object.closeDriver (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:751:14)
08:11:14     INFO -      at Object.executeHook (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:359:20)
08:11:14     INFO -      at Object.Client.runHook (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:377:7)
08:11:14     INFO -      at Object.destroySession [as deleteSession] (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/client.js:761:12)
08:11:14     INFO -      at Context.<anonymous> (/builds/slave/test/gaia/node_modules/marionette-js-runner/lib/runtime/hostmanager.js:102:14)
08:11:14     INFO -      at Hook.Runnable.run (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:196:15)
08:11:14     INFO -      at next (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:258:10)
08:11:14     INFO -      at Object._onImmediate (/builds/slave/test/gaia/node_modules/mocha/lib/runner.js:275:5)
08:11:14     INFO -      at processImmediate [as _immediateCallback] (timers.js:330:15)
08:12:14     INFO -  TEST-UNEXPECTED-FAIL | null | editing an image "after all" hook
08:12:14     INFO -  Error: timeout of 60000ms exceeded
08:12:14     INFO -      at null.<anonymous> (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:139:19)
08:12:14     INFO -      at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)

Attachments

(3 files, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=51434183&full=1&branch=b2g-inbound#error0 This also generally results in this error showing up: Error: timeout of 60000ms exceeded
User Story: (updated)
This may be a marionette bug
Status: NEW → ASSIGNED
Component: JSMarionette → Marionette
Summary: Harness Issue: Error: Polling socket recv() timeout! → Socket timeout while in findElement
We're also seeing lots of these errors, 1) Software Home Button - Update Dialog Confirm "before each" hook: 12:25:23 INFO - Error: timeout of 60000ms exceeded 12:25:23 INFO - at null.<anonymous> (/builds/slave/test/gaia/node_modules/mocha/lib/runnable.js:139:19) 12:25:23 INFO - at Timer.listOnTimeout [as ontimeout] (timers.js:110:15) 12:25:23 INFO - 2) Software Home Button - Update Dialog Confirm "after each" hook: 12:25:23 INFO - Error: Not connected. To write data you must call connect first. 12:25:23 INFO - at TcpSync.send (/builds/slave/test/gaia/node_modules/marionette-client/lib/marionette/drivers/tcp-sync.js:152:15) Seen on: http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/b2g-inbound-linux64_gecko/1414954539/b2g-inbound_ubuntu64_vm-b2gdt_test-gaia-js-integration-3-bm113-tests1-linux64-build30.txt.gz I'm not sure if these are the same thing.
This may be a blocker for re-enabling Gij on mainline branches as it's happening quite a lot on TBPL. I'm not really seeing this on TC, but there are different errors to resolve there still.
James - are you looking into this?
Flags: needinfo?(jlal)
Blocks: 1104967
Blocks: 1104968
Blocks: 1104966
Blocks: 1104849
Blocks: 1102904
Blocks: 1102920
Blocks: 1102905
Blocks: 1104753
Whiteboard: [affects=gaia]
Blocks: 1105681
Blocks: 1102921
Blocks: 1106195
Blocks: 1066578
Blocks: 1105675
Blocks: 1108566
Moving this to marionette-js since this is not seen python from looking at a handful of comments.
Component: Marionette → JSMarionette
Blocks: 1034504
When trying to run marionette-js on device test on the latest marionette ( gaia commit 1913c69336f2e8f8bfd97e3179e862d325aa3a99), now it throws a different error: make[1]: Leaving directory `/home/mozilla/B2G/gaia' runner-service out: /home/mozilla/B2G/gaia/node_modules/marionette-socket-host gaia-integration --buildapp=device runner-service err: ERROR:tornado.general:Uncaught exception, closing connection. Traceback (most recent call last): File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/pyzmq-14.4.1-py2.7-linux-x86_64.egg/zmq/eventloop/zmqstream.py", line 407, in _run_callback callback(*args, **kwargs) File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/tornado-4.0.2-py2.7-linux-x86_64.egg/tornado/stack_context.py", line 275, in null_wrapper return fn(*args, **kwargs) File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/corredor-0.1-py2.7.egg/corredor/pattern.py", line 95, in on_recv self.handler(data) File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/corredor-0.1-py2.7.egg/corredor/handler.py", line 41, in __call__ self.action_map[data['action']](data) File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/python/runner-service/runner_service/listener.py", line 20, in _ handler(data) File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/python/runner-service/runner_service/handlers/runner.py", line 128, in start_runner self.runner.start() File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/mozrunner-6.5-py2.7.egg/mozrunner/base/device.py", line 68, in start self.device.setup_profile(self.profile) File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/mozrunner-6.5-py2.7.egg/mozrunner/devices/base.py", line 113, in setup_profile self.app_ctx.setup_profile(profile) File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/mozrunner-6.5-py2.7.egg/mozrunner/application.py", line 148, in setup_profile self.dm.removeFile(self.remote_settings_db) File "/home/mozilla/B2G/gaia/node_modules/marionette-socket-host/venv/local/lib/python2.7/site-packages/mozrunner-6.5-py2.7.egg/mozrunner/application.py", line 120, in remote_settings_db raise DMError("Could not find settings db in '%s'!" % self.remote_idb_dir) DMError: Could not find settings db in '/data/local/storage/persistent/chrome/idb'!
Blocks: 1109157
Blocks: 1104060
Having the b2gdesktop stdout might help debug this, but it appears to be sucked into a black hole. Anyone know how to get at it with this test harness?
I'll start poking around, hopefully I can find something.
(In reply to Jonathan Griffin (:jgriffin) from comment #43) > Having the b2gdesktop stdout might help debug this, but it appears to be > sucked into a black hole. Anyone know how to get at it with this test > harness? I believe the env var DEBUG=b2g-desktop will show the output. Or you can also use DEBUG=* for even more logs.
Does this look right?
Attachment #8546281 - Flags: review?(jryans)
Assignee: nobody → gaye
I think I know quite a bit about how to figure out what's going on here, so I'll assign myself and start to investigate.
Comment on attachment 8546281 [details] [diff] [review] Add DEBUG=* to gaia-integration jobs, Review of attachment 8546281 [details] [diff] [review]: ----------------------------------------------------------------- That would do it, I've confirmed locally. The logs go to stderr not stdout, not sure if you are already capturing that. However, the logs are very verbose, probably would end up with ~100MB logs per run.
Attachment #8546281 - Flags: review?(jryans) → review+
That's too verbose; buildbot caps log sizes at 50MB. Would DEBUG=b2g-desktop produce more manageable logs? Or is there another DEBUG setting that would be more helpful?
Flags: needinfo?(jryans)
(In reply to Jonathan Griffin (:jgriffin) from comment #49) > That's too verbose; buildbot caps log sizes at 50MB. Would > DEBUG=b2g-desktop produce more manageable logs? Or is there another DEBUG > setting that would be more helpful? No, the full log for me was still 108MB. However, it looks like the tests are in 4 chunks on treeherder, so maybe you'd only get 27MB per chunk, and remain below the limit?
Flags: needinfo?(jryans)
Landing on default, so I can verify this doesn't break anything on cedar: https://hg.mozilla.org/build/mozharness/rev/194cb626604e
I've been thinking about this issue and I honestly don't think we can simply make it go away. Connections reset by peer seem to be the main reason why this problem would occur IMHO. This is somewhat normal. We should be better at handling this. It looks by the logging output that we detect that the timeout appropriately but we don't make any attempt to reconnect in the marionette js client. I'm not super familiar with this code but, it should be possible to reconnect and reissue the last command. sockit-to-me itself has the ability to timeout before the system tcp timeout kicks in. we need to make sure this timeout is lower than our test timeout.
(In reply to Jonathan Griffin (:jgriffin) from comment #53) > Landing on default, so I can verify this doesn't break anything on cedar: > > https://hg.mozilla.org/build/mozharness/rev/194cb626604e Feel free to use pine for this as well if needed.
This mozharness change is running on cedar, and it would seem to confirm it isn't a Marionette problem. In all the cases I see, including e.g., https://tbpl.mozilla.org/php/getParsedLog.php?id=56158031&tree=Cedar&full=1, I see that Marionette successfully responds to a newSession message before the timeout occurs...i.e., it isn't Marionette stalling or hanging that is causing the problem.
Blocks: 1120508
Depends on: 1106178
Depends on: 1105679
DEBUG=* will build a debug profile if its set as part of the make script, we can export NODE_DEBUG=b2g-desktop (or *) in https://github.com/mozilla-b2g/gaia/blob/master/bin/gaia-marionette#L6 to get this output
(In reply to Dale Harvey (:daleharvey) from comment #59) > DEBUG=* will build a debug profile if its set as part of the make script, we > can export NODE_DEBUG=b2g-desktop (or *) in > https://github.com/mozilla-b2g/gaia/blob/master/bin/gaia-marionette#L6 to > get this output This was backed out in bug 1120618.
For anyone else simultaneously investigating, I wrote a little script that you can run on a machine in a gaia directory to try to reproduce the issue and give you some (possibly useful) repro statistics. You can specify an arbitrary statistical confidence and maximum error and the script will run TEST_FILES="apps/gallery/test/marionette/edit_image_test.js" and note whether the socket timeout occurs enough times to reach your desired confidence and error.
Attachment #8546281 - Attachment is obsolete: true
Blocks: 1104331
Blocks: 1116330
Blocks: 1114836
I landed a small logging patch in marionette-js-client which adds some logging and have published v1.5.2: https://github.com/mozilla-b2g/marionette-js-client/commit/ac7d93d72fdc97dd4bcb733be6c1757ad63755d6
Ignore comment 66. The real error we get is: marionette:tcp-sync exception when probing socket [Error: Failed to connect.] Failed to connect. This comes from: https://github.com/mozilla-b2g/sockit-to-me/blob/0d21bab114bbe1cb145c7b02dda6ef8de1209812/src/sockit.cc#L218
Assignee: gaye → aus
this still fails in gaia commit 3efaeeb8bde3fc5a5e034c973a19e0fcff384c24. James, is there anything that you want QA team to try in order to fix this bug? This is the bug that blocks us from trying out MarionetteJS on device at the moment. cc'ing Bhavana for info.
aus/james can you please help on this ? WE are blocked to try MJS per comment#69. No-jun will be able to help with logs or test this out again or try anything else he can help with to get this moving ,if you NI him.
Flags: needinfo?(aus)
@nkpark do you have a STR for this? We have been trying on and off to kill this one for awhile?
Flags: needinfo?(jlal) → needinfo?(npark)
For me, the str is simple (which I went over with mdas before): - get the latest gaia from master (I do ./repo sync -d on B2G and go into the gaia tree) - type 'make really-clean' to remove all downloaded files - Have the latest master branch flashed into the flame device, with v18D firmware loaded - Make sure the memory on the flame phone is set to 319MB - connect the flame device, make sure adb devices lists it - type 'TEST_FILES=./apps/calendar/test/marionette/day_view_test.js BUILDAPP=device make test-integration' (The name of the test does not matter, I just wanted to run the test on device Note: I run this on my ubuntu VM (14.04) Expected: Test should run on device Actual: the connection times out
Flags: needinfo?(npark)
Bajaj, I'll be full time on this issue starting next week. In the meantime, I believe gaye also has a work-around. :gaye, if you could comment on your work here and maybe mark that bug as a dependent that'd be super for our QA folks. :)
Flags: needinfo?(aus) → needinfo?(gaye)
We still need to fix this, but with the new retry logic, I don't think this blocks bug 1120508.
No longer blocks: 1120508
(In reply to Ghislain Aus Lacroix [:aus] from comment #73) > Bajaj, I'll be full time on this issue starting next week. In the meantime, > I believe gaye also has a work-around. > > :gaye, if you could comment on your work here and maybe mark that bug as a > dependent that'd be super for our QA folks. :) Hi :aus, if you want me to try out your fix, please let me know, as this is one of my higher priorities. Thanks,
(In reply to Ghislain Aus Lacroix [:aus] from comment #73) > Bajaj, I'll be full time on this issue starting next week. In the meantime, > I believe gaye also has a work-around. > > :gaye, if you could comment on your work here and maybe mark that bug as a > dependent that'd be super for our QA folks. :) :aus, its been a week ;) so a friendly ping :P given how much we are blocked on this.
Flags: needinfo?(aus)
So this test suite was unhidden on TBPL since we wrote a bit of logic to retry marionette tests on failure. As far as device testing goes, I'm working on using the python harness' crash reporting utilities in marionette js which may help depending on what's going on here. I know that Aus had a patch to marionette-js-runner a while ago, but I don't know that it was downstreamed to gaia (or that it fixed the problem for good).
Flags: needinfo?(gaye)
Flags: needinfo?(aus)
See Also: → 1131866
2 different issues are tracked under this bug. Let's track the one upon which QA is blocked on bug 1131866.
Blocks: 1141793
The root cause of this issue, which is a disconnected session with bad data in the runner, has been resolved. I expect new failures exhibiting this symptom to be new issues related to the tests themselves (a bad test, with an actual failure) rather than an infrastructure or framework issue.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Depends on: 1195220
Resolution: --- → FIXED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: