Closed Bug 1195220 Opened 5 years ago Closed 4 years ago

Can't run any Gij test on device (TypeError: Cannot read property 'id' of undefined)

Categories

(Testing Graveyard :: JSMarionette, defect)

Tracking

(b2g-master affected)

RESOLVED FIXED
Tracking Status
b2g-master --- affected

People

(Reporter: jlorenzo, Assigned: aus)

Attachments

(5 files)

STR
1. Update the gaia repo to the most recent revision (currently 60489c1ff8c5d1633fc4837d4f8019623d4e1940 - same gaia version as on device)
2. make really-clean
3. BUILDAPP=device make test-integration

Results
The same test will try to run over and over, but you get the following stack trace each time:
> TypeError: Cannot read property 'id' of undefined
>     at /home/jlorenzo/git/gaia/node_modules/marionette-js-runner/host/session.js:66:43
>     at /home/jlorenzo/git/gaia/node_modules/marionette-js-runner/node_modules/promise/lib/core.js:33:15
>     at flush (/home/jlorenzo/git/gaia/node_modules/marionette-js-runner/node_modules/promise/node_modules/asap/asap.js:27:13)
>     at process._tickCallback (node.js:355:11)

Build info 
Build ID               20150816150205
Gaia Revision          60489c1ff8c5d1633fc4837d4f8019623d4e1940
Gaia Date              2015-08-16 02:21:48
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/0876695d1abdeb363a780bda8b6cc84f20ba51c9
Gecko Version          43.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150816.182145
Firmware Date          Sun Aug 16 18:21:56 EDT 2015
Bootloader             L1TC000118D0
I second this issue, with the exact same steps to reproduce. The same problem happens on Aries.
Flags: needinfo?(aus)
:Silne30 or :jlorenzo -- Could you both run this with NODE_DEBUG=* as well? Here's how we run it on TaskCluster to get all the tasty logs...

BUILDAPP=device GAIA_DEVICE_TYPE=phone NODE_DEBUG=* ./bin/ci run marionette_js 2> artifacts/debug.log
Flags: needinfo?(aus)
Flags: needinfo?(jlorenzo)
Flags: needinfo?(jdorlus)
Assignee: nobody → aus
Attached file debug.log
Here is the log file from the run. I am not sure that the marionette session ever connects to the device as the device remains unaffected and never performs any actions.
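
A minimal sketch for pulling the failing host requests out of a debug log like this one. It assumes the failures show up as single lines containing `marionette-socket-host:request 500` followed by a JSON object with a `message` field, as in the entries quoted in this bug. The function name and the abbreviated sample log are made up for illustration; this is not part of marionette-js-runner.

```python
import json

# Marker text taken from the log lines quoted in this bug.
MARKER = "marionette-socket-host:request 500"


def extract_host_errors(log_text):
    """Return the 'message' field of every 500 response found in the log."""
    messages = []
    for line in log_text.splitlines():
        pos = line.find(MARKER)
        if pos == -1:
            continue
        # The JSON payload starts at the first '{' after the marker
        # (some lines carry a timing suffix such as '+8ms' in between).
        brace = line.find("{", pos)
        if brace == -1:
            continue
        try:
            payload = json.loads(line[brace:])
        except ValueError:
            # The line was truncated or wrapped; skip it rather than guess.
            continue
        if isinstance(payload, dict):
            messages.append(payload.get("message", ""))
    return messages


# Abbreviated, illustrative sample; real entries carry a full stack trace.
SAMPLE_LOG = (
    '[marionette-mocha] marionette-socket-host:request 500 '
    '{"message": "list index out of range", "stack": "Traceback ..."}\n'
    '[marionette-mocha] some unrelated line\n'
)

if __name__ == "__main__":
    print(extract_host_errors(SAMPLE_LOG))
```

Run against the real file, this would be something like `extract_host_errors(open("artifacts/debug.log").read())`.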
Flags: needinfo?(jdorlus)
Flags: needinfo?(aus)
Clearing NI based on comment 3.
Flags: needinfo?(jlorenzo)
I think the interesting part is this...

[marionette-mocha] Tue, 25 Aug 2015 17:11:23 GMT marionette-socket-host:request 500 {"message": "list index out of range", "stack": "Traceback (most recent call last):\n  File \"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/host/python/runner-service/runner_service/listener.py\", line 61, in do_POST\n    result = func(payload)\n  File \"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/host/python/runner-service/runner_service/listener.py\", line 128, in do_start_runner\n    handler = runner_handlers[options['buildapp']](**handler_args)\n  File \"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/host/python/runner-service/runner_service/handlers/runner.py\", line 129, in __init__\n    self.runner = B2GDeviceRunner(serial=serial, **self.common_runner_args)\n  File \"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/venv/lib/python2.7/site-packages/mozrunner-6.9-py2.7.egg/mozrunner/runners.py\", line 153, in B2GDeviceRunner\n    **kwargs)\n  File \"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/venv/lib/python2.7/site-packages/mozrunner-6.9-py2.7.egg/mozrunner/base/device.py\", line 54, in __init__\n    self.device = device_class(**device_args)\n  File \"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/venv/lib/python2.7/site-packages/mozrunner-6.9-py2.7.egg/mozrunner/devices/base.py\", line 24, in __init__\n    self.dm = self.app_ctx.dm\n  File \"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/venv/lib/python2.7/site-packages/mozrunner-6.9-py2.7.egg/mozrunner/application.py\", line 103, in dm\n    self._dm = DeviceManagerADB(adbPath=self.adb, autoconnect=False, deviceRoot=self.remote_test_root)\n  File \"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/venv/lib/python2.7/site-packages/mozrunner-6.9-py2.7.egg/mozrunner/application.py\", line 84, in adb\n    self.which('adb')]\n  File 
\"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/venv/lib/python2.7/site-packages/mozrunner-6.9-py2.7.egg/mozrunner/application.py\", line 125, in which\n    if self.bindir is not None and os.path.abspath(self.bindir) not in paths:\n  File \"/Users/jdorlus/Desktop/gaia-dev/node_modules/marionette-js-runner/venv/lib/python2.7/site-packages/mozrunner-6.9-py2.7.egg/mozrunner/application.py\", line 97, in bindir\n    self._bindir = glob.glob(path)[0]\nIndexError: list index out of range\n"}

List index out of range, eh? Looks like it's looking for something that it's not finding. Not sure what that code does yet, but I'm digging into it.
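
For reference, the last frame of the traceback is `glob.glob(path)[0]`. A small sketch of why that raises when the pattern matches nothing, next to a guarded variant. `first_match` and `demo` are hypothetical helpers written for this note, not mozrunner code, and the pattern used is just a path that is known not to exist.

```python
import glob
import os
import tempfile


def first_match(pattern):
    """Return the first path matching pattern, or None when nothing matches."""
    matches = glob.glob(pattern)
    return matches[0] if matches else None


def demo():
    """Compare unguarded indexing with the guarded helper on an empty match."""
    with tempfile.TemporaryDirectory() as tmp:
        # Nothing in the fresh temporary directory matches this pattern.
        pattern = os.path.join(tmp, "no-such-dir-*")
        try:
            glob.glob(pattern)[0]
            unguarded = "no error"
        except IndexError as exc:
            # glob.glob() returned [], so indexing element 0 fails.
            unguarded = str(exc)
        return unguarded, first_match(pattern)


if __name__ == "__main__":
    print(demo())
```

So the error most likely means the directory the code globs for does not exist on this machine, rather than anything being wrong on the device.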
Whoops, forgot to clear my ni? here when I posted my reply. :)
Flags: needinfo?(aus)
Depends on: 1204103
OK, so, things are actually a lot closer to working than one would expect based on the abysmal showing that you get when running things.

Here's an adb log showing that we're actually connecting over marionette and rebooting the device to run tests; however, something goes wrong shortly after it reboots.

https://pastebin.mozilla.org/8847032
Flags: needinfo?(gaye)
For me, this even happens when running regular integration tests locally with Mulet and Device, when using the tools from pypi.build.mozilla.org rather than pypi.python.org.

marionette-socket-host:request 500 +8ms {"message": "'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)", "stack": "Traceback (most recent call last):\n  File \"/home/aus/Projects/gaia/node_modules/marionette-js-runner/host/python/runner-service/runner_service/listener.py\", line 65, in do_POST\n    result = func(payload)\n  File \"/home/aus/Projects/gaia/node_modules/marionette-js-runner/host/python/runner-service/runner_service/listener.py\", line 137, in do_start_runner\n    handler.start_runner(binary, options)\n  File \"/home/aus/Projects/gaia/node_modules/marionette-js-runner/host/python/runner-service/runner_service/handlers/runner.py\", line 97, in start_runner\n    logging.debug(\"desktop runner ({0}, {1}, {2})\".format(binary, cmdargs, profile))\n  File \"/home/aus/Projects/gaia/node_modules/marionette-js-runner/venv/local/lib/python2.7/site-packages/mozprofile/profile.py\", line 283, in summary\n    parts.append(('Files', '\\n%s' % mozfile.tree(self.profile)))\n  File \"/home/aus/Projects/gaia/node_modules/marionette-js-runner/venv/local/lib/python2.7/site-packages/mozfile/mozfile.py\", line 327, in tree\n    for index, filename in enumerate(filenames)])\nUnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)\n"}
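
A sketch of how this class of error comes about. The traceback is from Python 2.7, where mixing byte strings and unicode strings triggers an implicit ASCII decode; the sketch below is Python 3 and reproduces the same message with an explicit decode. The bug does not say which character in the profile tree was responsible, so the box-drawing character here is an assumption (its UTF-8 encoding starts with 0xe2), and `decode_for_logging` is a hypothetical helper, not the fix that landed.

```python
# UTF-8 bytes of U+251C (a box-drawing character): b'\xe2\x94\x9c'.
# Which character actually appeared in the profile tree is an assumption.
RAW = "\u251c".encode("utf-8")


def ascii_failure(raw):
    """Return the error text from decoding raw as ASCII, or None on success."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


def decode_for_logging(raw, encoding="utf-8"):
    """Decode bytes explicitly so a log call cannot fail on non-ASCII input."""
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    print(ascii_failure(RAW))
    print(decode_for_logging(RAW))
```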
Fixed the last issue and back on track to diagnose this issue. It will absolutely require a custom gecko build with debugging and the like to figure this one out, I'm afraid.

It appears that we connect successfully once and request that b2g restart (with NO_EM_RESTART set, iirc). It then binds to a different TCP port (which we correctly forward with adb forward tcp:<PORT> tcp:<PORT>, verified using adb forward --list).

And then... nothing. Sometimes it will also start up too many b2g processes (seen via b2g-ps). But it _NEVER_ manages to get the client to connect again.
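
The forwarding check described above can be sketched roughly like this. It assumes `adb forward --list` prints one `<serial> tcp:<local> tcp:<remote>` entry per line; the serial and port in the sample are invented, and the helpers are hypothetical (the runner does not necessarily check it this way).

```python
def parse_forward_list(output):
    """Parse `adb forward --list` text into (serial, local, remote) tuples."""
    entries = []
    for line in output.splitlines():
        parts = line.split()
        # Expected shape: "<serial> tcp:<port> tcp:<port>"; skip anything else.
        if len(parts) == 3:
            entries.append(tuple(parts))
    return entries


def is_forwarded(output, port, serial=None):
    """True if tcp:<port> is forwarded to the same port on the device."""
    wanted = "tcp:%d" % port
    for dev, local, remote in parse_forward_list(output):
        if serial is not None and dev != serial:
            continue
        if local == wanted and remote == wanted:
            return True
    return False


# Invented sample output: the serial and the port number are placeholders.
SAMPLE = "example-serial tcp:60123 tcp:60123\n"

if __name__ == "__main__":
    print(is_forwarded(SAMPLE, 60123))
```

In practice the text would come from something like `subprocess.check_output(["adb", "forward", "--list"])`, decoded to a string.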
Hey :aus was there something you wanted me to look at?
Flags: needinfo?(gaye)
K, here's the latest... we're failing here.

http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/mozrunner/devices/base.py#90

We're unable to verify that we've pushed the profile. It's possible it's simply taking too long to push the profile. I wouldn't be surprised. 5 seconds is quite... short. Hacking some things right now to see if it helps.
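
To illustrate the kind of hack meant here: a generic poll-until-true helper with a configurable timeout. This is a sketch only, not the code in mozrunner's devices/base.py; `wait_for` and its parameters are hypothetical, and the 5 second default simply mirrors the figure mentioned above.

```python
import time


def wait_for(predicate, timeout=5.0, interval=0.5,
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate until it returns True or timeout seconds have passed.

    Returns True if the predicate succeeded, False if the deadline was hit.
    clock and sleep are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    started = time.monotonic()
    # Stand-in for "the profile has been pushed": becomes true after ~1 second.
    print(wait_for(lambda: time.monotonic() - started > 1.0, timeout=5.0))
```

If the profile push really does take longer than the deadline, raising `timeout` would be the smallest change to try first.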
Attachment #8668133 - Flags: review?(gaye)
Comment on attachment 8668133 [details] [review]
[gaia] nullaus:bug1195220 > mozilla-b2g:master

Mostly looks good! Have a few nits. Reflag when you get a chance to look/respond/update.
Attachment #8668133 - Flags: review?(gaye)
(In reply to Gareth Aye [:gaye] (back from PTO) from comment #13)
> Comment on attachment 8668133 [details] [review]
> [gaia] nullaus:bug1195220 > mozilla-b2g:master
> 
> Mostly looks good! Have a few nits. Reflag when you get a chance to
> look/respond/update.

I addressed the nits and updated the PR. Let me know what you think. :)
Attachment #8668133 - Flags: review?(gaye)
Comment on attachment 8668133 [details] [review]
[gaia] nullaus:bug1195220 > mozilla-b2g:master

One more nit on GH but after that looks great. Nice one!
Attachment #8668133 - Flags: review?(gaye) → review+
Commit (master): https://github.com/mozilla-b2g/gaia/commit/5b6194daff01b2fa0f76d96cbeac6ad3f645bbe4

Fixed!

Some handy notes to go onto MDN --

* You need a phone that boots into FxOS properly! (Pro-tip: run DEVICE_DEBUG=1 make reset-gaia; you do _not_ need to complete the FTU process)
* Do _NOT_ use a phone with a profile that you care about without backing it up first!
* If you run these tests locally, I highly suggest running in batches or sets (using the APP= or TEST_FILES= env vars)
* When running locally I also suggest using a different reporter than the TBPL reporter. REPORTER=spec is currently my favorite but there are others to choose from (including nyancat iirc). We default to the TBPL reporter if none is specified.
* VirtualBox is known to be flaky with USB and ADB. If you are using a Virtual Machine we suggest (currently) that you use VMware Fusion / Workstation / Player
* You may use node v0.10 or v0.12 but _NOT_ v4.0 (yet)
* Arm yourself with patience, things don't run / restart / reset to start / continue onto the next test very fast.
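
The node version note above can be checked up front with something like the following. It is a sketch under the assumption that only the v0.10 and v0.12 series are acceptable, as stated in the list; `node_supported` is a hypothetical helper, not part of the gaia tooling.

```python
import re

# (major, minor) pairs taken from the note above: v0.10 or v0.12, not v4.0.
SUPPORTED_NODE = {(0, 10), (0, 12)}


def node_supported(version_string):
    """Return True if a `node --version` style string is a supported series."""
    match = re.match(r"v?(\d+)\.(\d+)", version_string.strip())
    if not match:
        return False
    return (int(match.group(1)), int(match.group(2))) in SUPPORTED_NODE


if __name__ == "__main__":
    for version in ("v0.10.40", "v0.12.7", "v4.0.0"):
        print(version, node_supported(version))
```

The input would typically be the output of `node --version`.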
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Sorry, had to revert this change for bustage like https://treeherder.mozilla.org/logviewer.html#?job_id=3104521&repo=b2g-inbound
Status: RESOLVED → REOPENED
Flags: needinfo?(aus)
Resolution: FIXED → ---
Commit (master): https://github.com/mozilla-b2g/gaia/commit/cc3cd63a5b8ee3138db1fd6ce4e2f25d410d2d71

Waiting to see what happens on b2g-inbound before marking fixed.
Status: REOPENED → ASSIGNED
Flags: needinfo?(aus)
I'm failing to understand what's happening here. It works fine on gaia (try), but breaks elsewhere?!? This is _SUPER_ frustrating. Patience will be required from everyone while I try and figure this out and re-land... for the THIRD time. *sigh*
Flags: needinfo?(aus)
OK. Hopefully that's the rest of the issues surfaced by fixing things.

Commit (master): https://github.com/mozilla-b2g/gaia/commit/6f2dd4a84de2b592b58fe70286aa9f14b891a1cf

I'm going to wait, once again, to mark this fixed until it's picked up by b2g-inbound.
Looking good on b2g-inbound, marking this fixed.
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Blocks: 1159200
Blocks: 1091680
Keywords: dev-doc-needed
Attached file gij_on_device_node.log
Hi, I tried to run GIJ on device (aries) but failed. Here is a debug log. Looks like it's the same error message:

[marionette-mocha] [TypeError: Cannot read property 'id' of undefined]

Here is my environment:

gaia version:	~/gaia ~/MozITP
a38c0c325b0f34b0549fa7518355287b9f4147e0Greg Weng Mon Nov 16 13:51:34 2015 +0800~/MozITP
node version:	v0.12.7
npm version:	2.11.3
python version:	Python 2.7.6
pip version:	pip 1.5.4 from /usr/lib/python2.7/dist-packages (python 2.7)
adb version:	Android Debug Bridge version 1.0.31
Linux version:	Linux 3.13.0-68-generic #111-Ubuntu SMP Fri Nov 6 18:17:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
"adb devices" output:
List of devices attached 
YT9112VWLS	device


John, I know you are an expert for this, can you double confirm if this is a regression? If so I will reopen this bug. Thank you.
Flags: needinfo?(jdorlus)
(In reply to Shing Lyu [:slyu] from comment #25)
> Created attachment 8688890 [details]
> gij_on_device_node.log
> 
> Hi, I tried to run GIJ on device (aries) but failed. Here is a debug log.
> Looks like it's the same error message:
> 
> [marionette-mocha] [TypeError: Cannot read property 'id' of undefined]
> 
> Here is my environment:
> 
> gaia version:	~/gaia ~/MozITP
> a38c0c325b0f34b0549fa7518355287b9f4147e0Greg Weng Mon Nov 16 13:51:34 2015
> +0800~/MozITP
> node version:	v0.12.7
> npm version:	2.11.3
> python version:	Python 2.7.6
> pip version:	pip 1.5.4 from /usr/lib/python2.7/dist-packages (python 2.7)
> adb version:	Android Debug Bridge version 1.0.31
> Linux version:	Linux 3.13.0-68-generic #111-Ubuntu SMP Fri Nov 6 18:17:06
> UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> "adb devices" output:
> List of devices attached 
> YT9112VWLS	device
> 
> 
> John, I know you are an expert for this, can you double confirm if this is a
> regression? If so I will reopen this bug. Thank you.

I will look into this. Did you try running make really-clean and then make reset-gaia?
Flags: needinfo?(jdorlus)
Flags: needinfo?(slyu)
I clone a fresh gaia every time, so I think `make really-clean` is not required?
Flags: needinfo?(slyu)
(In reply to Shing Lyu [:slyu] from comment #27)
> I clone a fresh gaia every time, so I think `make really-clean` is not
> required?

See https://bugzilla.mozilla.org/show_bug.cgi?id=1195220#c16 for the steps you should take before trying to run Gij on a device.
:slyu, I just tried out my own instructions and tips from comment #16 and I was able to run the clock tests like this:

1. make really-clean (yes, if you have a 'fresh' checkout of the tree, you don't need to do this, but it won't do any harm to do it, so why not? :))
2. DEVICE_DEBUG=1 make reset-gaia (this ensures that the device has the CORRECT default settings to allow marionette client to connect)
3. APP=clock BUILDAPP=device REPORTER=spec make test-integration (note that I also used REPORTER=mocha-tbpl-reporter and it also ran fine).

The device can sometimes get wedged so if you see the error again, you will probably need to |DEVICE_DEBUG=1 make reset-gaia| once again.
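
The three steps above could be scripted along these lines. This is a sketch, not an official gaia script: `STEPS`, `describe` and `run_steps` are hypothetical names, the commands and environment variables are copied from the steps in this comment, and the default is a dry run that only returns the command lines.

```python
import os
import subprocess

# (extra environment, argv) pairs, copied from the numbered steps above.
STEPS = [
    ({}, ["make", "really-clean"]),
    ({"DEVICE_DEBUG": "1"}, ["make", "reset-gaia"]),
    ({"APP": "clock", "BUILDAPP": "device", "REPORTER": "spec"},
     ["make", "test-integration"]),
]


def describe(extra_env, argv):
    """Render a step the way it would be typed in a shell."""
    prefix = ["%s=%s" % (key, value) for key, value in extra_env.items()]
    return " ".join(prefix + argv)


def run_steps(gaia_dir, steps=STEPS, dry_run=True):
    """Return the command lines in order; execute them unless dry_run is set."""
    lines = []
    for extra_env, argv in steps:
        lines.append(describe(extra_env, argv))
        if not dry_run:
            # Extend the current environment rather than replacing it.
            env = dict(os.environ, **extra_env)
            subprocess.check_call(argv, cwd=gaia_dir, env=env)
    return lines


if __name__ == "__main__":
    for line in run_steps("."):
        print(line)
```

With `dry_run=False` and `gaia_dir` pointing at a gaia checkout, a wedged device would surface as a `CalledProcessError` from the failing step, at which point re-running the reset step is the thing to try.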
Blocks: 1141793
Gij test docs have been archived https://developer.mozilla.org/en-US/docs/Archive/Firefox_OS/Automated_testing/Gaia_integration_tests

Please re-add ddn if you think this would still be useful to document.
Keywords: dev-doc-needed
Product: Testing → Testing Graveyard