Closed
Bug 980541
Opened 12 years ago
Closed 11 years ago
Marionette-JS does not automatically fail when an oop app process crashes
Categories
(Testing Graveyard :: JSMarionette, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: qdot, Assigned: hub)
References
Details
(Whiteboard: [systemsfe] [c=automation p=3])
Attachments
(5 files)
|
789 bytes,
text/plain
|
Details | |
|
51 bytes,
text/x-github-pull-request
|
aus
:
review+
aus
:
review+
aus
:
feedback+
|
Details | Review |
|
60 bytes,
text/x-github-pull-request
|
jugglinmike
:
review+
|
Details | Review |
|
56 bytes,
text/x-github-pull-request
|
Details | Review | |
|
46 bytes,
text/x-github-pull-request
|
kgrandon
:
review+
|
Details | Review |
In researching bug 962988, it was found that marionette simply stalls on the messaging app crashing instead of failing the test. This needs to be fixed in order to reliably run integration tests on OOP b2g-desktop.
| Reporter | ||
Comment 1•12 years ago
|
||
Here's a sample test for gaia to prove out this issue. This bug doesn't just happen when the app crashes, but even when window.close() is called before a marionetteScriptFinished happens. Usually this would trigger a "unload was called" error when b2g-desktop is run non-OOP. When run OOP, no error fires and the test is stuck. It appears that timeouts no longer fire at this point either.
Assignee: nobody → kyle
Status: NEW → ASSIGNED
Comment 2•12 years ago
|
||
Chatted with qDot about this a bit; he thinks we could have a watchdog in marionette-server pinging the active marionette-listener, and if the ping isn't returned, we could send an error back to Marionette. This would protect against the cases where an OOP app crashes or disappears before the final response gets sent back to marionette-server.
In the particular example he attached, this sendError isn't getting received by the parent:
http://mxr.mozilla.org/mozilla-central/source/testing/marionette/marionette-listener.js#584
| Assignee | ||
Comment 4•12 years ago
|
||
This is also important for make perf-test
Blocks: gaia-perf-measure
| Assignee | ||
Comment 5•12 years ago
|
||
Wouldn't a timeout on the client side be enough?
Comment 6•12 years ago
|
||
Yes, there should be a client-side timeout regardless, to catch other cases where something bad happens on the server side.
| Assignee | ||
Comment 7•12 years ago
|
||
ok, filed bug 983461 for the timeout on the client side. I need this for bug 980993
| Assignee | ||
Comment 9•12 years ago
|
||
Here is the problem:
Using the above test (or with bug 983461), I get node to hang.
1. I can't even debug Node with --debug, which might also explain why no timeout is raised.
2. If I attach gdb to the node process, I get this stack trace:
(gdb) thread apply all bt
Thread 2 (Thread 0x7f54edf62700 (LWP 6964)):
#0 0x0000003c722bc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x0000003c722ece24 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:32
#2 0x00000039775564fd in v8::internal::SignalSender::Run() () from /lib64/libv8.so.3
#3 0x0000003977556166 in v8::internal::ThreadEntry(void*) () from /lib64/libv8.so.3
#4 0x0000003c72607f33 in start_thread (arg=0x7f54edf62700) at pthread_create.c:309
#5 0x0000003c722f4ded in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7f54edf25880 (LWP 6963)):
#0 0x0000003c7260ec0b in __libc_recv (fd=11, buf=0x2d448ff, n=2, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1 0x00007f54edd21d8a in Sockit::Read(v8::Arguments const&) () from /home/hub/source/mozilla/B2G-buri/gaia/node_modules/marionette-client/node_modules/sockit-to-me/build/Release/sockit.node
#2 0x00003e352f3791d9 in ?? ()
#3 0x00007fff2b013200 in ?? ()
#4 0x00007fff2b013208 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0x0000000000000000 in ?? ()
We get stuck on a blocking recv() from sockit-to-me.
Kyle, are you actively working on this? If not may I take it as this is blocking tarako and other tests.
Flags: needinfo?(kyle)
| Assignee | ||
Updated•12 years ago
|
Priority: -- → P1
Comment 10•12 years ago
|
||
(In reply to Hubert Figuiere [:hub] from comment #9)
> We get stuck on a blocking recv() from sockit-to-me.
So we're all on the same page, I believe the blocking call is an intentional means of making the API be synchronous. This has the side-effect that the node.js ChildRunner process is not a traditional node.js async execution environment.
As described in https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/Gaia_integration_tests#Okay.2C_what_does_make_test-integration_actually_do.3F there is a ParentRunner that sits outside the ChildRunner which does operate in an asynchronous fashion and is probably the right place to perform watchdog capabilities.
Having said that, I suspect sockit-to-me could be altered to support some type of fatal timeout mechanism. Unless the ChildRunner process is keeping the parent appraised of notable test steps as it goes along, that really is the only way to get the call-stack of the JS runner's state.
| Reporter | ||
Comment 11•12 years ago
|
||
Ah, yeah, I didn't mean to have this assigned to myself. Feel free to take.
Assignee: kyle → nobody
Flags: needinfo?(kyle)
| Assignee | ||
Comment 12•12 years ago
|
||
Taking it then.
QA Contact: hub
Whiteboard: [systemsfe] [p=5] → [systemsfe] [c=automation p=3]
| Assignee | ||
Updated•12 years ago
|
Assignee: nobody → hub
QA Contact: hub
| Assignee | ||
Comment 13•12 years ago
|
||
Kyle, do you think this is the right approach? You test case now throw an error instead of hanging.
If that's the way to go, I'll add an API to the module to set the timeout (hardcoded at 60sec) so we can use the same value as marionette's.
Attachment #8391656 -
Flags: feedback?(kyle)
| Assignee | ||
Comment 14•12 years ago
|
||
Also I need to add tests.
| Assignee | ||
Updated•12 years ago
|
Attachment #8391656 -
Flags: feedback?(gaye)
| Assignee | ||
Comment 15•12 years ago
|
||
I now have a test and bumped the module version.
| Assignee | ||
Comment 16•12 years ago
|
||
Comment on attachment 8391656 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/sockit-to-me/pull/14
Actually, Gareth, if you could review.
Attachment #8391656 -
Flags: feedback?(gaye) → review?(gaye)
| Reporter | ||
Comment 17•12 years ago
|
||
Comment on attachment 8391656 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/sockit-to-me/pull/14
While this looks good to me, I'm not super familiar with sockit-to-me, so I'm redirecting to aus, who was the original author.
Attachment #8391656 -
Flags: feedback?(kyle) → feedback?(aus)
Comment 18•12 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #2)
> Chatted with qDot about this a bit; he thinks we could have a watchdog in
> marionette-server pinging the active marionette-listener, and if the ping
> isn't returned, we could send an error back to Marionette. This would
> protect against the cases where an OOP app crashes or disappears before the
> final response gets sent back to marionette-server.
>
> In the particular example he attached, this sendError isn't getting received
> by the parent:
>
> http://mxr.mozilla.org/mozilla-central/source/testing/marionette/marionette-
> listener.js#584
I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=984508 to get this fixed on the server-side.
Comment 19•12 years ago
|
||
Comment on attachment 8391656 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/sockit-to-me/pull/14
Yes, you can definitely do it this way. You can also use stock bsd style select and non-blocking recv to timeout but it's nice to use the modern bits from node.
Attachment #8391656 -
Flags: feedback?(aus) → feedback+
Comment 20•12 years ago
|
||
(In reply to Ghislain Aus Lacroix [:aus] from comment #19)
> Comment on attachment 8391656 [details] [review]
> Link to Github pull-request:
> https://github.com/mozilla-b2g/sockit-to-me/pull/14
>
> Yes, you can definitely do it this way. You can also use stock bsd style
> select and non-blocking recv to timeout but it's nice to use the modern bits
> from node.
Ugh, nm, you're using poll already. Still feedback+, obviously :P
| Assignee | ||
Comment 21•12 years ago
|
||
Comment on attachment 8391656 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/sockit-to-me/pull/14
<gaye> I think auswerk should review,[...]
Attachment #8391656 -
Flags: review?(gaye) → review?(aus)
Comment 22•12 years ago
|
||
Comment on attachment 8391656 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/sockit-to-me/pull/14
Hooray! I had been meaning to do something similar to avoid using the default system timeout for socket reads. Thanks for doing this!
As mentioned on IRC, I would like the pull request to be a single commit, you can do this pretty easily with 'git rebase -i master' and doing a force push to your pull request. When that's done, I'll go ahead and merge the PR.
Attachment #8391656 -
Flags: review?(aus) → review+
| Assignee | ||
Comment 23•12 years ago
|
||
This is the patch for marionette-client.
Attachment #8393483 -
Flags: review?(gaye)
| Assignee | ||
Comment 24•12 years ago
|
||
Comment on attachment 8393483 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/marionette-js-client/pull/105
cancelling r? for now.
Attachment #8393483 -
Flags: review?(gaye)
| Assignee | ||
Comment 25•12 years ago
|
||
Comment on attachment 8391656 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/sockit-to-me/pull/14
Need a review for the second patch in that pull request: I needed to set non blocking as in some cases it was still blocking on it.
(I intend to squash it before merge as requested).
Attachment #8391656 -
Flags: review?(aus)
| Assignee | ||
Comment 26•12 years ago
|
||
Comment on attachment 8393483 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/marionette-js-client/pull/105
Mike,
do you mind reviewing this since Gareth is on PTO. Thanks.
(this patch depends on the other that is for sockit-to-me).
Attachment #8393483 -
Flags: review?(mike)
Comment 27•12 years ago
|
||
Comment on attachment 8393483 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/marionette-js-client/pull/105
Code looks good and I can confirm the tests against your proposed patch to Sockittome. After some minor changes to the comments, this should be good to merge (so long as nothing crazy happens on that other patch).
Attachment #8393483 -
Flags: review?(mike) → review+
Comment 28•12 years ago
|
||
Comment on attachment 8391656 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/sockit-to-me/pull/14
r+, with a few nits, see github for details. :)
Attachment #8391656 -
Flags: review?(aus) → review+
| Assignee | ||
Comment 29•12 years ago
|
||
sockit-to-me patch merged:
https://github.com/mozilla-b2g/sockit-to-me/commit/91c0460acd8f7aa95e38e21de79cf0ac890fadc4
| Assignee | ||
Comment 30•12 years ago
|
||
:aus, do you think we can get a release so I can check the other patch in? Or give me the rights on npm (username hub).
Thanks!
Flags: needinfo?(aus)
Comment 31•12 years ago
|
||
Updated to 0.2.0 -- https://www.npmjs.org/package/sockit-to-me
Flags: needinfo?(aus)
| Assignee | ||
Comment 32•12 years ago
|
||
merge marionette-js-client
https://github.com/mozilla-b2g/marionette-js-client/commit/9a8aa8913befbc64826524423fb8d243d683ca82
:mhenretty, we need a 1.1.2 release on NPM if possible. Thanks !
Flags: needinfo?(mhenretty)
| Assignee | ||
Comment 34•12 years ago
|
||
The very specific question for feedback, Gareth, is : do I have to rebuild sockit-to-me for Ubuntu and check that in with this pull request?
I want us to use 0.2.0.
Thanks
Attachment #8395047 -
Flags: feedback?(gaye)
| Assignee | ||
Comment 35•12 years ago
|
||
Comment on attachment 8395047 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia-node-modules/pull/11
Since you answered on IRC, here is the review request. Thanks.
(I haven't figured out how I can "try" this change)
Attachment #8395047 -
Flags: feedback?(gaye) → review?(gaye)
| Assignee | ||
Comment 36•12 years ago
|
||
Comment on attachment 8395047 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia-node-modules/pull/11
Kevin Grandon told me to merge this since it just pull already published modules.
Attachment #8395047 -
Flags: review?(gaye)
| Assignee | ||
Comment 37•12 years ago
|
||
Merged
https://github.com/mozilla-b2g/gaia-node-modules/commit/af7758928e2e2024c614672a4cca13483df7f34c
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 38•12 years ago
|
||
And FWIW I forgot to bump the marionette-client requirement for gaia. It was taken care for another bug so we are good.
See https://github.com/mozilla-b2g/gaia/commit/d19a7fb3efa74c479be7f0b24f030e9895eb4b7a
Comment 39•12 years ago
|
||
Good work Hub!
| Assignee | ||
Comment 40•12 years ago
|
||
reverted due to general tree bustage.
| Assignee | ||
Comment 41•12 years ago
|
||
| Assignee | ||
Updated•12 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
Comment 42•11 years ago
|
||
Here is a try-gaia with the new modules.
https://tbpl.mozilla.org/?tree=Try&rev=0c403419049b
Let's see how it wrecks havoc.
| Assignee | ||
Comment 43•11 years ago
|
||
Second try request. The previous had issues with overriding the repo URL.
https://tbpl.mozilla.org/?tree=Try&rev=3c914311e3ae
Status: REOPENED → ASSIGNED
| Assignee | ||
Comment 44•11 years ago
|
||
I can reproduce this problem locally on Linux. Trying to isolate the change and fix it ASAP.
Comment 45•11 years ago
|
||
| Assignee | ||
Comment 46•11 years ago
|
||
Will get Travis and Try server green first.
Attachment #8404380 -
Flags: review?(kgrandon)
Comment 47•11 years ago
|
||
Strange, that Gi error is the same one we were seeing in bug 974197, which has since been backed out. I'm not sure why we're still seeing it.
Comment 48•11 years ago
|
||
Ah right, looks like it was sent to try before that was backed out. I think we should be good to go here. Nice work.
Updated•11 years ago
|
Attachment #8404380 -
Flags: review?(kgrandon) → review+
Comment 49•11 years ago
|
||
Landed in master: https://github.com/mozilla-b2g/gaia/commit/5e8fe1f267da2a5f7cf88a1a02342ffb1e2b7064
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 50•11 years ago
|
||
Yeah here is the second Try run: https://tbpl.mozilla.org/?tree=Try&rev=24e7b79d93e3
And only orange is know intermittent in Mochitest.
Thanks !
Updated•8 years ago
|
Product: Testing → Testing Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•