Closed Bug 865690 Opened 7 years ago Closed 7 years ago

Assertion failure and shutdown crash: !"I/O method is invalid", at /nsprpub/pr/src/io/priometh.c:54 because jsbridge calling recv after socket already closed

Categories

(Testing Graveyard :: Mozmill, defect, critical)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

(Keywords: assertion, crash, regression, Whiteboard: [mozmill-2.0+][ateamtrack: p=mozmill q=2013q3 m=1])

Attachments

(4 files, 8 obsolete files)

500 bytes, patch
whimboo
: review+
Details | Diff | Splinter Review
2.29 KB, patch
whimboo
: review+
Details | Diff | Splinter Review
3.67 KB, patch
cmtalbert
: review+
Details | Diff | Splinter Review
38.10 KB, patch
cmtalbert
: review+
Details | Diff | Splinter Review
Running a debug build of Firefox with Mozmill we run into an assertion during shutdown which also crashes the browser:

Assertion failure: !"I/O method is invalid", at /mozilla/code/firefox/nightly/nsprpub/pr/src/io/priometh.c:54

Program /mozilla/code/firefox/obj/debug/dist/bin/firefox (pid = 16468) received signal 6.
Stack:
UNKNOWN [/lib/x86_64-linux-gnu/libpthread.so.0 +0x0000FCB0]
gsignal+0x00000035 [/lib/x86_64-linux-gnu/libc.so.6 +0x00036425]
abort+0x0000017B [/lib/x86_64-linux-gnu/libc.so.6 +0x00039B8B]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libnspr4.so +0x00012D3A]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libnspr4.so +0x0000F2AC]
ffi_call_unix64+0x0000004C [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x024A8B20]
ffi_call+0x00000449 [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x024A820D]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x02498205]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x02038932]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x02047009]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x020479DD]
js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&)+0x0000006D [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x02093DED]
js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&)+0x000001C2 [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x0212D412]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x02096594]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x020966C5]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x02038932]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x02047009]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x020447F2]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x020463EF]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x02047056]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x020479DD]
JS_CallFunctionValue(JSContext*, JSObject*, JS::Value, unsigned int, JS::Value*, JS::Value*)+0x000000C9 [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x01F5FA39]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x013260BB]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x0131E835]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x019FE00E]
UNKNOWN [/mozilla/code/firefox/obj/debug/dist/bin/libxul.so +0x019FD3E3]
Sleeping for 300 seconds.
Seems like this is only happening with running Mozmill in manual (--manual) mode.
Please drop the last comment. The used build was not a debug build.
Mike, could this be related to your changes in bug 648407? I have started a bisect already but if you could give me your impression, I would appreciate it.
(In reply to Henrik Skupin (:whimboo) from comment #3)
> Mike, could this be related to your changes in bug 648407?

Seems unlikely.
(In reply to Mike Hommey [:glandium] from comment #4)
> (In reply to Henrik Skupin (:whimboo) from comment #3)
> > Mike, could this be related to your changes in bug 648407?
> 
> Seems unlikely.

Which is indeed. A build before the landing of those patches don't work. Using mozmill 1.5 we do not crash. So something new for Mozmill 2.0 causes it.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
As bisecting has shown it should actually be a regression from bug 777554. Changesets before that mentioned one are not able to shutdown the browser. So it's hard to determine.
Blocks: 777554
Keywords: regression
If Clint is faster he will try to debug it.
Whiteboard: [mozmill-2.0?] → [mozmill-2.0+]
Some better stack as you can find below. I'm not sure if it is us given that in frame 11 a call to js::DirectProxyHandler::call is made.

Assertion failure: !"I/O method is invalid", at /mozilla/code/firefox/nightly/nsprpub/pr/src/io/priometh.c:54

Program received signal SIGABRT, Aborted.
0x00007ffff7031425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7031425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7034b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff6bb0d8a in PR_Assert (s=s@entry=0x7ffff6bd0ec4 "!\"I/O method is invalid\"", 
    file=file@entry=0x7ffff6bd0e20 "/mozilla/code/firefox/nightly/nsprpub/pr/src/io/priometh.c", ln=ln@entry=54)
    at /mozilla/code/firefox/nightly/nsprpub/pr/src/io/prlog.c:554
#3  0x00007ffff6bad2fc in _PR_InvalidInt () at /mozilla/code/firefox/nightly/nsprpub/pr/src/io/priometh.c:54
#4  0x00007ffff395d3f0 in ffi_call_unix64 () from /mozilla/code/firefox/obj/debug/dist/bin/libxul.so
#5  0x00007ffff395cadd in ffi_call (cif=0x1531530, fn=0x7ffff6bad430 <PR_Recv>, rvalue=0x14b9720, 
    avalue=0x7fffffffae00) at /mozilla/code/firefox/nightly/js/src/ctypes/libffi/src/x86/ffi64.c:485
#6  0x00007ffff394cac5 in js::ctypes::FunctionType::Call (cx=0xadbe80, argc=5, vp=0x7fffdebfe100)
    at /mozilla/code/firefox/nightly/js/src/ctypes/CTypes.cpp:5817
#7  0x00007ffff3508b02 in js::CallJSNative (cx=0xadbe80, 
    native=0x7ffff394c570 <js::ctypes::FunctionType::Call(JSContext*, unsigned int, jsval*)>, args=...)
    at /mozilla/code/firefox/nightly/js/src/jscntxtinlines.h:337
#8  0x00007ffff3516e79 in js::InvokeKernel (cx=cx@entry=0xadbe80, args=..., construct=construct@entry=js::NO_CONSTRUCT)
    at /mozilla/code/firefox/nightly/js/src/jsinterp.cpp:428
#9  0x00007ffff351780d in Invoke (construct=js::NO_CONSTRUCT, args=..., cx=0xadbe80)
    at /mozilla/code/firefox/nightly/js/src/jsinterp.h:134
#10 js::Invoke (cx=0xadbe80, thisv=..., fval=..., argc=5, argv=0x7fffdebfe0d8, rval=0x7fffdebfe0c8)
    at /mozilla/code/firefox/nightly/js/src/jsinterp.cpp:475
#11 0x00007ffff3563f5d in js::DirectProxyHandler::call (this=<optimized out>, cx=0xadbe80, proxy=..., args=...)
    at /mozilla/code/firefox/nightly/js/src/jsproxy.cpp:481
#12 0x00007ffff35fe222 in js::CrossCompartmentWrapper::call (
    this=0x7ffff55a3940 <js::CrossCompartmentWrapper::singleton>, cx=0xadbe80, wrapper=..., args=...)
    at /mozilla/code/firefox/nightly/js/src/jswrapper.cpp:445
#13 0x00007ffff3566704 in js::Proxy::call (cx=0xadbe80, proxy=proxy@entry=..., args=...)
    at /mozilla/code/firefox/nightly/js/src/jsproxy.cpp:2613
#14 0x00007ffff3566835 in proxy_Call (cx=<optimized out>, argc=<optimized out>, vp=<optimized out>)
    at /mozilla/code/firefox/nightly/js/src/jsproxy.cpp:3177
#15 0x00007ffff3508b02 in js::CallJSNative (cx=0xadbe80, 
    native=0x7ffff3566790 <proxy_Call(JSContext*, unsigned int, JS::Value*)>, args=...)
    at /mozilla/code/firefox/nightly/js/src/jscntxtinlines.h:337
#16 0x00007ffff3516e79 in js::InvokeKernel (cx=cx@entry=0xadbe80, args=..., construct=construct@entry=js::NO_CONSTRUCT)
    at /mozilla/code/firefox/nightly/js/src/jsinterp.cpp:428
#17 0x00007ffff35145fa in js::Interpret (cx=cx@entry=0xadbe80, entryFrame=entryFrame@entry=0x7fffdebfe040, 
    interpMode=js::JSINTERP_NORMAL, useNewType=<optimized out>)
    at /mozilla/code/firefox/nightly/js/src/jsinterp.cpp:2404
#18 0x00007ffff351625f in js::RunScript (cx=cx@entry=0xadbe80, fp=0x7fffdebfe040)
    at /mozilla/code/firefox/nightly/js/src/jsinterp.cpp:385
---Type <return> to continue, or q <return> to quit---
#19 0x00007ffff3516ec6 in js::InvokeKernel (cx=cx@entry=0xadbe80, args=..., construct=construct@entry=js::NO_CONSTRUCT)
    at /mozilla/code/firefox/nightly/js/src/jsinterp.cpp:442
#20 0x00007ffff351780d in Invoke (construct=js::NO_CONSTRUCT, args=..., cx=0xadbe80)
    at /mozilla/code/firefox/nightly/js/src/jsinterp.h:134
#21 js::Invoke (cx=0xadbe80, thisv=..., fval=..., argc=1, argv=0x7fffffffc310, rval=0x7fffffffc180)
    at /mozilla/code/firefox/nightly/js/src/jsinterp.cpp:475
#22 0x00007ffff343ce97 in JS_CallFunctionValue (cx=0xadbe80, objArg=<optimized out>, fval=..., argc=1, 
    argv=0x7fffffffc310, rval=<optimized out>) at /mozilla/code/firefox/nightly/js/src/jsapi.cpp:5841
#23 0x00007ffff27e70b8 in nsXPCWrappedJSClass::CallMethod (this=0xbb0390, wrapper=<optimized out>, methodIndex=3, 
    info_=0x7941b0, nativeParams=0x7fffffffc570)
    at /mozilla/code/firefox/nightly/js/xpconnect/src/XPCWrappedJSClass.cpp:1435
#24 0x00007ffff27dff4d in nsXPCWrappedJS::CallMethod (this=0x26712c0, methodIndex=<optimized out>, info=0x7941b0, 
    params=<optimized out>) at /mozilla/code/firefox/nightly/js/xpconnect/src/XPCWrappedJS.cpp:578
#25 0x00007ffff2ecd03e in PrepareAndDispatch (self=0x2646600, methodIndex=<optimized out>, args=<optimized out>, 
    gpregs=0x7fffffffc630, fpregs=0x7fffffffc660)
    at /mozilla/code/firefox/nightly/xpcom/reflect/xptcall/src/md/unix/xptcstubs_x86_64_linux.cpp:122
#26 0x00007ffff2ecc413 in SharedStub () from /mozilla/code/firefox/obj/debug/dist/bin/libxul.so
#27 0x00007ffff2eb6acf in nsTimerImpl::Fire (this=0x2670fa0)
    at /mozilla/code/firefox/nightly/xpcom/threads/nsTimerImpl.cpp:550
#28 0x00007ffff2eb6c72 in nsTimerEvent::Run (this=0x7fffb40011b0)
    at /mozilla/code/firefox/nightly/xpcom/threads/nsTimerImpl.cpp:634
#29 0x00007ffff2eb21fb in nsThread::ProcessNextEvent (this=0x70caf0, mayWait=false, result=0x7fffffffc7cf)
    at /mozilla/code/firefox/nightly/xpcom/threads/nsThread.cpp:627
#30 0x00007ffff2e7063e in NS_ProcessNextEvent (thread=<optimized out>, mayWait=<optimized out>)
    at /mozilla/code/firefox/obj/debug/xpcom/build/nsThreadUtils.cpp:238
#31 0x00007ffff2b884ad in mozilla::ipc::MessagePump::Run (this=0x70abd0, aDelegate=0x6c7ce0)
    at /mozilla/code/firefox/nightly/ipc/glue/MessagePump.cpp:82
#32 0x00007ffff2ee668c in MessageLoop::RunInternal (this=this@entry=0x6c7ce0)
    at /mozilla/code/firefox/nightly/ipc/chromium/src/base/message_loop.cc:219
#33 0x00007ffff2ee66b4 in RunHandler (this=0x6c7ce0)
    at /mozilla/code/firefox/nightly/ipc/chromium/src/base/message_loop.cc:212
#34 MessageLoop::Run (this=0x6c7ce0) at /mozilla/code/firefox/nightly/ipc/chromium/src/base/message_loop.cc:186
#35 0x00007ffff2a92b79 in nsBaseAppShell::Run (this=0xaabce0)
    at /mozilla/code/firefox/nightly/widget/xpwidgets/nsBaseAppShell.cpp:163
#36 0x00007ffff28e4e2f in nsAppStartup::Run (this=0xac3ba0)
    at /mozilla/code/firefox/nightly/toolkit/components/startup/nsAppStartup.cpp:289
#37 0x00007ffff1c89ee4 in XREMain::XRE_mainRun (this=this@entry=0x7fffffffcaa0)
    at /mozilla/code/firefox/nightly/toolkit/xre/nsAppRunner.cpp:3879
#38 0x00007ffff1c8a1b7 in XREMain::XRE_main (this=this@entry=0x7fffffffcaa0, argc=argc@entry=3, 
---Type <return> to continue, or q <return> to quit---
    argv=argv@entry=0x7fffffffdf98, aAppData=aAppData@entry=0x7fffffffcc90)
    at /mozilla/code/firefox/nightly/toolkit/xre/nsAppRunner.cpp:3946
#39 0x00007ffff1c8a425 in XRE_main (argc=3, argv=0x7fffffffdf98, aAppData=0x7fffffffcc90, aFlags=<optimized out>)
    at /mozilla/code/firefox/nightly/toolkit/xre/nsAppRunner.cpp:4147
#40 0x00000000004028be in do_main (argc=argc@entry=3, argv=argv@entry=0x7fffffffdf98, xreDirectory=0x6150f0)
    at /mozilla/code/firefox/nightly/browser/app/nsBrowserApp.cpp:271
#41 0x0000000000401e3a in main (argc=3, argv=0x7fffffffdf98)
    at /mozilla/code/firefox/nightly/browser/app/nsBrowserApp.cpp:576
Essentially right now we are calling recv after the socket has already been closed, resulting in this assertion due to the fact that the fd pointer we are using is effectively toast.

The right answer here is to kill off the timer as soon as we are closed so that doesn't happen.

Henrik, I still see mozmill thinking that Firefox is still running when it ends, I'm getting a message from python indicating that the application is still running -- but Firefox is actually shutting down normally. I think this might simply be due to how slow things are running on my windows laptop, it's rather old. I looked through the mozrunner and mozprocess code and it seems to all be functioning normally, so I think it's just a factor of slowness. Anyhow, I'd be interested to know if you see that issue or not and if so, we should treat it as a separate issue.
Assignee: hskupin → ctalbert
Attachment #744371 - Flags: review?(hskupin)
Summary: Assertion failure and shutdown crash: !"I/O method is invalid", at /nsprpub/pr/src/io/priometh.c:54 → Jsbridge calling recv after socket already closed
Summary: Jsbridge calling recv after socket already closed → Assertion failure and shutdown crash: !"I/O method is invalid", at /nsprpub/pr/src/io/priometh.c:54 because jsbridge calling recv after socket already closed
Comment on attachment 744371 [details] [diff] [review]
Don't call recv after close

Review of attachment 744371 [details] [diff] [review]:
-----------------------------------------------------------------

That would explain it very well. Thanks Clint for looking into it. I have only a small nit, which you might wanna fix before the landing.

::: jsbridge/jsbridge/extension/resource/modules/Sockets.jsm
@@ +67,5 @@
>    close : function () {
> +    if (this.timer) {
> +      this.timer.cancel();
> +    }
> +    this.timer = null;

I would put the last line here also into the if block, given that it is a no-op if timer doesn't exist.

nit: please keep a blank line before the return statement.
Attachment #744371 - Flags: review?(hskupin) → review+
(In reply to Clint Talbert ( :ctalbert ) from comment #9)
> still running -- but Firefox is actually shutting down normally. I think
> this might simply be due to how slow things are running on my windows
> laptop, it's rather old. I looked through the mozrunner and mozprocess code
> and it seems to all be functioning normally, so I think it's just a factor
> of slowness. Anyhow, I'd be interested to know if you see that issue or not
> and if so, we should treat it as a separate issue.

No, I don't see this. I have a real fast machine now. So yes, lets get this handled in another bug then.
Cool, if you don't see the crash, then I'll blame it on this crappy laptop. If we see it more widely spread, then we'll file a bug.

Landed on master: https://github.com/mozilla/mozmill/commit/de6c29dc94402eef7939ded06fc213df365bc870
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: [mozmill-2.0+] → [mozmill-2.0+][ateamtrack: p=mozmill q=2013q2 m=2]
So this bug is not fully fixed yet. It's still happening for me with debug builds while testing the patch on bug 872237 with our restart tests in mozmill-tests/tests/functional/restartTests.

Assertion failure: !"I/O method is invalid", at /mozilla/code/firefox/nightly/nsprpub/pr/src/io/priometh.c:54

Program /mozilla/code/firefox/obj/debug/dist/bin/firefox (pid = 18957) received signal 6.
Stack:

This is with tip of Mozmill master.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This is also reproducible by running '-m mutt/mutt/tests/js/frame/user_shutdown/tests.ini' with a debug build.
Duplicate of this bug: 872414
Blocks: 877101
So what is happening here is that we are trying to do our PR_Send after the socket is already closed. But that is happening because there is essentially a race condition between us closing down the application and sending out the last bit of results. In particular, see the code here where the application is already closing and note all the stuff we do: https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/frame.js#L572

We should either find a way to do less of this at that time or institute our own observer notice for "mozmill wants us to shutdown" which will be activated by "cleanquit" from this file: https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/driver/mozmill.js#L158 and capture that instead of the "application-quit" notice. Then once our observer that catches our "mozmill wants to quit" notice finishes all the shutdown items and then actually quits the application.
Here is a wallpaper patch that fixes the symptoms but not the underlying cause as outlined in my previous comment.
Attachment #758814 - Flags: review?(hskupin)
Comment on attachment 758814 [details] [diff] [review]
Wallpaper patch [checked-in]

Review of attachment 758814 [details] [diff] [review]:
-----------------------------------------------------------------

We tested the wall paper fix on OS X and Linux and we no longer see the I/O port assertion for NSPR. So that's a good thing. There are still a lot of application disconnects around but those should be covered by other bugs.

I think we should get this landed as it is for now and I will continue the work on this bug later this week or early next week.

::: mozmill/mozmill/extension/resource/driver/mozmill.js
@@ +162,4 @@
>    //       if we don't do this we crash on shutdown in linux
>    var quitmethod = utils.getMethodInWindows('goQuitApplication');
>    var settimeoutmethod = utils.getMethodInWindows('setTimeout');
> +  settimeoutmethod(quitmethod, 150);

That one will need an update given that my other patch already landed for shutting down Mozmill for Firefox Metro. I will update it before the landing.
Attachment #758814 - Flags: review?(hskupin) → review+
Comment on attachment 758814 [details] [diff] [review]
Wallpaper patch [checked-in]

Landed on master as:
https://github.com/mozilla/mozmill/commit/461ced89633d56693c05867e5b8915331e68dfae
Attachment #758814 - Attachment description: This wallpaper patch fixes the symptoms but not the problem → Wallpaper patch [checked-in]
As spoken I will take over the work here.
Assignee: ctalbert → hskupin
Status: REOPENED → ASSIGNED
Blocks: 881657
Blocks: 764640
So I did some investigation and tests here and we will have to fix the problem in multiple steps. The current situation is that we have dozen of ways how Mozmill itself is doing a shutdown of the application. We should really centralize that logic in a method which handles shutdown and restart. That way we might be able to cleanly shutdown the application. Once fixed we have to ensure that we always send all of our own data before the bridge gets closed.

One thing I'm not that sure at the moment is the 'application-quit-requested' observer notification. I would really like to make use of it to implement the logic pointed out above. But that might cause a hang if at least one listener is not granting the shutdown. It might only happen when further add-ons are installed, and coming up with some modal dialogs - not sure if that is allowed anymore. On the other side such a shutdown would be way cleaner because we can ensure that all components have completed their own shutdown logic, e.g. session store or places.

Clint, what do you think?
Flags: needinfo?(ctalbert)
I think the application-quit-requested observer is the right tool for the job here. While it's possible that some addons may break us here by depending on that, I don't see any cleaner way to do it.  Since if an addon were to break us by throwing up a modal dialog/whatever we'd still timeout and capture the error from the python side, I don't think it's a huge deal if that occurs.  And I think given that the benefit is a sane shutdown/restart mechanism, the benefit clearly outweighs the danger.
Flags: needinfo?(ctalbert)
It's quiet interesting to see how JSBridge handled the application shutdown so far. It's really kinda broken. I will give some notes to all of my changes:

* As of now we have not deleted the client but only called close() on it when Session.quit() is getting called. That means in some cases we still received broken (might have been cut-off) data from the Python side, which got internally handled via the sandbox, and caused a failure message to be send back. All within JSBridge code. By setting the client to null within the session we can stop that behavior, so no more messages will be accepted. Now we only have to check for a valid client in Session.send(). With that change we can also get rid of the is_closing states.

* sessions.remove() indeed removes the session from the array, but it doesn't return the object. So the calling code cannot correctly handle the close action for that session. The cycle collector and garbage collector will kick in at some time but there might be traces left, which keep the underlying client code active. So lets return the session object and let the calling code handle the removed session. As of now none of our code uses the remove() method. So nothing to change for the calling site.

* Improve logging for the send() after close() action, so we can see when the server gets closed. Therefore we always route everything through Log.dump().
Attachment #758812 - Attachment is obsolete: true
Attachment #764106 - Flags: review?(ctalbert)
Oh, and I missed one thing. We really should call session.quit() before closing the socket! Only that way the session has time to finish with everything.
Attachment #764106 - Flags: review?(ctalbert) → review+
So far my latest changes to mozmill work pretty good and gives us a total pass for our Firefox restart tests. None of the tests fail!!

The only issue I have to solve now is the following:

* Exec function: '9b43148c-d8fa-11e2-9b3b-3c970e4b31f5 (shutdownApplication)'
******** cleanQuit called with: 2

************** able to shutdown: false

********* in handler topic: quit-application
* Observer topic: 'quit-application'
* Stop JSBridge server on port: '45086'
* Attempting to send message after session closed: '{"result":true,"data":null,"uuid":"9b43148c-d8fa-11e2-9b3b-3c970e4b31f5"}'

Here we are trying to respond to the shutdownApplication() call requested by the jsbridge side. Not sure if we can do anything here, given that we have to respond after the function has been called. But lets see.
I moved out any larger refactoring / clean-up changes onto bug 884768.
Depends on: 884768
(In reply to Henrik Skupin (:whimboo) from comment #25)
> Here we are trying to respond to the shutdownApplication() call requested by
> the jsbridge side. Not sure if we can do anything here, given that we have
> to respond after the function has been called. But lets see.

Actually that was trivial. Using a timer to call appStartup.quit() made it. That way we trigger the shutdown/restart about 100ms later, and have enough time to send a response through the bridge.

I have to clean-up everything now, and probably will do some other minor adjustments.
Comment on attachment 764106 [details] [diff] [review]
Patch v1 (JSBridge fix for shutdown) [checked-in]

Landed jsbridge patch on master:
https://github.com/mozilla/mozmill/commit/ed5d7d35097088e05fe2110a3b6c4bfbbd148edc
Attachment #764106 - Attachment description: Patch v1 (JSBridge fix for shutdown) → Patch v1 (JSBridge fix for shutdown) [checked-in]
Ok, so I have nearly finished the patch for this bug. The one and only situation when we will send data after the socket has been closed, if when we are executing the runTestFile() method. Given that it has been started from the Python code, Bridge.execFunction() is trying to answer with the return value. In cases of user restart/shutdown the method will be stopped during application-quit, and is not able to send the result. I don't think that this is important and we can ignore that.
This is mostly ready but before calling it final I would like to run a lot of tests across platforms and versions of Firefox. Andreea and Andrei, please me help me as talked on IRC. Thanks.
Attachment #765238 - Flags: feedback?(andrei.eftimie)
Attachment #765238 - Flags: feedback?(andreea.matei)
Updated version of the WIP patch which enables the dump() calls for suspicious messages after the socked being closed by default. There is no need to run with --debug by default. Please let me all know if you see anything which gets send after the socket has been closed and which gets printed out to the console. Thanks
Attachment #765238 - Attachment is obsolete: true
Attachment #765238 - Flags: feedback?(andrei.eftimie)
Attachment #765238 - Flags: feedback?(andreea.matei)
Attachment #765817 - Flags: feedback?(tojonmz)
Attachment #765817 - Flags: feedback?(dave.hunt)
Attachment #765817 - Flags: feedback?(andrei.eftimie)
Attachment #765817 - Flags: feedback?(andreea.matei)
This fails with --debug on Ubuntu 13.04
I'm attaching the log here, I'm not sure what to make of it.
As seen in Andrei's log, we do not correctly print the jsbridge attempt message. This WIP fixes it. Sorry.
Attachment #765817 - Attachment is obsolete: true
Attachment #765817 - Flags: feedback?(tojonmz)
Attachment #765817 - Flags: feedback?(dave.hunt)
Attachment #765817 - Flags: feedback?(andrei.eftimie)
Attachment #765817 - Flags: feedback?(andreea.matei)
I think I'm seeing the desired jsbridge attempt message in both mutt testpy,testjs runs. Windows XP32 SP3, one-day old Nightly, latest WIP 1.2 patch.

eg.
Attempting to send message after session closed: {"result":true,"data":"bridge.registry[\"{cad709fe-a414-42ee-ac93-b367042f9102}\"]","uuid":"b496604f-da7c-11e2-aeb2-647002213455"}

If you'd like a log for either testjs or testpy let me know, and I will attach.
Yeah. A full log would be great. Please send it via email. There is no need to attach a dozen of logs on that bug. :)
Will do.
Blocks: 886360
Ok, so if you think I cannot go worse, it will happen. As of now I figured out that the most important underlying problem we are facing here with the discard events is happening because a call to restartApplication or stopApplication does not block the test. Those methods are sending a message through the broken and return immediately. That means the test continues until the application gets shutdown in the frame.

The only way I can see here is to throw an exception once the restart request has been set. We can catch that special exception in Runner.prototype.wrapper() and set a flag on the Runner object, which indicates that the application is going down. Once a test module has this state set, we do no longer execute other tests or teardown methods, but wait for shutdown/restart.

That way no further events will be triggered, which will most likely become discarded.

Clint, what do you think?
Flags: needinfo?(ctalbert)
Ok, so I have fixed a couple of issues which I noticed while checking the logs you all have sent me. Thanks for your help!

The new patch contains a couple of updates and I really would like to know how that works for you all. Please get this tested across platforms.
Attachment #765850 - Attachment is obsolete: true
Attachment #765856 - Attachment is obsolete: true
I ran with mozmill -m mutt/mutt/tests/js/tests.ini -b nightly_path --debug, on Linux 64bit. First try failed due to that dialog of multiple ff copies, the second passed:
Didn't get that line with "Event discarded". I can add the log if needed.
Do you need to be ran in another way?
Ran the restart tests as well and all works fine on this Linux 64bit for me. All tests passed.
(In reply to Henrik Skupin (:whimboo) [away 06/28 - 07/07] from comment #38)
> Ok, so if you think I cannot go worse, it will happen. As of now I figured
> out that the most important underlying problem we are facing here with the
> discard events is happening because a call to restartApplication or
> stopApplication does not block the test. Those methods are sending a message
> through the broken and return immediately. That means the test continues
> until the application gets shutdown in the frame.
> 
> The only way I can see here is to throw an exception once the restart
> request has been set. We can catch that special exception in
> Runner.prototype.wrapper() and set a flag on the Runner object, which
> indicates that the application is going down. Once a test module has this
> state set, we do no longer execute other tests or teardown methods, but wait
> for shutdown/restart.
> 
> That way no further events will be triggered, which will most likely become
> discarded.
> 
> Clint, what do you think?
I think that's the sanest shutdown/restart idea Mozmill's ever had.  +1. I'll try to give it a whirl on windows at some point today.
Flags: needinfo?(ctalbert)
Depends on: 887202
Mailed new set of output logs for the v3 wip patch to Henrik. Ran both testpy and testjs, with nightly 25.0a1 20130625031238.
And that would be Windows XP32 SP3 again, just to be specific.
Attached patch Patch v1 (obsolete) — Splinter Review
Alright. Here an updated version and the first real patch! I found even some more problems in the shutdown logic and cleaned those up. Restart tests seem to work pretty fine. We have tested that on all platforms now. User shutdown tests might work, but it's still not guaranteed. It will be fixed in a follow-up bug for Mozmill 2.1.

Clint, I believe that I have removed any tracing code from that patch. But please keep an eye on any left-over debug statements. I decided to upload the patch already now, to let you all play with it while I'm away the next couple hours.
Attachment #767139 - Attachment is obsolete: true
Attachment #767772 - Flags: review?(ctalbert)
On Windows XP32 SP3, mozmill-env 2.0rc4, latest Nightly, and Patch v1 in comment 45 applied to my mozmill repo, I'm seeing this failure output, one for each mutt test in the manifest.
 
Javascript Failures:
Exception: "[JavaScript Error: "missing ; before statement" {file: "resource://mozmill/modules/assertions.js" line: 169 column: 48 source: "                                    aResult.name);
"}]" (None)

I see this both when running `mutt -m`, or `mutt testjs`. Anyone else on Windows seeing the same?
And to clarify, when I switch my repo branch from Patch v1 to master, and run the mutt tests same way, the tests run fine.
Attached patch Patch v1.1 (obsolete) — Splinter Review
Good catch, Jonathan! Looks like a quick last minute change caused this problem. Here the updated patch.
Attachment #767772 - Attachment is obsolete: true
Attachment #767772 - Flags: review?(ctalbert)
Attachment #767918 - Flags: review?(ctalbert)
Thanks, trying v1.1 now.
Mailed new set of output logs for the V1.1 patch to Henrik. 1 run of clean Master, 4 runs of V1.1 patch to check variances.
Attached patch Patch v1.2Splinter Review
While checking the results from Jonathan I have seen that the usershutdown tests are still disabled. Good news is that we can even re-enable them. They pass all the time for me.
Attachment #767918 - Attachment is obsolete: true
Attachment #767918 - Flags: review?(ctalbert)
Attachment #767982 - Flags: review?(ctalbert)
Mailed a pair of comparative output mutt testjs run logs for the V1.1 vs. V1.2 patch to Henrik. Everything seems in order but I defer to Henrik after he looks at the logs.

Config:
Windows XP32 SP3
mozmill-env w/mozmill 2.0rc4
latest Nightly 25.0a1 20130625031238
Patch v1.2
All looks fine. Thanks Jonathan!
(In reply to Andreea Matei [:AndreeaMatei] from comment #54)
> A functional testrun with the latest patch applied, on OS X:
> http://mozmill-crowd.blargon7.com/#/functional/report/
> a1b02004612785c13cf7c6bf1e086b1e

A report does not help here. What I need is the debug output, but keep in mind that with the latest patch those lines are commented out again.
Comment on attachment 767982 [details] [diff] [review]
Patch v1.2

Review of attachment 767982 [details] [diff] [review]:
-----------------------------------------------------------------

Wow, this is a thorough overhaul, and I think it's been needed for some time. Sorry it took so long to get to this. I've been over it and over it, and I don't see any issues.  Well done. Thanks to everyone for testing this.
Attachment #767982 - Flags: review?(ctalbert) → review+
Thanks Clint. I've landed this in:
https://github.com/mozilla/mozmill/commit/09cfa9f0d70bd127fe21db4a5a90552ac3e02cb3
Status: ASSIGNED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
There seems to be some lingering issues after this land.
Thanks Jonathan for pointing this out in bug 764640 comment 37 (and for the logs via email).
I'm seeing the same thing:

Before this patch landed:
> mutt testjs -b /Applications/FirefoxNightly.app/
> ...
> RESULTS | Passed: 56
> RESULTS | Failed: 2
> RESULTS | Skipped: 0
> ...
> Total passed: 56
> Total failed: 2
> Total skipped: 0

With this patch I'm seeing some discrepancy between JS and final results logged:
> mutt testjs -b /Applications/FirefoxNightly.app/
> ...
> RESULTS | Passed: 58
> RESULTS | Failed: 2
> RESULTS | Skipped: 2
> ...
> Total passed: 60
> Total failed: 2
> Total skipped: 2

I have not noticed this before.
The discrepancy seems to come from the frame/restart tests:

> mutt -m mutt/mutt/tests/js/frame/restart/tests.ini -b /Applications/FirefoxNightly.app/
> ...
> RESULTS | Passed: 7
> RESULTS | Failed: 0
> RESULTS | Skipped: 2
> ...
> Total passed: 9
> Total failed: 0
> Total skipped: 2
Seems to be a problem with mutt reporting the total results erroneously.
It logs a PASS on a skipped test in setuptest_skip.js
(In reply to Andrei Eftimie from comment #58)
> There seems to be some lingering issues after this land.
> Thanks Jonathan for pointing this out in bug 764640 comment 37 (and for the
> logs via email).
> I'm seeing the same thing:

Just for clarity here, my observations in bug 764640 about test-start,test-end still being sporadic with expect_assert.js and screenshot.js, were just a confirmation that bug is still outstanding.

However, I also see Andrei's overall frame/restart/ test count mentioned above in Comment 59.
Whiteboard: [mozmill-2.0+][ateamtrack: p=mozmill q=2013q2 m=2] → [mozmill-2.0+][ateamtrack: p=mozmill q=2013q2 m=4]
Whiteboard: [mozmill-2.0+][ateamtrack: p=mozmill q=2013q2 m=4] → [mozmill-2.0+][ateamtrack: p=mozmill q=2013q3 m=1]
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.