Closed Bug 1102923 Opened 5 years ago Closed 4 years ago

Intermittent test_spdy.js | xpcshell return code: 0 or Test timed out

Categories

(Core :: Networking: HTTP, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: cbook, Assigned: nhughes)

References

()

Details

(Keywords: intermittent-failure)

Attachments

(4 files)

Ubuntu VM 12.04 mozilla-inbound opt test xpcshell

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=4108922&repo=mozilla-inbound

06:36:01 WARNING - TEST-UNEXPECTED-FAIL | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_spdy.js | xpcshell return code: 0
06:36:01 INFO - TEST-INFO took 564ms
06:36:01 INFO - >>>>>>>
06:36:01 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
06:36:01 INFO - (xpcshell/head.js) | test pending (2)
06:36:01 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
06:36:01 INFO - running event loop
06:36:01 INFO - Channel should have a success code! (2152398861)
06:36:01 INFO - /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_spdy.js:testOnStartRequest:51
06:36:01 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_do_main:184
06:36:01 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_execute_test:372
06:36:01 INFO - -e:null:1
06:36:01 INFO - null:null:0
06:36:01 INFO - exiting test
Component: General → Networking: HTTP
this is always 32 bit linux opt - the complaint is that the first test's onstartrequest has status 08804b000d (NS_ERROR_CONNECTION_REFUSED)

Two possibilities come to mind:

1] server hasn't fully started yet when the test_spdy runs. runxpcshell basically just forks it and moves on. I think this is unlikely though - 10 minutes have elpased in the log!

2] the server can't bind to port 4443 which is hardcoded in the tests
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Attachment #8549999 - Flags: review?(hurley)
Comment on attachment 8549999 [details] [diff] [review]
move spdy/h2 tests off fixed port

Review of attachment 8549999 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments addressed

::: netwerk/test/unit/test_http2.js
@@ +46,5 @@
>    shouldBeHttp2 : true,
>  
>    onStartRequest: function testOnStartRequest(request, ctx) {
>      this.onStartRequestFired = true;
> +    do_check_true(Components.isSuccessCode(request.status));

So I've actually found the whole logging of request.status to be helpful in debugging some errors in the past, I think I'd rather prefer to keep that.

@@ +671,5 @@
>      var cert = sslStatus.QueryInterface(Ci.nsISSLStatus).serverCert;
>      var cos = Cc["@mozilla.org/security/certoverride;1"].
>                getService(Ci.nsICertOverrideService);
>      cos.rememberValidityOverride(this.host, this.port, cert, this.bits, false);
> +    dump("Certificate Override in place\n");

Feel free to keep this in if you find it useful, but I generally don't like random dumps in my tests :)

@@ +719,5 @@
>  function run_test() {
> +  var env = Cc["@mozilla.org/process/environment;1"].getService(Ci.nsIEnvironment);
> +  serverPort = env.get("MOZHTTP2-PORT");
> +  do_check_neq(serverPort, null);
> +  dump ("using port " + serverPort + "\n");

(this dump is fine, but...)
nit: no space between "dump" and "("

::: netwerk/test/unit/test_spdy.js
@@ +49,5 @@
>  
>    onStartRequest: function testOnStartRequest(request, ctx) {
>      this.onStartRequestFired = true;
>  
> +    do_check_true(Components.isSuccessCode(request.status));

I would say the same thing for this change of not showing the request status, but since SPDY is slowly on it's way out, this change is probably fine.

@@ +396,5 @@
>      var cert = sslStatus.QueryInterface(Ci.nsISSLStatus).serverCert;
>      var cos = Cc["@mozilla.org/security/certoverride;1"].
>                getService(Ci.nsICertOverrideService);
>      cos.rememberValidityOverride(this.host, this.port, cert, this.bits, false);
> +    dump("Certificate Override in place\n");

Same maybe-remove-this-dump-if-you-don't-need-it

@@ +436,5 @@
>  function run_test() {
> +  var env = Cc["@mozilla.org/process/environment;1"].getService(Ci.nsIEnvironment);
> +  serverPort = env.get("MOZSPDY-PORT");
> +  do_check_neq(serverPort, null);
> +  dump ("using port " + serverPort + "\n");

nit: remove space before "("

::: testing/xpcshell/moz-http2/moz-http2.js
@@ +195,5 @@
>    }
>  
>    else if (u.pathname === "/pushapi1") {
>      push1 = res.push(
> +	{ hostname: 'localhost:' + server._server.address().port, port: server._server.address().port, path : '/pushapi1/1', method : 'GET',

I would cache server._server.address().port when you first get it below, and use the cached version (for readability).

@@ +352,5 @@
>      // the first test case if done.
>    });
>  });
> +
> +function listenok() {

(this is where the port number would be cached)
Attachment #8549999 - Flags: review?(hurley) → review+
> So I've actually found the whole logging of request.status to be helpful in
> debugging some errors in the past, I think I'd rather prefer to keep that.
> 

I can revert that in both h2 and spdy.

> Feel free to keep this in if you find it useful, but I generally don't like
> random dumps in my tests :)

I'm gonna keep it - without that in the log there is no indication that the cert override went ok (and I was ruling it out as a cause of failure)
comment 39 is the first 64 bit instance of this.. fwiw
I'm sure it's a completely coincidence that both of the above were after this push:
https://treeherder.mozilla.org/ui/#/jobs?repo=mozilla-inbound&revision=4aefc5bf7ec4
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #42)
> I'm sure it's a completely coincidence that both of the above were after
> this push:
> https://treeherder.mozilla.org/ui/#/jobs?repo=mozilla-
> inbound&revision=4aefc5bf7ec4

I wonder that too, as I saw it on try a couple times when doing that patch.. that's why I prioritized the patch under r? here..

my conclusion is that it wasn't linked - the issue appears to be the test harness not starting up on the expected port - the first attempt always fails to connect when the test fails. The patch here moves the server to a dynamic port like the h1 tests already use.

the try run in comment 36 has both the patch r?'d here and the csets you referenced that are now in m-i.. and it seems to have eliminated the issue so I'm hopeful.
ted - would you have a chance to look at this review asap? It puts the port number for the node h2 and spdy servers in the xpcshell environment so the xpcshell tests can use a dyanmic port instead of a hardcoded one (which is seeing startup failures)

The only part I need your help with is

diff --git a/testing/xpcshell/runxpcshelltests.py b/testing/xpcshell/runxpcshelltests.py
--- a/testing/xpcshell/runxpcshelltests.py
+++ b/testing/xpcshell/runxpcshelltests.py
@@ -952,16 +952,22 @@ class XPCShellTests(object):
                                 stderr=STDOUT, env=self.env, cwd=os.getcwd())
                         self.nodeProc[name] = process
 
                         # Check to make sure the server starts properly by waiting for it to
                         # tell us it's started
                         msg = process.stdout.readline()
                         if 'server listening' in msg:
                             nodeMozInfo['hasNode'] = True
+                            searchObj = re.search( r'SPDY server listening on port (.*)', msg, 0)
+                            if searchObj:
+                              self.env["MOZSPDY-PORT"] = searchObj.group(1)
+                            searchObj = re.search( r'HTTP2 server listening on port (.*)', msg, 0)
+                            if searchObj:
+                              self.env["MOZHTTP2-PORT"] = searchObj.group(1)
                     except OSError, e:
                         # This occurs if the subprocess couldn't be started
                         self.log.error('Could not run %s server: %s' % (name, str(e)))
 
             myDir = os.path.split(os.path.abspath(__file__))[0]
             startServer('moz-spdy', os.path.join(myDir, 'moz-spdy', 'moz-spdy.js'))
             startServer('moz-http2', os.path.join(myDir, 'moz-http2', 'moz-http2.js'))
Flags: needinfo?(ted)
so if the sheriff finds this intolerable, try backing out 1072478 but not backing out 1121706 (the latter doesn't touch spdy code at all). 

I would still rather wait for the pending review on this patch.
Comment on attachment 8549999 [details] [diff] [review]
move spdy/h2 tests off fixed port

Review of attachment 8549999 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, this is great! Anything that takes hardcoded port numbers out of our testing environment is fine by me! I suspect these are a source of instability everywhere.

::: netwerk/test/unit/test_spdy.js
@@ +137,5 @@
>  SpdyPushListener.prototype.onDataAvailable = function(request, ctx, stream, off, cnt) {
>    this.onDataAvailableFired = true;
>    this.isSpdyConnection = checkIsSpdy(request);
> +  if (ctx.originalURI.spec == "https://localhost:" + serverPort + "/push.js" ||
> +      ctx.originalURI.spec == "https://localhost:" + serverPort + "/push2.js") {

I would write a helper function like function geturl(path) to replace all of these.

::: testing/xpcshell/moz-http2/moz-http2.js
@@ +228,5 @@
>        });
>      push2.end('2');
>  
>      push3 = res.push(
> +	{ hostname: 'localhost:' + server._server.address().port, port: server._server.address().port, path : '/pushapi1/3', method : 'GET',

I think maybe you want a helper function to generate these given all the repetition.

::: testing/xpcshell/moz-spdy/moz-spdy.js
@@ +188,5 @@
>    windowSize: 16000000,
>  };
>  
> +function listenok() {
> +  console.log('SPDY server listening on port ' + webServer.address().port);

Feels like it'd be nice if we had some better way to communicate this info back other than stdout, but if this works then that's good enough for me.

::: testing/xpcshell/runxpcshelltests.py
@@ +961,5 @@
> +                            if searchObj:
> +                              self.env["MOZSPDY-PORT"] = searchObj.group(1)
> +                            searchObj = re.search( r'HTTP2 server listening on port (.*)', msg, 0)
> +                            if searchObj:
> +                              self.env["MOZHTTP2-PORT"] = searchObj.group(1)

Alternately you could just pass this into JS via the cmdline:
http://mxr.mozilla.org/mozilla-central/source/testing/xpcshell/runxpcshelltests.py#373
Attachment #8549999 - Flags: review?(ted) → review+
Flags: needinfo?(ted)
none of the instances of this in comments 242 to 254 contain the cset from comment 241. looking hopeful.
Duplicate of this bug: 1112283
crap 258 and 259 are the same symptom - connection refused on the first test.
Keywords: leave-open
the new logs indicate a high numbered port in use - the server reports stating ok and the test figured out what number to use ok.. but still we get connection refused.

I wonder if the server actually dies before the test starts. I can add some more logging.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a95506dcd7eb yields a little more information

moz-spdy.js has already exited by the time runxpcshell.py goes to kill the server process. It returned "8" to the OS

the current node source says this
* `8` - Unused.  In previous versions of Node, exit code 8 sometimes
  indicated an uncaught exception.

I don't know yet what version of node is running on the test machines.. hopefully the next iteration of logging will show the exception anyhow.

it occurs to me now that the logs for this bug can be misleading.. the xpcshell tests are run in parallel and it does not report those logs.. failures are rerun sequentially and the log from the sequential run is reported.. but if the first iteration resulted in a mid-stream abort then the second one would always report a failure during the bootstrap (which is what we're seeing)
(In reply to Patrick McManus [:mcmanus] from comment #268)
> it occurs to me now that the logs for this bug can be misleading.. the
> xpcshell tests are run in parallel and it does not report those logs..
> failures are rerun sequentially and the log from the sequential run is
> reported.. but if the first iteration resulted in a mid-stream abort then
> the second one would always report a failure during the bootstrap (which is
> what we're seeing)

If there's something we can fix at the harness level definitely file a bug.
the push in comment 298 marked this test as run-sequentially as I suspected the logging would be better.

It also made it much less prone to failure (unexpectedly) - but comment 300 is a reproduction.

And the logs are a bit better - it shows test spdy_push3 failing to connect to the server after push and push2 succeed. So that means the server crashed.

I have a try run that should(?) capture any server output. I've modified that run to remove most xpcshell tests (an iteration runs in 2 minutes) but so far have not been able to reproduce there after a _lot_ of iterations.

If I can't get any traction on that I can try and make the server restart itself. If that doesn't help we can try disabling the push portion of the spdy tests. most of that code is shared with http2 anyhow for regression purposes and the h2 server seems fine (they are different servers).
https://hg.mozilla.org/integration/mozilla-inbound/rev/c0ff304dee00

tries to restart the server on unhandled exceptions.
according to 306, restarting the server didn't work (fast enough).. let's change the node shutdown routine to log stdout/stderr
Comment on attachment 8553810 [details] [diff] [review]
log node moz-spdy and moz-http2 output on shutdown

Review of attachment 8553810 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/moz-spdy/moz-spdy.js
@@ -99,5 @@
> -    webServer = spdy.createServer(options, handleRequest).listen(p, "0.0.0.0", 200, listenok);
> -  }
> -}
> -
> -function realHandleRequest(req, res) {

Is this change intentional?

::: testing/xpcshell/runxpcshelltests.py
@@ +948,5 @@
>                      try:
>                          # We pipe stdin to node because the spdy server will exit when its
>                          # stdin reaches EOF
>                          process = Popen([nodeBin, serverJs], stdin=PIPE, stdout=PIPE,
> +                                stderr=PIPE, env=self.env, cwd=os.getcwd())

You could just leave stderr=STDOUT and you'd get them both intermingled by dumping proc.stdout.
Attachment #8553810 - Flags: review?(ted) → review+
Thanks for the review!

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #309)
> 
> Is this change intentional?
> 

yes, thanks. It reverts a change that I hoped would make the server more stable (it didn't) and I think will coverup the desired output from this change.
3:08:51     INFO -  Process stderr
13:08:51     INFO -  events.js:72
13:08:51     INFO -          throw er; // Unhandled 'error' event
13:08:51     INFO -                ^
13:08:51     INFO -  Error: Received rst: 5
13:08:51     INFO -      at Connection.handleFrame [as _handleFrame] (/builds/slave/test/build/tests/xpcshell/node-spdy/lib/spdy/connection.js:229:17)
13:08:51     INFO -      at Parser.EventEmitter.emit (events.js:95:17)
13:08:51     INFO -      at onFrame (/builds/slave/test/build/tests/xpcshell/node-spdy/lib/spdy/protocol/parser.js:232:12)
13:08:51     INFO -      at Parser.parseRst (/builds/slave/test/build/tests/xpcshell/node-spdy/lib/spdy/protocol/parser.js:442:3)
13:08:51     INFO -      at Parser.parseBody (/builds/slave/test/build/tests/xpcshell/node-spdy/lib/spdy/protocol/parser.js:280:10)
13:08:51     INFO -      at Parser.execute (/builds/slave/test/build/tests/xpcshell/node-spdy/lib/spdy/protocol/parser.js:226:12)
13:08:51     INFO -      at Parser.write [as _write] (/builds/slave/test/build/tests/xpcshell/node-spdy/lib/spdy/protocol/parser.js:132:8)
13:08:51     INFO -      at Object._onImmediate (/builds/slave/test/build/tests/xpcshell/node-spdy/lib/spdy/protocol/parser.js:149:14)
13:08:51     INFO -      at processImmediate [as _immediateCallback] (timers.js:330:15)
13:08:51     INFO -  INFO | Result summary:
13:08:51     INFO -  INFO | Passed: 2336
13:08:51  WARNING -  INFO | Failed: 1
nick, what do you make of comment 321.. is it just a race condition in node-spdy that should be caught and ignored?
Flags: needinfo?(hurley)