Closed Bug 777354 Opened 12 years ago Closed 11 years ago

Regression in shutdown time of httpd.js (~6s) when multiple local pages have been opened

Categories

(Testing :: General, defect)

15 Branch
defect
Not set
major

Tracking

(firefox14 unaffected, firefox15- wontfix, firefox16- wontfix, firefox17- wontfix, firefox18 wontfix, firefox19 wontfix, firefox20 wontfix, firefox25 wontfix, firefox26 fixed, firefox27 fixed, firefox-esr10 unaffected, b2g-v1.2 fixed)

RESOLVED FIXED
mozilla27
Tracking Status
firefox14 --- unaffected
firefox15 - wontfix
firefox16 - wontfix
firefox17 - wontfix
firefox18 --- wontfix
firefox19 --- wontfix
firefox20 --- wontfix
firefox25 --- wontfix
firefox26 --- fixed
firefox27 --- fixed
firefox-esr10 --- unaffected
b2g-v1.2 --- fixed

People

(Reporter: whimboo, Assigned: mcmanus)

References

Details

(Keywords: regression)

Attachments

(3 files, 1 obsolete file)

Over on bug 767332 I have noticed that at some point in the past we have introduced a big regression range in shutting down httpd.js. I did some time measurements with the Mozmill test attached to the before mentioned bug and a Nightly and Release build, whereby the latter behaves correctly.

Nightly:

TEST-START | testHTTPd1.js | setupModule
TEST-START | testHTTPd1.js | testOpenLocalPages
TEST-PASS | testHTTPd1.js | testOpenLocalPages
INFO | ** Shutdown duration: 5617
TEST-START | testHTTPd2.js | setupModule
TEST-START | testHTTPd2.js | testHttpFailure
TEST-PASS | testHTTPd2.js | testHttpFailure
TEST-START | testHTTPd2.js | teardownModule
INFO | ** Shutdown duration: 96

Release (14.0)
TEST-START | testHTTPd1.js | setupModule
TEST-START | testHTTPd1.js | testOpenLocalPages
TEST-PASS | testHTTPd1.js | testOpenLocalPages
INFO | ** Shutdown duration: 100
TEST-START | testHTTPd2.js | setupModule
TEST-START | testHTTPd2.js | testHttpFailure
TEST-PASS | testHTTPd2.js | testHttpFailure
TEST-START | testHTTPd2.js | teardownModule
INFO | ** Shutdown duration: 97

I will have to check for the regression range and hope that we can fix that before the current beta gets merged to release.
This is a huge regression somewhere in the core or httpd.js code which causes our Mozmill tests to execute absolutely slow. After nearly each single test we can expect a delay of 5-6s.

If QA should still be up in signing off from any kind of release or build, we have to get this fixed. Asking for tracking the affected branches.
Regression range matches the one from bug 767332, which regressed on mozilla-central between the builds 2012-04-25-03-06-47 and 2012-04-26-03-05-04.

PASS: http://hg.mozilla.org/mozilla-central/rev/75c7378c87b6
FAIL: http://hg.mozilla.org/mozilla-central/rev/cc5254f9825f

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=75c7378c87b6&tochange=cc5254f9825f

Looks like I should do a hg bisect to find the causing changeset.
The first bad revision is:
changeset:   92418:c773ee0f0245
user:        Patrick McManus <mcmanus@ducksong.com>
date:        Wed Apr 25 08:54:42 2012 -0400
summary:     bug 729133 - internal necko speculative connections r=honzab
Component: httpd.js → Networking: HTTP
Product: Testing → Core
Summary: Regression in shutdown time for ServerSocket when multiple local pages have been opened → Regression in shutdown time of httpd.js (~6s) when multiple local pages have been opened
So steps to reproduce:

1. Create a virtual env and install mozmill: 'pip install mozmill'
2. Clone https://github.com/mozilla/mozmill
3. Run: mozmill -b %path_to_firefox% -m mutt/mutt/tests/js/frame/httpd/tests.ini

Notice the long delay after 'TEST-PASS | testHTTPd1.js | testOpenLocalPages'.
Attached file HTTP log
Adding qawanted and regression-window wanted - yes, let's get on this while we're still in Beta and identify the problem here.
henrik already indicated the changeset that triggers the bug is the speculative connection code in necko.

but the bug is in the server not shutting down based on the new idle connection, not the new code. The idle connection is an expected artifact of the feature.
(In reply to Lukas Blakk [:lsblakk] from comment #6)
> Adding qawanted and regression-window wanted - yes, let's get on this while
> we're still in Beta and identify the problem here.

Not sure QA or I can do anything more on it.

(In reply to Patrick McManus [:mcmanus] from comment #7)
> but the bug is in the server not shutting down based on the new idle
> connection, not the new code. The idle connection is an expected artifact of
> the feature.

So how should this be solved? Is it part of the httpd.js code or is a core fix necessary? If the first, should the connection be hard-killed by getting rid of the existent idle connection? Or what would be the right behavior of the server? Shall we move it back to testing/httpd.js?
Ed, are you aware of any slowdowns when running our in-tree tests with mochitest and others? I just would like to know how wide-spread this issue is and how much does it affect us in testing.
Unfortunately much of the slowdowns I notice are in total end-to-end time, which are dependant on pending time + build time + test run time + setup/cleanup inbetween. These vary significantly by time of day and by week-to-week infra issues, so it's hard to notice regressions like this.

I would say to look a the GoFaster dashboard (http://brasstacks.mozilla.com/gofaster/#/executiontime/test?range=all&include_outliers=0), but it only goes back as far as June.

Believe catlee has helped espindola answer end-to-end time questions recently, so might be able to advise further; CCing.
(In reply to Ed Morley [:edmorley] from comment #10)
> Unfortunately much of the slowdowns I notice are in total end-to-end time,
> which are dependant on pending time + build time + test run time +
> setup/cleanup inbetween. These vary significantly by time of day and by
> week-to-week infra issues, so it's hard to notice regressions like this.

Wait. Probably other frameworks are not affected because they don't shutdown httpd.js after each test. So please scratch that idea.
This is a problem in httpd.js, which won't shut itself down when it has an open client connection. I don't know all that much about how httpd.js is more widely used, but presumably some users call stop() to stop accepting new connections and want to kindly wind down the ones underway.

onStopListening: function(socket, status)
  {
    dumpn(">>> shutting down server on port " + socket.port);
    this._socketClosed = true;
    if (!this._hasOpenConnections())
    {
      dumpn("*** no open connections, notifying async from onStopListening");


I guess the easiest thing to do if you don't want that behavior is to add a reap() function to nsIHttpServer that closes open connections.
Thanks Patrick. I will try if I can get it working with Mozmill. If it does it's hopefully easy to port forward to the original implementation under network/test/.

Lets move it back to the httpd.js component then.
Component: Networking: HTTP → httpd.js
Product: Core → Testing
Version: unspecified → 15 Branch
another approach would be something like this (completely untested and undebugged!) - to automagically close any socket that hadn't seen at least a request line at the time of stop().. this has the advantage that the calling code doesn't need to call reap() to make the right thing happen.

hth

diff --git a/netwerk/test/httpserver/httpd.js b/netwerk/test/httpserver/httpd.js
--- a/netwerk/test/httpserver/httpd.js
+++ b/netwerk/test/httpserver/httpd.js
@@ -467,16 +467,24 @@ nsHttpServer.prototype =
    *   the reason the socket stopped listening (NS_BINDING_ABORTED if the server
    *   was stopped using nsIHttpServer.stop)
    * @see nsIServerSocketListener.onStopListening
    */
   onStopListening: function(socket, status)
   {
     dumpn(">>> shutting down server on port " + socket.port);
     this._socketClosed = true;
+    for (var n in this._connections) {
+      if (!n._requestStarted)
+          n.close();
+    }
+
     if (!this._hasOpenConnections())
     {
       dumpn("*** no open connections, notifying async from onStopListening");
 
       // Notify asynchronously so that any pending teardown in stop() has a
       // chance to run first.
       var self = this;
       var stopEvent =
@@ -1127,16 +1135,19 @@ function Connection(input, output, serve
   /**
    * The request for which a response is being generated, null if the
    * incoming request has not been fully received or if it had errors.
    */
   this.request = null;
 
   /** State variables for debugging. */
   this._closed = this._processed = false;
+
+  /** whether or not 1st line of request has been received */
+  this._requestStarted = false; 
 }
 Connection.prototype =
 {
   /** Closes this connection's input/output streams. */
   close: function()
   {
     dumpn("*** closing connection " + this.number +
           " on port " + this._outgoingPort);
@@ -1190,16 +1201,21 @@ Connection.prototype =
   },
 
   /** Converts this to a string for debugging purposes. */
   toString: function()
   {
     return "<Connection(" + this.number +
            (this.request ? ", " + this.request.path : "") +"): " +
            (this._closed ? "closed" : "open") + ">";
+  },
+
+  requestStarted: function()
+  {
+      this._requestStarted = true;
   }
 };
 
 
 
 /** Returns an array of count bytes from the given input stream. */
 function readBytes(inputStream, count)
 {
@@ -1376,16 +1392,17 @@ RequestReader.prototype =
     if (!readSuccess)
       return false;
 
     // we have the first non-blank line
     try
     {
       this._parseRequestLine(line.value);
       this._state = READER_IN_HEADERS;
+      this._connection.requestStarted();
       return true;
     }
     catch (e)
     {
       this._handleError(e);
       return false;
     }
Attached patch WIP (obsolete) — Splinter Review
Something along those lines? Works at least for me but not sure if it is the right way to close all the open connections. Not sure whom I should ask for feedback given that Jeff is out right now. Patrick, would you mind to have a look at? The patch doesn't include any tests yet.
Attachment #646141 - Flags: feedback?(mcmanus)
Hm, haven't seen the latest comment. So this WIP implements the formerly proposed solution.
Comment on attachment 646141 [details] [diff] [review]
WIP

the problem here is that it shuts down already established connections that are maybe actively doing something. Jeff wrote a bunch of code to make that happen, so presumably something is using it.

so I think the approach in comment 14 works better.. if you want to go with reap it probly needs to be called from mozmill, not internal to httpd.js

but I'm not an httpd.js expert :)
Attachment #646141 - Flags: feedback?(mcmanus)
(In reply to Henrik Skupin (:whimboo) [away 07/27 - 08/05] from comment #11)
> (In reply to Ed Morley [:edmorley] from comment #10)
> > Unfortunately much of the slowdowns I notice are in total end-to-end time,
> > which are dependant on pending time + build time + test run time +
> > setup/cleanup inbetween. These vary significantly by time of day and by
> > week-to-week infra issues, so it's hard to notice regressions like this.
> 
> Wait. Probably other frameworks are not affected because they don't shutdown
> httpd.js after each test. So please scratch that idea.

Quick drive by comment - fwiw, both reftest and xpcshell start up httpd.js on demand like mozmill does.
As I understand, this is caused by the 250ms syn-retry, right?  We could then turn the syn-retry pref off for xpcshell tests.  Any test depending on it, like the test for that feature it self, should explicitly change the pref and then clean it up.
it could also be some form of speculative connect. In generally we'll be doing more of this stuff by default, and we should test in the default config if at all possible.. so imo fixing the server is preferrable (and I think pretty easy).
It's probably worth to wait until Jeff Walden is back. I will hold of with other patches for now and await his feedback.
Jeff, would you mind to give us some feedback on how we should proceed here? Thanks!
Re-ping again. Jeff Walden, it would be really great to get feedback from you. This bug is causing a massive slowdown of our Mozmill tests execution, which I want to see fixed.
Severity: normal → major
Sigh, been juggling too many browser profiles, this tab got dropped in one I haven't opened recently.  Looking now...
Flags: needinfo?(jwalden+bmo)
I don't think it's acceptable for stopping the server to kill in-progress connections by default.  The approach in comment 14 is about as far as I think we might, maybe, reasonably go in the server itself in a way that would affect all httpd.js users unavoidably.  A separate reap() or whatever that clients could call if they wanted wouldn't be a bad idea, either.  It's just something people should have to make an affirmative effort to use -- connections once made successfully, with data transmitted, shouldn't be closed of the server's own volition.
Flags: needinfo?(jwalden+bmo)
Comment on attachment 646141 [details] [diff] [review]
WIP

Sounds like a good approach. Jeff, would you have time to work on this? I'm totally loaded with WebRTC and B2G tests at the moment.
Attachment #646141 - Attachment is obsolete: true
(In reply to Henrik Skupin (:whimboo) from comment #27)
> Sounds like a good approach. Jeff, would you have time to work on this? I'm
> totally loaded with WebRTC and B2G tests at the moment.

Re-ping a month later. Would you be able to help us?
Flags: needinfo?(jwalden+bmo)
Jeff, can we please get feedback from you? I requested the needinfo flag a half month ago. Thanks!
Whiteboard: [qa-automation-blocked]
...given I haven't had time to get to this in that time, I think it's pretty clear the answer's going to have to be no.  :-\

I can answer questions from people who have them, but I have some pretty pressing JS work that needs doing well before I could possibly get to this on my own time.
Flags: needinfo?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #30)
> I can answer questions from people who have them, but I have some pretty
> pressing JS work that needs doing well before I could possibly get to this
> on my own time.

Thanks for the update Jeff. I hope you have nothing against it when I put this bug into mentored bucket. We can hopefully find someone to work on it.

Given that we will have a larger cluster for Mozmill tests soon this is no longer a real blocker for us, it's just an annoying issue.
Whiteboard: [qa-automation-blocked] → [mentor=waldo][lang=js]
Reading the bug, there isn't a clear solution. Not sure that this is a great first bug.
Whiteboard: [mentor=waldo][lang=js] → [lang=js]
(In reply to Andrew Halberstadt [:ahal] from comment #32)
> Reading the bug, there isn't a clear solution. Not sure that this is a great
> first bug.

It indeed doesn't contain all the information necessary to fix the bug. Waldo, if you are interested to be the mentor can you please point out what would be necessary to do and give links to the source code? The following link will help in terms of necessary information: http://www.joshmatthews.net/deck.js/mentor/#onemore
Flags: needinfo?(jwalden+bmo)
Whiteboard: [lang=js] → [mentor=waldo][lang=js]
Whiteboard: [mentor=waldo][lang=js]
Source code is <http://mxr.mozilla.org/mozilla-central/source/netwerk/test/httpserver/httpd.js>.  As for what to do, you'd have to skim this bug to figure it out.  :-\  My recollection is that the last dozen or so comments discuss the issue, but they probably don't explain it all in TPS-note-formatted details.

This is still first-bug territory, for someone motivated and not looking for just a change-the-code-simply patch.  But I probably don't have time right now to handhold anyone through a solution to it.  :-(
Flags: needinfo?(jwalden+bmo)
With this patch, I am able to get test_update.js to finish, but there are various assertions:

ASSERTION: closing a connection [object Object] that we never added to the set of open connections?
ASSERTION: double-notifying?

It would be great if somebody who understands httpd.js could resolve this soon, because this is now blocking bug 754356, which is a high-priority bug for the networking team.
Jeff, it looks like we need your help here to get this blocker fixed and to finish the work on bug 754356. Could you have a look at Brian's last comment please?
Flags: needinfo?(jwalden+bmo)
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Brian. I was able to easily repro your asserts and I made a couple changes to cope with them

1] a flag to figure out the double close because closing the socket has the side effect of making the state code that is reading from the input stream see an EOF (which it assumes is coming from the other end) and close it again.

2] reorder the existing "server is closed" flag to be after we reap the idle sockets instead of before it to take care of the double-notify problem.

I'm sorry it took to long - I'm still very interested in 754356 :)
Attachment #801752 - Flags: review?(brian) → review?(hurley)
Comment on attachment 801752 [details] [diff] [review]
make httpd.js shutdown quickly with idle never used sockets

Actually I think jason said he knew something about this code
Attachment #801752 - Flags: review?(hurley) → review?(jduell.mcbugs)
Comment on attachment 801752 [details] [diff] [review]
make httpd.js shutdown quickly with idle never used sockets

Review of attachment 801752 [details] [diff] [review]:
-----------------------------------------------------------------

nice stuff.  Thanks for fixing this, Patrick.  I assume you tested and it makes the shutdown regression go away?  Someone should verify that that's the case (maybe Henrik can do that if mcmanus didn't get around to it).

::: netwerk/test/httpserver/httpd.js
@@ +483,1 @@
>      if (!this._hasOpenConnections())

testing "if (foo)" then "if (!foo)": would be prettier as "if... else".  Not a big deal.
Attachment #801752 - Flags: review?(jduell.mcbugs) → review+
Flags: needinfo?(jwalden+bmo)
(In reply to Jason Duell (:jduell) from comment #41)

> nice stuff.  Thanks for fixing this, Patrick.  I assume you tested and it
> makes the shutdown regression go away?  Someone should verify that that's
> the case (maybe Henrik can do that if mcmanus didn't get around to it).

I verified that it will shutdown with an open connection that has not yet been used, and will not do so without this patch. I did not specifically verify with mozmill - I'm not well versed in that harness.

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/3b2327d30b10
(In reply to Jason Duell (:jduell) from comment #41)
> makes the shutdown regression go away?  Someone should verify that that's
> the case (maybe Henrik can do that if mcmanus didn't get around to it).

Yes, I can check the fix with Mozmill. Will do it soon.
Wow, that makes a huge difference! With it a lot of our tests are running twice that fast now! Functional tests which take 15 minutes now are run in nearly 7 minutes with the patch.
https://hg.mozilla.org/mozilla-central/rev/3b2327d30b10
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Blocks: 925632
Flags: needinfo?(ryanvm)
Component: httpd.js → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: