Closed Bug 767332 Opened 12 years ago Closed 12 years ago

If tests use a page from the local httpd.js server more than once successive tests will fail because of an inappropriate shutdown of the server

Categories

(Testing Graveyard :: Mozmill, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

(Keywords: regression, Whiteboard: [mozmill-2.0+][mozmill-1.5.17+])

Attachments

(3 files)

Running Mozmill against the elementslib manifest throws the following error:

ERROR | Test Failure | {
  "exception": {
    "stack": [
      "createInstance@resource://mozmill/driver/mozelement.js:45", 
      "@resource://mozmill/driver/mozelement.js:57", 
      "@resource://mozmill/modules/frame.js -> file:///Volumes/data/code/tools/mozmill/mutt/mutt/tests/js/elementslib/radio_buttons.js:17", 
      "@resource://mozmill/modules/frame.js:577", 
      "@resource://mozmill/modules/frame.js:648", 
      "@resource://mozmill/modules/frame.js:545", 
      "@resource://mozmill/modules/frame.js:694", 
      "@resource://jsbridge/modules/Bridge.jsm:140", 
      "@resource://jsbridge/modules/Bridge.jsm:147", 
      "@resource://jsbridge/modules/Server.jsm:33", 
      ""
    ], 
    "message": "could not find element ID: radio1", 
    "fileName": "resource://mozmill/driver/mozelement.js", 
    "name": "Error", 
    "lineNumber": 45
  }
}

I will check why that happens. Could be related to a broken httpd service.
This is somehow related to the httpd process. When we load the local test page I can see a 'Bad Request' text in the content window but not the actual page content.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
So this seems to be an issue with HTTPd. When I enable the DEBUG flag of the server I see:

INFO | HTTPD-INFO | *** unrecognized hostname (localhost:43336) in Host header, 400 time
INFO | HTTPD-INFO | *** error in request: 400

This happens not all the time and could be related to some caching.
So this looks kinda strange so far and I have no idea why that happens. But I get closer and what I can see now is that this bug should definitely block Mozmill 2.0.

All this is related how HTTPd handles maps of hosts and ports. In case of a passing test it will look like:

{"xlocalhost":{"43336":"http"},"x127.0.0.1":{"43336":"http"}}

But for a failing test I can see:

{"xlocalhost":{},"x127.0.0.1":{}}

So there is no mapping information (port and protocol) available beside the host. Looks like the initialization is failing to update the map correctly.
Summary: Test 'tests/js/elementslib/radio_buttons.js' fails with 'could not find element ID: radio1' → Some JS tests fail because HTTPd wrongly handles scheme maps
Whiteboard: [mozmill-2.0?]
Here a trace of the broken behavior:

INFO | ****** add scheme (before): {"xlocalhost":{}} localhost 43336
INFO | ****** add scheme (after): {"xlocalhost":{"43336":"http"}}
INFO | 
INFO | ****** add scheme (before): {"xlocalhost":{"43336":"http"}} localhost 43336
INFO | ****** add scheme (after): {"xlocalhost":{"43336":"http"}}
INFO | 
INFO | ****** add scheme (before): {"xlocalhost":{"43336":"http"}} 127.0.0.1 43336
INFO | ****** add scheme (after): {"xlocalhost":{"43336":"http"},"x127.0.0.1":{"43336":"http"}}
TEST-START | radio_buttons.js | setupModule
INFO | 
TEST-START | radio_buttons.js | test
INFO | ****** get scheme: {"xlocalhost":{},"x127.0.0.1":{}}

As you can see the maps are getting added successfully but are reset before the test gets started. Now HTTPd can't work anymore.
Summary: Some JS tests fail because HTTPd wrongly handles scheme maps → Some JS tests fail because a reset of the HTTPd scheme map before starting the test
Actually this works with Firefox Beta which is 14.0 right now, but fails with latest Aurora. So something in Firefox has probably been changed we are affected from.
Regressed on mozilla-central between the builds 2012-04-25-03-06-47 and 2012-04-26-03-05-04.

PASS: http://hg.mozilla.org/mozilla-central/rev/75c7378c87b6
FAIL: http://hg.mozilla.org/mozilla-central/rev/cc5254f9825f

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=75c7378c87b6&tochange=cc5254f9825f

I'm fairly sure that this is because of bug 695480 and could also affect Mozmill 1.5. I will do a custom build around this changeset to verify my suspicion.
So this is definitely a regression from bug 695480. Without that feature we don't run into this issue as reported on this bug.

We have two possible solutions here:

1. Investigate the problem with HTTPd and fix it (tests shouldn't be affected)
2. Replace HTTPd with mozhttpd (tests will probably have to be updated)

Clint, what would you say?
I think perhaps it is time to look into mozhttpd.
Whiteboard: [mozmill-2.0?] → [mozmill-2.0+]
(In reply to Clint Talbert ( :ctalbert ) from comment #8)
> I think perhaps it is time to look into mozhttpd.

+1
Depends on: 754847
So as by today we have a couple of new failures in our Mozmill CI testruns which are related to this issue. All that happens with Mozmill 1.5.16 now.

http://mozmill-ci.blargon7.com/#/functional/report/27072cb61461b83e1447b1979b06ef8d

I don't think that we can disable some of those tests because it would open a ton of worms and other tests will start to fail. Keep in mind that a requirement was to always have local test data. So we have to figure out what's wrong here and probably can move out the work on mozhttpd to Mozmill 2.1. We can't do the replacement on the 1.5 branch.
Whiteboard: [mozmill-2.0+] → [mozmill-2.0+][mozmill-1.5.17]
Whiteboard: [mozmill-2.0+][mozmill-1.5.17] → [mozmill-2.0+][mozmill-1.5.17?]
Summary: Some JS tests fail because a reset of the HTTPd scheme map before starting the test → Some JS tests fail in opening locally served pages because a reset of the HTTPd scheme map before starting the test
CC'ing Kyle and Jeff in the hope at least one of them can help us here.
So actually I got closer to this problem...

So in Mozmill's current state we start and stop the httpd server for each single test. That one is a bad behavior we are aware of but isn't related to this issue. When we run a test we make a call to loadSubScript() for each (sub)module the collector finds.

https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/frame.js#L119

Each of those calls load specified module and runs the code which is located in the global scope. For our own tests for Firefox that means the collector.addHttpResource() method, which sets the root of the local server, is called already at this point. It updates the mapping schema of httpd.js. 

http://hg.mozilla.org/qa/mozmill-tests/file/31f2e3267ae6/tests/endurance/testTabbedBrowsing_PinAndUnpinAppTab/test1.js#l9

It looks like that this call which is chrome -> content -> chrome correctly sets the location schema of httpd.js but once the function returned, the data gets wiped out due to the change on bug 695480.

When I make changes to the tests and call addHttpResource() inside of setupModule() which gets executed in privileged chrome code, the mentioned problem doesn't occur because there is no content chrome boundary we pass here. The mapping schema persists and will be used correctly inside the test.

So what does it mean for us? It looks like that we shouldn't put any code into the global scope of a module but really make use of setupModule() instead to define constants and do any other initialization code. This function was targeted for this purpose, but our tests are plain wrong at the moment by not fully using it for everything.

So in fact this is not a failure in the framework but in our tests. I would propose that we do not change Mozmill at this point by making the httpd object global, and initialized during startup. This will cause some other changes I don't think we should do on this old branch. We should keep it as it.

Clint and Jeff, what do you think?
Blocks: 776917
Looks like that not all tests can be fixed with my proposed solution. Could be that there is another bug in Mozmill or httpd.js. I will have to investigate further tomorrow.
Actually the following testcase we open in our tests is causing httpd.js to fail.

http://hg.mozilla.org/qa/mozmill-tests/file/31f2e3267ae6/data/layout/mozilla_mission.html

More information will be available tomorrow.
Whiteboard: [mozmill-2.0+][mozmill-1.5.17?] → [mozmill-2.0+]
So actually this only happens if a single test opens a locally served page more than once. In such a case successive tests will fail even when they are opening a totally different local page. 

I put together a Mozmill test which let me reproduce this problem:

https://github.com/whimboo/mozmill/commit/1cf97f8bc0beced860cdb75d4b8e2b2a87461997

Given that there is nothing we can do in our tests to fix that, it has to be fixed in the Mozmill harness or even in HTTPd.js if it's a problem in that module. So we should release a 1.5.17 with this fix included.
Summary: Some JS tests fail in opening locally served pages because a reset of the HTTPd scheme map before starting the test → If tests use a locally served page more than once successive tests will fail because of a reset of the HTTPd schema map
Whiteboard: [mozmill-2.0+] → [mozmill-2.0+][mozmill-1.5.17?]
Attached file HTTPd.js log
This is the full debug log of HTTPd.js with the failure included as the last request.
It seems to be related how we stop the server. The following line will cause a failure in our test:

this.httpd.stop(function () { });

But when I replace it with the following code, everything works fine:

this.httpd._requestQuit();

Looks like this is really an issue with the httpd.js module. I will dig into deeper.
So when we call stop() on the HTTPServer object we do not successfully close all the sockets and as result we do not call the stop callback.

if (!this._hasOpenConnections() && this._socketClosed)
  this._notifyStopped();


http://mxr.mozilla.org/mozilla-central/source/netwerk/test/httpserver/httpd.js#788

Here this._socketClosed is false after the first test module has been finished.

CC'ing Boris who hopefully can help us as long as Jeff is away.
Hm. It seems like that when we call stop() it takes an extremely long time for the server to shutdown. In my case it's about 5s when the same local page gets opened more than once in test 1. When I remove the second load of the page everything seems fine and the shutdown is done under a second.
Run my tests with the local fix for latest Firefox release and it works great. No delays in shutting down the server. But with Beta it's happening now.

Boris, looks like I should file a separate bug so we can get the socket shutdown issue solved? I will do it once I'm back later today and hopefully got some helpful feedback. Thanks.
It's probably not dependent on the hueyfix but another socket related change during the same day. If that's the case I will file a new bug in a bit.

For now lets get up the patch for Mozmill so that we can start the release process.
Summary: If tests use a locally served page more than once successive tests will fail because of a reset of the HTTPd schema map → If tests use a page from the local httpd.js server more than once successive tests will fail because of an inappropriate shutdown of the server
Attached file Patch (Mozmill 2.0)
Pointer to Github pull-request
Comment on attachment 645726 [details]
Patch (Mozmill 2.0)

This fixes the broken code in Mozmill for shutting down HTTPd.js. We now really wait until the server and sockets are gone before starting the next test. If we don't do this a socket from the former HTTPd instance is lingering around and it can't fulfill our request.

This patch doesn't fix the massive delay I can see now in closing the socket. And I wouldn't propose that we make the HTTPd.js instance global for now. We should better move to mozhttpd for Mozmill 2.0. For Mozmill 1.5 we shouldn't change any API, but have to ensure we get the socket regression fixed for Firefox 15.
Attachment #645726 - Attachment description: Pointer to Github pull request: https://github.com/mozilla/mozmill/pull/75 → Patch (Mozmill 2.0)
Attachment #645726 - Flags: review?(jhammel)
This really has to be a blocker for Mozmill 1.5.17. It causes a lot of our tests to fail now.
Whiteboard: [mozmill-2.0+][mozmill-1.5.17?] → [mozmill-2.0+][mozmill-1.5.17+]
Attached file Patch (Mozmill 1.5)
Pointer to Github pull-request
Comment on attachment 645736 [details]
Patch (Mozmill 1.5)

Appropriate patch for the hotfix-1.5 branch.
Attachment #645736 - Attachment description: Pointer to Github pull request: https://github.com/mozilla/mozmill/pull/76 → Patch (Mozmill 1.5)
Attachment #645736 - Flags: review?(jhammel)
Blocks: 777335
Depends on: 777354
No longer depends on: 754847
Attachment #645736 - Flags: review?(jhammel) → review+
Attachment #645726 - Flags: review?(jhammel) → review+
master:

https://github.com/mozilla/mozmill/commit/d081365f8a880adafc980764005bb85bb97f6c0e

hotfix-1.5:
https://github.com/mozilla/mozmill/commit/9a8aa8462ae205d48cf611c458999a617ccd79aa
No longer blocks: hueyfix
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Sorry, I don't know much of anything about the test HTTP server...
Blocks: 799557
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: