Closed
Bug 574043
Opened 14 years ago
Closed 14 years ago
mozmill fails in tests on CentOS ; timeouts should be introduced to correct
Categories
(Testing Graveyard :: Mozmill, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: k0scist, Assigned: k0scist)
References
Details
Attachments
(1 file)
2.61 KB,
patch
|
Details | Diff | Splinter Review |
In trying to get mozmill to work on buildbot (bug 516984) there are several difficulties encountered. I will recount them all here, even though several of them are unrelated, in order to isolate the work necessary to debug mozmill and firefox on the CentOS vm that I'm using.
Firstly, the Firefox window for minefield is not painted/rendered. This does not seem to affect the tests, but it makes debugging very difficult. I do not know if minefield works on this VM for others. If it does, then I should have access to a setup where I can actually see firefox painting its windows. If it does not, then this bug should be filed separately and corrected.
Initially, using the mozmill packaged tests as part of m-c (http://mxr.mozilla.org/mozilla-central/source/testing/mozmill/), I consistently encountered the issue where mozmill would hang/wait indefinitely and the invoked version of Firefox would become unresponsive. This was observed on every run of `mozmill -b firefox/firefox -t mozmill/tests/firefox` using the buildslave files.
Changing some apparently unrelated code (see attachment) fixes the firefox becoming unresponsive problem. As :ctalbert pointed out, this is likely a race condition. But a race condition of what?
TODO: slowly back out these changes and see what actually breaks/fixes things.
After this part was fixed, mozmill still hung indefinitely. It dies on the testDownloadStates.js test, and very specifically, on this line: http://hg.mozilla.org/qa/mozmill-tests/file/14acc3e342ba/shared-modules/testDownloadsAPI.js#l365 Mozmill waits indefinitely for the click event, which never occurs, assumedly because the element isn't correctly selected. Once again, it would be much easier to diagnose if I could actually see what Firefox was doing as I can on my laptop.
The exact cause of why this hangs I have not yet determined.
A two prong approach should be taken to ensure mozmill does not hang indefinitely:
- there should be a deadman timeout (say 5 min) on the python side that is reset each time the jsbridge python gets a message from the JS counterpart.
- there should be a per-test timeout in the mozmill javascript so that hanging tests can be bypassed. The deadman timeout is a good safety precaution, but it will kill all subsequent tests instead of trying to progress through them, yielding widely inconsistent results if the timeout condition is seen.
The unreponsive firefox issue should be assessed and diagnosed as well.
In general, tests should never hang the harness. We need to do our best to ensure this.
This bug is some overlap with issues in bug 571630. It has been spun off as a separate issue in belief that the root causes may be different. I do not know this to be true or untrue.
Assignee | ||
Updated•14 years ago
|
Assignee: nobody → jhammel
Assignee | ||
Updated•14 years ago
|
Whiteboard: [mozmill-1.4.2?]
Assignee | ||
Comment 1•14 years ago
|
||
Drilling down even further, the failure is actually at a sleep(0) line:
http://mxr.mozilla.org/mozilla-central/source/testing/mozmill/mozmill-1.4.1/mozmill/extension/resource/modules/controller.js#273
Of course, there's something weird, too. Putting a dump() statement in the `wait()` function ( http://mxr.mozilla.org/mozilla-central/source/testing/mozmill/mozmill-1.4.1/mozmill/extension/resource/modules/utils.js#399 ) and in the while loop ( http://mxr.mozilla.org/mozilla-central/source/testing/mozmill/mozmill-1.4.1/mozmill/extension/resource/modules/utils.js#412 ) , the dump in wait gets hit (many many times) and the dump in the while loop gets hit only once.
A e10s issue?
Assignee | ||
Comment 2•14 years ago
|
||
I've added a global timeout to jsbridge:
http://github.com/k0s/jsbridge/commit/39f6b9ea983cb9e469751f7ee012218f0d085ab8
http://github.com/k0s/jsbridge/commit/bd49c30218388c80c83e0ba8e46f99b7e41fdfbd
http://github.com/k0s/jsbridge/commit/0388256d3de02c5ae4ae785ae28e8e78362efeb0
This seems to work correctly; on my ubuntu machine, the tests run through passing, as they did before. On the CentOS VM, instead of the tests hanging, the exception is thrown and the run ends with "Application quit unexpectedly."
There are a few more things that need to be done:
- a CLI switch in mozmill should be added that sets this on the bridge object(s)
- there should be a per-test timeout in mozmill/extension's JS; currently, the whole run stops (as jsbridge doesn't know anything about "tests") whereas, for reproducibility, individual tests should timeout. The default value (whatever it might be) should be overridable on a per-test basis.
- mozmill/mozrunner should ensure that Firefox is killed when Something Bad Happens; currently, the jsbridge stuff kills the run successfully, but Firefox will remain open, which would be bad for buildbot
Assignee | ||
Comment 3•14 years ago
|
||
(In reply to comment #2)
> I've added a global timeout to jsbridge:
<snip/>
> There are a few more things that need to be done:
> - a CLI switch in mozmill should be added that sets this on the bridge
> object(s)
Done:
http://github.com/k0s/mozmill/commit/945cc4b8654330a9c72a28f5071eb4b9d41fb161
I've tested manually and it works. I would doubt there's side effects.
Assignee | ||
Updated•14 years ago
|
Summary: mozmill fails in test runs on CentOS → mozmill fails in tests on CentOS ; timeouts should be introduced to correct
Assignee | ||
Comment 4•14 years ago
|
||
(In reply to comment #1)
<snip/>
> A e10s issue?
No! Foolish me. e10s isn't even on trunk yet.
Assignee | ||
Comment 5•14 years ago
|
||
(In reply to comment #2)
<snip/>
> There are a few more things that need to be done:
<snip/>
> - there should be a per-test timeout in mozmill/extension's JS; currently, the
> whole run stops (as jsbridge doesn't know anything about "tests") whereas, for
> reproducibility, individual tests should timeout.
I have tried to solve this two ways, neither of which seems to work.
Firstly, I tried to use a hidden window and using a setTimeout on that.
http://github.com/k0s/mozmill/commit/716e7389a7168ce35c3d02f724ebfea4bc317c56
This seems to have no effect whatsoever
Secondly, I tried to use nsITimer with a callback function.
http://github.com/k0s/mozmill/commit/6fb0ec3a55fd42e0da670f1c0466e712e339e875
Again, this doesn't seem to work either.
Both of these rely on throw. Are they throwing in the wrong thread? In either case, they do not halt execution (while throwing directly in the code of course does).
I'm kinda at a loss here. The only other thing I know to try is to introduce a worker/master thread model into mozmill on the JS side (and make the master "unsinkable"). This would require a pretty vast overhaul.
The "works today, breaks tomorrow" alternative is to just not do per-test timeouts right now. This will lead to inconsistent results. Still, cleanup would have to be done to better ensure browser-window closing, as this will kill buildbot.
> The default value (whatever
> it might be) should be overridable on a per-test basis.
It was decided (via IRC and IRL) that this wasn't needed up front.
Assignee | ||
Comment 6•14 years ago
|
||
(In reply to comment #5)
<snip/>
> I'm kinda at a loss here. The only other thing I know to try is to introduce a
> worker/master thread model into mozmill on the JS side (and make the master
> "unsinkable"). This would require a pretty vast overhaul.
Maybe not that vast. Maybe each test could be done in a worker thread and that is enough. I'm not sure if this will work. I am somewhat confused by the threading model still.
Assignee | ||
Comment 7•14 years ago
|
||
Except the rendering issue, this bug has been broken into components:
https://bugzilla.mozilla.org/show_bug.cgi?id=504440
https://bugzilla.mozilla.org/show_bug.cgi?id=574871
Closing
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Assignee | ||
Updated•14 years ago
|
Updated•8 years ago
|
Product: Testing → Testing Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•