Closed Bug 895376 Opened 6 years ago Closed 6 years ago

Distinguish real timeouts from long running tests in mochitest-browser

Categories

(Testing :: Mochitest, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla25

People

(Reporter: mak, Assigned: mak)

References

Details

Attachments

(1 file, 3 obsolete files)

After various cases, I figured our current way of handling timeouts in mochitest-browser is not very good.
The problem is that after a given amount of time (multiple of 30 seconds slots) we just kill the test and notify timeout.
This has 2 negative effects:
1. timing out a running test we may leave the next tests in a bad shape.
2. we timeout regardless the test being really hanged, or just slow. This makes very often hard to distinguish between real test hangs from tests taking longer than expected. I saw different bugs disabling timing out tests, or waiting weeks to be investigated, when would have been enough to add a requestLongerTimeout call to fix the orange in the short term.

Thus, I'd like to change the behavior so that if a test is taking longer than expected but it's not hanged, we print out an error but let the test continue.
On the other side, if the test is really hanging, we proceed with current timeout.
To detect if the test is hanging or not I'd just use output, if the test printed a success/failure/info less than 30s before the timeout, it's pretty clear we are killing an alive test.  We can still maintain a protection against a looping test that could print info in a loop, like a 5 minutes limit (some tests are using a requestLongerTimeout(10)).
Attached patch 895376.diff (obsolete) — Splinter Review
I still want to get a Try run, in the meanwhile I'd like to get an opinion.
Attachment #777752 - Flags: feedback?(gavin.sharp)
The only doubt I still have, is whether this may be confusing in case after some delays the test will really timeout.
something like:
ERROR: test is taking too much time, please requestLongerTimeout and investigate
ERROR: test is taking too much time, please requestLongerTimeout and investigate
ERROR: test is taking too much time, please requestLongerTimeout and investigate
ERROR: timed out

I was wondering if I should silently mark the fact, and when the test finishes, if it didn't timeout, throw the error. That would leave less space for misunderstanding.
Attached patch patch v1.1 (obsolete) — Splinter Review
Yeah, this is better.
Attachment #777752 - Attachment is obsolete: true
Attachment #777752 - Flags: feedback?(gavin.sharp)
Attachment #777764 - Flags: feedback?(gavin.sharp)
Attached patch patch v1.2 (obsolete) — Splinter Review
cleanup, sorry for spam.
now going to try.
Attachment #777764 - Attachment is obsolete: true
Attachment #777764 - Flags: feedback?(gavin.sharp)
Attachment #777770 - Flags: review?(gavin.sharp)
Comment on attachment 777770 [details] [diff] [review]
patch v1.2

>diff --git a/testing/mochitest/browser-harness.xul b/testing/mochitest/browser-harness.xul

>       addResult: function addResult(result) {
>+        this.lastOutputTime = Date.now();

Hmm, I think there are cases where we call addResult that aren't good representations of test activity (e.g. console messages), but I suppose this is probably close enough.

>diff --git a/testing/mochitest/browser-test.js b/testing/mochitest/browser-test.js

>+      // Notify a long running test problem if it didn't end up in a timeout.
>+      if (this.currentTest.unexpectedTimeouts && !this.currentTest.timedOut) {
>+        let msg = "The test took more time than expected, please " +
>+                  "requestLongerTimeout for it and file a bug to investigate " +
>+                  "why it's so slow.";

I would rephrase:
"Test execution time exceeded the timeout threshold. This test should be rewritten or split up to avoid taking so long to run. If that's not possible, use requestLongerTimeout(), but only as a last resort."

>+        if (Date.now() - self.currentTest.lastOutputTime < (gTimeoutSeconds / 2) * 1000 &&
>+            ++self.currentTest.unexpectedTimeouts <= MAX_UNEXPECTED_TIMEOUTS) {
>+            self.currentTest.scope.__waitTimer =
>+              setTimeout(arguments.callee, gTimeoutSeconds * 1000);
>+          return;

Not your bug, but I would take this as an opportunity to get rid of arguments.callee (use a named function).
Attachment #777770 - Flags: review?(gavin.sharp) → review+
Sheriff heads-up: we're going to change browser-chrome test timeout behavior to be a bit more forgiving (which will sometimes result in timeouts failing with a different error message).
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #7)
> Hmm, I think there are cases where we call addResult that aren't good
> representations of test activity (e.g. console messages), but I suppose this
> is probably close enough.

Yes, I thought about that, though that kind of activity would not happen after 15 seconds of inactivity, would rather happen after something initiated by the test, so it's still good activity... Unless there's a loop and then we still have the protection.
This sounds like a great change. 

One question for Ed: If we change the timeout message, then do we need to change the orange suggestion regexes in tbpl? If so, can we get a bug on file for that please (if you haven't already).
Flags: needinfo?(emorley)
Attached patch patch v1.3Splinter Review
Fix the comment, fix arguments.callee, and a strict warning about function declaration not at top level
Attachment #777770 - Attachment is obsolete: true
Notice that we are not changing the timeout message, the timeout message is still the same.
We are distinguishing real time outs from tests taking too long. The former stays the same, the latter will now get a dedicated error.
To be clear, we're not changing the timeout message for tests that actually time out. We are introducing another class of "timeouts", though, that use a different message to indicate that the test took too much time despite continuing to produce output and eventually passing (i.e. to identify tests that just need a requestLongerTimeout or to be split up, most likely).
Summary: Change timeout behavior of browser-chrome tests → Distinguish real timeouts from long running tests in mochitest-browser
Looks great to me - thank you for the heads up! :-)
Flags: needinfo?(emorley)
https://hg.mozilla.org/mozilla-central/rev/6d5466692056
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Example new intermittent: bug 896967.

Are the numerous failures (of form "This test exceeded the timeout threshold. It should be rewritten or split up. If that's not possible, use requestLongerTimeout(N), but only as a last resort.") in later tests expected?
Could this have caused:
"TEST-INFO | checking window state
TEST-UNEXPECTED-FAIL | (browser-test.js) | Found an unexpected"

I get this error while running the webapprt-test-chrome tests (and I didn't have problems in early July)
Component: BrowserTest → Mochitest
You need to log in before you can comment on or make changes to this bug.