Closed Bug 649012 (FlakyTimeout) Opened 13 years ago Closed 9 years ago

Fail mochitests which use flaky timeouts (setTimeout(x) for x > 0)

Categories

(Testing :: Mochitest, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla37

People

(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: dev-doc-needed, sheriffing-P2)

Attachments

(1 file, 2 obsolete files)

I have a patch which fails all mochitests which use flaky setTimeout values (calls with a timeout greater than 0).  If for some reason the only way to test something is to use such a timeout, the test author will have a way to opt out of this automated protection, while being publicly shamed for doing so.  :-)

I'm trying to make sure that all of the existing tests run fine with this change.  I'll be submitting patches as I move along (or will file new bugs if rewriting a test not to use these types of timeouts is a significant project).
If a test really does need such timeouts, they need to call SimpleTest.requestFlakyTimeouts with a reason which will be displayed in the logs as a known failure.  Otherwise, setTimeout(func, timeout) where timeout > 0 will always result in a failure.

This will help us have fewer new flaky tests which will intermittently turn orange checked into the tree.
Attachment #525077 - Flags: review?(ted.mielczarek)
Keywords: dev-doc-needed
OS: Mac OS X → All
Hardware: x86 → All
Version: unspecified → Trunk
Comment on attachment 525077 [details] [diff] [review]
Change the suite to fail tests with flaky timeouts

>diff --git a/testing/mochitest/tests/SimpleTest/SimpleTest.js b/testing/mochitest/tests/SimpleTest/SimpleTest.js
>+SimpleTest._flakyTimeoutIsOK = false;
>+SimpleTest._originalSetTimeout = window.setTimeout;
>+window.setTimeout = function SimpleTest_setTimeoutShim() {
>+    if (arguments.length > 1 && arguments[1] > 0) {
>+        if (SimpleTest._flakyTimeoutIsOK) {
>+            SimpleTest.todo(false, "The author of the test has indicated that flaky timeouts are expected.  Reason: " + SimpleTest._flakyTimeoutReason);

I think we should fail if SimpleTest._flakyTimeoutReason isn't set or is the empty string.
(In reply to comment #2)
> Comment on attachment 525077 [details] [diff] [review]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=525077
> Change the suite to fail tests with flaky timeouts
> 
> >diff --git a/testing/mochitest/tests/SimpleTest/SimpleTest.js b/testing/mochitest/tests/SimpleTest/SimpleTest.js
> >+SimpleTest._flakyTimeoutIsOK = false;
> >+SimpleTest._originalSetTimeout = window.setTimeout;
> >+window.setTimeout = function SimpleTest_setTimeoutShim() {
> >+    if (arguments.length > 1 && arguments[1] > 0) {
> >+        if (SimpleTest._flakyTimeoutIsOK) {
> >+            SimpleTest.todo(false, "The author of the test has indicated that flaky timeouts are expected.  Reason: " + SimpleTest._flakyTimeoutReason);
> 
> I think we should fail if SimpleTest._flakyTimeoutReason isn't set or is the
> empty string.

We already do, when calling SimpleTest.requestFlakyTimeouts.
Comment on attachment 525077 [details] [diff] [review]
Change the suite to fail tests with flaky timeouts

Looks good. Can you add a self-test for this feature in http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/ ?

Also, please make sure you update the Mochitest docs on MDN.
Attachment #525077 - Flags: review?(ted.mielczarek) → review+
Nice, As I said on IRC, I'll work on Places tests to remove some work from Ehsan's shoulders.
(In reply to comment #4)
> Comment on attachment 525077 [details] [diff] [review]
> Change the suite to fail tests with flaky timeouts
> 
> Looks good. Can you add a self-test for this feature in
> http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/ ?

Is there any good way to do that?  I was looking for a way to reverse the meaning of pass and fail, so that I can add a setTimeout(f, 100) for example, and test that it causes a failure, but I couldn't figure out if that's possible.

> Also, please make sure you update the Mochitest docs on MDN.

I will do that, once this lands.  I'll also do the dev.planning post.
Also, ted: can I ask you for a rubberstamp review on the trivial changes that I make to tests?  For more involved changes containing test rewrites, I will probably file separate bugs.  But I do have a patch in my queue which fixes a bunch of these right now.
Testing that a test fails would be tricky, I agree, so how about you just test that requireFlakyTimeouts works?

How about I just give you a blanket-rs=me to trivially fix tests? You can file bugs on more complicated test fixes.
cleaned up test_342484.xul on Places, will merge to central soon.
http://hg.mozilla.org/projects/places/rev/024f5e4df8d5

B-C tests will follow, there are more of those.
Now with a test.
Attachment #525077 - Attachment is obsolete: true
Attachment #526404 - Flags: review?(ted.mielczarek)
Depends on: 650410
(In reply to comment #8)
> How about I just give you a blanket-rs=me to trivially fix tests? You can file
> bugs on more complicated test fixes.

Thanks a lot, this will make my job much easier!  I will occasionally commit the test fixes and will put up links to the revisions on this bug.
Comment on attachment 526404 [details] [diff] [review]
Change the suite to fail tests with flaky timeouts

Thanks!
Attachment #526404 - Flags: review?(ted.mielczarek) → review+
I need to return the timeout identifier from the setTimeout shim.  This version of the patch does that.
Attachment #526404 - Attachment is obsolete: true
Depends on: 650534
Depends on: 650543
Depends on: 650545
Depends on: 650547
Depends on: 492181
Depends on: 650553
Depends on: 526164
Depends on: 650579
Second batch of blanket timeout fixes: http://hg.mozilla.org/mozilla-central/rev/d407ee4dabcc
Depends on: 650580
Depends on: 650581
Depends on: 650583
Depends on: 650557, 650585, 650586
Alias: FlakyTimeout
Depends on: 517579
With the patches that I have so far, mochitest-3 and mochitest-a11y are now green on try.
Depends on: 544356
Depends on: 592366
As said on IRC, I can work on mochitest-1 part.
test_342484.xul changes have now been merged to central (http://hg.mozilla.org/mozilla-central/rev/024f5e4df8d5).
Third batch of wholesale timeout fixes: <http://hg.mozilla.org/mozilla-central/rev/d8623ab1d8fd>
(In reply to comment #21)
> Third batch of wholesale timeout fixes:
> <http://hg.mozilla.org/mozilla-central/rev/d8623ab1d8fd>

test_bug583948.xul failed on Linux64: <http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303169230.1303170886.963.gz>, so I reverted it: <http://hg.mozilla.org/mozilla-central/rev/394772171075>.
Depends on: 651001
Summary: Fail mochitests which use flaky timeouts intentionally → Fail tests which use flaky timeouts (setTimeout(x) for x > 0)
Summary: Fail tests which use flaky timeouts (setTimeout(x) for x > 0) → Fail mochitests which use flaky timeouts (setTimeout(x) for x > 0)
(In reply to comment #19)
> As said on IRC, I can work on mochitest-1 part.

Great, please share your progress here.  This is the latest try server push that I have, if you want to know what needs fixing: <http://tbpl.mozilla.org/?tree=MozillaTry&rev=1e74cb3504fd>.

Thanks!
Depends on: 651075
Depends on: 651077
Depends on: 651078
Depends on: 651089
(In reply to comment #21)
> Third batch of wholesale timeout fixes:
> <http://hg.mozilla.org/mozilla-central/rev/d8623ab1d8fd>

           // Do not use SimpleTest.executeSoon() here: give a little more time.
-          setTimeout(nextStep, 25);
+          setTimeout(nextStep, 0);

Does that comment still make sense?
hm, probably most setTimeout(0) calls should be converted to executeSoon.
And actually writing a comment like that without explaining the reasoning is really useless :)
Depends on: 651136
(In reply to comment #24)
> (In reply to comment #21)
> > Third batch of wholesale timeout fixes:
> > <http://hg.mozilla.org/mozilla-central/rev/d8623ab1d8fd>
> 
>            // Do not use SimpleTest.executeSoon() here: give a little more
> time.
> -          setTimeout(nextStep, 25);
> +          setTimeout(nextStep, 0);
> 
> Does that comment still make sense?

It never made sense!  If this test needs to wait for something, it should be rewritten as such.

(In reply to comment #25)
> hm, probably most setTimeout(0) calls should be converted to executeSoon.
> And actually writing a comment like that without explaining the reasoning is
> really useless :)

I don't generally advocate to replace setTimeout(0) with executeSoon, since the latter is not a web API.  It might make sense more for mochitest-browser-chrome though.
Depends on: 651902
A few of my patches have been landed a few days ago and some are waiting for reviews. With them, mochitest-1 failures should be reduced by ~50% I think.
Depends on: 652982
No longer depends on: 544356
Depends on: 508369
Depends on: 659559
SimpleTest.js shares a window with browser.js, which uses setTimeout with a delay > 0 on various occasions...
Updating to reality, I'm not working on this.
Assignee: ehsan → nobody
CC'ing Ed and Joel who are always interested to kill orangeness.
Keywords: sheriffing-P2
Other footgun functions that mochitests might want to disable are setInverval, Math.random, and Date objects.
Blocks: 996504
So, I'm going to try to see if I can get the dependencies of this bug which have been sitting around waiting to be landed for years now.  In order to do that, I'm going to need to add a dummy empty version of SimpleTest.requestFlakyTimeout() to the SimpleTest API so that I can land the patches which use this to annotate known necessary instances of this.  Does that sound fine to you, Ted?
Flags: needinfo?(ted)
Go for it!
Flags: needinfo?(ted)
Depends on: 1006648
I turned this on for mochitest-plain on inbound, tagging all of the existing tests that used setTimeout like this as |SimpleTest.requestFlakyTimeout("untriaged");|.

https://hg.mozilla.org/integration/mozilla-inbound/rev/fa8bd3194aca
Assignee: nobody → ehsan.akhgari
https://hg.mozilla.org/mozilla-central/rev/fa8bd3194aca
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Can someone please provide the rational for this "feature"?

Out of the blue this new feature turns test into oranges like here: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=67de08ef9bda
 
Is the general idea that using setTimeout() is a bad thing?
Personally I don't think so. Mochitest itself has a super generic timeout of 300s and we are using lots of shorter timeouts in the WebRTC test just for reason that the generic mochitest timeout is pretty useless when it comes to debugging why and where things timed out. Currently the WebRTC mochitest in dom/media/tests/mochitest use setTimeout() in 19 places.
Is the expectation here that I should remove all these timeouts now? That is pretty unrealistic for WebRTC tests as our tests rely on network connections being made and we need to be prepared to wait for things to happen for some time in several places.
Or do I have to mark all of the WebRTC tests now as flaky?

What puzzles me is why this feature here suddenly starts to complain about this one test (test_peerConnection_basicAudioVideoNoBundle.html) which does not sets any timeout values by itself. Why doesn't this error message come up for all of our WebRTC tests?
I'm worried that this suggests that this feature itself is not working reliable.

Some more constructive feedback:
- I think the wording of the error message is pretty bad. What means "flaky timeout value"? This suggests to people that there are values which are OK to use. I would prefer if the error message would be something like "Using setTimeout() with a timeout value other then 0 directly from your test is strongly discouraged".
- Just providing the timeout value makes it virtually impossible for me to figure out about which setTimeout() call this test is actually complaining about. The test failure should at least try to log via info() something like the function/code which the setTimeout gets passed in. Or a backtrace from where this was called would have been really useful too.
Flags: needinfo?(ehsan.akhgari)
What about setInterval?
in fact these tests do use 60000 settimeout:
http://dxr.mozilla.org/mozilla-central/source/dom/media/tests/mochitest/pc.js#

is it possible that can be fixed?  maybe wait for an event?
There are also setTimeout calls made that are only used for debugging purposes in case some event is not fired, for example. Surely, that usage of timers is not wrong, is it?
(In reply to Nils Ohlmeier [:drno] from comment #37)
> Is the general idea that using setTimeout() is a bad thing?
> Personally I don't think so.

(This isn't the place for this discussion with long-form back-and-forth -- m.d.platform is probably a better place for that. There's a thread called "PSA: Flaky timeouts in mochitest-plain now fail newly added tests" -- maybe have this discussion there?

> Is the expectation here that I should remove all these timeouts now?
[...]
> Or do I have to mark all of the WebRTC tests now as flaky?

(I think one of those things, yeah. I think ehsan already labeled all affected tests in the tree, per comment 35.)
 
> What puzzles me is why this feature here suddenly starts to complain about
> this one test (test_peerConnection_basicAudioVideoNoBundle.html) which does
> not sets any timeout values by itself.

That test doesn't seem to be in the tree, so I can't actually look at it to answer your question.  But if it's anything like "test_peerConnection_basicAudioVideo.html" (which is a similarly-named test that *is* in the tree) then:
 - It does use timeout values, via "pc.js" (which defines PeerConnectionTest, which is what that test uses to drive its testing).

> Why doesn't this error message come
> up for all of our WebRTC tests?

The ones that are in the tree were already given SimpleTest.requestFlakyTimeout("untriaged");, per comment 35. (including e.g. test_peerConnection_basicAudioVideo.html)

> I'm worried that this suggests that this feature itself is not working
> reliable.

Hopefully the above eases your worries. :)

> Some more constructive feedback:
> - I think the wording of the error message is pretty bad. What means "flaky
> timeout value"?

Have you seen the documentation for the function?
https://mxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/SimpleTest.js#642
Clearing the needinfo as we have been discussing this on dev-platform.

If you have specific requests for follow-ups, please file new bugs and needinfo me on them.  Thanks!
Flags: needinfo?(ehsan.akhgari)
You need to log in before you can comment on or make changes to this bug.