<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Reporter

Comment 1

•

12 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Jonathan, is this something to consider? This bascially changes the per-test timeout into timing out when the test isn't spewing out any test-passes anymore for a while.

Attachment #738232 - Flags: review?(jgriffin)

Comment 2

•

12 years ago

Comment on attachment 738232 [details] [diff] [review] patch Review of attachment 738232 [details] [diff] [review]: ----------------------------------------------------------------- This timeout is actually managed on the Python side, in this block of code: http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#747. We wouldn't want to introduce a separate timeout mechanism in the JS. We don't want to change the timeout for non-B2G tests, but you could change options.timeout in runtestsb2g.py.

Attachment #738232 - Flags: review?(jgriffin) → review-

Reporter

Comment 3

•

12 years ago

options.timeout from runtests.py/runtestsb2g.py is passed along to JS here, afaik: http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/setup.js#70 Then it's handled by TestRunner.timeout. I'm not sure what you mean with introducing a separate timeout mechanism in the JS. I'm still using the timeout value from runtests.py, but instead of assuming that the test file is finished within that timeout period, it looks if the test file is still carrying out tests within that timeout period. If it is, it can go on, if not, the timeout code is executed. That seems a lot more robust to me than assuming that a test file is finished with its tests within 5 minutes (which is the current value), which will not be the case on slower machines with test files that has thousands of tests in it.

Comment 4

•

12 years ago

It depends on the kind of timeout you're trying to catch. For example, in one of your try jobs at https://tbpl.mozilla.org/?tree=Try&rev=077b9e7f7d51, this timeout actually occurred because the B2G process crashed. I think just about all of the timeout failures I've seen include "application timed out after 330 seconds with no output", which is generated by the testrunner, and wouldn't be affected by your patch. Have you seen cases where a test is producing output, but times out? If so, your patch might help.

Reporter

Comment 5

•

12 years ago

Attached patch patch pushed to try (obsolete) — Details — Splinter Review

This is the patch that I pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=4d2b426d5db4 A lot of the tests that were giving me problems in order to bug 860748, I removed in the b2g.json list. So with this patch, they shouldn't time out anymore.

Reporter

Comment 6

•

12 years ago

Tryserver gives this fail in chunk 1: 11:23:31 WARNING - TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_xhr_progressevents.html | application timed out after 330 seconds with no output That seems like a valid time out, according to the full log: 11:18:01 INFO - received 'close' command 11:18:01 INFO - received 'open' command 11:18:02 INFO - received 'send' command 11:23:31 INFO - ked-text[2] 11:23:31 INFO - 38149 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | event.loaded increased while running data for without length, as xml for moz-chunked-text[2] 11:23:31 INFO - 38150 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | event.loaded didn't increase too much while running data for without length, as xml for moz-chunked-text[2] 11:23:31 INFO - 38151 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | response strictly grew while running data for without length, as xml for moz-chunked-text[2] 11:23:31 INFO - 38152 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | sanity check for progress while running data for without length, as xml for moz-chunked-text[2] 11:23:31 INFO - 38153 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | new data matches expected while running data for without length, as xml for moz-chunked-text[2] 11:23:31 INFO - 38154 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | lengthComputable while running data for without length, as xml for moz-chunked-text[2] 11:23:31 INFO - 38155 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | should have consumed the expected result 11:23:31 INFO - 38156 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | event type while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38157 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | response should be a string while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38158 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | reflexivity should hold while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38159 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | event.loaded grew by response size while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38160 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | event.loaded increased while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38161 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | event.loaded didn't increase too much while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38162 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | response strictly grew while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38163 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | sanity check for progress while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38164 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | new data matches expected while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38165 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | lengthComputable while running data for without length, as xml for moz-chunked-text[3] 11:23:31 INFO - 38166 INFO TEST-PASS | /tests/content/base/test/test_xhr_progressevents.html | should have consumed the expected result 11:23:31 WARNING - TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_xhr_progressevents.html | application timed out after 330 seconds with no output Anyway, this test file is already in the tree and the patch shouldn't be of any influence here. Btw, some of the text in the log seems to get cut off with this test file running. More worrisome is the failure in chunk 3 of the various machines (Ubuntu, Windows and Mac) at: 11:07:04 INFO - 2912 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/ipc/test_ipc.html | /tests/dom/indexedDB/test/test_index_object_cursors.html | Test timed out. ..for instance. But those seem to be known time outs, afaict, it seems to be bug 860714. I don't know how my patch would affect this negatively. It's weird that these timeouts seem to occur, there are test-passes going on just before the time out appears, but it is exactly 5 minutes when the first Test timed out when test_ipc.html has first appeared. test_Range-set.html seems to cause a real crash on the b2g board, so that test file needs definitely stay into the b2g.json file. test_Range-cloneContents.html, test_Range-deleteContents.html and test_Range-extractContents.html have regular failures (no time outs). I've done a retrigger of these problematic chunks, to see if they happen again. Retrigger of B2G Arm (VM) Mochitest 3 opt requested. Retrigger of B2G Arm (VM) Mochitest 1 opt requested.

Reporter

Comment 7

•

12 years ago

This is setting the timeout to a very high value, because this test doesn't ever want to time out, it says. http://mxr.mozilla.org/mozilla-central/source/dom/indexedDB/ipc/test_ipc.html?force=1#15 SimpleTest.requestLongerTimeout(100) And apparently, it's overriding the TestRunner.log method. http://mxr.mozilla.org/mozilla-central/source/dom/indexedDB/ipc/test_ipc.html?force=1#56 TestRunner.log = sendTestMessage; This stuff could very well be influenced by my patch. But I don't know how that file works exactly, I still would think that this ok() call would cal the correct Testrunner.log() function: http://mxr.mozilla.org/mozilla-central/source/dom/indexedDB/ipc/test_ipc.html?force=1#79 I guess when the TestRunner.log function is overridden, the old one needs to be called in the new one, to get it working.

Reporter

Comment 8

•

12 years ago

Retriggered chunk 1 has passed now. Retriggered chunk 3 gave a tested timed out in test_stringencoding.html, so that one should remain in the exclude list for now, I think. So if the test_ipc.html test can be fixed, then these test files could be put out of the exclude list: - "content/html/content/test/forms/test_input_sanitization.html":"", finished in 795202ms - "dom/tests/mochitest/ajax/jquery/test_jQuery.html":"", finished in 331496ms test_Range-surroundContents.html, test_Range-mutations.html, test_addRange.html", test_Range-insertNode.html were in the problematic chunk 3. test_addRange.html finished once in 691724ms, the other time it seemed to crash in that run. test_Range-mutations.html finished once in 434945ms. test_Range-insertNode.html finished once in 552674ms. The other time in chunk 3, there was a crash in test_Range-set.html. So probably better to keep those range tests in the exclude list for now.

Reporter

Comment 9

•

12 years ago

There are some more files that are redefining TestRunner.log, btw: http://mxr.mozilla.org/mozilla-central/search?string=TestRunner.log+%3D&find=&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central

Reporter

Updated

•

12 years ago

Blocks: 862485

Reporter

Comment 10

•

12 years ago

Attached patch patch — Details — Splinter Review

Ok, this is a new patch that seems to fix the test_ipc.html timeouts (at least, it does on my Macbook). https://tbpl.mozilla.org/?tree=Try&rev=1e77e906de92 In TestRunner.runNextTest, I had to add another TestRunner._lastLogTime = TestRunner._currentTestStartTime to get it working. This makes sense, but since I don't know how test_ipc.html is working exactly, I can't really tell how it is fixing this time out failure. I'm still confused on how test_ipc.html is just sitting silently for minutes before it starts sending out results. I guess the ipc message system is blocked by the constant running of mochitests in the child iframe or something? In any case, I still think test_ipc.html could use some fixing of the TestRunner.log thing. But that is probably better done in a new bug. The only 2 tests that can be removed from the exclude list for now are these 2: - "content/html/content/test/forms/test_input_sanitization.html":"", - "dom/tests/mochitest/ajax/jquery/test_jQuery.html":"",

Attachment #738232 - Attachment is obsolete: true

Attachment #738584 - Attachment is obsolete: true

Reporter

Comment 11

•

12 years ago

Comment on attachment 739172 [details] [diff] [review] patch Tryserver all green for the relevant parts, so this patch seems to work very nicely.

Attachment #739172 - Flags: review?(jgriffin)

Comment 12

•

12 years ago

Comment on attachment 739172 [details] [diff] [review] patch The code looks fine. I'm not sure we want to globally turn off our slow test detection, which is essentially what this does. I assume we wanted it for a reason. Another option would be just to dramatically increase the global timeout for mochitests on B2G. Ted, what do you think?

Attachment #739172 - Flags: feedback?(ted)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 13

•

12 years ago

I wouldn't say the patch is turning off slow test detection, I think it's improving it. Currently, a test file with thousands of tests in it, could be timed out, even though it would be running fine.

Comment 14

•

12 years ago

Comment on attachment 739172 [details] [diff] [review] patch Review of attachment 739172 [details] [diff] [review]: ----------------------------------------------------------------- Overall the changes to the TestRunner.js look good, the other alternative is to use the --timeout cli flag to lengthen the timeout, this might affect that. Could we ensure that our timeout is a valid value after making this change? It seems that we might be able to go with a lower timeout!

Reporter

Comment 15

•

12 years ago

I'm not exactly sure what comment 14 is asking for. I tested b2g mochitest dom/tests/mochitest/ajax/jquery/test_jQuery.html with timeout cli flag of 6 and it timed out without the patch. With the patch, it didn't time out anymore.

Reporter

Updated

•

12 years ago

Blocks: 789265

Reporter

Updated

•

12 years ago

Attachment #739172 - Flags: review?(ted)

Reporter

Updated

•

12 years ago

Blocks: 782720

Comment 16

•

12 years ago

Comment on attachment 739172 [details] [diff] [review] patch Review of attachment 739172 [details] [diff] [review]: ----------------------------------------------------------------- I'm not sure how I feel about this patch. It's really changing how we determine test timeouts, which has the potential to make some tests run a lot longer. I think I'd be a lot happier if we could just bump the global timeout for B2G instead. I really don't think we should be simply changing the meaning of timeout from "total time a test has run without finishing" to "time since the test last produced output".

Attachment #739172 - Flags: feedback?(ted)

Reporter

Comment 17

•

12 years ago

Ok, what would be a good value for the test file timeout on B2G? Or should perhaps the tests that are currently running too long be split into smaller test files? And disable the test file timeout for files that can't be split into smaller test files (like test_ipc.html)?

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Comment 18

•

12 years ago

I don't know, we should figure out what a reasonable value is. There's also a requestLongerTimeout function we can use if specific tests are taking too long.

Comment 19

•

12 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #16) > Comment on attachment 739172 [details] [diff] [review] > patch > > Review of attachment 739172 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'm not sure how I feel about this patch. It's really changing how we > determine test timeouts, which has the potential to make some tests run a > lot longer. I think I'd be a lot happier if we could just bump the global > timeout for B2G instead. I really don't think we should be simply changing > the meaning of timeout from "total time a test has run without finishing" to > "time since the test last produced output". I think if it were attached to a different command-line option with a different code path, rather than flatly redefining what runtime (and thus timeout) means, it might well make sense for some runs. A run against a very slow target--particularly with possibly broken tests--would probably be better served by this style rather than just upping the global timeout. Best case might even be a combo, where timeout since last output is used to watchdog hung tests, and global timeout is used to watchdog looping or unreasonably slow tests. Just a thought. Changing something globally in a way that affects other existing test runs may not be ideal, but I really like the idea of this behavior as a tool that can be optionally enabled.

Comment 20

•

12 years ago

It's true, and I thought about that, but we do already have a per-test timeout as well as a "time without output" timeout in the JS and Python harnesses, respectively, so adding a new one seemed like it would make things even more complicated.

Updated

•

12 years ago

Attachment #739172 - Flags: review?(ted)

Comment 21

•

12 years ago

Comment on attachment 739172 [details] [diff] [review] patch Can we just try adding SimpleTest.requestLongerTimeout(3) (or more) to the affected tests?

Attachment #739172 - Flags: review?(jgriffin)