Closed Bug 970239 Opened 8 years ago Closed 7 years ago

Intermittent test_sendkey_cancel.html,test_delete_focused_element.html | application timed out after 330.0 seconds with no output

Categories

(Core :: DOM: Core & HTML, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: cbook, Assigned: xyuan)

References

(Blocks 1 open bug, )

Details

(Keywords: intermittent-failure)

b2g_emulator_vm mozilla-inbound opt test mochitest-7 on 2014-02-09 23:52:17 PST for push afaec025122d

slave: tst-linux64-spot-482

https://tbpl.mozilla.org/php/getParsedLog.php?id=34403407&tree=Mozilla-Inbound

/tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | application timed out after 330.0 seconds with no output
This is very recent and very failure prone. Can we please get someone to look at this ASAP?
Flags: needinfo?(xyuan)
Flags: needinfo?(janjongboom)
https://tbpl.mozilla.org/php/getParsedLog.php?id=34440713&tree=Mozilla-Central

I'm pretty sure this started on mozilla-inbound. Trying to retrigger my way to a regression range right now.
Summary: Intermittent | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | application timed out after 330.0 seconds with no output → Intermittent test_sendkey_cancel.html,test_delete_focused_element.html | application timed out after 330.0 seconds with no output
This didn't start until late yesterday/early today, but now I can reproduce it on retriggers going back to Saturday. I suspect this may be related to the AWS slave switch made on Sunday. Every single occurrence is on a spot instance. Exciting.
Ryan, thanks for your help. I'm trying to reproduce it on my local device.
Assignee: nobody → xyuan
Status: NEW → ASSIGNED
Flags: needinfo?(xyuan)
Depends on: 969590
Flags: needinfo?(janjongboom)
Blocks: 966070
No longer blocks: 966070
Blocks: 945981
(In reply to Yuan Xulei [:yxl] from comment #15)
> Ryan, thanks for your help. I'm trying to reproduce it on my local device.

Any luck? :)
Flags: needinfo?(xyuan)
I can Hardly reproduce it. It seems another Timeout problem.
Flags: needinfo?(xyuan)
Ryan, I checked almost all the log files. Sometime it took about 5 minutes(300s) to launch the test and didn't give enough time for the test running. The tests didn't fail, but reached 330s timeout and was killed.

It seems some machines are too slow to run these tests.
Flags: needinfo?(ryanvm)
I don't think this is a test slave issue because it seemingly affects this test only. If it were the slaves, I would expect it to be more random in nature.

Not sure what info you're looking for from me, so I will point you to this best practices page for you to take a look at:
https://developer.mozilla.org/en-US/docs/Mozilla/QA/Avoiding_intermittent_oranges

Another common failure pattern we've seen is tests assuming that chrome has loaded by a certain point and getting stuck if it hasn't. For example, smaug fixed a bunch of private browsing tests recently by adding checks to make sure the PB chrome had actually loaded before continuing on, which made a big difference.

Hopefully that helps, because that's probably about the best I can do besides telling you that you can always request a loaner slave if you want to test it on the same machines that it's failing on.
Flags: needinfo?(ryanvm)
Most of what I said in bug 957213 comment 185 also applies here.

Another bad pattern that I see here is calling inputmethod_cleanup() (which calls SimpleTest.finish()) multiple times.  It is an error to call finish() in a test more than once.  The mochitest framework knows how to deal with tests which do not call finish() in a reasonable amount of time, by reporting a timeout.  You don't need to detect those types of timeouts manually yourself.

I expect that using event handlers for load and focus events, and also waitForFocus() should address most of the issues in these two bugs.
It doesn't look to me like it does call finish() more than once: it's a chain of functions that set up a Promise, and in the then if it's fulfilled they do a test or two and call the next function in the chain, and only if it's rejected do they call finish() and return, breaking the chain. 

Another common failure pattern we've seen is running your test on a platform that randomly hits the network while your test is running.

I didn't quite realize that the 330 seconds without output claim had moved into only being in logcat after it stopped being in the middle of a mangled line, and that this bug wasn't actually about what is in the summary, so I filed bug 983015 for the "claim that a test which has finished has timed out four times over" thing. I looked at the first dozen logs there, and a random assortment here, and in every case the test is finished (it has output the last test condition, so the only thing that remains for it to do is to call SimpleTest.finish(), if it hasn't already), but perhaps more tellingly, in every single case that I looked at, just before the claim that the test which had nothing more to do has timed out, or has timed out with no output for 330 seconds, we have b2g's webapps-update-timer and/or user-agent-updates-timer firing.

From the earliest log that hasn't yet been deleted:

15:02:09     INFO -  2 INFO TEST-START | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html
15:02:09     INFO -  ###################################### forms.js loaded
15:02:09     INFO -  *** UTM:SVC TimerManager:notify - notified @mozilla.org/b2g/webapps-update-timer;1

hello, timer that hits the network!

15:02:09     INFO -  ############################### browserElementPanning.js loaded
15:02:09     INFO -  ############################### browserElementPanning.js loaded
15:02:09     INFO -  ######################## BrowserElementChildPreload.js loaded
15:02:09     INFO -  LoadPlugin: failed to initialize shared library libXt.so [libXt.so: cannot open shared object file: No such file or directory]
15:02:09     INFO -  LoadPlugin: failed to initialize shared library libXext.so [libXext.so: cannot open shared object file: No such file or directory]
15:02:09     INFO -  LoadPlugin: failed to initialize shared library /usr/lib/mozilla/plugins/libtotem-narrowspace-plugin.so [/usr/lib/mozilla/plugins/libtotem-narrowspace-plugin.so: wrong ELF class: ELFCLASS64]
15:02:09     INFO -  LoadPlugin: failed to initialize shared library /usr/lib/mozilla/plugins/librhythmbox-itms-detection-plugin.so [/usr/lib/mozilla/plugins/librhythmbox-itms-detection-plugin.so: wrong ELF class: ELFCLASS64]
15:02:09     INFO -  LoadPlugin: failed to initialize shared library /usr/lib/mozilla/plugins/libtotem-mully-plugin.so [/usr/lib/mozilla/plugins/libtotem-mully-plugin.so: wrong ELF class: ELFCLASS64]
15:02:09     INFO -  LoadPlugin: failed to initialize shared library /usr/lib/mozilla/plugins/libtotem-cone-plugin.so [/usr/lib/mozilla/plugins/libtotem-cone-plugin.so: wrong ELF class: ELFCLASS64]
15:02:09     INFO -  LoadPlugin: failed to initialize shared library /usr/lib/mozilla/plugins/libtotem-gmp-plugin.so [/usr/lib/mozilla/plugins/libtotem-gmp-plugin.so: wrong ELF class: ELFCLASS64]

all of that crap (which looks decidedly unlike this test's crap) in successful runs of mochitest-7 happens in between, say, "INFO TEST-END | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | finished in 3211ms" and "INFO TEST-START | /tests/dom/ipc/tests/test_NuwaProcessCreation.html" when it quite clearly is not a part of either test

15:02:09     INFO -  3 INFO TEST-INFO | MEMORY STAT vsize after test: 81281024
15:02:09     INFO -  4 INFO TEST-INFO | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
15:02:09     INFO -  5 INFO TEST-INFO | MEMORY STAT residentFast after test: 33255424

test's over now, right? that's coming from TestRunner.testFinished, as SimpleTest wraps things up

15:13:54     INFO -  6 INB2GRunner TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | application timed out after 330.0 seconds with no output

and then almost 12 minutes later, Rose Mary Woods takes her foot off the hang button, and we claim to have hung for 5.5 minutes rather than nearly 12, mangling the output like we used to do, sticking it in the middle of the line that was going to say the test had already finished.

15:15:04     INFO -  7 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | Test timed out.
15:15:04     INFO -  8 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | [SimpleTest.finish()] this test already called finish!
15:15:04     INFO -  9 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | called finish() multiple times
15:15:04     INFO -  10 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | Test timed out.
15:15:04     INFO -  11 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | [SimpleTest.finish()] this test already called finish!
15:15:04     INFO -  12 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | called finish() multiple times
15:15:04     INFO -  13 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | Test timed out.
15:15:04     INFO -  14 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | [SimpleTest.finish()] this test already called finish!
15:15:04     INFO -  15 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | called finish() multiple times
15:15:04     INFO -  16 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/inputmethod/mochitest/test_sendkey_cancel.html | Test timed out.
15:15:04     INFO -  17 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
15:15:04     INFO -  18 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | Skipping 282 remaining tests.
15:15:04     INFO -  19 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | [SimpleTest.finish()] this test already called finish!
15:15:04     INFO -  20 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | called finish() multiple times

Since nobody has ever really looked at bug 942543, no idea whether this is just an instance of that, or something else, or An Clue to that. I'd probably bet a little bit on "if we hang during SimpleTest.finish(), maybe from the memory stuff, maybe in this case from hitting the network for supposedly background updates, then when we unhang we fire the timed out timer four times and then fall on our sword, in a way that would be comical if we didn't then have to retrigger the run."
Debug, where none of these tests has ever failed, they run around 20 minutes into mochitest-9, unlike opt where they run 2 minutes into mochitest-7. Certainly an interesting question why they die, and the things that run 2 minutes into the other 23 mochitest chunks do not, though.
Phil, thanks for digging into it. That gives me a lot help:-) I was kind of frustrated with this intermittent test failure.
Good news, bad news from https://tbpl.mozilla.org/?tree=Try&rev=94f2424bf18b - disabling the webapps update does indeed seem to stop this failure, which is nice. However, the mochitest-3 and mochitest-5 permaorange apparently must mean that we actually can't pass tests with the webapps that we build installed, and we depend on downloading "updates" before those tests are run to make them work, which is bizarre and horrible.
Depends on: 985128
Inactive; closing (see bug 1180138).
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.