Closed
Bug 920952
Opened 11 years ago
Closed 11 years ago
Flush metro test harness output to avoid timeouts due to no output
Categories
(Testing :: General, defect)
Tracking
(firefox27 fixed, firefox28 fixed, firefox29 fixed, firefox-esr24 fixed, b2g-v1.2 fixed)
RESOLVED
FIXED
mozilla29
People
(Reporter: cbook, Assigned: jimm)
References
()
Details
(Keywords: intermittent-failure, regression)
Attachments
(2 files, 1 obsolete file)
747 bytes,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
41.50 KB,
patch
|
jimm
:
review+
|
Details | Diff | Splinter Review |
WINNT 6.2 mozilla-inbound pgo test mochitest-metro-chrome on 2013-09-26 00:26:51 PDT for push c3a044164b5f slave: t-w864-ix-125 timed out after 1000 seconds of no output https://tbpl.mozilla.org/php/getParsedLog.php?id=28393821&tree=Mozilla-Inbound
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 23•11 years ago
|
||
So, what happened here was that for regular mochitest 330 seconds without output timeouts like comment 15, we hit the normal timeout, then we hit a mozprocess 1000 second timeout, then eventually we hit a 1200 second without output timeout, which is horrible and stupid and wasteful, and much better than For metro-chrome, we just completely silently hit the 330 second timeout, logging nothing at all so that bug 915620 thought that it had disappeared, then hit the 1000 second and the 1200 second timeouts. If I had the slightest idea who did what in what repo somewhere between the 24th and 26th of September, I'd be happily backing them out, because omg, we added *37 minutes* to every timeout on this platform, while completely removing any useful output from one entire test suite. That's *awful*.
Severity: normal → critical
Comment 24•11 years ago
|
||
I'm not really sure where the line between harness and automation is anymore to guess who might have done this, but if you're wondering where your win8 capacity has gone, it's gone down the tubes of taking 37 minutes to do a 330 seconds without output timeout.
Comment 25•11 years ago
|
||
09/23 was bug 746243, which is sorta plausible.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•11 years ago
|
Blocks: 746243
Keywords: regression
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 28•11 years ago
|
||
Sounds likely that it is bug 746243. Summit is distracting developers, but will have jhammel look at it as soon as he can.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 31•11 years ago
|
||
Have been fighting failing wifi and batteries amidst sessions. subprocess being undefined, like for comment 15, should be fixed by http://hg.mozilla.org/mozilla-central/rev/7d9905b9ba5d . I will look over the brunt of the issue now.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 34•11 years ago
|
||
Interesting to note that all occurrences are on opt/pgo builds, none on debug builds.
Comment 35•11 years ago
|
||
All of the failures occur during the last test run, at different points. Could this be a stdout buffering problem?
Comment 36•11 years ago
|
||
Opt-only isn't surprising, since we've only had a few non-metro-chrome instances, and we don't run metro-chrome on debug because it's full of fail. Then, "330 seconds without output" from a debug build? They're lucky to go 3.3 seconds without screaming about something totally unrelated, much less 330. Being the last test to run may well explain the underlying and preexisting metro-chrome failure, but comment 15's mochitest-1 failure was not in the last test in that suite.
Comment 37•11 years ago
|
||
comment 15 was the "NameError: global name 'subprocess' is not defined" error, which I believe is fixed now; I believe the only remaining problem is with the metro tests.
Comment 38•11 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #35) > All of the failures occur during the last test run, at different points. > Could this be a stdout buffering problem? I wouldn't be surprised at all, given the number of problems recorded with mozprocess output buffering (see bug 924253 for the most recent filed). Unfortunately, these are difficult to reproduce, difficult to debug, and have hitherto not been either a (big) problem or priority.
Comment 39•11 years ago
|
||
I've gone through the various apparatus in attempt to diagnose what is going on here. Take for example this example log: https://tbpl.mozilla.org/php/getParsedLog.php?id=28788855&full=1&branch=mozilla-inbound#error0 The last output before the timeouts is at 9:18:23: """ 09:18:23 INFO - TEST-INFO | chrome://mochitests/content/metro/browser/metro/base/tests/mochitest/browser_urlbar_trimURLs.js | Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://example.com/" line: 0}] """ There are 3 different timeouts: * 330s : From runtests.py: http://hg.mozilla.org/mozilla-central/file/64b497e6f593/testing/mochitest/runtests.py#l987 For whatever reason, we do not print to log when we hit this timeout. * 1000s [@ 9:35:03]: From mozharness: http://hg.mozilla.org/build/mozharness/file/76d08a2d15f6/mozharness/base/script.py#l651 http://hg.mozilla.org/build/mozharness/file/76d08a2d15f6/scripts/desktop_unittest.py#l398 * 1200s [@ 9:55:50]: From buildbot: """ ========= Started run_script exception (results: 4, elapsed: 43 mins, 15 secs) (at 2013-10-06 09:12:34.393232) ========= 'c:/mozilla-build/python27/python' '-u' 'scripts/scripts/desktop_unittest.py' '--cfg' 'unittests/win_unittest.py' '--mochitest-suite' 'mochitest-metro-chrome' '--download-symbols' 'ondemand' in dir C:\slave\test\. (timeout 1200 secs) (maxTime 7200 secs) """ ---- Q: Why isn't 330s timeout recorded? The previous incarnation: http://hg.mozilla.org/mozilla-central/file/64b497e6f593/build/automation.py.in#l767 The current incarnation: http://hg.mozilla.org/mozilla-central/file/64b497e6f593/testing/mochitest/runtests.py#l897 http://hg.mozilla.org/mozilla-central/file/64b497e6f593/testing/mochitest/runtests.py#l1028 Mozharness invokes the command """ 09:15:32 INFO - Running command: ['C:\\slave\\test\\build\\venv\\Scripts\\python', '-u', 'C:\\slave\\test\\build\\tests\\mochitest/runtests.py', '--appname=C:\\slave\\test\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32-pgo/1381064449/firefox-27.0a1.en-US.win32.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--autorun', '--close-when-done', '--console-level=INFO', '--browser-chrome', '--metro-immersive'] in C:\slave\test\build """ Why the log.info line doesn't get printed....I do not know. ---- Other interesting tidbits: """ 09:15:32 INFO - Calling ['C:\\slave\\test\\build\\venv\\Scripts\\python', '-u', 'C:\\slave\\test\\build\\tests\\mochitest/runtests.py', '--appname=C:\\slave\\test\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32-pgo/1381064449/firefox-27.0a1.en-US.win32.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--autorun', '--close-when-done', '--console-level=INFO', '--browser-chrome', '--metro-immersive'] with output_timeout 1000 ProcessManager NOT managing child processes ProcessManager UNABLE to use job objects to manage child processes 09:15:34 INFO - ProcessManager UNABLE to use job objects to manage child processes 09:15:34 INFO - ProcessManager NOT managing child processes """ This is known, but bad. Due to a whole slew of complex issues, including how twisted native works as well as our patch for it, mozprocess cannot manage job subprocesses via job objects on windows. It is also worth noting that AFAIK mozprocess has not been rigorously tested on metro.
Comment 40•11 years ago
|
||
> ProcessManager NOT managing child processes
> ProcessManager UNABLE to use job objects to manage child processes
> 09:15:34 INFO - ProcessManager UNABLE to use job objects to
> manage child processes
> 09:15:34 INFO - ProcessManager NOT managing child processes
Looking at the logs, this only seems to occur with Win8. Also, I can't reproduce it locally with a Win8 VM, but that may be due to differences in execution...I'm running things from a console, and that may not be true in buildbot.
Comment 41•11 years ago
|
||
> Why the log.info line doesn't get printed....I do not know. We have a similar issue with B2G tests, bug 904839.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 66•11 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #40) > > ProcessManager NOT managing child processes > > ProcessManager UNABLE to use job objects to manage child processes > > 09:15:34 INFO - ProcessManager UNABLE to use job objects to > > manage child processes > > 09:15:34 INFO - ProcessManager NOT managing child processes > > Looking at the logs, this only seems to occur with Win8. Also, I can't > reproduce it locally with a Win8 VM, but that may be due to differences in > execution...I'm running things from a console, and that may not be true in > buildbot. Buildbot (actually twistd) will run scripts so that it can manage the breakaway processes and close them. We modified our scripts for this some time ago, and I don't believe we have this issue on other windows platforms when running under buildbot. So, unless win8 changed the way things work under the covers of the createProcess api then we should still be good. But if this is happening due to the interaction with the way twistd starts processes, you won't be able to repro this in any way except on a slave where the process has been started by buildbot. What is our current status here? Anyone have ideas on how to debug this? My only thought would be to have access to the sendchange script and send a change to the buildbot slave over and over until the thing hangs and then investigate why it hung. The easiest way to do that is probably by sending a super-intensive logging patch to try server and retriggering the sh*t out of it.
Comment 67•11 years ago
|
||
I am hoping that the work I've done for bug 904839 will be applicable here as well. And yes, I think the best way to test is to add some instrumentation to a patch and retrigger it on a try job many times.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 77•11 years ago
|
||
I suspect the root cause of this bug is the screen saver mode. We (jlund and myself) determined in Talos that the win8 boxes have a screen saver that starts with 15 minutes of inactivity. If the browser is automated (metro mode only?) and the run takes >15 minutes the browser process will be suspended and the python script will time out. We would run into all types of timeout problems and after changing the shut off screen timer from 15 minutes to 120 minutes (just picked a single number) our tests have continued to run reliably. This might not be the problem here, but we really should make this change and see if this problem goes away.
Comment 78•11 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #77) > I suspect the root cause of this bug is the screen saver mode. Thank you - filed bug 929473 so we can rule it out either way :-)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•11 years ago
|
Whiteboard: [release28]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 150•11 years ago
|
||
It looks like the screen saver mode wasn't the culprit here. Also bug 904839 has been marked fixed but this is still happening. Do you know what the next steps here are?
Flags: needinfo?(jgriffin)
Comment 151•11 years ago
|
||
I suspect there's some buffering issue at work here; since this only happens in the metro test suite, I've wondered if adding a flush() here would fix the problem: http://mxr.mozilla.org/mozilla-central/source/browser/metro/shell/testing/metrotestharness.cpp#202
Flags: needinfo?(jgriffin)
Comment 152•11 years ago
|
||
Well, we have multiple bugs (bug 914344 and bug 892990 leap to mind) where we have the same pattern of "330 seconds without output" "1000 seconds of no output" "1200 seconds without output" before finally managing to "kill" the job by disconnecting the slave and turning the job purple rather than the orange it should have been 2200 seconds earlier. I'd agree that eating the 330 second timeout's message is metro-chrome-only, but that failure to log is probably the smallest part of the problem here, compared to the way we waste 2200 seconds by failing to kill after 330 seconds and failing to kill yet again after another 1000 seconds, the way we fail to provide a stack for the hang because we didn't ever manage to kill the browser, the way we set the job's result to look like infra, and the way we (I think) fail to reboot the slave, every time we have a test hang.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 192•11 years ago
|
||
Reproduced on try: https://tbpl.mozilla.org/php/getParsedLog.php?id=31104304&tree=Try&full=1#error0 ; unfortunately, none of the diagnostics I added are particularly insightful. We are using the FirefoxRunner from https://github.com/mozilla/mozbase/blob/master/mozrunner/mozrunner/local.py#L208 ; 20:30:47 INFO - Using mozrunner FirefoxRunner 20:30:47 INFO - ProcessManager UNABLE to use job objects to manage child processes 20:30:47 INFO - ProcessManager NOT managing child processes 20:30:47 INFO - INFO | runtests.py | Application pid: 3816 20:30:47 INFO - INFO | metrotestharness.exe | Launching browser... 20:30:47 INFO - INFO | metrotestharness.exe | App model id='E4CFE2E6B75AA3A3' 20:30:47 INFO - INFO | metrotestharness.exe | Harness process id: 3816 20:30:47 INFO - INFO | metrotestharness.exe | Using bin path: 'C:\slave\test\build\application\firefox\firefox.exe' 20:30:47 INFO - INFO | metrotestharness.exe | Writing out tests.ini to: 'C:\slave\test\build\application\firefox\tests.ini' 20:30:47 INFO - INFO | metrotestharness.exe | Browser command line args: 'C:\slave\test\build\application\firefox\firefox.exe about:blank -foreground -profile c:\users\cltbld~1.t-w\appdata\local\temp\tmp3awdu_' 20:30:49 INFO - INFO | metrotestharness.exe | Activation succeeded. 20:30:49 INFO - INFO | runtests.py | metro browser sub process id detected: 2476 20:30:49 INFO - INFO | metrotestharness.exe | METRO_BROWSER_PROCESS=2476 20:30:49 INFO - INFO | metrotestharness.exe | Waiting on child process... 20:30:53 INFO - XRE_MetroCoreApplicationRun: IsMainThread:0 ThreadId:668 This uses in in-tree metrotestharness.exe: 09:49:44 INFO - ['C:\\slave\\test\\build\\tests\\bin\\metrotestharness.exe', '-firefoxpath', 'C:\\slave\\test\\build\\application\\firefox\\firefox.exe', 'about:blank', '-foreground', '-profile', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmpssyifs'] I'll try to add a flush to the metrotestharness.cpp ; even if this works, however, it won't really resolve any of the mystery
Comment 193•11 years ago
|
||
(In reply to Jeff Hammel [:jhammel] from comment #192) > Reproduced on try: > https://tbpl.mozilla.org/php/getParsedLog. > php?id=31104304&tree=Try&full=1#error0 ; unfortunately, none of the > diagnostics I added are particularly insightful. > > We are using the FirefoxRunner from > https://github.com/mozilla/mozbase/blob/master/mozrunner/mozrunner/local. > py#L208 ; > > 20:30:47 INFO - Using mozrunner FirefoxRunner > 20:30:47 INFO - ProcessManager UNABLE to use job objects to > manage child processes > 20:30:47 INFO - ProcessManager NOT managing child processes > 20:30:47 INFO - INFO | runtests.py | Application pid: 3816 > 20:30:47 INFO - INFO | metrotestharness.exe | Launching > browser... > 20:30:47 INFO - INFO | metrotestharness.exe | App model > id='E4CFE2E6B75AA3A3' > 20:30:47 INFO - INFO | metrotestharness.exe | Harness process id: > 3816 > 20:30:47 INFO - INFO | metrotestharness.exe | Using bin path: > 'C:\slave\test\build\application\firefox\firefox.exe' > 20:30:47 INFO - INFO | metrotestharness.exe | Writing out > tests.ini to: 'C:\slave\test\build\application\firefox\tests.ini' > 20:30:47 INFO - INFO | metrotestharness.exe | Browser command > line args: 'C:\slave\test\build\application\firefox\firefox.exe > about:blank -foreground -profile > c:\users\cltbld~1.t-w\appdata\local\temp\tmp3awdu_' > 20:30:49 INFO - INFO | metrotestharness.exe | Activation > succeeded. > 20:30:49 INFO - INFO | runtests.py | metro browser sub process id > detected: 2476 > 20:30:49 INFO - INFO | metrotestharness.exe | > METRO_BROWSER_PROCESS=2476 > 20:30:49 INFO - INFO | metrotestharness.exe | Waiting on child > process... > 20:30:53 INFO - XRE_MetroCoreApplicationRun: IsMainThread:0 ThreadId:668 > > This uses in in-tree metrotestharness.exe: > > 09:49:44 INFO - > ['C:\\slave\\test\\build\\tests\\bin\\metrotestharness.exe', '-firefoxpath', > 'C:\\slave\\test\\build\\application\\firefox\\firefox.exe', 'about:blank', > '-foreground', '-profile', > 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmpssyifs'] > > I'll try to add a flush to the metrotestharness.cpp ; even if this works, > however, it won't really resolve any of the mystery flush should be done here:http://hg.mozilla.org/mozilla-central/file/34f431863037/browser/metro/shell/testing/metrotestharness.cpp#l201
Comment 194•11 years ago
|
||
pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=12b9eb302672 will retrigger a bunch of test jobs
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Updated•11 years ago
|
Whiteboard: [release28]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 229•11 years ago
|
||
(In reply to Jeff Hammel [:jhammel] from comment #194) > Created attachment 8338742 [details] [diff] [review] > fflush(stdout); > > pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=12b9eb302672 > > will retrigger a bunch of test jobs Did we give up on this? I see a bunch of retriggers, but none with this failure. Should we repush and try again?
Assignee | ||
Comment 230•11 years ago
|
||
I wonder if there's some way we can trigger a stack dump when this hang is detected? We added support for getting the right browser stacks in bug 905628.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 236•11 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #229) > (In reply to Jeff Hammel [:jhammel] from comment #194) > > Created attachment 8338742 [details] [diff] [review] > > fflush(stdout); > > > > pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=12b9eb302672 > > > > will retrigger a bunch of test jobs > > Did we give up on this? I see a bunch of retriggers, but none with this > failure. Should we repush and try again? Perhaps this actually fixes the problem. I've requested 10 more retriggers; if this problem still doesn't show up I'd like to land this patch and see if it goes away. :)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 243•11 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #236) > (In reply to Jim Mathies [:jimm] from comment #229) > > (In reply to Jeff Hammel [:jhammel] from comment #194) > > > Created attachment 8338742 [details] [diff] [review] > > > fflush(stdout); > > > > > > pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=12b9eb302672 > > > > > > will retrigger a bunch of test jobs > > > > Did we give up on this? I see a bunch of retriggers, but none with this > > failure. Should we repush and try again? > > Perhaps this actually fixes the problem. I've requested 10 more retriggers; > if this problem still doesn't show up I'd like to land this patch and see if > it goes away. :) Builds were gone. :/
Assignee | ||
Comment 244•11 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=6ab6fc755327
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 246•11 years ago
|
||
Note, if this does fix the problem, we should generate a build and update the test harness bin in talos as well - http://mxr.mozilla.org/build/source/talos/metro/
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 248•11 years ago
|
||
Yours is better in that you actually submitted your bug comment, but mine - https://tbpl.mozilla.org/?tree=Try&rev=e5957a807ffb - is better in that I already retriggered it like a monkey who just figured out exactly why those food pellets are landing in the bowl.
Assignee | ||
Comment 249•11 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #248) > Yours is better in that you actually submitted your bug comment, but mine - > https://tbpl.mozilla.org/?tree=Try&rev=e5957a807ffb - is better in that I > already retriggered it like a monkey who just figured out exactly why those > food pellets are landing in the bowl. heh, I just did the same on mine before I got this bug post. We should have a great data set to look at once all these jobs finish.
Assignee | ||
Comment 250•11 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #248) > Yours is better in that you actually submitted your bug comment, but mine - > https://tbpl.mozilla.org/?tree=Try&rev=e5957a807ffb - is better in that I > already retriggered it like a monkey who just figured out exactly why those > food pellets are landing in the bowl. So this is interesting, we're getting stacks now which is great. Looks like we're hanging in thread manager shutdown. My guess would be omtc related. Can we go ahead and get the patch here landed? We can file a follow up on the hang.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → jmathies
Assignee | ||
Comment 267•11 years ago
|
||
Not sure what happened to jeff, his email has gone bad. Taking to get this in.
Attachment #8338742 -
Attachment is obsolete: true
Attachment #8348007 -
Flags: review?(netzen)
Updated•11 years ago
|
Attachment #8348007 -
Flags: review?(netzen) → review+
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 272•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/40ca2354d194
Comment 273•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/40ca2354d194
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 275•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/a5d8e2a7f245 https://hg.mozilla.org/releases/mozilla-beta/rev/703a5bc03856
status-b2g-v1.2:
--- → affected
status-firefox27:
--- → fixed
status-firefox28:
--- → fixed
status-firefox29:
--- → fixed
status-firefox-esr24:
--- → affected
Comment 276•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/38f29914881e https://hg.mozilla.org/releases/mozilla-esr24/rev/3eb37b059b7c
Assignee | ||
Comment 277•11 years ago
|
||
Jordan, we should get this change into the cached talos metrotestharness.exe.
Flags: needinfo?(jlund)
Comment 278•11 years ago
|
||
This updates the exe in our talos repo to reflect recent metrotestharness.exe changes. It also updates the txt file we use to track changes.
Attachment #8355100 -
Flags: review?(jmathies)
Flags: needinfo?(jlund)
Assignee | ||
Updated•11 years ago
|
Attachment #8355100 -
Flags: review?(jmathies) → review+
Comment 279•10 years ago
|
||
pushed to default: https://hg.mozilla.org/build/talos/rev/1a75267be582
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 281•10 years ago
|
||
(Morhping summary to avoid false positives with TBPL bug suggestions)
Summary: Intermittent Win8 - timed out after 1000 seconds of no output → Flush metro test harness output to avoid timeouts due to no output
You need to log in
before you can comment on or make changes to this bug.
Description
•