Closed
Bug 560056
Opened 15 years ago
Closed 14 years ago
Frequent "uncaught exception: 2147746065" and "Timed out while waiting for server startup" running mochitests on Rev3 darwin machines
Categories
(Release Engineering :: General, defect, P5)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: philor, Unassigned)
References
Details
(Keywords: intermittent-failure, Whiteboard: [talos])
This may very well be a code problem, but nobody's likely to know _what_ problem without some releng help first. From looking back at a day or so of the mochitests I just made visible on tinderbox (3, 4, 5, opt-only, since that's what's kind of greenish):
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271418870.1271419149.1776.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-3/5 on 2010/04/16 04:54:30
s: talos-r3-snow-011
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271434132.1271434313.13855.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-2/5 on 2010/04/16 09:08:52
s: talos-r3-snow-010
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271434771.1271435643.19212.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitest-other on 2010/04/16 09:19:31
s: talos-r3-snow-004
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271443692.1271443839.8003.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-2/5 on 2010/04/16 11:48:12
s: talos-r3-snow-008
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271444741.1271444889.10708.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-4/5 on 2010/04/16 12:05:41
s: talos-r3-snow-014
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271453722.1271454252.6113.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitest-other on 2010/04/16 14:35:22
s: talos-r3-snow-010
In every case, the log is the same except what fraction of 92 seconds it takes to get to the timeout:
python mochitest/runtests.py --appname=./Minefield.app/Contents/MacOS/firefox-bin --utility-path=bin --extra-profile-file=bin/plugins --certificate-path=certs --autorun --close-when-done --console-level=INFO --symbols-path=symbols --chrome
...
INFO | runtests.py | Server pid: 521
uncaught exception: 2147746065
Timed out while waiting for server startup.
program finished with exit code 1
elapsedTime=92.593459
Comment 1•15 years ago
|
||
Hi Phil,
The only similarity that I have seen is that the web server gets a pid higher than 500. Every time this happens it seems to time out.
Any pid less than 500 seems to have worked.
Comment 3•15 years ago
|
||
For context,
I have seen these exceptions only to happen in Mo 2, 4 & other and only in this last commit it actually happened in all these 3 suites.
Comment 4•15 years ago
|
||
Logs from my just-filed duplicate bug 560292 (the logs Armen refers to in comment 3):
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271690054.1271690507.15701.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-4/5 on
2010/04/19 08:14:14
s: talos-r3-snow-015
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271690031.1271690507.15703.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-2/5 on
2010/04/19 08:13:51
s: talos-r3-snow-017
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271690039.1271690511.15709.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-3/5 on
2010/04/19 08:13:59
s: talos-r3-snow-007
(thanks for catching the dupe, philor -- I neglected to search bugzilla before filing. Bad dholbert.)
Reporter | ||
Comment 5•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271720150.1271722506.6828.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-2/5 on 2010/04/19 16:35:50
s: talos-r3-snow-012
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271720391.1271722435.6549.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-3/5 on 2010/04/19 16:39:51
s: talos-r3-snow-003
Updated•15 years ago
|
Whiteboard: [orange] → [orange][talos]
Reporter | ||
Comment 6•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271795056.1271795935.26208.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-3/5 on 2010/04/20 13:24:16
s: talos-r3-snow-017
Comment 7•15 years ago
|
||
The failing pid in these logs are again higher than 500 (this doesn't mean it is the reason):
INFO | runtests.py | Server pid: 4971
INFO | runtests.py | Server pid: 559
INFO | runtests.py | Server pid: 4461
INFO | runtests.py | Server pid: 849
INFO | runtests.py | Server pid: 882
INFO | runtests.py | Server pid: 576
Tests suites that have failed so far are 2, 3, 4 & other.
Comment 8•15 years ago
|
||
I've just seen what looks like the same failure on a couple of 10.5.8 mochitest runs:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271840801.1271842216.20839.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test mochitests-2/5 on 2010/04/21 02:06:41
s: talos-r3-leopard-004
INFO | runtests.py | Server pid: 484
uncaught exception: 2147746065
Timed out while waiting for server startup.
program finished with exit code 1
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271840825.1271842216.20840.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test mochitests-3/5 on 2010/04/21 02:07:05
s: talos-r3-leopard-030
INFO | runtests.py | Server pid: 498
uncaught exception: 2147746065
Timed out while waiting for server startup.
program finished with exit code 1
Comment 9•15 years ago
|
||
So it seems that this happens once in a while for both 10.5 and 10.6.
My theory of the pid is now 480 and higher. I think this should not be too difficult to try if most of us have 10.6 machines. Does anyone have time to give it a try?
For reference the code being run is:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py.in#329
and times out waiting for an alive file to exist.
os.path.exists(aliveFile):
Maybe increasing the timeout will solve this.
Summary: Frequent "uncaught exception: 2147746065" and "Timed out while waiting for server startup" running OS X 10.6 mochitests → Frequent "uncaught exception: 2147746065" and "Timed out while waiting for server startup" running mochitests on Rev3 darwin machines
Updated•15 years ago
|
Blocks: darwin_unittests
Comment 10•15 years ago
|
||
I'm really curious to hear a theory about why the PID could be a factor.
Comment 11•15 years ago
|
||
Assuming the OS simply increments the PIDs it hands out for new processes I can only take it as an indication that we've run more processes. So either the OS itself has run some periodic job, or the reboot after the previous job failed.
Reporter | ||
Comment 12•15 years ago
|
||
http://tinderbox.mozilla.org/Firefox/1272212417.1272212744.21340.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-2/5 on 2010/04/25 09:20:17
s: talos-r3-snow-009
INFO | runtests.py | Server pid: 417
uncaught exception: 2147746065
Timed out while waiting for server startup.
Comment 13•15 years ago
|
||
(In reply to comment #12)
> uncaught exception: 2147746065
NS_ERROR_NOT_AVAILABLE, FWIW.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1272631219.1272631618.23647.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test mochitests-3/5 on 2010/04/30 05:40:19
uncaught exception: 2147746065
Timed out while waiting for server startup.
program finished with exit code 1
TinderboxPrint: mochitest-plain-3<br/><em class="testfail">T-FAIL</em>
And for the previous one:
INFO | runtests.py | Server pid: 437
I'd expect the PIDs to be reasonably consistent most of the time if the machine is rebooting before each run.
Reporter | ||
Comment 16•15 years ago
|
||
http://tinderbox.mozilla.org/Firefox/1272834053.1272834532.11327.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test mochitests-5/5 on 2010/05/02 14:00:53
s: talos-r3-leopard-029
INFO | runtests.py | Server pid: 478
uncaught exception: 2147746065
Timed out while waiting for server startup.
Reporter | ||
Comment 17•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1272834753.1272835520.13523.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central debug test mochitests-5/5 on 2010/05/02 14:12:33
s: talos-r3-leopard-009
INFO | runtests.py | Server pid: 600
...
WARNING: failed to bind socket: file /builds/moz2_slave/mozilla-central-macosx-debug/build/netwerk/base/src/nsServerSocket.cpp, line 317
uncaught exception: 2147746065
...
Timed out while waiting for server startup.
Comment 18•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273064040.1273064719.12628.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test mochitest-other on 2010/05/05 05:54:00
s: talos-r3-leopard-027
INFO | runtests.py | Server pid: 420
uncaught exception: 2147746065
Timed out while waiting for server startup.
Reporter | ||
Comment 19•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273097792.1273098637.4556.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central debug test mochitests-2/5 on 2010/05/05 15:16:32
s: talos-r3-leopard-039
INFO | runtests.py | Server pid: 396
WARNING: failed to bind socket: file /builds/slave/mozilla-central-macosx-debug/build/netwerk/base/src/nsServerSocket.cpp, line 317
uncaught exception: 2147746065
Timed out while waiting for server startup.
Reporter | ||
Comment 20•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273097798.1273099494.15904.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central debug test mochitest-other on 2010/05/05 15:16:38
s: talos-r3-leopard-004
INFO | runtests.py | Server pid: 399
WARNING: failed to bind socket: file /builds/slave/mozilla-central-macosx-debug/build/netwerk/base/src/nsServerSocket.cpp, line 317
uncaught exception: 2147746065
Timed out while waiting for server startup.
Reporter | ||
Comment 21•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273184293.1273184612.14635.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-3/5 on 2010/05/06 15:18:13
s: talos-r3-snow-019
Reporter | ||
Comment 22•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273184666.1273185114.21540.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test mochitests-4/5 on 2010/05/06 15:24:26
s: talos-r3-leopard-023
Reporter | ||
Comment 23•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273233295.1273233490.30569.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test mochitests-4/5 on 2010/05/07 04:54:55
s: talos-r3-leopard-008
Reporter | ||
Comment 24•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273862361.1273862806.28970.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central debug test mochitests-3/5 on 2010/05/14 11:39:21
s: talos-r3-snow-020
Comment 25•15 years ago
|
||
Based on the frequency of this I'm lowering the priority.
Priority: -- → P5
Comment 26•15 years ago
|
||
Comment 27•15 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1274369353.1274369894.17400.gz
This has happened three times on a single check-in so far.
Comment 28•14 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1286505654.1286505947.29917.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central debug test mochitests-3/5 on
2010/10/07 19:40:54
s: talos-r3-snow-017
INFO | runtests.py | Server pid: 394
WARNING: Error reading channel-prefs.js: file /builds/slave/mozilla-central-macosx64-debug/build/modules/libpref/src/nsPrefService.cpp, line 922
nsNativeModuleLoader::LoadModule("/Users/cltbld/talos-slave/mozilla-central_snowleopard-debug_test-mochitests-3/build/MinefieldDebug.app/Contents/MacOS/components/libxpcomsample.dylib") - load FAILED, rv: 80004005, error:
<unknown; can't get error from NSPR>
WARNING: failed to bind socket: file /builds/slave/mozilla-central-macosx64-debug/build/netwerk/base/src/nsServerSocket.cpp, line 317
uncaught exception: 2147746065
Timed out while waiting for server startup.
Comment 30•14 years ago
|
||
Can we fix xpcshell to print a stack trace when it handles an uncaught exception? That might make this a bit easier to diagnose.
Comment 31•14 years ago
|
||
At least in this most recent failure, it's xpcshell failing to bind the socket.
Comment 32•14 years ago
|
||
Reporter | ||
Comment 33•14 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1288896208.1288896391.17021.gz
Rev3 MacOSX Leopard 10.5.8 tracemonkey opt test mochitests-3/5 on 2010/11/04 11:43:28
s: talos-r3-leopard-028
INFO | runtests.py | Server pid: 506
uncaught exception: 2147746065
Timed out while waiting for server startup.
Reporter | ||
Comment 34•14 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1289276044.1289276328.20053.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test mochitests-2/5 on 2010/11/08 20:14:04
s: talos-r3-leopard-008
INFO | runtests.py | Server pid: 391
uncaught exception: 2147746065
Timed out while waiting for server startup.
Reporter | ||
Comment 35•14 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1289276044.1289276986.23005.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test mochitest-other on 2010/11/08 20:14:04
s: talos-r3-leopard-028
INFO | runtests.py | Server pid: 390
uncaught exception: 2147746065
Timed out while waiting for server startup.
Particularly bad as a part of mochitest-other, since this doesn't put anything into the brief log, so with mochitest-chrome hitting this and mochitest-browser-chrome hitting a common intermittent orange, it got double-starred as just the browser-chrome failure.
Comment 36•14 years ago
|
||
I usually see this when xpcshell is already running. Could it be that we are running two test suites (mochitest-chrome, browser-chrome, etc...) on the same machine and for some reason we are not cleaning up properly before we run the second test suite?
Comment 37•14 years ago
|
||
Comment 38•14 years ago
|
||
Comment 39•14 years ago
|
||
Comment 40•14 years ago
|
||
Reporter | ||
Comment 41•14 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1296176355.1296178136.12534.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central debug test mochitests-3/5 on 2011/01/27 16:59:15
s: talos-r3-snow-018
Reporter | ||
Comment 42•14 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1297190320.1297192779.7408.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central debug test mochitest-other on 2011/02/08 10:38:40
s: talos-r3-leopard-036
Reporter | ||
Comment 43•14 years ago
|
||
Dunno about any of the other instances, but comment 42 was the result of killing a previous test run, which skips the reboot step, so httpd.js is still running from that and can't start up again.
Comment 44•14 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1297194573.1297194873.16591.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test mochitests-1/5 on 2011/02/08 11:49:33
s: talos-r3-snow-024
Reporter | ||
Comment 45•14 years ago
|
||
So, we had one thing through last November, when something got fixed (I wanted to pin it on the Bluetooth dialog, but that was the right day but in December, not November), then we had the period when we were killing test jobs and that wasn't running the reboot step, and now we don't have either one.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Assignee | ||
Updated•12 years ago
|
Keywords: intermittent-failure
Assignee | ||
Updated•12 years ago
|
Whiteboard: [orange][talos] → [talos]
Assignee | ||
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Comment 46•9 years ago
|
||
Still happening with Thunderbird 38.1.0
Timestamp: Tue, August, 04, 08, 2015 3:19:03 AM
Error: uncaught exception: 2147746065
Comment 47•9 years ago
|
||
(In reply to Erdin Kacan from comment #46)
> Still happening with Thunderbird 38.1.0
>
> Timestamp: Tue, August, 04, 08, 2015 3:19:03 AM
> Error: uncaught exception: 2147746065
Erdin, is the next line of your error text the same as the reporter's?
Timed out while waiting for server startup.
Flags: needinfo?(philringnalda)
QA Contact: mshal
Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(philringnalda)
You need to log in
before you can comment on or make changes to this bug.
Description
•