Closed Bug 575918 Opened 14 years ago Closed 9 years ago

When run from the console, xpcshell tests can't (reliably) launch content processes on OS X

Categories

(Core :: IPC, defect)

All
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla37

People

(Reporter: romaxa, Assigned: Gavin)

Attachments

(2 files)

TEST-PASS | /builds/slave/tryserver-macosx-debug-unittest-xpcshell/build/xpcshell/tests/embed/tests/unit/test_wwpromptfactory.js | test passed

command timed out: 1200 seconds without output, killing pid 575
process killed by signal 9
program finished with exit code -1
elapsedTime=1249.203932
TinderboxPrint: xpcshell<br/><em class="testfail">T-FAIL</em>
buildbot.slave.commands.TimeoutError: command timed out: 1200 seconds without output, killing pid 575
TinderboxPrint: xpcshell<br/><em class="testfail">timeout</em>
=== Output ended ===
======== BuildStep ended ========
======== BuildStep started ========
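For readers unfamiliar with the "1200 seconds without output" kill in the log above: it is an idle-output watchdog, not a wall-clock limit. The following is a minimal sketch of that idea, not buildbot's actual implementation; the function name `run_with_output_timeout` and its return shape are assumptions made for illustration.

```python
import subprocess
import sys
import threading
import time


def run_with_output_timeout(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Hypothetical helper, for illustration only.
    Returns (exit_code, timed_out, output_lines).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    last_output = [time.monotonic()]  # updated by the reader thread
    lines = []

    def reader():
        # Every line of output resets the idle clock.
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            last_output[0] = time.monotonic()

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > idle_timeout:
            timed_out = True
            proc.kill()  # SIGKILL on POSIX, matching "killed by signal 9"
            break
        time.sleep(0.05)

    proc.wait()
    thread.join(timeout=5)
    return proc.returncode, timed_out, lines


if __name__ == "__main__":
    # A child that prints once and then hangs, like the stuck test run.
    hang = [sys.executable, "-c",
            "import time; print('start', flush=True); time.sleep(60)"]
    print(run_with_output_timeout(hang, idle_timeout=2.0))
```

A test that hangs silently (as the content process does here) trips the watchdog even though earlier tests in the same run passed, which is why the log shows a TEST-PASS immediately before the timeout.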
Pretty sure this is not a random-orange bug. Is this with Electrolysis, or just with tryserver? Does it hang when you run the test locally?
Whiteboard: [orange]
I can't run the test locally because I don't have Mac OS X.
I got 3 orange builds with the e10s merge patch, all with this message.
This has been failing on the e10s tinderbox for at least two weeks.  Looks like it's just been ignored.
Short story is that OOP xpcshell tests are broken beyond hope of easy repair for OS X.  They've never worked there.

Here's what's causing the hang:

_RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.
2010-06-30 15:42:05.858 plugin-container[71497:807] *** -[NSRecursiveLock unlock]: lock (<NSRecursiveLock: 0xa5ae30> '(null)') unlocked when not locked

Backtrace from there:
#0  0x965cc1a9 in _NSLockError ()
#1  0x9652aa6f in -[NSRecursiveLock unlock] ()
#2  0x91de2cee in _NSAppKitUnlock ()
#3  0x91df12c4 in +[NSScreen screens] ()
#4  0x91df20ff in +[NSScreen _zeroScreenHeight] ()
#5  0x91eea64b in -[NSEvent _cgsEventRecord] ()
#6  0x91f37085 in -[NSEvent _eventRefInternal] ()
#7  0x91f37002 in -[NSEvent _postAtStart:] ()
#8  0x04f05007 in nsAppShell::ProcessGeckoEvents (aInfo=0xa536d0) at /Users/cjones/mozilla/electrolysis/widget/src/cocoa/nsAppShell.mm:411
#9  0x9586c3c5 in CFRunLoopRunSpecific ()
#10 0x9586caa8 in CFRunLoopRunInMode ()
#11 0x909a92ac in RunCurrentEventLoopInMode ()
#12 0x909a90c5 in ReceiveNextEventCommon ()
#13 0x909a8f39 in BlockUntilNextEventMatchingListInMode ()
#14 0x91e1e6d5 in _DPSNextEvent ()
#15 0x91e1df88 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#16 0x91e16f9f in -[NSApplication run] ()
#17 0x04f03cbb in nsAppShell::Run (this=0xa536d0) at /Users/cjones/mozilla/electrolysis/widget/src/cocoa/nsAppShell.mm:747
#18 0x03cca23b in XRE_RunAppShell () at /Users/cjones/mozilla/electrolysis/toolkit/xre/nsEmbedFunctions.cpp:572
#19 0x04ffe00d in mozilla::ipc::MessagePumpForChildProcess::Run (this=0xa0f7c0, aDelegate=0xbffff470) at /Users/cjones/mozilla/electrolysis/ipc/glue/MessagePump.cpp:218
#20 0x0523ed42 in MessageLoop::RunInternal (this=0xbffff470) at /Users/cjones/mozilla/electrolysis/ipc/chromium/src/base/message_loop.cc:219
#21 0x0523ed59 in MessageLoop::RunHandler (this=0xbffff470) at /Users/cjones/mozilla/electrolysis/ipc/chromium/src/base/message_loop.cc:202
#22 0x0523edbd in MessageLoop::Run (this=0xbffff470) at /Users/cjones/mozilla/electrolysis/ipc/chromium/src/base/message_loop.cc:176
#23 0x03ccb15f in XRE_InitChildProcess (aArgc=1, aArgv=0xbffff668, aProcess=GeckoProcessType_Content) at /Users/cjones/mozilla/electrolysis/toolkit/xre/nsEmbedFunctions.cpp:453
#24 0x00001eed in main (argc=3, argv=0xbffff668) at /Users/cjones/mozilla/electrolysis/ipc/app/MozillaRuntimeMain.cpp:87


The content process is trying to run the app shell event loop, which transitively relies on a WindowServer connection through OS X APIs.  We don't have that when we hang.  But according to Apple[1], this only sometimes works from the console, is fragile regardless, and deprecated to boot.

So the options to fix this are:

 (0) Attempt to align the stars wrt [1] on the test machines to make the WindowServer work from the console, at least until Apple removes that functionality
 (1) Have the test machines not run xpcshell tests from the console, somehow
 (2) Wean content processes off nsAppShell, if only for OS X
 (3) Make an xpcshell subprocess type that doesn't use nsAppShell

All of these options suck, (1) perhaps least.  At any rate, this shouldn't block the m-c merge, so I'll throw together a patch to disable these tests on OS X. 

[1] http://developer.apple.com/mac/library/technotes/tn2005/tn2083.html#SECWINDOWSERVER
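
The stopgap of disabling these tests on OS X, and the condition option (1) would lift, can be sketched as a harness-side guard. This is a hypothetical illustration only: the function name and the `has_window_server` flag are assumptions, not part of the actual xpcshell harness or of the attached patch.

```python
import sys


def should_run_oop_xpcshell_tests(platform=sys.platform, has_window_server=False):
    """Hypothetical guard: decide whether to run tests that launch content processes.

    On OS X the content process runs the app shell event loop, which needs a
    WindowServer connection; without one the child hangs. Other platforms are
    unaffected according to this bug.
    """
    if platform != "darwin":
        return True
    # On OS X, only run when the harness knows it has a real GUI session.
    return has_window_server


if __name__ == "__main__":
    print(should_run_oop_xpcshell_tests("linux"))
    print(should_run_oop_xpcshell_tests("darwin", has_window_server=False))
    print(should_run_oop_xpcshell_tests("darwin", has_window_server=True))
```

How a harness would actually determine `has_window_server` is left open here; the thread does not say.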
The ambiguous pronoun reference in the commit message is intentional.

This patch leaves a "FIXME/bug 575918" trail of breadcrumbs for when we can get the harness sorted out.
Assignee: nobody → jones.chris.g
Attachment #455291 - Flags: review?(benjamin)
Attachment #455291 - Flags: review?(benjamin) → review+
Summary: buildbot.slave.commands.TimeoutError: command timed out: 1200 seconds without output, killing pid 575 → When run from the console, xpcshell tests can't (reliably) launch content processes on OS X
Is the issue here just that the xpcshell tests aren't being run from the actual desktop? I thought all of our tests got run there, since it's the same buildbot instance running all of our test suites on the same machines, and all of our machines have real displays (real-looking to the OS, anyway). If so, we should just fix that, and declare that xpcshell tests need to be run with a display.
There's another one of these turds in dom/ipc/Makefile.in.
Must fix before considering anything multiprocess on desktop.
I suspect this is no longer an issue, let's see what try says:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=4f30b0e64ab8
Attached patch: undo disabling
A bit of a scattershot f?. Try suggests this was fixed somehow:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=4f30b0e64ab8
Assignee: cjones.bugs → gavin.sharp
Status: NEW → ASSIGNED
Attachment #8542599 - Flags: feedback?(jduell.mcbugs)
Attachment #8542599 - Flags: feedback?(bzbarsky)
Attachment #8542599 - Flags: feedback?(benjamin)
Comment on attachment 8542599: undo disabling

If it works on try, ship it. This was mostly ancient history when win/linux worked but mac was broken because of bundles or something like that.
Attachment #8542599 - Flags: review+
Attachment #8542599 - Flags: feedback?(jduell.mcbugs)
Attachment #8542599 - Flags: feedback?(bzbarsky)
Attachment #8542599 - Flags: feedback?(benjamin)
https://hg.mozilla.org/integration/fx-team/rev/8f91a31d4cec
Hardware: x86 → All
Target Milestone: --- → mozilla37
https://hg.mozilla.org/mozilla-central/rev/8f91a31d4cec
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED