Running SDK tests on debug builds hangs after the SDK's test runner calls nsIAppStartup::quit to quit the browser after it finishes running tests. The problem is intermittent but frequent. It happens almost every time tests are run on mozilla-central and the aurora branch (look for the red "JP"): http://tbpl.mozilla.org/?tree=Firefox&noignore=1 http://tbpl.mozilla.org/?tree=Mozilla-Aurora&noignore=1 It doesn't happen on optimized builds. I don't have a reduced testcase. But I can reproduce locally on both Mac and Linux. The problem appears to be in Firefox rather than the SDK. Here's a stack trace: #0 0x00007fff8103bf8a in __semwait_signal () #1 0x00007fff81062d46 in pthread_join () #2 0x00000001054d5c12 in PR_JoinThread (thred=0x1182c7080) at /Users/myk/Mozilla/source/nsprpub/pr/src/pthreads/ptthread.c:560 #3 0x0000000101187c3b in nsPSMBackgroundThread::requestExit (this=0x1182c6e70) at /Users/myk/Mozilla/source/security/manager/ssl/src/nsPSMBackgroundThread.cpp:88 #4 0x000000010119650d in nsNSSComponent::DoProfileChangeNetTeardown (this=0x107fc95c0) at /Users/myk/Mozilla/source/security/manager/ssl/src/nsNSSComponent.cpp:2475 #5 0x000000010119cc5d in nsNSSComponent::Observe (this=0x107fc95c0, aSubject=0x121db44d0, aTopic=0x101dac730 "profile-change-net-teardown", someData=0x101f9d040) at /Users/myk/Mozilla/source/security/manager/ssl/src/nsNSSComponent.cpp:2294 #6 0x0000000101645826 in nsObserverList::NotifyObservers (this=0x106232b40, aSubject=0x121db44d0, aTopic=0x101dac730 "profile-change-net-teardown", someData=0x101f9d040) at /Users/myk/Mozilla/source/xpcom/ds/nsObserverList.cpp:130 #7 0x00000001016467f4 in nsObserverService::NotifyObservers (this=0x105e08250, aSubject=0x121db44d0, aTopic=0x101dac730 "profile-change-net-teardown", someData=0x101f9d040) at /Users/myk/Mozilla/source/xpcom/ds/nsObserverService.cpp:182 #8 0x000000010003e1d8 in nsXREDirProvider::DoShutdown (this=0x7fff5fbfecd0) at /Users/myk/Mozilla/source/toolkit/xre/nsXREDirProvider.cpp:795 #9 0x000000010002f371 in ScopedXPCOMStartup::~ScopedXPCOMStartup (this=0x7fff5fbff000) at /Users/myk/Mozilla/source/toolkit/xre/nsAppRunner.cpp:1103 #10 0x000000010003646e in XRE_main (argc=5, argv=0x7fff5fbff400, aAppData=0x105a16150) at /Users/myk/Mozilla/source/toolkit/xre/nsAppRunner.cpp:3784 #11 0x000000010000128f in main (argc=5, argv=0x7fff5fbff400) at /Users/myk/Mozilla/source/browser/app/nsBrowserApp.cpp:158 The problem seems connected to the duration of the test run, since I don't see it when I run tests for the packages in the SDK, which contain thousands of tests that take many seconds to run, but I do see it when I run tests for the example addons, which contain only a couple tests each and take very little time to run. The problem also seems associated with the state of the profile. The SDK's test runner constructs a minimal profile with just an extensions/ dir containing a test-running extension and a user.js file with some preferences. The problem is frequent the first time I run tests against the profile and rare if I reuse the profile (which is no longer minimal) to rerun tests. All of which suggests to me that the hang occurs when the test runner finishes so quickly that quitting Firefox races some essential startup/profile creation activity.
Sounds like you're hitting bug 430260.
(In reply to comment #1) > Sounds like you're hitting bug 430260. Hmm, indeed, although it isn't clear whether this bug should depend on that one or just directly on bug 468736, whose most recent patch purports to fix the problem.
Oh, yeah, looks like one could be duped to the other, that just happened to be the one in my urlbar history.
It looks like this can happen on optimized builds too on occasion: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303405832.1303407144.14219.gz&fulltext=1
Ehsan's fix for bug 468736 seems to have fixed this bug: http://tbpl.mozilla.org/?rev=05fe748d2031&noignore=1 (Windows 7 Debug and Windows XP Debug are still failing, but that looks like a different issue, which we're tracking in bug 647126.)
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
The fix for bug 468736 was backed out because it might have caused a Tshutdown regression, so reopening this bug. Let's think about a workaround for this problem if that bug takes longer to fix.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
We hit something similar with the PGO profiling run (bug 420187), and we worked around it by simply waiting a bit longer before quitting the browser. You might try throwing in a 5-10 second delay to see if things shake out well enough to not cause the shutdown hang.
Created attachment 528641 [details] [diff] [review] establish minimum run time Here's a potential workaround that establishes a minimum run time in an attempt to work around the problem.
Assignee: nobody → myk
Attachment #528641 - Flags: review?(warner-bugzilla)
Comment on attachment 528641 [details] [diff] [review] establish minimum run time Looks ok to me. I despise hard-coded delays, and I'm sure that this isn't the last we've heard of this failure (sooner or later the tests will run on a system that's busy, or slow, and 12.321s won't be enough). And it's disappointing that this patch increases the overall 'cfx testall' time on my box from 90s to 113s (and makes a single-addon test-edit-repeat loop kinda painful). But if that's what we need to get this passing right now, I'm ok with that. r+
Attachment #528641 - Flags: review?(warner-bugzilla) → review+
I despise hard-coded delays too, and I'm not even sure 12.321s is enough in this case, but I don't have a better idea right now, and the underlying platform bug is proving harder than expected to fix (not only the actual fixing of it but also finding someone to do the work, as the last person working on it just recused himself). Checked in this workaround, but I'll wait to see results before actually marking this bug fixed: https://github.com/mozilla/addon-sdk/commit/dd60bc5727e9554027c3fd10aff8615da01ee03c
Windows debug builds are still failing pretty consistently, so it looks like the workaround was not effective (or the tests are failing for some other reason).
What code invokes the jetpack tests? Maybe they're just really slow on windows debug because of the trace-malloc stuff, and we should be running them with NS_TRACE_MALLOC_DISABLE_STACKS=1 in the environment (like we do in automation.py and runxpcshelltests.py)?
The initial cause of this was certainly bug 468736, but its resolution may have exposed additional issues. I'm game for seeing if disabling trace-malloc will help; it'd just require us to set that flag in runner.py after line 285: https://github.com/mozilla/addon-sdk/blob/master/python-lib/cuddlefish/runner.py#L285 I'll cook up a patch.
Created attachment 544650 [details] [diff] [review] disable trace-malloc, revert min run time Here's a patch that disables trace-malloc, as dbaron suggested, and reverts the earlier patch to establish a minimum run time, which didn't seem to help, has unwelcome side-effects, and is no longer necessary now that bug 468736 has been fixed.
Comment on attachment 544650 [details] [diff] [review] disable trace-malloc, revert min run time Looks good to me!
Attachment #544650 - Flags: review?(warner-bugzilla) → review+
https://github.com/mozilla/addon-sdk/commit/9642e4b85f24449baeecc7e17e9b88774cadc42c Leaving this open pending verification that debug builds are no longer hanging on shutdown.
Verified: the debug builds are no longer hanging on shutdown. Since the fix was to the SDK, I am moving this to the SDK product.
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago → 7 years ago
Component: Startup and Profile System → General
Product: Toolkit → Add-on SDK
QA Contact: startup → general
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.