Closed Bug 1701596 Opened 4 years ago Closed 3 years ago

TEST-UNEXPECTED-TIMEOUT pup | Launcher specs Puppeteer Puppeteer.launch userDataDir option (launcher.spec.ts) | expected PASS

Categories

(Remote Protocol :: CDP, defect)

defect

Tracking

(firefox-esr78 unaffected, firefox-esr91 wontfix, firefox88 wontfix, firefox89 wontfix, firefox90 wontfix, firefox91 wontfix, firefox94 wontfix, firefox95 wontfix, firefox96 fixed)

RESOLVED FIXED
96 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- wontfix
firefox88 --- wontfix
firefox89 --- wontfix
firefox90 --- wontfix
firefox91 --- wontfix
firefox94 --- wontfix
firefox95 --- wontfix
firefox96 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Regression)

Details

(Keywords: crash, regression)

Running the given Puppeteer test locally on MacOS always fails with a timeout because the shutdown hangs and the browser gets killed eventually by the terminator thread.

Steps to reproduce:

  1. Ensure you are on MacOS (haven't seen it on Linux yet)
  2. Modify it( to it.only( in launcher.spec.js
  3. Run the tests via mach puppeteer-test --subset -vv --setpref="toolkit.asyncshutdown.log=true"

The test will fail with the following output (I enabled async shutdown logs locally) while waiting for the shutdown of Firefox:

  0:32.51 pid:48141 console.warn: services.settings: main/cfr-fxa sync interrupted by shutdown
 0:32.62 pid:48141 console.warn: services.settings: main/cfr sync interrupted by shutdown
 0:32.80 pid:48141 JavaScript error: resource://gre/modules/DeferredTask.jsm, line 215: Error: Unable to arm timer, the object has been finalized.
 0:32.83 pid:48141 console.error: "Could not load engine ddg@search.mozilla.org: [Exception... \"AddonManager is not initialized\"  nsresult: \"0xc1f30001 (NS_ERROR_NOT_INITIALIZED)\"  location: \"JS frame :: resource://gre/modules/AddonManager.jsm :: callInstallListeners :: line 1530\"  data: no]"
 0:32.83 pid:48141 console.warn: SearchService: "_init: abandoning init due to shutting down"
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/ExtensionParent.jsm, line 1738: AbortError: IOUtils: Shutting down and refusing additional I/O tasks
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/AsyncShutdown.jsm, line 575: uncaught exception: 2147500036
 0:32.84 pid:48141 JavaScript error: resource://gre/modules/ExtensionParent.jsm, line 1738: AbortError: IOUtils: Shutting down and refusing additional I/O tasks
 0:42.81 pid:48141 JavaScript error: resource:///modules/EveryWindow.jsm, line 37: NS_ERROR_UNEXPECTED: Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIWindowMediator.getEnumerator]

After a while the hang monitor kills Firefox: bp-6d93a447-e64e-4887-a596-16f130210329

Here the excerpt from the crashing thread:

MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
Crash Reason 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS

0 	libsystem_platform.dylib 	_platform_strlen 		context
1 	XUL 	mozilla::CycleCollectedJSContext::AfterProcessTask(unsigned int) 	xpcom/base/CycleCollectedJSContext.cpp:462 	frame_pointer
2 	XUL 	XPCJSContext::AfterProcessTask(unsigned int) 	js/xpconnect/src/XPCJSContext.cpp:1462 	frame_pointer
3 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1192 	frame_pointer
4 	XUL 	NS_ProcessPendingEvents(nsIThread*, unsigned int) 	xpcom/threads/nsThreadUtils.cpp:496 	frame_pointer
5 	XUL 	mozilla::ShutdownXPCOM(nsIServiceManager*) 	xpcom/build/XPCOMInit.cpp:625 	frame_pointer
6 	XUL 	ScopedXPCOMStartup::~ScopedXPCOMStartup() 	toolkit/xre/nsAppRunner.cpp:1669 	frame_pointer
7 	XUL 	XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:5554 	frame_pointer
8 	XUL 	XRE_main(int, char**, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:5600 	frame_pointer
9 	firefox 	main 	browser/app/nsBrowserApp.cpp:347 	frame_pointer
10 	libdyld.dylib 	start 		frame_pointer

Nika, can you see what might go wrong here? Given that the test starts Firefox and immediately triggers a shutdown via Services.startup.quit(Ci.nsIAppStartup.eAttemptQuit) I assume that maybe some startup logic gets into a weird state?

Flags: needinfo?(nika)

Any chance you could attach a debugger and see where exactly the main thread is hanging? The crash report isn't giving very specific information.

redirecting to :whimboo to answer :mccr8's question :-)

Flags: needinfo?(nika) → needinfo?(hskupin)

Tried to poke at this today, a few additional details:

  • this was recently regressed by Bug 1697010, where we flipped back the toolkit.shutdown.fastShutdownStage pref to 1 on Nightly. This doesn't tell us much, the shutdown happens later because of that, so obviously whatever race condition we run into is impacted by this.
  • logged the list of pending tasks in AsyncShutdown when the hang occurs:
"ClientManagerService: start destroying IPC actors early",
"IOUtils Blocker (profile-before-change)",
"IOUtils Blocker (xpcom-will-shutdown)",
"ServiceWorkerRegistrar: Flushing data",
"MediaShutdownManager: shutdown",
"ServiceWorkerShutdownBlocker: shutting down Service Workers",
"Places Clients shutdown",
"Places Connection shutdown",
"ContentParent: id=128e12000",
[... omitted 10+ other ContentParent: id=someId items]

It seems really linked to some file IO, because if we modify the test to stop using a profile stored in a temporary folder, the hang no longer happens. Will try to attach a debugger tomorrow.

See Also: → 1697010

I was also curious to see why this didn't happen on the puppeteer CI, which runs the tests against Firefox on macos.
Turns out this doesn't happen with real Nightly builds.

Even locally, if you mach package on macos, you'll get 2 binaries:

  • dist/Nightly.app (around 400MB)
  • dist/firefox/Nightly.app (around 450MB)

The test works with dist/firefox/Nightly.app, but hangs with dist/Nightly.app.
That's why we don't see this on puppeteer CI for now.

(warning I've never debugged C++ so I might need help to get you the info you want)

:mccr8 I managed to attach a debugger (I used lldb).
I try to only attach after the hang occurs, basically I start the test, wait for firefox's window to disappear (and then some) and then attach.
But I end up in various frames, I tried ~10 times, I always seem to get different results.

I am adding some outputs for thread backtrace I got, let me know if I should do something different here:
Example 1:

* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x0000000115015781 XUL`js::CheckContextLocal::check(this=0x0000000108f35100) const at ProtectedData.cpp:48 [opt]
   45  	  MOZ_ASSERT(id == ThreadId::ThisThreadId());
   46  	}
   47  	
-> 48  	void CheckContextLocal::check() const {
   49  	  if (!cx_->isInitialized()) {
   50  	    return;
   51  	  }

backtrace:

* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000115015781 XUL`js::CheckContextLocal::check(this=0x0000000108f35100) const at ProtectedData.cpp:48 [opt]
    frame #1: 0x000000011575c4fb XUL`JSObject* js::gc::GCRuntime::tryNewTenuredThing<JSObject, (js::AllowGC)0>(JSContext*, js::gc::AllocKind, unsigned long) [inlined] js::ProtectedData<js::CheckContextLocal, js::gc::FreeLists*>::ref(this=<unavailable>) const at ProtectedData.h:149:13 [opt]
    frame #2: 0x000000011575c4e0 XUL`JSObject* js::gc::GCRuntime::tryNewTenuredThing<JSObject, (js::AllowGC)0>(JSContext*, js::gc::AllocKind, unsigned long) [inlined] js::ProtectedData<js::CheckContextLocal, js::gc::FreeLists*>::operator js::gc::FreeLists* const&(this=<unavailable>) const at ProtectedData.h:93 [opt]
    frame #3: 0x000000011575c4e0 XUL`JSObject* js::gc::GCRuntime::tryNewTenuredThing<JSObject, (js::AllowGC)0>(JSContext*, js::gc::AllocKind, unsigned long) [inlined] JSContext::freeLists(this=0x0000000108f35000) at JSContext.h:239 [opt]
    frame #4: 0x000000011575c4e0 XUL`JSObject* js::gc::GCRuntime::tryNewTenuredThing<JSObject, (js::AllowGC)0>(cx=0x0000000108f35000, kind=OBJECT2, thingSize=<unavailable>) at Allocator.cpp:352 [opt]
    frame #5: 0x00000001156f4abb XUL`JSObject* js::gc::GCRuntime::tryNewTenuredObject<(js::AllowGC)0>(cx=0x0000000108f35000, kind=OBJECT2, thingSize=40, nDynamicSlots=0) at Allocator.cpp:145:19 [opt]
    frame #6: 0x00000001151c4e22 XUL`js::NewObjectCache::newObjectFromHit(this=0x0000000108f2abf0, cx=0x0000000108f35000, entryIndex=<unavailable>, heap=128) at Caches-inl.h:64:50 [opt]
    frame #7: 0x00000001151c49bb XUL`js::NewObjectWithGivenTaggedProto(cx=<unavailable>, clasp=0x0000000119b2bc80, proto=<unavailable>, allocKind=<unavailable>, newKind=GenericObject, objectFlags=(flags_ = 0)) at JSObject.cpp:834:17 [opt]
    frame #8: 0x0000000115488a5f XUL`JS_NewObjectWithGivenProto(JSContext*, JSClass const*, JS::Handle<JSObject*>) [inlined] JSObject* js::NewObjectWithGivenTaggedProto<(js::NewObjectKind)0>(cx=0x0000000108f35000, clasp=<unavailable>, proto=Handle<js::TaggedProto> @ r14) at JSObject-inl.h:369:10 [opt]
    frame #9: 0x0000000115488a13 XUL`JS_NewObjectWithGivenProto(JSContext*, JSClass const*, JS::Handle<JSObject*>) [inlined] js::NewObjectWithGivenProto(cx=0x0000000108f35000, clasp=<unavailable>, proto=JS::HandleObject @ r14) at JSObject-inl.h:399 [opt]
    frame #10: 0x0000000115488a13 XUL`JS_NewObjectWithGivenProto(cx=0x0000000108f35000, clasp=0x0000000119b2bc80, proto=<unavailable>) at jsapi.cpp:1929 [opt]
    frame #11: 0x0000000110d198c2 XUL`mozilla::dom::BindingJSObjectCreator<mozilla::dom::IdleDeadline>::CreateObject(this=0x00007ffeebcddec8, aCx=<unavailable>, aClass=<unavailable>, aProto=<unavailable>, aNative=0x00000001415b75b0, aReflector=<unavailable>) at BindingUtils.h:2637:20 [opt]
    frame #12: 0x0000000110d19102 XUL`mozilla::dom::IdleDeadline_Binding::Wrap(aCx=0x0000000108f35000, aObject=0x00000001415b75b0, aCache=0x00000001415b75b0, aGivenProto=<unavailable>, aReflector=<unavailable>) at IdleDeadlineBinding.cpp:354:11 [opt]
    frame #13: 0x00000001109760e7 XUL`mozilla::dom::IdleDeadline::WrapObject(JSContext*, JS::Handle<JSObject*>) [inlined] JSObject* mozilla::dom::IdleDeadline_Binding::Wrap<mozilla::dom::IdleDeadline>(aCx=<unavailable>, aObject=<unavailable>, aGivenProto=<unavailable>) at IdleDeadlineBinding.h:43:12 [opt]
    frame #14: 0x00000001109760ad XUL`mozilla::dom::IdleDeadline::WrapObject(this=<unavailable>, aCx=<unavailable>, aGivenProto=<unavailable>) at IdleDeadline.cpp:47 [opt]
    frame #15: 0x0000000111862c36 XUL`bool mozilla::dom::binding_detail::DoGetOrCreateDOMReflector<mozilla::dom::IdleDeadline, (mozilla::dom::binding_detail::GetOrCreateReflectorWrapBehavior)0>(cx=0x0000000108f35000, value=0x00000001415b75b0, givenProto=<unavailable>, rval=<unavailable>) at BindingUtils.h:1046:18 [opt]
    frame #16: 0x000000011167d638 XUL`mozilla::dom::IdleRequestCallback::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::dom::IdleDeadline&, mozilla::ErrorResult&) [inlined] bool mozilla::dom::GetOrCreateDOMReflector<mozilla::dom::IdleDeadline>(cx=<unavailable>, value=0x00000001415b75b0, rval=<unavailable>, givenProto=<unavailable>) at BindingUtils.h:1112:10 [opt]
    frame #17: 0x000000011167d629 XUL`mozilla::dom::IdleRequestCallback::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::dom::IdleDeadline&, mozilla::ErrorResult&) [inlined] mozilla::dom::GetOrCreateDOMReflectorHelper<mozilla::dom::IdleDeadline, false>::GetOrCreate(cx=<unavailable>, value=0x00000001415b75b0, givenProto=<unavailable>, rval=<unavailable>) at BindingUtils.h:1691 [opt]
    frame #18: 0x000000011167d629 XUL`mozilla::dom::IdleRequestCallback::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::dom::IdleDeadline&, mozilla::ErrorResult&) [inlined] bool mozilla::dom::GetOrCreateDOMReflector<mozilla::dom::IdleDeadline>(cx=<unavailable>, value=0x00000001415b75b0, rval=<unavailable>, givenProto=<unavailable>) at BindingUtils.h:1699 [opt]
    frame #19: 0x000000011167d629 XUL`mozilla::dom::IdleRequestCallback::Call(this=0x00000001415b68c0, cx=0x00007ffeebcde338, aThisVal=Handle<JS::Value> @ 0x00007ffeebcde008, deadline=0x00000001415b75b0, aRv=0x00007ffeebcde3c0) at WindowBinding.cpp:865 [opt]
    frame #20: 0x000000011087c7df XUL`mozilla::dom::IdleRequestCallback::Call(this=0x00000001415b68c0, deadline=0x00000001415b75b0, aRv=0x00007ffeebcde3c0, aExecutionReason="ChromeUtils::IdleDispatch handler", aExceptionHandling=eReportExceptions, aRealm=0x0000000000000000) at WindowBinding.h:782:12 [opt]
    frame #21: 0x000000011087bff4 XUL`mozilla::dom::(anonymous namespace)::IdleDispatchRunnable::Run() [inlined] mozilla::dom::IdleRequestCallback::Call(this=0x00000001415b68c0, deadline=<unavailable>, aExecutionReason=<unavailable>) at WindowBinding.h:795:12 [opt]
    frame #22: 0x000000011087bfb7 XUL`mozilla::dom::(anonymous namespace)::IdleDispatchRunnable::Run(this=<unavailable>) at ChromeUtils.cpp:417 [opt]
    frame #23: 0x000000010f0cb276 XUL`mozilla::RunnableTask::Run(this=0x000000010e8b5e20) at TaskController.cpp:470:16 [opt]
    frame #24: 0x000000010f0c927d XUL`mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(this=0x00000001084cb000, aProofOfLock=<unavailable>) at TaskController.cpp:754:26 [opt]
    frame #25: 0x000000010f0c815a XUL`mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(this=0x00000001084cb000, aProofOfLock=0x00007ffeebcdeae8) at TaskController.cpp:632:15 [opt]
    frame #26: 0x000000010f0c8209 XUL`mozilla::TaskController::ProcessPendingMTTask(this=0x00000001084cb000, aMayWait=false) at TaskController.cpp:393:36 [opt]
    frame #27: 0x000000010f0cf0d7 XUL`mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_4>::Run() [inlined] mozilla::TaskController::InitializeInternal(this=<unavailable>)::$_4::operator()() const at TaskController.cpp:133:37 [opt]
    frame #28: 0x000000010f0cf0c4 XUL`mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_4>::Run(this=<unavailable>) at nsThreadUtils.h:534 [opt]
    frame #29: 0x000000010f0e103e XUL`nsThread::ProcessNextEvent(this=0x00000001042352e0, aMayWait=<unavailable>, aResult=0x00007ffeebcdec6f) at nsThread.cpp:1155:16 [opt]
    frame #30: 0x000000010f0de1f6 XUL`NS_ProcessPendingEvents(aThread=0x00000001042352e0, aTimeout=4294967295) at nsThreadUtils.cpp:496:19 [opt]
    frame #31: 0x000000010f12bcd2 XUL`mozilla::ShutdownXPCOM(aServMgr=0x000000010424e580) at XPCOMInit.cpp:625:5 [opt]
    frame #32: 0x0000000114db661c XUL`ScopedXPCOMStartup::~ScopedXPCOMStartup(this=0x000000010420e3c0) at nsAppRunner.cpp:1669:5 [opt]
    frame #33: 0x0000000114dc754f XUL`XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [inlined] ScopedXPCOMStartup::~ScopedXPCOMStartup(this=0x000000010420e3c0) at nsAppRunner.cpp:1651:43 [opt]
    frame #34: 0x0000000114dc7547 XUL`XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [inlined] mozilla::DefaultDelete<ScopedXPCOMStartup>::operator(aPtr=0x000000010420e3c0)(ScopedXPCOMStartup*) const at UniquePtr.h:463 [opt]
    frame #35: 0x0000000114dc7547 XUL`XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [inlined] mozilla::UniquePtr<ScopedXPCOMStartup, mozilla::DefaultDelete<ScopedXPCOMStartup> >::reset(this=<unavailable>, aPtr=0x0000000000000000) at UniquePtr.h:305 [opt]
    frame #36: 0x0000000114dc7534 XUL`XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [inlined] mozilla::UniquePtr<ScopedXPCOMStartup, mozilla::DefaultDelete<ScopedXPCOMStartup> >::operator=(this=<unavailable>, (null)=0x0000000000000000) at UniquePtr.h:275 [opt]
    frame #37: 0x0000000114dc7534 XUL`XREMain::XRE_main(this=0x00007ffeebcdee70, argc=<unavailable>, argv=<unavailable>, aConfig=<unavailable>) at nsAppRunner.cpp:5554 [opt]
    frame #38: 0x0000000114dc7ec2 XUL`XRE_main(argc=<unavailable>, argv=<unavailable>, aConfig=<unavailable>) at nsAppRunner.cpp:5600:21 [opt]
    frame #39: 0x0000000103f1df94 firefox`main [inlined] do_main(argc=<unavailable>, argv=0x00007ffeebcdf440, envp=0x00007ffeebcdf480) at nsBrowserApp.cpp:220:22 [opt]
    frame #40: 0x0000000103f1df04 firefox`main(argc=7, argv=<unavailable>, envp=0x00007ffeebcdf480) at nsBrowserApp.cpp:347 [opt]
    frame #41: 0x00007fff6aba93d5 libdyld.dylib`start + 1

Example 2:

* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x0000000110794ba0 XUL`nsContentUtils::IsSafeToRunScript() at nsContentUtils.cpp:5633 [opt]
   5630	  AddScriptRunner(runnable.forget());
   5631	}
   5632	
-> 5633	/* static */ bool nsContentUtils::IsSafeToRunScript() {
   5634	  MOZ_ASSERT(NS_IsMainThread(),
   5635	             "This static variable only makes sense on the main thread!");
   5636	  return sScriptBlockerCount == 0;

backtrace:

* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000110794ba0 XUL`nsContentUtils::IsSafeToRunScript() at nsContentUtils.cpp:5633 [opt]
    frame #1: 0x00000001131a63f4 XUL`mozilla::dom::AutoEntryScript::AutoEntryScript(nsIGlobalObject*, char const*, bool) [inlined] mozilla::dom::(anonymous namespace)::AssertIfNotSafeToRunScript() at AutoEntryScript.cpp:36:7 [opt]
    frame #2: 0x00000001131a63ef XUL`mozilla::dom::AutoEntryScript::AutoEntryScript(this=0x00007ffeebcdd0c0, aGlobalObject=0x000000010424dc10, aReason="XPCWrappedJS QueryInterface", aIsMainThread=true) at AutoEntryScript.cpp:84 [opt]
    frame #3: 0x000000010ff187bc XUL`nsXPCWrappedJS::DelegatedQueryInterface(this=0x00000001410eacf0, aIID=0x00007ffeebcdd380, aInstancePtr=0x00007ffeebcdd470) at XPCWrappedJSClass.cpp:339:19 [opt]
    frame #4: 0x000000010fedf13b XUL`XPCConvert::JSObject2NativeInterface(cx=0x0000000108f35000, dest=0x00007ffeebcdd470, src=<unavailable>, iid=<unavailable>, aOuter=0x0000000000000000, pErr=0x00007ffeebcdd35c) at XPCConvert.cpp:1073:26 [opt]
    frame #5: 0x000000010fedd8ef XUL`XPCConvert::JSData2Native(cx=0x0000000108f35000, d=0x00007ffeebcdd470, s=<unavailable>, type=<unavailable>, iid=<unavailable>, arrlen=<unavailable>, pErr=0x00007ffeebcdd35c) at XPCConvert.cpp:764:14 [opt]
    frame #6: 0x000000010ff3ceb5 XUL`CallMethodHelper::ConvertIndependentParam(this=0x00007ffeebcdd428, i=<unavailable>) at XPCWrappedNative.cpp:1545:8 [opt]
    frame #7: 0x000000010ff26c07 XUL`CallMethodHelper::Call() at XPCWrappedNative.cpp:1471:17 [opt]
    frame #8: 0x000000010ff26bb9 XUL`CallMethodHelper::Call(this=0x00007ffeebcdd428) at XPCWrappedNative.cpp:1168 [opt]
    frame #9: 0x000000010ff26a59 XUL`XPCWrappedNative::CallMethod(ccx=0x00007ffeebcdd5e0, mode=<unavailable>) at XPCWrappedNative.cpp:1142:23 [opt]
    frame #10: 0x000000010ff287e1 XUL`XPC_WN_CallMethod(cx=0x0000000108f35000, argc=<unavailable>, vp=<unavailable>) at XPCWrappedNativeJSOps.cpp:925:10 [opt]
    frame #11: 0x00001fdf64c8429b
    frame #12: 0x0000000118c6d348 XUL`emptyObjectSlotsHeaders + 8

Again I've not seen any consistent result so far, it's entirely possible I'm not doing this correctly.

Flags: needinfo?(continuation)

That's strange. Those stacks just look like we're running JS and don't tell me anything more specific. I'm not sure where to go from here.

Flags: needinfo?(continuation)

(In reply to Julian Descottes [:jdescottes] from comment #5)

Even locally, if you mach package on macos, you'll get 2 binaries:

  • dist/Nightly.app (around 400MB)
  • dist/firefox/Nightly.app (around 450MB)

The test works with dist/firefox/Nightly.app, but hangs with dist/Nightly.app.

And dist/firefox/Nightly.app is used for distribution / download? Do both applications have the same value for toolkit.shutdown.fastShutdownStage set? Also I wonder why we generate two app bundles at all?

Andrew, do you know?

Note that when debugging Firefox and using an actual debug build the value of the pref will be 0 and not 1:
https://searchfox.org/mozilla-central/rev/d7e344e956d9da2808ea33e1fe0f963ed10c142d/modules/libpref/init/StaticPrefList.yaml#10307-10316

Flags: needinfo?(hskupin) → needinfo?(ahal)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #8)

Do both applications have the same value for toolkit.shutdown.fastShutdownStage set?

Forgot to mention it, but yes both apps have the same pref (using a clean dedicated profile each time)

When running the test in isolation (it.only + --subset), adding a delay before closing the browser allows the test to pass. But I still get a failure when running the whole test file.

And again, if I replace the .app file by the one created by mach package, the test works fine.

From a conversation with :whimboo on Elements, in that test, in this test, the browser is closed by :

  close() {
    Services.startup.quit(Ci.nsIAppStartup.eAttemptQuit);
  }

https://searchfox.org/mozilla-central/rev/d7e344e956d9da2808ea33e1fe0f963ed10c142d/remote/cdp/domains/parent/Browser.jsm#37

which is called from

  close(): Promise<void> {
    if (this._closed) return Promise.resolve();
    if (this._tempDirectory) {
      this.kill();
    } else if (this.connection) {
      // Attempt to close the browser gracefully
      this.connection.send('Browser.close').catch((error) => {
        debugError(error);
        this.kill();
      });
    }
    // Cleanup this listener last, as that makes sure the full callback runs. If we
    // perform this earlier, then the previous function calls would not happen.
    helper.removeEventListeners(this._listeners);
    return this._processClosing;
  }

https://searchfox.org/mozilla-central/rev/d7e344e956d9da2808ea33e1fe0f963ed10c142d/remote/test/puppeteer/src/node/BrowserRunner.ts#135-138

Worth noting, we don't use the if (this._tempDirectory) branch here, because the userDataDir option set in this test results in passing the temporary folder as a --profile argument. So even though it's a temporary directory, it's not the "generic" _tempDirectory expected by the harness.

tldr; we involve the Browser domain's close implementation here.

I tried switching the implementation from Ci.nsIAppStartup.eAttemptQuit to Ci.nsIAppStartup.eForceQuit, but we still get the same hang.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #8)

Also I wonder why we generate two app bundles at all?

Andrew, do you know?

Nope, news to me!

Flags: needinfo?(ahal)

Mike could you please have a look at comment 5? Any idea why the test fails only in one case? Is there some kind of issue with packaging?

Flags: needinfo?(mh+mozilla)

Is it related to things like the addon manager not being able to load non-packed addons (which is the case in the non-packaged firefox)?

Flags: needinfo?(mh+mozilla)

For Puppeteer tests we don't have to use any specific add-ons. The functionality is all baked into the Remote Agent component which is part of the Remote Protocol that partially implements the Chrome DevTools Protocol.

Maybe there is a problem with distributed add-ons? But why doesn't that appear for other jobs on try and local. The difference here might be that these specific tests are quickly closing Firefox after startup.

Actually what I forgot... in comment 4 Julian noticed that it was regressed by bug 1697010 where toolkit.shutdown.fastShutdownStage was set to 1 in Nightly. Looks like that Doug might be a better person to ask here.

Flags: needinfo?(dothayer)
Keywords: regression
Regressed by: 1697010

So bug 1697010 actually reverts a change to match beta/release, which is important to catch late shutdown crashes. Seems like we exactly have such a situation here.

See Also: → 1713030

I noticed that these specific Puppeteer unit tests do not use a Puppeteer prepared profile, which might be the reason here. I'm going to work on a fix over on bug 1713030.

Nevertheless the shutdown hang is something else that might be worth investigating.

Depends on: 1713030
Flags: needinfo?(dothayer)
See Also: 1713030

With the fix on bug 1713030 this should no longer be a problem. It's still interesting that launching Firefox via Puppeteer without having any preferences set, is causing this strange behavior. But given that we never got a reply in the last months I'm going to just close this bug and call it as fixed.

Assignee: nobody → hskupin
Status: NEW → RESOLVED
Closed: 3 years ago
Component: XPCOM → CDP
Product: Core → Remote Protocol
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch
You need to log in before you can comment on or make changes to this bug.