Closed Bug 1754104 Opened 2 years ago Closed 2 months ago

Intermittent Assertion failure: false (Late preference writes should be avoided.), at /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:1597

Categories

(Core :: Preferences: Backend, defect, P5)

defect

Tracking

()

RESOLVED INCOMPLETE
Performance Impact low

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 2 open bugs)

Details

(Keywords: assertion, intermittent-failure)

Attachments

(1 obsolete file)

Filed by: nfay [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=367007673&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Jiq4l40pRPGsw8folGyEBQ/runs/0/artifacts/public/logs/live_backing.log


[task 2022-02-07T22:27:25.415Z] 22:27:25     INFO - TEST-START | browser/base/content/test/plugins/browser_private_browsing_eme_persistent_state.js
<...>
[task 2022-02-07T22:27:33.262Z] 22:27:33     INFO - GECKO(8376) | [Parent 5248, IPDL Background] WARNING: sdb: '0.031000s: completed', file /builds/worker/checkouts/gecko/dom/quota/ActorsParent.cpp:3789
[task 2022-02-07T22:27:33.262Z] 22:27:33     INFO - GECKO(8376) | [Parent 5248, IPDL Background] WARNING: ls: '0.031000s: completed', file /builds/worker/checkouts/gecko/dom/quota/ActorsParent.cpp:3789
[task 2022-02-07T22:27:33.547Z] 22:27:33     INFO - GECKO(8376) | ###!!! [Parent][PGPUParent] Error: RunMessage(msgname=PGPU::Msg_ShutdownVR) Channel closing: too late to send/recv, messages will be lost
[task 2022-02-07T22:27:33.551Z] 22:27:33     INFO - GECKO(8376) | [GPU 6720, Main Thread] WARNING: IPC message 'PGPU::Msg_FOGData' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:499
[task 2022-02-07T22:27:33.552Z] 22:27:33     INFO - GECKO(8376) | [RDD 8776, Main Thread] WARNING: IPC message 'PRDD::Msg_FOGData' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:499
[task 2022-02-07T22:27:33.558Z] 22:27:33     INFO - GECKO(8376) | [RDD 8776, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:4457
[task 2022-02-07T22:27:33.567Z] 22:27:33     INFO - GECKO(8376) | Destroying context 0000017DD8387000 surface 0000017DD666B430 on display 0000017DD831E600
[task 2022-02-07T22:27:33.587Z] 22:27:33     INFO - GECKO(8376) | [GPU 6720, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:4457
[task 2022-02-07T22:27:33.621Z] 22:27:33     INFO - GECKO(8376) | pref_SetPref: Attempt to write pref media.gmp-manager.buildID after XPCOMShutdownThreads started.
[task 2022-02-07T22:27:33.622Z] 22:27:33     INFO - GECKO(8376) | 0 setString(aKey = ""media.gmp-manager.buildID"", aVal = ""20220207215603"") ["resource://gre/modules/GMPUtils.jsm":219:19]
[task 2022-02-07T22:27:33.622Z] 22:27:33     INFO - GECKO(8376) |     this = [object Object]
[task 2022-02-07T22:27:33.623Z] 22:27:33     INFO - GECKO(8376) | 1 _versionchangeOccurred() ["resource://gre/modules/GMPInstallManager.jsm":352:13]
[task 2022-02-07T22:27:33.623Z] 22:27:33     INFO - GECKO(8376) |     this = [object Object]
[task 2022-02-07T22:27:33.624Z] 22:27:33     INFO - GECKO(8376) | 2 simpleCheckAndInstall() ["resource://gre/modules/GMPInstallManager.jsm":365:13]
[task 2022-02-07T22:27:33.625Z] 22:27:33     INFO - GECKO(8376) |     this = [object Object]
[task 2022-02-07T22:27:33.625Z] 22:27:33     INFO - GECKO(8376) | 3 checkForUpdates/<() ["resource://gre/modules/addons/GMPProvider.jsm":541:26]
[task 2022-02-07T22:27:33.626Z] 22:27:33     INFO - GECKO(8376) | 4 notify("[xpconnect wrapped nsITimer @ 0x26e54ff8ca0 (native @ 0x26e54f887a0)]") ["resource://gre/modules/Timer.jsm":62:16]
[task 2022-02-07T22:27:33.626Z] 22:27:33     INFO - GECKO(8376) |     this = [object Object]
[task 2022-02-07T22:27:33.627Z] 22:27:33     INFO - GECKO(8376) | Assertion failure: false (Late preference writes should be avoided.), at /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:1597
[task 2022-02-07T22:27:33.797Z] 22:27:33     INFO -  Initializing stack-fixing for the first stack frame, this may take a while...
[task 2022-02-07T22:27:43.246Z] 22:27:43     INFO - GECKO(8376) | #01: pref_SetPref(nsTString<char> const&, mozilla::PrefType, mozilla::PrefValueKind, PrefValue, bool, bool, bool) [modules/libpref/Preferences.cpp:1597]
[task 2022-02-07T22:27:43.259Z] 22:27:43     INFO - GECKO(8376) | #02: static mozilla::Preferences::SetCString(char const*, nsTSubstring<char> const&, mozilla::PrefValueKind) [modules/libpref/Preferences.cpp:4877]
[task 2022-02-07T22:27:43.260Z] 22:27:43     INFO - GECKO(8376) | #03: nsPrefBranch::SetCharPrefNoLengthCheck(char const*, nsTSubstring<char> const&) [modules/libpref/Preferences.cpp:2162]
[task 2022-02-07T22:27:43.260Z] 22:27:43     INFO - GECKO(8376) | #04: XRE_GetBootstrap [Z:\task_164426636746112\build\application\firefox\xul.dll + 0x9015f72]
[task 2022-02-07T22:27:43.261Z] 22:27:43     INFO - GECKO(8376) | #05: CallMethodHelper::Call() [js/xpconnect/src/XPCWrappedNative.cpp:1184]
[task 2022-02-07T22:27:43.261Z] 22:27:43     INFO - GECKO(8376) | #06: static XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) [js/xpconnect/src/XPCWrappedNative.cpp:1130]
[task 2022-02-07T22:27:43.262Z] 22:27:43     INFO - GECKO(8376) | #07: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) [js/xpconnect/src/XPCWrappedNativeJSOps.cpp:923]
[task 2022-02-07T22:27:43.263Z] 22:27:43     INFO - GECKO(8376) | #08: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) [js/src/vm/Interpreter.cpp:425]
[task 2022-02-07T22:27:43.263Z] 22:27:43     INFO - GECKO(8376) | #09: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [js/src/vm/Interpreter.cpp:512]
[task 2022-02-07T22:27:43.264Z] 22:27:43     INFO - GECKO(8376) | #10: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) [js/src/vm/Interpreter.cpp:572]
[task 2022-02-07T22:27:43.264Z] 22:27:43     INFO - GECKO(8376) | #11: Interpret(JSContext*, js::RunState&) [js/src/vm/Interpreter.cpp:3309]
[task 2022-02-07T22:27:43.264Z] 22:27:43     INFO - GECKO(8376) | #12: js::RunScript(JSContext*, js::RunState&) [js/src/vm/Interpreter.cpp:394]
[task 2022-02-07T22:27:43.265Z] 22:27:43     INFO - GECKO(8376) | #13: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [js/src/vm/Interpreter.cpp:544]
[task 2022-02-07T22:27:43.265Z] 22:27:43     INFO - GECKO(8376) | #14: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) [js/src/vm/Interpreter.cpp:572]
[task 2022-02-07T22:27:43.266Z] 22:27:43     INFO - GECKO(8376) | #15: js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) [js/src/vm/Interpreter.cpp:589]
[task 2022-02-07T22:27:43.266Z] 22:27:43     INFO - GECKO(8376) | #16: js::fun_apply(JSContext*, unsigned int, JS::Value*) [js/src/vm/JSFunction.cpp:1022]
[task 2022-02-07T22:27:43.267Z] 22:27:43     INFO - GECKO(8376) | #17: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) [js/src/vm/Interpreter.cpp:425]
[task 2022-02-07T22:27:43.267Z] 22:27:43     INFO - GECKO(8376) | #18: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [js/src/vm/Interpreter.cpp:512]
[task 2022-02-07T22:27:43.268Z] 22:27:43     INFO - GECKO(8376) | #19: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) [js/src/vm/Interpreter.cpp:572]
[task 2022-02-07T22:27:43.268Z] 22:27:43     INFO - GECKO(8376) | #20: Interpret(JSContext*, js::RunState&) [js/src/vm/Interpreter.cpp:3309]
[task 2022-02-07T22:27:43.269Z] 22:27:43     INFO - GECKO(8376) | #21: js::RunScript(JSContext*, js::RunState&) [js/src/vm/Interpreter.cpp:394]
[task 2022-02-07T22:27:43.269Z] 22:27:43     INFO - GECKO(8376) | #22: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [js/src/vm/Interpreter.cpp:544]
[task 2022-02-07T22:27:43.270Z] 22:27:43     INFO - GECKO(8376) | #23: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) [js/src/vm/Interpreter.cpp:572]
[task 2022-02-07T22:27:43.270Z] 22:27:43     INFO - GECKO(8376) | #24: js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) [js/src/vm/Interpreter.cpp:589]
[task 2022-02-07T22:27:43.271Z] 22:27:43     INFO - GECKO(8376) | #25: JS_CallFunctionValue(JSContext*, JS::Handle<JSObject *>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) [js/src/vm/CallAndConstruct.cpp:53]
[task 2022-02-07T22:27:43.271Z] 22:27:43     INFO - GECKO(8376) | #26: nsXPCWrappedJS::CallMethod(unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*) [js/xpconnect/src/XPCWrappedJSClass.cpp:995]
[task 2022-02-07T22:27:43.272Z] 22:27:43     INFO - GECKO(8376) | #27: PrepareAndDispatch(nsXPTCStubBase*, unsigned int, unsigned long long*, unsigned long long*, double*) [xpcom/reflect/xptcall/md/win32/xptcstubs_x86_64.cpp:168]
[task 2022-02-07T22:27:43.272Z] 22:27:43     INFO - GECKO(8376) | #28: XRE_GetBootstrap [Z:\task_164426636746112\build\application\firefox\xul.dll + 0x9015fc9]
[task 2022-02-07T22:27:43.272Z] 22:27:43     INFO - GECKO(8376) | #29: nsTimerImpl::Fire(int) [xpcom/threads/nsTimerImpl.cpp:659]
[task 2022-02-07T22:27:43.273Z] 22:27:43     INFO - GECKO(8376) | #30: nsTimerEvent::Run() [xpcom/threads/TimerThread.cpp:265]
[task 2022-02-07T22:27:43.273Z] 22:27:43     INFO - GECKO(8376) | #31: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:468]
[task 2022-02-07T22:27:43.274Z] 22:27:43     INFO - GECKO(8376) | #32: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex &> const&) [xpcom/threads/TaskController.cpp:770]
[task 2022-02-07T22:27:43.274Z] 22:27:43     INFO - GECKO(8376) | #33: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex &> const&) [xpcom/threads/TaskController.cpp:606]
[task 2022-02-07T22:27:43.275Z] 22:27:43     INFO - GECKO(8376) | #34: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:390]
[task 2022-02-07T22:27:43.275Z] 22:27:43     INFO - GECKO(8376) | #35: mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:127:7'>::Run() [xpcom/threads/nsThreadUtils.h:532]
[task 2022-02-07T22:27:43.276Z] 22:27:43     INFO - GECKO(8376) | #36: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1199]
[task 2022-02-07T22:27:43.276Z] 22:27:43     INFO - GECKO(8376) | #37: nsThread::Shutdown() [xpcom/threads/nsThread.cpp:885]
[task 2022-02-07T22:27:43.276Z] 22:27:43     INFO - GECKO(8376) | #38: mozilla::LazyIdleThread::ShutdownThread() [xpcom/threads/LazyIdleThread.cpp:275]
[task 2022-02-07T22:27:43.277Z] 22:27:43     INFO - GECKO(8376) | #39: mozilla::LazyIdleThread::Shutdown() [xpcom/threads/LazyIdleThread.cpp:483]
[task 2022-02-07T22:27:43.277Z] 22:27:43     INFO - GECKO(8376) | #40: mozilla::UntrustedModulesProcessor::Observe(nsISupports*, char const*, char16_t const*) [toolkit/xre/dllservices/UntrustedModulesProcessor.cpp:166]
[task 2022-02-07T22:27:43.278Z] 22:27:43     INFO - GECKO(8376) | #41: nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) [xpcom/ds/nsObserverList.cpp:69]
[task 2022-02-07T22:27:43.278Z] 22:27:43     INFO - GECKO(8376) | #42: nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [xpcom/ds/nsObserverService.cpp:295]
[task 2022-02-07T22:27:43.279Z] 22:27:43     INFO - GECKO(8376) | #43: mozilla::AdvanceShutdownPhaseInternal(mozilla::ShutdownPhase, bool, char16_t const*, nsCOMPtr<nsISupports> const&) [xpcom/base/AppShutdown.cpp:362]
[task 2022-02-07T22:27:43.279Z] 22:27:43     INFO - GECKO(8376) | #44: mozilla::ShutdownXPCOM(nsIServiceManager*) [xpcom/build/XPCOMInit.cpp:638]
[task 2022-02-07T22:27:43.280Z] 22:27:43     INFO - GECKO(8376) | #45: ScopedXPCOMStartup::~ScopedXPCOMStartup() [toolkit/xre/nsAppRunner.cpp:2037]
[task 2022-02-07T22:27:43.280Z] 22:27:43     INFO - GECKO(8376) | #46: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5921]
[task 2022-02-07T22:27:43.281Z] 22:27:43     INFO - GECKO(8376) | #47: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:5963]
[task 2022-02-07T22:27:43.325Z] 22:27:43     INFO - GECKO(8376) | #48: NS_internal_main(int, char**, char**) [browser/app/nsBrowserApp.cpp:395]
[task 2022-02-07T22:27:43.326Z] 22:27:43     INFO - GECKO(8376) | #49: wmain(int, wchar_t**) [toolkit/xre/nsWindowsWMain.cpp:147]
[task 2022-02-07T22:27:43.326Z] 22:27:43     INFO - GECKO(8376) | #50: __scrt_common_main_seh() [/builds/worker/workspace/obj-build/browser/app/d:/agent/_work/1/s/src/vctools/crt/vcstartup/src/startup/exe_common.inl:288]
[task 2022-02-07T22:27:43.329Z] 22:27:43     INFO -  fix-stacks: error: failed to read debug info file `wkernel32.pdb` for `C:\Windows\System32\KERNEL32.DLL`
[task 2022-02-07T22:27:43.329Z] 22:27:43     INFO -  fix-stacks: note: this is expected and harmless for all PDB files on opt automation runs
[task 2022-02-07T22:27:43.330Z] 22:27:43     INFO -  fix-stacks: The system cannot find the file specified. (os error 2)
[task 2022-02-07T22:27:43.330Z] 22:27:43     INFO - GECKO(8376) | #51: BaseThreadInitThunk [C:\Windows\System32\KERNEL32.DLL + 0x17034]
[task 2022-02-07T22:27:43.342Z] 22:27:43     INFO -  fix-stacks: error: failed to read debug info file `wntdll.pdb` for `C:\Windows\SYSTEM32\ntdll.dll`
[task 2022-02-07T22:27:43.342Z] 22:27:43     INFO -  fix-stacks: note: this is expected and harmless for all PDB files on opt automation runs
[task 2022-02-07T22:27:43.342Z] 22:27:43     INFO -  fix-stacks: The system cannot find the file specified. (os error 2)
[task 2022-02-07T22:27:43.343Z] 22:27:43     INFO - GECKO(8376) | #52: RtlUserThreadStart [C:\Windows\SYSTEM32\ntdll.dll + 0x52651]
[task 2022-02-07T22:27:43.343Z] 22:27:43     INFO - TEST-INFO | Main app process: exit 1
[task 2022-02-07T22:27:43.343Z] 22:27:43     INFO - TEST-INFO | Confirming we saw 38 DOCSHELL created and 38 destroyed log strings.
[task 2022-02-07T22:27:43.344Z] 22:27:43     INFO - TEST-INFO | Confirming we saw 104 DOMWINDOW created and 94 destroyed log strings.
[task 2022-02-07T22:27:43.344Z] 22:27:43     INFO - TEST-INFO | browser/base/content/test/plugins/browser_bug797677.js | This test created 1 hidden docshell(s)
[task 2022-02-07T22:27:43.345Z] 22:27:43     INFO - TEST-INFO | browser/base/content/test/plugins/browser_enable_DRM_prompt.js | This test created 1 hidden docshell(s)
[task 2022-02-07T22:27:43.345Z] 22:27:43     INFO - TEST-INFO | browser/base/content/test/plugins/browser_globalplugin_crashinfobar.js | This test created 1 hidden docshell(s)
[task 2022-02-07T22:27:43.345Z] 22:27:43     INFO - TEST-INFO | browser/base/content/test/plugins/browser_private_browsing_eme_persistent_state.js | This test created 1 hidden docshell(s)
[task 2022-02-07T22:27:43.346Z] 22:27:43     INFO - Buffered messages finished
[task 2022-02-07T22:27:43.346Z] 22:27:43    ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 1
[task 2022-02-07T22:27:43.346Z] 22:27:43     INFO - runtests.py | Application ran for: 0:00:26.296695
[task 2022-02-07T22:27:43.347Z] 22:27:43     INFO - zombiecheck | Reading PID log: C:\Users\task_164426636746112\AppData\Local\Temp\tmp0lq2ehzcpidlog
[task 2022-02-07T22:27:43.348Z] 22:27:43     INFO - ==> process 5248 launched child process 6720 ("Z:\task_164426636746112\build\application\firefox\firefox.exe" -contentproc --channel="5248.0.25791512\895587168" -parentBuildID 20220207215603 -prefsHandle 2136 -prefMapHandle 2128 -prefsLen 1 -prefMapSize 259591 -appDir "Z:\task_164426636746112\build\application\firefox\browser" - 5248 "\\.\pipe\gecko-crash-server-pipe.5248" 2192 26e3d884880 gpu)

Component: Plug-ins → Audio/Video: GMP
See Also: → 1754168

pref_SetPref: Attempt to write pref media.gmp-manager.buildID after XPCOMShutdownThreads started..

Bryce, do you know what's up with that? Maybe we should just register an xpcom shutdown observer, and do not check for update when xpcom shutdown has started? It feels like the right thing to do, there's good chance the update won't succeed if the process are being closed.

Flags: needinfo?(bvandyk)

In alternative to an observer you could just check AppShutdown::IsInOrBeyond(XPCOMShutdownThreads) or any earlier phase you deem suitable before checking for updates.

See Also: → 1753932

Thanks for NI and the suggestions. I'll take a look when I have a moment. This may just be something we want to silence, but it's probably also worth check that there's no bugs we could introduce via shutdown racing prefs being set (e.g. if we try set a GMP pref, but it doesn't happen due to shutdown, we need to make sure it happens next time the browser is opened, otherwise we could end up with inconsistent prefs).

Looking at the code this is happening here in toolkit. I don't have as much experience with toolkit (though that code is also part of the GMP code I've inherited), but an observer seems workable. AppShutdown doesn't seem to be exposed to toolkit, but let me know if that's not the case.

That said, the window for this particular failure seems quite narrow (shutting down just after we schedule an attempted Widevine udpate check). We can guard against this, but I'm concerned this is a symptom of some other issue. It looks like there's a spike of these along with some other GMP issues, so I'm going to take some time to ponder.

(In reply to Bryce Seager van Dyk (:bryce) from comment #4)

AppShutdown doesn't seem to be exposed to toolkit, but let me know if that's not the case.

FWIW, nsIAppStartup exposes isInOrBeyondShutdownPhase to JS and I assume it can be used from toolkit, too.

Looking at the JS stack we print out now right before the assertion:

[task 2022-02-07T22:27:33.621Z] 22:27:33     INFO - GECKO(8376) | pref_SetPref: Attempt to write pref media.gmp-manager.buildID after XPCOMShutdownThreads started.
[task 2022-02-07T22:27:33.622Z] 22:27:33     INFO - GECKO(8376) | 0 setString(aKey = ""media.gmp-manager.buildID"", aVal = ""20220207215603"") ["resource://gre/modules/GMPUtils.jsm":219:19]
[task 2022-02-07T22:27:33.622Z] 22:27:33     INFO - GECKO(8376) |     this = [object Object]
[task 2022-02-07T22:27:33.623Z] 22:27:33     INFO - GECKO(8376) | 1 _versionchangeOccurred() ["resource://gre/modules/GMPInstallManager.jsm":352:13]
[task 2022-02-07T22:27:33.623Z] 22:27:33     INFO - GECKO(8376) |     this = [object Object]
[task 2022-02-07T22:27:33.624Z] 22:27:33     INFO - GECKO(8376) | 2 simpleCheckAndInstall() ["resource://gre/modules/GMPInstallManager.jsm":365:13]
[task 2022-02-07T22:27:33.625Z] 22:27:33     INFO - GECKO(8376) |     this = [object Object]
[task 2022-02-07T22:27:33.625Z] 22:27:33     INFO - GECKO(8376) | 3 checkForUpdates/<() ["resource://gre/modules/addons/GMPProvider.jsm":541:26]
[task 2022-02-07T22:27:33.626Z] 22:27:33     INFO - GECKO(8376) | 4 notify("[xpconnect wrapped nsITimer @ 0x26e54ff8ca0 (native @ 0x26e54f887a0)]") ["resource://gre/modules/Timer.jsm":62:16]
[task 2022-02-07T22:27:33.626Z] 22:27:33     INFO - GECKO(8376) |     this = [object Object]

IIUC we come from this timer which can be started with GMP_CHECK_DELAY == 10s. I think it is reasonable to assume that the callback needs to check if it is still desirable to do anything once the timer fires. Or we need to cancel that timer in time (by an observer).

(In reply to Jens Stutte [:jstutte] from comment #5)

(In reply to Bryce Seager van Dyk (:bryce) from comment #4)

AppShutdown doesn't seem to be exposed to toolkit, but let me know if that's not the case.

FWIW, nsIAppStartup exposes isInOrBeyondShutdownPhase to JS and I assume it can be used from toolkit, too.

Thanks!

IIUC we come from this timer which can be started with GMP_CHECK_DELAY == 10s. I think it is reasonable to assume that the callback needs to check if it is still desirable to do anything once the timer fires. Or we need to cancel that timer in time (by an observer).

That sounds about right. I believe we have some other entry points into this code. I believe the updates will also kick off some asyc tasks, after which we can also set prefs. So if we're worried about this, it may make sense to block shutdown across the whole update and remove the blocker once everything is in a consistent state.

(In reply to Bryce Seager van Dyk (:bryce) from comment #7)

(In reply to Jens Stutte [:jstutte] from comment #5)
That sounds about right. I believe we have some other entry points into this code. I believe the updates will also kick off some asyc tasks, after which we can also set prefs. So if we're worried about this, it may make sense to block shutdown across the whole update and remove the blocker once everything is in a consistent state.

Not sure about the other entry points, but if they are more immediate than a 10sec timer I think they are less worrying. In any case I would probably not recommend to rely on a deterministic behavior of the browser during XPCOMShutdownThreads (as threads are going down in a more or less random order and dispatches might or might not succeed) and IIUC we do allow shutdown blockers only until phase XPCOMWillShutdown anyway. And still the timer should not hold a blocker during its lifetime, IMHO, it should be totally fine to have a shutdown while that timer is in flight such that when it fires we can already be in any phase of shutdown. So I assume we should have some check if we are beyond the critical point (XPCOMWillShutdown) when the timer fires and can then add a blocker for all the entry points on that same phase when the update really starts.

FWIW, if we're updating this anyway, another useful improvement here would be to schedule the GMP check as a cpu-idle task (ie using ChromeUtils.idleDispatch) rather than hardcoding the 10s timeout. That will normally lead to it being run sooner, without overloading the CPU. Either way we'd need to check for shutdown state before writing prefs and/or doing too much work, though.

FWIW in this instance: https://treeherder.mozilla.org/logviewer?job_id=376754943&repo=autoland&lineNumber=13828 I see that the late import of osfile.jsm wants to add a shutdown blocker when we are already beyond the indicated phase.

It is not clear to me where this all starts, though.

(In reply to Jens Stutte [:jstutte] from comment #23)

FWIW in this instance: https://treeherder.mozilla.org/logviewer?job_id=376754943&repo=autoland&lineNumber=13828 I see that the late import of osfile.jsm wants to add a shutdown blocker when we are already beyond the indicated phase.

It is not clear to me where this all starts, though.

It looks like telemetry has its own shutdown phase, profile-before-change-telemetry, and then does some async stuff inside that, which depends on being able to do IO. But both OS.File and IOUtils finish after profile-before-change, which AFAICT is fired earlier, and telemetry depends on those utilities to do file IO. That seems bad. Barret/Dexter, I... don't understand how this "ever" works, ie why don't we hit this all the time and/or break stuff all the time as a result of this mismatch?

(Also including Doug as I'm pretty sure this is relevant to shutdown perf interests.)

Flags: needinfo?(brennie)
Flags: needinfo?(alessio.placitelli)
Performance Impact: --- → ?

(In reply to :Gijs (he/him) from comment #24)

It looks like telemetry has its own shutdown phase, profile-before-change-telemetry, and then does some async stuff inside that, which depends on being able to do IO. But both OS.File and IOUtils finish after profile-before-change, which AFAICT is fired earlier, and telemetry depends on those utilities to do file IO.

Yes, profile-before-change is fired earlier. It seems clear from the log that we are at least beyond profile-before-change-qm (there is some quota manager shutdown noise), the next phase is profile-before-change-telemetry, IIUC.

That seems bad. Barret/Dexter, I... don't understand how this "ever" works, ie why don't we hit this all the time and/or break stuff all the time as a result of this mismatch?

I assume it becomes only a problem, if we want to actually initialize telemetry in the wrong moment. I could imagine we get into (or beyond) profile-before-change-telemetry (thus telemetry shuts down) but then something happens (a late promise, whatever) that wants to use telemetry and we try to initialize it again. I assume most of the times the shutdown order prevents Telemetry from being caught uninitialized during shutdown. Still this feels like a clear misalignment between OS.File and Telemetry that might also cause other side effects. I was not able to figure out from the log what actually wants to use Telemetry here, though, but my JS stack reading skills are limited.

(In reply to Jens Stutte [:jstutte] from comment #25)

(In reply to :Gijs (he/him) from comment #24)

It looks like telemetry has its own shutdown phase, profile-before-change-telemetry, and then does some async stuff inside that, which depends on being able to do IO. But both OS.File and IOUtils finish after profile-before-change, which AFAICT is fired earlier, and telemetry depends on those utilities to do file IO.

Yes, profile-before-change is fired earlier. It seems clear from the log that we are at least beyond profile-before-change-qm (there is some quota manager shutdown noise), the next phase is profile-before-change-telemetry, IIUC.

That seems bad. Barret/Dexter, I... don't understand how this "ever" works, ie why don't we hit this all the time and/or break stuff all the time as a result of this mismatch?

I assume it becomes only a problem, if we want to actually initialize telemetry in the wrong moment. I could imagine we get into (or beyond) profile-before-change-telemetry (thus telemetry shuts down) but then something happens (a late promise, whatever) that wants to use telemetry and we try to initialize it again.

Ah, this is an interesting hypothesis, which indeed would sort of explain why this isn't always a problem, though...

I assume most of the times the shutdown order prevents Telemetry from being caught uninitialized during shutdown. Still this feels like a clear misalignment between OS.File and Telemetry that might also cause other side effects

Right. And in particular, AIUI both OS.File and IOUtils will block profile-before-change, and so if that phase is complete they are definitely shut down. AFAICT telemetry shutdown adds a shutdown blocker for profile-before-change-telemetry here, which calls _cleanupOnShutdown, which calls TelemetrySession.shutdown which calls saveShutdownPings which calls submitExternalPing a few times, which calls promiseArchivePing, which calls saveArchivedPing which ends up doing (or attempting to do) IO, which is (AFAIK) always going to fail.

I think this would just happen every time we shut down (as in, I don't see why it wouldn't) but I'm probably missing something...

I was not able to figure out from the log what actually wants to use Telemetry here, though, but my JS stack reading skills are limited.

AFAICT it is being invoked from the shutdown observer notification by AsyncShutdown:

[task 2022-05-04T02:13:13.124Z] 02:13:13     INFO - GECKO(6963) | WARNING: addBlocker@resource://gre/modules/AsyncShutdown.jsm:718:15
[task 2022-05-04T02:13:13.125Z] 02:13:13     INFO - GECKO(6963) | addBlocker@resource://gre/modules/AsyncShutdown.jsm:526:26
[task 2022-05-04T02:13:13.126Z] 02:13:13     INFO - GECKO(6963) | addBlocker@resource://gre/modules/AsyncShutdown.jsm:465:15
[task 2022-05-04T02:13:13.126Z] 02:13:13     INFO - GECKO(6963) | setupShutdown@resource://gre/modules/osfile/osfile_async_front.jsm:1564:28
[task 2022-05-04T02:13:13.129Z] 02:13:13     INFO - GECKO(6963) | @resource://gre/modules/osfile/osfile_async_front.jsm:1584:16
[task 2022-05-04T02:13:13.138Z] 02:13:13     INFO - GECKO(6963) | @resource://gre/modules/osfile.jsm:11:30
[task 2022-05-04T02:13:13.139Z] 02:13:13     INFO - GECKO(6963) | @resource://gre/modules/TelemetryStorage.jsm:18:28
[task 2022-05-04T02:13:13.140Z] 02:13:13     INFO - GECKO(6963) | promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:99:5
[task 2022-05-04T02:13:13.141Z] 02:13:13     INFO - GECKO(6963) | promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:59:33
[task 2022-05-04T02:13:13.142Z] 02:13:13     INFO - GECKO(6963) | _submitPingLogic@resource://gre/modules/TelemetryControllerParent.jsm:536:43
[task 2022-05-04T02:13:13.143Z] 02:13:13     INFO - GECKO(6963) | send@resource://gre/modules/TelemetryControllerParent.jsm:624:24
[task 2022-05-04T02:13:13.143Z] 02:13:13     INFO - GECKO(6963) | submitExternalPing@resource://gre/modules/TelemetryControllerParent.jsm:179:17
[task 2022-05-04T02:13:13.144Z] 02:13:13     INFO - GECKO(6963) | saveShutdownPings@resource://gre/modules/TelemetrySession.jsm:978:29
[task 2022-05-04T02:13:13.149Z] 02:13:13     INFO - GECKO(6963) | shutdownChromeProcess/cleanup/<@resource://gre/modules/TelemetrySession.jsm:1228:20
[task 2022-05-04T02:13:13.149Z] 02:13:13     INFO - GECKO(6963) | cleanup@resource://gre/modules/TelemetrySession.jsm:1235:9
[task 2022-05-04T02:13:13.149Z] 02:13:13     INFO - GECKO(6963) | shutdownChromeProcess@resource://gre/modules/TelemetrySession.jsm:1252:14
[task 2022-05-04T02:13:13.149Z] 02:13:13     INFO - GECKO(6963) | shutdown@resource://gre/modules/TelemetrySession.jsm:252:17
[task 2022-05-04T02:13:13.150Z] 02:13:13     INFO - GECKO(6963) | _cleanupOnShutdown@resource://gre/modules/TelemetryControllerParent.jsm:969:30
[task 2022-05-04T02:13:13.150Z] 02:13:13     INFO - GECKO(6963) | observe@resource://gre/modules/AsyncShutdown.jsm:575:16

Redirecting this to Chris, who recently also touched shutdown in telemetry.

Flags: needinfo?(alessio.placitelli) → needinfo?(chutten)

We have buckets of evidence that at-shutdown disk writes in Telemetry work (every ping sent by pingsender must first be read from disk and most "main" pings we receive are sent at shutdown (this is an old query, but it still holds)), so there must be an incorrect assumption in here somewhere: Telemetry can write to disk during its shutdown blockers.

Flags: needinfo?(chutten)

(In reply to :Gijs (he/him) from comment #24)

(In reply to Jens Stutte [:jstutte] from comment #23)

FWIW in this instance: https://treeherder.mozilla.org/logviewer?job_id=376754943&repo=autoland&lineNumber=13828 I see that the late import of osfile.jsm wants to add a shutdown blocker when we are already beyond the indicated phase.

It is not clear to me where this all starts, though.

It looks like telemetry has its own shutdown phase, profile-before-change-telemetry, and then does some async stuff inside that, which depends on being able to do IO. But both OS.File and IOUtils finish after profile-before-change, which AFAICT is fired earlier, and telemetry depends on those utilities to do file IO. That seems bad. Barret/Dexter, I... don't understand how this "ever" works, ie why don't we hit this all the time and/or break stuff all the time as a result of this mismatch?

(Also including Doug as I'm pretty sure this is relevant to shutdown perf interests.)

As it turns out, only IOUtils shuts down its event queue at profile-before-change. OS.File happily accepts IO up until xpcom-will-shutdown. That said, IOUtils will need to support this so that the telemetry code can be migrated.

Flags: needinfo?(brennie)

During performance triage we failed to determine a performance priority for this bug using the calculator. Perhaps there's an opportunity to improve the process. If you feel this is the wrong value for the performance flag, please provide this feedback. See https://wiki.mozilla.org/Performance/Triage for details of the triage process and a link to the calculator.

Performance Impact: ? → -

(In reply to Dave Hunt [:davehunt] [he/him] ⌚GMT from comment #30)

During performance triage we failed to determine a performance priority for this bug using the calculator. Perhaps there's an opportunity to improve the process. If you feel this is the wrong value for the performance flag, please provide this feedback. See https://wiki.mozilla.org/Performance/Triage for details of the triage process and a link to the calculator.

So I'd argue that late pref writes cause shutdown delays, and we should capture shutdown performance / async shutdown crashes etc. in the calculator. Would that make sense? :-)

Flags: needinfo?(dave.hunt)

(In reply to :Gijs (he/him) from comment #32)

So I'd argue that late pref writes cause shutdown delays, and we should capture shutdown performance / async shutdown crashes etc. in the calculator. Would that make sense? :-)

Thanks for the feedback Gijs. I've added this to the agenda for the next perf triage guidelines meeting and invited you in case you'd like to join the discussion. I'll leave the needinfo open for now.

This was discussed in the performance triage guidelines meeting and the priority calculator was updated to include shutdown impact.

The Performance Priority Calculator has determined this bug's performance priority to be P3.
Impact on browser UI: Causes noticeable shutdown delay (+2)
Affects macOS or Windows: Both (×4)

Performance Impact: - → P3
Flags: needinfo?(dave.hunt)

Looking at some of the latest failures here, I see in all 3 logs I looked at:

[task 2022-08-20T10:48:56.278Z] 10:48:56     INFO - GECKO(2784) | pref_SetPref: Attempt to write pref media.gmp-manager.buildID after XPCOMShutdownThreads started.
[task 2022-08-20T10:48:56.279Z] 10:48:56     INFO - GECKO(2784) | 0 setString(aKey = ""media.gmp-manager.buildID"", aVal = ""20220820101049"") ["resource://gre/modules/GMPUtils.jsm":208:19]
[task 2022-08-20T10:48:56.279Z] 10:48:56     INFO - GECKO(2784) |     this = [object Object]
[task 2022-08-20T10:48:56.280Z] 10:48:56     INFO - GECKO(2784) | 1 _versionchangeOccurred() ["resource://gre/modules/GMPInstallManager.jsm":441:13]
[task 2022-08-20T10:48:56.280Z] 10:48:56     INFO - GECKO(2784) |     this = [object Object]
[task 2022-08-20T10:48:56.281Z] 10:48:56     INFO - GECKO(2784) | 2 simpleCheckAndInstall() ["resource://gre/modules/GMPInstallManager.jsm":454:13]
[task 2022-08-20T10:48:56.281Z] 10:48:56     INFO - GECKO(2784) |     this = [object Object]
[task 2022-08-20T10:48:56.282Z] 10:48:56     INFO - GECKO(2784) | 3 checkForUpdates/<() ["resource://gre/modules/addons/GMPProvider.jsm":550:26]
[task 2022-08-20T10:48:56.282Z] 10:48:56     INFO - GECKO(2784) | 4 notify("[xpconnect wrapped nsITimer @ 0x2e052fa0 (native @ 0x2de71600)]") ["resource://gre/modules/Timer.jsm":62:16]
[task 2022-08-20T10:48:56.282Z] 10:48:56     INFO - GECKO(2784) |     this = [object Object]
[task 2022-08-20T10:48:56.283Z] 10:48:56     INFO - GECKO(2784) | Assertion failure: false (Late preference writes should be avoided.), at /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:1714

So what is happening here is that this BestEffortUserIdleTask timer fires after XPCOMShutdownThreads. Those tasks are meant to run just once per session and there is no harm in not running them at all. Indeed we do check for shutdown when executing those tasks, but IIUC this check is just mapped to the nsAppStartup::mShuttingDown variable and there might be code paths that do not set it during testing? Probably we want to get rid of that flag in favor of an appropriate AppShutdown::IsInOrBeyond check?

(In reply to Jens Stutte [:jstutte] from comment #49)

...there might be code paths that do not set it during testing?

Maybe bug 336193 could be related here?

(In reply to Jens Stutte [:jstutte] from comment #50)

(In reply to Jens Stutte [:jstutte] from comment #49)

...there might be code paths that do not set it during testing?

Maybe bug 336193 could be related here?

See bug 1791350 where we had a similar problem. I filed bug 1791350.

See Also: → 1791350
Assignee: nobody → jstutte
Status: NEW → ASSIGNED

Comment on attachment 9295246 [details]
Bug 1754104 - Replace C++ uses of Services.startup.shuttingDown with appropriate AppShutdown::IsInOrBeyond. r?florian

Revision D157628 was moved to bug 1791350. Setting attachment 9295246 [details] to obsolete.

Attachment #9295246 - Attachment is obsolete: true

I mixed up the bug numbers, sorry for the noise...

Assignee: jstutte → nobody
Status: ASSIGNED → NEW

(In reply to Intermittent Failures Robot from comment #65)

For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1754104&startday=2022-10-31&endday=2022-11-06&tree=all

Looking through this batch of failures I see three different preferences:

  • media.gmp-manager.buildID - it seems, we can end up with late attempts to update due to this timer. Probably simpleCheckAndInstall should do some appropriate shutdown check?
  • fission.experiment.max-origins.last-disqualified - this pref name sounds a bit like it might be outdated, anyways?
  • extensions.pendingOperations - asyncLoadDB probably should check for shutdown?
Depends on: 1799418
Depends on: 1799420
Depends on: 1799421
Depends on: 1805139
Depends on: 1823377

This intermittent bug just collects different cases, thus there is no better component for it. AFAICS.

Component: Audio/Video: GMP → Preferences: Backend

(In reply to Intermittent Failures Robot from comment #89)

6 failures in 3206 pushes (0.002 failures/push) were associated with this bug in the last 7 days.

All remaining instances of this batch are bug 1799418.

Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Depends on: 1851429
Status: REOPENED → RESOLVED
Closed: 11 months ago6 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 6 months ago2 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: