Closed Bug 1371248 Opened 7 years ago Closed 7 years ago

Crash in shutdownhang | NtWaitForAlertByThreadId | RtlSleepConditionVariableSRW | SleepConditionVariableSRW

Categories

(Core :: XPConnect, defect, P1)

55 Branch
x86_64
Windows 10
defect

Tracking

()

VERIFIED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 + verified
firefox56 --- verified

People

(Reporter: vtamas, Assigned: kmag)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-07aa1d6b-dbf8-4a07-a751-4619d0170608.
=============================================================

- STR:
1.Launch Firefox with a clean profile.
2.Install a webextension.
3.Spend some time testing it.
4.Navigate to about:addons and remove the webextension.
5.Restart the browser.

- Ran into this on Windows 8.1 32-bit, using Firefox 55.0a1 (2017-06-07/08) on Windows 10 64-bit
- This issue might not reproduce for each restart, but I encountered the crash in about 90% of the cases.

- Crash reports:
bp-19ba0742-1912-4309-8ddd-3592d0170608
bp-69c4e329-01a8-4ab3-a0fc-b29d00170608
bp-9e447097-2b84-49ad-9385-266950170608
bp-7a41dc78-be4f-4037-a835-599f20170608
bp-9c96d7e3-0427-4dcc-a078-951c30170607
bp-f2a24256-57f0-4219-a95a-103a40170607
This one is connected to web extension. This is not a necko bug.
Component: Networking → Untriaged
Kris, looks like this is some sort of shutdown hang related to ScriptPreloader + WebExtensions.
Component: Untriaged → XPConnect
Flags: needinfo?(kmaglione+bmo)
Yeah, I'm pretty sure another bug was filed about this a few days ago. Apparently when the add-on manager triggers a startup cache flush during shutdown, we fail to short circuit the 10 second delay for writing a new, empty cache file, so shutdown times out.
Flags: needinfo?(kmaglione+bmo)
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #3)
> Yeah, I'm pretty sure another bug was filed about this a few days ago.

Should we just set this as the duplicate of that bug you mentioned above?

> Apparently when the add-on manager triggers a startup cache flush during
> shutdown, we fail to short circuit the 10 second delay for writing a new,
> empty cache file, so shutdown times out.
Flags: needinfo?(kmaglione+bmo)
I believe bug 1368701 is the one I was thinking of, which makes this more of a blocker than a dupe.
Blocks: 1368701
Flags: needinfo?(kmaglione+bmo)
Priority: -- → P1
Tracking for 55, this shows up in early crash data for 55b1.
I mostly run into this issue when I'm restarting Nightly 56 for an update.
After press Restart, Firefox close the windows and didn't show up again so I need to start the application again manually.
My crash report is:
https://crash-stats.mozilla.com/report/index/ac01ced9-72c5-433f-9cef-222940170628
Comment on attachment 8881967 [details]
Bug 1371248: Avoid hangs when preloader cache flush is triggered during shutdown.

https://reviewboard.mozilla.org/r/153036/#review158266

This seems okay, I have some concerns about the sync dispatch but I might be misunderstanding the flow. Also would it make sense to add a timeout to the `Wait` call in `Cleanup` to future proof shutdown hangs? Or would that just lead to other crashes?

::: js/xpconnect/loader/ScriptPreloader.cpp:510
(Diff revision 1)
>      }
>      return Ok();
>  }
>  
>  void
>  ScriptPreloader::PrepareCacheWrite()

Maybe rename this `PrepareCacheWriteInternal` just so it's clear you should only call it if you have the monitor (or maybe add the MonitorAutoLock as a `aProofOfLock` param).

::: js/xpconnect/loader/ScriptPreloader.cpp:518
(Diff revision 1)
>  
> +    mSaveMonitor.AssertCurrentThreadOwns();
> +
>      auto cleanup = MakeScopeExit([&] () {
>          if (mChildCache) {
> -            mChildCache->PrepareCacheWrite();
> +            mChildCache->PrepareCacheWriteTask();

...and then change `PrepareCacheWriteTask` to just `PrepareCacheWrite` indicating it's the main entry point.

::: js/xpconnect/loader/ScriptPreloader.cpp:594
(Diff revision 1)
>          MonitorAutoUnlock mau(mSaveMonitor);
>  
>          NS_DispatchToMainThread(
>            NewRunnableMethod("ScriptPreloader::PrepareCacheWrite",
>                              this,
> -                            &ScriptPreloader::PrepareCacheWrite),
> +                            &ScriptPreloader::PrepareCacheWriteTask),

This seems wrong, we grab the monitor above, then do a sync dispatch (which I assume means we block here), the task runs on the main thread and then tries to grab the non-reentrant monitor again which would be bad.
Attachment #8881967 - Flags: review?(erahm)
Comment on attachment 8881967 [details]
Bug 1371248: Avoid hangs when preloader cache flush is triggered during shutdown.

https://reviewboard.mozilla.org/r/153036/#review158266

I think it would probably be a good idea to assert that the write thread isn't blocked on sync dispatch, but I'm worried about just adding a timeout.

> This seems wrong, we grab the monitor above, then do a sync dispatch (which I assume means we block here), the task runs on the main thread and then tries to grab the non-reentrant monitor again which would be bad.

We release the monitor above, we don't grab it.
Once this lands can you request uplift? It's pretty high volume on beta (#2 topcrash).
Flags: needinfo?(kmaglione+bmo)
Assignee: nobody → kmaglione+bmo
Flags: needinfo?(kmaglione+bmo)
Comment on attachment 8881967 [details]
Bug 1371248: Avoid hangs when preloader cache flush is triggered during shutdown.

https://reviewboard.mozilla.org/r/153036/#review158302

Looks good, thanks for the updated verison.
Attachment #8881967 - Flags: review?(erahm) → review+
Pushed by maglione.k@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7e2f1f4713fc
Avoid hangs when preloader cache flush is triggered during shutdown. r=erahm
Backed out for asserting IsAcquired() && mOwningThread == PR_GetCurrentThread(), e.g. in browser_ext_browserAction_contextMenu.js:

https://hg.mozilla.org/integration/autoland/rev/4bd7d5752bd8ba40bf8fe742bb4e8a6313d9b779

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=7e2f1f4713fcf4ad54559f7f1fe6cacc30d32512&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=111566408&repo=autoland

[task 2017-07-03T21:31:20.649475Z] 21:31:20     INFO - GECKO(1097) | Assertion failure: IsAcquired() && mOwningThread == PR_GetCurrentThread(), at /home/worker/workspace/build/src/xpcom/threads/BlockingResourceBase.cpp:400
[task 2017-07-03T21:31:58.759204Z] 21:31:58     INFO - GECKO(1097) | #01: mozilla::ScriptPreloader::PrepareCacheWriteInternal [mfbt/GuardObjects.h:101]
[task 2017-07-03T21:31:58.762033Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.764128Z] 21:31:58     INFO - GECKO(1097) | #02: mozilla::ScriptPreloader::InvalidateCache [xpcom/base/nsCOMPtr.h:1195]
[task 2017-07-03T21:31:58.766131Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.767139Z] 21:31:58     INFO - GECKO(1097) | #03: mozilla::ScriptPreloader::Observe [js/xpconnect/loader/ScriptPreloader.cpp:341]
[task 2017-07-03T21:31:58.768923Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.770966Z] 21:31:58     INFO - GECKO(1097) | #04: nsObserverList::NotifyObservers [xpcom/ds/nsObserverList.cpp:112]
[task 2017-07-03T21:31:58.773831Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.782257Z] 21:31:58     INFO - GECKO(1097) | #05: nsObserverService::NotifyObservers [xpcom/ds/nsTHashtable.h:136]
[task 2017-07-03T21:31:58.783935Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.786017Z] 21:31:58     INFO - GECKO(1097) | #06: NS_InvokeByIndex
[task 2017-07-03T21:31:58.787616Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.791659Z] 21:31:58     INFO - GECKO(1097) | #07: CallMethodHelper::Call [js/xpconnect/src/xpcprivate.h:781]
[task 2017-07-03T21:31:58.793317Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.798231Z] 21:31:58     INFO - GECKO(1097) | #08: XPCWrappedNative::CallMethod [js/xpconnect/src/XPCWrappedNative.cpp:1282]
[task 2017-07-03T21:31:58.801834Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.803750Z] 21:31:58     INFO - GECKO(1097) | #09: XPC_WN_CallMethod [js/xpconnect/src/XPCWrappedNativeJSOps.cpp:966]
[task 2017-07-03T21:31:58.805372Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.807206Z] 21:31:58     INFO - GECKO(1097) | #10: js::CallJSNative [js/src/jscntxtinlines.h:293]
[task 2017-07-03T21:31:58.809835Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.811774Z] 21:31:58     INFO - GECKO(1097) | #11: js::InternalCallOrConstruct [js/src/vm/Interpreter.cpp:470]
[task 2017-07-03T21:31:58.821840Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.823741Z] 21:31:58     INFO - GECKO(1097) | #12: Interpret [js/src/vm/Interpreter.cpp:3060]
[task 2017-07-03T21:31:58.825371Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.827233Z] 21:31:58     INFO - GECKO(1097) | #13: js::RunScript [js/src/vm/Interpreter.cpp:381]
[task 2017-07-03T21:31:58.828883Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.830776Z] 21:31:58     INFO - GECKO(1097) | #14: js::InternalCallOrConstruct [js/src/vm/Interpreter.cpp:488]
[task 2017-07-03T21:31:58.832412Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.834310Z] 21:31:58     INFO - GECKO(1097) | #15: Interpret [js/src/vm/Interpreter.cpp:3060]
[task 2017-07-03T21:31:58.835977Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.838076Z] 21:31:58     INFO - GECKO(1097) | #16: js::RunScript [js/src/vm/Interpreter.cpp:381]
[task 2017-07-03T21:31:58.844810Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.849781Z] 21:31:58     INFO - GECKO(1097) | #17: js::InternalCallOrConstruct [js/src/vm/Interpreter.cpp:488]
[task 2017-07-03T21:31:58.851520Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.853379Z] 21:31:58     INFO - GECKO(1097) | #18: js::Call [js/src/vm/Interpreter.cpp:534]
[task 2017-07-03T21:31:58.855086Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.856964Z] 21:31:58     INFO - GECKO(1097) | #19: js::Wrapper::call [js/src/proxy/Wrapper.cpp:169]
[task 2017-07-03T21:31:58.858611Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.860537Z] 21:31:58     INFO - GECKO(1097) | #20: js::CrossCompartmentWrapper::call [js/src/proxy/CrossCompartmentWrapper.cpp:359]
[task 2017-07-03T21:31:58.865815Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.870066Z] 21:31:58     INFO - GECKO(1097) | #21: js::Proxy::call [js/src/proxy/Proxy.cpp:481]
[task 2017-07-03T21:31:58.871745Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.873658Z] 21:31:58     INFO - GECKO(1097) | #22: js::proxy_Call [js/src/proxy/Proxy.cpp:741]
[task 2017-07-03T21:31:58.875299Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.877171Z] 21:31:58     INFO - GECKO(1097) | #23: js::CallJSNative [js/src/jscntxtinlines.h:293]
[task 2017-07-03T21:31:58.881833Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.883770Z] 21:31:58     INFO - GECKO(1097) | #24: js::InternalCallOrConstruct [js/src/vm/Interpreter.cpp:452]
[task 2017-07-03T21:31:58.885514Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.889537Z] 21:31:58     INFO - GECKO(1097) | #25: Interpret [js/src/vm/Interpreter.cpp:3060]
[task 2017-07-03T21:31:58.891205Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.893241Z] 21:31:58     INFO - GECKO(1097) | #26: js::RunScript [js/src/vm/Interpreter.cpp:381]
[task 2017-07-03T21:31:58.895271Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.902120Z] 21:31:58     INFO - GECKO(1097) | #27: js::InternalCallOrConstruct [js/src/vm/Interpreter.cpp:488]
[task 2017-07-03T21:31:58.904149Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.906164Z] 21:31:58     INFO - GECKO(1097) | #28: js::Call [js/src/vm/Interpreter.cpp:534]
[task 2017-07-03T21:31:58.908142Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.910281Z] 21:31:58     INFO - GECKO(1097) | #29: js::Wrapper::call [js/src/proxy/Wrapper.cpp:169]
[task 2017-07-03T21:31:58.912032Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.914083Z] 21:31:58     INFO - GECKO(1097) | #30: js::CrossCompartmentWrapper::call [js/src/proxy/CrossCompartmentWrapper.cpp:359]
[task 2017-07-03T21:31:58.915906Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.917995Z] 21:31:58     INFO - GECKO(1097) | #31: xpc::AddonWrapper<js::CrossCompartmentWrapper>::call [js/xpconnect/wrappers/AddonWrapper.cpp:157]
[task 2017-07-03T21:31:58.919798Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.926190Z] 21:31:58     INFO - GECKO(1097) | #32: js::Proxy::call [js/src/proxy/Proxy.cpp:481]
[task 2017-07-03T21:31:58.928281Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.930402Z] 21:31:58     INFO - GECKO(1097) | #33: js::proxy_Call [js/src/proxy/Proxy.cpp:741]
[task 2017-07-03T21:31:58.932182Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.934267Z] 21:31:58     INFO - GECKO(1097) | #34: js::CallJSNative [js/src/jscntxtinlines.h:293]
[task 2017-07-03T21:31:58.936085Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.938067Z] 21:31:58     INFO - GECKO(1097) | #35: js::InternalCallOrConstruct [js/src/vm/Interpreter.cpp:452]
[task 2017-07-03T21:31:58.939835Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.942084Z] 21:31:58     INFO - GECKO(1097) | #36: js::jit::DoCallFallback [js/src/jit/BaselineIC.cpp:2552]
[task 2017-07-03T21:31:58.947285Z] 21:31:58     INFO - 
[task 2017-07-03T21:31:58.949253Z] 21:31:58     INFO - GECKO(1097) | #37: ??? (???:???)
[task 2017-07-03T21:31:58.951577Z] 21:31:58     INFO - GECKO(1097) | ExceptionHandler::GenerateDump cloned child 1165
[task 2017-07-03T21:31:58.953783Z] 21:31:58     INFO - GECKO(1097) | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2017-07-03T21:31:58.955722Z] 21:31:58     INFO - GECKO(1097) | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2017-07-03T21:31:58.961070Z] 21:31:58     INFO - TEST-INFO | Main app process: exit 11
[task 2017-07-03T21:31:58.962887Z] 21:31:58    ERROR - TEST-UNEXPECTED-FAIL | ShutdownLeaks | process() called before end of test suite
[task 2017-07-03T21:31:58.977548Z] 21:31:58     INFO - Buffered messages logged at 21:31:17
[task 2017-07-03T21:31:58.979038Z] 21:31:58     INFO - Entering test bound browseraction_popup_contextmenu
[task 2017-07-03T21:31:58.979328Z] 21:31:58     INFO - Extension loaded
[task 2017-07-03T21:31:58.981922Z] 21:31:58     INFO - Buffered messages logged at 21:31:18
[task 2017-07-03T21:31:58.983586Z] 21:31:58     INFO - Console message: 1499117478543	addons.webextension.59842a4e5d6621c34c30cc154e9046b78aac74fd@temporary-addon	WARN	Please specify whether you want browser_style or not in your browser_action options.
[task 2017-07-03T21:31:58.984255Z] 21:31:58     INFO - Buffered messages logged at 21:31:19
[task 2017-07-03T21:31:58.986677Z] 21:31:58     INFO - TEST-PASS | browser/components/extensions/test/browser/browser_ext_browserAction_contextMenu.js | Expect widget not to be overflowed - 
[task 2017-07-03T21:31:58.993658Z] 21:31:58     INFO - Buffered messages logged at 21:31:20
[task 2017-07-03T21:31:58.995559Z] 21:31:58     INFO - TEST-PASS | browser/components/extensions/test/browser/browser_ext_browserAction_contextMenu.js | contextMenu item for page was found - 
[task 2017-07-03T21:31:58.997289Z] 21:31:58     INFO - Buffered messages finished
[task 2017-07-03T21:31:58.999229Z] 21:31:58    ERROR - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/browser_ext_browserAction_contextMenu.js | application terminated with exit code 11
[task 2017-07-03T21:31:59.001157Z] 21:31:59     INFO - runtests.py | Application ran for: 0:01:23.226058
[task 2017-07-03T21:31:59.002929Z] 21:31:59     INFO - zombiecheck | Reading PID log: /tmp/tmpL1WUK7pidlog
[task 2017-07-03T21:31:59.004655Z] 21:31:59     INFO - ==> process 1097 launched child process 1115
[task 2017-07-03T21:31:59.006417Z] 21:31:59     INFO - zombiecheck | Checking for orphan process with PID: 1115
[task 2017-07-03T21:31:59.008322Z] 21:31:59     INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmplvQU_4.mozrunner/minidumps/0e0b9d4e-c4ae-92b3-96dc-fa4f1b686c6f.dmp /home/worker/workspace/build/symbols
[task 2017-07-03T21:32:18.477789Z] 21:32:18     INFO - mozcrash Saved minidump as /home/worker/workspace/build/blobber_upload_dir/0e0b9d4e-c4ae-92b3-96dc-fa4f1b686c6f.dmp
[task 2017-07-03T21:32:18.480901Z] 21:32:18     INFO - mozcrash Saved app info as /home/worker/workspace/build/blobber_upload_dir/0e0b9d4e-c4ae-92b3-96dc-fa4f1b686c6f.extra
[task 2017-07-03T21:32:18.958854Z] 21:32:18     INFO - PROCESS-CRASH | browser/components/extensions/test/browser/browser_ext_browserAction_contextMenu.js | application crashed [@ mozilla::OffTheBooksMutex::AssertCurrentThreadOwns]
Flags: needinfo?(kmaglione+bmo)
https://hg.mozilla.org/integration/mozilla-inbound/rev/e187342238542a1867368c83676059b35a01145c
Bug 1371248: Avoid hangs when preloader cache flush is triggered during shutdown. r=erahm
https://hg.mozilla.org/mozilla-central/rev/e18734223854
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment on attachment 8881967 [details]
Bug 1371248: Avoid hangs when preloader cache flush is triggered during shutdown.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1359653
[User impact if declined]: This issue causes a shutdown hang under certain circumstances when the script preloader cache is flushed prior to shutdown. In practice, the effect on users should be minor, but it may significantly slow down some browser restarts.
[Is this code covered by automated tests?]: Not by specific tests, but it is thoroughly exercised by our test suites.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: Probably not. The drop in crash rates should be enough verification.
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: Low-risk.
[Why is the change risky/not risky?]: This mainly just adds some additional sanity checks during shutdown. The only real risk is the possibility of a different set of shutdown crashes.
[String changes made/needed]: None.
Flags: needinfo?(kmaglione+bmo)
Attachment #8881967 - Flags: approval-mozilla-beta?
Comment on attachment 8881967 [details]
Bug 1371248: Avoid hangs when preloader cache flush is triggered during shutdown.

fix shutdownhang regression in beta55
Attachment #8881967 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
This crash is no longer reproducible on Firefox 56.0a1 (2017-07-11) and Firefox 55.0b8 (20170710085521) under Windows 10 64-bit, Ubuntu 16.04 32-bit and Mac OS X 10.12.1. Based on this testing I am marking this issue as Verified fixed.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: