Closed Bug 1576386 Opened 4 months ago Closed 3 months ago

Crash in [@ shutdownhang | nsThread::Shutdown | nsNotifyAddrListener::Shutdown]

Categories

(Core :: Networking, defect, P2, critical)

69 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 + wontfix
firefox70 + verified
firefox71 + verified

People

(Reporter: philipp, Assigned: valentin)

References

(Regression)

Details

(Keywords: crash, regression, Whiteboard: [necko-triaged])

Crash Data

Attachments

(2 files)

This bug is for crash report bp-44924865-4f83-4691-8164-35ab50190824.

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForAlertByThreadId 
1 ntdll.dll RtlSleepConditionVariableSRW 
2 kernelbase.dll SleepConditionVariableSRW 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
4 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue>::GetEvent xpcom/threads/ThreadEventQueue.cpp:153
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1134
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
7 xul.dll nsThread::Shutdown xpcom/threads/nsThread.cpp:900
8 xul.dll nsresult nsNotifyAddrListener::Shutdown netwerk/system/win32/nsNotifyAddrListener.cpp:378
9 xul.dll nsresult nsNotifyAddrListener::Observe netwerk/system/win32/nsNotifyAddrListener.cpp:339

shutdownhang crash reports with this signature are showing up in increased frequency in firefox 69.
unfortunately there aren't any particular correlations or useful comments that would shed any more light on the issue.

We are stuck inside CPubINetworkListManager::GetNetworks on the Link Monitor thread (14 in the report.)

That code was added in bug 1561005.

Priority: -- → P3
Regressed by: 1561005
Whiteboard: [necko-triaged]

Hi Nhi, can you please help find someone to look into this while Valentin is on PTO?

Flags: needinfo?(nhnguyen)

Valentin is back in a few days and can look into this then.

Assignee: nobody → valentin.gosu
Flags: needinfo?(nhnguyen)

Hmm, I don't know if it's possible to find out whether CPubINetworkListManager::GetNetworks is actually hanging, or if we just happen to be running that code during shutdown.

My approach would be to

  1. check if mShutdown is true in nsNotifyAddrListener::calculateNetworkId and if so exit early. This could partly or fully resolve the issue.
  2. use a nsIThreadPool instead of a nsIThread for mThread (limited to 1 thread) and call shutdownWithTimeout to avoid the shutdown hang.

I'll submit a patch for the first task, and depending on the results we'll consider the second.

Calling the windows APIs may block shutdown.

Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/6702fbc2c2a9
Don't recompute the networkId during shutdown r=michal

This is a candidate for uplifting to 70.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Can you request uplift? Thanks!

Flags: needinfo?(valentin.gosu)
Priority: P3 → P2

This is the #4 overall top parent process crash on Release 69 too. Would also be good to include this in 69.0.1 if possible.

It seems there are still crash reports after landing the patch 🙁. No need to uplift just yet.
Time for approach #2.

Flags: needinfo?(valentin.gosu)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla71 → ---

The thread calls into some windows APIs that may sometimes block for a long time. Since we're shutting down anyway, it is OK to sometimes leak this thread, similar to what we do for DNS resolver threads.

The patch converts nsNotifyAddrListener::mThread to a nsIThreadPool (with a max of 1 thread) so that we may call shutdownWithTimeout on it.

Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/d17822726031
Set a 2 second timeout when shutting down the Link Monitor thread r=mayhemer
Status: REOPENED → RESOLVED
Closed: 3 months ago3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Early Nightly data is looking promising. Please nominate this for Beta approval when you get a chance.

Flags: needinfo?(valentin.gosu)

Comment on attachment 9092994 [details]
Bug 1576386 - Set a 2 second timeout when shutting down the Link Monitor thread r=mayhemer

Beta/Release Uplift Approval Request

  • User impact if declined: Shutdown hang
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This fix is similar to what we did for bug 1500861 / bug 1498782 - which was successful at avoiding the shutdown hang and didn't cause any regressions.
    It's not a scalable long-term solution, as pointed out by bug 1581792 comment 0, but it seems necessary at present time.
  • String changes made/needed:
Flags: needinfo?(valentin.gosu)
Attachment #9092994 - Flags: approval-mozilla-beta?
Attachment #9091364 - Flags: approval-mozilla-beta?

Comment on attachment 9091364 [details]
Bug 1576386 - Don't recompute the networkId during shutdown r=michal

Fixes a topcrash on Windows. Approved for 70.0b8.

Attachment #9091364 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9092994 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

No crashes with this signature in builds with the patch, calling this verified.

Regressions: 1583170

Is this fix something you're comfortable with enough to include a 69 dot release or should we just let this fix ride with 70?

Flags: needinfo?(valentin.gosu)

Per discussion with Valentin, this is too risky to uplift into a dot release. We'll let this fix ride Fx70 to release.

Flags: needinfo?(valentin.gosu)
You need to log in before you can comment on or make changes to this bug.