Closed Bug 1580740 Opened 5 years ago Closed 5 years ago

Firefox 70.0b5 freezes randomly

Categories

(Core :: Networking: DNS, defect, P1)

70 Branch
defect

RESOLVED FIXED
mozilla73
Tracking Status
firefox70 - wontfix
firefox71 --- wontfix
firefox72 --- fixed
firefox73 --- verified

People

(Reporter: valery, Assigned: valentin)

Details

(Keywords: hang, Whiteboard: [necko-triaged][trr])

Attachments

(2 files)

Attached image Freeze.png

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0

Steps to reproduce:

I open Firefox and work for some time, for example translating a page for Mozilla Support.

Actual results:

Firefox freezes until I kill its task from Windows Task Manager.
I tried to use the crashfirefox64.exe utility to generate crash reports, but no reports are created after Firefox crashes.
In Windows Task Manager one of the Firefox processes is shown as 'not responding' (see screenshot).

Expected results:

Firefox works normally without freezing.

The problem still exists in 70.0b6.

Hi Valery,

I haven't been able to reproduce your issue on Firefox 70.0b5 or Firefox 70.0b6.

Please test whether the issue also occurs with a new profile. You can find the steps to do that below:
https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles?redirectlocale=en-US&redirectslug=Managing-profiles#w_starting-the-profile-manager

Thank you for reporting!

Flags: needinfo?(valery)

Hi Peter. I created a new profile, worked with it for a while, and Firefox froze again.
I can send some additional information if it may help.

Flags: needinfo?(valery)

I have also been facing this problem since my Firefox Dev Edition update.

It also hangs in safe mode.

Hi,

Could you check if Firefox generates a crash report when these freezes happen?
If it does please attach the crash report in a comment below so we can look over it.

Could you also check if this freezing happens on Firefox Nightly as well?
You can download it from here: https://www.mozilla.org/en-US/firefox/nightly/all/

Thanks!

Flags: needinfo?(valery)

> Hi,
>
> Could you check if Firefox generates a crash report when these freezes happen?
> If it does please attach the crash report in a comment below so we can look over it.

The problem still exists on 70.0b12.
about:crashes is empty, and using the crashfirefox64.exe utility does not produce a report either.

> Could you also check if this freezing happens on Firefox Nightly as well?
> You can download it from here: https://www.mozilla.org/en-US/firefox/nightly/all/

No, the problem doesn't happen on 71.0a1, nor on 69.0.2 (release).
It occurs only on the 70.x branch.

Flags: needinfo?(valery)

(In reply to Peter_M from comment #6)
Hi,

Apparently, this is not tied to the 70.x branch but to a specific installation, even though I created a new profile. After I updated Firefox to 71.0b3 on the computer where the problem reproduces, the problem reproduced again. Because no crash reports are created, I decided to provide a memory dump of the hanging process, which I created using Windows Task Manager. Perhaps this will help to figure out this bug. The dump can be downloaded at the following link:
https://drive.google.com/open?id=1CD9u26204Vn8UEN5meSn9tKKi683FD6z

Getting this in 69.0.0.1 x86-64 on Win10 Enterprise, too. Started in the last week or so in one of the 68s and then updated to latest version our enterprise environment offers. I notice Simple Tab Groups has been behaving strangely since a recent update, have disabled it to see if it helps.

Disabling STG did not help.

Severity: normal → critical
Keywords: hang

I'll pass this along to the stability list to see if anyone can have a look at the memory dump.

But, for now, you may want to try with a later version of beta (beta 71) or firefox release 70.0.1 and a fresh profile.

I found a solution that stops the freezing: disabling DNS over HTTPS solved my problem.

I disabled the DNS over HTTPS feature too (I forgot that I had turned it on), and it helped me too.

I've analyzed the crash dump with Visual Studio and the stack trace of the stuck thread looks like this:

>	nss3.dll!_PR_MD_WAIT_CV(_MDCVar * cv, _MDLock * lock, unsigned int timeout) Line 252	C
 	nss3.dll!_PR_WaitCondVar(PRThread * thread, PRCondVar * cvar, PRLock * lock, unsigned int timeout) Line 178	C
 	nss3.dll!PR_Wait(PRMonitor * mon, unsigned int ticks) Line 308	C
 	xul.dll!nsDNSService::ResolveInternal(const nsTSubstring<char> & aHostname, unsigned int flags, const mozilla::OriginAttributes & aOriginAttributes, nsIDNSRecord * * result) Line 1087	C++
 	xul.dll!nsAuthSSPI::MakeSN(const char * principal, nsTString<char> & result) Line 119	C++
 	xul.dll!nsAuthSSPI::Init(const char * serviceName, unsigned int serviceFlags, const char16_t * domain, const char16_t * username, const char16_t * password) Line 213	C++
 	xul.dll!nsHttpNegotiateAuth::ChallengeReceived(nsIHttpAuthenticableChannel * authChannel, const char * challenge, bool isProxyAuth, nsISupports * * sessionState, nsISupports * * continuationState, bool * identityInvalid) Line 230	C++
 	xul.dll!mozilla::net::nsHttpChannelAuthProvider::GetCredentialsForChallenge(const char * challenge, const char * authType, bool proxyAuth, nsIHttpAuthenticator * auth, nsTString<char> & creds) Line 717	C++
 	xul.dll!mozilla::net::nsHttpChannelAuthProvider::GetCredentials(const char * challenges, bool proxyAuth, nsTString<char> & creds) Line 553	C++
 	xul.dll!mozilla::net::nsHttpChannelAuthProvider::ProcessAuthentication(unsigned int httpStatus, bool SSLConnectFailed) Line 185	C++
 	xul.dll!mozilla::net::nsHttpChannel::ContinueProcessResponse3(nsresult rv) Line 2754	C++
 	xul.dll!mozilla::net::nsHttpChannel::ContinueProcessResponse2(nsresult rv) Line 2635	C++
 	xul.dll!mozilla::net::nsHttpChannel::ContinueProcessResponse1() Line 0	C++
 	xul.dll!mozilla::net::nsHttpChannel::ProcessResponse() Line 0	C++
 	xul.dll!mozilla::net::nsHttpChannel::OnStartRequest(nsIRequest * request) Line 0	C++
 	xul.dll!nsInputStreamPump::OnStateStart() Line 487	C++
 	xul.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream * stream) Line 396	C++
 	xul.dll!nsOutputStreamReadyEvent::Run() Line 93	C++
 	xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1226	C++
 	xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 486	C++
 	xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 88	C++
 	[Inline Frame] xul.dll!MessageLoop::RunInternal() Line 315	C++
 	xul.dll!MessageLoop::RunHandler() Line 309	C++
 	xul.dll!MessageLoop::Run() Line 291	C++
 	xul.dll!nsBaseAppShell::Run() Line 139	C++
 	xul.dll!nsAppShell::Run() Line 406	C++
 	xul.dll!nsAppStartup::Run() Line 277	C++
 	xul.dll!XREMain::XRE_mainRun() Line 4599	C++
 	xul.dll!XREMain::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4734	C++
 	xul.dll!XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4815	C++
 	[Inline Frame] firefox.exe!do_main(int argc, char * * argv, char * * envp) Line 218	C++
 	firefox.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 300	C++
 	firefox.exe!wmain(int argc, wchar_t * * argv) Line 131	C++

I'm confirming this bug and moving it to the appropriate component.

Blocks: 1434852
Status: UNCONFIRMED → NEW
Component: Untriaged → Networking: DNS
Ever confirmed: true
Product: Firefox → Core

Ok, this is an interesting one.
So, we have a method called nsDNSService::DeprecatedSyncResolve, which is basically only used in one place: nsAuthSSPI.
What's happening here is that the main thread calls DeprecatedSyncResolve, which blocks the main thread until the lookup completes, but TRR uses the main thread to open the TRR channel. That main-thread event will never get processed, because the main thread is blocked waiting for the sync resolve to complete.
The solution here would be to add the nsIDNSService::RESOLVE_DISABLE_TRR flag to all sync requests.
In mode3 that would make them fail, but I guess that's better than freezing Firefox 🙂
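
For illustration only, here is a minimal toy model of the hang described above. It is not Gecko code: MainThreadLoop, Dispatch, ProcessNextEvent and SyncResolveOnMainThread are hypothetical names, and the bounded poll loop stands in for the unbounded wait seen in the stack trace so that the sketch terminates.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Toy stand-in for the main thread's event queue (hypothetical, not nsThread).
struct MainThreadLoop {
    std::queue<std::function<void()>> events;

    void Dispatch(std::function<void()> event) { events.push(std::move(event)); }

    // Runs one queued event; returns false if the queue was empty.
    bool ProcessNextEvent() {
        if (events.empty()) return false;
        std::function<void()> event = std::move(events.front());
        events.pop();
        event();
        return true;
    }
};

// Hypothetical sync resolve called on the main thread.
// Returns true if the lookup completed, false if the wait gave up.
bool SyncResolveOnMainThread(MainThreadLoop& loop, int maxPolls) {
    assert(maxPolls > 0);
    auto done = std::make_shared<bool>(false);

    // The DoH request needs the main thread, so its work is queued there.
    loop.Dispatch([done] { *done = true; });

    // "Blocked" wait: the flag is polled, but the queue is never serviced,
    // because the only thread that could service it is the one waiting here.
    for (int i = 0; i < maxPolls; ++i) {
        if (*done) return true;
    }
    return false;  // in the real bug there is no bound, hence the hang
}
```

In this model the completion event is still sitting in the queue when the wait gives up. It only runs once something calls ProcessNextEvent(), which a blocked main thread never does.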

Assignee: nobody → valentin.gosu
Priority: -- → P1
Whiteboard: [necko-triaged][trr]

nsAuthSSPI makes a call to DeprecatedSyncResolve that normally issues a DNS
request and blocks until that completes. Apart from being a problem in general
this is an issue when using TRR, because the HTTPS channel to the DoH server
uses the main thread. When DeprecatedSyncResolve gets called on the main
thread it then blocks the thread, and since the TRR request never has the
chance to complete (even the TRR cancellation when the timer expires is
processed on the main thread) the result is a deadlock.

This structural problem should be fixed, but until that happens we should
set the RESOLVE_DISABLE_TRR flag when calling ResolveHost from
nsDNSService::DeprecatedSyncResolve
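
A rough sketch of the workaround, under stated assumptions: the flag value, the ResolverMode and Backend enums, and both functions below are hypothetical and only model the decision described in this patch (sync resolves never go through TRR, so in TRR-only mode they fail instead of hanging). The actual change is the one in the pushed revision, not this code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values; the real nsIDNSService constants are not shown in this bug.
enum ResolveFlags : uint32_t {
    kResolveDefault = 0,
    kResolveDisableTrr = 1u << 0,
};

// Simplified resolver modes; TrrOnly corresponds to what the bug calls "mode3".
enum class ResolverMode { NativeOnly, TrrFirst, TrrOnly };

enum class Backend { Native, Trr, Fail };

// Decides which resolver a request would use in this toy model.
Backend PickBackend(ResolverMode mode, uint32_t flags) {
    const bool trrAllowed = (flags & kResolveDisableTrr) == 0;
    switch (mode) {
        case ResolverMode::NativeOnly:
            return Backend::Native;
        case ResolverMode::TrrFirst:
            return trrAllowed ? Backend::Trr : Backend::Native;
        case ResolverMode::TrrOnly:
            return trrAllowed ? Backend::Trr : Backend::Fail;
    }
    assert(false && "unhandled ResolverMode");
    return Backend::Fail;
}

// Sync resolves always add the disable-TRR flag, whatever the caller passed.
Backend PickBackendForSyncResolve(ResolverMode mode, uint32_t flags) {
    return PickBackend(mode, flags | kResolveDisableTrr);
}
```

With this rule a sync lookup in TRR-first mode falls back to the native resolver, and in TRR-only mode it fails quickly, which is the trade-off accepted earlier in this bug.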

See Also: → 1604164
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/52e6fc030c2f
Don't use TRR when calling nsDNSService::DeprecatedSyncResolve on the main thread r=dragana
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73

The patch just landed in Firefox. Could you test with Nightly tomorrow with DoH turned on and let us know if you're still having issues?

Flags: needinfo?(valery)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #20)

> The patch just landed in Firefox. Could you test with Nightly tomorrow with DoH turned on and let us know if you're still having issues?

Hi Valentin!
I tested this on Nightly today and everything works OK.
Thank you for the fix.

Flags: needinfo?(valery)

I guess you'll want to request beta uplift?

Comment on attachment 9115962 [details]
Bug 1580740 - Don't use TRR when calling nsDNSService::DeprecatedSyncResolve on the main thread r=dragana

Beta/Release Uplift Approval Request

  • User impact if declined: Potential hang/deadlock when user has TRR on and is also using NTLM/Kerberos
    Although the number of crash/hang reports is low, it seems the hang reporter didn't kick in for this case, so it may be affecting more users than expected (although this type of auth is more common in enterprise environments, where TRR tends to be disabled)
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Low risk. Adds a flag that excludes a sync DNS request from using TRR.
    Other behaviour should be unaffected.
  • String changes made/needed:
Attachment #9115962 - Flags: approval-mozilla-beta?

Comment on attachment 9115962 [details]
Bug 1580740 - Don't use TRR when calling nsDNSService::DeprecatedSyncResolve on the main thread r=dragana

avoid a deadlock in name resolution, approved for 72.0b9

Attachment #9115962 - Flags: approval-mozilla-beta? → approval-mozilla-beta+