Firefox Nightly loses all network connectivity while other programs have no issue connecting
Categories: Core :: Networking: DNS, defect
People: Reporter: alex_mayorga, Assigned: dragana
References: Regression
Keywords: nightly-community, regression
Attachments (2 files):
- 34.12 KB, application/x-zip-compressed
- 48 bytes, text/x-phabricator-request (RyanVM: approval-mozilla-beta+)
¡Hola y'all!
Hope this bug report finds you well.
I came back from my meal to a Nightly that had seemingly lost all network connectivity. I captured this profile: https://share.firefox.dev/3exIFfP
Strangely, networking in all other programs and browsers showed no issues.
Various NS_ type errors were shown in the Network tab of the Developer Tools.
DNS resolution did not work at the time either in about:networking#dnslookuptool.
Please let me know if there's anything further needed from the profile, device or network.
¡Gracias!
Alex
Comment 2•4 years ago (Reporter)
¡Hola Dragana!
The OS is Windows 10 Pro Insider Preview 21332.1000.
¡Gracias!
Alex
Comment 3•4 years ago
Thank you for the report. Could you provide some more info about the context in which this is happening?
Does it happen when the computer is waking up from sleep? Is it fixed if you restart Firefox?
Can you reliably reproduce the issue? How about after disabling all extensions?
What's the error you see when using about:networking#dnslookuptool?
Could you capture some HTTP logs when reproducing? https://firefox-source-docs.mozilla.org/networking/http/logging.html
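For anyone following along, here is a minimal sketch of setting up the logging environment before launching the browser. It assumes the `MOZ_LOG` and `MOZ_LOG_FILE` environment variables described on the logging page linked above. The module list is an illustrative subset and `build_log_env` is a hypothetical helper name, so check the linked page for the currently recommended value.

```python
import os


def build_log_env(log_path: str, base_env=None) -> dict:
    """Return a copy of the environment with Gecko logging variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: this module list is an illustrative subset; the logging
    # documentation is the authority on which modules to enable.
    env["MOZ_LOG"] = "timestamp,nsHttp:5,nsSocketTransport:5,nsHostResolver:5"
    # File that the log output is written to.
    env["MOZ_LOG_FILE"] = log_path
    return env


env = build_log_env("/tmp/firefox-log.txt", base_env={})
print(env["MOZ_LOG"])
print(env["MOZ_LOG_FILE"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.Popen` together with the path to the Nightly binary on your machine, so that logging is already active when the problem appears.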
Comment 4•4 years ago (Reporter)
¡Hola Valentin!
I have a 30-minute lunch break during which I leave the laptop turned on with Firefox Nightly open.
When I reported this, I had come back from my lunch break and Firefox had lost all network connectivity.
Neither Edge, Chromium, Skype, Zoom, nor any other program had any issue connecting, but Firefox Nightly insisted there was no network connection at all.
The issue manifested in about:networking#dnslookuptool as no hostnames resolving at all.
Restarting Firefox Nightly fixed the issue.
I have not seen the issue reproduce since, so if nothing can be concluded from the profile, perhaps this bug can be closed as INCOMPLETE?
If it reoccurs I'll make a note to capture HTTP logs.
¡Gracias!
Alex
Comment 5•4 years ago
Thanks for the info. I'd rather keep the bug open for now. Hopefully you manage to capture the logs.
A few extra questions: were you using a VPN or proxy when this happened? Could be related to bug 1698028.
Thanks!
Comment 7•4 years ago (Reporter)
¡Hola Valentin, Nihanth!
No VPN or proxy.
This is plain Firefox Nightly, but I do have Proton and HTTP/3 enabled in case that is relevant.
Keeping the ni? for now so I can find this bug easily if it happens again.
¡Gracias!
Alex
Comment 8•4 years ago
Dragana, OP says that HTTP3 is enabled so I thought you would be interested.
Comment 9•4 years ago
I have also been experiencing this. Disabling http3 didn't help me. I tried the about:networking#dnslookuptool mentioned in comment 3, while in the broken state, on mozilla.org and it returned NS_ERROR_UNKNOWN_HOST. I am using NextDNS with DoH, no VPN or proxy.
Comment 10•4 years ago
Here are some logs from about:networking while Nightly is in the broken state. Hopefully they shed some light on this.
Comment 11•4 years ago (Assignee)
For some reason we got the number of active connections to be 65535 :(
I will check whether I can find out why from the log. I am not sure if the log captures that, but I think it does.
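The value 65535 is what an unsigned 16-bit counter shows after being decremented once too often. A minimal sketch of that wraparound follows, assuming (this thread does not confirm it) that the counter is 16 bits wide and that new connections are refused once the count reaches a limit. `HYPOTHETICAL_LIMIT` and the helper names are illustrative, not Firefox values or code.

```python
U16_MASK = 0xFFFF  # assumption: the counter is a 16-bit unsigned integer


def decrement_u16(value: int) -> int:
    """Subtract 1 with 16-bit unsigned wraparound semantics."""
    return (value - 1) & U16_MASK


def can_open_connection(active: int, limit: int) -> bool:
    """A new connection is allowed only while the active count is below the limit."""
    return active < limit


HYPOTHETICAL_LIMIT = 900  # illustrative only, not a Firefox default

active = 1
active = decrement_u16(active)  # legitimate removal: 1 -> 0
active = decrement_u16(active)  # erroneous second removal wraps around
print(active)  # 65535
print(can_open_connection(active, HYPOTHETICAL_LIMIT))  # False
```

With the counter stuck at its maximum, every later check against the limit fails, which matches the symptom that no new connection can be made until the browser restarts.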
Comment 12•4 years ago (Assignee)
The log does not capture the point where the number of active connections went wrong.
May I ask you to try to make a log that starts earlier?
If your steps to reproduce the problem involved putting the computer to sleep, please start the logging before you put the computer to sleep.
Comment 13•4 years ago
I'll try to capture another log. It happens randomly, it seems, so it will be difficult without just starting logging at the beginning of a browsing session and hoping that it happens during that session. My computer is set to never sleep, so that isn't an issue here. I'll leave the ni? for now.
Comment 14•4 years ago (Assignee)
[Tracking Requested - why for this release]:
This may cause stalls, because no new connection can be made.
Comment 15•4 years ago (Assignee)
This may happen when a TransportSetup retries connecting using a different IP address, but AsyncResolveNative fails.
The change adds a flag that indicates whether a connection still needs to be removed from mNumActiveConns. This should make sure that we do not remove it multiple times.
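The idea of such a flag can be sketched as follows. This is not the actual patch: `ConnectionCounter`, `TransportAttempt`, and their methods are hypothetical names used only to illustrate a decrement that is safe to call more than once.

```python
class ConnectionCounter:
    """Holds the number of active connections (stand-in for mNumActiveConns)."""

    def __init__(self) -> None:
        self.num_active = 0


class TransportAttempt:
    """One connection attempt that remembers whether it is currently counted."""

    def __init__(self, counter: ConnectionCounter) -> None:
        self._counter = counter
        self._counted = False  # the flag: True while this attempt is counted

    def start(self) -> None:
        if not self._counted:
            self._counter.num_active += 1
            self._counted = True

    def release(self) -> None:
        # Only decrement if this attempt is still counted, and never go below 0.
        if self._counted:
            if self._counter.num_active > 0:
                self._counter.num_active -= 1
            self._counted = False


counter = ConnectionCounter()
attempt = TransportAttempt(counter)
attempt.start()
attempt.release()  # e.g. the retry path gives up after a failed resolve
attempt.release()  # e.g. regular teardown runs afterwards; this is now a no-op
print(counter.num_active)  # 0
```

Keeping the flag on the attempt itself means each code path can call `release` without knowing whether another path already did.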
Comment 16•4 years ago (Assignee)
Comment 17•4 years ago (Assignee)
I think I found the cause of this bug and the patch should fix it.
Trevor, I am still interested in a log if you can create it to confirm that my findings are correct.
Comment 18•4 years ago
Dragana, I was able to reproduce this during a session with logging enabled prior to the loss of connectivity. The log is roughly 2.9 GB in size, unfortunately, but hopefully it contains the information you need. I have emailed you the password to the zip file.
Comment 19•4 years ago (Assignee)
Thanks for the log. Your log confirms what I have found.
Looking further, this mostly affects people with TRR in mode 3. Not many people use TRR in this mode, so the effect of the bug is limited to a small group.
Comment 20•4 years ago
Comment 21•4 years ago
bugherder
Comment 22•4 years ago (Assignee)
Can you update Nightly and see if you can still reproduce the issue?
Comment 23•4 years ago (Assignee)
Comment on attachment 9213390 [details]
Make sure that we do not underflow mNumActiveConns
Beta/Release Uplift Approval Request
- User impact if declined: May leave Firefox unable to open any new connections. It mostly affects users with TRR in mode 3 (this is rare), but I cannot rule out that it affects other users as well.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch makes sure that we update the count more reliably and also adds guards against underflowing the counter.
- String changes made/needed:
Comment 24•4 years ago
Comment on attachment 9213390 [details]
Make sure that we do not underflow mNumActiveConns
Sounds like a good improvement to take, though it'd definitely be nice if we could get some verification from the people in this bug who were able to reproduce the issue. Approved for 88.0b9.
Comment 25•4 years ago
bugherder uplift
Comment 26•4 years ago
I have not experienced any connection loss since this fix landed. Thanks!
Comment 27•4 years ago (Assignee)
(In reply to Trevor Rowbotham [:rowbot] from comment #26)
> I have not experienced any connection loss since this fix landed. Thanks!
Thanks for checking!
Comment 28•4 years ago (Reporter)
¡Hola Dragana!
I believe this bug or a very similar one bit me again today while I was trying to load https://blog.mozilla.org/sumo/2021/04/09/whats-up-with-sumo-q1-2021/ earlier on a fully updated Nightly built from https://hg.mozilla.org/mozilla-central/rev/0ea49daf534fcd9a49708717b3864a4ebc73c20d.
about:networking#dnslookuptool gave NS_ERROR_UNKNOWN_HOST for both blog.mozilla.org and mozilla.org at the time.
Here's the log https://we.tl/t-BSXnsJ2ZtS
The bug went away on the 7th or 8th reload.
Hope this is useful.
¡Gracias!
Alex
Comment 29•4 years ago
From the looks of the log, it seems that you are using TRR in mode 3.
At some point, native DNS fails for reasons that are likely out of our control (maybe the DNS server was down, or the computer went offline).
So there's this line in the log: "Caching host [mozilla.cloudflare-dns.com] negative record for 60 seconds".
This is an interesting use case that's worth fixing, but it's not a regression. I'll file a different bug. Thanks for sending the logs!
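To illustrate why one failed lookup can block resolution for a while, here is a minimal sketch of a negative cache with a 60-second lifetime. It is an assumption-laden model, not Firefox's resolver: `NegativeDnsCache` and `FakeClock` are hypothetical names, and the only detail taken from the log line is the 60-second duration.

```python
import time


class NegativeDnsCache:
    """Remembers hosts whose lookup failed so repeated lookups fail fast."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # host -> time at which the negative record expires

    def record_failure(self, host: str) -> None:
        self._expires_at[host] = self._clock() + self._ttl

    def is_negative(self, host: str) -> bool:
        expiry = self._expires_at.get(host)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # The record has expired, so a real lookup may be attempted again.
            del self._expires_at[host]
            return False
        return True


class FakeClock:
    """Manually advanced clock so the example does not have to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
cache = NegativeDnsCache(ttl_seconds=60.0, clock=clock)
cache.record_failure("mozilla.cloudflare-dns.com")
print(cache.is_negative("mozilla.cloudflare-dns.com"))  # True
clock.now = 60.0
print(cache.is_negative("mozilla.cloudflare-dns.com"))  # False
```

If the negatively cached host is the DoH server itself, every lookup that depends on it would presumably fail until the record expires, which would be consistent with the page loading again after several reloads.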