Closed Bug 1884349 Opened 9 months ago Closed 6 months ago

firefox didn't load any page

Categories

(Core :: Networking: HTTP, defect, P1)

Firefox 125
Desktop
Linux
defect

Tracking

()

RESOLVED FIXED
128 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox126 --- wontfix
firefox127 --- fixed
firefox128 --- fixed

People

(Reporter: mix5003, Assigned: kershaw)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(6 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0

Steps to reproduce:

use firefox for a long time, so i don't known what trigger it

Actual results:

it random stall networking.
i can not open any page in firefox browser

but chrome can normally open that page.
and after i restart firefox it work fines.

this is my profiler that i try to capture. from about:logging
networking preset: https://share.firefox.dev/3IqKRU2
http/3 preset https://share.firefox.dev/4amtO1B

Expected results:

firefox should open page normally.

The Bugbug bot thinks this bug should belong to the 'Core::Gecko Profiler' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Gecko Profiler
Product: Firefox → Core
Component: Gecko Profiler → Networking
OS: Unspecified → Linux
Hardware: Unspecified → Desktop

i think this problem only exists in linux. in windows i didn't encounter this issue

Can you try it in Troubleshooting mode? (under Help menu)

The problem may be an extension (addblocker, etc)

Flags: needinfo?(mix5003)

i still trying in Troubleshooting mode, but it happend random, so far i still not encounter this bug,

but i want to let you known that my firefox is not install any extension. so i don't think it from extension

today i can confirmed that issue still exists in Troubleshooting mode.

but if i wait for a long time (around 10 minute) it may solved it self. i am not sure is it same problem or not.
most of time i restart firefox without wait 10 minute, but today i try many thing to capture log but it can not upload (but it too large and it say can not compressed)

Flags: needinfo?(mix5003)
Flags: needinfo?(smayya)

Hey mix5003,
Thanks for reporting the issue.
From the profile, I see that we received the response for most of the requests.
However, I observed that the load is pending for https://spocs.mozilla.net/spocs and https://getpocket.cdn.mozilla.net/.
I also see that you are using a proxy/captive portal to connect.

Could you please let me know if you still observe the issue without using a proxy? Kindly capture profile and HTTP logs in a file and share the same.
How often do you observe this issue?

Flags: needinfo?(smayya) → needinfo?(mix5003)

in that profiler i try to open page https://www.gstatic.com/ and it didn't load succesfully

and for proxy i didn't use any proxy. it should leave config as "use system proxy settings".

this issue only exists in linux PC at my office (that not config any proxy too.) . and i work from home until 1 May.
i will try to config it to "no proxy" and try again when i come back to my office.

Attached file firefox-log.tar.zst

hi today problem still exists after disabled proxy .
This is video when encouter issue: https://youtu.be/HDMGgtUi2ts

i attached log file for you

Flags: needinfo?(mix5003)
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(kershaw)

Thanks for the log. It seems to suggest that you've reached the maximum limit of connections. What's the value of the preference network.http.max-connections?

If the value seems reasonable, we might need to investigate why you reached the limit. Could you try to record a http log from the beginning?
Maybe there is a bug in Firefox that leaks sockets.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(mix5003)
Flags: needinfo?(kershaw)

Did you restart the browser after disabling the proxy?

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #10)

Did you restart the browser after disabling the proxy?

may be no. is it necessary? i think when i config to use proxy it can use without restart, so i didn't do it

(In reply to Kershaw Chang [:kershaw] from comment #9)

Thanks for the log. It seems to suggest that you've reached the maximum limit of connections. What's the value of the preference network.http.max-connections?

If the value seems reasonable, we might need to investigate why you reached the limit. Could you try to record a http log from the beginning?
Maybe there is a bug in Firefox that leaks sockets.

network.http.max-connections set to 900. i think it default value .

and for http log from the beginning i think it not possible. this time only 3 minute firefox capture logs 400MB
base on my memory this issue never happend in 4 hours after i start firefox (it never happend before my lunch break). so it may a large logs.

and in video at 0:47 i try to disabled wifi. isn't it a should force reset all connections? i not sure.

and i forgot to mention about special config in PC that issue exists. that i think it may lead to make this issue

  1. that pc config script to change wifi network when it can not ping google for check is internet offline. but sometime this script will change wifi even if it still online (bad luck, ping timeout)
  2. that PC has 2 interface (1 for wifi, 1 for ethernet). and that PC install docker and networks. if i ran ip addr | grep UP | wc -l to count interface. the result is 23

I can't reproduce this on my side when I set network.http.max-connections to a small value (e.g., 10). Could you also try reducing network.http.max-connections and see if you can reproduce the issue easily on your end?
If so, please record the log and share it. Thanks.

i think network.http.max-connections didn't make this issue easily to reproduce (i test 1 hour only) .
and it hard to use in daily life.

is another way to debug this with out capture all logging?

Flags: needinfo?(mix5003)
Flags: needinfo?(kershaw)

(In reply to mix5003 from comment #13)

i think network.http.max-connections didn't make this issue easily to reproduce (i test 1 hour only) .
and it hard to use in daily life.

is another way to debug this with out capture all logging?

Sorry, I can' think of a good way to debug this right now.

Valentin, since you've looked at the socket transport code recently, do you maybe have a better idea?
Thanks.

Flags: needinfo?(kershaw) → needinfo?(valentin.gosu)

I can't say for sure this is the problem,

2024-04-29 11:54:54.101685 UTC - [Parent 93363: Socket Thread]: V/nsHttp PendingTransactionInfo::IsAlreadyClaimedInitializingConn [trans=7f1a888bf900, halfOpen=7f1ab46feae0, activeConn=0]
2024-04-29 11:54:54.101686 UTC - [Parent 93363: Socket Thread]: V/nsHttp PendingTransactionInfo::IsAlreadyClaimedInitializingConn [trans=7f1a888bf900, dnsAndSock=7f1a56481300]
2024-04-29 11:54:54.101689 UTC - [Parent 93363: Socket Thread]: V/nsHttp nsHttpConnectionMgr::TryDispatchTransaction without conn [trans=7f1a888bf900 ci=7f1a8d9142e0 ci=.S........[tlsflags0x00000000]www.ishare.in.th:443 {NPN-TOKEN h2}^partitionKey=%28https%2Cishare.in.th%29 caps=1021 onlyreused=1 active=0 idle=0]
2024-04-29 11:54:54.101695 UTC - [Parent 93363: Socket Thread]: V/nsHttp nsHttpConnectionMgr::TryDispatchTransactionOnIdleConn, ent=7f1ac7442a60, trans=7f1a888bf900, urgent=1
2024-04-29 11:54:54.101697 UTC - [Parent 93363: Socket Thread]: V/nsHttp    not dispatched (queued) trans=7f1a888bf900
2024-04-29 11:54:54.101699 UTC - [Parent 93363: Socket Thread]: V/nsHttp   adding transaction to pending queue [trans=7f1a888bf900 urgent-start-count=1]
2024-04-29 11:54:54.123468 UTC - [Parent 93363: Socket Thread]: V/nsHttp PendingTransactionInfo::IsAlreadyClaimedInitializingConn [trans=7f1a888bf900, halfOpen=7f1ab46feae0, activeConn=0]
2024-04-29 11:54:54.123470 UTC - [Parent 93363: Socket Thread]: V/nsHttp PendingTransactionInfo::IsAlreadyClaimedInitializingConn [trans=7f1a888bf900, dnsAndSock=7f1a56481300]
2024-04-29 11:54:54.123472 UTC - [Parent 93363: Socket Thread]: V/nsHttp nsHttpConnectionMgr::TryDispatchTransaction without conn [trans=7f1a888bf900 ci=7f1a8d9142e0 ci=.S........[tlsflags0x00000000]www.ishare.in.th:443 {NPN-TOKEN h2}^partitionKey=%28https%2Cishare.in.th%29 caps=1021 onlyreused=1 active=0 idle=0]
2024-04-29 11:54:54.123479 UTC - [Parent 93363: Socket Thread]: V/nsHttp nsHttpConnectionMgr::TryDispatchTransactionOnIdleConn, ent=7f1ac7442a60, trans=7f1a888bf900, urgent=1
2024-04-29 11:54:54.123480 UTC - [Parent 93363: Socket Thread]: V/nsHttp    not dispatched (queued) trans=7f1a888bf900
2024-04-29 11:54:54.123482 UTC - [Parent 93363: Socket Thread]: V/nsHttp   adding transaction to pending queue [trans=7f1a888bf900 urgent-start-count=1]

Also a bit later:

2024-04-29 11:54:56.484406 UTC - [Parent 93363: Socket Thread]: V/nsHttp nsHttpConnectionMgr::OnMsgCancelTransaction [trans=7f1a888bf900] removed from pending queue
2024-04-29 11:54:56.484409 UTC - [Parent 93363: Socket Thread]: D/nsHttp nsHttpTransaction::Close [this=7f1a888bf900 reason=804b0002]
2024-04-29 11:54:56.484435 UTC - [Parent 93363: Socket Thread]: D/nsHttp nsHttpTransaction::ShouldRestartOn0RttError [this=7f1a888bf900, mEarlyDataWasAvailable=0 error=804b0002]
2024-04-29 11:54:56.484453 UTC - [Parent 93363: Socket Thread]: D/nsHttp nsHttpTransaction::RemoveDispatchedAsBlocking this=7f1a888bf900 not blocking
2024-04-29 11:54:56.484459 UTC - [Parent 93363: Socket Thread]: D/nsHttp nsHttpTransaction 7f1a888bf900 request context set to null in ReleaseBlockingTransaction() - was 7f1a6003d270
...
2024-04-29 11:54:56.486067 UTC - [Parent 93363: Main Thread]: D/nsHttp nsHttpTransaction::DeleteSelfOnConsumerThread [this=7f1a888bf900]
2024-04-29 11:54:56.486073 UTC - [Parent 93363: Main Thread]: D/nsHttp Destroying nsHttpTransaction @7f1a888bf900
2024-04-29 11:54:56.486124 UTC - [Parent 93363: Main Thread]: D/nsHttp nsHttpChannel::ContinueOnStopRequestAfterAuthRetry [this=7f1ab5c6f600, aStatus=804b0002 aAuthRetry=0, aIsFromNet=1, aTransWithStickyConn=0]
2024-04-29 11:54:56.486128 UTC - [Parent 93363: Main Thread]: D/nsHttp nsHttpChannel::ContinueOnStopRequest [this=7f1ab5c6f600 aStatus=804b0002, aIsFromNet=1]

It could be that the network switching you mention is causing some connections to get broken.
Could you try disabling that temporarily and test again?

Flags: needinfo?(valentin.gosu) → needinfo?(mix5003)
Attached image networks-tab.png

my network switching script not run that too much (it should run only when internet connection lost).
but i just found that 1 of my docker failed and it always restart. and it trigger network changed in log too (so network change will fire every 1 minute). i stop that docker and will test again.

and i want to ask is this page suppose to show like this when i disabled wifi? i try to refresh it. but it look like it keep some connection.
some connection still show in about:networking#sockets no matter what i did.
eg. disconnect wifi, or change wifi to connection that support only ipv4 but some ipv6 still show in that page

(In reply to mix5003 from comment #16)

and i want to ask is this page suppose to show like this when i disabled wifi? i try to refresh it. but it look like it keep some connection.
some connection still show in about:networking#sockets no matter what i did.
eg. disconnect wifi, or change wifi to connection that support only ipv4 but some ipv6 still show in that page

It's definitely possible that we're failing to teardown some connections.
But are you sure you are disconnecting all network interfaces?

Attached image interface-down.png

(In reply to Valentin Gosu [:valentin] (he/him) from comment #17)

(In reply to mix5003 from comment #16)

and i want to ask is this page suppose to show like this when i disabled wifi? i try to refresh it. but it look like it keep some connection.
some connection still show in about:networking#sockets no matter what i did.
eg. disconnect wifi, or change wifi to connection that support only ipv4 but some ipv6 still show in that page

It's definitely possible that we're failing to teardown some connections.
But are you sure you are disconnecting all network interfaces?

no. only main internet wifi down. but docket network still ran.

in this attachment i think i stop docker and down all interface. but some ip still left.

Attached file log2.tar.zst

i attached log file from my step

  1. enable logging
  2. goto cloudflare.com
  3. disable wifi
    4.refresh cloudflare.com

may has other step but i forgot
for me just this can reproduce ip left in connection tab after disable wifi

i hope this will help you fine why connection not closed after disconnect wifi

Flags: needinfo?(mix5003)

i think this issue is from networking leaks on linux/mac only

i think i has reprorduce step

  1. use linux mint (i just fresh install on virtualbox)
  2. use firefox open cloudflare.com
  3. disconnect internet
  4. go to about:networking and see socket not disconnect after disconnect internet

base on mozregression it said bug come from Bug1706377
i can confirm that it can not reproduce if i set network.http.http2.move_to_pending_list_after_network_change to false.
(in about:networking still has a connection. but if you wait 60 seconds, no connection left anymore)

Severity: -- → S3
Flags: needinfo?(kershaw)
Priority: -- → P2
Regressions: 1706377
Whiteboard: [necko-triaged][necko-priority-new]
Flags: needinfo?(kershaw)
Whiteboard: [necko-triaged][necko-priority-new] → [necko-triaged][necko-priority-next]
Assignee: nobody → kershaw
Priority: P2 → P1
Whiteboard: [necko-triaged][necko-priority-next] → [necko-triaged][necko-priority-queue]

To make sure the pending connections to be closed soon, we just need to call DontReuse().
After it, the connection will be closed after the last transaction is done.

Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ce7b0a776ec2 Make sure the pending connections will be closed, r=necko-reviewers,valentin
Status: UNCONFIRMED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 128 Branch

To make sure the pending connections to be closed soon, we just need to call DontReuse().
After it, the connection will be closed after the last transaction is done.

Original Revision: https://phabricator.services.mozilla.com/D211747

Attachment #9404511 - Flags: approval-mozilla-beta?

beta Uplift Approval Request

  • User impact if declined: Possible socket leaks.
  • Code covered by automated testing: yes
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing: N/A
  • Risk associated with taking this patch: Low
  • Explanation of risk level: The patch is straightforward.
  • String changes made/needed: N/A
  • Is Android affected?: yes
Keywords: regression
Regressed by: 1706377
No longer regressions: 1706377

Set release status flags based on info from the regressing bug 1706377

Attachment #9404511 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Component: Networking → Networking: HTTP
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: