Closed Bug 1891431 Opened 10 months ago Closed 9 months ago

Waiting for socket thread takes 3 s

Categories

(Core :: Networking, defect, P2)

Firefox 126
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: pmenzel+bugzilla.mozilla.org, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0

Steps to reproduce:

Debian sid/unstable with Firefox Nightly

Actual results:

Loading simple pages takes several seconds.

Network profile shows several entries with Waiting for socket thread greater than three seconds.

Expected results:

Loading should be instant, that means as fast as possible.

The Bugbug bot thinks this bug should belong to the 'Core::Networking' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Networking
Product: Firefox → Core

One comment about "Waiting for Socket Thread..." -- that's somewhat misleading, as if we're using Http1.x and are limited on the sharding limit of 6 connections/site, other connections to the server will be waiting for socket thread. There are other cases where things will be "waiting for socket thread" that don't actually involve waiting for the SocketThread.

However, something is very odd about this profile... especially if you look at SocketThread

There are many LongTaskSocketProcessing markers, up to 2s long. Most of the time in these markers is spent in IO_file_write(), called from cycle collection (Collect()), which is bizarrely called from an atomic read in ShouldLog (which shouldn't call anything). Perhaps something is wrong with the symbols?

Is this a Mozilla build, or a Debian build? Anything else unusual in the setup/config/settings?

Can you repro with a fresh profile (via about:profiles)? Can you repro with a mozilla-built Firefox, if you're using a Debian build?

Thanks!

Severity: -- → S3
Flags: needinfo?(pmenzel+bugzilla.mozilla.org)
Priority: -- → P2
Whiteboard: [necko-triaged][necko-priority-new]
Blocks: necko-perf
Flags: needinfo?(smayya)
Whiteboard: [necko-triaged][necko-priority-new] → [necko-triaged][necko-priority-next]

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #3)

One comment about "Waiting for Socket Thread..." -- that's somewhat misleading, as if we're using Http1.x and are limited on the sharding limit of 6 connections/site, other connections to the server will be waiting for socket thread. There are other cases where things will be "waiting for socket thread" that don't actually involve waiting for the SocketThread.

I do not fully understand. Do you see that many parallel requests?

However, something is very odd about this profile... especially if you look at SocketThread

There are many LongTaskSocketProcessing markers, up to 2s long. Most of the time in these markers is spent in IO_file_write(), called from cycle collection (Collect()), which is bizarrely called from an atomic read in ShouldLog (which shouldn't call anything). Perhaps something is wrong with the symbols?

Is this a Mozilla build, or a Debian build? Anything else unusual in the setup/config/settings?

I install the package from https://packages.mozilla.org/apt. Nothing special. Maybe an updated package was already installed in the background? Would that mess up the symbols?

Can you repro with a fresh profile (via about:profiles)?

It only happens sometimes, and I have to try to then create a fresh problem.

Flags: needinfo?(pmenzel+bugzilla.mozilla.org)

At home everything worked fine, coming to work pages in the same Firefox instance take long to load. On the console everything works fine. Profile: https://share.firefox.dev/3xLNBtj

The profile seems to indicate that a HTTP request to www.mozilla.org is delayed because it's waiting for the resolution of HTTPS record.
Could you go to about:config and disable network.dns.force_waiting_https_rr or network.dns.native_https_query and see either of it helps?
Thanks.

Flags: needinfo?(pmenzel+bugzilla.mozilla.org)

These lines from the profile are interesting:

2024-04-22 09:29:02.480 UTC - [Parent Process 4642 DNS Resolver #4] D/nsHostResolver DNS lookup thread - Calling getaddrinfo for host [www.mozilla.org].
2024-04-22 09:29:02.480 UTC - [Parent Process 4642 DNS Resolver #4] D/nsHostResolver DNS lookup thread - starting execution.
2024-04-22 09:29:02.480 UTC - [Parent Process 4642 DNS Resolver #4] D/nsHostResolver byType on DNS thread

2024-04-22 09:29:12.487 UTC - [Parent Process 4642 DNS Resolver #4] D/nsHostResolver FillBuffer response len < 0
2024-04-22 09:29:12.487 UTC - [Parent Process 4642 DNS Resolver #4] D/nsHostResolver nsHostResolver::CompleteLookupByType record 7f34901f0cd0 [www.mozilla.org] status 8000ffff

It shows that we wait 10s for the HTTPS record.

It shows that we wait 10s for the HTTPS record.

On the command line, DNS works instantly.

PS: After restarting Firefox, it currently works. I am going to try the other suggestions next time this happens.

(In reply to Paul Menzel from comment #8)

It shows that we wait 10s for the HTTPS record.

On the command line, DNS works instantly.

Are you resolving IP or HTTPS records?
Does dig HTTPS cloudflare.com work for you?

(In reply to Valentin Gosu [:valentin] (he/him) from comment #9)

(In reply to Paul Menzel from comment #8)

It shows that we wait 10s for the HTTPS record.

On the command line, DNS works instantly.

Are you resolving IP or HTTPS records?

Right, IP records with host www.mozilla.org.

Does dig HTTPS cloudflare.com work for you?

Yes, it does:

$ dig HTTPS cloudflare.com

; <<>> DiG 9.19.21-1+b1-Debian <<>> HTTPS cloudflare.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30838
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cloudflare.com.			IN	HTTPS

;; ANSWER SECTION:
cloudflare.com.		300	IN	HTTPS	1 . alpn="h3,h2" ipv4hint=104.16.132.229,104.16.133.229 ipv6hint=2606:4700::6810:84e5,2606:4700::6810:85e5

;; Query time: 4 msec
;; SERVER: 141.14.16.1#53(141.14.16.1) (UDP)
;; WHEN: Mon Apr 22 14:09:54 CEST 2024
;; MSG SIZE  rcvd: 116
Duplicate of this bug: 1890263

So the basic cause of the profiles showing long delays in socket processing is the logging -- logging is turned on, and worse yet it's set to sync which writes data immediate, so it's spending a lot of time in OS write functions from within the socket polls. Also it's logging more than it needs to.

The earlier log which showed calls from logging into CycleCollection (and eventually back to io writes) was clearly a problem symbolicating the stacks. The new profile correctly showed the logging.

Hi Paul,

Could you try again with the latest nightly?
I think bug 1895226 might have fixed this issue.

Flags: needinfo?(smayya)
Flags: needinfo?(pmenzel+bugzilla.mozilla.org)
See Also: → 1895226
Flags: needinfo?(pmenzel+bugzilla.mozilla.org)
Whiteboard: [necko-triaged][necko-priority-next] → [necko-triaged]

Thank you. I haven’t seen this issue in over a week. I am currently using 128.0a1 (20240523205926). No idea how to properly close this issue. It’d be great if you did it.

That's great. Let us know if you see it again.

Status: UNCONFIRMED → RESOLVED
Closed: 9 months ago
Flags: needinfo?(pmenzel+bugzilla.mozilla.org)
Resolution: --- → WORKSFORME

I noticed slowness again today, but not knowing if it’s related I created bug 1903975.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: