Closed Bug 1864252 Opened 1 year ago Closed 1 year ago

Socket thread gets stuck running at 100% in neqo_http3::connection_client::Http3Client::process_output

Categories

(Core :: Networking, defect, P2)

defect

Tracking

()

RESOLVED FIXED
121 Branch

People

(Reporter: jrmuizel, Unassigned)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged][necko-priority-new])

I have this happen to me twice. I can't upload a profile because the socket thread seems completely stuck.

Profiling with instruments shows something like:

39.64 s   99.9%	260.00 ms	 	                neqo_http3conn_process_output_and_send
39.04 s   98.4%	200.00 ms	 	                 neqo_http3::connection_client::Http3Client::process_output::h6c7ea49422243b4b
29.18 s   73.6%	1.02 s	 	                  neqo_transport::connection::Connection::process_output::h331bf7e42b66621d
18.13 s   45.7%	736.00 ms	 	                   neqo_transport::connection::Connection::output_path::hf0f2096ac11466b3
6.08 s   15.3%	149.00 ms	 	                    alloc::fmt::format::format_inner::h44804980cee8c7aa
5.00 s   12.6%	229.00 ms	 	                    neqo_transport::connection::Connection::build_packet_header::h91335ca659f9cb64
3.71 s    9.3%	58.00 ms	 	                    _$LT$gecko_logger..GeckoLogger$u20$as$u20$log..Log$GT$::log::hf985d1ffd0cbbe34
925.00 ms    2.3%	663.00 ms	 	                    free
529.00 ms    1.3%	10.00 ms	 	                    neqo_common::codec::Encoder::with_capacity::h29e8c01c32af5601
314.00 ms    0.7%	22.00 ms	 	                    neqo_transport::connection::Connection::can_grease_quic_bit::hccb1ac6766ae9858
237.00 ms    0.5%	237.00 ms	 	                    os_unfair_lock_lock_with_options
186.00 ms    0.4%	186.00 ms	 	                    os_unfair_lock_unlock
130.00 ms    0.3%	2.00 ms	 	                    replace_free(void*)
119.00 ms    0.3%	62.00 ms	 	                    neqo_transport::packet::PacketBuilder::pn::h5422127f976a0e64
49.00 ms    0.1%	31.00 ms	 	                    MozJemallocPHC::free(void*)
18.00 ms    0.0%	18.00 ms	 	                    core::ptr::drop_in_place$LT$alloc..vec..Vec$LT$neqo_transport..recovery..RecoveryToken$GT$$GT$::h08151dabe30c7b89
14.00 ms    0.0%	14.00 ms	 	                    neqo_transport::crypto::CryptoStates::select_tx_mut::h635ecf0eddaeb66e
13.00 ms    0.0%	0 s	 	                    <Unknown Address>
11.00 ms    0.0%	11.00 ms	 	                    DYLD-STUB$$os_unfair_lock_unlock
10.00 ms    0.0%	10.00 ms	 	                    _$LT$env_logger..Logger$u20$as$u20$log..Log$GT$::log::h982318939f82e687
8.00 ms    0.0%	8.00 ms	 	                    hashbrown::map::HashMap$LT$K$C$V$C$S$C$A$GT$::get_mut::hf2360f8b527277d6
8.00 ms    0.0%	8.00 ms	 	                    neqo_common::codec::Encoder::is_empty::hd8defb46db9fe080
5.00 ms    0.0%	5.00 ms	 	                    __rust_alloc
5.00 ms    0.0%	0 s	 	                    0x70000c2668c7
4.00 ms    0.0%	4.00 ms	 	                    neqo_common::log::init::hfdc0ac3a4de68870
4.00 ms    0.0%	4.00 ms	 	                    _$LT$neqo_transport..cc..classic_cc..ClassicCongestionControl$LT$T$GT$$u20$as$u20$neqo_transport..cc..CongestionControl$GT$::cwnd_avail::h799a0ff691e0ccf8
4.00 ms    0.0%	4.00 ms	 	                    DYLD-STUB$$os_unfair_lock_lock_with_options
3.00 ms    0.0%	3.00 ms	 	                    DYLD-STUB$$free
1.00 ms    0.0%	1.00 ms	 	                    _$LT$neqo_transport..cc..classic_cc..ClassicCongestionControl$LT$T$GT$$u20$as$u20$neqo_transport..cc..CongestionControl$GT$::recovery_packet::h454715da81c268e7
1.00 ms    0.0%	1.00 ms	 	                    DYLD-STUB$$memset
2.44 s    6.1%	39.00 ms	 	                   _$LT$gecko_logger..GeckoLogger$u20$as$u20$log..Log$GT$::log::hf985d1ffd0cbbe34
1.46 s    3.6%	37.00 ms	 	                   neqo_transport::recovery::LossRecovery::pto_time::h59073070b604a930
1.37 s    3.4%	79.00 ms	 	                   alloc::fmt::format::format_inner::h44804980cee8c7aa

Maybe a duplicate of Bug 1862629, but not sure. Could get fixed with https://github.com/mozilla/neqo/issues/1487. Let's see if this gets fixed when fixing https://github.com/mozilla/neqo/issues/1487 and reevaluate then. Completely stuck sounds slightly different.

I don't see how the delayed timers could be related to a total freeze.
This sounds like a iloop of some sort in neqo. Also, it seems to be allocating and freeing in the loop

Whiteboard: [necko-triaged][necko-priority-new]

According to comment #0, the function that took most of the time is neqo_http3conn_process_output_and_send. That function was added in this patch, but that patch was backed out. So, this bug should be already fixed.

Jeff, it would be great of you can provide us more information. Any STR for this? Did you upload a bug file when having this issue?
What's your hardware?
Thanks.

Status: NEW → RESOLVED
Closed: 1 year ago
Flags: needinfo?(jmuizelaar)
Resolution: --- → FIXED
See Also: → 1852924

I have no STR. It's only happened to me twice.
I'm not sure what you mean by "Did you upload a bug file when having this issue?"
I'm on macOS 10.15

Flags: needinfo?(jmuizelaar)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #4)

I have no STR. It's only happened to me twice.
I'm not sure what you mean by "Did you upload a bug file when having this issue?"
I'm on macOS 10.15

Sorry, I mean uploading a big file.
Thanks for the information.

Keywords: regression
Regressed by: 1852924
See Also: 1852924
Target Milestone: --- → 121 Branch
You need to log in before you can comment on or make changes to this bug.