Open Bug 2016567 Opened 20 hours ago Updated 4 hours ago

Telemetry collection is ~6% of socket thread samples during TRR stress test; cull low value probes

Categories

(Core :: Networking: DNS, task, P3)

People

(Reporter: acreskey, Assigned: acreskey)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged] [fxpe] )

Looking at a profile of the TRR-multi test (which loads favicons from 50 hosts, thus performing 50 TRR DNS lookups and 50 image fetches) with Valentin's TRR prioritization patches from bug 2009372 applied, I'm seeing that about 6% of the CPU samples in the busy socket thread are related to telemetry:

https://share.firefox.dev/3OmtERS

If any of these probes can be removed or called less frequently, it would benefit DoH performance on Fenix.
Also, we probably don't need the GIFFT (Glean Interface For Firefox Telemetry) mirrors anymore?

  423  dns.lookup_method
  396  HTTP_TRANSACTION_RESTART_REASON (GIFFT)
  396  http.transaction_restart_reason
  337  networking.transaction_wait_time
  319  TRANSACTION_WAIT_TIME_SPDY (GIFFT)
  319  http.transaction_wait_time_spdy
  313  TRANSACTION_WAIT_TIME_HTTP2_SUP_HTTP3 (GIFFT)
  313  http.transaction_wait_time_http2_sup_http3
  288  DNS_LOOKUP_DISPOSITION3 (GIFFT)
  284  networking.trr_request_count[regular]
  284  networking.trr_dns_start/mozilla.cloudflare-dns.com_3[mozilla.cloudflare-dns.com_3]
  284  networking.trr_dns_end/mozilla.cloudflare-dns.com_3[mozilla.cloudflare-dns.com_3]
  284  networking.trr_open_to_first_sent/mozilla.cloudflare-dns.com_3[mozilla.cloudflare-dns.com_3]
  284  networking.trr_first_sent_to_last_received/mozilla.cloudflare-dns.com_3[mozilla.cloudflare-dns.com_3]
  284  networking.trr_complete_load/mozilla.cloudflare-dns.com_3[mozilla.cloudflare-dns.com_3]
  284  networking.trr_open_to_first_received/mozilla.cloudflare-dns.com_3[mozilla.cloudflare-dns.com_3]
  284  networking.trr_request_size[mozilla.cloudflare-dns.com_3]
  284  networking.trr_response_size[mozilla.cloudflare-dns.com_3]
  202  DNS_TRR_PROCESSING_TIME (GIFFT)
  202  dns.trr_processing_time
  202  DNS_TRR_HTTP_VERSION2 (GIFFT)
  202  dns.trr_http_version[mozilla.cloudflare-dns.com_3, h_2]
  165  PRCONNECT_BLOCKING_TIME_CONNECTIVITY_CHANGE (GIFFT)
  165  networking.prconnect_blocking_time_connectivity_change
  164  IPV4_AND_IPV6_ADDRESS_CONNECTIVITY (GIFFT)
  164  network.ipv4_and_ipv6_address_connectivity
  152  networking.trr_fetch_duration/h2[h2]
  152  networking.trr_fetch_duration/h2_network_only[h2_network_only]
  126  http.traffic_analysisTransaction__other__
   99  HTTP_CONNECTION_ENTRY_CACHE_HIT_1 (GIFFT)
   99  TRR_SKIP_REASON_TRR_FIRST2 (GIFFT)
   99  dns.trr_skip_reason_trr_first[mozilla.cloudflare-dns.com_3]
   99  TRR_RELEVANT_SKIP_REASON_TRR_FIRST (GIFFT)
   99  dns.trr_relevant_skip_reason_trr_first[mozilla.cloudflare-dns.com_3]
   99  DNS_LOOKUP_ALGORITHM (GIFFT)
   99  dns.lookup_algorithm[trrOnly]
   98  PRCONNECTCONTINUE_BLOCKING_TIME_CONNECTIVITY_CHANGE (GIFFT)
   98  networking.prconnectcontinue_blocking_time_connectivity_change
   96  SSL_HANDSHAKE_RESULT_ECH_GREASE (GIFFT)
   96  ssl_handshake.result_ech_grease
   96  SSL_HANDSHAKE_RESULT (GIFFT)
   96  ssl_handshake.result
   96  SSL_NPN_TYPE (GIFFT)
   96  ssl.npn_type
   96  TLS_CIPHER_SUITE (GIFFT)
   96  tls.cipher_suite
   96  SSL_HANDSHAKE_VERSION (GIFFT)
   96  ssl_handshake.version
   96  SSL_TIME_UNTIL_READY_ECH_GREASE (GIFFT)
   96  ssl.time_until_ready_ech_grease
   96  SSL_TIME_UNTIL_READY (GIFFT)
   96  ssl.time_until_ready
   96  SSL_TIME_UNTIL_HANDSHAKE_FINISHED_KEYED_BY_KA (GIFFT)
   96  SSL_RESUMED_SESSION (GIFFT)
   96  SSL_HANDSHAKE_TYPE (GIFFT)
   96  ssl_handshake.completed
   96  ECHCONFIG_SUCCESS_RATE (GIFFT)
   96  http.echconfig_success_rate[NoEchConfigSucceeded]
   96  dns.lookup_disposition[mozilla.cloudflare-dns.com_3, trrAOK]
   96  DNS_TRR_LOOKUP_TIME3 (GIFFT)
   96  dns.trr_lookup_time[mozilla.cloudflare-dns.com_3]
   96  dns.lookup_disposition[mozilla.cloudflare-dns.com_3, trrOK]
   95  SSL_KEY_EXCHANGE_ALGORITHM_FULL (GIFFT)
   95  ssl.key_exchange_algorithm_full
   95  SSL_AUTH_ALGORITHM_FULL (GIFFT)
   95  ssl.auth_algorithm_full
   95  ssl.resumed_session[false]
   95  SSL_BYTES_BEFORE_CERT_CALLBACK (GIFFT)
   95  ssl.bytes_before_cert_callback
   95  SSL_HANDSHAKE_PRIVACY (GIFFT)
   95  ssl_handshake.privacy
   93  PRCLOSE_TCP_BLOCKING_TIME_CONNECTIVITY_CHANGE (GIFFT)
   93  networking.prclose_tcp_blocking_time_connectivity_change
   92  TRR_RELEVANT_SKIP_REASON_TRR_FIRST_TYPE_REC (GIFFT)
   92  dns.trr_relevant_skip_reason_trr_first_type_rec[mozilla.cloudflare-dns.com_3]
   92  http.connection_entry_cache_hit[false]
   91  SSL_HANDSHAKE_RESULT_FIRST_TRY (GIFFT)
   91  ssl_handshake.result_first_try
   91  SSL_TIME_UNTIL_READY_FIRST_TRY (GIFFT)
   91  ssl.time_until_ready_first_try
   87  networking.data_transferred_v3_kb[Y2_N3Oth]
   81  DNS_BY_TYPE_FAILED_LOOKUP_TIME (GIFFT)
   81  dns.by_type_failed_lookup_time
   74  ?.networking.data_transferred_v3_kb
   68  SPDY_SETTINGS_MAX_STREAMS (GIFFT)
   68  spdy.settings_max_streams
   66  networking.connection_address_type[http_2_ipv4]
   66  PRCONNECT_FAIL_BLOCKING_TIME_CONNECTIVITY_CHANGE (GIFFT)
   66  networking.prconnect_fail_blocking_time_connectivity_change
   54  SSL_AUTH_RSA_KEY_SIZE_FULL (GIFFT)
   54  ssl.auth_rsa_key_size_full
   51  dns.lookup_disposition[mozilla.cloudflare-dns.com_3, trrAAAAFail]
   49  ssl.time_until_handshake_finished_keyed_by_ka[mlkem768x25519]
   47  SSL_KEA_ECDHE_CURVE_FULL (GIFFT)
   47  ssl.kea_ecdhe_curve_full
   45  dns.lookup_disposition[mozilla.cloudflare-dns.com_3, trrAAAAOK]
   44  HTTP_CONNECTION_CLOSE_REASON (GIFFT)
   42  ssl.time_until_handshake_finished_keyed_by_ka[x25519]
   41  SSL_AUTH_ECDSA_CURVE_FULL (GIFFT)
   41  ssl.auth_ecdsa_curve_full
   35  netwerk.http3_0rtt_state[not_used]
   26  TRANSACTION_WAIT_TIME_HTTP (GIFFT)
   26  http.transaction_wait_time_http
   24  networking.data_transferred_v3_kb[Y0_N1Sys]
   23  SPDY_PARALLEL_STREAMS (GIFFT)
   23  spdy.parallel_streams
   23  SPDY_REQUEST_PER_CONN_3 (GIFFT)
   23  spdy.request_per_conn
   23  SPDY_SERVER_INITIATED_STREAMS (GIFFT)
   23  spdy.server_initiated_streams
   23  SPDY_GOAWAY_LOCAL (GIFFT)
   23  spdy.goaway_local
   23  SPDY_GOAWAY_PEER (GIFFT)
   23  spdy.goaway_peer
   23  HTTP2_FAIL_BEFORE_SETTINGS (GIFFT)
   23  http.http2_fail_before_settings[false]
   22  HTTP3_TIMER_DELAYED (GIFFT)
   22  http3.timer_delayed
   20  HTTP3_CONNECTION_CLOSE_CODE_3 (GIFFT)
   20  http3.connection_close_code[app_closing]
   18  HTTP3_REQUEST_PER_CONN (GIFFT)
   18  http3.request_per_conn
   18  HTTP3_BLOCKED_BY_STREAM_LIMIT_PER_CONN (GIFFT)
   18  http3.blocked_by_stream_limit_per_conn
   18  HTTP3_TRANS_BLOCKED_BY_STREAM_LIMIT_PER_CONN (GIFFT)
   18  http3.trans_blocked_by_stream_limit_per_conn
   18  HTTP3_TRANS_SENDING_BLOCKED_BY_FLOW_CONTROL_PER_CONN (GIFFT)
   18  http3.trans_sending_blocked_by_flow_control_per_conn
   18  networking.http_3_quic_frame_count[ack_tx]
   18  networking.http_3_quic_frame_count[crypto_tx]
   18  networking.http_3_quic_frame_count[stream_tx]
   18  networking.http_3_quic_frame_count[reset_stream_tx]
   18  networking.http_3_quic_frame_count[stop_sending_tx]
   18  networking.http_3_quic_frame_count[ping_tx]
   18  networking.http_3_quic_frame_count[padding_tx]
   18  networking.http_3_quic_frame_count[max_streams_tx]
   18  networking.http_3_quic_frame_count[streams_blocked_tx]
   18  networking.http_3_quic_frame_count[max_data_tx]
   18  networking.http_3_quic_frame_count[data_blocked_tx]
   18  networking.http_3_quic_frame_count[max_stream_data_tx]
   18  networking.http_3_quic_frame_count[stream_data_blocked_tx]
   18  networking.http_3_quic_frame_count[new_connection_id_tx]
   18  networking.http_3_quic_frame_count[retire_connection_id_tx]
   18  networking.http_3_quic_frame_count[path_challenge_tx]
   18  networking.http_3_quic_frame_count[path_response_tx]
   18  networking.http_3_quic_frame_count[connection_close_tx]
   18  networking.http_3_quic_frame_count[handshake_done_tx]
   18  networking.http_3_quic_frame_count[new_token_tx]
   18  networking.http_3_quic_frame_count[ack_frequency_tx]
   18  networking.http_3_quic_frame_count[datagram_tx]
   18  networking.http_3_quic_frame_count[ack_rx]
   18  networking.http_3_quic_frame_count[crypto_rx]
   18  networking.http_3_quic_frame_count[stream_rx]
   18  networking.http_3_quic_frame_count[reset_stream_rx]
   18  networking.http_3_quic_frame_count[stop_sending_rx]
   18  networking.http_3_quic_frame_count[ping_rx]
   18  networking.http_3_quic_frame_count[padding_rx]
   18  networking.http_3_quic_frame_count[max_streams_rx]
   18  networking.http_3_quic_frame_count[streams_blocked_rx]
   18  networking.http_3_quic_frame_count[max_data_rx]
   18  networking.http_3_quic_frame_count[data_blocked_rx]
   18  networking.http_3_quic_frame_count[max_stream_data_rx]
   18  networking.http_3_quic_frame_count[stream_data_blocked_rx]
   18  networking.http_3_quic_frame_count[new_connection_id_rx]
   18  networking.http_3_quic_frame_count[retire_connection_id_rx]
   18  networking.http_3_quic_frame_count[path_challenge_rx]
   18  networking.http_3_quic_frame_count[path_response_rx]
   18  networking.http_3_quic_frame_count[connection_close_rx]
   18  networking.http_3_quic_frame_count[handshake_done_rx]
   18  networking.http_3_quic_frame_count[new_token_rx]
   18  networking.http_3_quic_frame_count[ack_frequency_rx]
   18  networking.http_3_quic_frame_count[datagram_rx]
   18  networking.http_3_congestion_event_reason[loss]
   18  networking.http_3_congestion_event_reason[ecn-ce]
   18  http.connection_close_reason[30_1_0_0_2]
   18  PRCLOSE_UDP_BLOCKING_TIME_CONNECTIVITY_CHANGE (GIFFT)
   18  networking.prclose_udp_blocking_time_connectivity_change
   18  http.traffic_analysisConnection__other__
   15  networking.transaction_wait_time_https_rr
   14  http.connection_close_reason[11_1_0_1_4]
   12  networking.connection_address_type[http_1_ipv4]
   11  DNS_BY_TYPE_SUCCEEDED_LOOKUP_TIME (GIFFT)
   11  dns.by_type_succeeded_lookup_time
   10  cert_compression.failures[brotli]
    9  SPDY_KBREAD_PER_CONN2 (GIFFT)
    9  spdy.kbread_per_conn
    7  http.connection_entry_cache_hit[true]
    7  TRANSACTION_WAIT_TIME_HTTP3 (GIFFT)
    7  http.transaction_wait_time_http3
    7  http.connection_close_reason[20_1_0_3_4]
    6  DNS_HTTPSSVC_CONNECTION_FAILED_REASON (GIFFT)
    6  http.dns_httpssvc_connection_failed_reason
    4  ssl.time_until_handshake_finished_keyed_by_ka[P256]
    3  http.connection_close_reason[11_1_0_3_4]
    3  HTTP_REQUEST_PER_CONN (GIFFT)
    3  http.request_per_conn
    2  http.connection_close_reason[20_1_0_7_4]
    1  SSL_KEY_EXCHANGE_ALGORITHM_RESUMED (GIFFT)
    1  ssl.key_exchange_algorithm_resumed
    1  ssl.resumed_session[true]
    1  networking.trr_tcp_connection/mozilla.cloudflare-dns.com_3[mozilla.cloudflare-dns.com_3]
    1  networking.data_transferred_v3_kb[Y1_N1]
    1  HTTP_KBREAD_PER_CONN2 (GIFFT)
    1  http.kbread_per_conn2
    1  ssl.time_until_handshake_finished_keyed_by_ka[P521]
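
To gauge how much of the list above comes from GIFFT mirrors, the rows can be tallied with a short script. This is only a sketch: the helper names (`parse_probe_lines`, `gifft_share`) are made up for illustration, and `SAMPLE` is just the first six rows of the list, not the full data set.

```python
import re
from typing import NamedTuple


class ProbeCount(NamedTuple):
    count: int
    name: str
    is_gifft: bool


# One row looks like "  396  HTTP_TRANSACTION_RESTART_REASON (GIFFT)".
LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?)(\s+\(GIFFT\))?\s*$")

# First six rows of the list above, copied verbatim.
SAMPLE = """\
  423  dns.lookup_method
  396  HTTP_TRANSACTION_RESTART_REASON (GIFFT)
  396  http.transaction_restart_reason
  337  networking.transaction_wait_time
  319  TRANSACTION_WAIT_TIME_SPDY (GIFFT)
  319  http.transaction_wait_time_spdy
"""


def parse_probe_lines(text: str) -> list[ProbeCount]:
    """Parse "<count>  <probe name> [(GIFFT)]" rows; skip anything else."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append(
                ProbeCount(int(m.group(1)), m.group(2), m.group(3) is not None)
            )
    return rows


def gifft_share(rows: list[ProbeCount]) -> tuple[int, int]:
    """Return (sum of counts on GIFFT rows, sum of counts on all rows)."""
    total = sum(r.count for r in rows)
    gifft = sum(r.count for r in rows if r.is_gifft)
    return gifft, total


if __name__ == "__main__":
    gifft, total = gifft_share(parse_probe_lines(SAMPLE))
    print(f"GIFFT rows: {gifft} of {total}")
```

Pasting the full list into `SAMPLE` would give the share for the whole profile.
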
Summary: Telemetry collection is ~6% of socket thread samples during TRR stress test → Telemetry collection is ~6% of socket thread samples during Fenix TRR stress test

Thanks for taking a look at this.
I think we can remove the following probes, since they are no longer needed:

  • dns_httpssvc_connection_failed_reason
  • connection_close_reason
  • HTTP_TRANSACTION_RESTART_REASON

Some TRR-related probes appear to serve a similar purpose and could likely be removed as well. For example:

  • trr_fetch_duration
  • trr_complete_load
Summary: Telemetry collection is ~6% of socket thread samples during Fenix TRR stress test → Telemetry collection is ~6% of socket thread samples during TRR stress test

For H3/QUIC, I think we should check whether all these frame_count metrics are useful at all. Ditto for at least some of the ssl ones. I wonder if we should add a connection parameter, omit_noisy_stats, to the QUIC stack that we can set for DoH connections.
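
A rough illustration of the omit_noisy_stats idea, written in Python purely to show the gating logic. Everything here is hypothetical: `ConnectionParameters`, `StatsRecorder`, and `NOISY_PREFIXES` are invented for this sketch and are not the QUIC stack's or necko's actual API. Which metrics count as "noisy" is also an assumption, taken from the frame_count probes in the list above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConnectionParameters:
    # Hypothetical flag; a DoH connection would be created with it set.
    omit_noisy_stats: bool = False


@dataclass
class StatsRecorder:
    params: ConnectionParameters
    recorded: Counter = field(default_factory=Counter)

    # Assumed set of "noisy" metrics: the per-frame-type QUIC counters.
    NOISY_PREFIXES = ("networking.http_3_quic_frame_count",)

    def record(self, metric: str, value: int = 1) -> bool:
        """Record a metric unless it is noisy and the connection opted out.

        Returns True if the value was recorded, False if it was skipped.
        """
        if self.params.omit_noisy_stats and metric.startswith(self.NOISY_PREFIXES):
            return False
        self.recorded[metric] += value
        return True


# A DoH connection skips the frame counters but keeps other metrics.
doh = StatsRecorder(ConnectionParameters(omit_noisy_stats=True))
doh.record("networking.http_3_quic_frame_count[ack_tx]")  # skipped
doh.record("http3.request_per_conn")                       # recorded

# A regular connection records everything.
regular = StatsRecorder(ConnectionParameters())
regular.record("networking.http_3_quic_frame_count[ack_tx]")  # recorded
```

Checking the flag once per record call keeps the skip path cheap, which is the point if the goal is fewer telemetry samples on the socket thread.
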

Summary: Telemetry collection is ~6% of socket thread samples during TRR stress test → Telemetry collection is ~6% of socket thread samples during TRR stress test; cull low value probes
Whiteboard: [necko-triaged] → [necko-triaged] [fxpe]