Closed Bug 1430768 Opened 6 years ago Closed 6 years ago

Refine TFO telemetry

Categories

(Core :: Networking: HTTP, enhancement, P1)

59 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: dragana, Assigned: dragana)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

      No description provided.
Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [necko-triaged]
This is a summary of the TFO telemetry states. It is written for someone who knows the necko code well.

TFO states for telemetry:

nsHalfOpenSocket has a TFO state as soon as it is created. This happens in nsHttpConnectionMgr::CreateTransport:
 1) The nsHalfOpenSocket is created:
   - http connections get status TFO_HTTP
   - https connections get status TFO_UNKNOWN
 2) nsHalfOpenSocket::SetupPrimaryStream is called right after the constructor and further changes the TFO state:
   - http connections keep status TFO_HTTP
   - https connections get status TFO_DISABLED if TFO is disabled; otherwise they keep status TFO_UNKNOWN.
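
The two steps above can be sketched as a pair of small functions. This is only an illustration of the description, not the necko code: the enum is a reduced stand-in for the real TFO status values, and the function names StateAfterConstructor and StateAfterSetupPrimaryStream are hypothetical.

```cpp
#include <cassert>

// Reduced, illustrative subset of the TFO states named above.
enum class TfoState { TFO_UNKNOWN, TFO_HTTP, TFO_DISABLED };

// Step 1: state assigned when the nsHalfOpenSocket is created.
inline TfoState StateAfterConstructor(bool isHttps) {
  return isHttps ? TfoState::TFO_UNKNOWN : TfoState::TFO_HTTP;
}

// Step 2: state after SetupPrimaryStream has run.
inline TfoState StateAfterSetupPrimaryStream(TfoState current,
                                             bool tfoDisabled) {
  if (current == TfoState::TFO_HTTP) {
    return current;  // http connections keep TFO_HTTP
  }
  // Only https sockets reach this point, and they start as TFO_UNKNOWN.
  assert(current == TfoState::TFO_UNKNOWN);
  return tfoDisabled ? TfoState::TFO_DISABLED : TfoState::TFO_UNKNOWN;
}
```

For example, an https socket with TFO disabled goes TFO_UNKNOWN and then TFO_DISABLED, while an http socket stays TFO_HTTP through both steps.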

SetupPrimaryStream starts the SocketTransport, which moves through different states: NS_NET_STATUS_RESOLVING_HOST, NS_NET_STATUS_RESOLVED_HOST, NS_NET_STATUS_CONNECTING_TO, NS_NET_STATUS_CONNECTED_TO, etc.

After the host is resolved and a TCP socket is built successfully, the transport (and also the nsHttpTransaction) changes to state NS_NET_STATUS_CONNECTING_TO. In nsSocketTransport::InitiateSocket(), connections using TFO will call:
 1) FastOpenEnabled - This function checks again whether TFO is disabled and whether a proxy with CONNECT is used, and changes the TFO state to TFO_DISABLED or TFO_DISABLED_CONNECT accordingly. Otherwise the TFO state stays TFO_UNKNOWN.
 2) PR_Connect (if this succeeds, the following 2 functions are called; otherwise they are skipped)
 3) StartFastOpen - This function creates a nsHttpConnection. If the initialization of the nsHttpConnection fails, the TFO status of the nsHttpConnection is set to TFO_INIT_FAILED. If it succeeds, the TFO status of the nsHttpConnection is set to the TFO status of the nsHalfOpenSocket, which is TFO_UNKNOWN (the status cannot be TFO_HTTP, TFO_DISABLED or TFO_DISABLED_CONNECT because StartFastOpen would not have been called).
 4) TCPFastOpenFinish - After this function is called, the TFO status of both the nsHalfOpenSocket and the nsHttpConnection is set to one of: TFO_NOT_TRIED, TFO_DISABLED, TFO_DATA_SENT or TFO_TRIED.
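
A rough model of the 4 calls above, to make the order of the state changes easier to follow. Everything here is a sketch: ConnectInputs, ConnectOutcome and RunInitiateSocket are hypothetical names, the booleans stand in for what the real functions decide, and the sketch assumes (this is not stated above) that nothing further happens to the states once the nsHttpConnection initialization has failed.

```cpp
#include <cassert>

// Reduced, illustrative subset of the TFO states named above.
enum class TfoState {
  TFO_UNKNOWN, TFO_DISABLED, TFO_DISABLED_CONNECT, TFO_INIT_FAILED,
  TFO_NOT_TRIED, TFO_TRIED, TFO_DATA_SENT
};

// Hypothetical inputs standing in for what each real call decides.
struct ConnectInputs {
  bool tfoDisabled;        // FastOpenEnabled: TFO is disabled
  bool proxyWithConnect;   // FastOpenEnabled: proxy with CONNECT is used
  bool prConnectOk;        // PR_Connect succeeded
  bool connectionInitOk;   // nsHttpConnection initialization succeeded
  TfoState finishResult;   // state chosen by TCPFastOpenFinish
};

struct ConnectOutcome {
  TfoState halfOpenSocket;
  bool hasConnection;      // was a nsHttpConnection created by StartFastOpen?
  TfoState connection;     // meaningful only if hasConnection is true
};

inline ConnectOutcome RunInitiateSocket(const ConnectInputs& in) {
  ConnectOutcome out{TfoState::TFO_UNKNOWN, false, TfoState::TFO_UNKNOWN};

  // 1) FastOpenEnabled: may switch the socket to a "no TFO" state.
  if (in.tfoDisabled) {
    out.halfOpenSocket = TfoState::TFO_DISABLED;
    return out;  // the ordinary, non-TFO connect is not modelled here
  }
  if (in.proxyWithConnect) {
    out.halfOpenSocket = TfoState::TFO_DISABLED_CONNECT;
    return out;
  }

  // 2) PR_Connect: on failure, steps 3 and 4 are skipped.
  if (!in.prConnectOk) {
    return out;
  }

  // 3) StartFastOpen: creates the nsHttpConnection.
  out.hasConnection = true;
  if (!in.connectionInitOk) {
    out.connection = TfoState::TFO_INIT_FAILED;
    return out;  // assumption: no further state change after this
  }
  out.connection = out.halfOpenSocket;  // TFO_UNKNOWN at this point

  // 4) TCPFastOpenFinish: both objects get the same final state.
  assert(in.finishResult == TfoState::TFO_NOT_TRIED ||
         in.finishResult == TfoState::TFO_DISABLED ||
         in.finishResult == TfoState::TFO_DATA_SENT ||
         in.finishResult == TfoState::TFO_TRIED);
  out.halfOpenSocket = in.finishResult;
  out.connection = in.finishResult;
  return out;
}
```

Note that in this model a socket can only still be TFO_UNKNOWN after InitiateSocket if PR_Connect failed or the connection initialization failed.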


(The BackupTimer is started just before these 4 functions are called. Even if network.http.connection-retry-timeout is set to 0, the timer only dispatches an event, so a backup connection will not be created before these 4 functions are done.)

From here there are 2 outcomes:
 1) The connection with TFO is connected or it has received a socket error. In this case SetFastOpenConnected is called. SetFastOpenConnected has 2 different code paths:
  a) It is called with one of the errors NS_ERROR_CONNECTION_REFUSED, NS_ERROR_FAILURE, NS_ERROR_PROXY_CONNECTION_REFUSED or NS_ERROR_NET_TIMEOUT. For these errors the existing nsHttpConnection is closed and its TFO state is set to TFO_FAILED (this is an internal state that is not reported in telemetry). The TFO state of the nsHalfOpenSocket is set to TFO_FAILED_CONNECTION_REFUSED, TFO_FAILED_NET_TIMEOUT or TFO_FAILED_UNKNOW_ERROR, and it will be reported by the retried connection or the backup connection.
  b) The TFO connection succeeded or it received a different error - the TFO state of the nsHttpConnection must be TFO_NOT_TRIED, TFO_DATA_SENT or TFO_TRIED, and it will be reported via telemetry. The TFO state of the nsHalfOpenSocket is changed to TFO_BACKUP_CONN if the backupTransport already exists.
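
The two code paths of SetFastOpenConnected could be modelled roughly as follows. This is a sketch, not the actual implementation: ConnError is a stand-in for the nsresult values, the names ClosesTfoConnection and SetFastOpenConnectedSketch are hypothetical, and mapping NS_ERROR_FAILURE and NS_ERROR_PROXY_CONNECTION_REFUSED to TFO_FAILED_UNKNOW_ERROR is an assumption based only on the state names.

```cpp
#include <cassert>

// Reduced, illustrative subset of the TFO states named above.
enum class TfoState {
  TFO_NOT_TRIED, TFO_TRIED, TFO_DATA_SENT,
  TFO_FAILED,  // internal state, never reported in telemetry
  TFO_FAILED_CONNECTION_REFUSED, TFO_FAILED_NET_TIMEOUT,
  TFO_FAILED_UNKNOW_ERROR,
  TFO_BACKUP_CONN
};

// Stand-ins for the nsresult codes; these are not the real values.
enum class ConnError {
  None, ConnectionRefused, Failure, ProxyConnectionRefused, NetTimeout, Other
};

struct Outcome {
  TfoState halfOpenSocket;
  TfoState connection;
  bool closedForRetry;  // true on path a): the TFO connection is closed
};

// Path a) applies only to these four errors.
inline bool ClosesTfoConnection(ConnError e) {
  return e == ConnError::ConnectionRefused || e == ConnError::Failure ||
         e == ConnError::ProxyConnectionRefused || e == ConnError::NetTimeout;
}

inline Outcome SetFastOpenConnectedSketch(ConnError error,
                                          TfoState socketState,
                                          TfoState connState,
                                          bool hasBackupTransport) {
  // After TCPFastOpenFinish the connection is in one of these states.
  assert(connState == TfoState::TFO_NOT_TRIED ||
         connState == TfoState::TFO_TRIED ||
         connState == TfoState::TFO_DATA_SENT);

  if (ClosesTfoConnection(error)) {
    // Path a): remember the failure reason on the half-open socket so the
    // retried or backup connection can report it.
    TfoState reason = TfoState::TFO_FAILED_UNKNOW_ERROR;  // assumed default
    if (error == ConnError::ConnectionRefused) {
      reason = TfoState::TFO_FAILED_CONNECTION_REFUSED;
    } else if (error == ConnError::NetTimeout) {
      reason = TfoState::TFO_FAILED_NET_TIMEOUT;
    }
    return Outcome{reason, TfoState::TFO_FAILED, true};
  }

  // Path b): success or a different error; the connection reports its own
  // state, and the socket becomes TFO_BACKUP_CONN if a backup exists.
  return Outcome{hasBackupTransport ? TfoState::TFO_BACKUP_CONN : socketState,
                 connState, false};
}
```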

 2) The backup connection is faster - nsHalfOpenSocket::OnOutputStreamReady for the backup connection will be called while the nsHalfOpenSocket is still in the mFastOpenInProgress state (mFastOpenInProgress is true). The TFO nsHttpConnection will be closed and its TFO state will be set to TFO_FAILED (this is an internal state that is not reported in telemetry). The TFO state of the nsHalfOpenSocket will be set to one of:
TFO_FAILED_BACKUP_CONNECTION_TFO_NOT_TRIED, TFO_FAILED_BACKUP_CONNECTION_TFO_TRIED, TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_SENT or TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_COOKIE_NOT_ACCEPTED.
A new nsHttpConnection for the backup transport will be created and its TFO state will be set to the same value, i.e. one of: TFO_FAILED_BACKUP_CONNECTION_TFO_NOT_TRIED, TFO_FAILED_BACKUP_CONNECTION_TFO_TRIED, TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_SENT or TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_COOKIE_NOT_ACCEPTED. If the backup connection has an error as well, the TFO state will be changed to one of: TFO_FAILED_BACKUP_CONNECTION_NO_TFO_FAILED_TOO, TFO_FAILED_CONNECTION_REFUSED_NO_TFO_FAILED_TOO, TFO_FAILED_NET_TIMEOUT_NO_TFO_FAILED_TOO or TFO_FAILED_UNKNOW_ERROR_NO_TFO_FAILED_TOO.
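
The naming scheme of the backup-connection states suggests a simple mapping, sketched below. The pairing of each input state with its output state is inferred from the state names only and is an assumption, as are the function names StateWhenBackupWins and StateWhenNonTfoFailsToo.

```cpp
#include <cassert>

// Reduced, illustrative subset of the TFO states named above.
enum class TfoState {
  TFO_NOT_TRIED, TFO_TRIED, TFO_DATA_SENT, TFO_DATA_COOKIE_NOT_ACCEPTED,
  TFO_FAILED_CONNECTION_REFUSED, TFO_FAILED_NET_TIMEOUT,
  TFO_FAILED_UNKNOW_ERROR,
  TFO_FAILED_BACKUP_CONNECTION_TFO_NOT_TRIED,
  TFO_FAILED_BACKUP_CONNECTION_TFO_TRIED,
  TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_SENT,
  TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_COOKIE_NOT_ACCEPTED,
  TFO_FAILED_BACKUP_CONNECTION_NO_TFO_FAILED_TOO,
  TFO_FAILED_CONNECTION_REFUSED_NO_TFO_FAILED_TOO,
  TFO_FAILED_NET_TIMEOUT_NO_TFO_FAILED_TOO,
  TFO_FAILED_UNKNOW_ERROR_NO_TFO_FAILED_TOO
};

// State given to the half-open socket and to the new backup connection
// when the backup connection wins the race against the TFO connection.
inline TfoState StateWhenBackupWins(TfoState tfoConnState) {
  switch (tfoConnState) {
    case TfoState::TFO_NOT_TRIED:
      return TfoState::TFO_FAILED_BACKUP_CONNECTION_TFO_NOT_TRIED;
    case TfoState::TFO_TRIED:
      return TfoState::TFO_FAILED_BACKUP_CONNECTION_TFO_TRIED;
    case TfoState::TFO_DATA_SENT:
      return TfoState::TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_SENT;
    case TfoState::TFO_DATA_COOKIE_NOT_ACCEPTED:
      return TfoState::
          TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_COOKIE_NOT_ACCEPTED;
    default:
      assert(false && "unexpected TFO state when the backup wins");
      return tfoConnState;
  }
}

// State reported when the non-TFO (retried or backup) connection
// fails as well. Other states are returned unchanged.
inline TfoState StateWhenNonTfoFailsToo(TfoState s) {
  switch (s) {
    case TfoState::TFO_FAILED_BACKUP_CONNECTION_TFO_NOT_TRIED:
    case TfoState::TFO_FAILED_BACKUP_CONNECTION_TFO_TRIED:
    case TfoState::TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_SENT:
    case TfoState::TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_COOKIE_NOT_ACCEPTED:
      return TfoState::TFO_FAILED_BACKUP_CONNECTION_NO_TFO_FAILED_TOO;
    case TfoState::TFO_FAILED_CONNECTION_REFUSED:
      return TfoState::TFO_FAILED_CONNECTION_REFUSED_NO_TFO_FAILED_TOO;
    case TfoState::TFO_FAILED_NET_TIMEOUT:
      return TfoState::TFO_FAILED_NET_TIMEOUT_NO_TFO_FAILED_TOO;
    case TfoState::TFO_FAILED_UNKNOW_ERROR:
      return TfoState::TFO_FAILED_UNKNOW_ERROR_NO_TFO_FAILED_TOO;
    default:
      return s;
  }
}
```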

HTTP connections and connections where TFO is disabled will have state TFO_HTTP and TFO_DISABLED, respectively. The states are set in the nsHalfOpenSocket constructor (for insecure connections) and in SetupPrimaryStream, FastOpenEnabled and TCPFastOpenFinish (for secure connections). For such connections, the functions FastOpenEnabled, StartFastOpen, TCPFastOpenFinish, SetFastOpenConnected and nsHalfOpenSocket::OnOutputStreamReady with mFastOpenInProgress equal to true will not be called.
Comment 1 explains the current TFO status telemetry values.

This patch:
- adds some assertions.
- The existing telemetry shows that around 4.5% of all connections are in state TFO_UNKNOWN, which is high. A nsHalfOpenSocket is in this state from its creation until approximately when PR_Connect is called (and if, for example, DNS fails, a nsHttpConnection will be made and it will report TFO_UNKNOWN). I tried to split this state depending on which state the socketTransport is in, i.e. resolving_host, host_resolved, etc. I would like to understand why this number is so high.

- makes some minor changes to ensure that TFO_HTTP, TFO_DISABLED and TFO_DISABLED_CONNECT do not change state once set.
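
To illustrate the last two points, here is a minimal sketch of a state update that keeps TFO_HTTP, TFO_DISABLED and TFO_DISABLED_CONNECT fixed once set, and of a split of TFO_UNKNOWN by socket transport status. It is not taken from the patch: the TFO_UNKNOWN_* names, the TransportStatus enum (a stand-in for the NS_NET_STATUS_* values) and the functions IsSticky, UpdateState and RefineUnknown are all hypothetical.

```cpp
#include <cassert>

// Reduced, illustrative subset of the TFO states.
enum class TfoState {
  TFO_UNKNOWN, TFO_HTTP, TFO_DISABLED, TFO_DISABLED_CONNECT, TFO_DATA_SENT,
  // Hypothetical names for the split of TFO_UNKNOWN:
  TFO_UNKNOWN_RESOLVING, TFO_UNKNOWN_RESOLVED,
  TFO_UNKNOWN_CONNECTING, TFO_UNKNOWN_CONNECTED
};

// Stand-in for the NS_NET_STATUS_* socket transport statuses.
enum class TransportStatus {
  ResolvingHost, ResolvedHost, ConnectingTo, ConnectedTo
};

// States that must not change once they have been set.
inline bool IsSticky(TfoState s) {
  return s == TfoState::TFO_HTTP || s == TfoState::TFO_DISABLED ||
         s == TfoState::TFO_DISABLED_CONNECT;
}

// Applies a requested state change unless the current state is sticky.
inline TfoState UpdateState(TfoState current, TfoState requested) {
  return IsSticky(current) ? current : requested;
}

// Replaces TFO_UNKNOWN with a more specific value for reporting,
// depending on how far the socket transport has progressed.
inline TfoState RefineUnknown(TfoState current, TransportStatus status) {
  assert(current == TfoState::TFO_UNKNOWN);
  switch (status) {
    case TransportStatus::ResolvingHost:
      return TfoState::TFO_UNKNOWN_RESOLVING;
    case TransportStatus::ResolvedHost:
      return TfoState::TFO_UNKNOWN_RESOLVED;
    case TransportStatus::ConnectingTo:
      return TfoState::TFO_UNKNOWN_CONNECTING;
    case TransportStatus::ConnectedTo:
      return TfoState::TFO_UNKNOWN_CONNECTED;
  }
  return current;  // not reached for valid status values
}
```

With such a split, a connection whose DNS lookup failed would be reported separately from one that got as far as connecting, which is the distinction needed to see where the 4.5% comes from.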


Do you have any other ideas that would help in understanding the telemetry results?

Link to MacOS telemetry (do not look at data older than 01/12):
https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-01-15&keys=__none__!__none__!__none__&max_channel_version=nightly%252F59&measure=TCP_FAST_OPEN_3&min_channel_version=null&os=Darwin%252C17.3.0!Darwin%252C17.5.0!Darwin%252C17.4.0&processType=*&product=Firefox&sanitize=0&sort_keys=submissions&start_date=2018-01-12&table=1&trim=1&use_submission_date=0

Link to Windows telemetry (do not look at data older than 01/12):
https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-01-15&keys=__none__!__none__!__none__&max_channel_version=nightly%252F59&measure=TCP_FAST_OPEN_3&min_channel_version=null&os=Windows_NT%252C10.0&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-12-20&table=1&trim=1&use_submission_date=0

I find the values for TFO_UNKNOWN and TFO_DISABLED_CONNECT strange.

And we have already seen that the backup connection wins too often. On MacOS it is around 2.7% (TFO_FAILED_BACKUP_CONNECTION_TFO_DATA_SENT).

The telemetry probe that measures how often a backup connection wins when TFO is not used shows 0.84%:
https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-01-14&keys=__none__!__none__!__none__&max_channel_version=nightly%252F59&measure=NETWORK_HTTP_BACKUP_CONN_WON_1&min_channel_version=null&os=Windows_NT!Darwin!Linux&processType=*&product=Firefox&sanitize=0&sort_keys=submissions&start_date=2018-01-09&table=1&trim=1&use_submission_date=0
Attachment #8943265 - Flags: review?(mcmanus)
Attachment #8943265 - Flags: review?(mcmanus) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/652042ebfff8
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Depends on: 1432420
Depends on: 1432254
No longer depends on: 1432420
Depends on: 1434609