Closed Bug 1354407 Opened 3 years ago Closed 3 years ago

Add new telemetry probe for saved time when we rcwn

Categories

(Core :: Networking: Cache, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: junior, Assigned: junior)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [necko-active])

Attachments

(1 file, 5 obsolete files)

To evaluate rcwn algorithm, we need new probe for saved time - this will be also just an estimate based on current cache speed
Assignee: nobody → juhsu
Status: NEW → ASSIGNED
Depends on: 1325336, 1325331
We would calculate (OCEC - OnStartRequest from Transaciton)
Attached patch saveTime WIP - v1 (obsolete) — Splinter Review
haven't tested
Attached patch saveTime - v2 (obsolete) — Splinter Review
I tested by adding a random delay here
http://searchfox.org/mozilla-central/rev/313e5199bf58200f158c6fcbe193e41b88ed58a6/netwerk/protocol/http/nsHttpChannel.cpp#3623

I found that we still race sometimes through [1] instead of MaybeRaceCacheWithNetwork even disabling RCWN. 
[1] http://searchfox.org/mozilla-central/rev/313e5199bf58200f158c6fcbe193e41b88ed58a6/netwerk/protocol/http/nsHttpChannel.cpp#473
Attachment #8860277 - Attachment is obsolete: true
Attachment #8860877 - Flags: review?(michal.novotny)
(In reply to Junior[:junior] from comment #3)
> Created attachment 8860877 [details] [diff] [review]
> saveTime - v2
> 
> I tested by adding a random delay here
> http://searchfox.org/mozilla-central/rev/
> 313e5199bf58200f158c6fcbe193e41b88ed58a6/netwerk/protocol/http/nsHttpChannel.
> cpp#3623
> 
> I found that we still race sometimes through [1] instead of
> MaybeRaceCacheWithNetwork even disabling RCWN. 
> [1]
> http://searchfox.org/mozilla-central/rev/
> 313e5199bf58200f158c6fcbe193e41b88ed58a6/netwerk/protocol/http/nsHttpChannel.
> cpp#473

Oops, it's solved in bug 1354409 comment 7
Comment on attachment 8860877 [details] [diff] [review]
saveTime - v2

Review of attachment 8860877 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +3730,5 @@
> +
> +        // Net-win indicates that mOnStartRequestTimestamp is the one from net.
> +        int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds();
> +        difftime = AbsBucketIndex(difftime);
> +        Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime);

It can happen that OnCacheEntryCheck is called before mFirstResponseSource is set and network still wins. We should handle it even if it is probably a rare case.

::: toolkit/components/telemetry/Histograms.json
@@ +2150,5 @@
> +  "NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME": {
> +    "expires_in_version": "58",
> +      "alert_emails": ["necko@mozilla.com"],
> +      "bug_numbers": [1354407],
> +      "kind": "enumerated",

There is no reason to use enumerated type for this probe.
Attachment #8860877 - Flags: review?(michal.novotny) → feedback+
> ::: netwerk/protocol/http/nsHttpChannel.cpp
> @@ +3730,5 @@
> > +
> > +        // Net-win indicates that mOnStartRequestTimestamp is the one from net.
> > +        int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds();
> > +        difftime = AbsBucketIndex(difftime);
> > +        Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime);
> 
> It can happen that OnCacheEntryCheck is called before mFirstResponseSource
> is set and network still wins. We should handle it even if it is probably a
> rare case.
> 

How about reporting telemetry in onStartReqeust?

> ::: toolkit/components/telemetry/Histograms.json
> @@ +2150,5 @@
> > +  "NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME": {
> > +    "expires_in_version": "58",
> > +      "alert_emails": ["necko@mozilla.com"],
> > +      "bug_numbers": [1354407],
> > +      "kind": "enumerated",
> 
> There is no reason to use enumerated type for this probe.

I'd like to use the granularity like this
http://searchfox.org/mozilla-central/rev/abe68d5dad139e376d5521ca1d4b7892e1e7f1ba/toolkit/components/telemetry/Histograms.json#2051-2059
Flags: needinfo?(michal.novotny)
(In reply to Junior[:junior][ooo 5/11-16] from comment #6)
> > ::: netwerk/protocol/http/nsHttpChannel.cpp
> > @@ +3730,5 @@
> > > +
> > > +        // Net-win indicates that mOnStartRequestTimestamp is the one from net.
> > > +        int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds();
> > > +        difftime = AbsBucketIndex(difftime);
> > > +        Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime);
> > 
> > It can happen that OnCacheEntryCheck is called before mFirstResponseSource
> > is set and network still wins. We should handle it even if it is probably a
> > rare case.
> > 
> 
> How about reporting telemetry in onStartReqeust?

In OnStartRequest you don't know OnCacheEntryCheck time and in most cases won't open the cache entry. So we need to keep this code and add that special case to OnStartRequest.

> > ::: toolkit/components/telemetry/Histograms.json
> > @@ +2150,5 @@
> > > +  "NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME": {
> > > +    "expires_in_version": "58",
> > > +      "alert_emails": ["necko@mozilla.com"],
> > > +      "bug_numbers": [1354407],
> > > +      "kind": "enumerated",
> > 
> > There is no reason to use enumerated type for this probe.
> 
> I'd like to use the granularity like this
> http://searchfox.org/mozilla-central/rev/
> abe68d5dad139e376d5521ca1d4b7892e1e7f1ba/toolkit/components/telemetry/
> Histograms.json#2051-2059

Agree.
Flags: needinfo?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #7)
> (In reply to Junior[:junior][ooo 5/11-16] from comment #6)
> > > ::: netwerk/protocol/http/nsHttpChannel.cpp
> > > @@ +3730,5 @@
> > > > +
> > > > +        // Net-win indicates that mOnStartRequestTimestamp is the one from net.
> > > > +        int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds();
> > > > +        difftime = AbsBucketIndex(difftime);
> > > > +        Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime);
> > > 
> > > It can happen that OnCacheEntryCheck is called before mFirstResponseSource
> > > is set and network still wins. We should handle it even if it is probably a
> > > rare case.
> > > 
> > 
> > How about reporting telemetry in onStartReqeust?
> 
> In OnStartRequest you don't know OnCacheEntryCheck time and in most cases
> won't open the cache entry. So we need to keep this code and add that
> special case to OnStartRequest.
> 

Let's case by case:

1. Normal Case for net-win:
onStartRequest from network (set mFirstResponseSource and mOnStartRequestTimestamp) ->
onCacheEntryCheck (report probe)

2. Rare Case for net-win
onCacheEntryCheck without mFirstResponseSource (save OCEC time)->
onStartRequest from network 


In this case, we know net win, but time(OCEC) - mOnStartRequestTimestamp < 0 and it's meaningless.
We might record time(onStartReqeust from net) - time(onStartReqeust from cache)

Is it possible that we don't request cache after onCacheEntryAvailable?
If yes, another special case should be report. 


To conclude, we need three tables for our analysis:
1) time(OCEC) - mOnStartRequestTimestamp for case 1
2) time(onStartReqeust from net) - time(onStartReqeust from cache) for case 2
3) how often we hit case 1, and how often we hit case 2

For table 3, I do consider to merge it to the table in bug 1354409.
But it seems to make that table unreadable.
Therefore, I prefer separated table.

What do you think about this plan, Michal?
Flags: needinfo?(michal.novotny)
(In reply to Junior[:junior][ooo 5/11-16] from comment #8)
> (In reply to Michal Novotny (:michal) from comment #7)
> > (In reply to Junior[:junior][ooo 5/11-16] from comment #6)
> > > > ::: netwerk/protocol/http/nsHttpChannel.cpp
> > > > @@ +3730,5 @@
> > > > > +
> > > > > +        // Net-win indicates that mOnStartRequestTimestamp is the one from net.
> > > > > +        int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds();
> > > > > +        difftime = AbsBucketIndex(difftime);
> > > > > +        Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime);
> > > > 
> > > > It can happen that OnCacheEntryCheck is called before mFirstResponseSource
> > > > is set and network still wins. We should handle it even if it is probably a
> > > > rare case.
> > > > 
> > > 
> > > How about reporting telemetry in onStartReqeust?
> > 
> > In OnStartRequest you don't know OnCacheEntryCheck time and in most cases
> > won't open the cache entry. So we need to keep this code and add that
> > special case to OnStartRequest.
> > 
> 
> Let's case by case:
> 
> 1. Normal Case for net-win:
> onStartRequest from network (set mFirstResponseSource and
> mOnStartRequestTimestamp) ->
> onCacheEntryCheck (report probe)
> 
> 2. Rare Case for net-win
> onCacheEntryCheck without mFirstResponseSource (save OCEC time)->
> onStartRequest from network 
> 
> 
> In this case, we know net win, but time(OCEC) - mOnStartRequestTimestamp < 0
> and it's meaningless.
> We might record time(onStartReqeust from net) - time(onStartReqeust from
> cache)

Ideally, we would always report difference between onStartRequests, but we don't have it in case 1) so we need to assume that OCEC time and cache's OnStartRequest don't differ too much.

> Is it possible that we don't request cache after onCacheEntryAvailable?
> If yes, another special case should be report.

I don't think so.

> To conclude, we need three tables for our analysis:
> 1) time(OCEC) - mOnStartRequestTimestamp for case 1
> 2) time(onStartReqeust from net) - time(onStartReqeust from cache) for case 2
> 3) how often we hit case 1, and how often we hit case 2
> 
> For table 3, I do consider to merge it to the table in bug 1354409.
> But it seems to make that table unreadable.
> Therefore, I prefer separated table.
> 
> What do you think about this plan, Michal?

If we really have separate probes for 1) and 2) then we don't need 3) because we can read it from sample count of 1) and 2). But I'm not sure we need to separate them especially when it won't provide information to estimate OnStartRequest time for 1).

We might collect 1) and 2) to a single telemetry and for 2) also report time between OCEC and OnStartRequest. This would tell us how often 2) happens and also we could use it to estimate OnStartRequest for 1).
Flags: needinfo?(michal.novotny)
> We might collect 1) and 2) to a single telemetry and for 2) also report time
> between OCEC and OnStartRequest. This would tell us how often 2) happens and
> also we could use it to estimate OnStartRequest for 1).

If so, exponential telemetry might not a good choice since we'll report 
 time(OCEC) - time(OnStartRequest) + 1000 ms.
The finest part is not what we care most.

From our previous telemetry for Q_BIG case,
1) <5% for a >10s net-win
2) <5% for a <100ms net-win

I would use a linear telemetry with high=11000 and n_bucket=110,
expecting we don't lose much detail for this granularity.

Agree?
Flags: needinfo?(michal.novotny)
(In reply to Junior[:junior][ooo 5/11-16] from comment #10)
> > We might collect 1) and 2) to a single telemetry and for 2) also report time
> > between OCEC and OnStartRequest. This would tell us how often 2) happens and
> > also we could use it to estimate OnStartRequest for 1).
> 
> If so, exponential telemetry might not a good choice since we'll report 
>  time(OCEC) - time(OnStartRequest) + 1000 ms.
> The finest part is not what we care most.

I don't understand this. I meant to report time(OnStartRequest) - time(OCEC) with that additional probe.
Flags: needinfo?(michal.novotny)
Oops, got it.
time(OCEC) - time(OnStartRequest) for the first case in one probe
and time(OnStartRequest) - time(OCEC) for the second case in another probe

Let's keep the exponential telemetry.
first probe:
 - case 1) OCEC - OnStart(net)
 - case 2) OnStart(cache) - OnStart(net)

second probe:
 - case 2) OnStart(cache) - OCEC
Attached patch saveTime - v3 (obsolete) — Splinter Review
Attachment #8860877 - Attachment is obsolete: true
Attachment #8864468 - Flags: review?(michal.novotny)
Comment on attachment 8864468 [details] [diff] [review]
saveTime - v3

Review of attachment 8864468 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +6784,5 @@
> +            Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, savedTime);
> +
> +            int64_t diffTime = (currentTime - mOnCacheEntryCheckTimestamp).ToMilliseconds();
> +            Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_OCEC_ON_START_DIFF, diffTime);
> +        }

This is wrong, you're reporting (OnStart(net) - OCEC) instead of (OnStart(cache) - OnStart(net)).

::: toolkit/components/telemetry/Histograms.json
@@ +2080,5 @@
> +    "bug_numbers": [1354407],
> +    "kind": "linear",
> +    "high": 1000,
> +    "n_buckets": 100,
> +    "description": "Time in milliseconds between onStartRequest from the cache and onCacheEntryAvailable. Report only when net wins and OCEC is before onStartRequest from net."

onCacheEntryCheck
Attachment #8864468 - Flags: review?(michal.novotny) → review-
> > Is it possible that we don't request cache after onCacheEntryAvailable?
> > If yes, another special case should be report.
> 
> I don't think so.

Valentin, am I right that if we don't skip cache in OnCacheEntryCheck because we already set mFirstResponseSource to RESPONSE_FROM_NETWORK, then OnStartRequest will be called with cache pump always when it would be called if we didn't race?
Flags: needinfo?(valentin.gosu)
> This is wrong, you're reporting (OnStart(net) - OCEC) instead of
> (OnStart(cache) - OnStart(net)).
> 

It's for the second probe, i.e., report OnStart(cache) - OCEC , per comment 13.
You're right. I reported (OnStart(net) - OCEC) which is wrong
Wrong are both times. mOnStartRequestTimestamp is null here. You should compute the time for NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME in OnStartRequest when request == mCachePump.
Attached patch saveTime - v4 (obsolete) — Splinter Review
Attachment #8864468 - Attachment is obsolete: true
Attachment #8864804 - Flags: review?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #16)
> > > Is it possible that we don't request cache after onCacheEntryAvailable?
> > > If yes, another special case should be report.
> > 
> > I don't think so.
> 
> Valentin, am I right that if we don't skip cache in OnCacheEntryCheck
> because we already set mFirstResponseSource to RESPONSE_FROM_NETWORK, then
> OnStartRequest will be called with cache pump always when it would be called
> if we didn't race?

I'm not super sure I understand the question.
If we don't get an OnStartRequest for the network, by the time OnCacheEntryCheck is called, then OnCacheEntryAvailable will also be called, and we will be opening the cachePump for reading.
Flags: needinfo?(valentin.gosu)
Comment on attachment 8864804 [details] [diff] [review]
saveTime - v4

Review of attachment 8864804 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks.
Attachment #8864804 - Flags: review?(michal.novotny) → review+
Comment on attachment 8864804 [details] [diff] [review]
saveTime - v4

Review of attachment 8864804 [details] [diff] [review]:
-----------------------------------------------------------------

Hello Benjamin,

The first probe is to measure our success criterion, which is the estimated time we saved.
In fact the amount of saved time is OnStartReqeust(Cache) - OnStartRequest(Net)

However, we would cancel cache request if the net is really fast.
Hence we measure OnCacheEntryCheck - OnStartReqeust(Net)
and the second probe is OnStartReqeust(Cache) - OnCacheEntryCheck when we have this data.

The second probe could help us calculate the saved time.

Request a data review since we add some telemetries.
Attachment #8864804 - Flags: feedback?(benjamin)
Comment on attachment 8864804 [details] [diff] [review]
saveTime - v4

please add the real person responsible for this probe data to alert_emails, in addition to the mailing list.

data-r=me with that change
Attachment #8864804 - Flags: feedback?(benjamin) → feedback+
Gary, any chance to help me changing the commit message with comment 24, also bug 1354405?
I'd like to land it asap but I'm off the keyboard now
Flags: needinfo?(xeonchen)
Attached patch Save-time - v5 (obsolete) — Splinter Review
Fix commit message
Attachment #8864804 - Attachment is obsolete: true
Flags: needinfo?(xeonchen)
Attachment #8867036 - Flags: review+
Attachment #8867036 - Flags: feedback+
Keywords: checkin-needed
Keywords: checkin-needed
Forget to update the alert_emails
Attached patch saveTime - v6Splinter Review
Attachment #8867036 - Attachment is obsolete: true
Attachment #8868493 - Flags: review+
Attachment #8868493 - Flags: feedback+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/42e8905a1df2
Add new telemetry probe for saved time when we rcwn. r=michal, data-r=bsmedberg
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/42e8905a1df2
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
No longer depends on: 1325331, 1325336
Blocks: 1420334
You need to log in before you can comment on or make changes to this bug.