Closed
Bug 1354407
Opened 7 years ago
Closed 7 years ago
Add new telemetry probe for saved time when we rcwn
Categories
(Core :: Networking: Cache, enhancement)
Core
Networking: Cache
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox55 | --- | fixed |
People
(Reporter: CuveeHsu, Assigned: CuveeHsu)
References
(Blocks 1 open bug)
Details
(Whiteboard: [necko-active])
Attachments
(1 file, 5 obsolete files)
7.26 KB,
patch
|
CuveeHsu
:
review+
CuveeHsu
:
feedback+
|
Details | Diff | Splinter Review |
To evaluate rcwn algorithm, we need new probe for saved time - this will be also just an estimate based on current cache speed
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → juhsu
Status: NEW → ASSIGNED
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 1•7 years ago
|
||
We would calculate (OCEC - OnStartRequest from Transaciton)
Assignee | ||
Comment 2•7 years ago
|
||
haven't tested
Assignee | ||
Comment 3•7 years ago
|
||
I tested by adding a random delay here http://searchfox.org/mozilla-central/rev/313e5199bf58200f158c6fcbe193e41b88ed58a6/netwerk/protocol/http/nsHttpChannel.cpp#3623 I found that we still race sometimes through [1] instead of MaybeRaceCacheWithNetwork even disabling RCWN. [1] http://searchfox.org/mozilla-central/rev/313e5199bf58200f158c6fcbe193e41b88ed58a6/netwerk/protocol/http/nsHttpChannel.cpp#473
Attachment #8860277 -
Attachment is obsolete: true
Attachment #8860877 -
Flags: review?(michal.novotny)
Assignee | ||
Comment 4•7 years ago
|
||
(In reply to Junior[:junior] from comment #3) > Created attachment 8860877 [details] [diff] [review] > saveTime - v2 > > I tested by adding a random delay here > http://searchfox.org/mozilla-central/rev/ > 313e5199bf58200f158c6fcbe193e41b88ed58a6/netwerk/protocol/http/nsHttpChannel. > cpp#3623 > > I found that we still race sometimes through [1] instead of > MaybeRaceCacheWithNetwork even disabling RCWN. > [1] > http://searchfox.org/mozilla-central/rev/ > 313e5199bf58200f158c6fcbe193e41b88ed58a6/netwerk/protocol/http/nsHttpChannel. > cpp#473 Oops, it's solved in bug 1354409 comment 7
Comment 5•7 years ago
|
||
Comment on attachment 8860877 [details] [diff] [review] saveTime - v2 Review of attachment 8860877 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/http/nsHttpChannel.cpp @@ +3730,5 @@ > + > + // Net-win indicates that mOnStartRequestTimestamp is the one from net. > + int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds(); > + difftime = AbsBucketIndex(difftime); > + Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime); It can happen that OnCacheEntryCheck is called before mFirstResponseSource is set and network still wins. We should handle it even if it is probably a rare case. ::: toolkit/components/telemetry/Histograms.json @@ +2150,5 @@ > + "NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME": { > + "expires_in_version": "58", > + "alert_emails": ["necko@mozilla.com"], > + "bug_numbers": [1354407], > + "kind": "enumerated", There is no reason to use enumerated type for this probe.
Attachment #8860877 -
Flags: review?(michal.novotny) → feedback+
Assignee | ||
Comment 6•7 years ago
|
||
> ::: netwerk/protocol/http/nsHttpChannel.cpp > @@ +3730,5 @@ > > + > > + // Net-win indicates that mOnStartRequestTimestamp is the one from net. > > + int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds(); > > + difftime = AbsBucketIndex(difftime); > > + Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime); > > It can happen that OnCacheEntryCheck is called before mFirstResponseSource > is set and network still wins. We should handle it even if it is probably a > rare case. > How about reporting telemetry in onStartReqeust? > ::: toolkit/components/telemetry/Histograms.json > @@ +2150,5 @@ > > + "NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME": { > > + "expires_in_version": "58", > > + "alert_emails": ["necko@mozilla.com"], > > + "bug_numbers": [1354407], > > + "kind": "enumerated", > > There is no reason to use enumerated type for this probe. I'd like to use the granularity like this http://searchfox.org/mozilla-central/rev/abe68d5dad139e376d5521ca1d4b7892e1e7f1ba/toolkit/components/telemetry/Histograms.json#2051-2059
Flags: needinfo?(michal.novotny)
Comment 7•7 years ago
|
||
(In reply to Junior[:junior][ooo 5/11-16] from comment #6) > > ::: netwerk/protocol/http/nsHttpChannel.cpp > > @@ +3730,5 @@ > > > + > > > + // Net-win indicates that mOnStartRequestTimestamp is the one from net. > > > + int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds(); > > > + difftime = AbsBucketIndex(difftime); > > > + Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime); > > > > It can happen that OnCacheEntryCheck is called before mFirstResponseSource > > is set and network still wins. We should handle it even if it is probably a > > rare case. > > > > How about reporting telemetry in onStartReqeust? In OnStartRequest you don't know OnCacheEntryCheck time and in most cases won't open the cache entry. So we need to keep this code and add that special case to OnStartRequest. > > ::: toolkit/components/telemetry/Histograms.json > > @@ +2150,5 @@ > > > + "NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME": { > > > + "expires_in_version": "58", > > > + "alert_emails": ["necko@mozilla.com"], > > > + "bug_numbers": [1354407], > > > + "kind": "enumerated", > > > > There is no reason to use enumerated type for this probe. > > I'd like to use the granularity like this > http://searchfox.org/mozilla-central/rev/ > abe68d5dad139e376d5521ca1d4b7892e1e7f1ba/toolkit/components/telemetry/ > Histograms.json#2051-2059 Agree.
Flags: needinfo?(michal.novotny)
Assignee | ||
Comment 8•7 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #7) > (In reply to Junior[:junior][ooo 5/11-16] from comment #6) > > > ::: netwerk/protocol/http/nsHttpChannel.cpp > > > @@ +3730,5 @@ > > > > + > > > > + // Net-win indicates that mOnStartRequestTimestamp is the one from net. > > > > + int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds(); > > > > + difftime = AbsBucketIndex(difftime); > > > > + Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime); > > > > > > It can happen that OnCacheEntryCheck is called before mFirstResponseSource > > > is set and network still wins. We should handle it even if it is probably a > > > rare case. > > > > > > > How about reporting telemetry in onStartReqeust? > > In OnStartRequest you don't know OnCacheEntryCheck time and in most cases > won't open the cache entry. So we need to keep this code and add that > special case to OnStartRequest. > Let's case by case: 1. Normal Case for net-win: onStartRequest from network (set mFirstResponseSource and mOnStartRequestTimestamp) -> onCacheEntryCheck (report probe) 2. Rare Case for net-win onCacheEntryCheck without mFirstResponseSource (save OCEC time)-> onStartRequest from network In this case, we know net win, but time(OCEC) - mOnStartRequestTimestamp < 0 and it's meaningless. We might record time(onStartReqeust from net) - time(onStartReqeust from cache) Is it possible that we don't request cache after onCacheEntryAvailable? If yes, another special case should be report. To conclude, we need three tables for our analysis: 1) time(OCEC) - mOnStartRequestTimestamp for case 1 2) time(onStartReqeust from net) - time(onStartReqeust from cache) for case 2 3) how often we hit case 1, and how often we hit case 2 For table 3, I do consider to merge it to the table in bug 1354409. But it seems to make that table unreadable. Therefore, I prefer separated table. What do you think about this plan, Michal?
Flags: needinfo?(michal.novotny)
Comment 9•7 years ago
|
||
(In reply to Junior[:junior][ooo 5/11-16] from comment #8) > (In reply to Michal Novotny (:michal) from comment #7) > > (In reply to Junior[:junior][ooo 5/11-16] from comment #6) > > > > ::: netwerk/protocol/http/nsHttpChannel.cpp > > > > @@ +3730,5 @@ > > > > > + > > > > > + // Net-win indicates that mOnStartRequestTimestamp is the one from net. > > > > > + int64_t difftime = (TimeStamp::Now() - mOnStartRequestTimestamp).ToMilliseconds(); > > > > > + difftime = AbsBucketIndex(difftime); > > > > > + Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, difftime); > > > > > > > > It can happen that OnCacheEntryCheck is called before mFirstResponseSource > > > > is set and network still wins. We should handle it even if it is probably a > > > > rare case. > > > > > > > > > > How about reporting telemetry in onStartReqeust? > > > > In OnStartRequest you don't know OnCacheEntryCheck time and in most cases > > won't open the cache entry. So we need to keep this code and add that > > special case to OnStartRequest. > > > > Let's case by case: > > 1. Normal Case for net-win: > onStartRequest from network (set mFirstResponseSource and > mOnStartRequestTimestamp) -> > onCacheEntryCheck (report probe) > > 2. Rare Case for net-win > onCacheEntryCheck without mFirstResponseSource (save OCEC time)-> > onStartRequest from network > > > In this case, we know net win, but time(OCEC) - mOnStartRequestTimestamp < 0 > and it's meaningless. > We might record time(onStartReqeust from net) - time(onStartReqeust from > cache) Ideally, we would always report difference between onStartRequests, but we don't have it in case 1) so we need to assume that OCEC time and cache's OnStartRequest don't differ too much. > Is it possible that we don't request cache after onCacheEntryAvailable? > If yes, another special case should be report. I don't think so. > To conclude, we need three tables for our analysis: > 1) time(OCEC) - mOnStartRequestTimestamp for case 1 > 2) time(onStartReqeust from net) - time(onStartReqeust from cache) for case 2 > 3) how often we hit case 1, and how often we hit case 2 > > For table 3, I do consider to merge it to the table in bug 1354409. > But it seems to make that table unreadable. > Therefore, I prefer separated table. > > What do you think about this plan, Michal? If we really have separate probes for 1) and 2) then we don't need 3) because we can read it from sample count of 1) and 2). But I'm not sure we need to separate them especially when it won't provide information to estimate OnStartRequest time for 1). We might collect 1) and 2) to a single telemetry and for 2) also report time between OCEC and OnStartRequest. This would tell us how often 2) happens and also we could use it to estimate OnStartRequest for 1).
Flags: needinfo?(michal.novotny)
Assignee | ||
Comment 10•7 years ago
|
||
> We might collect 1) and 2) to a single telemetry and for 2) also report time
> between OCEC and OnStartRequest. This would tell us how often 2) happens and
> also we could use it to estimate OnStartRequest for 1).
If so, exponential telemetry might not a good choice since we'll report
time(OCEC) - time(OnStartRequest) + 1000 ms.
The finest part is not what we care most.
From our previous telemetry for Q_BIG case,
1) <5% for a >10s net-win
2) <5% for a <100ms net-win
I would use a linear telemetry with high=11000 and n_bucket=110,
expecting we don't lose much detail for this granularity.
Agree?
Flags: needinfo?(michal.novotny)
Comment 11•7 years ago
|
||
(In reply to Junior[:junior][ooo 5/11-16] from comment #10) > > We might collect 1) and 2) to a single telemetry and for 2) also report time > > between OCEC and OnStartRequest. This would tell us how often 2) happens and > > also we could use it to estimate OnStartRequest for 1). > > If so, exponential telemetry might not a good choice since we'll report > time(OCEC) - time(OnStartRequest) + 1000 ms. > The finest part is not what we care most. I don't understand this. I meant to report time(OnStartRequest) - time(OCEC) with that additional probe.
Flags: needinfo?(michal.novotny)
Assignee | ||
Comment 12•7 years ago
|
||
Oops, got it. time(OCEC) - time(OnStartRequest) for the first case in one probe and time(OnStartRequest) - time(OCEC) for the second case in another probe Let's keep the exponential telemetry.
Comment 13•7 years ago
|
||
first probe: - case 1) OCEC - OnStart(net) - case 2) OnStart(cache) - OnStart(net) second probe: - case 2) OnStart(cache) - OCEC
Assignee | ||
Comment 14•7 years ago
|
||
Attachment #8860877 -
Attachment is obsolete: true
Attachment #8864468 -
Flags: review?(michal.novotny)
Comment 15•7 years ago
|
||
Comment on attachment 8864468 [details] [diff] [review] saveTime - v3 Review of attachment 8864468 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/http/nsHttpChannel.cpp @@ +6784,5 @@ > + Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME, savedTime); > + > + int64_t diffTime = (currentTime - mOnCacheEntryCheckTimestamp).ToMilliseconds(); > + Telemetry::Accumulate(Telemetry::NETWORK_RACE_CACHE_WITH_NETWORK_OCEC_ON_START_DIFF, diffTime); > + } This is wrong, you're reporting (OnStart(net) - OCEC) instead of (OnStart(cache) - OnStart(net)). ::: toolkit/components/telemetry/Histograms.json @@ +2080,5 @@ > + "bug_numbers": [1354407], > + "kind": "linear", > + "high": 1000, > + "n_buckets": 100, > + "description": "Time in milliseconds between onStartRequest from the cache and onCacheEntryAvailable. Report only when net wins and OCEC is before onStartRequest from net." onCacheEntryCheck
Attachment #8864468 -
Flags: review?(michal.novotny) → review-
Comment 16•7 years ago
|
||
> > Is it possible that we don't request cache after onCacheEntryAvailable?
> > If yes, another special case should be report.
>
> I don't think so.
Valentin, am I right that if we don't skip cache in OnCacheEntryCheck because we already set mFirstResponseSource to RESPONSE_FROM_NETWORK, then OnStartRequest will be called with cache pump always when it would be called if we didn't race?
Flags: needinfo?(valentin.gosu)
Assignee | ||
Comment 17•7 years ago
|
||
> This is wrong, you're reporting (OnStart(net) - OCEC) instead of > (OnStart(cache) - OnStart(net)). > It's for the second probe, i.e., report OnStart(cache) - OCEC , per comment 13. You're right. I reported (OnStart(net) - OCEC) which is wrong
Comment 18•7 years ago
|
||
Wrong are both times. mOnStartRequestTimestamp is null here. You should compute the time for NETWORK_RACE_CACHE_WITH_NETWORK_SAVED_TIME in OnStartRequest when request == mCachePump.
Assignee | ||
Comment 19•7 years ago
|
||
Attachment #8864468 -
Attachment is obsolete: true
Attachment #8864804 -
Flags: review?(michal.novotny)
Comment 20•7 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #16) > > > Is it possible that we don't request cache after onCacheEntryAvailable? > > > If yes, another special case should be report. > > > > I don't think so. > > Valentin, am I right that if we don't skip cache in OnCacheEntryCheck > because we already set mFirstResponseSource to RESPONSE_FROM_NETWORK, then > OnStartRequest will be called with cache pump always when it would be called > if we didn't race? I'm not super sure I understand the question. If we don't get an OnStartRequest for the network, by the time OnCacheEntryCheck is called, then OnCacheEntryAvailable will also be called, and we will be opening the cachePump for reading.
Flags: needinfo?(valentin.gosu)
Comment 21•7 years ago
|
||
Comment on attachment 8864804 [details] [diff] [review] saveTime - v4 Review of attachment 8864804 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, thanks.
Attachment #8864804 -
Flags: review?(michal.novotny) → review+
Assignee | ||
Comment 22•7 years ago
|
||
Comment on attachment 8864804 [details] [diff] [review] saveTime - v4 Review of attachment 8864804 [details] [diff] [review]: ----------------------------------------------------------------- Hello Benjamin, The first probe is to measure our success criterion, which is the estimated time we saved. In fact the amount of saved time is OnStartReqeust(Cache) - OnStartRequest(Net) However, we would cancel cache request if the net is really fast. Hence we measure OnCacheEntryCheck - OnStartReqeust(Net) and the second probe is OnStartReqeust(Cache) - OnCacheEntryCheck when we have this data. The second probe could help us calculate the saved time. Request a data review since we add some telemetries.
Attachment #8864804 -
Flags: feedback?(benjamin)
Assignee | ||
Comment 23•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1f6ec8d6865fc11376533c25ce80669dc048123d
Comment 24•7 years ago
|
||
Comment on attachment 8864804 [details] [diff] [review] saveTime - v4 please add the real person responsible for this probe data to alert_emails, in addition to the mailing list. data-r=me with that change
Attachment #8864804 -
Flags: feedback?(benjamin) → feedback+
Assignee | ||
Comment 25•7 years ago
|
||
Gary, any chance to help me changing the commit message with comment 24, also bug 1354405? I'd like to land it asap but I'm off the keyboard now
Flags: needinfo?(xeonchen)
Assignee | ||
Comment 26•7 years ago
|
||
Fix commit message
Attachment #8864804 -
Attachment is obsolete: true
Flags: needinfo?(xeonchen)
Attachment #8867036 -
Flags: review+
Attachment #8867036 -
Flags: feedback+
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 27•7 years ago
|
||
Forget to update the alert_emails
Assignee | ||
Comment 28•7 years ago
|
||
Attachment #8867036 -
Attachment is obsolete: true
Attachment #8868493 -
Flags: review+
Attachment #8868493 -
Flags: feedback+
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 29•7 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/42e8905a1df2 Add new telemetry probe for saved time when we rcwn. r=michal, data-r=bsmedberg
Keywords: checkin-needed
Comment 30•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/42e8905a1df2
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee | ||
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•