Closed Bug 1377223 Opened 3 years ago Closed 2 years ago

RCWN - Should we revalidate when racing and the cache wins

Categories

(Core :: Networking, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: valentin, Assigned: michal)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [necko-active])

Attachments

(1 file, 1 obsolete file)

In the case of immediate racing, where the network request is sent immediately after asyncOpen, we might get to a situation where the cache wins, but needs to revalidate the entry. Since the network has already been triggered, no conditional request is sent, so we have to wait for the initial one to complete.

Another path would be to cancel the initial network request, and issue a second one. But since we only race for small resources, this might not be necessary.

Also, another BUG is that we should not ignore the network response that comes back after the cache wins.
We had a plan in the past to add some info needed for the reval/don't reval decision to the cache index.  I even believe there is a bug filed for this.

If there is an ETag, you only need to wait for the headers.  If the server responses with the same ETag, go from the cache (it will probably be faster than from the cache *))

I think the same applies when only the Last-Modified header is present for revalidation.  If identical, you can go from cache.

And there is always the trivial solution - ignore the cached content completely if it's small enough *), since it is very likely the content comes quickly after the headers from the net.


*) depends... :D
> Also, another BUG is that we should not ignore the network response that
> comes back after the cache wins.

Incomplete description: the case in which we might ignore the network response is for partial cache entries - and the request would get stuck.
(In reply to Valentin Gosu [:valentin] from comment #2)
> > Also, another BUG is that we should not ignore the network response that
> > comes back after the cache wins.
> 
> Incomplete description: the case in which we might ignore the network
> response is for partial cache entries - and the request would get stuck.

Ah!  Yes.. that's a problem.  Maybe at the first stage of this feature evolution always use the network response when you find the cache entry being partial (for the sake of simplicity and according how many partial requests we usually have cached - I believe a very small amount?)

The more correct solution IMO is to add some more info to the index, like L-M, ETag (which is tho an arbitrary length string), is-partial, and do early decision based on it.
Michal, do you think you could take this bug?
Flags: needinfo?(michal.novotny)
Assignee: valentin.gosu → michal.novotny
Flags: needinfo?(michal.novotny)
(In reply to Valentin Gosu [:valentin] from comment #2)
> > Also, another BUG is that we should not ignore the network response that
> > comes back after the cache wins.
> 
> Incomplete description: the case in which we might ignore the network
> response is for partial cache entries - and the request would get stuck.

How this can happen? I'm trying to reproduce the problem but it's IMO not possible.

In case the cache wins and the network was already triggered, we ignore the cache entry.

In case the cache wins and the network was not yet triggered, we set up the range request and trigger the network, but we're not racing (mRaceCacheWithNetwork == false) so the network response isn't ignored.

Did I miss some other possibility?
Flags: needinfo?(valentin.gosu)
(In reply to Honza Bambas (:mayhemer) from comment #3)
> The more correct solution IMO is to add some more info to the index, like
> L-M, ETag (which is tho an arbitrary length string), is-partial, and do
> early decision based on it.

This would be nice but definitely not something that can be done for release 57.

I don't like canceling already sent non-conditional request to be able to send a conditional one. We should probably get some telemetry to know how often this happens. We could also have some minimum delay time in MaybeRaceCacheWithNetwork, which would eliminate this problem in case we think the cache is slow but it isn't.
(In reply to Michal Novotny (:michal) from comment #5)
> (In reply to Valentin Gosu [:valentin] from comment #2)

> In case the cache wins and the network was already triggered, we ignore the
> cache entry.

I think this case was the reason behind this bug.
The scenario was:
We race cache and network.
Cache wins, so we set mFirstResponseSource = cache
Cache entry is partial, so we try issue a request for the full content, but we don't, since we already triggered the network.
Raced network request comes back, but we ignore it since mFirstResponseSource == cache.

Right now I can't seem to figure out how we ignore the cache entry if the network was already triggered.
Flags: needinfo?(valentin.gosu) → needinfo?(michal.novotny)
(In reply to Valentin Gosu [:valentin] from comment #7)
> (In reply to Michal Novotny (:michal) from comment #5)
> > (In reply to Valentin Gosu [:valentin] from comment #2)
> 
> > In case the cache wins and the network was already triggered, we ignore the
> > cache entry.
> 
> I think this case was the reason behind this bug.
> The scenario was:
> We race cache and network.
> Cache wins, so we set mFirstResponseSource = cache
> Cache entry is partial, so we try issue a request for the full content, but
> we don't, since we already triggered the network.
> Raced network request comes back, but we ignore it since
> mFirstResponseSource == cache.
> 
> Right now I can't seem to figure out how we ignore the cache entry if the
> network was already triggered.

When the cache wins mFirstResponseSource = cache is set in ReadFromCache. When the cache entry is partial we first send the range request and we start to read from the cache entry in ProcessPartialContent() after the server responds.

Honza, is is possible that both mCachedContentIsValid and mCachedContentIsPartial are true? I don't think so, OTOH this check would be then useless http://searchfox.org/mozilla-central/rev/a83a4b68974aecaaacdf25145420e0fe97b7aa22/netwerk/protocol/http/nsHttpChannel.cpp#747
Flags: needinfo?(michal.novotny) → needinfo?(honzab.moz)
(In reply to Michal Novotny (:michal) from comment #8)
> Honza, is is possible that both mCachedContentIsValid and
> mCachedContentIsPartial are true? 

Yes, they can.  In the most common case when we receive 206 and start reading from the cache.

I don't follow what's the big deal here.  From the nature of this feature, it's possible we send a non-range request for the content and later discover a cached entry that is partial.  There are, IIUC, only two outcomes:

- network response comes before OnCacheEntryCheck call (which means also before ReadFromCache()) -> we ignore any cached content
- OnCacheEntryCheck comes before OnStartRequest from the transaction and only then we find out the entry is partial -> we ignore cache since we are about to get a complete request from the server (it's already on its way)

I think it's OK for the first impl to it this way.  Doing a second request (best via a redirect to itself, while the cache entry will be delivered synchronously to the target channel) is doable but IMO more only an optimization.
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #9)
> (In reply to Michal Novotny (:michal) from comment #8)
> > Honza, is is possible that both mCachedContentIsValid and
> > mCachedContentIsPartial are true? 
> 
> Yes, they can.  In the most common case when we receive 206 and start
> reading from the cache.

Sure, I meant before we sent the range request.

> I don't follow what's the big deal here.  From the nature of this feature,
> it's possible we send a non-range request for the content and later discover
> a cached entry that is partial.  There are, IIUC, only two outcomes:
> 
> - network response comes before OnCacheEntryCheck call (which means also
> before ReadFromCache()) -> we ignore any cached content
> - OnCacheEntryCheck comes before OnStartRequest from the transaction and
> only then we find out the entry is partial -> we ignore cache since we are
> about to get a complete request from the server (it's already on its way)
> 
> I think it's OK for the first impl to it this way.  Doing a second request
> (best via a redirect to itself, while the cache entry will be delivered
> synchronously to the target channel) is doable but IMO more only an
> optimization.

This confirms what I wrote in comment #5. I.e. no problem mentioned in comments #2 and #3 exists in the current code.
(In reply to Michal Novotny (:michal) from comment #10)
> (In reply to Honza Bambas (:mayhemer) from comment #9)
> > (In reply to Michal Novotny (:michal) from comment #8)
> > > Honza, is is possible that both mCachedContentIsValid and
> > > mCachedContentIsPartial are true? 
> > 
> > Yes, they can.  In the most common case when we receive 206 and start
> > reading from the cache.
> 
> Sure, I meant before we sent the range request.

Then I think no.  If they were, would it break?

> 
> > I don't follow what's the big deal here.  From the nature of this feature,
> > it's possible we send a non-range request for the content and later discover
> > a cached entry that is partial.  There are, IIUC, only two outcomes:
> > 
> > - network response comes before OnCacheEntryCheck call (which means also
> > before ReadFromCache()) -> we ignore any cached content
> > - OnCacheEntryCheck comes before OnStartRequest from the transaction and
> > only then we find out the entry is partial -> we ignore cache since we are
> > about to get a complete request from the server (it's already on its way)
> > 
> > I think it's OK for the first impl to it this way.  Doing a second request
> > (best via a redirect to itself, while the cache entry will be delivered
> > synchronously to the target channel) is doable but IMO more only an
> > optimization.
> 
> This confirms what I wrote in comment #5. I.e. no problem mentioned in
> comments #2 and #3 exists in the current code.

Good!
Attached patch fix (obsolete) — Splinter Review
Valentin, note that I also changed the condition when calling Unused << ReadFromCache() in this patch, but it is just a cosmetic change. They should be identical but I think this is easier to understand at the first look.
Attachment #8888502 - Flags: review?(valentin.gosu)
Attachment #8888502 - Flags: review?(benjamin)
(In reply to Honza Bambas (:mayhemer) from comment #11)
> (In reply to Michal Novotny (:michal) from comment #10)
> > (In reply to Honza Bambas (:mayhemer) from comment #9)
> > > (In reply to Michal Novotny (:michal) from comment #8)
> > > > Honza, is is possible that both mCachedContentIsValid and
> > > > mCachedContentIsPartial are true? 
> > > 
> > > Yes, they can.  In the most common case when we receive 206 and start
> > > reading from the cache.
> > 
> > Sure, I meant before we sent the range request.
> 
> Then I think no.  If they were, would it break?

Then it could break, but this would mean that ReadFromCache could be called before validating the partial content with the server. It doesn't make sense to me and AFAICS nsHttpChannel cannot handle this.
Blocks: 1381816
Comment on attachment 8888502 [details] [diff] [review]
fix

Review of attachment 8888502 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +9353,5 @@
>      }
>  
> +    if (mRaceDelay < sRCWNMinWaitMs) {
> +        mRaceDelay = sRCWNMinWaitMs;
> +    }

As discussed on IRC, let's wait for telemetry, and we can land the minimum wait in a separate bug.

::: toolkit/components/telemetry/Histograms.json
@@ +2445,5 @@
>      "alert_emails": ["necko@mozilla.com", "bsmedberg@mozilla.com"],
>      "bug_numbers": [1354405]
>    },
> +  "NETWORK_RACE_CACHE_VALIDATION": {
> +    "record_in_processes": ["main", "content"],

Since nsHttpChannel does not run in the content process, and we only accumulate there, is there any point in keeping the content process here?
Attachment #8888502 - Flags: review?(valentin.gosu) → review+
(In reply to Valentin Gosu [:valentin] from comment #14)
> ::: toolkit/components/telemetry/Histograms.json
> @@ +2445,5 @@
> >      "alert_emails": ["necko@mozilla.com", "bsmedberg@mozilla.com"],
> >      "bug_numbers": [1354405]
> >    },
> > +  "NETWORK_RACE_CACHE_VALIDATION": {
> > +    "record_in_processes": ["main", "content"],
> 
> Since nsHttpChannel does not run in the content process, and we only
> accumulate there, is there any point in keeping the content process here?

No, I just copied another existing probe. Recording in content and main to all existing probes was added in bug 1335343. I will remove it.
Attachment #8888502 - Attachment is obsolete: true
Attachment #8888502 - Flags: review?(benjamin)
Attachment #8888556 - Flags: review+
Attachment #8888556 - Flags: review?(benjamin)
Comment on attachment 8888556 [details] [diff] [review]
patch v2 - removed the minimum delay

Could we get the data review today, please? We need to land this before a shield study starts on Monday.
Attachment #8888556 - Flags: feedback?(francois)
Comment on attachment 8888556 [details] [diff] [review]
patch v2 - removed the minimum delay

data-r=me - I apologize for not getting to this Friday
Attachment #8888556 - Flags: review?(benjamin)
Attachment #8888556 - Flags: review+
Attachment #8888556 - Flags: feedback?(francois)
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/69e573abaccb
RCWN - Should we revalidate when racing and the cache wins, r=valentin, data-r=bsmedberg
Depends on: 1383949
https://hg.mozilla.org/mozilla-central/rev/69e573abaccb
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.