Closed Bug 1325081 Opened 7 years ago Closed 7 years ago

Make nsHttpChannel able to do network and cache requests at the same time

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox53 --- affected
firefox54 --- fixed

People

(Reporter: michal, Assigned: valentin)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-active])

Attachments

(3 files)

For RCWN nsHttpChannel needs to be able to handle following network/cache requests concurrency:

1) start both at the same time
2) start cache request and if we don't get the entry in a specified time start network request
3) do requests sequentially (the current state)

The data should be served all from one source, i.e. once cache/network starts delivering data, we should ignore/cancel the other. This bug is not about deciding when to use which option so we should still do requests sequentially by default.
Blocks: 1325341
Assignee: nobody → valentin.gosu
Whiteboard: [necko-active]
I just pushed a couple of patches to review-board, with a WIP approach.
The last test is currently failing, so I need to make a few changes before it's ready.
Comment on attachment 8827417 [details]
Bug 1325081 - Add interface to delay the cache fetch in order to test network-cache racing of HTTP requests

https://reviewboard.mozilla.org/r/105098/#review105912

sorry to interfere, just checking on changes to nsHttpChannel...

::: netwerk/protocol/http/nsHttpChannel.cpp:3626
(Diff revision 1)
> +                cacheStorage->AsyncOpenURI(openURI, extension, cacheEntryOpenFlags, this);
> +            });
> +
> +            mCacheOpenTimer = do_CreateInstance(NS_TIMER_CONTRACTID);
> +            mCacheOpenTimer->InitWithFuncCallback([](nsITimer *aTimer, void *aClosure) -> void {
> +                nsHttpChannel *self = static_cast<nsHttpChannel*>(aClosure);

don't!  you must pass |RefPtr<nsHttpChannel> self = this| to the lambda (by copy).  never user timer's or any rawptr closure for refcoutned objects!

::: netwerk/protocol/http/nsHttpChannel.cpp:8464
(Diff revision 1)
> +            return rv;
> +        }
> +        mCacheOpenTimer = nullptr;
> +    }
> +    mCacheOpenDelay = 0;
> +    mCacheOpenRunnable->Run();

swap to local nsCOMPtr<> and call it from it.  you never know what AsyncOpenURI may do, there is potential for this method to be reentered, even very small.

::: netwerk/protocol/http/nsIRaceCacheWithNetwork.idl:12
(Diff revision 1)
> +#include "nsISupports.idl"
> +
> +/**
> + * This holds methods used to race the cache with the network for a specific
> + * channel.
> + */

who is a typical implementer?

::: netwerk/protocol/http/nsIRaceCacheWithNetwork.idl:36
(Diff revision 1)
> +   * Normally a HTTP channel would immediately call AsyncOpenURI leading to the
> +   * cache storage to lookup the entry and return it. In order to simmulate real
> +   * life conditions where fetching a cache entry takes a long time, we set a
> +   * timer to delay the operation.
> +   */
> +  void test_setAsyncOpenURITimeout(in long timeout);

nit: maybe the names should more be like "delayCacheEntryOpeningBy()" and "openCacheEntryNow()" or something.  just for those looking at the code not knowing what AsyncOpenURI is..

but up to you if you want to make it cryptic for unauthorized trespassers ;)
Comment on attachment 8827418 [details]
Bug 1325081 - Change nsHttpChannel to be able to race network with cache

https://reviewboard.mozilla.org/r/105100/#review105916

few notes here as well.

::: netwerk/protocol/http/nsHttpChannel.cpp:4162
(Diff revision 1)
>      else if (wantCompleteEntry)
>          *aResult = RECHECK_AFTER_WRITE_FINISHED;
> -    else
> +    else {
>          *aResult = ENTRY_WANTED;
> +        if (mFirstResponseSource == RESPONSE_PENDING) {
> +            mFirstResponseSource = RESPONSE_FROM_CACHE;

note that here you don't have the entry yet.  this callback is usually (may be) called on a background cache i/o thread.  possession of the entry is done in oncacheentryavailable called on the main thread and onstartrequest may be delayed even further.

::: netwerk/protocol/http/nsHttpChannel.cpp:6850
(Diff revision 1)
> -        if (mTransactionReplaced)
> +        if (mTransactionReplaced) {
> +            LOG(("Transaction replaced\n"));
> +
> +            // This was just the network check for a 304 response.
> +            // Continue from cache.
> +            mFirstResponseSource = RESPONSE_FROM_CACHE;

should either ReadFromCache or ODA set this rather?  if there is a need to do it from here, please add a comment why

::: netwerk/protocol/http/nsHttpChannel.cpp:8545
(Diff revision 1)
> -}
> +    }
>  
> +    mNetworkTriggerTimer = do_CreateInstance(NS_TIMER_CONTRACTID);
> +    mNetworkTriggerTimer->InitWithFuncCallback([](nsITimer *aTimer, void *aClosure) -> void {
> +        nsHttpChannel *self = static_cast<nsHttpChannel*>(aClosure);
> +        self->TriggerNetwork(0);

closure again..
Comment on attachment 8827417 [details]
Bug 1325081 - Add interface to delay the cache fetch in order to test network-cache racing of HTTP requests

https://reviewboard.mozilla.org/r/105098/#review108244

::: netwerk/protocol/http/nsHttpChannel.cpp:8472
(Diff revision 1)
> +}
> +
> +NS_IMETHODIMP
> +nsHttpChannel::TriggerNetwork(int32_t aTimeout)
> +{
> +    return TryHSTSPriming();

aTimeout is ignored?

::: netwerk/protocol/http/nsIRaceCacheWithNetwork.idl:18
(Diff revision 1)
> +[scriptable, builtinclass, uuid(4d963475-8b16-4c58-b804-8a23d49436c5)]
> +interface nsIRaceCacheWithNetwork : nsISupports
> +{
> +  /**
> +   * Triggers network activity after given timeout. If timeout is 0, network
> +   * activity is triggered immediately. If the cache.asyncOpenURI callbacks

Would be good to note that the timeout is in milliseconds.

::: netwerk/protocol/http/nsIRaceCacheWithNetwork.idl:23
(Diff revision 1)
> +   * activity is triggered immediately. If the cache.asyncOpenURI callbacks
> +   * have already been called, the network activity may have already been triggered
> +   * or the content may have already been delivered from the cache, so this
> +   * operation will have no effect.
> +   */
> +  void triggerNetwork(in long timeout);

I think this can be also prefixed by test_, because we will IMO set the timeout internally in nsHttpChannel and not via interface.

::: netwerk/protocol/http/nsIRaceCacheWithNetwork.idl:41
(Diff revision 1)
> +  void test_setAsyncOpenURITimeout(in long timeout);
> +
> +  /**
> +   * Immediatelly triggers AsyncOpenURI if the timer hasn't fired.
> +   */
> +  void test_triggerAsyncOpenURI();

What will this be used for?

::: netwerk/test/unit/test_race_cache_network.js:46
(Diff revision 1)
> +    g200Counter++;
> +  }
> +}
> +
> +let gResponseCounter = 0;
> +function initialGet(request, buffer)

Why is this called initialGet when it is used in every test?

::: netwerk/test/unit/test_race_cache_network.js:69
(Diff revision 1)
> +  yield undefined;
> +  equal(gResponseCounter, 1);
> +  equal(g200Counter, 1, "check number of 200 responses");
> +  equal(g304Counter, 0, "check number of 304 responses");
> +
> +  // Checks that response is returned from the cache, after a 302 response.

304 response

::: netwerk/test/unit/test_race_cache_network.js:79
(Diff revision 1)
> +  equal(g200Counter, 1, "check number of 200 responses");
> +  equal(g304Counter, 1, "check number of 304 responses");
> +
> +  // Checks that delaying the response from the cache works.
> +  var channel = make_channel("http://localhost:" + PORT + "/rcwn");
> +  channel.QueryInterface(Components.interfaces.nsIRaceCacheWithNetwork).test_setAsyncOpenURITimeout(1000);

We could also test that the response was delayed at least by the given timeout.
Attachment #8827417 - Flags: review?(michal.novotny) → review-
Comment on attachment 8827418 [details]
Bug 1325081 - Change nsHttpChannel to be able to race network with cache

https://reviewboard.mozilla.org/r/105100/#review108300

::: netwerk/protocol/http/nsHttpChannel.cpp:3717
(Diff revision 1)
>      nsresult rv = NS_OK;
>  
>      LOG(("nsHttpChannel::OnCacheEntryCheck enter [channel=%p entry=%p]",
>          this, entry));
>  
> +    if (mFirstResponseSource == RESPONSE_FROM_NETWORK) {

Is this threadsafe? mFirstResponseSource can be set in OnStartRequest on the main thread while we're checking it here on cache IO thread.

::: netwerk/protocol/http/nsHttpChannel.cpp:6678
(Diff revision 1)
>      LOG(("nsHttpChannel::OnStopRequest [this=%p request=%p status=%x]\n",
>          this, request, status));
>  
> +    LOG(("OnStopRequest %p requestFromCache: %d mFirstResponseSource: %d\n", this, request == mCachePump, mFirstResponseSource));
> +    MOZ_ASSERT((request == mTransactionPump && mFirstResponseSource == RESPONSE_FROM_NETWORK) ||
> +               (request == mCachePump && mFirstResponseSource == RESPONSE_FROM_CACHE), "We should have the correct source");

I don't think we have mFirstResponseSource always correct. E.g. in OnCacheEntryCheck we set mFirstResponseSource = REPONSE_FROM_CACHE, but network is faster, so OnStartRequest is called with request == mTransactionPump. In this case we miss all the conditions at the beginning of OnStartRequest and serve the content from network, that's OK but mFirstResponseSource stays wrong.
Attachment #8827418 - Flags: review?(michal.novotny) → review-
I managed to get this to pass all of the unit tests. I had a bit of a hard time with partial cache entries, as we get multiple calls to OnStart,OnData and OnStopRequest from both the cache pump and the transaction pump.
I changed mFirstResponseSource to only be set in OnStartRequest.

One issue I need some feedback on, is what should we do when we get OnStartRequest from the cachePump?
For now I just suspend the transactionPump, and it seems to do the trick, but I'm not sure if that's the best choice.
Calling gHttpHandler->CancelTransaction doesn't work, as it calls OnStop with an error code immediately. Or maybe I should do that, and just ignore an OnStop with an error status code that comes from the other pump that what we are using?

(In reply to Honza Bambas (:mayhemer) from comment #4)
> Comment on attachment 8827417 [details]
> Bug 1325081 - Add interface to delay the cache fetch in order to test
> network-cache racing of HTTP requests
> 
> > +            mCacheOpenTimer->InitWithFuncCallback([](nsITimer *aTimer, void *aClosure) -> void {
> > +                nsHttpChannel *self = static_cast<nsHttpChannel*>(aClosure);
> 
> don't!  you must pass |RefPtr<nsHttpChannel> self = this| to the lambda (by
> copy).  never user timer's or any rawptr closure for refcoutned objects!

Unfortunately, InitWithFuncCallback doesn't take a closure, but a function pointer, and a lambda with no captured variables is automatically casts to that. I did this to avoid defining a separate method in another place, and this seems cleaner.
This is safe because the timers are members of nsHttpChannel, so the method will not be called after the destructor.
Other instances of InitWithFuncCallback do pretty much the same thing.
I am unsure if the channel's destructor may be called off main thread, but I did add a RefPtr to make sure it stays alive until the lambda goes out of scope.
(In reply to Valentin Gosu [:valentin] from comment #12)
> I managed to get this to pass all of the unit tests. I had a bit of a hard
> time with partial cache entries, as we get multiple calls to OnStart,OnData
> and OnStopRequest from both the cache pump and the transaction pump.
> I changed mFirstResponseSource to only be set in OnStartRequest.
> 
> One issue I need some feedback on, is what should we do when we get
> OnStartRequest from the cachePump?
> For now I just suspend the transactionPump, and it seems to do the trick,
> but I'm not sure if that's the best choice.
> Calling gHttpHandler->CancelTransaction doesn't work, as it calls OnStop
> with an error code immediately. Or maybe I should do that, and just ignore
> an OnStop with an error status code that comes from the other pump that what
> we are using?
> 
> (In reply to Honza Bambas (:mayhemer) from comment #4)
> > Comment on attachment 8827417 [details]
> > Bug 1325081 - Add interface to delay the cache fetch in order to test
> > network-cache racing of HTTP requests
> > 
> > > +            mCacheOpenTimer->InitWithFuncCallback([](nsITimer *aTimer, void *aClosure) -> void {
> > > +                nsHttpChannel *self = static_cast<nsHttpChannel*>(aClosure);
> > 
> > don't!  you must pass |RefPtr<nsHttpChannel> self = this| to the lambda (by
> > copy).  never user timer's or any rawptr closure for refcoutned objects!
> 
> Unfortunately, InitWithFuncCallback doesn't take a closure, but a function
> pointer, and a lambda with no captured variables is automatically casts to
> that. I did this to avoid defining a separate method in another place, and
> this seems cleaner.
> This is safe because the timers are members of nsHttpChannel, so the method
> will not be called after the destructor.
> Other instances of InitWithFuncCallback do pretty much the same thing.
> I am unsure if the channel's destructor may be called off main thread, but I
> did add a RefPtr to make sure it stays alive until the lambda goes out of
> scope.

What I have in mind:

RefPtr<httpchannel> self = this;
mCacheOpenTimer->InitWithFuncCallback([self](...)...
comment 12
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #13)
> What I have in mind:
> 
> RefPtr<httpchannel> self = this;
> mCacheOpenTimer->InitWithFuncCallback([self](...)...

I can't call InitWithFuncCallback with this kind of lambda. It would also create a cycle channel -> timer -> channel, until the timer fired, and we finally null it out.
(In reply to Valentin Gosu [:valentin] from comment #15)
> (In reply to Honza Bambas (:mayhemer) from comment #13)
> > What I have in mind:
> > 
> > RefPtr<httpchannel> self = this;
> > mCacheOpenTimer->InitWithFuncCallback([self](...)...
> 
> I can't call InitWithFuncCallback with this kind of lambda. It would also
> create a cycle channel -> timer -> channel, until the timer fired, and we
> finally null it out.

and is that such a big problem?  the timer either has to fire (fulfill its purpose) or has to be canceled before the channel is released (to not fire and try to access the un-addrefed now deleted closure.)  So I don't see the problem here.

if it's that complicated to go with a lambda then please let http channel implement the nsITimerCallback interface and init the timer via initWithCallback.
Flags: needinfo?(honzab.moz)
still need to answer comment 12
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #17)
> still need to answer comment 12

Answered on IRC, to sum:
- don't use partial cached content when racing is on, since you can't use it w/o a server response to a range request (or some validation)
- cache pump can be canceled when it loses
- net pump can be soft-canceled when it loses (h1 keep-alive trans data should be discarded but the trans left to finish)
- onstop from a manually canceled pump must obviously be ignored
Flags: needinfo?(honzab.moz)
Comment on attachment 8827417 [details]
Bug 1325081 - Add interface to delay the cache fetch in order to test network-cache racing of HTTP requests

https://reviewboard.mozilla.org/r/105098/#review110864
Attachment #8827417 - Flags: review?(michal.novotny) → review+
Comment on attachment 8827418 [details]
Bug 1325081 - Change nsHttpChannel to be able to race network with cache

https://reviewboard.mozilla.org/r/105100/#review110902

::: netwerk/protocol/http/nsHttpChannel.cpp:424
(Diff revision 3)
>              return NS_ERROR_DOCUMENT_NOT_CACHED;
>          }
>          // otherwise, let's just proceed without using the cache.
>      }
>  
> -    return TryHSTSPriming();
> +    return Test_triggerNetwork(0);

Call the internal method TriggerNetwork.

::: netwerk/protocol/http/nsHttpChannel.cpp:2006
(Diff revision 3)
>          // relevant.
>          DebugOnly<nsresult> rv = ProcessSecurityHeaders();
>          MOZ_ASSERT(NS_SUCCEEDED(rv), "ProcessSTSHeader failed, continuing load.");
>      }
>  
> -    MOZ_ASSERT(!mCachedContentIsValid);
> +    // MOZ_ASSERT(!mCachedContentIsValid);

Remove it.

::: netwerk/protocol/http/nsHttpChannel.cpp:3720
(Diff revision 3)
>      nsresult rv = NS_OK;
>  
>      LOG(("nsHttpChannel::OnCacheEntryCheck enter [channel=%p entry=%p]",
>          this, entry));
>  
> +    if (mRacingNetAndCache && mFirstResponseSource == RESPONSE_FROM_NETWORK) {

I think that both mRacingNetAndCache and mFirstResponseSource need to be atomic. Did you check what exactly happens if mFirstResponseSource is set to RESPONSE_FROM_NETWORK right after passing this check?

::: netwerk/protocol/http/nsHttpChannel.cpp:6850
(Diff revision 3)
>  
>          // if this transaction has been replaced, then bail.
> -        if (mTransactionReplaced)
> +        if (mTransactionReplaced) {
> +            LOG(("Transaction replaced\n"));
> +            // This was just the network check for a 304 response.
> +            // mNetworkTriggered = false;

Is this needed or not?

::: netwerk/protocol/http/nsHttpChannel.cpp:8511
(Diff revision 3)
> +
> +    if (!aTimeout) {
> +        mNetworkTriggered = true;
> +        if (!mOnCacheAvailableCalled) {
> +            // If the network was triggered before onCacheEntryAvailable was
> +            // called, we are either racing network and cache, the load is

Did you mean "_or_ the load is bypassing the cache"?

::: netwerk/test/unit/test_race_cache_network.js:194
(Diff revision 3)
> +  equal(g200Counter, 6, "check number of 200 responses");
> +  equal(g304Counter, 4, "check number of 304 responses");
> +
> +  // Set an increasingly high timeout to trigger opening the cache entry
> +  // This way we ensure that some of the entries we will get from the network,
> +  // and some we will get from the cache.

Is it really ensured that all entries are not loaded from the network?
Attachment #8827418 - Flags: review?(michal.novotny) → review-
(In reply to Michal Novotny (:michal) from comment #23)
> I think that both mRacingNetAndCache and mFirstResponseSource need to be
> atomic. Did you check what exactly happens if mFirstResponseSource is set to
> RESPONSE_FROM_NETWORK right after passing this check?

Forgot to comment on this. I manually set mFirstResponseSource to RESPONSE_FROM_NETWORK immediately after checking its value in OnCacheEntryCheck, and everything seems to work just fine.

The latest push adds a few changes, in order to fix the Windows assertions.
(In reply to Valentin Gosu [:valentin] from comment #31)
> (In reply to Michal Novotny (:michal) from comment #23)
> > I think that both mRacingNetAndCache and mFirstResponseSource need to be
> > atomic. Did you check what exactly happens if mFirstResponseSource is set to
> > RESPONSE_FROM_NETWORK right after passing this check?
> 
> Forgot to comment on this. I manually set mFirstResponseSource to
> RESPONSE_FROM_NETWORK immediately after checking its value in
> OnCacheEntryCheck, and everything seems to work just fine.

No, you set mFirstResponseSource to RESPONSE_FROM_NETWORK in OnStartRequest, not in OnCacheEntryCheck. Both methods are called on a different thread so it can happen that we won't exit early from OnCacheEntryCheck when the response goes from the network. We might end up failing somewhere, but if not then we might also end up calling ReadFromCache in OnCacheEntryAvailableInternal.
Comment on attachment 8827418 [details]
Bug 1325081 - Change nsHttpChannel to be able to race network with cache

https://reviewboard.mozilla.org/r/105100/#review114164

::: netwerk/test/unit/test_race_cache_network.js:62
(Diff revision 5)
> -function checkContent(request, buffer)
> +let gIsFromCache = 0;
> +function checkContent(request, buffer, context, isFromCache)
>  {
>    do_check_eq(buffer, gResponseBody);
>    gResponseCounter++;
> -  testGenerator.next();
> +  gIsFromCache += isFromCache;

Adding bool value to an integer is a bit weird. Please do if (isFromCache) { gIsFromCache++; }
Attachment #8827418 - Flags: review?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #32)
As discussed on IRC, right after checking mFirstResponseSource in OnCacheEntryCheck, I tried setting mFirstResponseSource = RESPONSE_FROM_NETWORK to simulate the race. It all worked just fine, with all the responses coming from the network.

At your suggestion I'm attempting to use a mutex to make sure we race. However, all of the calls to OnCacheEntryCheck are made on the main thread, so it didn't really work for me. CHECK_MULTITHREADED is set when calling AsyncOpenURI, but it still only uses the main thread. Is there a way to force it to call OnCacheEntryCheck _off_ the main thread?
Flags: needinfo?(michal.novotny)
Normally, there is probably no way how to force the callback to be executed off the main thread. For the testing you could change CacheEntry::Callback::OnCheckThread() so that it returns true only on non-main thread if mCheckOnAnyThread == true. Then you need to redispatch to e.g. cache IO thread here http://searchfox.org/mozilla-central/rev/cac6cb6a10afb8ebb2ecfbeeedaff7c66f57dd75/netwerk/cache2/CacheEntry.cpp#659. But make sure you don't break the functionality for mCheckOnAnyThread == false.
Flags: needinfo?(michal.novotny)
I'm attaching the patch I used to check if any racing issues may occur. I checked that the OnCacheEntryCheck - OnStartRequest - OnCacheEntryAvailable case is fine.
Comment on attachment 8827418 [details]
Bug 1325081 - Change nsHttpChannel to be able to race network with cache

https://reviewboard.mozilla.org/r/105100/#review114558

Looks good, thanks!
Attachment #8827418 - Flags: review?(michal.novotny) → review+
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/1a07e0ed1dac
Add interface to delay the cache fetch in order to test network-cache racing of HTTP requests r=michal
https://hg.mozilla.org/integration/autoland/rev/23e5d7eb8fbc
Change nsHttpChannel to be able to race network with cache r=michal
https://hg.mozilla.org/mozilla-central/rev/1a07e0ed1dac
https://hg.mozilla.org/mozilla-central/rev/23e5d7eb8fbc
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Depends on: 1343302
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: