Closed Bug 588507 Opened 14 years ago Closed 14 years ago

Skip caching if Content-length > eviction size

Categories

(Core :: Networking: Cache, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: jduell.mcbugs, Assigned: byronm)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 8 obsolete files)

Right now we start caching a file and then evict if it winds up being > eviction size (see bug 81640).   When we have the content-length available, we should simply set a flag to prevent caching.

I'm not sure what the right flag to set here is:  INHIBIT_CACHING? LOAD_BYPASS_LOCAL_CACHE?  My skim of the source is that they both prevent storing the download in the cache, but that INHIBIT allows reading from an existing cache entry if present.  Which makes no difference for this bug.

I assume we don't want either INHIBIT_PERSISTENT_CACHING (we don't want to store in memory cache either), nor LOAD_BYPASS_CACHE (we want to allow proxies to provide files if they've got them cached, rather than force request to origin server).
(In reply to comment #0)
> Right now we start caching a file and then evict if it winds up being >
> eviction size (see bug 81640).   When we have the content-length available, we
> should simply set a flag to prevent caching.

I actually do not think this would work. The reason being that these flags are
set on the request end of the overall transaction. From the perspective of the
cache, the basic process is that AsyncOpen is called on nsHttpChannel. The
channel then tries to find the requested resource in the cache, before hitting
the net. This is where these flags are used; INHIBIT_CACHING, for instance,
would prevent us from searching the cache.

Unfortunately for us, we do not know if we want to cache or not until we have
received the response from the server. So what will end up happening is we will
create a cache entry in OpenCacheEntry (this is the default behavior when the
resource is not already in the cache) and as we get data back, we will start
writing to the entry.

Michal and I have come up with a strategy that will basically doom the created
entry, and cut off the writing to it, if we see in the response that
ContentLength is greater than the max allowed side. One further complication is
the fact that we will need to see what device the entry is going to be sent
to--disk or memory cache. These two devices have different max file limits.

I am in the process of writing the patch to this end and hope to have it done
soon...

Jason, if I am missing something obvious that you saw about where the flag
could be set so that it gets the job done, I'm all ears. I would not be
surprised if I just missed what you meant.
blocking2.0: --- → ?
> These two devices have different max file limits.

Seems like we should be able to ask nsICacheEntryDescriptor what maximal size it will accept; then it can talk to the devices as needed.
Attached patch content_length_v1 (obsolete) — Splinter Review
This is mostly me just backing up my work. The patch actually works (we now bypass the cache if Content-Length > eviction size). But there are two main things I need to do: 

1) Come up with a good way to determine if Content-Length has been set or not  
   (to cover the case where it is not specified in the response header).
2) Make nsInputStreamTee threadsafe, because this patch adds concurrent access 
   to it. I've started doing this in this version but its not complete/correct 
   yet.
> +    //TODO: Check that ContentLength has *actually* been set

If it's not equal to -1 (not UINT_MAX; content length is signed for a reason), then it has not been set.  Otherwise it has.
Also, you probably want to use NS_CHANNEL_PROP_CONTENT_LENGTH instead of GetContentLength; otherwise you'll get all sorts of weird behavior for files over 2GB in size.  We should really just rev the channel API to use 64-bit content lengths...
Low hanging cache fruit with big gain, blocking.
blocking2.0: ? → final+
blocking2.0: final+ → betaN+
(In reply to comment #5)
> We should really just rev the channel API to use 64-bit
> content lengths...

See Bug 536324, which recently landed on M-C.
Attached patch content_length_v2 (obsolete) — Splinter Review
Now ensures that Content-Length was set before checking if Content-Length is greater than the maximum allowed cache entry size.

Adds thread safety to InputStreamTee and is more correct than previous version. Still need to test this though.
Attachment #468427 - Attachment is obsolete: true
Comment on attachment 469420 [details] [diff] [review]
content_length_v2

> +   nsCacheService::DoomEntry(entry);
> +   return nsnull;

Maybe we should at least check the return value and log in case of error (just like in nsDiskCacheDevice::OnDataSizeChange)


> +  return predictedSize > kMaxDataFileSize.
> +         || predictedSize > (static_cast<PRInt64>(mCacheCapacity) * 1024 / 8);

We should have the check at one place only. I.e. use this method in nsDiskCacheDevice::OnDataSizeChange(). And do the same in memory cache.


> +  // R?: Do we want to set a max file size for offline cache?.
> +  return PR_TRUE;

I don't think we want some limit here. Anyway you should return PR_FALSE.


> +nsMemoryCacheDevice::EntryIsTooBig(PRInt64 predictedSize)
> +{
> +  return predictedSize <= mSoftLimit;
> +}

The same as above: predictedSize > mSoftLimit


> +   rv = mCacheEntry->SetPredictedDataSize(predictedDataSize); 

nit: The patch removes spaces at the end of the line at few places but e.g. this new line (and some others) contains it too.


> +   nsIOutputStream *aSink, nsInputStreamTee*
> +   aRootTee)

Please keep type and name on the same line.


> +   // mSink could have been nulled between when this runnable was
> +   // dispatched and when its actually run
> +   if (mSink) {

But it is nulled in the tee, not in the tee event. You should get here the mSink from the tee instead of passing it as an argument when creating the event.


> +   // back pointer to stream that created this runnable
> +   nsCOMPtr<nsInputStreamTee> mRootTee;

The name is a bit confusing for me. I think just mTee is fine.


> nsInputStreamTee::TeeSegment(const char *buf, PRUint32 count)
> {
> -    if (!mSink)
> +    // Only synchronize this function if this is an async tee.
> +    if (mLock) PR_Lock(mLock);
> +....
> +    if (!mSink) {
> +        if (mLock) PR_Unlock(mLock);
>          return NS_OK; // nothing to do
> -
> +    }
>      if (mEventTarget) {
>          nsRefPtr<nsIRunnable> event =
> -            new nsInputStreamTeeWriteEvent(buf, count, mSink);
> +            new nsInputStreamTeeWriteEvent(buf, count, mSink, this);
>          NS_ENSURE_TRUE(event, NS_ERROR_OUT_OF_MEMORY);
>          LOG(("nsInputStreamTee::TeeSegment [%p] dispatching write %u bytes\n",
>                  this, count));
> +        // This should be safe. We know that InputStreamTee::Read cannot
> +        // return until Dispatch returns, and we know that we can only have one
> +        // client at a time (b/c of global locking on entry), so there should be
> +        // no race here.
> +        PR_Unlock(mLock);.
>          return mEventTarget->Dispatch(event, NS_DISPATCH_NORMAL);
>      }
> -
> +    // No locking is required in the synchronous case.

NS_ENSURE_TRUE would keep the mLock locked if !event. I think you should refactor whole method. mLock should be null iff mEventTarget is null. So do something like:

nsInputStreamTee::TeeSegment
{
  if (mlock) {
    nsAutoLock lock(mLock)
    NS_ASSERTION(mEventTarget, "...");
    ... async code
  } else {
    NS_ASSERTION(!mEventTarget, "...");
    ... sync code
  }
}


>     mSource = 0;
> +   nsAutoLock lock(mLock);
>     mSink = 0;

This would assert if mLock is NULL.


> +nsInputStreamTee::SetLock(void)
> +{
> +  mLock = PR_NewLock();
> +  if (!mLock) NS_ERROR("Failed to allocate lock for nsInputStreamTee");
> +  return NS_OK;
> +}

Return error if !mLock.
(In reply to comment #9)
> Comment on attachment 469420 [details] [diff] [review]
> content_length_v2
> 
> > +   nsCacheService::DoomEntry(entry);
> > +   return nsnull;
> 
> Maybe we should at least check the return value and log in case of error (just
> like in nsDiskCacheDevice::OnDataSizeChange)

Fixed it. 
 
> > +  return predictedSize > kMaxDataFileSize.
> > +         || predictedSize > (static_cast<PRInt64>(mCacheCapacity) * 1024 / 8);
> 
> We should have the check at one place only. I.e. use this method in
> nsDiskCacheDevice::OnDataSizeChange(). And do the same in memory cache.

Agreed. Do you know why we round the size up to the nearest 1k for the mCacheCapacity/2 check in nsDiskCacheDevice::OnDataSizeChange? (It could just be an approximation accounting for truncation due to integer division.) I'm not rounding up with my mCacheCapacity/8 check... should I be? 
 
> > +  // R?: Do we want to set a max file size for offline cache?.
> > +  return PR_TRUE;
> 
> I don't think we want some limit here. Anyway you should return PR_FALSE.
> 
> 
> > +nsMemoryCacheDevice::EntryIsTooBig(PRInt64 predictedSize)
> > +{
> > +  return predictedSize <= mSoftLimit;
> > +}
> 
> The same as above: predictedSize > mSoftLimit

Yes, sorry. I inverted the function's signature but forgot to change these two. 
 
> > +   rv = mCacheEntry->SetPredictedDataSize(predictedDataSize); 
> 
> nit: The patch removes spaces at the end of the line at few places but e.g.
> this new line (and some others) contains it too.

I went back through and made it consistent--no extra spaces at the ends of lines. 
 
> > +   nsIOutputStream *aSink, nsInputStreamTee*
> > +   aRootTee)
> 
> Please keep type and name on the same line.

Fixed. 
 
> > +   // mSink could have been nulled between when this runnable was
> > +   // dispatched and when its actually run
> > +   if (mSink) {
> 
> But it is nulled in the tee, not in the tee event.

I'm not sure what you mean. The tee event nulls mSink in its Run() method if the write fails. While it does this by calling a function on the tee, whatever thread we dispatch the event to should actually execute the code, right?

> You should get here the
> mSink from the tee instead of passing it as an argument when creating the
> event.

The sink is already passed into the constructor of the nsInputStreamTeeWriteEvent before my patch. Do you want me to change this?  
 
> > +   // back pointer to stream that created this runnable
> > +   nsCOMPtr<nsInputStreamTee> mRootTee;
> 
> The name is a bit confusing for me. I think just mTee is fine.

Changed the name to be mTee.
 
> > nsInputStreamTee::TeeSegment(const char *buf, PRUint32 count)
> > {
> > -    if (!mSink)
> > +    // Only synchronize this function if this is an async tee.
> > +    if (mLock) PR_Lock(mLock);
> > +....
> > +    if (!mSink) {
> > +        if (mLock) PR_Unlock(mLock);
> >          return NS_OK; // nothing to do
> > -
> > +    }
> >      if (mEventTarget) {
> >          nsRefPtr<nsIRunnable> event =
> > -            new nsInputStreamTeeWriteEvent(buf, count, mSink);
> > +            new nsInputStreamTeeWriteEvent(buf, count, mSink, this);
> >          NS_ENSURE_TRUE(event, NS_ERROR_OUT_OF_MEMORY);
> >          LOG(("nsInputStreamTee::TeeSegment [%p] dispatching write %u bytes\n",
> >                  this, count));
> > +        // This should be safe. We know that InputStreamTee::Read cannot
> > +        // return until Dispatch returns, and we know that we can only have one
> > +        // client at a time (b/c of global locking on entry), so there should be
> > +        // no race here.
> > +        PR_Unlock(mLock);.
> >          return mEventTarget->Dispatch(event, NS_DISPATCH_NORMAL);
> >      }
> > -
> > +    // No locking is required in the synchronous case.
> 
> NS_ENSURE_TRUE would keep the mLock locked if !event. I think you should
> refactor whole method. mLock should be null iff mEventTarget is null. So do
> something like:
> 
> nsInputStreamTee::TeeSegment
> {
>   if (mlock) {
>     nsAutoLock lock(mLock)
>     NS_ASSERTION(mEventTarget, "...");
>     ... async code
>   } else {
>     NS_ASSERTION(!mEventTarget, "...");
>     ... sync code
>   }
> }

I like this suggestion a lot and refactored it.

> >     mSource = 0;
> > +   nsAutoLock lock(mLock);
> >     mSink = 0;
> 
> This would assert if mLock is NULL.

I changed it so that we only acquire the lock if it is not null. Is that ok, or were you saying that you wanted me to add an NS_ASSERT? 
 
> > +nsInputStreamTee::SetLock(void)
> > +{
> > +  mLock = PR_NewLock();
> > +  if (!mLock) NS_ERROR("Failed to allocate lock for nsInputStreamTee");
> > +  return NS_OK;
> > +}
> 
> Return error if !mLock.

Added a return NS_ERROR_OUT_OF_MEMORY.
Attached patch content_length_v3 (obsolete) — Splinter Review
Addresses Michal's comments. See previous post.
Attachment #469420 - Attachment is obsolete: true
Comment on attachment 469755 [details] [diff] [review]
content_length_v3

Carrying Michal's r+ to this version.
Attachment #469755 - Flags: review+
Comment on attachment 469755 [details] [diff] [review]
content_length_v3

Disregard previous comment. Wrong bug.
Attachment #469755 - Flags: review+
(In reply to comment #10)
> > > +  return predictedSize > kMaxDataFileSize.
> > > +         || predictedSize > (static_cast<PRInt64>(mCacheCapacity) * 1024 / 8);
> > 
> > We should have the check at one place only. I.e. use this method in
> > nsDiskCacheDevice::OnDataSizeChange(). And do the same in memory cache.
> 
> Agreed. Do you know why we round the size up to the nearest 1k for the
> mCacheCapacity/2 check in nsDiskCacheDevice::OnDataSizeChange? (It could just
> be an approximation accounting for truncation due to integer division.) I'm not
> rounding up with my mCacheCapacity/8 check... should I be? 

I don't think it is important to round up the size for the check.


> > > +   // mSink could have been nulled between when this runnable was
> > > +   // dispatched and when its actually run
> > > +   if (mSink) {
> > 
> > But it is nulled in the tee, not in the tee event.
> 
> I'm not sure what you mean. The tee event nulls mSink in its Run() method if
> the write fails. While it does this by calling a function on the tee, whatever
> thread we dispatch the event to should actually execute the code, right?

You have a copy of mSink in the tee event. Nulling mSink in tee doesn't null it in the tee event. So your comment in the code is wrong and the if-statement unnecessary.


> > You should get here the
> > mSink from the tee instead of passing it as an argument when creating the
> > event.
> 
> The sink is already passed into the constructor of the
> nsInputStreamTeeWriteEvent before my patch. Do you want me to change this?

Yes. Taking current mSink from the tee has the advantage that in case of failure no other already posted event will try to write to the failing sink.


> > >     mSource = 0;
> > > +   nsAutoLock lock(mLock);
> > >     mSink = 0;
> > 
> > This would assert if mLock is NULL.
> 
> I changed it so that we only acquire the lock if it is not null. Is that ok, or
> were you saying that you wanted me to add an NS_ASSERT? 

Yes, that's OK. I've just meant that nsAutoLock(nsnull) asserts here http://hg.mozilla.org/mozilla-central/annotate/4496d883d113/xpcom/glue/nsAutoLock.h#l218
(In reply to comment #14)
> (In reply to comment #10)
> > > > +  return predictedSize > kMaxDataFileSize.
> > > > +         || predictedSize > (static_cast<PRInt64>(mCacheCapacity) * 1024 / 8);
> > > 
> > > We should have the check at one place only. I.e. use this method in
> > > nsDiskCacheDevice::OnDataSizeChange(). And do the same in memory cache.
> > 
> > Agreed. Do you know why we round the size up to the nearest 1k for the
> > mCacheCapacity/2 check in nsDiskCacheDevice::OnDataSizeChange? (It could just
> > be an approximation accounting for truncation due to integer division.) I'm not
> > rounding up with my mCacheCapacity/8 check... should I be? 
> 
> I don't think it is important to round up the size for the check.

Cool. I now use EntryIsTooBig for the check in both places, for both devices. I do not worry about rounding up.
 
> > > > +   // mSink could have been nulled between when this runnable was
> > > > +   // dispatched and when its actually run
> > > > +   if (mSink) {
> > > 
> > > But it is nulled in the tee, not in the tee event.
> > 
> > I'm not sure what you mean. The tee event nulls mSink in its Run() method if
> > the write fails. While it does this by calling a function on the tee, whatever
> > thread we dispatch the event to should actually execute the code, right?
> 
> You have a copy of mSink in the tee event. Nulling mSink in tee doesn't null it
> in the tee event. So your comment in the code is wrong and the if-statement
> unnecessary.

I see your point now, sorry for the mix-up. Though I do still think the check is necessary (but only because I now acquire a pointer to the sink inside run() of the event). See the comment in the patch for the rationale. 

> > > You should get here the
> > > mSink from the tee instead of passing it as an argument when creating the
> > > event.
> > 
> > The sink is already passed into the constructor of the
> > nsInputStreamTeeWriteEvent before my patch. Do you want me to change this?
> 
> Yes. Taking current mSink from the tee has the advantage that in case of
> failure no other already posted event will try to write to the failing sink.

Fixed this.
Attached patch content_length_v4 (obsolete) — Splinter Review
Incorporates Michal's feedback. See previous comment.
Attachment #469755 - Attachment is obsolete: true
(In reply to comment #15)
> > You have a copy of mSink in the tee event. Nulling mSink in tee doesn't null it
> > in the tee event. So your comment in the code is wrong and the if-statement
> > unnecessary.
> 
> I see your point now, sorry for the mix-up. Though I do still think the check
> is necessary (but only because I now acquire a pointer to the sink inside run()
> of the event). See the comment in the patch for the rationale. 

Agree. Now you really need to check the sink :)


>      // If the new size is larger than max. file size or larger than
> -    // half the cache capacity (which is in KiB's), doom the entry and abort
> -    if ((newSize > kMaxDataFileSize) || (newSizeK > mCacheCapacity/2)) {
> +    // 1/8 the cache capacity (which is in KiB's), doom the entry and abort
> +    if (EntryIsTooBig(newSize)) {

Detailed comment should be in EntryIsTooBig().


> +
> +    attribute PRInt64    predictedDataSize;
> +

Add some description.


> +
> +    void setLock();

You don't need to add a new method to interface. E.g. initialize the lock in SetEventTarget(). Sorry, I overlooked this last time.


>      NS_IMETHOD Run()
>      {
> +        // Ensure nobody else may modify the sink
> +        nsAutoLock lock(mTee->GetLock());

You hold the lock during the entire Run() method. This means that although we are doing IO on a separate thread we may block UI thread just like in case of sync event. Acquire the lock only when really needed. I.e. move the nsAutoLock just before the call to GetSink(), then release the lock using nsAutoLock::unlock() and then lock/unlock again before/after the call to SetSink(nsnull).
This new version incorporates Michal's suggestions in Comment 17, but also introduces if a dedicated boolean to represent if the sink is safe to write to, rather than relying upon if mSink is null or not.

The reason is that mSink is also set to null when the stream is being closed (see http://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsStreamListenerTee.cpp#67 ), and in this case we do not want to stop writing to the sink from any pending tee events.

  (In reply to comment #17)
> >      NS_IMETHOD Run()
> >      {
> > +        // Ensure nobody else may modify the sink
> > +        nsAutoLock lock(mTee->GetLock());
> 
> You hold the lock during the entire Run() method. This means that although we
> are doing IO on a separate thread we may block UI thread just like in case of
> sync event. Acquire the lock only when really needed. I.e. move the nsAutoLock
> just before the call to GetSink(), then release the lock using
> nsAutoLock::unlock() and then lock/unlock again before/after the call to
> SetSink(nsnull).

I see your point about holding the lock for too long here. However, if I only hold the lock during the getters and setters for the boolean indicator (which is replacing the sink being null check), then we still allow race conditions. That is, the sink could become invalid to write to between when an event checks, and when an event actually writes to it. I know you said that this is ok, since its not a fatal problem anyways, but I have tried a solution that increases the granularity with (hopefully) no sacrifice in data consistency. However, this solution does still hold the lock across the write to mSink. If this is unacceptable, then I will just lock in the getters/setters, as you suggest, and live with the possibility that some events may write to the sink, even when they should not.
Attached patch content_length_v5 (obsolete) — Splinter Review
Attachment #470267 - Attachment is obsolete: true
Comment on attachment 470360 [details] [diff] [review]
content_length_v5

>  #include "nsStreamListenerTee.h"
> -
> +#include <prlock.h>
>  NS_IMPL_ISUPPORTS3(nsStreamListenerTee,

You probably forgot to remove it.


> +bool
> +nsOfflineCacheDevice::EntryIsTooBig(PRInt64 predictedSize)
> +{
> +  return PR_FALSE;
> +}

Don't declare EntryIsTooBig() in nsCacheDevice.h and then you don't need this unused method.


> +    PRInt64 predictedDataSize;
> +    this->GetContentLength(&predictedDataSize);
> +    rv = mCacheEntry->SetPredictedDataSize(predictedDataSize);

Remove "this->"


>  #include "nsIInputStream.idl"
> +#include <prlock.h>
>  interface nsIOutputStream;

Remove this include from nsIInputStreamTee.idl.


> +   //  The output stream could have been nulled between when this event.
> +   //  was dispatched and now, so atomically check before writing.

This comment is no longer true.


> That is, the sink could become invalid to write to between when an event
> checks, and when an event actually writes to it. I know you said that this is
> ok, since its not a fatal problem anyways, but I have tried a solution that
> increases the granularity with (hopefully) no sacrifice in data consistency.
> However, this solution does still hold the lock across the write to mSink. If
> this is unacceptable, then I will just lock in the getters/setters, as you
> suggest, and live with the possibility that some events may write to the sink,
> even when they should not.

It could happen only in case that write has failed and this really isn't a problem. BTW we didn't even try to stop writing to the sink till now.


> +  return mSinkIsValid;
> ...
> +  return mLock;
> ...
> +  mSinkIsValid = false;

Invalid indentation.


> +    if (mLock) { // asynchronous case
> +        NS_ASSERTION(mEventTarget, "mEventTarget is null, mLock is not null.");
> +        nsAutoLock lock(mLock);
> +        if (!mSinkIsValid) return NS_OK; // nothing to do
>          nsRefPtr<nsIRunnable> event =
> -            new nsInputStreamTeeWriteEvent(buf, count, mSink);
> +          new nsInputStreamTeeWriteEvent(buf, count, mSink, this);

1) Lock only the mSinkIsValid check. There is no need to hold the lock longer.
2) You still want to check for mSink.
3) The last line has invalid indentation.
Attached patch content_length_v6 (obsolete) — Splinter Review
Incorporates Michal's comments.

The locking discipline has been simplified. We now only synchronize reads and writes to mSinkIsValid, and rely on the fact that all writes to the field are serialized on the cacheIOThread. There should be no performance issues with this scheme, as we do not hold the lock across I/O.
Attachment #470360 - Attachment is obsolete: true
Attached patch content_length_v6.1 (obsolete) — Splinter Review
Attachment #470548 - Attachment is obsolete: true
Attached patch content_length_v6.2 (obsolete) — Splinter Review
Attachment #470557 - Attachment is obsolete: true
Attachment #470570 - Flags: review?(michal.novotny)
Attachment #470570 - Flags: review?(michal.novotny) → review+
Ready for check-in.
Attachment #470570 - Attachment is obsolete: true
Attachment #472707 - Flags: review+
Contains updates to Part 1 necessary for merging with the tip of the tree. Ready for check-in.
Keywords: checkin-needed
Comment on attachment 472708 [details] [diff] [review]
ContentLength_Part2

r=dwitte
Attachment #472708 - Flags: review+
http://hg.mozilla.org/mozilla-central/rev/4905e9c69a52
http://hg.mozilla.org/mozilla-central/rev/1633b8d8a184
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Keywords: checkin-needed
Depends on: 644431
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: