Closed Bug 913812 Opened 11 years ago Closed 10 years ago

HTTP cache v2: properly propagate errors through the cache file code

Categories

(Core :: Networking: Cache, defect)

Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla31

People

(Reporter: mayhemer, Assigned: michal)

References

Details

(Whiteboard: [cache2])

Attachments

(1 file, 1 obsolete file)

      No description provided.
Attached patch patch v1 (obsolete) — Splinter Review
This patch was created and tested mainly for the situation where we have the CacheEntry/CacheFile but the cache file on disk has been deleted.
Attachment #8370087 - Flags: review?(honzab.moz)
Comment on attachment 8370087 [details] [diff] [review]
patch v1

Review of attachment 8370087 [details] [diff] [review]:
-----------------------------------------------------------------

As I understand it, this patch switches CacheFile to an error state (= not returning chunks to either the input or the output stream, throwing data away when a write fails) on any read or write I/O failure, as well as on any dispatch failure related to reads and writes.

I think it's too much.

Michal, let's split the work here into two pieces:
- this bug should deal only with read (open?) errors; keep it simple, just give the upper levels (the http channel) an error that we can handle there when the file is gone (read chunk - the actual IO - asynchronously failed); see the sketch below
- open a new bug where we handle write errors; those might need a bit more thinking (stop any IO, retry, keep in memory, throw all from memory, ...)

Thanks.
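
For illustration, a rough sketch of the kind of upper-level handling meant here (hypothetical hook and helper names; the real restart-from-network logic is what bug 913810 deals with):

```cpp
// Sketch only: when an async cache read fails mid-load, the channel
// could doom the broken entry and restart the request from the
// network, bypassing the cache this time.
nsresult
nsHttpChannel::OnCacheReadError(nsresult aStatus)  // hypothetical hook
{
  MOZ_ASSERT(NS_FAILED(aStatus));

  if (mCacheEntry) {
    mCacheEntry->AsyncDoom(nullptr);  // the entry is unusable, drop it
    mCacheEntry = nullptr;
  }

  mLoadFlags |= nsIRequest::LOAD_BYPASS_CACHE;
  return RestartFromNetwork();        // hypothetical restart helper
}
```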

::: netwerk/cache2/CacheFile.cpp
@@ +336,2 @@
>    if (NS_FAILED(aResult)) {
> +    SetError(aResult);

not sure write errors should be fatal.

@@ +366,5 @@
> +  }
> +  else {
> +    LOG(("CacheFile::OnChunkWritten() - Removing failed chunk [this=%p, "
> +         "chunk=%p]", this, aChunk));
> +  }

hmm.. why are you removing it?  we have it at least in memory for some time before the entry gets purged..  I don't see the point.

@@ +1172,5 @@
> +      chunk->mRemovingChunk = true;
> +      ReleaseOutsideLock(static_cast<CacheFileChunkListener *>(
> +                           chunk->mFile.forget().get()));
> +      mChunks.Remove(chunk->Index());
> +      return mStatus;

so, when there is a read error, we don't write stuff.  is that really smart?

@@ +1447,2 @@
>      // TODO: close streams with error
> +    SetError(rv);

doesn't this happen only when dispatch fails?  that's not actually an io error..

::: netwerk/cache2/CacheFileChunk.cpp
@@ +206,5 @@
>    rv = CacheFileIOManager::Read(aHandle, mIndex * kChunkSize, mRWBuf, aLen,
>                                  this);
>    if (NS_FAILED(rv)) {
> +    mState = ERROR;
> +    mStatus = mIndex ? NS_ERROR_FILE_CORRUPTED : NS_ERROR_FILE_NOT_FOUND;

Please remind me, what is mIndex?

@@ +476,2 @@
>      }
>      else {

while you are here (and this applies to any patch in this code), please change it to a one-line } else {.

::: netwerk/cache2/CacheFileOutputStream.cpp
@@ +98,5 @@
>  
>    while (aCount) {
>      EnsureCorrectChunk(false);
> +    if (NS_FAILED(mStatus))
> +      return mStatus;

also, only when dispatch fails?

@@ +348,2 @@
>    rv = mFile->GetChunkLocked(chunkIdx, true, nullptr, getter_AddRefs(mChunk));
> +  if (NS_FAILED(rv)) {

ditto...
Attachment #8370087 - Flags: review?(honzab.moz) → review-
(In reply to Honza Bambas (:mayhemer) from comment #2)
> Comment on attachment 8370087 [details] [diff] [review]
> patch v1
> 
> Review of attachment 8370087 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> As I understand it, this patch switches CacheFile to an error state (=
> not returning chunks to either the input or the output stream, throwing
> data away when a write fails) on any read or write I/O failure, as well
> as on any dispatch failure related to reads and writes.
> 
> I think it's too much.
> 
> Michal, let's split the work here into two pieces:
> - this bug should deal only with read (open?) errors; keep it simple,
> just give the upper levels (the http channel) an error that we can
> handle there when the file is gone (read chunk - the actual IO -
> asynchronously failed)
> - open a new bug where we handle write errors; those might need a bit
> more thinking (stop any IO, retry, keep in memory, throw all from
> memory, ...)
> 
> Thanks.

Ignoring write errors would sooner or later lead to another error which could be fatal. Just one example:

- chunk #0 fails to write => we don't set hash in metadata for chunk #0
- we continue to write to the entry and chunk #1 is written to disk successfully
- then we'll hit the assertion in CacheFileMetadata::SetHash() which checks whether the index is sane

We could of course change the assertion, but we would have to do it in this bug too, and it is IMO much better to fix the problem correctly rather than remove a meaningful assertion.
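
To make the failure mode concrete, a simplified sketch of the kind of index check meant here (hypothetical member names, not the actual CacheFileMetadata code):

```cpp
// Simplified sketch: hashes are appended strictly in order, so if
// chunk #0's hash was never set because its write failed, setting
// chunk #1's hash trips the index-sanity assertion.
void
CacheFileMetadata::SetHash(uint32_t aIndex, uint16_t aHash)
{
  MOZ_ASSERT(aIndex <= mHashCount, "hash index out of order");

  if (aIndex == mHashCount) {
    mHashCount++;               // appending the next hash in sequence
  }
  mHashArray[aIndex] = aHash;   // mHashArray grown elsewhere as needed
}
```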

This was just one example; there are probably a lot of other problems that we would need to hotfix.


> ::: netwerk/cache2/CacheFile.cpp
> @@ +336,2 @@
> >    if (NS_FAILED(aResult)) {
> > +    SetError(aResult);
> 
> not sure write errors should be fatal.

See above and below.


> @@ +366,5 @@
> > +  }
> > +  else {
> > +    LOG(("CacheFile::OnChunkWritten() - Removing failed chunk [this=%p, "
> > +         "chunk=%p]", this, aChunk));
> > +  }
> 
> hmm.. why are you removing it?  we have it at least in memory for some time
> before the entry gets purged..  I don't see a point.

The point is to keep the code simple and working. We would need to somehow mark the chunk so that it won't be freed when it becomes unused (which we now do ASAP by default) or when CacheFile::ThrowMemoryCachedData() is called. This would IMO complicate this already complex code even more.


> @@ +1172,5 @@
> > +      chunk->mRemovingChunk = true;
> > +      ReleaseOutsideLock(static_cast<CacheFileChunkListener *>(
> > +                           chunk->mFile.forget().get()));
> > +      mChunks.Remove(chunk->Index());
> > +      return mStatus;
> 
> so, when there is a read error, we don't write stuff.  is that really smart?

Once we encounter an error (read or write), any subsequent read/write from/to the input/output stream fails. So what's the point of generating additional IO for such an entry?


> @@ +1447,2 @@
> >      // TODO: close streams with error
> > +    SetError(rv);
> 
> doesn't this happen only when dispatch fails?  that's not actually an io
> error..

Ok, but what's the difference? The chunk failed to write and we must handle it somehow.


> ::: netwerk/cache2/CacheFileChunk.cpp
> @@ +206,5 @@
> >    rv = CacheFileIOManager::Read(aHandle, mIndex * kChunkSize, mRWBuf, aLen,
> >                                  this);
> >    if (NS_FAILED(rv)) {
> > +    mState = ERROR;
> > +    mStatus = mIndex ? NS_ERROR_FILE_CORRUPTED : NS_ERROR_FILE_NOT_FOUND;
> 
> Please remind me, what is mIndex?

The index of the chunk in the entry file.


> ::: netwerk/cache2/CacheFileOutputStream.cpp
> @@ +98,5 @@
> >  
> >    while (aCount) {
> >      EnsureCorrectChunk(false);
> > +    if (NS_FAILED(mStatus))
> > +      return mStatus;
> 
> also, only when dispatch fails?
> 
> @@ +348,2 @@
> >    rv = mFile->GetChunkLocked(chunkIdx, true, nullptr, getter_AddRefs(mChunk));
> > +  if (NS_FAILED(rv)) {
> 
> ditto...

These two were already answered above.
(In reply to Michal Novotny (:michal) from comment #3)

tl;dr: I'll go through the patch again, as is.

> - chunk #0 fails to write => we don't set hash in metadata for chunk #0
> - we continue to write to the entry and chunk #1 is written to disk
> successfully
> - then we'll hit the assertion in CacheFileMetadata::SetHash() which checks
> whether the index is sane

This doesn't sound like a strong argument to me - an assertion failure.  We will have to change that code once we support random partial content caching anyway.

Right now, when we fail to open a file, we cache everything in memory.  But yes, there is no intermediate inconsistent state then - it will all just go away and never persist, no need for additional state flags or the like on individual chunks.

Usually write errors will mean we are out of disk space.  So any more IO in that situation would be just bad or failing anyway.  In the case of a network disk and a failure to write only temporarily (disk resumed after the network is up again), we will just leave an entry that will not pass the checksum tests and (until reload) will very soon report itself as broken and useless.  So yes, completely shutting a CacheFile down on any error is probably good.  Still, it's just a cache!  For the user, the experience impact will be close to zero.


Please explain what happens in both cases, 1) with this patch applied and 2) without it, when we:
- write 2 chunks (dispatch as well as the write itself succeeds)
- actual write of chunk 3 fails


> > 
> > hmm.. why are you removing it?  we have it at least in memory for some time
> > before the entry gets purged..  I don't see a point.
> 
> The point is to keep the code simple and working. We would need to
> somehow mark the chunk so that it won't be freed when it becomes unused
> (which we now do ASAP by default) or when
> CacheFile::ThrowMemoryCachedData() is called. This would IMO complicate
> this already complex code even more.

Understood.  And introducing any retry operation is overkill, even as follow-up work.

> 
> 
> > @@ +1172,5 @@
> > > +      chunk->mRemovingChunk = true;
> > > +      ReleaseOutsideLock(static_cast<CacheFileChunkListener *>(
> > > +                           chunk->mFile.forget().get()));
> > > +      mChunks.Remove(chunk->Index());
> > > +      return mStatus;
> > 
> > so, when there is a read error, we don't write stuff.  is that really smart?
> 
> Once we encounter an error (read or write), any subsequent read/write
> from/to the input/output stream fails. So what's the point of
> generating additional IO for such an entry?

OK, I am not sure I fully understand everything that happens here, but I believe you.

> 
> 
> > @@ +1447,2 @@
> > >      // TODO: close streams with error
> > > +    SetError(rv);
> > 
> > doesn't this happen only when dispatch fails?  that's not actually an io
> > error..
> 
> Ok, but what's the difference? The chunk failed to write and we must handle
> it somehow.

This will probably happen only during shutdown.  Anyway, you are right, there is not much to do then; failure to schedule == failure to do.

> 
> 
> > ::: netwerk/cache2/CacheFileChunk.cpp
> > @@ +206,5 @@
> > >    rv = CacheFileIOManager::Read(aHandle, mIndex * kChunkSize, mRWBuf, aLen,
> > >                                  this);
> > >    if (NS_FAILED(rv)) {
> > > +    mState = ERROR;
> > > +    mStatus = mIndex ? NS_ERROR_FILE_CORRUPTED : NS_ERROR_FILE_NOT_FOUND;
> > 
> > Please remind me, what is mIndex?
> 
> The index of the chunk in the entry file.

Aha, so when we fail to read the first chunk the error is NOT_FOUND, otherwise CORRUPTED.  How will this bubble up to the upper layers?  Should consumers then handle both errors?  I would not be much in favor of that.  Only one of them should be reported up.  What do we report when we encounter a broken chunk?  From the consumer's point of view it's the same - the consumer won't get all the data and the streaming ends (with some consistent error).

We may also need to think about how this error would be rendered by the docshell when the restart of the load from the network could not happen (or failed) - that is, when the patch for bug 913810 is not applied - or when a range request cannot be made.  NOT_FOUND will confuse the user IMO.  But maybe nsHttpChannel should just convert it to something like the new NS_ERROR_INCOMPLETE_CONTENT error.  It just needs to know what all to convert.  If there were just a single error, it would be easier.
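
A sketch of the suggested normalization (hypothetical helper name; assumes an NS_ERROR_INCOMPLETE_CONTENT-style code is available):

```cpp
// Sketch: collapse the cache-internal read errors into a single error
// the docshell can render sensibly, instead of leaking NOT_FOUND up.
static nsresult
NormalizeCacheReadError(nsresult aRv)
{
  if (aRv == NS_ERROR_FILE_NOT_FOUND || aRv == NS_ERROR_FILE_CORRUPTED) {
    return NS_ERROR_INCOMPLETE_CONTENT;
  }
  return aRv;
}
```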
Comment on attachment 8370087 [details] [diff] [review]
patch v1

Review of attachment 8370087 [details] [diff] [review]:
-----------------------------------------------------------------

Please check that CacheFileChunk::EnsureBufSize and CacheFileChunk::UpdateDataSize behave correctly with mState == ERROR.
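
For illustration, such a guard might look like this (a sketch, not the real method body):

```cpp
// Sketch: buffer-touching chunk methods should bail out once the chunk
// has failed, instead of growing a buffer nobody will read.
void
CacheFileChunk::EnsureBufSize(uint32_t aBufSize)
{
  MOZ_ASSERT(mState != ERROR, "failed chunk must not grow its buffer");

  if (mBufSize >= aBufSize) {
    return;                       // buffer already large enough
  }
  // ... reallocate the buffer to at least aBufSize ...
}
```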

::: netwerk/cache2/CacheFile.cpp
@@ +309,5 @@
>         this, aResult, aChunk, index));
>  
> +  if (NS_FAILED(aResult)) {
> +    SetError(aResult);
> +    // TODO ??? doom entry

I think no, it will probably fail to load next time; it's indexed, so it will go away with the eviction-over-limit mechanism despite being broken.

@@ +363,5 @@
> +  if (NS_SUCCEEDED(aResult)) {
> +    LOG(("CacheFile::OnChunkWritten() - Caching unused chunk [this=%p, "
> +         "chunk=%p]", this, aChunk));
> +  }
> +  else {

} else {

@@ +1172,5 @@
> +      chunk->mRemovingChunk = true;
> +      ReleaseOutsideLock(static_cast<CacheFileChunkListener *>(
> +                           chunk->mFile.forget().get()));
> +      mChunks.Remove(chunk->Index());
> +      return mStatus;

this code seems duplicated; could it be factored into a method?
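
For instance, the duplicated block could move into a helper like this (hypothetical name and shape, mirroring the quoted code):

```cpp
// Sketch: a single place that detaches a chunk from mChunks and
// releases its back-reference to the file outside the lock.
void
CacheFile::RemoveChunkInternal(CacheFileChunk *aChunk)
{
  aChunk->mRemovingChunk = true;
  ReleaseOutsideLock(static_cast<CacheFileChunkListener *>(
                       aChunk->mFile.forget().get()));
  mChunks.Remove(aChunk->Index());
}
```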

::: netwerk/cache2/CacheFileChunk.cpp
@@ +206,5 @@
>    rv = CacheFileIOManager::Read(aHandle, mIndex * kChunkSize, mRWBuf, aLen,
>                                  this);
>    if (NS_FAILED(rv)) {
> +    mState = ERROR;
> +    mStatus = mIndex ? NS_ERROR_FILE_CORRUPTED : NS_ERROR_FILE_NOT_FOUND;

maybe have some SetError method here as well?
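
e.g. something along these lines (a sketch):

```cpp
// Sketch: flip mState and mStatus together so the failure paths can't
// leave the two out of sync; keep the first recorded error.
void
CacheFileChunk::SetError(nsresult aStatus)
{
  MOZ_ASSERT(NS_FAILED(aStatus));

  if (NS_FAILED(mStatus)) {
    return;        // already in error state, keep the original status
  }
  mStatus = aStatus;
  mState = ERROR;
}
```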

::: netwerk/cache2/CacheFileInputStream.cpp
@@ +447,5 @@
> +  if (NS_SUCCEEDED(aResult)) {
> +    mChunk = aChunk;
> +  }
> +  else {
> +    if (aResult != NS_ERROR_NOT_AVAILABLE)

on one line

add a comment explaining why NOT_AVAIL is special-cased here

@@ +544,5 @@
>    if (NS_FAILED(rv)) {
>      LOG(("CacheFileInputStream::EnsureCorrectChunk() - GetChunkLocked failed. "
>           "[this=%p, idx=%d, rv=0x%08x]", this, chunkIdx, rv));
> +    if (rv != NS_ERROR_NOT_AVAILABLE)
> +      mStatus = rv;

as well here
Attachment #8370087 - Flags: review- → review+
Attached patch patch v2 — Splinter Review
(In reply to Honza Bambas (:mayhemer) from comment #4)
> > Once we encounter an error (read or write), any subsequent read/write
> > from/to the input/output stream fails. So what's the point of
> > generating additional IO for such an entry?

This is not 100% correct. We do not propagate the errors across the streams, so e.g. an error in the output stream doesn't immediately fail all reads on existing input streams. But that doesn't change the fact that it makes no sense to continue writing to a failed entry.


> ::: netwerk/cache2/CacheFile.cpp
> @@ +309,5 @@
> >         this, aResult, aChunk, index));
> >  
> > +  if (NS_FAILED(aResult)) {
> > +    SetError(aResult);
> > +    // TODO ??? doom entry
> 
> I think no, it will probably fail to load next time, it's indexed so it will
> go away with the eviction-over-limit mechanism despite being broken.

I thought about it and I added dooming of the entry in case of failure. Such an entry is unusable for further requests and it is better to doom it ASAP, because that saves time when opening, space in the index, etc.
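
A sketch of how that could look in CacheFile::SetError (simplified, hypothetical shape):

```cpp
// Sketch: record the first failure and immediately doom the entry so a
// broken file doesn't waste index space or a future open.
void
CacheFile::SetError(nsresult aStatus)
{
  AssertOwnsLock();               // hypothetical locking assertion
  MOZ_ASSERT(NS_FAILED(aStatus));

  if (NS_SUCCEEDED(mStatus)) {
    mStatus = aStatus;
    if (mHandle) {
      CacheFileIOManager::DoomFile(mHandle, nullptr);
    }
  }
}
```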
Attachment #8370087 - Attachment is obsolete: true
Attachment #8406491 - Flags: review+
Blocks: 988489
Blocks: 913810
Anything here that blocks landing?
https://hg.mozilla.org/mozilla-central/rev/65219cf254c7
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31