Closed Bug 1258747 Opened 8 years ago Closed 8 years ago

Support for alternative data in CacheFile

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox48 --- wontfix
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- fixed

People

(Reporter: mayhemer, Assigned: michal)

References

Details

(Whiteboard: [necko-active])

Attachments

(1 file, 7 obsolete files)

When reading the "real" data from a cache entry that is already having or going to have an "alternative" (aka processed) data stream appended, we need to tell the input stream where the actual end of the data is to correctly return EOF after the last byte of the "real" data has been delivered, regardless of whether the alt data is still being written or not.


There are two basic use cases for this limitation:

- an existing entry, having alt data, and only one consumer wants to read the "real" data: we need to end reading when we reach the end of the real data

- an existing entry in whatever intermediate state: one consumer writes either the real data or an alt data, other wants to read the "real" data only


For the letter case we need to specify what happens when the reading consumer reaches end of the real data, the condition to change seems to be at [1]:

- real data write in progress, no alt data yet (this is what we already have)

  -> no alt data present, real data output stream exists
  => WOULD_BLOCK

- real data write is finished, write of alt data is in progress

  -> the offset for alt data is known, alt data output stream exists
  => OK+EOF

- real data write is finished, write of alt data has ended with success

  -> the offset for alt data is known, no output exists
  => OK+EOF

- real data write is finished, write of alt data has ended with a failure (we truncate)

  -> no alt data present, no output exists
  => OK+EOF

- and one nice corner case: real data write is finished, write of alt data has ended with a failure (we truncate), we write more data to the "real" data bucket

  -> no alt data present, real data output stream exists
  => WOULD_BLOCK



We also need to return WOULD_BLOCK from _alt_ data input streams when there is still an alt data output stream open.

I think it's OK to just let any concurrent alt data input stream fail with whatever error the alt data output stream has been closed with, no magic here.

And have the same behavior for alt data streams when an entry has been doomed.

We won't allow opening an output stream for "real" data when there is an output stream for the "alt" data open (and vise versa).  But that is already ensured by the current code architecture.

I think we can do this well with just a single mOutput member in CacheFile.


[1] http://hg.mozilla.org/mozilla-central/annotate/ea6298e1b4f7/netwerk/cache2/CacheFileInputStream.cpp#l187
Whiteboard: [necko-active]
In bug 1231565 I added 2 new methods to cache entry:
> NS_IMETHODIMP CacheEntry::OpenAlternativeOutputStream(const nsACString & type, nsIOutputStream **_retval)
> NS_IMETHODIMP CacheEntry::OpenAlternativeInputStream(const nsACString & type, nsIInputStream **_retval)

Let me know if that's ok, and how we can make this work.
I am currently working on some unit tests for the feature.
Flags: needinfo?(michal.novotny)
IIRC, the plan was to not put the alt-data logic info CacheFile and CacheFile*Streams because it should be sufficient to simply limit the input stream when opening it. But this won't work in case there is some input stream opened (for original data) while we want to start storing alternative data to the cache entry. At this point we would need to limit all already existing input stream. This could be done since CacheFile keeps track of all opened input stream. I need to think a bit about what exactly to do on CacheEntry level and what to delegate to CacheFile.
Flags: needinfo?(michal.novotny)
Hi Michal, bug 1231565 comment 36 has a try push which includes a basic alt-data implementation in nsHttpChannel, a stub which forwards calls to openAltDataInput/OutputStream to openInput/OutputStream, and a couple of tests.
Let me know if I can assist with anything on this bug.
Attached patch wip patch (obsolete) — Splinter Review
MozReview-Commit-ID: J0oztqLsqRG
Attachment #8766704 - Attachment is obsolete: true
Attached patch patch v1 (obsolete) — Splinter Review
Attachment #8768743 - Attachment is obsolete: true
Attachment #8771471 - Flags: review?(honzab.moz)
Comment on attachment 8771471 [details] [diff] [review]
patch v1

Review of attachment 8771471 [details] [diff] [review]:
-----------------------------------------------------------------

I so far have more questions.  This is just a first round through the patch.  Seems more complicated then I though it would be.

::: netwerk/cache2/CacheFile.cpp
@@ +627,5 @@
>      } else {
> +      const char *altData = mMetadata->GetElement(CacheFileUtils::kAltDataKey);
> +      if (altData &&
> +          NS_SUCCEEDED(CacheFileUtils::ParseAlternativeDataInfo(
> +            altData, &mRealDataEnd, nullptr)) && (mRealDataEnd > mDataSize)) {

should this rather be:

if (altData &&
    (NS_FAILED(Parse..()) || (mReadDataEnd > mDataSize)) {

?

@@ +788,5 @@
> +
> +    return NS_ERROR_NOT_AVAILABLE;
> +  }
> +
> +  if (mRealDataEnd == -1) {

what does this mean?  please add a comment and probably a log too.

@@ +815,5 @@
> +  nsCString availableAltData;
> +  rv = CacheFileUtils::ParseAlternativeDataInfo(altData, &offset,
> +                                                &availableAltData);
> +  if (NS_FAILED(rv)) {
> +    return rv;

maybe log all these returns?

@@ +822,5 @@
> +  if (availableAltData != aAltDataType) {
> +    return NS_ERROR_NOT_AVAILABLE;
> +  }
> +
> +  MOZ_ASSERT(mRealDataEnd == offset);

I think when this fails we should rather do NS_ERROR() - tests are sensitive to it too - and let this gracefully fail with deletion of the altdata metadata element or even dooming this whole entry.  I presume when this fails it means someone has played with the meta externally?

@@ +826,5 @@
> +  MOZ_ASSERT(mRealDataEnd == offset);
> +
> +  // Once we open input stream we no longer allow preloading of chunks without
> +  // input stream, i.e. we will no longer keep first few chunks preloaded when
> +  // the last input stream is closed.

can you also say why?

@@ +1785,5 @@
> +  MOZ_ASSERT(aOffset < mDataSize);
> +  MOZ_ASSERT(mReady);
> +
> +  if (mDataSize >= aOffset) {
> +    return NS_OK;

should we rather fail when mData < aOffset?  because consumers may expect the end of the data is then at aOffset.  but it may not be.

::: netwerk/cache2/CacheFile.h
@@ +154,5 @@
>    nsresult DeactivateChunk(CacheFileChunk *aChunk);
>    void     RemoveChunkInternal(CacheFileChunk *aChunk, bool aCacheChunk);
>  
> +  bool     OutputStreamExists(bool aAlternativeData);
> +  int64_t  BytesFromChunk(uint32_t aIndex, bool aAlternativeData);

I forgot what this method was doing, can you add a decent comment please?

@@ +199,5 @@
>    bool           mPreloadWithoutInputStreams;
>    uint32_t       mPreloadChunkCount;
>    nsresult       mStatus;
>    int64_t        mDataSize;
> +  int64_t        mRealDataEnd;

worth a comment?

@@ +211,5 @@
>  
>    nsRefPtrHashtable<nsUint32HashKey, CacheFileChunk> mChunks;
>    nsClassHashtable<nsUint32HashKey, ChunkListeners> mChunkListeners;
>    nsRefPtrHashtable<nsUint32HashKey, CacheFileChunk> mCachedChunks;
> +  nsTArray<RefPtr<CacheFileChunk> > mDiscardedChunks;

what's the purpose?  please add a comment.

::: netwerk/cache2/CacheFileChunk.h
@@ +210,5 @@
> +
> +  Atomic<bool> mActiveChunk; // Is true iff the chunk is in CacheFile::mChunks.
> +                             // Adding/removing chunk to/from mChunks as well as
> +                             // changing this member happens under the
> +                             // CacheFile's lock.

then why is it Atomic?

::: netwerk/cache2/CacheFileMetadata.h
@@ +184,5 @@
>    NS_IMETHOD OnFileDoomed(CacheFileHandle *aHandle, nsresult aResult) override;
>    NS_IMETHOD OnEOFSet(CacheFileHandle *aHandle, nsresult aResult) override;
>    NS_IMETHOD OnFileRenamed(CacheFileHandle *aHandle, nsresult aResult) override;
>    virtual bool IsKilled() override { return mListener && mListener->IsKilled(); }
> +  void       InitEmptyMetadata();

void InitEmptyMetadata(); ? (no spaces)

::: netwerk/cache2/CacheFileOutputStream.h
@@ +57,5 @@
>    RefPtr<CacheFileChunk> mChunk;
>    RefPtr<CacheOutputCloseListener> mCloseListener;
>    int64_t                  mPos;
> +  bool                     mClosed : 1;
> +  bool                     mAlternativeData : 1;

this could be |bool const| ?

::: netwerk/cache2/CacheFileUtils.cpp
@@ +549,5 @@
> +  }
> +
> +  // The requested alt-data representation is not available
> +  if (altDataOffset < 0) {
> +    return NS_ERROR_NOT_AVAILABLE;

I think when you get here, altDataOffset >= 0.  the tokenizer can't read negative numbers (it doesn't have any notion of the '-' sign before a number).

@@ +554,5 @@
> +  }
> +
> +  *_offset = altDataOffset;
> +  if (_type) {
> +    mozilla::Unused << p.ReadUntil(Tokenizer::Token::Char(';'), *_type);

We should document what the 'type' is allowed to contain - in the IDL?  Somebody could easily try adding some ";q=0.5" stuff after the type, here it would break (we won't read stuff after the ";").  Why exactly not to read just till the end?
Comment 7 ^^^
Flags: needinfo?(michal.novotny)
Comment on attachment 8771471 [details] [diff] [review]
patch v1

Review of attachment 8771471 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheFile.h
@@ +199,5 @@
>    bool           mPreloadWithoutInputStreams;
>    uint32_t       mPreloadChunkCount;
>    nsresult       mStatus;
>    int64_t        mDataSize;
> +  int64_t        mRealDataEnd;

also, could we rename this to mAltDataOffset?  More expresses what it means.  And also is less confusing when this is -1 (negative amount of data? :))
Attached patch patch v2 (obsolete) — Splinter Review
(In reply to Honza Bambas (:mayhemer) from comment #7)
> should this rather be:
> 
> if (altData &&
>     (NS_FAILED(Parse..()) || (mReadDataEnd > mDataSize)) {
> 
> ?

Parsing of the alt-metadata should probably never fail, but I agree that this is safer.

> @@ +822,5 @@
> > +  if (availableAltData != aAltDataType) {
> > +    return NS_ERROR_NOT_AVAILABLE;
> > +  }
> > +
> > +  MOZ_ASSERT(mRealDataEnd == offset);
> 
> I think when this fails we should rather do NS_ERROR() - tests are sensitive
> to it too - and let this gracefully fail with deletion of the altdata
> metadata element or even dooming this whole entry.  I presume when this
> fails it means someone has played with the meta externally?

In fact I added more assertions and added a check to CacheFile::SetElement that ensures that this element won't be changed externally. This data is used internally and nobody should ever try to write it.

> @@ +826,5 @@
> > +  MOZ_ASSERT(mRealDataEnd == offset);
> > +
> > +  // Once we open input stream we no longer allow preloading of chunks without
> > +  // input stream, i.e. we will no longer keep first few chunks preloaded when
> > +  // the last input stream is closed.
> 
> can you also say why?

This comment is copied from CacheFile::OpenInputStream. Preloading is described with more details in CacheFile::ShouldCacheChunk.

> @@ +1785,5 @@
> > +  MOZ_ASSERT(aOffset < mDataSize);
> > +  MOZ_ASSERT(mReady);
> > +
> > +  if (mDataSize >= aOffset) {
> > +    return NS_OK;
> 
> should we rather fail when mData < aOffset?  because consumers may expect
> the end of the data is then at aOffset.  but it may not be.

This seems to be completely wrong and I don't know why this check is there. The condition should be always true, so Truncate() does nothing. I removed it and also updated the assertion above, because aOffset can be equal to mDataSize.

> ::: netwerk/cache2/CacheFileChunk.h
> @@ +210,5 @@
> > +
> > +  Atomic<bool> mActiveChunk; // Is true iff the chunk is in CacheFile::mChunks.
> > +                             // Adding/removing chunk to/from mChunks as well as
> > +                             // changing this member happens under the
> > +                             // CacheFile's lock.
> 
> then why is it Atomic?

Because it's not checked under the lock in CacheFileChunk::Release().

> ::: netwerk/cache2/CacheFileUtils.cpp
> @@ +549,5 @@
> > +  }
> > +
> > +  // The requested alt-data representation is not available
> > +  if (altDataOffset < 0) {
> > +    return NS_ERROR_NOT_AVAILABLE;
> 
> I think when you get here, altDataOffset >= 0.  the tokenizer can't read
> negative numbers (it doesn't have any notion of the '-' sign before a
> number).

Can you guarantee that for the future? I would rather keep that check.

> @@ +554,5 @@
> > +  }
> > +
> > +  *_offset = altDataOffset;
> > +  if (_type) {
> > +    mozilla::Unused << p.ReadUntil(Tokenizer::Token::Char(';'), *_type);
> 
> We should document what the 'type' is allowed to contain - in the IDL? 
> Somebody could easily try adding some ";q=0.5" stuff after the type, here it
> would break (we won't read stuff after the ";").  Why exactly not to read
> just till the end?

I don't know, I took this code from Valentin's patch. I changed it so it is now read till the end. The description (if still needed) should be probably in nsICacheInfoChannel.idl together with other changes in bug 1231565.
Attachment #8771471 - Attachment is obsolete: true
Attachment #8771471 - Flags: review?(honzab.moz)
Flags: needinfo?(michal.novotny)
Attachment #8776170 - Flags: review?(honzab.moz)
Attached patch v1-v2 diff (obsolete) — Splinter Review
Summary: Have a programmatic limit for CacheFileInputStream reading → Support for alternative data in CacheFile
(In reply to Michal Novotny (:michal) from comment #10)
> Created attachment 8776170 [details] [diff] [review]
> > I think when you get here, altDataOffset >= 0.  the tokenizer can't read
> > negative numbers (it doesn't have any notion of the '-' sign before a
> > number).
> 
> Can you guarantee that for the future? I would rather keep that check.

Leave it as is, I may introduce reading neg numbers (probably should ASAP to prevent compatibility issues)
Comment on attachment 8776171 [details] [diff] [review]
v1-v2 diff

Review of attachment 8776171 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheFile.cpp
@@ +834,4 @@
>      return NS_ERROR_NOT_AVAILABLE;
>    }
>  
> +  MOZ_ASSERT(mAltDataOffset == offset);

maybe add a comment why?

@@ +1108,5 @@
>    MOZ_ASSERT(mMetadata);
>    NS_ENSURE_TRUE(mMetadata, NS_ERROR_UNEXPECTED);
>  
> +  if (!strcmp(aKey, CacheFileUtils::kAltDataKey)) {
> +    return NS_ERROR_FAILURE;

and maybe NS_ERROR with an explanation too :)

::: netwerk/cache2/CacheFileUtils.cpp
@@ +557,5 @@
>    if (_type) {
> +    p.Record();
> +    mozilla::Tokenizer::Token t;
> +    while (p.Next(t) && t.Type() != mozilla::Tokenizer::TOKEN_EOF);
> +    p.Claim(*_type);

I think

mozilla::Unused << p.ReadUntil(Tokenizer::Token::EOF(), *_type);

should work too.
Attachment #8776171 - Flags: review+
Comment on attachment 8776170 [details] [diff] [review]
patch v2

(reviewed the idiff)
Attachment #8776170 - Flags: review?(honzab.moz) → review+
Attachment #8776170 - Attachment is obsolete: true
Attachment #8776171 - Attachment is obsolete: true
Attachment #8781771 - Flags: review+
Attached file compiler error (obsolete) —
(In reply to Honza Bambas (:mayhemer) from comment #13)
> ::: netwerk/cache2/CacheFileUtils.cpp
> @@ +557,5 @@
> >    if (_type) {
> > +    p.Record();
> > +    mozilla::Tokenizer::Token t;
> > +    while (p.Next(t) && t.Type() != mozilla::Tokenizer::TOKEN_EOF);
> > +    p.Claim(*_type);
> 
> I think
> 
> mozilla::Unused << p.ReadUntil(Tokenizer::Token::EOF(), *_type);
> 
> should work too.

It doesn't compile. See attached error log. I want to land the patch as soon as possible but inbound is closed right now. If you know how to fix it, let me know soon so I can change the final patch.
Flags: needinfo?(honzab.moz)
Sorry, instead of EOF it has to be EndOfFile.  It's easy to find out when you check the Tokenizer.h file.
Flags: needinfo?(honzab.moz)
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e232e2267d5f
Support for alternative data in CacheFile, r=honzab
Attached patch final patchSplinter Review
Attachment #8781771 - Attachment is obsolete: true
Attachment #8781775 - Attachment is obsolete: true
Attachment #8781959 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/e232e2267d5f
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
I don't think we want to uplift that.
You need to log in before you can comment on or make changes to this bug.