HTTP cache v2: preload several chunks in advance in input stream to speed up reading

RESOLVED FIXED in mozilla32

Status

RESOLVED FIXED

People

(Reporter: mayhemer, Assigned: michal.novotny)

Tracking

Version: unspecified
Target Milestone: mozilla32
Points: ---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [cache2])

Attachments

(1 attachment, 3 obsolete attachments)

(Reporter)

Description

5 years ago
Part of this bug should be:
- determine optimal chunk size (now 16kB)
- preload chunks in advance to save repeated OnDataAvailable calls with only 16kB of data (pretty suboptimal as tests show)
(Reporter)

Updated

5 years ago
Whiteboard: [cache2]
(Reporter)

Updated

5 years ago
Blocks: 913806
(Reporter)

Comment 1

5 years ago
I don't think there will be much time to finish this within the time frame.  One of the first optimizations to come after that!
Blocks: 941841
No longer blocks: 913806
(Reporter)

Comment 2

5 years ago
This shows up as the cause of talos regressions.  It needs to be done for the first deployment.  Another related problem may be that there are too many posts in the async reading chain.  In the past I analyzed this with the Event Tracer and had a patch that tried to remove most of them, though it was a bit hacky.

I can play with just enlarging chunk to say 512kB, but from my experience this may not have much performance influence.
Blocks: 913806
No longer blocks: 941841
(Assignee)

Comment 3

5 years ago
(In reply to Honza Bambas (:mayhemer) from comment #2)
> I can play with just enlarging chunk to say 512kB, but from my experience
> this may not have much performance influence.

I don't understand this part of your comment. If 16kB chunk size causes the talos regression, then increasing the chunk size must have a positive effect.
(Reporter)

Comment 4

5 years ago
(In reply to Michal Novotny (:michal) from comment #3)
> (In reply to Honza Bambas (:mayhemer) from comment #2)
> > I can play with just enlarging chunk to say 512kB, but from my experience
> > this may not have much performance influence.
> 
> I don't understand this part of your comment. If 16kB chunk size causes the
> talos regression, then increasing the chunk size must have a positive effect.

It will, but I sense it may not be enough.  I'll definitely try it right now; it's easy.
(Reporter)

Comment 5

5 years ago
tsvgr_opacity

cache2 - 16kB (m-c) : https://tbpl.mozilla.org/?tree=Try&rev=a505e96204f6
cache2 - 512kB (patched) : https://tbpl.mozilla.org/?tree=Try&rev=56e8a11532b3
(Reporter)

Comment 6

5 years ago
So, with chunk at 512kB it seems like we are still slower, however, not that dramatically this time:

I ignored extreme values over 240ms and took about 15 measurements each for cache1 and cache2 with the 512kB chunk.

               median,   average
cache1      :  148.5     159.2666666667
cache2+512kB:  175       167.4333333333


It seems chunk size enlargement alone is not enough.  However, doing so may still be a good improvement in general.

Michal, what size of the chunk would you suggest?
Flags: needinfo?(michal.novotny)
(Reporter)

Comment 7

5 years ago
Ignore my previous comment!  Asking on IRC is always a good thing to do ;)  This is how the results should be compared:

http://compare-talos.mattn.ca/?oldRevs=c0ec2e234457&newRev=56e8a11532b3&server=graphs.mozilla.org&submit=true

It shows that we actually improve svg opacity with this.  Cool!

I'll change the bug dependencies.
(Reporter)

Updated

5 years ago
Blocks: 941841
No longer blocks: 913806
Severity: major → normal
(Reporter)

Comment 8

5 years ago
I moved the discussion to bug 988318.
Flags: needinfo?(michal.novotny)
(Reporter)

Comment 9

5 years ago
Seems like some of our talos tests are still regressing because of this (https://bugzilla.mozilla.org/show_bug.cgi?id=982598#c15).

Tentatively blocking cache2 enable.

Michal, if you give me some clues how this should be implemented I can work on it.
Blocks: 913806
No longer blocks: 941841
(Reporter)

Comment 10

5 years ago
I can imagine an algorithm like |while (preloadedChunks.count < 4) preloadChunk(lastPreloadedChunkIndex+1);| invoked immediately after we verify metadata, and maybe make the "4" a preference.
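The loop sketched above can be made concrete as a tiny standalone model. This is only an illustration of the idea, not the actual CacheFile code; the names PreloadState, preloadChunk, and onMetadataVerified are hypothetical stand-ins:

```cpp
#include <cstdint>
#include <set>

// Hypothetical stand-in for the preload bookkeeping described above.
static const uint32_t kPreloadChunkCount = 4;  // the "4", ideally a pref

struct PreloadState {
  std::set<uint32_t> preloadedChunks;    // chunk indices already requested
  uint32_t lastPreloadedChunkIndex = 0;  // highest index preloaded so far

  // In real code this would kick off an async disk read for the chunk;
  // here we just record that the chunk was requested.
  void preloadChunk(uint32_t index) {
    preloadedChunks.insert(index);
    lastPreloadedChunkIndex = index;
  }

  // Invoked right after metadata is verified: keep kPreloadChunkCount
  // chunks in flight ahead of the reader.
  void onMetadataVerified() {
    while (preloadedChunks.size() < kPreloadChunkCount)
      preloadChunk(preloadedChunks.empty() ? 0
                                           : lastPreloadedChunkIndex + 1);
  }
};
```

With an empty state, one call schedules chunks 0 through 3; as the reader consumes chunks and they drop out of the set, re-running the loop tops the window back up.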
(Reporter)

Comment 12

5 years ago
Comment on attachment 8414068 [details] [diff] [review]
patch v1

Review of attachment 8414068 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty good! f+

This patch is however missing one important factor we were talking about: start the preload immediately when you have successfully loaded the metadata (ASAP on the IO thread).  I checked in the debugger that the first time CacheFile::PreloadChunks is called, it is called from:

>	xul.dll!mozilla::net::CacheFile::PreloadChunks(0) Line 1191	C++
 	xul.dll!mozilla::net::CacheFile::GetChunkLocked(0, false, 0x1826a798, 0x1826a7a8) Line 1056	C++
 	xul.dll!mozilla::net::CacheFileInputStream::EnsureCorrectChunk(false) Line 546	C++
 	xul.dll!mozilla::net::CacheFileInputStream::AsyncWait(0x189cd03c, 0, 0, 0x00452c80) Line 333	C++
 	xul.dll!nsInputStreamPump::EnsureWaiting() Line 142	C++
 	xul.dll!nsInputStreamPump::AsyncRead(0x18cd3318, 0x00000000) Line 382	C++
 	xul.dll!mozilla::net::nsHttpChannel::ReadFromCache(true) Line 3609	C++
 	xul.dll!mozilla::net::nsHttpChannel::ContinueConnect() Line 366	C++
 	xul.dll!mozilla::net::nsHttpChannel::OnCacheEntryAvailableInternal(0x189377e0, false, 0x00000000, NS_OK) Line 3110	C++
 	xul.dll!mozilla::net::nsHttpChannel::OnCacheEntryAvailable(0x189377e0, false, 0x00000000, NS_OK) Line 3056	C++
 	xul.dll!mozilla::net::CacheEntry::InvokeAvailableCallback({...}) Line 719	C++
 	xul.dll!mozilla::net::CacheEntry::AvailableCallbackRunnable::Run() Line 176	C++

...on the main thread - that is way too late.

Best would be to start at CacheFile::OnMetadataRead.  In CacheFile::ShouldKeepChunk then, when there are no input streams, behave as if there were a single input stream active at position 0 - which effectively means always preloading the first 4 chunks of payload.

You may counter-argue that when no one opens any input stream (which is IMO very, very rare) we will keep the chunks almost forever for no use.  To avoid that (which I think is for after-enable) we may resurrect some of the ThrowMemoryCacheData() callers.  It's still there, just not used right now.

::: netwerk/cache2/CacheFile.cpp
@@ +987,5 @@
> +  // Preload chunks from disk when this is disk backed entry and the listener
> +  // is reader. NOTE: preloading code calls this method without callback, so
> +  // do not preload chunks when there is no listener.
> +  bool preload = !mMemoryOnly && !aWriter && aCallback;
> +  bool calledByPreloader = !aWriter && !aCallback;

time to change |bool aWriter| to |enum { READER, WRITER, PRELOADER } aCaller| ?

Guessing the caller type from the values of some arguments is definitely not a robust way of coding, but if this passes try, let's leave it for this version (followup OK)
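The enum-based alternative suggested above could look roughly like this. This is a hypothetical sketch, not the landed code; ChunkRequest and ShouldPreload are invented names that mirror the quoted |preload| / |calledByPreloader| logic:

```cpp
// Replace |bool aWriter| with an explicit caller enum so the callee no
// longer has to guess the caller type from argument values.
enum ECallerType { READER, WRITER, PRELOADER };

struct ChunkRequest {
  ECallerType caller;
  bool hasCallback;  // stands in for the aCallback != nullptr check
};

// Mirrors the quoted logic: preload from disk only for disk-backed
// entries when the caller is a reader that supplied a callback; the
// preloader itself passes no callback and must not recurse.
static bool ShouldPreload(const ChunkRequest& req, bool memoryOnly) {
  return !memoryOnly && req.caller == READER && req.hasCallback;
}
```

The advantage is that the PRELOADER case becomes impossible to confuse with a reader that merely forgot its callback.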

@@ +996,5 @@
>    if (mChunks.Get(aIndex, getter_AddRefs(chunk))) {
>      LOG(("CacheFile::GetChunkLocked() - Found chunk %p in mChunks [this=%p]",
>           chunk.get(), this));
>  
> +    MOZ_ASSERT(!calledByPreloader, "Unexpected!");

Please add a comment explaining why we must not get here when called by the preload code, as well as at the other places where you do this assert.

@@ +1180,5 @@
> +  AssertOwnsLock();
> +
> +  uint32_t lastChunk = aIndex + CacheObserver::PreloadChunkCount();
> +
> +  for (uint32_t i = (aIndex + 1); i <= lastChunk; ++i) {

I think it would be better to pass the index you actually want to preload from rather than the index of a chunk we already have (judging from how this is called).

That is also needed to preload from chunk 0 in OnMetadataRead(), and IMO it also makes a bit more sense.
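The suggested signature change amounts to something like the following. A hypothetical free-function sketch, assuming a PreloadChunks(aFrom) that takes the first index to preload (the real method, its members mChunks/mCachedChunks, and end-of-data handling are only referenced in comments):

```cpp
#include <cstdint>
#include <vector>

static const uint32_t kPreloadChunkCount = 4;

// Take the first index to preload from (not the index of a chunk we
// already have), so the same call works both after reading chunk N
// (pass N+1) and from OnMetadataRead() (pass 0).
static std::vector<uint32_t> PreloadChunks(uint32_t aFrom) {
  std::vector<uint32_t> scheduled;
  for (uint32_t i = aFrom; i < aFrom + kPreloadChunkCount; ++i) {
    // Real code would skip chunks already in mChunks/mCachedChunks and
    // stop at the end of the data; here we just record the indices.
    scheduled.push_back(i);
  }
  return scheduled;
}
```

So PreloadChunks(0) covers the OnMetadataRead() case and PreloadChunks(aIndex + 1) the read-ahead case, without the off-by-one baked into the callee.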

@@ +1189,5 @@
> +      return;
> +    }
> +
> +    if (mChunks.GetWeak(i) || mCachedChunks.GetWeak(i)) {
> +      // This chunk is already in memory or is being read right now.

maybe LOG this?

@@ +1216,5 @@
> +#else
> +  // Cache chunk when this is memory only entry or we don't have a handle yet.
> +  if (mMemoryOnly || mOpeningFile) {
> +    return true;
> +  }

simply add:

if (!mInputs.Length() && aIndex < CacheObserver::PreloadChunkCount()) return true; :)
Attachment #8414068 - Flags: review?(honzab.moz) → feedback+
(Assignee)

Comment 13

5 years ago
Created attachment 8414718 [details] [diff] [review]
patch v2

(In reply to Honza Bambas (:mayhemer) from comment #12)
> @@ +1189,5 @@
> > +      return;
> > +    }
> > +
> > +    if (mChunks.GetWeak(i) || mCachedChunks.GetWeak(i)) {
> > +      // This chunk is already in memory or is being read right now.
> 
> maybe LOG this?

This will happen very often so I don't want to bloat the log with this message.


> @@ +1216,5 @@
> > +#else
> > +  // Cache chunk when this is memory only entry or we don't have a handle yet.
> > +  if (mMemoryOnly || mOpeningFile) {
> > +    return true;
> > +  }
> 
> simply add:
> 
> if (!mInputs.Length() && aIndex < CacheObserver::PreloadChunkCount()) return
> true; :)

I've introduced a new flag mPreloadWithoutInput instead of checking mInputs.Length(). The idea is that some timer will drop this flag and perform a cleanup of the cached chunks, but I'll do that later in a follow-up bug.
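The keep-alive rule described here can be modeled in isolation. A hypothetical sketch only, assuming the flag semantics described above (FileState and ShouldKeepChunk are stand-ins; the real ShouldKeepChunk also considers input stream positions):

```cpp
#include <cstdint>

static const uint32_t kPreloadChunkCount = 4;

struct FileState {
  bool memoryOnly = false;
  bool openingFile = false;
  bool preloadWithoutInput = true;  // dropped later by a timer (follow-up)
  uint32_t inputCount = 0;          // number of open input streams
};

static bool ShouldKeepChunk(const FileState& f, uint32_t index) {
  // Always cache chunks for memory-only entries or before we have a
  // file handle.
  if (f.memoryOnly || f.openingFile)
    return true;
  // While no input stream has been opened yet, keep the initially
  // preloaded chunks alive so the first reader finds them in memory.
  if (f.preloadWithoutInput && f.inputCount == 0)
    return index < kPreloadChunkCount;
  return false;
}
```

Once the planned timer clears preloadWithoutInput, an entry that never gets a reader stops pinning its first chunks, addressing the keep-forever concern from comment 12.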
Attachment #8414068 - Attachment is obsolete: true
Attachment #8414718 - Flags: review?(honzab.moz)
(Reporter)

Comment 14

5 years ago
Comment on attachment 8414718 [details] [diff] [review]
patch v2

Review of attachment 8414718 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!  r=honzab

Now we just need to check that this helps with the talos regressions.

::: netwerk/cache2/CacheFile.cpp
@@ +1231,5 @@
> +  if (mMemoryOnly || mOpeningFile) {
> +    return true;
> +  }
> +
> +  uint32_t preloadChunkCount = CacheObserver::PreloadChunkCount();

Maybe return false when preloadChunkCount == 0 here (even though the math below will not pass for 0 anyway).

::: netwerk/cache2/CacheFile.h
@@ +188,5 @@
>    bool           mOpenAsMemoryOnly;
>    bool           mDataAccessed;
>    bool           mDataIsDirty;
>    bool           mWritingMetadata;
> +  bool           mPreloadWithoutInput;

Maybe mPreloadWithoutInputStreams (Input is too general) or mInitialPreload or so..  Up to you.
Attachment #8414718 - Flags: review?(honzab.moz) → review+
(Assignee)

Comment 15

5 years ago
Created attachment 8414823 [details] [diff] [review]
patch v3 - final patch

https://tbpl.mozilla.org/?tree=Try&rev=f2f2351b07fc
Attachment #8414718 - Attachment is obsolete: true
Attachment #8414823 - Flags: review+
(Assignee)

Comment 16

5 years ago
Created attachment 8415012 [details] [diff] [review]
patch v4

Fixed a call to CacheFile::PreloadChunks() that was made outside the lock from CacheFile::OnMetadataRead().

https://tbpl.mozilla.org/?tree=Try&rev=c379ea07551a
Attachment #8414823 - Attachment is obsolete: true
Attachment #8415012 - Flags: review+
(Reporter)

Comment 18

5 years ago
Michal, can you please land the patch even though it doesn't fully solve the talos regression?  We want to do another trial on Nightly for a few days and this should be in by then.  Thanks.
(Reporter)

Comment 19

5 years ago
The discussion about invoking preload from CacheEntry::InvokeCallback() after we have examined the entry with OnCacheEntryCheck() should happen in a followup.

The more I think about it, the more I tend to preload every time (as in the current patch), even though revalidation may fail.  The ratio is more or less 50:50 (200/304).

Updated

5 years ago
Blocks: 1004185
https://hg.mozilla.org/mozilla-central/rev/8a6a5bfdedc6
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32