Bug 1690629 (Open) - Opened 4 years ago, Updated 7 months ago

Consider optionally enabling the cache chunks behaviour

Categories: Core :: Networking: Cache (enhancement, P3)
Tracking Status: firefox87 --- affected
People: (Reporter: acreskey, Unassigned)
References: (Blocks 2 open bugs)
Whiteboard: [necko-triaged]

There is a disabled cache behaviour, CACHE_CHUNKS, which caches unused cache file chunks even when the memory cache is not in use.

Mayhemer suggested that it could be worth looking into as a way of improving cache performance without going completely to the memory cache.

While I have been able to measure significant improvements when using only the memory cache, so far with CACHE_CHUNKS I'm seeing that it seems to slow down cached request completion.

I measured the time from nsHttpChannel::AsyncOpen() to nsHttpChannel::OnStopRequest() for cached items.

On a motoG5, over a series of pageloads, this looks to slow down mean request completion time by ~25%.

------------------
baseline
count   1511.000000
mean     490.058240
std     1738.014898
min        5.000000
10%       15.000000
20%       19.000000
40%       38.000000
50%       48.000000
60%       59.000000
80%       84.000000
90%      443.000000
max    11652.000000
------------------


------------------
test (CACHE_CHUNKS)
count  1518.000000
mean    619.027009
std    1598.994609
min       5.000000
10%      15.000000
20%      19.000000
40%      37.000000
50%      46.000000
60%      58.000000
80%      89.600000
90%    2462.900000
max    7708.000000
------------------
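For reference, the summary rows in these tables (count, mean, percentiles) can be recomputed from the raw per-request durations with a small sketch like the following. The helper names are hypothetical; percentiles use linear interpolation between ranks, one common convention.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Linearly interpolated percentile over per-request durations (ms).
double Percentile(std::vector<double> aTimes, double aPct) {
  std::sort(aTimes.begin(), aTimes.end());
  double rank = aPct / 100.0 * (aTimes.size() - 1);
  size_t lo = static_cast<size_t>(rank);
  size_t hi = std::min(lo + 1, aTimes.size() - 1);
  return aTimes[lo] + (rank - lo) * (aTimes[hi] - aTimes[lo]);
}

// Arithmetic mean of the durations.
double Mean(const std::vector<double>& aTimes) {
  return std::accumulate(aTimes.begin(), aTimes.end(), 0.0) / aTimes.size();
}
```

Feeding both runs' duration samples through these would reproduce the count/mean/percentile rows above for comparison.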

Any thoughts on why this is the case, or whether I'm missing something, would be appreciated.

If you just uncomment #define CACHE_CHUNKS then we'll very shortly hit the max_chunks_memory_usage limit, which will break the cache completely. Just for testing, you can disable the limit and see if it makes a positive change. For a proper implementation, you would need to introduce some kind of eviction. This was never implemented; see e.g. this commented-out code.
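The eviction Michal describes could look roughly like this. This is a hypothetical sketch, not the real Necko code: an LRU list over cached chunks that evicts least-recently-used entries until total usage is back under the max_chunks_memory_usage limit.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical LRU eviction for retained cache chunks. Entries are
// (chunkId, size) pairs; most recently used chunks sit at the front.
class ChunkLru {
 public:
  explicit ChunkLru(uint32_t aLimit) : mLimit(aLimit) {}

  // Record that a chunk of aSize bytes was used; evict as needed.
  void OnChunkUsed(uint32_t aChunkId, uint32_t aSize) {
    auto it = mIndex.find(aChunkId);
    if (it != mIndex.end()) {
      mUsage -= it->second->second;
      mOrder.erase(it->second);
    }
    mOrder.emplace_front(aChunkId, aSize);
    mIndex[aChunkId] = mOrder.begin();
    mUsage += aSize;
    // Evict from the cold end until we are back under the limit.
    while (mUsage > mLimit && !mOrder.empty()) {
      mUsage -= mOrder.back().second;
      mIndex.erase(mOrder.back().first);
      mOrder.pop_back();  // the chunk's memory would be released here
    }
  }

  uint32_t Usage() const { return mUsage; }
  bool Contains(uint32_t aChunkId) const { return mIndex.count(aChunkId) != 0; }

 private:
  using Entry = std::pair<uint32_t, uint32_t>;  // (chunkId, size)
  uint32_t mLimit;
  uint32_t mUsage = 0;
  std::list<Entry> mOrder;  // MRU at front
  std::unordered_map<uint32_t, std::list<Entry>::iterator> mIndex;
};
```

With something like this in place the limit could stay enabled rather than breaking the cache once it is reached.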

That's very helpful, thank you.
As a first step, I will see how it behaves without the limit.

Severity: -- → N/A
Priority: -- → P2
Whiteboard: [necko-triaged]

I haven't been able to measure a difference even with no limit on the cache chunks memory usage.

But I don't like that I'm only measuring on Android.
So I'm shifting to adding a way to collect these fine-grained timings in our CI performance tests where I can run on all devices (Bug 1690373).

Using the same process that is described in this bug, I've evaluated the impact of enabling CACHE_CHUNKS (with an unlimited cache). Patch here.

The test measures total http channel completion time as the user visits 5 popular sites (cold page load), and then revisits them (warm page load).

Unfortunately, I'm not picking up any measurable improvements:
Linux
MacOS
Windows

I disabled RCWN to simplify the runtime, but I also saw the same lack of measurable improvements with RCWN enabled.

I'll do some profiling tomorrow, but I think that maybe the cache is really not a bottleneck in these scenarios.

If anyone has any ideas, let me know!

IMO, the best thing to do here is to check in the log that the chunks are really reused during the warm pageload. The patch just ensures that the chunks are not freed once they are used, but their lifetime is tied to the lifetime of the CacheFile, so if the CacheFile goes away and a new one is created for the reload, then it has no effect.

Also, the chunks could still be freed in CacheFile::ThrowMemoryCachedData(), but AFAICS that's possible only by calling nsICacheStorageService::purgeFromMemory(nsICacheStorageService::PURGE_DISK_DATA_ONLY) which I don't expect to be called during the test.
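A toy model of the lifetime point above (hypothetical types, not the real Necko classes): the retained chunks are owned by their file object, so releasing the file releases the chunks with it, and a later reload has to read them from disk again.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a cache file chunk.
struct Chunk {
  std::vector<char> mData;
};

// Hypothetical stand-in for CacheFile: with CACHE_CHUNKS, used
// chunks are kept here instead of being freed, so their lifetime
// is tied to the lifetime of this object.
struct CacheFile {
  std::vector<std::shared_ptr<Chunk>> mCachedChunks;
};
```

If the CacheFile is dropped between the cold and warm page loads, retaining the chunks buys nothing, which matches the behaviour being investigated here.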

(In reply to Michal Novotny [:michal] from comment #5)

IMO, the best thing to do here is to check in the log that the chunks are really reused during the warm pageload. The patch just ensures that the chunks are not freed once they are used, but their lifetime is tied to the lifetime of the CacheFile, so if the CacheFile goes away and a new one is created for the reload, then it has no effect.

Also, the chunks could still be freed in CacheFile::ThrowMemoryCachedData(), but AFAICS that's possible only by calling nsICacheStorageService::purgeFromMemory(nsICacheStorageService::PURGE_DISK_DATA_ONLY) which I don't expect to be called during the test.

Ah, yes, I'm not 100% sure yet, but it looks like the CacheFiles might be purged quite early because of the tight limit on browser.cache.disk.metadata_memory_limit of a quarter of a megabyte.
Called from here.
Investigating.

As far as I can tell, this log is the best indication that we are using the cached chunks:
https://searchfox.org/mozilla-central/rev/5e70cd673a0ba0ad19b662c1cf656e0823781596/netwerk/cache2/CacheFile.cpp#1437-1439

In a simplified scenario of just two sites:

The baseline build only logs that once.

But it's logged 130 times with CACHE_CHUNKS set (`browser.cache.disk.max_chunks_memory_usage` set to infinite, and `browser.cache.disk.metadata_memory_limit` increased).

Something I noticed from local debugging is that the CacheFiles are released after their metadata is written (they drop out of function scope)
https://searchfox.org/mozilla-central/rev/5e70cd673a0ba0ad19b662c1cf656e0823781596/netwerk/cache2/CacheFileIOManager.cpp#1493
I'm not sure yet if that's affecting anything.

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #7)

As far as I can tell, this log is the best indication that we are using the cached chunks:
https://searchfox.org/mozilla-central/rev/5e70cd673a0ba0ad19b662c1cf656e0823781596/netwerk/cache2/CacheFile.cpp#1437-1439

But it's logged 130 times with CACHE_CHUNKS set (`browser.cache.disk.max_chunks_memory_usage` set to infinite, and `browser.cache.disk.metadata_memory_limit` increased).

This shows that some chunks are re-used without reading them from the disk again, but it doesn't tell how often we need to go to the disk. You can compare it with the number of chunks that were read from the disk, see (2) below.

Something I noticed from local debugging is that the CacheFiles are released after their metadata is written (they drop out of function scope)
https://searchfox.org/mozilla-central/rev/5e70cd673a0ba0ad19b662c1cf656e0823781596/netwerk/cache2/CacheFileIOManager.cpp#1493
I'm not sure yet if that's affecting anything.

This means that CacheFile was released by CacheEntry, so it's destroyed after finalizing the file on the disk.

If the cache is empty when you start the test, then you can check 2 logs:

  1. https://searchfox.org/mozilla-central/rev/6a41d8d7ad42d57cf1adcc7067fd8f1689c42ab3/netwerk/cache2/CacheEntry.cpp#455
    If you see any line with "new=0" it means that the CacheFile was released and then initialized again to serve the data from the disk.

  2. https://searchfox.org/mozilla-central/rev/6a41d8d7ad42d57cf1adcc7067fd8f1689c42ab3/netwerk/cache2/CacheFile.cpp#1482
    This is logged when the chunk is read from the disk.
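Tallying the two log lines above from a captured log could be done with a small helper like this (a hypothetical sketch; capturing the log itself, e.g. via MOZ_LOG to a file, is assumed).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Count how many captured log lines contain a given marker string,
// e.g. "new=0" or "Reusing cached chunk".
size_t CountLines(const std::vector<std::string>& aLog,
                  const std::string& aMarker) {
  size_t n = 0;
  for (const auto& line : aLog) {
    if (line.find(aMarker) != std::string::npos) {
      ++n;
    }
  }
  return n;
}
```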

Thanks Michal.

Here I've counted the number of times each log line is present.

The cache is empty when the test is started (it's always a new profile).

It looks to me like we are seeing the chunks being cached.

                                                         baseline     CACHE_CHUNKS
CacheFile::GetChunkLocked() - Reusing cached chunk              4              316
CacheEntry::OnFileReady, where new=0                          364                0
CacheFile::GetChunkLocked() - Reading newly created chunk     374               17

So far with these changes I've reproduced some improvements in mean HttpChannelCompletion from cache, here and here.

But none have been picked up as high confidence by perfherder's t-test compare.

I'm running a compare now where I've stripped out the cold page load from the measurement to focus on the warm page load.

This is what I see when only measuring warm pageload:

MacOS: 4.28% improvement in cache timing, low confidence

Windows: 5.23% improvement in cache timing, low confidence

Linux: 1.21% regression in cache timing, low confidence

Since it looks to me like the cache chunks behaviour is working (comment 9), I'll make sure this reproduces.

I ran the warm pageload test again, this time it did pick up a high confidence improvement on MacOS:

MacOS: 9.1% improvement in cache timing, high confidence, 3.49% improvement in overall network performance, low confidence

Linux: ~2.4% to 2.8% improvement in network timing, low confidence

Windows - no improvement

I'm running it now with RCWN re-enabled.

This is interesting.

With RCWN enabled, I believe we can see that this improves cache performance:

More channels end up completing via cache, which has the effect of increasing the perfstat HttpChannelCompletion_Cache (high confidence) and decreasing the total durations for HttpChannelCompletion_Network (high confidence).
Overall time spent completing network requests, HttpChannelCompletion, is improved (3.39%, low confidence, macOS).
A similar effect is seen on all three desktop platforms:

MacOS
Linux
Windows

I won't be working on platform performance for a while, so I'll summarize.

I believe that we can see reproducible evidence of a modest performance improvement from the Cache Chunks behaviour (given large cache sizes).

On spinning-disk hardware I would expect this improvement to be larger, but we are not well set up to test on platter drives. In addition, from recent scans of top-selling PC hardware, platter drives are quickly becoming a minority.

I see additional potential for this feature on Android when the user is on a cellular connection and RCWN is disabled.

Priority: P2 → P3
Blocks: necko-cache