Testcase storing N random empty strings is 4x slower AND keeps the SSD far more busy - 36% Vs 2%
Categories
(Core :: Storage: Cache API, enhancement)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox138 | --- | fixed |
People
(Reporter: mayankleoboy1, Assigned: jstutte)
References
(Blocks 1 open bug)
Details
Attachments
(14 files)
|
1.32 KB,
text/html
|
Details | |
|
100.41 KB,
image/png
|
Details | |
|
44.90 KB,
text/plain
|
Details | |
|
2.20 KB,
text/html
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details |
Open the attached testcase (OR download the testcase and Serve the attached testcase over a simple HTTP server)
Enter 5000 and 10000 and press enter. -> Note the time taken
Now close the tab -> Note the time taken for the CPU to process the closing of tab (I am guessing as soon as the tab is closed, browser will cleanup any leftover strings).
5000
- Chrome: https://share.firefox.dev/41HJFoL (4.8s + 1 second approximate, not captured)
- Nightly: https://share.firefox.dev/4blNu7a (19s + 4s)
10000
- Chrome: https://share.firefox.dev/3F1owOw (10s)
- Nightly: https://share.firefox.dev/4hZPBAh (40s)
- Nightly to clear data: https://share.firefox.dev/3XoYlHH (11s)
Somewhat more concerning is that Firefox keeps the SSD in a constant 33% "active" state, which may mean more writes or more power use or both.
Testcase generated by Chatgpt based on my prompts. Unrealistic testcase, but maybe it highlights some hotspots?
(maybe we are more conservative than Chrome and flush to disk on every write?)
| Reporter | ||
Comment 1•1 year ago
|
||
| Reporter | ||
Comment 2•1 year ago
|
||
| Reporter | ||
Comment 3•1 year ago
|
||
Also, the deleting of strings sometimes takes a few seconds, sometimes doesnt do any CPU work. But then you close the tab and the CPU/SSD use starts.
| Assignee | ||
Comment 4•1 year ago
|
||
For the creation I think we are seeing the async overhead of creating first a temporary file to write to that then will be moved to the final destination. I am a bit surprised that MoveFileExW is causing good part of that overhead, I would have expected it to be fast. There is also a visible piece of updating the SQLite database. The test case makes this overhead very visible by writing empty strings to the cache and thus feels pretty unrealistic. I am not able to see much from the Chrome profile, but I could imagine that they may avoid process hopping and write right away to the final destination without having a temporary file?
(In reply to Mayank Bansal from comment #3)
Also, the deleting of strings sometimes takes a few seconds, sometimes doesnt do any CPU work. But then you close the tab and the CPU/SSD use starts.
It looks as if the delete case when deleting many IDs at once causes us to traverse the folder structure once for each ID. I wonder if we could just do a BodyIdToFile for each ID and then delete that directly?
There is probably no reason to do much here if not for the delete case.
| Reporter | ||
Comment 5•1 year ago
|
||
What would be a more realistic testcase? (something like storing 100 strings of length 2mb each?)
I can try to get chatgpt to modify the testcase based on what you think is more realistic.
| Assignee | ||
Comment 6•1 year ago
|
||
(In reply to Mayank Bansal from comment #5)
What would be a more realistic testcase? (something like storing 100 strings of length 2mb each?)
I can try to get chatgpt to modify the testcase based on what you think is more realistic.
Well, that might be more in line with real world usage. I think this bug's testcase is very nice in isolating the creation overhead, I just do not think it is worth our effort to improve it.
Marking the bug as enhancement as there is nothing really broken for users nor do we see a recent regression of Firefox itself.
| Reporter | ||
Comment 7•1 year ago
•
|
||
100 strings, 4096KB sized each
Nightly: https://share.firefox.dev/3QH2mDJ
Chrome: https://share.firefox.dev/43nGpkD
(Profile looks completely different to my naive eyes. Make what you will of this)
| Reporter | ||
Updated•1 year ago
|
| Assignee | ||
Comment 8•1 year ago
|
||
(In reply to Mayank Bansal from comment #7)
Created attachment 9469363 [details]
Cache API_Non-empty strings.HTML100 strings, 4096KB sized each
Nightly: https://share.firefox.dev/3QH2mDJ
Chrome: https://share.firefox.dev/43nGpkD(Profile looks completely different to my naive eyes. Make what you will of this)
It looks to me as if the overall execution time is now pretty similar between Chrome and Firefox? The Firefox profile shows that the effort moved to the StreamTransport thread pool and good part of it is now SnappyFrameUtils::WriteCompressedData, which is probably not too surprising. I assume using ' ' (blank) as string content causes very high compression rates such that the actual file operations matter less now. A similar test case with random content in the strings could show something different here.
So I'd probably say to this "works like expected", with an opportunity to improve compression. We already identified to replace snappy as a possible enhancement for IndexedDB in bug 1945670, we might want to think about expanding that to more places.
Comment 9•1 year ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #4)
It looks as if the delete case when deleting many IDs at once causes us to traverse the folder structure once for each ID. I wonder if we could just do a
BodyIdToFilefor each ID and then delete that directly?There is probably no reason to do much here if not for the delete case.
It looks like BodyTraverseFiles may perform some additional cleanup, but I agree that, in theory, we should simply delete the corresponding file(s) on disk and handle the cleanup elsewhere."
Comment 10•1 year ago
|
||
And yes, as we discussed earlier, the Cache API could also benefit from switching to LZ4 or Zstd compression.
| Assignee | ||
Comment 11•1 year ago
•
|
||
(In reply to Jan Varga [:janv] from comment #9)
It looks like
BodyTraverseFilesmay perform some additional cleanup, but I agree that, in theory, we should simply delete the corresponding file(s) on disk and handle the cleanup elsewhere."
I gave this a shot, you could try this build to see if it helps with the initial test case from comment 0.
Note that currently we check for orphaned files on every CacheManager startup, so I assume we can live without the additional cleanup for the normal delete.
| Assignee | ||
Comment 12•1 year ago
|
||
BodyTraverseFile does an expensive directory traversal while we
basically already know, which files we want to delete and can construct
their paths directly from the id.
BodyTraverseFile's additional functionality of deleting folders or
orphaned temp files it encounters while traversing is not resembled
here and we expect the caller to know what should be deleted. However,
we do delete orphaned temp files matching any of the passed ids.
Updated•1 year ago
|
| Assignee | ||
Comment 13•1 year ago
|
||
The basic functionality remains the same, including the use of
BodyTraverseFile with all its cleanups.
| Assignee | ||
Comment 14•1 year ago
|
||
| Assignee | ||
Comment 15•1 year ago
|
||
| Assignee | ||
Comment 16•1 year ago
|
||
| Assignee | ||
Comment 17•1 year ago
|
||
| Assignee | ||
Comment 18•1 year ago
|
||
| Assignee | ||
Comment 19•1 year ago
|
||
| Assignee | ||
Comment 20•1 year ago
|
||
| Assignee | ||
Comment 21•1 year ago
•
|
||
(In reply to Jens Stutte [:jstutte] from comment #11)
I gave this a shot, you could try this build to see if it helps with the initial test case from comment 0.
Trying locally shows a significant speed up and the remaining effort seems to be mostly happening inside FlushFileBuffers and some IPC hopping between content process and parent.
| Reporter | ||
Comment 22•1 year ago
•
|
||
Build from this link
Testcase from comment #0
N=5000: https://share.firefox.dev/3F9zTUB (1.8s)
N=10000: https://share.firefox.dev/4kzx6nK (5s)
So we are getting a 50% improvement while deleting from the patches.
Updated•1 year ago
|
| Assignee | ||
Comment 23•1 year ago
|
||
Updated•1 year ago
|
Comment 24•1 year ago
|
||
Comment 25•1 year ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/f2b9650d9e11
https://hg.mozilla.org/mozilla-central/rev/ea6628c71b38
https://hg.mozilla.org/mozilla-central/rev/4cf750c1cab6
https://hg.mozilla.org/mozilla-central/rev/45bfe3225aa6
https://hg.mozilla.org/mozilla-central/rev/46fa88f0e39c
https://hg.mozilla.org/mozilla-central/rev/5d5166ed00c6
https://hg.mozilla.org/mozilla-central/rev/e90ef2f14e05
https://hg.mozilla.org/mozilla-central/rev/2abcf49fab19
https://hg.mozilla.org/mozilla-central/rev/821ae8e20a8d
https://hg.mozilla.org/mozilla-central/rev/4e391b343c34
| Reporter | ||
Comment 26•1 year ago
|
||
With latest Nightly:
N=10000
Profile: https://share.firefox.dev/4kCSWqv (4.7s)
So an improvement of 57% !
| Assignee | ||
Comment 27•1 year ago
•
|
||
(In reply to Mayank Bansal from comment #26)
With latest Nightly:
N=10000
Profile: https://share.firefox.dev/4kCSWqv (4.7s)
So an improvement of 57% !
Thanks for checking! There is one long execution of DeleteCacheId in that profile that puzzles me a bit (which was not supposed to be changed by these patches, fwiw). At least from the function names I do not really get why DeleteCacheId should call DeleteAllCacheEntries at all and there I find this interesting comment which hints that there is some known problem there, although from the profile it seems, that the execution of the statement itself is slow, not reading its results?
(In reply to Mayank Bansal from comment #0)
10000
- Nightly: https://share.firefox.dev/4hZPBAh (40s)
- Nightly to clear data: https://share.firefox.dev/3XoYlHH (11s)
Out of curiosity: The 57% refer to the second measure from comment 0 ?
| Reporter | ||
Comment 28•1 year ago
|
||
Yes, 57% improvement in latest nightly over the second testing in comment #0, which is the time taken to delete N=10000 strings from cache.
| Reporter | ||
Comment 29•1 year ago
|
||
Also reiterating : many times clicking on the "delete strings" button in the testcase does nothing. But then you close the tab and cpu/disc use starts.
| Assignee | ||
Comment 30•1 year ago
|
||
I think also the first testcase did benefit from these patches, at least in my local testing, but I did not note down any numbers.
| Assignee | ||
Comment 31•1 year ago
|
||
(In reply to Mayank Bansal from comment #29)
Also reiterating : many times clicking on the "delete strings" button in the testcase does nothing. But then you close the tab and cpu/disc use starts.
I do not know if this is the expected behavior. Is it?
| Reporter | ||
Comment 32•1 year ago
|
||
Profile where clciking on the "delete cache" works: https://share.firefox.dev/3XFqb2B
Profile where clciking on the "delete cache" DOES NOT work: https://share.firefox.dev/43xXKHx (I clicked maybe 20 times on the button and nothing happened. Then i closed the tab and that resulted in the actual deletion)
Comment 33•1 year ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #31)
(In reply to Mayank Bansal from comment #29)
Also reiterating : many times clicking on the "delete strings" button in the testcase does nothing. But then you close the tab and cpu/disc use starts.
I do not know if this is the expected behavior. Is it?
Yes. Any live Cache instances are still usable after deleting the cache and will only be cleaned up when the last Cache instance is released. The code refers to this situation as "orphaned", although it also covers the case where the Cache was deleted and then we had an unclean shutdown.
If IPC markers are enabled in the profiler it can be easier to see what's going on, but because of the runnables we can see in the "works" case that there is a PCache::Msg___delete__ that runs but in the "DOES NOT work" case, no such runnable runs. Note that this counts across all tabs and devtools which latches the Caches. Assuming the retention of the Cache is only occurring because of the cache in storeInCache() then nulling it out (which would require the const to become a let) at the end of the function could help, although we still may end up needing to wait for a CC/snow-white collection.
| Reporter | ||
Updated•1 year ago
|
Description
•