Closed Bug 1951283 Opened 1 year ago Closed 1 year ago

Testcase storing N random empty strings is 4x slower AND keeps the SSD far more busy - 36% Vs 2%

Categories

(Core :: Storage: Cache API, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
138 Branch
Tracking Status
firefox138 --- fixed

People

(Reporter: mayankleoboy1, Assigned: jstutte)

References

(Blocks 1 open bug)

Details

Attachments

(14 files)

1.32 KB, text/html
Details
100.41 KB, image/png
Details
44.90 KB, text/plain
Details
2.20 KB, text/html
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
Attached file Cache API.HTML

Open the attached testcase (OR download the testcase and Serve the attached testcase over a simple HTTP server)
Enter 5000 and 10000 and press enter. -> Note the time taken
Now close the tab -> Note the time taken for the CPU to process the closing of tab (I am guessing as soon as the tab is closed, browser will cleanup any leftover strings).

5000

10000

Somewhat more concerning is that Firefox keeps the SSD in a constant 33% "active" state, which may mean more writes or more power use or both.
Testcase generated by Chatgpt based on my prompts. Unrealistic testcase, but maybe it highlights some hotspots?
(maybe we are more conservative than Chrome and flush to disk on every write?)

Attached file about:support

Also, the deleting of strings sometimes takes a few seconds, sometimes doesnt do any CPU work. But then you close the tab and the CPU/SSD use starts.

For the creation I think we are seeing the async overhead of creating first a temporary file to write to that then will be moved to the final destination. I am a bit surprised that MoveFileExW is causing good part of that overhead, I would have expected it to be fast. There is also a visible piece of updating the SQLite database. The test case makes this overhead very visible by writing empty strings to the cache and thus feels pretty unrealistic. I am not able to see much from the Chrome profile, but I could imagine that they may avoid process hopping and write right away to the final destination without having a temporary file?

(In reply to Mayank Bansal from comment #3)

Also, the deleting of strings sometimes takes a few seconds, sometimes doesnt do any CPU work. But then you close the tab and the CPU/SSD use starts.

It looks as if the delete case when deleting many IDs at once causes us to traverse the folder structure once for each ID. I wonder if we could just do a BodyIdToFile for each ID and then delete that directly?

There is probably no reason to do much here if not for the delete case.

Flags: needinfo?(jan.varga)

What would be a more realistic testcase? (something like storing 100 strings of length 2mb each?)
I can try to get chatgpt to modify the testcase based on what you think is more realistic.

(In reply to Mayank Bansal from comment #5)

What would be a more realistic testcase? (something like storing 100 strings of length 2mb each?)
I can try to get chatgpt to modify the testcase based on what you think is more realistic.

Well, that might be more in line with real world usage. I think this bug's testcase is very nice in isolating the creation overhead, I just do not think it is worth our effort to improve it.

Marking the bug as enhancement as there is nothing really broken for users nor do we see a recent regression of Firefox itself.

Type: defect → enhancement

100 strings, 4096KB sized each

Nightly: https://share.firefox.dev/3QH2mDJ
Chrome: https://share.firefox.dev/43nGpkD

(Profile looks completely different to my naive eyes. Make what you will of this)

Attachment #9469363 - Attachment description: Cache API_empty strings.HTML → Cache API_Non-empty strings.HTML
Attachment #9469363 - Attachment filename: Cache-API_empty-strings.HTML → Cache API_Non-empty strings.HTML

(In reply to Mayank Bansal from comment #7)

Created attachment 9469363 [details]
Cache API_Non-empty strings.HTML

100 strings, 4096KB sized each

Nightly: https://share.firefox.dev/3QH2mDJ
Chrome: https://share.firefox.dev/43nGpkD

(Profile looks completely different to my naive eyes. Make what you will of this)

It looks to me as if the overall execution time is now pretty similar between Chrome and Firefox? The Firefox profile shows that the effort moved to the StreamTransport thread pool and good part of it is now SnappyFrameUtils::WriteCompressedData, which is probably not too surprising. I assume using ' ' (blank) as string content causes very high compression rates such that the actual file operations matter less now. A similar test case with random content in the strings could show something different here.

So I'd probably say to this "works like expected", with an opportunity to improve compression. We already identified to replace snappy as a possible enhancement for IndexedDB in bug 1945670, we might want to think about expanding that to more places.

(In reply to Jens Stutte [:jstutte] from comment #4)

It looks as if the delete case when deleting many IDs at once causes us to traverse the folder structure once for each ID. I wonder if we could just do a BodyIdToFile for each ID and then delete that directly?

There is probably no reason to do much here if not for the delete case.

It looks like BodyTraverseFiles may perform some additional cleanup, but I agree that, in theory, we should simply delete the corresponding file(s) on disk and handle the cleanup elsewhere."

Flags: needinfo?(jan.varga)

And yes, as we discussed earlier, the Cache API could also benefit from switching to LZ4 or Zstd compression.

(In reply to Jan Varga [:janv] from comment #9)

It looks like BodyTraverseFiles may perform some additional cleanup, but I agree that, in theory, we should simply delete the corresponding file(s) on disk and handle the cleanup elsewhere."

I gave this a shot, you could try this build to see if it helps with the initial test case from comment 0.

Note that currently we check for orphaned files on every CacheManager startup, so I assume we can live without the additional cleanup for the normal delete.

Flags: needinfo?(mayankleoboy1)

BodyTraverseFile does an expensive directory traversal while we
basically already know, which files we want to delete and can construct
their paths directly from the id.

BodyTraverseFile's additional functionality of deleting folders or
orphaned temp files it encounters while traversing is not resembled
here and we expect the caller to know what should be deleted. However,
we do delete orphaned temp files matching any of the passed ids.

Assignee: nobody → jstutte
Status: NEW → ASSIGNED

The basic functionality remains the same, including the use of
BodyTraverseFile with all its cleanups.

(In reply to Jens Stutte [:jstutte] from comment #11)

I gave this a shot, you could try this build to see if it helps with the initial test case from comment 0.

Trying locally shows a significant speed up and the remaining effort seems to be mostly happening inside FlushFileBuffers and some IPC hopping between content process and parent.

Flags: needinfo?(mayankleoboy1)

Build from this link

Testcase from comment #0
N=5000: https://share.firefox.dev/3F9zTUB (1.8s)
N=10000: https://share.firefox.dev/4kzx6nK (5s)

So we are getting a 50% improvement while deleting from the patches.

Attachment #9470489 - Attachment description: Bug 1951283 - Speedup id lookups by sorting in BodyDeleteOrphanedFiles. r?#dom-storage → Bug 1951283 - Speedup id lookups by using a hash set in BodyDeleteOrphanedFiles. r?#dom-storage
Attachment #9470580 - Attachment description: Bug 1951283 - Invert the default of aCreateDirIfNotExist in BodyIdToFile. r?#dom-storage → Bug 1951283 - Invert the default of aCreateDirIfNotExists in BodyIdToFile. r?#dom-storage
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f2b9650d9e11 Simplify BodyDeleteFiles. r=dom-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/ea6628c71b38 Speedup id lookups by using a hash set in BodyDeleteOrphanedFiles. r=dom-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/4cf750c1cab6 Wrap aDirectoryMetadata into Some in BodyDeleteOrphanedFiles. r=dom-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/45bfe3225aa6 Simplify and rename BodyTraverseFiles to BodyTraverseFilesForCleanup. r=dom-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/46fa88f0e39c Avoid duplicate attempt to create the cache directory in BodyStartWriteStream. r=dom-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/5d5166ed00c6 Avoid needless attempt to create the cache directory in BodyFinalizeWrite. r=dom-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/e90ef2f14e05 Avoid needless attempt to create the cache directory in GetBodyDiskSize. r=dom-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/2abcf49fab19 Avoid needless attempt to create the cache directory in BodyOpen. r=dom-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/821ae8e20a8d Avoid needless attempt to create the cache directory in BodyMaybeUpdatePaddingSize. r=dom-storage-reviewers,asuth https://hg.mozilla.org/integration/autoland/rev/4e391b343c34 Invert the default of aCreateDirIfNotExists in BodyIdToFile. r=dom-storage-reviewers,asuth

With latest Nightly:
N=10000
Profile: https://share.firefox.dev/4kCSWqv (4.7s)
So an improvement of 57% !

(In reply to Mayank Bansal from comment #26)

With latest Nightly:
N=10000
Profile: https://share.firefox.dev/4kCSWqv (4.7s)
So an improvement of 57% !

Thanks for checking! There is one long execution of DeleteCacheId in that profile that puzzles me a bit (which was not supposed to be changed by these patches, fwiw). At least from the function names I do not really get why DeleteCacheId should call DeleteAllCacheEntries at all and there I find this interesting comment which hints that there is some known problem there, although from the profile it seems, that the execution of the statement itself is slow, not reading its results?

(In reply to Mayank Bansal from comment #0)

10000

Out of curiosity: The 57% refer to the second measure from comment 0 ?

See Also: → 1952792

Yes, 57% improvement in latest nightly over the second testing in comment #0, which is the time taken to delete N=10000 strings from cache.

Also reiterating : many times clicking on the "delete strings" button in the testcase does nothing. But then you close the tab and cpu/disc use starts.

I think also the first testcase did benefit from these patches, at least in my local testing, but I did not note down any numbers.

(In reply to Mayank Bansal from comment #29)

Also reiterating : many times clicking on the "delete strings" button in the testcase does nothing. But then you close the tab and cpu/disc use starts.

I do not know if this is the expected behavior. Is it?

Flags: needinfo?(bugmail)

Profile where clciking on the "delete cache" works: https://share.firefox.dev/3XFqb2B
Profile where clciking on the "delete cache" DOES NOT work: https://share.firefox.dev/43xXKHx (I clicked maybe 20 times on the button and nothing happened. Then i closed the tab and that resulted in the actual deletion)

(In reply to Jens Stutte [:jstutte] from comment #31)

(In reply to Mayank Bansal from comment #29)

Also reiterating : many times clicking on the "delete strings" button in the testcase does nothing. But then you close the tab and cpu/disc use starts.

I do not know if this is the expected behavior. Is it?

Yes. Any live Cache instances are still usable after deleting the cache and will only be cleaned up when the last Cache instance is released. The code refers to this situation as "orphaned", although it also covers the case where the Cache was deleted and then we had an unclean shutdown.

If IPC markers are enabled in the profiler it can be easier to see what's going on, but because of the runnables we can see in the "works" case that there is a PCache::Msg___delete__ that runs but in the "DOES NOT work" case, no such runnable runs. Note that this counts across all tabs and devtools which latches the Caches. Assuming the retention of the Cache is only occurring because of the cache in storeInCache() then nulling it out (which would require the const to become a let) at the end of the function could help, although we still may end up needing to wait for a CC/snow-white collection.

Flags: needinfo?(bugmail)
See Also: → 1958615, 1958673
Blocks: 1980560
See Also: 1943230
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: