Closed Bug 1708673 Opened 6 months ago Closed 4 months ago

Dooming cache via asyncVisitAllStorages while resetting networkPredictor leads to inconsistent cache state

Categories

(Core :: Networking: Cache, defect, P2)

defect

Tracking

()

VERIFIED FIXED
91 Branch
Tracking Status
firefox91 --- wontfix
firefox92 --- wontfix
firefox93 --- wontfix
firefox94 --- verified

People

(Reporter: pbz, Assigned: valentin)

References

(Depends on 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(3 files)

Attached file cache-bug-poc.js

For Bug 1705030 I'm adding a deleteByBaseDomain method for the NetworkCacheCleaner. To clear storage by baseDomain I'm iterating over all cache entries via nsICacheStorageService#asyncVisitAllStorages and doom those that match the baseDomain with nsICacheStorage#asyncDoomURI.

When running my code isolated it clears the cache entries properly. However, when I run my code in parallel with nsINetworkPredictor#reset I end up with leftover cache entries in the disk cache. These cache entries may not exist. While they return true for the exists check, they don't have a matching file on disk and have a data size of 0.

I found that all nsICacheStorage#asyncDoomURI calls in my code return NS_OK for the disk cache, still they're not being fully cleared.

I've attached a PoC with which I can reproduce with the following STR:

  1. Visit https://senglehardt.com/test/dfpi/storage_access_api.html.
  2. Clear all cache via Services.cache2.clear().
  3. Reload the page so we get new cache entries only for this tab.
  4. Run PoC in browser toolbox.
  5. Find leftover entries in about:cache disk cache view which seem to have no last access date and no data associated.
Attached image empty-cache-entries.png

Valentin, we recently discussed some of the issues I experienced with the cache service. Do you know if these cache entries are valid and why they don't get cleared?

This is blocking my work on Bug 1705030 so I'd like to find a solution in the near term. If you could give me some leads on where to look I can also try to debug the issue myself.

Thanks!

Flags: needinfo?(valentin.gosu)

Interestingly the steps in comment 0 don't work if I test on my own Nightly profile, but do it I try on a debug build.
As far as I can say it's because the entries are still in the index or in mHandles
I'll give it another look tomorrow - but the cache impl is something we don't have much experience with.

(In reply to Valentin Gosu [:valentin] (he/him) from comment #3)

Interestingly the steps in comment 0 don't work if I test on my own Nightly profile, but do it I try on a debug build.

Maybe the predictor is disabled on your Nightly profile? When I set network.predictor.enabled=false the issue doesn't reproduce anymore. I think some extension (uBlock Origin?) might disable the predictor.

I'm wondering if the issue is caused by how the network predictor clears cache. It iterates over all cache entries similar to my code. However, when it finds a cache entry without aIdEnhance it visits the cache entry to inspect / rewrite its metadata:
AsyncOpenURI: https://searchfox.org/mozilla-central/rev/bf8d5de8528036c09590009720bc172882845b80/netwerk/base/Predictor.cpp#1891
entry->SetMetaDataElement: https://searchfox.org/mozilla-central/rev/bf8d5de8528036c09590009720bc172882845b80/netwerk/base/Predictor.cpp#1788
Possibly opening / writing to the cache entry or holding parts of the entry leads to this inconsistent state.

Thanks for the lead. I still don't know if this is expected behaviour or a bug, but this should help me isolate a test case.

Assignee: nobody → valentin.gosu
Severity: -- → S3
Flags: needinfo?(valentin.gosu)
Priority: -- → P2
Whiteboard: [necko-triaged]

Update:
It's actually the extra call to AsyncOpenURI that's causing the ghost cache entries to show up.

This bug is caused by two cache visitors working in parallel:
Visitor1: calls asyncDoomURI
Visitor2: calls asyncOpenURI

This causes CacheStorageService::AddStorageEntry to create a new entry and add
it to the hashtable:
https://searchfox.org/mozilla-central/rev/308ea44d0d60b391b031ccee695920bd543f7d2f/netwerk/cache2/CacheStorageService.cpp#1566,1574,1576

Visitor3: will then go through the hashtable and see the newly added entry.

Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c7911efe7aad
Do not recreate CacheEntry when OPEN_READONLY flag is passed r=necko-reviewers,dragana
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 91 Branch
Regressions: 1718267

Backed out changeset c7911efe7aad (bug 1708673) on beta for causing perf bug 1718267

Backed out from 93 for causing bug 1718267.
https://hg.mozilla.org/releases/mozilla-beta/rev/65591fce10de05bfd1d6d3d829f8cad140113138

(We should probably back it out in mozilla-central if we don't fix the perf issue in the 94 cycle)

I'll flip the pref for this change on nightly in bug 1718267. If that fixes the perf regression I'll file a different bug for reenabling it.

Depends on: 1731054

The patch landed in nightly and beta is affected.
:valentin, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.
If yes, don't forget to request an uplift for the patches in the regression caused by this fix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(valentin.gosu)
Flags: qe-verify+

I've reproduced this bug using the steps from comment 0, on an affected Nightly build (90.0a1 Build ID 20210430214504).

The last access date and data associated for the test page entries are now displayed in about:cache. Tested with Beta 94.0b2, under macOS 11, Ubuntu 18.04 x64 and Win 11 x64.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
See Also: → 1733958
You need to log in before you can comment on or make changes to this bug.