Open Bug 1960704 Opened 8 months ago Updated 2 days ago

High frequency /clear-site-data/clear-cache-partitioning.tentative.https.html | single tracking bug

Categories

(Core :: Privacy: Anti-Tracking, defect, P5)

Tracking Status
firefox-esr140 --- unaffected
firefox146 --- unaffected
firefox147 --- unaffected
firefox148 --- affected

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, intermittent-testcase, regression, Whiteboard: [domsecurity-intermittent])

Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=504091125&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UC9_nhC5R1-j0mkti3pjzQ/runs/0/artifacts/public/logs/live_backing.log


[task 2025-04-15T16:17:59.425Z] 16:17:59     INFO - TEST-PASS | /clear-site-data/clear-cache-partitioning.tentative.https.html | same site data also gets cleared in iframe 
[task 2025-04-15T16:17:59.425Z] 16:17:59     INFO - TEST-PASS | /clear-site-data/clear-cache-partitioning.tentative.https.html | cross origin iframe data doesn't get cleared 
[task 2025-04-15T16:17:59.425Z] 16:17:59     INFO - TEST-UNEXPECTED-FAIL | /clear-site-data/clear-cache-partitioning.tentative.https.html | clear in cross origin iframe doesn't affect embedder - assert_equals: expected "3a277eda-06de-4079-9988-da8ee1af3217" but got "c75ffede-99de-4285-9668-4f15c627a685"
[task 2025-04-15T16:17:59.426Z] 16:17:59     INFO - openTestPageHelper/<@https://web-platform.test:8443/clear-site-data/support/clear-cache-helper.sub.js:87:19
[task 2025-04-15T16:17:59.426Z] 16:17:59     INFO - Test.prototype.step@https://web-platform.test:8443/resources/testharness.js:2642:25
[task 2025-04-15T16:17:59.426Z] 16:17:59     INFO - Test.prototype.step_func/<@https://web-platform.test:8443/resources/testharness.js:2689:35
[task 2025-04-15T16:17:59.433Z] 16:17:59     INFO - .......
[task 2025-04-15T16:17:59.433Z] 16:17:59     INFO - TEST-OK | /clear-site-data/clear-cache-partitioning.tentative.https.html | took 4763ms
Component: DOM: Security → Privacy: Anti-Tracking

recent spike caused by Bug 2002960

Keywords: regression
Regressed by: 2002960
Summary: Intermittent /clear-site-data/clear-cache-partitioning.tentative.https.html | single tracking bug → High frequency /clear-site-data/clear-cache-partitioning.tentative.https.html | single tracking bug

Set release status flags based on info from the regressing bug 2002960

:arai, since you are the author of the regressor, bug 2002960, could you take a look?

For more information, please visit BugBot documentation.

The failure starts happening from part 3 of the patch there.
But as far as I can see, the testcase doesn't use the script cache for anything test-specific.
The script cache applies only to external scripts (not inline scripts), but the things that may affect the assertion exist only in inline scripts.

The test clear-cache-partitioning.tentative.https.html itself has no non-helper external script.
The only test-specific helper, clear-cache-helper.sub.js, meets the file-size requirements, but it's treated as no-cache/expired, and even if it were cached, the script itself doesn't seem to be what the test checks for cached-ness.
The other helpers are used more widely and are unrelated to the assertion.

The openTestPageHelper function opens clear-site-data-cache.py, which returns HTML etc.
The clear-cache-partitioning.tentative.https.html case doesn't use any external script from the py file.

Then, the meta file has several expected failures: clear-cache-partitioning.tentative.https.html.ini

In particular, the following lines cover the same test:
https://searchfox.org/firefox-main/rev/96eccf5af235e2f592e45fda4e79e6194448fc74/testing/web-platform/meta/clear-site-data/clear-cache-partitioning.tentative.https.html.ini#21,29-32

[clear-cache-partitioning.tentative.https.html]
...
  [cross origin iframe data doesn't get cleared]
    expected:
      if (os == "linux") and asan and fission: [PASS, FAIL]
      if (os == "linux") and not asan and not debug: [PASS, FAIL]

The first one might make sense, given that asan is something special, but the second one is suspicious, given that it's regular linux opt.
Maybe the testcase itself is unstable/noisy?

The testcase seems to be specific to the Clear-Site-Data header, which is from bug 1268889.
:baku, can I have your input here? What does the failure basically mean, and how stable/unstable is the test?

Flags: needinfo?(arai.unmht) → needinfo?(amarchesini)

So far, the issue happens with the following setup (A):

(try: https://treeherder.mozilla.org/jobs?repo=try&revision=4b7a184acca15ad2088b07e5e74a012533fe84c4)

And it stops happening with the following in addition to the above (B):

(try: https://treeherder.mozilla.org/jobs?repo=try&revision=e3ac9d9b8127fd339957118871902800381031ff)

https://searchfox.org/firefox-main/rev/bef781bbd7a225c428c2444d7d02e9f6eb327e94/dom/script/ScriptLoader.cpp#4074-4077,4091-4094

nsresult ScriptLoader::OnStreamComplete(
    nsIIncrementalStreamLoader* aLoader, ScriptLoadRequest* aRequest,
    nsresult aChannelStatus, nsresult aSRIStatus,
    SRICheckDataVerifier* aSRIDataVerifier) {
...
    nsCOMPtr<nsICacheInfoChannel> cacheInfo = do_QueryInterface(channelRequest);
    nsCOMPtr<nsICacheEntryWriteHandle> cacheEntry;
    if (cacheInfo && NS_SUCCEEDED(cacheInfo->GetCacheEntryWriteHandle(
                         getter_AddRefs(cacheEntry)))) {

https://searchfox.org/firefox-main/rev/bef781bbd7a225c428c2444d7d02e9f6eb327e94/netwerk/protocol/http/CacheEntryWriteHandleParent.h#23-24,28,33

class CacheEntryWriteHandleParent final : public nsICacheEntryWriteHandle,
                                          public PCacheEntryWriteHandleParent {
...
  explicit CacheEntryWriteHandleParent(nsICacheEntry* aCacheEntry);
...
  nsCOMPtr<nsICacheEntry> mCacheEntry;

So this means the extra reference to the nsICacheEntry (which is a mozilla::net::CacheEntryHandle) causes the issue.

One possible reason I can think of is that the CacheEntryHandle destructor affects the behavior there in some way, but the lifetime of the instance won't change much with the patch.

https://searchfox.org/firefox-main/rev/bef781bbd7a225c428c2444d7d02e9f6eb327e94/netwerk/cache2/CacheEntry.cpp#73-75

CacheEntryHandle::~CacheEntryHandle() {
  mEntry->ReleaseHandleRef();
  Dismiss();
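
To make the lifetime question concrete, here is a minimal standalone sketch of how one extra strong reference can move the point at which a handle's destructor runs. It uses std::shared_ptr rather than Gecko's nsCOMPtr, and FakeEntryHandle, FakeWriteHandle and RunLifetimeDemo are hypothetical names made up for illustration; none of this is the real CacheEntryHandle or CacheEntryWriteHandleParent code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a cache entry handle. Its destructor records
// when it runs, playing the role of the ReleaseHandleRef()/Dismiss() calls.
class FakeEntryHandle {
 public:
  explicit FakeEntryHandle(std::vector<std::string>& aLog) : mLog(aLog) {}
  ~FakeEntryHandle() { mLog.push_back("handle released"); }

 private:
  std::vector<std::string>& mLog;
};

// Hypothetical stand-in for an object that keeps a strong reference to the
// entry in a member, as the mCacheEntry member in the excerpt above does.
struct FakeWriteHandle {
  std::shared_ptr<FakeEntryHandle> mCacheEntry;
};

// Returns the order of events with or without the extra reference.
std::vector<std::string> RunLifetimeDemo(bool aTakeExtraRef) {
  std::vector<std::string> log;
  std::unique_ptr<FakeWriteHandle> writeHandle;
  {
    // Reference held by the "channel" for the duration of the load.
    auto entry = std::make_shared<FakeEntryHandle>(log);
    if (aTakeExtraRef) {
      writeHandle = std::make_unique<FakeWriteHandle>();
      writeHandle->mCacheEntry = entry;
    }
  }  // The channel's reference goes away here.
  if (aTakeExtraRef) {
    // The write handle is now the only owner keeping the entry alive.
    assert(writeHandle->mCacheEntry.use_count() == 1);
  }
  log.push_back("channel done");
  writeHandle.reset();  // Drops the extra reference, if there is one.
  log.push_back("write handle dropped");
  return log;
}
```

Without the extra reference, "handle released" is logged before "channel done"; with it, the release is delayed until the write handle is dropped. Whether a shift like this is what changes the test result is still an open question.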

I'll continue looking into the details.

I don’t have the right answer yet, but I suspect the cleaning procedure is asynchronous, and we might not be waiting long enough for all the data to be deleted before loading the page. I would use Pernosco to record the issue, if possible.

Flags: needinfo?(amarchesini)

Cache clearing is asynchronous. There is code to try to ensure that data from before the clear isn't returned after the clear, but there is a timing hole due to a mainthread/cache thread handoff. See https://bugzilla.mozilla.org/show_bug.cgi?id=1997495

You could try applying that patch and see if the problem goes away, or it may give you some ideas as to the source of the problem
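
As a rough illustration of the kind of timing hole described above, the following is a minimal single-threaded sketch in which the clear is only queued and a lookup can run before the queue is drained. FakeCache, ClearAsync, DrainPending and LoadAfterClear are hypothetical names invented for this sketch; it does not model the real cache2 threading, and it is not a claim about the actual cause of this failure.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache whose clear operation is deferred to a task queue,
// standing in for a handoff to another thread.
class FakeCache {
 public:
  void Put(const std::string& aKey, const std::string& aValue) {
    mEntries[aKey] = aValue;
  }

  // Queues the clear instead of performing it immediately.
  void ClearAsync() {
    mPending.push_back([this] { mEntries.clear(); });
  }

  // Runs every queued task, as the other thread eventually would.
  void DrainPending() {
    while (!mPending.empty()) {
      std::function<void()> task = std::move(mPending.front());
      mPending.pop_front();
      task();
    }
  }

  // Returns the cached value, or an empty string on a miss.
  std::string Lookup(const std::string& aKey) const {
    auto it = mEntries.find(aKey);
    return it == mEntries.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> mEntries;
  std::deque<std::function<void()>> mPending;
};

// Returns what a load observes after a clear has been requested.
std::string LoadAfterClear(bool aWaitForClear) {
  FakeCache cache;
  cache.Put("resource", "old-uuid");
  assert(cache.Lookup("resource") == "old-uuid");
  cache.ClearAsync();
  if (aWaitForClear) {
    cache.DrainPending();
  }
  return cache.Lookup("resource");
}
```

In this sketch, LoadAfterClear(false) still returns the pre-clear value, while LoadAfterClear(true) sees a miss. A fix along these lines has to make the lookup wait for, or be ordered after, the pending clear.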

Flags: needinfo?(arai.unmht)

Thank you both!

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #37)

> Cache clearing is asynchronous. There is code to try to ensure that data from before the clear isn't returned after the clear, but there is a timing hole due to a mainthread/cache thread handoff. See https://bugzilla.mozilla.org/show_bug.cgi?id=1997495
>
> You could try applying that patch and see if the problem goes away, or it may give you some ideas as to the source of the problem

With the patch applied on top of the regressor patches, the failure frequency drops somewhat (~70% to ~50%), but the test still fails intermittently.
https://treeherder.mozilla.org/jobs?repo=try&revision=936486b97a9eba5e608428d5ab4af785c8eeda03

I'll see whether the behavior changes when I minimize the modifications from the regressor.

Flags: needinfo?(arai.unmht)