Closed Bug 1814961 Opened 1 year ago Closed 1 year ago

Some pages trying to fetch fonts from eg fonts.gstatic.com never finish loading

Categories

(Core :: Networking, defect, P1)

defect

Tracking

()

RESOLVED FIXED
112 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox110 --- wontfix
firefox111 --- wontfix
firefox112 --- fixed

People

(Reporter: whimboo, Assigned: valentin)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: power, regression, Whiteboard: [necko-triaged])

Attachments

(5 files)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/111.0 ID:20230129213553

With my current profile I have a couple of pages which regularly do not finish loading (see bug 1503259). All of them are trying to load fonts from eg. fonts.gstatic.com and as it looks like the response is never received.

First I thought it is related to the NoScript extension but after removing it the problem was still persistent after a while - probably when certain items from the network cache gets removed.

Here a profile: https://share.firefox.dev/3Y0WV4p

So far I don't have steps for a fresh profile but also haven't had the time yet to strip down my daily profile.

Here some pages where I can see this behavior:

Attached file test_profile.zip

I bisected my profile and that is the minimal state I could strip my profile to actually reproduce this problem.

Here some steps:

  1. Unzip the profile
  2. Start Firefox with the profile by eg. using the --profile command line option
  3. If the first tab (Selenium web page) finishes loading quit Firefox and repeat step 2

Usually it takes me up to 3 restarts of Firefox to replicate the problem.

Gijs, given that it's all related to browser preferences I wonder if this might be related to some weird behavior due to browser cache in use? If yes we probably want to move this bug into another component.

Flags: needinfo?(gijskruitbosch+bugs)
Attached file prefs.js from profile

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #1)

Gijs, given that it's all related to browser preferences I wonder if this might be related to some weird behavior due to browser cache in use? If yes we probably want to move this bug into another component.

I'm not sure why you needinfo'd me as I'm not a networking expert. This still seems like a networking issue, if the request never completes?

It looks like this profile disables "smart" caching and sets the disk cache capacity to 0, meaning nothing is allowed to cache to disk. I expect this breaks "something" that leads to the brokenness, but I'm not sure what. Hopefully the networking team can help.

FWIW, I can reproduce with this profile and repeated mach run invocations on a macOS machine.

Flags: needinfo?(gijskruitbosch+bugs)

Not sure how related bug 1814976 is but it's also google and also "infinite loading spinner", so adding for now and we can remove if it becomes obvious there's no relation.

See Also: → 1814976

(In reply to :Gijs (he/him) from comment #2)

It looks like this profile disables "smart" caching and sets the disk cache capacity to 0, meaning nothing is allowed to cache to disk. I expect this breaks "something" that leads to the brokenness, but I'm not sure what. Hopefully the networking team can help.

FYI I've set those values some years ago when I switched to a SSD drive and to reduce the amount of data that gets written. Not sure how much impact that would still have those days and in my case I might set those values back to the original ones.

Valentin, could you maybe have a look at the attached profile in comment 0? Any idea what could cause the blocking of network requests that eg. trying to load fonts?

Flags: needinfo?(valentin.gosu)

I tried to use mozregression to check if that is actually a regression and it actually is. Given that no autoland builds are available anymore for 78.0a1 builds this is the closest changeset that I was able to get to:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=34d71b4a00863e7615463592662dfe362c4a993e&tochange=385f49adaf00d02fc8e04da1e0031e3477182d67

Thank you for the profile with logs, I think I know what's going on:

In the parent process these are the 3 channels for the Bd1O9uYNmnUQomAgE25imKSbHLSMA6Sw_lSg.woff2 request.
They all seemed to complete successfully - they got onDataAvailable and onStop, but they didn't reach the HttpChannelChild.

(nsHttp) nsHttpChannel::Init [this=359332e00]
{ "start": 175871086.186875, "end": null, "name": "LogMessages", "category": 1, "threadId": null, "data": { "type": "Log", "module": "nsHttp", "name": "nsHttpChannel::Init [this=359332e00]\n" } }

nsHttpChannel::Init [this=359333700]
{ "start": 175871090.045792, "end": null, "name": "LogMessages", "category": 1, "threadId": null, "data": { "type": "Log", "module": "nsHttp", "name": "nsHttpChannel::Init [this=359333700]\n" } }

nsHttpChannel::Init [this=43307dc00]
{ "start": 175871090.8335, "end": null, "name": "LogMessages", "category": 1, "threadId": null, "data": { "type": "Log", "module": "nsHttp", "name": "nsHttpChannel::Init [this=43307dc00]\n" } }

Filtering by this=43307dc00 it seems the problem is likely caused by race cache with network.
We look for an entry in the cache, but it takes too long, so we trigger the network after 20ms. The cache callback eventually comes, so we cancel the network transaction - we look at the cache entry, but it seems it's incomplete, so we hit the network from here. The request eventually completes, but we raced and we expected the response to come from the cache, so we ignore it and never notify the child.

Not sure how common this is, but this kind of intermittent failure is really bad.

Assignee: nobody → valentin.gosu
Blocks: RCWN
Severity: -- → S2
Flags: needinfo?(valentin.gosu)
Priority: -- → P1
Whiteboard: [necko-triaged]

Perhaps this bug is related?
Bug 1027122 - Fonts loaded from cache are missing from DevTools network tab

If a new network request is performed because when the cache wins
(either a revalidation, or because the cache entry is not complete) we need to
clear mRaceCacheWithNetwork so the network response is not ignored.

Depends on D169064

Flags: needinfo?(hskupin)

By using this build I'm no longer able to reproduce the reported issue. Switching back to a Nightly build and restarting Firefox if needed multiple times always shows it. As such the patch fixes the problem for me. Thanks a lot!

Flags: needinfo?(hskupin)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #11)

By using this build I'm no longer able to reproduce the reported issue. Switching back to a Nightly build and restarting Firefox if needed multiple times always shows it. As such the patch fixes the problem for me. Thanks a lot!

Going to assume you didn't mean to clear the regression keyword.

Keywords: regression

(In reply to :Gijs (he/him) from comment #12)

Going to assume you didn't mean to clear the regression keyword.

Indeed. Nevertheless it looks like we cannot easily find the regressor here so not sure if we actually have to keep it as a regression given that it's already quite old.

Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c9c03c715fa0
Test that RCWN network response isn't ignored when triggeered by cache load r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/60b65e604e1c
Clear mRaceCacheWithNetwork when a new request is performed r=necko-reviewers,kershaw

Based on comment #5, this bug contains a bisection range found by mozregression. However, the Regressed by field is still not filled.

:valentin, if possible, could you fill the Regressed by field and investigate this regression?

For more information, please visit auto_nag documentation.

Flags: needinfo?(valentin.gosu)

Probably regressed by bug 1392841

Flags: needinfo?(valentin.gosu)
Regressed by: 1392841
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/85190e2952da
Test that RCWN network response isn't ignored when triggeered by cache load r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/dafb2e6890e1
Clear mRaceCacheWithNetwork when a new request is performed r=necko-reviewers,kershaw
Regressions: 1816466

Set release status flags based on info from the regressing bug 1392841

Previously the network could be triggered by calling
nsHttpChannel::Test_triggerNetwork.
Technically this should have been called after AsyncOpen & BeginConenct
were called. However, is rare instances, if the call to BeginConnect
were delayed and the test ended up calling channel.test_triggerNetwork
then we would try to connect without first setting up mConnectionInfo
in BeginConnect.

This patch changes Test_triggerNetwork so that it doesn't immediately
trigger the network - instead it sets a delay variable similar to
Test_delayCacheEntryOpeningBy. However if Test_triggerNetwork(0) is
called and a mNetworkTriggerTimer exists, it will immediately trigger
the network.

The tests were updated to account for the fact that
channel.test_triggerNetwork now needs to be called before asyncOpen
in order for the delay to be properly registered.

Depends on D169065

Turns out the issue causing the backout was the call to Test_triggerNetwork from the test.
I made a diagram of the RCWN related function calls here You can see that if Test_triggerNetwork(0) gets called before the classifier or proxy resolution are completed, we end up calling ContinueConnect before BeginConnect, which leads to the null connectionInfo dereference.

Flags: needinfo?(valentin.gosu)
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/73c741cb7a5d
Test that RCWN network response isn't ignored when triggeered by cache load r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/fa3b52b91a6f
Clear mRaceCacheWithNetwork when a new request is performed r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/0c16d02bd213
Update RCWN test functions r=necko-reviewers,kershaw
Regressions: 1818653
Regressions: 1818757

Thanks Valentin! I've tested with the latest nightly and it all works great now! I assume we want to uplift to 111 at least, but not sure about 102esr?

Flags: needinfo?(valentin.gosu)
Blocks: power-usage
Keywords: power

I was looking into whether bug 1818653, and bug 1818757, pose any risk for an uplift.

Flags: needinfo?(valentin.gosu)

Comment on attachment 9316313 [details]
Bug 1814961 - Clear mRaceCacheWithNetwork when a new request is performed r=#necko

Beta/Release Uplift Approval Request

  • User impact if declined: Users with slow hardware and fast networks (that is more likely to trigger race-cache-with-network) might have some of their sub-requests hang unexpectedly leading to page loads hanging or bad styling for pages.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This patch fixes the issue by clearing the mRaceCacheWithNetwork flag after we've settled on using a network request, so the subsequent network request works.
    While the fix has been verified on Nightly there is a small chance that the change in behaviour might uncover other issues.
  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9316313 - Flags: approval-mozilla-beta?
Attachment #9316312 - Flags: approval-mozilla-beta?
Attachment #9319433 - Flags: approval-mozilla-beta?

Comment on attachment 9316312 [details]
Bug 1814961 - Test that RCWN network response isn't ignored when triggeered by cache load r=#necko

Rejecting beta uplift request
We are at the end of the 111 beta cycle. This bug was introduced in Fx59, the patch has medium risk, and if we uplift it now it won't have any time to bake in beta.
Please let me know if there are any objections.

Attachment #9316312 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9316313 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9319433 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: