Problems loading images on youtube.com, wikipedia.org, and some images opened from discord
Categories
(Core :: Networking, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox90 | --- | verified |
People
(Reporter: AwesomeSheep48, Assigned: dragana)
Details
(Whiteboard: [not-a-fission-bug][necko-triaged])
Attachments
(12 files, 1 obsolete file)
Size | Type |
---|---|
1.29 MB | video/quicktime |
854.24 KB | video/mp4 |
2.61 MB | video/mp4 |
375.25 KB | image/png |
194.30 KB | image/png |
6.77 MB | video/quicktime |
1.10 MB | image/png |
979.16 KB | image/png |
3.51 MB | video/quicktime |
8.01 MB | application/x-7z-compressed |
48 bytes | text/x-phabricator-request |
48 bytes | text/x-phabricator-request |
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:89.0) Gecko/20100101 Firefox/89.0
Steps to reproduce:
Open https://github.com/pietervanheijningen/clickbait-remover-for-youtube or https://forum.paradoxplaza.com/forum/forums/stellaris.900/
Actual results:
Pages load for ~3 minutes, then stop
Expected results:
Pages should load properly like when fission is turned off
Reporter | ||
Comment 1•3 years ago
|
||
This only happens after using Firefox for a while; restarting fixes it
Comment 2•3 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::DOM: Navigation' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Reporter | ||
Comment 3•3 years ago
|
||
Reporter | ||
Comment 4•3 years ago
|
||
Comment on attachment 9211831
Screen Recording 2021-03-26 at 11.53.27 AM.mov
It only loads after Cmd+Shift+R
Reporter | ||
Comment 5•3 years ago
|
||
On further testing, it seems that Fission might be unrelated. I'll do some more testing
Reporter | ||
Comment 6•3 years ago
|
||
Comment 7•3 years ago
|
||
I wonder if this is related to http3.
See bug 1695717.
Reporter | ||
Comment 8•3 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #7)
> I wonder if this is related to http3.
> See bug 1695717.
I think it might be
Reporter | ||
Comment 9•3 years ago
|
||
Strike that, something just happened
Reporter | ||
Comment 10•3 years ago
|
||
Reporter | ||
Comment 11•3 years ago
|
||
Opening a file from the Discord app is when I notice it the most
Comment 12•3 years ago
|
||
(In reply to daviswill048 from comment #4)
> It only loads after cmd+shft+r
> Opening a file from the discord app is when I notice it the most
If you have seen this problem on both YouTube and Discord and the problem seems to go away after Cmd+Shift+R, then this bug might be related to sites trying to update their Service Workers while the Service Worker is in use.
When you next reproduce this bug, can you please record a performance profile using the Firefox Profiler? This will include information about Service Workers and what code Firefox is running at the time. Here are instructions for enabling the Firefox Profiler ahead of time:
I recommend changing these profiler settings:
- Click the profiler toolbar button's down arrow.
- Change Settings to "Custom".
- Click "Edit Settings".
- Then search for and check the "DOM Worker" and "IPC Messages" checkboxes.
Comment 13•3 years ago
|
||
William Davis, please see comment 12 for some additional info that we need. Thanks!
Reporter | ||
Comment 14•3 years ago
|
||
Seems to be fixed now. I'll comment and try to send the performance profile if it happens again
Reporter | ||
Comment 15•3 years ago
|
||
And it just happened https://share.firefox.dev/3tl6T1l
Reporter | ||
Comment 16•3 years ago
|
||
Reporter | ||
Comment 17•3 years ago
|
||
Reporter | ||
Comment 18•3 years ago
|
||
Cmd+Shift+R isn't working either: https://share.firefox.dev/3v1NoeT
Reporter | ||
Comment 19•3 years ago
|
||
Reporter | ||
Comment 20•3 years ago
|
||
Reporter | ||
Comment 21•3 years ago
|
||
For images to load, I need to open them in a new tab, then press Cmd+Shift+R
Reporter | ||
Comment 22•3 years ago
|
||
Comment 23•3 years ago
|
||
Randell, please use the profile recording to figure out the cause of this issue. Thanks!
Comment 24•3 years ago
|
||
The profile shows a bunch of images that have been requested, but apparently not received from the server.
I don't see anything fission-related here; forwarding to the necko team to look into why we're not receiving the images. Do we have any info on http3 use in this instance?
William: The necko team may want you to collect network logs; you can turn them on via about:networking.
Reporter | ||
Comment 25•3 years ago
|
||
I am using HTTP3, but it was still happening without it, as seen in comment 10. It isn't happening at the moment, but I'll get network logs if it starts again.
Assignee | ||
Comment 26•3 years ago
|
||
This sounds like bug 1703934.
But it probably is not. The other bug shows up if HTTP3 is enabled, the pref network.dns.use_https_rr_as_altsvc is true, DoH is used, and probably network.http.speculative-parallel-limit is 0.
But you see the bug without HTTP3, so this is not the same.
Can you disable HTTP3, to limit the places where things can go wrong, and make an HTTP log for me:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
The log may contain cookies so please log out of the site to invalidate cookies that are in the log.
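For reference, a minimal sketch of preparing the environment for the HTTP log requested above. It assumes the MOZ_LOG module list from the linked logging page is still current. The helper name and the log path are hypothetical; Firefox itself only reads the MOZ_LOG and MOZ_LOG_FILE environment variables.

```python
import os

# Module list as given on Mozilla's HTTP logging page (assumed current;
# adjust if the documentation has changed).
HTTP_LOG_MODULES = (
    "timestamp,rotate:200,nsHttp:5,cache2:5,"
    "nsSocketTransport:5,nsHostResolver:5"
)


def http_logging_env(log_file, base_env=None):
    """Return a copy of the environment with HTTP logging enabled.

    `log_file` is where Firefox should write the log. This helper is
    hypothetical; only the MOZ_LOG / MOZ_LOG_FILE variables are real.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = HTTP_LOG_MODULES
    env["MOZ_LOG_FILE"] = log_file
    return env


# Example: build a logging environment on top of an empty base environment.
env = http_logging_env("/tmp/firefox-http.log", base_env={})
print(env["MOZ_LOG_FILE"])
```

The returned mapping can be passed as `env=` to `subprocess.Popen` together with the path to the Firefox binary, which depends on the installation.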
Comment 27•3 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #24)
> I don't see anything fission-related here; forwarding to the necko team to look into why we're not receiving the images. Do we have any info on http3 use in this instance?
In that case, I will remove "Fission" from the bug summary.
@ Andrew, the Fission bug triage team wonders if this might be a Service Worker issue. Are there any additional instructions for debugging Service Worker issues that you'd like to add to what I already suggested in comment 12?
Comment 28•3 years ago
|
||
ServiceWorkers could perhaps be involved in some of the cases? But it seems like some of these profiles are for wiktionary.org and wikipedia.org, neither of which seems to serve me a ServiceWorker?
:julienw is currently implementing/driving/landing a bunch of enhancements to how the profiler logs what ServiceWorkers are doing in bug 1567222 which should ideally make it easier to further figure out what's going on from just the profiler.
:julienw, could you quickly take a look at some of the profiles here and help me better understand if they potentially indicate that ServiceWorkers are involved? One meta-issue is that I don't know if the profiler runs are started early enough to catch the relevant events. For example, the profiler run in comment 18 only has a favicon fetch in the network tab.
Comment 29•3 years ago
|
||
I looked especially at the profiler URL in comment 18.
First, note that this URL will select the webextensions thread, but there are more. The parent process is at the top, and the web process at the bottom. There are a lot more network requests in those. The requests in the parent process look odd (lots of unfinished requests, but that may be normal with tracking stuff, OR anti-tracking exposes a bug in our profiler code) but the requests in the isolated web process look about right.
Then I wanted to see if a service worker was involved, so I checked all "DOM Worker" threads from the blue button at the top, and inspected them all. I turned the call tree "implementation" option to "JavaScript" (instead of "All Stacks"), and looked at all of them. And there seems to be one related in the "Isolated Web Process" indeed (see direct link https://share.firefox.dev/3ekLij8), because the name of the file includes "serviceworker".
Another way, maybe easier, is going to the Marker Chart, and looking at the DOMEvents specific to service workers (especially "fetch"). In this case I see no "fetch" DOMEvents, so I'd be cautious. But being a notification-only service worker (from the name of the script) it's probably normal we don't have these events.
Lastly I moved forward again and just curl-ed the script URL, https://www.youtube.com/s/desktop/e84fb691/jsbin/serviceworker-notifications.vflset/serviceworker-notifications.js (obviously that's minified), and I do see occurrences of self.registration.pushManager.getSubscription(). So that seems to be it.
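The check described above (looking in the script source for push APIs versus a fetch handler) can be roughly sketched as follows. This is a text heuristic only, assuming the script source has already been downloaded. The function name and marker lists are hypothetical, and minified code may hide a fetch handler from it.

```python
import re

# Hypothetical marker lists for illustration; not exhaustive.
PUSH_MARKERS = ("pushManager", "showNotification")
FETCH_HANDLER = re.compile(
    r"""addEventListener\(\s*["']fetch["']|\bonfetch\s*="""
)


def classify_service_worker(source):
    """Guess from the script text what a service worker is used for.

    Returns a dict with two booleans: whether push/notification APIs are
    mentioned, and whether a "fetch" event handler appears to be registered.
    """
    uses_push = any(marker in source for marker in PUSH_MARKERS)
    handles_fetch = bool(FETCH_HANDLER.search(source))
    return {"uses_push": uses_push, "handles_fetch": handles_fetch}


# A snippet resembling what was seen in the notifications script.
sample = "self.registration.pushManager.getSubscription()"
print(classify_service_worker(sample))
```

A worker that mentions push APIs but registers no fetch handler would not be expected to intercept image requests, which is consistent with seeing no "fetch" DOMEvents in the Marker Chart.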
To help with analyzing using the profiler, we could:
- have a different name for service worker threads, besides just "DOM Worker". That should be reasonably easy, but I don't know how all this works :-) In the world of Fission the Worker would be in the same process as the content page, so that makes things easier too.
- show in the network requests that they're being handled by a service worker -- that's more on our team.
Hope this helps, please needinfo again if you want me to look at some of the other profiles!
Reporter | ||
Comment 30•3 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #26)
> This sounds like bug 1703934.
> But it probably is not. The other bug shows up if HTTP3 is enabled, the pref network.dns.use_https_rr_as_altsvc is true, DoH is used, and probably network.http.speculative-parallel-limit is 0.
> But you see the bug without HTTP3, so this is not the same.
> Can you disable HTTP3, to limit the places where things can go wrong, and make an HTTP log for me:
> https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
> The log may contain cookies so please log out of the site to invalidate cookies that are in the log.
Reporter | ||
Comment 31•3 years ago
|
||
Assignee | ||
Comment 32•3 years ago
|
||
When TRR is used in mode 3 (and only in mode 3), AsyncResolveNative can fail immediately (synchronously), and we need to make sure that we retry it in the same way as in DnsAndConnectSocket::TransportSetup::OnLookupComplete. If RESOLVE_IP_HINT is set we already retry the lookup, but we do not retry if mRetryWithDifferentIPFamily is set.
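As a toy model of the behaviour described above (not the Gecko patch itself): if the lookup fails synchronously and a retry with the other IP family is allowed, try once more with the other family. All names here are hypothetical stand-ins; the real logic is C++ in DnsAndConnectSocket.

```python
class SyncResolveError(Exception):
    """Hypothetical: a lookup that fails immediately, before any callback."""


def other_family(family):
    """Return the opposite IP family ("ipv4" <-> "ipv6")."""
    return "ipv4" if family == "ipv6" else "ipv6"


def resolve_with_retry(resolve, host, family, retry_with_different_family):
    """Call resolve(host, family); on a synchronous failure, retry once
    with the other IP family if that is allowed."""
    try:
        return resolve(host, family)
    except SyncResolveError:
        if not retry_with_different_family:
            raise
        return resolve(host, other_family(family))


def ipv4_only_resolver(host, family):
    """Fake resolver for the example: only IPv4 lookups succeed."""
    if family != "ipv4":
        raise SyncResolveError(f"no {family} address for {host}")
    return ["192.0.2.1"]


# The IPv6 attempt fails immediately, so the IPv4 retry supplies the address.
addresses = resolve_with_retry(ipv4_only_resolver, "example.test", "ipv6", True)
print(addresses)
```

Without the retry, the first synchronous failure would simply propagate and the connection attempt would stall, which matches the symptom of requests that never complete.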
Assignee | ||
Comment 33•3 years ago
|
||
Assignee | ||
Comment 34•3 years ago
|
||
When TRR is used in mode 3 (and only in mode 3), AsyncResolveNative can fail immediately (synchronously), and we need to make sure that we retry it in the same way as in DnsAndConnectSocket::TransportSetup::OnLookupComplete. If RESOLVE_IP_HINT is set we already retry the lookup, but we do not retry if mRetryWithDifferentIPFamily is set.
Assignee | ||
Comment 35•3 years ago
|
||
Comment 36•3 years ago
|
||
Pushed by ddamjanovic@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/52530d6a6a2b
Make API to start a HTTP test server on a IPv6 address. r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/1d222151411a
Make sure to retry if AsyncResolveNative fails r=necko-reviewers,kershaw
Comment 37•3 years ago
|
||
Backed out 2 changesets (Bug 1700968) for causing failures in test_dns_retry.js CLOSED TREE
Failure log: https://treeherder.mozilla.org/logviewer?job_id=340335221&repo=autoland&lineNumber=1573
Backout: https://hg.mozilla.org/integration/autoland/rev/988608d69fb46e91f5108e4c61fd1e6ad31d5d26
Assignee | ||
Comment 38•3 years ago
|
||
It looks like on Mac a server listening on a local IPv6 address is not open. This patch introduces a server on an IPv6 address; we did not have one before. Locally it works. I will disable the new test on Mac.
Comment 39•3 years ago
|
||
Pushed by ddamjanovic@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/090bf8bb67b2
Make API to start a HTTP test server on a IPv6 address. r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/d693453c66ad
Make sure to retry if AsyncResolveNative fails r=necko-reviewers,kershaw
Comment 40•3 years ago
|
||
Backed out 2 changesets (Bug 1700968) for causing xpcshell failures in test_prefer_address_version_fail_trr3_1
Backout link: https://hg.mozilla.org/integration/autoland/rev/db63fba98dabc8d6af1e09d9d1da486576588460
Push with failures, failure log.
Comment 41•3 years ago
|
||
Pushed by ddamjanovic@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fec6e7e6278f
Make API to start a HTTP test server on a IPv6 address. r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/06495cee5ce2
Make sure to retry if AsyncResolveNative fails r=necko-reviewers,kershaw
Assignee | ||
Comment 42•3 years ago
|
||
Disabling the test on the socket process. This will be fixed in a separate bug, since the socket process project is on hold now.
Comment 43•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/fec6e7e6278f
https://hg.mozilla.org/mozilla-central/rev/06495cee5ce2
Comment 44•3 years ago
|
||
Hello,
I can't seem to reproduce this on Mac OS 11.4 ARM on Fx Nightly (BuildID: 20210325085523).
@WilliamDavis can you please confirm that this issue is fixed for you? Here is a link to the 90.0b5 build
Comment 46•3 years ago
|
||
Marking the bug as verified per reporter.