Closed Bug 1798450 Opened 3 years ago Closed 2 years ago

YouTube page navigation hangs for 10-20 seconds

Categories

(WebExtensions :: Request Handling, defect)

defect

Tracking

(firefox-esr102 wontfix, firefox107 wontfix, firefox108 wontfix, firefox109 wontfix, firefox110 wontfix, firefox111 wontfix)

RESOLVED INVALID
Tracking Status
firefox-esr102 --- wontfix
firefox107 --- wontfix
firefox108 --- wontfix
firefox109 --- wontfix
firefox110 --- wontfix
firefox111 --- wontfix

People

(Reporter: cpeterson, Unassigned)

References

Details

Starting about a week ago, I noticed YouTube page navigations hanging for 10-20 seconds once or twice a day. Page navigation will hang at the ~70% mark for 10-20 seconds. Navigating on YouTube pages in other tabs will experience the same hang and then they will all resume loading at the same time, as if some centralized bottleneck unblocked all the tabs.

Back in September, I reported a similar problem in the #necko:mozilla.org Matrix channel about page loads (on any site, not just YouTube) hanging in multiple tabs simultaneously. Dragana mentioned that neqo had recently been updated to 0.6.1 (bug 1772092). Those hangs seemed to have disappeared, so I didn't get around to filing a bug. But starting about a week ago, I noticed similar hangs again, but only or mostly on YouTube.

The STR are something like:

  1. Load a YouTube page, such as https://www.youtube.com/user/TheOfficialACM/
  2. Click a link (such the "VIDEOS" link) to navigate to another YouTube page.
  3. You will see a red loading progress bar at the top of the page (drawn in content by YouTube).
  4. About once or twice a day, the page navigation will hang at the ~70% mark for 10-20 seconds. Navigating on YouTube pages in other tabs will experience the same hang and then they will all resume loading at the same time, as if some centralized bottleneck unblocked all the tabs.

I recorded an HTTP log, but the log file is too large (100 MB) to attach to this bug, so I shared it on Dropbox:

https://www.dropbox.com/s/pi7t8gwvva38ji1/log.txt-main.53445.moz_log?dl=0

I had about 20 tabs open, but they were all mostly idle except the YouTube tab. Yet, I see over 30,000 log messages about SocketContext::TimeoutIn. Is that normal?

2022-11-01 06:25:17.376544 UTC - [Parent 53445: Socket Thread]: D/nsSocketTransport   not engaged
2022-11-01 06:25:17.376546 UTC - [Parent 53445: Socket Thread]: D/nsSocketTransport SocketContext::TimeoutIn socket=132b72400, timeout=65535s

Back in September, I reported a similar problem in the #necko:mozilla.org Matrix channel about page loads (on any site, not just YouTube) hanging in multiple tabs simultaneously.

I don't know if it's relevant or coincidental, but around that same time I started seeing hangs, I had upgraded my home internet from 100 Mbps to 700 Mbps. Maybe there is a network race condition or stampede that is more likely on a faster network?

I hit this hang again today. As usual, it started when navigating on YouTube. I then tried opening two new tabs with non-YouTube sites: https://www.cnn.com/ and http://example.com (an insecure HTTP/1.1 site to make sure I wasn't using neqo code). CNN and example.com also hung until all three sites unblocked and finished loading. That these three different sites all hung demonstrates that my hangs are not bug is in neqo HTTP/3 code or a YouTube server bug.

Here's a Dropbox link to an HTTP log including YouTube, CNN, and example.com:

https://www.dropbox.com/s/pi7t8gwvva38ji1/log.txt-main.53445.moz_log?dl=0

I found STR that reliably reproduces this YouTube hang, but the STR is quite complicated and requires installing three different (Mozilla!) add-ons. Hopefully that means few users are affected.

The hang is reproducible back to at least Nightly 88 (February 2021), so I don't know why I only first noticed it a couple weeks ago. I can't bisect earlier than Nightly 88 because the STR requires the "Political and COVID-19 News Study" Rally add-on and the add-on doesn't support Firefox versions <= 87.

Steps to reproduce

  1. Install the "Facebook Container" add-on: https://addons.mozilla.org/en-US/firefox/addon/facebook-container/
  2. Install the "Political and COVID-19 News Study" Rally add-on: https://addons.mozilla.org/en-US/firefox/addon/political-covid-19-news-study/
  3. A new tab about Rally will open.
  4. Click the "Install Rally for Firefox" button.
  5. Click the "Complete Rally Enrollment" button.
  6. Click the "Get Started" button.
  7. Click the "Accept & Enroll" button.
  8. On the "Tell Us About Yourself" page > "Skip for Now" button
  9. A new tab about "Current Studies" will open.
  10. Scroll to the bottom of the page and click the "Political and COVID-19 News" study's "Join Study" button.
  11. Click the "Accept & Enroll" button.
  12. A new tab will open with a Princeton survey. You don't need to complete the survey. You can close the tab.
  13. Load https://www.youtube.com/.
  14. Cmd + click YouTube's "Home" to open a new YouTube tab in the background. Keep clicking and open 10+ YouTube tabs in the background
  15. While the background tabs are still loading, click to load a video in the foreground tab.

Result

The page will partially load and then stall. The video doesn't play and the comments don't load. Loading non-YouTube sites in other tabs will hang, too. Eventually (after 30+ seconds?), the video and comments will usually load.

When the page load hangs, chrome also hangs. Toggling an add-on on and off in about:addons takes ~5 seconds to respond. Quitting Firefox will hang for 5-10 seconds.

I can reproduce this hang with network.http.http3.enable = false, so it's not HTTP/3 specific.

I was able to reliably reproduce the YouTube hang on macOS using those STR, but not on my Windows 11 laptop.

Not sure if it's relevant, but I always hit the assertion below when trying to reproduce this.

Assertion failure: !mCallOnResume (How did that happen?), at /Users/changkershaw/work/gecko/netwerk/protocol/http/HttpBaseChannel.h:1042
#01: mozilla::net::HttpAsyncAborter<mozilla::net::nsHttpChannel>::HandleAsyncAbort()[/Users/changkershaw/work/gecko/objdir/toolkit/library/build/XUL +0x7f34d0]
#02: mozilla::detail::RunnableMethodImpl<mozilla::net::nsHttpChannel*, void (mozilla::net::nsHttpChannel::*)(), true, (mozilla::RunnableKind)0>::Run()[/Users/changkershaw/work/gecko/objdir/toolkit/library/build/XUL +0x84b928]
#03: mozilla::RunnableTask::Run()[/Users/changkershaw/work/gecko/objdir/toolkit/library/build/XUL +0x177d70]
#04: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&)[/Users/changkershaw/work/gecko/objdir/toolkit/library/build/XUL +0x17300c]
#05: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&)[/Users/changkershaw/work/gecko/objdir/toolkit/library/build/XUL +0x171e1c]
#06: mozilla::TaskController::ProcessPendingMTTask(bool)[/Users/changkershaw/work/gecko/objdir/toolkit/library/build/XUL +0x172164]
#07: mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_3>::Run()[/Users/changkershaw/work/gecko/objdir/toolkit/library/build/XUL +0x17bd78]

I can also reproduce this locally and the http log shows the stall is caused by a web extension.
Take the http channel 14a9b0c00 as an example.
It's suspended by WebRequest.jsm at 2022-11-15 09:35:36.014371.

2022-11-15 09:35:36.014371 UTC - [Parent 21587: Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=14a9b0c00]
2022-11-15 09:35:36.014382 UTC - [Parent 21587: Main Thread]: D/nsHttp 14a9b0c00 called from script: resource://gre/modules/WebRequest.jsm:1020:18
2022-11-15 09:35:36.014395 UTC - [Parent 21587: Main Thread]: D/nsHttp Waiting until resume [this=14a9b0c00]

The channel is never resumed until it got cancelled at 2022-11-15 09:36:37.988897.

2022-11-15 09:36:04.692068 UTC - [Parent 21587: Main Thread]: D/nsHttp nsHttpChannel::Cancel [this=14a9b0c00 status=804b0002, reason=nsDocLoader::Stop]
2022-11-15 09:36:04.692070 UTC - [Parent 21587: Main Thread]: D/nsHttp nsHttpChannel::CancelInternal [this=14a9b0c00]
2022-11-15 09:36:37.988897 UTC - [Parent 21587: Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=14a9b0c00]
2022-11-15 09:36:37.988906 UTC - [Parent 21587: Main Thread]: D/nsHttp 14a9b0c00 called from script: resource://gre/modules/WebRequest.jsm:1171:14

The full log can be downloaded here.

I'd like to change the component to WebExtension to get the right people to look at this.

Component: Networking: HTTP → Request Handling
Product: Core → WebExtensions

The severity field is not set for this bug.
:mixedpuppy, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mixedpuppy)

If you can capture a profile, we should be able to see which add-on suspends the channel without resuming it. If that's not enough to find the offending piece of code, we can probably extend the profiler markers to include stacks.

(In reply to Chris Peterson [:cpeterson] from comment #0)

Starting about a week ago, I noticed YouTube page navigations hanging for 10-20 seconds once or twice a day.

I now wonder whether this bug is related to a change in macOS 13.0 (Ventura). I filed this bug on 2022-11-01 and said I noticed these hangs "starting about a week ago", which is suspiciously the week when macOS 13.0 was released (2022-10-24, according to Wikipedia). Also, I never see these hangs on Windows, even though I use Firefox on Windows all day for work and only use Firefox on macOS on my personal laptop.

Also, when the hangs happen, launching other programs also hangs: even running ls in the Terminal app will hang (even though the Terminal app itself is not hanging). When Firefox's tabs stop hanging, the other programs will resume running. Maybe macOS 13.0 changed something about its kernel locking that triggered by os_unfair_lock bug 1670885?

The problem seems worse now than when I filed this bug: even though I uninstalled the three extensions I originally needed in comment 3's STR, I still see these hangs, even on sites other than YouTube.

(In reply to Markus Stange [:mstange] from comment #8)

If you can capture a profile, we should be able to see which add-on suspends the channel without resuming it. If that's not enough to find the offending piece of code, we can probably extend the profiler markers to include stacks.

I'm still trying to catch the hang in the profiler. I forget to restart the profiler when I update Nightly each day.

Hey Chris,
based on comment 9 it seems this bug may have been hit because of some issue on the host system, and so we are not sure this will be actionable from a WebExtensions internals perspective, would it be reasonable to close as incomplete or invalid from your perspective?

Flags: needinfo?(mixedpuppy) → needinfo?(cpeterson)

(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #10)

based on comment 9 it seems this bug may have been hit because of some issue on the host system, and so we are not sure this will be actionable from a WebExtensions internals perspective, would it be reasonable to close as incomplete or invalid from your perspective?

I agree. I'll will close this bug as "INVALID" for now.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(cpeterson)
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.