Closed Bug 1470355 Opened 6 years ago Closed 6 years ago

Firefox is unresponsive when visiting https://www.youtube.com/premium

Categories

(Core :: Networking, defect, P2)

x86_64
All
defect

Tracking

()

VERIFIED FIXED
mozilla63
Tracking Status
firefox62 --- verified
firefox63 --- verified

People

(Reporter: github, Assigned: baku)

References

()

Details

(Keywords: nightly-community, regression, Whiteboard: [necko-triaged])

Attachments

(3 files, 1 obsolete file)

Just tested it with a fresh profile under macOS 10.13.3.

STR:
Open https://www.youtube.com/premium

What happens:
Under macOS the Spinning Rainbow Wheel appears as the page is loaded partly, only a hard kill with the Activity Manager (Task Manager) helps.

As I'm running into some issues with the regression tool it's hard for me to point to a exact regressions. I definitely opened that page some days ago and it worked then (I'm always on the last builds)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) Gecko/20100101 Firefox/63.0
Build ID: 20180626220124

I didn't managed to reproduce this issue on the latest nightly build 63.0a1 (2018-06-26). 

Could you please provide additional steps to the STR? Also, could you provide your hardware information? (model and year) 

And please test if the issue is reproducible in safe mode, here is a link that can help you: https://support.mozilla.org/en-US/kb/troubleshoot-firefox-issues-using-safe-mode
Flags: needinfo?(github)
Unfortunately there's no more STR; I just type in the URL, press enter and wait till Firefox becomes unresponsive.
I've just tested it with 63.0a1 (2018-06-27) and 63.0a1 (2018-06-26) in a fresh profile and with safe mode on; still the same issue. One thing that might would have an effect on this is the Country your IP's is from in combination with the language? Youtube might deliver different pages for different countries. I'm currently in the US with en_US
I also tried to safe the source code with Safari and than open it with Nightly but there I don't have any issue with loading the page.

I've also tried it with 60.0b14 and 62.0b3 and both time I don't run into the issue.
Flags: needinfo?(github) → needinfo?(george.craciun)
I can confirm the issue on macOS 10.13 and Firefox Nightly. Firefox freezes after opening the link from comment #0.
After asking for confirmation in a forum I think the issue is not specific to a language (so far crash for German and English site is confirmed) and Linux is probably also affected but Windows (10) definitely not.
User Agent 	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) Gecko/20100101 Firefox/63.0
Build ID 	20180627222831

Finally I've managed to reproduce this issue on the latest Nightly build. The issue also occurs on Linux.
Status: UNCONFIRMED → NEW
Component: Untriaged → JavaScript Engine
Ever confirmed: true
Product: Firefox → Core
This sounds a little bit similar to bug 1471409.

Is this a perma-hang? If someone here is able to hit it reliably in macOS, can you please then open up the Activity Monitor, choose Firefox in the list of processes, and collect a process sample? (The gear icon near the top of the Activity Monitor should let you do that).

Once a process sample is gathered, can it then be attached to this bug please? That way, hopefully, we'll get some visibility into what the hang is, well, hanging on.
See Also: → 1471409
Here we go, attached the sample of the non-responding process.
    +                                         222 nsBaseAppShell::NativeEventCallback()  (in XUL) + 113  [0x10d277cb1]
    +                                           222 NS_ProcessPendingEvents(nsIThread*, unsigned int)  (in XUL) + 107  [0x10af821bb]
    +                                             222 nsThread::ProcessNextEvent(bool, bool*)  (in XUL) + 1800  [0x10af83d78]
    +                                               222 nsInputStreamReadyEvent::Run()  (in XUL) + 43  [0x10af5de5b]
    +                                                 222 nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*)  (in XUL) + 265  [0x10b01a5c9]
    +                                                   222 nsInputStreamPump::OnStateTransfer()  (in XUL) + 510  [0x10b01aa5e]
    +                                                     222 mozilla::net::nsHttpChannel::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned long long, unsigned int)  (in XUL) + 923  [0x10b32d1cb]
    +                                                       222 mozilla::net::HttpChannelParent::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned long long, unsigned int)  (in XUL) + 352  [0x10b2f9290]
    +                                                         222 NS_ReadInputStreamToString(nsIInputStream*, nsTSubstring<char>&, long long, unsigned long long*)  (in XUL) + 136  [0x10b023c68]
    +                                                           222 NS_ReadInputStreamToBuffer(nsIInputStream*, void**, long long, unsigned long long*)  (in XUL) + 1200  [0x10b023a00]
    +                                                             222 mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&)  (in libmozglue.dylib) + 28  [0x101f1268c]
    +                                                               222 _pthread_cond_wait  (in libsystem_pthread.dylib) + 732  [0x7fff4ffc5662]
    +                                                                 222 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff4fe88cee]

Interesting - we're waiting on a lock somewhere in Necko?

Hey Dragana, any idea what might be going on here?
Flags: needinfo?(dd.mozilla)
There has been a recent change in nsNetUtil.cpp that has changed some locks in the BufferWriter class. Let's ask the author, I think the date the patch is pushed corresponds to this hang.

Andrea can you take a look if this is caused by bug 1469290.
Flags: needinfo?(dd.mozilla) → needinfo?(amarchesini)
> Interesting - we're waiting on a lock somewhere in Necko?

We use the lock only when the inputStream->ReadSegments() returns NS_BASE_STREAM_WOULD_BLOCK. When this happens, we call ::AsyncWait() and the reading will continue on a separate thread.

The question is: why input->ReadSegments() is not able to read data from the stream immediately? We read what has been asked here:
https://searchfox.org/mozilla-central/rev/d2966246905102b36ef5221b0e3cbccf7ea15a86/netwerk/base/nsNetUtil.cpp#1658,1664

Where mCount is what HttpChannelParent::OnDataAvailable receives as argument here:

https://searchfox.org/mozilla-central/rev/d2966246905102b36ef5221b0e3cbccf7ea15a86/netwerk/protocol/http/HttpChannelParent.cpp#1594,1613,1621-1623,1638

So, in theory, that amount of data should be available in the stream.
I could change NS_ReadInputStreamToBuffer() to avoid blocking, but that seems a bad way to hide a bug.
Dragana, does it sound reasonable?

Is there an STR to reproduce this issue?
Flags: needinfo?(amarchesini) → needinfo?(dd.mozilla)
Component: JavaScript Engine → Networking
HttpChannelParent::OnDataAvailable is called from here:
https://dxr.mozilla.org/mozilla-central/source/netwerk/base/nsInputStreamPump.cpp#597

and count is set to the actual available amount of data. So this data should be available.

This code has always worked like this. please do not change NS_ReadInputStreamToBuffer. 

I have try to reproduce it using https://www.youtube.com/premium but I could not.

Maybe log for nsHttp:5,nsSocketTransport:5,nsStreamPump:5 would help
Flags: needinfo?(dd.mozilla)
Cristian, can you try to make a http log:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
use modules timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5.
Was my first time but here the link for the archive with logs: (both links are for the same file)
https://send.firefox.com/download/206e828cfc/#kFqSHTH-pYJGSVSnlE6vwg
https://nofile.io/f/R0JgKMgCGu2/http-logs.zip
Flags: needinfo?(george.craciun)
The stream is coming from the cache. Michal, has something change in the cache recently? Can you take a look? There is 262144 available on the stream, but we end up waiting for data...

(the log is missing cache log :( )
Flags: needinfo?(michal.novotny)
(In reply to Dragana Damjanovic [:dragana] from comment #14)
> The stream is coming from the cache. Michal, has something change in the
> cache recently? Can you take a look? There is 262144 available on the
> stream, but we end up waiting for data...
> 
> (the log is missing cache log :( )

AFAIK, there was no recent change in the cache streams. Can you point me to an exact lines in the log?
Flags: needinfo?(michal.novotny) → needinfo?(dd.mozilla)
(In reply to Dragana Damjanovic [:dragana] from comment #14)
> (the log is missing cache log :( )

I was using the timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5 modules as you said, is there anything else that I would had to include? Or is there something I missed? Was my first time that I was using it.
Please use timestamp,sync,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,cache2:5
Was finally able to upload it somewhere; https://nofile.io/f/apb1wmnZUPW/http-logs-with-cache.tar.bz2

If this log is somehow different than all the other one, please let me know. The unresponsiveness happened at a different stage of the website loading.
(In reply to Andrea Marchesini [:baku] from comment #10)
> > Interesting - we're waiting on a lock somewhere in Necko?
> 
> We use the lock only when the inputStream->ReadSegments() returns
> NS_BASE_STREAM_WOULD_BLOCK. When this happens, we call ::AsyncWait() and the
> reading will continue on a separate thread.

This approach seems to be wrong. What if the stream needs to run an event on this thread to get more data?

> The question is: why input->ReadSegments() is not able to read data from the
> stream immediately? We read what has been asked here:
> https://searchfox.org/mozilla-central/rev/
> d2966246905102b36ef5221b0e3cbccf7ea15a86/netwerk/base/nsNetUtil.cpp#1658,1664
> 
> Where mCount is what HttpChannelParent::OnDataAvailable receives as argument
> here:
> 
> https://searchfox.org/mozilla-central/rev/
> d2966246905102b36ef5221b0e3cbccf7ea15a86/netwerk/protocol/http/
> HttpChannelParent.cpp#1594,1613,1621-1623,1638
> 
> So, in theory, that amount of data should be available in the stream.
> I could change NS_ReadInputStreamToBuffer() to avoid blocking, but that
> seems a bad way to hide a bug.
> Dragana, does it sound reasonable?

The amount is available in the stream. The problem is that BufferWriter::WriteAsync calls ReadSegments (with length == 0) again even if it reads all available data and that CacheFileInputStream::ReadSegments returns NS_BASE_STREAM_WOULD_BLOCK if aCount is 0 and there is no data.
 
> Is there an STR to reproduce this issue?

It happens when there are 2 HTTP channels, one writes the data to the cache entry and the other reads from the entry. The bug seems to occur on chunk boundary which is 256KB.
Flags: needinfo?(dd.mozilla)
Baku, the patch should fix this particular problem, but BufferWriter really shouldn't call ReadSegments with aCount=0. And blocking the thread when NS_BASE_STREAM_WOULD_BLOCK is returned is IMO a potential deadlock.
Flags: needinfo?(amarchesini)
Assignee: nobody → michal.novotny
Priority: -- → P2
Whiteboard: [necko-triaged]
(In reply to Michal Novotny (:michal) from comment #21)
> Baku, the patch should fix this particular problem, but BufferWriter really
> shouldn't call ReadSegments with aCount=0. And blocking the thread when
> NS_BASE_STREAM_WOULD_BLOCK is returned is IMO a potential deadlock.

I'll file a follow up bug and work on it.
Flags: needinfo?(amarchesini)
> > We use the lock only when the inputStream->ReadSegments() returns
> > NS_BASE_STREAM_WOULD_BLOCK. When this happens, we call ::AsyncWait() and the
> > reading will continue on a separate thread.
> 
> This approach seems to be wrong. What if the stream needs to run an event on
> this thread to get more data?

This is definitely an issue in NS_ReadInputStreamToBuffer() when used to read the full content of the stream in a buffer.

Currently NS_ReadInputStreamToBuffer() is used in 2 different scenarios:

1. to read what is available, without blocking and when we are sure that that amount of data data is available in the stream. This the scenario of OnDataAvailable(). No lock should be used.

2. When we want to read the full content of the stream in memory. This definitely has the issue you are describing. We should try to remove all the uses of NS_ReadInputStreamToBuffer() for this scenario. Fortunately, there are no streams that needs to 
dispatch runnables on the current thread, as far as I know.

Probably, a good first step would be to split the 2 scenarios in 2 different methods. This will allow us to add assertions and do a proper audit of how/when NS_ReadInputStreamToBuffer() is used.
Attached patch stream.patch (obsolete) — Splinter Review
Actually, it's probably better to land my patch with yours.
Attachment #8989120 - Flags: review?(michal.novotny)
Attachment #8989112 - Flags: review?(valentin.gosu) → review+
Attachment #8989120 - Flags: review?(michal.novotny) → review+
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/198a4644c09a
Firefox is unresponsive when visiting https://www.youtube.com/premium, r=valentin
https://hg.mozilla.org/integration/mozilla-inbound/rev/50b0bfa64b2c
BufferWriter should stop reading data when the wanted amount has been reached, r=michal
Keywords: checkin-needed
Attached patch stream.patchSplinter Review
Assignee: michal.novotny → amarchesini
Attachment #8989120 - Attachment is obsolete: true
Flags: needinfo?(amarchesini)
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d5e2a4d678f0
BufferWriter should stop reading data when the wanted amount has been reached, r=michal
https://hg.mozilla.org/integration/mozilla-inbound/rev/715cc1bda8a2
Firefox is unresponsive when visiting https://www.youtube.com/premium, r=valentin
https://hg.mozilla.org/mozilla-central/rev/d5e2a4d678f0
https://hg.mozilla.org/mozilla-central/rev/715cc1bda8a2
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Comment on attachment 8989112 [details] [diff] [review]
patch v1 - don't return NS_BASE_STREAM_WOULD_BLOCK from CacheFileInputStream::ReadSegments when aCount is 0

Approval Request Comment
[Feature/Bug causing the regression]: bug 1469290
[User impact if declined]: possible deadlock on a main thread
[Is this code covered by automated tests?]: the code is executed by all tests using disk cache
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: no, it's not easy to trigger the bug
[List of other uplifts needed for the feature/fix]: the patch should land together with the second patch in this bug
[Is the change risky?]: no
[Why is the change risky/not risky?]: simple change
[String changes made/needed]: none
Flags: needinfo?(michal.novotny)
Attachment #8989112 - Flags: approval-mozilla-beta?
Comment on attachment 8989151 [details] [diff] [review]
stream.patch

Approval Request Comment
[Feature/Bug causing the regression]: bug 1469290
[User impact if declined]: possible deadlock on a main thread
[Is this code covered by automated tests?]: the code is executed by all tests using disk cache
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: no, it's not easy to trigger the bug
[List of other uplifts needed for the feature/fix]: the patch should land together with the second patch in this bug
[Is the change risky?]: no
[Why is the change risky/not risky?]: simple change
[String changes made/needed]: none
Attachment #8989151 - Flags: approval-mozilla-beta?
Depends on: 1472945
Blocks: 1469290
Keywords: regression
OS: Mac OS X → All
Patrick, does the problem seem to be fixed for you in the latest Nightly build? I realized you may not be able to reproduce the issue 100% of the time but it sounds like you hit it fairly often, originally.
Flags: needinfo?(github)
Comment on attachment 8989151 [details] [diff] [review]
stream.patch

Fix for a new regression in 62, has some test coverage. Let's try this on beta 7. 

I'd still love to see any verification that the issue is fixed (either in nightly now, or in next week's beta)
Attachment #8989151 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I can confirm that the issue is no longer appearing with 2018-07-06 nightly builds. Website can be loaded and is responsive. (tested with my old profile and a new one)
Flags: needinfo?(github)
Fabulous, thank you!
Comment on attachment 8989112 [details] [diff] [review]
patch v1 - don't return NS_BASE_STREAM_WOULD_BLOCK from CacheFileInputStream::ReadSegments when aCount is 0

Fix for a hang with YouTube, verified in Nightly. Let's uplift for beta 7.
Attachment #8989112 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I have managed to reproduce this issue on Ubuntu using Fx buildID: 20180626220124 and macOS 10.13.6 using Fx buildID: 20180621220127.
The issue is verified as fixed using Fx 63.0b7 buildID: 20180917143811 and Fx 62.0 buildID: 20180830143136 on both Ubuntu 16.04 x64 and macOS 10.13.6 environments. The browser no longer becomes unresponsive when loading the www.youtube.com/premium webpage.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: