Closed Bug 1174899 Opened 9 years ago Closed 9 years ago

Facebook timeline stuck on network request, high CPU usage on socket thread

Categories

(Core :: Networking, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla45
Tracking Status
firefox41 --- wontfix
firefox42 + wontfix
firefox43 + verified
firefox44 + verified
firefox45 --- verified
b2g-v2.5 --- fixed

People

(Reporter: bugzilla, Assigned: mcmanus)

References

Details

Attachments

(4 files)

I can reproduce this consistently on my laptop.

STR:
1) Load Facebook
2) Go to somebody's profile.
3) Scroll down their timeline. In my case this person updates daily and frequently posts HTML5 video.
4) Keep scrolling until eventually the infinite scroll seems to be stuck on a network request.

I see a non-main thread that is spinning. I've taken a few different snapshots of its stack:

MSVCR120!memset+0x26
mozglue!free_impl+0x98
xul!nsOutputStreamReadyEvent::Release+0x3a
xul!nsCOMPtr_base::assign_with_AddRef+0x23
xul!nsPipeOutputStream::AsyncWait+0x52
xul!mozilla::net::nsHttpTransaction::WriteSegments+0x308d06
xul!mozilla::net::SpdyStream31::WriteSegments+0x46
xul!mozilla::net::SpdySession31::WriteSegments+0x7e0
xul!mozilla::net::nsHttpConnection::OnSocketReadable+0xcc
xul!mozilla::net::nsHttpConnection::OnInputStreamReady+0x27
xul!nsSocketInputStream::OnSocketReady+0x82
xul!nsSocketTransport::OnSocketReady+0x67
xul!nsSocketTransportService::DoPollIteration+0x102
xul!nsSocketTransportService::Run+0x16f
xul!nsThread::ProcessNextEvent+0x309
xul!NS_ProcessNextEvent+0x1a
xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0xab
xul!MessageLoop::RunHandler+0x20
xul!MessageLoop::Run+0x19
xul!nsThread::ThreadFunc+0x8c
nss3!_PR_NativeRunThread+0x8c
MSVCR120!_getptd_noexit+0x6a
MSVCR120!_beginthreadex+0xb4
MSVCR120!_endthreadex+0x102
KERNEL32!BaseThreadInitThunk+0x24
ntdll!__RtlUserThreadStart+0x2f
ntdll!_RtlUserThreadStart+0x1b

KERNELBASE!TlsGetValue+0x23
nss3!PR_EnterMonitor+0x28
xul!mozilla::net::nsHttpConnectionMgr::EnsureSocketThreadTarget+0x50
xul!mozilla::net::nsHttpConnectionMgr::GetSocketThreadTarget+0xb
xul!mozilla::net::nsHttpTransaction::WriteSegments+0x308cef
xul!mozilla::net::SpdyStream31::WriteSegments+0x46
xul!mozilla::net::SpdySession31::WriteSegments+0x7e0
xul!mozilla::net::nsHttpConnection::OnSocketReadable+0xcc
xul!mozilla::net::nsHttpConnection::OnInputStreamReady+0x27
xul!nsSocketInputStream::OnSocketReady+0x82
xul!nsSocketTransport::OnSocketReady+0x67
xul!nsSocketTransportService::DoPollIteration+0x102
xul!nsSocketTransportService::Run+0x16f
xul!nsThread::ProcessNextEvent+0x309
xul!NS_ProcessNextEvent+0x1a
xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0xab
xul!MessageLoop::RunHandler+0x20
xul!MessageLoop::Run+0x19
xul!nsThread::ThreadFunc+0x8c
nss3!_PR_NativeRunThread+0x8c
MSVCR120!_getptd_noexit+0x6a
MSVCR120!_beginthreadex+0xb4
MSVCR120!_endthreadex+0x102
KERNEL32!BaseThreadInitThunk+0x24
ntdll!__RtlUserThreadStart+0x2f
ntdll!_RtlUserThreadStart+0x1b

xul!nsTHashtable<nsBaseHashtableET<nsCStringHashKey,nsFactoryEntry *> >::s_HashKey
xul!nsComponentManagerImpl::GetServiceByContractID+0x70
xul!nsCOMPtr_base::assign_from_gs_contractid_with_error+0x25
xul!mozilla::net::nsHttpConnectionMgr::EnsureSocketThreadTarget+0x47
xul!mozilla::net::nsHttpConnectionMgr::GetSocketThreadTarget+0xb
xul!mozilla::net::nsHttpTransaction::WriteSegments+0x308cef
xul!mozilla::net::SpdyStream31::WriteSegments+0x46
xul!mozilla::net::SpdySession31::WriteSegments+0x7e0
xul!mozilla::net::nsHttpConnection::OnSocketReadable+0xcc
xul!mozilla::net::nsHttpConnection::OnInputStreamReady+0x27
xul!nsSocketInputStream::OnSocketReady+0x82
xul!nsSocketTransport::OnSocketReady+0x67
xul!nsSocketTransportService::DoPollIteration+0x102
xul!nsSocketTransportService::Run+0x16f
xul!nsThread::ProcessNextEvent+0x309
xul!NS_ProcessNextEvent+0x1a
xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0xab
xul!MessageLoop::RunHandler+0x20
xul!MessageLoop::Run+0x19
xul!nsThread::ThreadFunc+0x8c
nss3!_PR_NativeRunThread+0x8c
MSVCR120!_getptd_noexit+0x6a
MSVCR120!_beginthreadex+0xb4
MSVCR120!_endthreadex+0x102
KERNEL32!BaseThreadInitThunk+0x24
ntdll!__RtlUserThreadStart+0x2f
ntdll!_RtlUserThreadStart+0x1b
Jason, could somebody take a peak at this, please?
Flags: needinfo?(jduell.mcbugs)
Some additional info:

MacBook Pro running Windows 8.1 (Boot Camp)
Nightly built from revision https://hg.mozilla.org/mozilla-central/rev/3c26bef95d54
Fresh profile
I've been reproducing it on non-e10s but I've seen it with e10s as well
Summary: Facebook timeline stuck on network request, high CPU usage → Facebook timeline stuck on network request, high CPU usage on socket thread
I'm hoping nick can take this on..
Flags: needinfo?(hurley)
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #0)
> I can reproduce this consistently on my laptop.
> 
> STR:
> 1) Load Facebook
> 2) Go to somebody's profile.
> 3) Scroll down their timeline. In my case this person updates daily and
> frequently posts HTML5 video.
> 4) Keep scrolling until eventually the infinite scroll seems to be stuck on
> a network request.

Oh, and I point out those details about this Facebook timeline because I had to scroll the timeline back to early March for this bug to trigger.

Also it appears that once this bug is triggered, I can't load anything else.
(In reply to Patrick McManus [:mcmanus] from comment #3)
> I'm hoping nick can take this on..

I won't have access to a windows machine until after Whistler, and my facebook account is non-existent, but modulo those two things, I should be able to look at this :)
Flags: needinfo?(hurley)
daniel maybe?
Flags: needinfo?(daniel)
What Patrick said :)

Do we have any idea how widespread this is?  Trying to prioritize.
Flags: needinfo?(jduell.mcbugs)
No idea. I found this as part of perf team's quality improvement project for content scrolling. Since it doesn't actually hang the entire browser it won't show up in crashstats.
I assume that "scroll the timeline back to early March" means you had to keep scrolling for a while until it triggered? Can you quantify roughly in number of seconds perhaps? Did you scroll down with the scrollbar or using keys? I want to increase my chances of repeating it!

Did you try this on any other operating system or Windows version?
Flags: needinfo?(daniel)
Hey Im able to reproduce this EVERY single day :(

Its on beta and Nightly.

And the issue occurs anytime you press the END key or scroll to the bottom to view "More Stories"  
I accidentally posted here, thinking that was the right bug - https://bugzilla.mozilla.org/show_bug.cgi?id=1073463#c5

FWIW:
This link is what does the loading on older stories on facebook  
"https://www.facebook.com/ajax/pagelet/generic.php/LitestandMoreStoriesPagelet?ajaxpipe=1&ajaxpipe_token=*"
Somehow there is no verification if the content is loaded or not. Or Else its just not pinging the server/stuck in limbo :(


Also not the only one stuck with this issue:  
http://forums.mozillazine.org/viewtopic.php?f=38&t=2953361 
I have no antivirus or stuff that could interfere with connectivity  

Hoping somebody can fix this soon. Its so bad that i've got to use another browser just for facebook.
Sorry forgot to add the important details.  

Issue present on
FF - 42b3 
FF - 44

On Windows 7 and Fedora 22 as well.  

Noticed this first on 41b9. 
And for a change switched to Opera where it was loading flawlessly without any trouble, even after been on idle for a long time. So no internet connectivity issues.
flag this for daniel so he doesn't loose it in the bug deluge
Flags: needinfo?(daniel)
Attached file HTTP logging —
NSPR_LOG_MODULES=timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5
The problem goes away if network.http.spdy.enabled.v3-1 is set to false.
(In reply to bull500 from comment #11)
> Noticed this first on 41b9. 

This is not first on 41b9. I can repro on 41b8 as well.

The pushlog from beta8 and beta9 didn't show any change in netwerk.
OS: Unspecified → All
Hardware: Unspecified → All
(In reply to Ekanan Ketunuti from comment #16)
> (In reply to bull500 from comment #11)
> > Noticed this first on 41b9. 
> 
> This is not first on 41b9. I can repro on 41b8 as well.
> 
> The pushlog from beta8 and beta9 didn't show any change in netwerk.

Ah yeah it's not from ff41b9

I noticed the problem with that version. I tested out a few versions till 38(IIRC) and strangely it was happening there as well. 

Thank you for the temporary workaround!
Hmmmm, regression window in comment 17 was bogus. I can reproduce in 40.0.3.

Sorry for the noise here. :(
Keywords: regression
I have a toy Windows Vista machine (running 32bit Firefox) here that I just tried Firefox 41 and Nightly 44 on. I also ran the same test on my Debian Linux machine (linux kernel 4.1.6) with Nightly 44 and IceWeasel 38. This also brings a mix of e10s and non-e10s I believe.

Both 38 and 41 did show SPDY used using the SPDY indicator add-on. The 44 has a add-on bug that prevents the indication to show but I have no reason to doubt SPDY was used (bug 1041139).

I'm on a 100/100 mbit connection here with an average ping time of 39 ms to www.facebook.com. Both my test machines used regular wired Ethernet. IPv4 only.

On all these 4 browser versions I logged on to my Facebook account, I pressed "page down" and then kept it pressed for _at least_ 20 seconds causing it to scroll at a fairly high rate. I then released the key and tried to see any side effects. I watched browser behavior and I checked process CPU usage on the machines.

In none of these experiments could I detect anything strange occurring.

I'm not saying there isn't a problem here, but there seems to be some additional trigger required to really make this show. Or perhaps I've misunderstood and that's also why I've detailed my tests here so that we can work it out.
Flags: needinfo?(daniel)
(In reply to Daniel Stenberg [:bagder] from comment #20)
> I have a toy Windows Vista machine (running 32bit Firefox) here that I just
> tried Firefox 41 and Nightly 44 on. I also ran the same test on my Debian
> Linux machine (linux kernel 4.1.6) with Nightly 44 and IceWeasel 38. This
> also brings a mix of e10s and non-e10s I believe.
> 
> Both 38 and 41 did show SPDY used using the SPDY indicator add-on. The 44
> has a add-on bug that prevents the indication to show but I have no reason
> to doubt SPDY was used (bug 1041139).
> 
> I'm on a 100/100 mbit connection here with an average ping time of 39 ms to
> www.facebook.com. Both my test machines used regular wired Ethernet. IPv4
> only.
> 
> On all these 4 browser versions I logged on to my Facebook account, I
> pressed "page down" and then kept it pressed for _at least_ 20 seconds
> causing it to scroll at a fairly high rate. I then released the key and
> tried to see any side effects. I watched browser behavior and I checked
> process CPU usage on the machines.
> 
> In none of these experiments could I detect anything strange occurring.
> 
> I'm not saying there isn't a problem here, but there seems to be some
> additional trigger required to really make this show. Or perhaps I've
> misunderstood and that's also why I've detailed my tests here so that we can
> work it out.

I think its because you're having a super-fast connection that content loading is happening without troubles.  

Here from my ISP; ping to facebook results in:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 305ms, Maximum = 324ms, Average = 310ms

This is on a windows 64bit FF42b4 x64 bit.
With the option :Ekanan mentioned as a workaround in #c14 facebook "more stories" loading works.  

IIRC FF added v3.1 SPDY in FF27

I can reproduce this on both my OS's and multiple versions of Firefox.  
'End' key or mouse scroll to bottom (either instantly after  loading facebook or letting it idle for some time after the facebook tab has been opened) results in this  

Also just came across this regarding facebook - http://www.engadget.com/2015/10/06/facebook-news-feed-speed-depends-on-connection/
Blocks: 1213469
Sorry to buzz everyone here, but it seems like this needs some serious attention. I dont think anybody knows how many people this has been affecting and for how long.  

This popped up on reddit - https://www.reddit.com/r/firefox/comments/3o95rk/facebook_dont_load_good_in_firefox/
[Tracking Requested - why for this release]:
This can cause necko to get stuck on a network request and consume 100% CPU in the process. Often seen when scrolling the Facebook News Feed.
Tracking as it is an important issue on one of the major website.
However, I haven't been able to reproduce it on a Debian with Fx 42b5.
We need STR here to make proress.

Aaron, do you have any addons running when this happens?  Are you able to repro fairly consistently?  If so could you try w/o any addons and see if it still happens?
Flags: needinfo?(aklotz)
I'll have to get back to you on this; I used to be able to reproduce this consistently on a particular friend's profile feed, but they have since unfriended me and so the content that was being served to my browser during the repro is no longer available.
Flags: needinfo?(aklotz)
(In reply to Jason Duell [:jduell] (needinfo? me) from comment #25)
> We need STR here to make proress.
> 
> Aaron, do you have any addons running when this happens?  Are you able to
> repro fairly consistently?  If so could you try w/o any addons and see if it
> still happens?

I could possibly help out on this bug.  
I can reproduce the problem consistently.  

I'm not sure what STR stands for but if there's any guide out there to help the debugging process on Fedora, i can contribute.
(In reply to bull500 from comment #27)
> (In reply to Jason Duell [:jduell] (needinfo? me) from comment #25)
> > We need STR here to make proress.
> > 
> > Aaron, do you have any addons running when this happens?  Are you able to
> > repro fairly consistently?  If so could you try w/o any addons and see if it
> > still happens?
> 
> I could possibly help out on this bug.  
> I can reproduce the problem consistently.  
> 
> I'm not sure what STR stands for but if there's any guide out there to help
> the debugging process on Fedora, i can contribute.

STR stands for steps to reproduce. What is needed here is a webpage or test case where we can reliably reproduce the bug.
(In reply to Valentin Gosu [:valentin] from comment #28)
> (In reply to bull500 from comment #27)
> > (In reply to Jason Duell [:jduell] (needinfo? me) from comment #25)
> > > We need STR here to make proress.
> > > 
> > > Aaron, do you have any addons running when this happens?  Are you able to
> > > repro fairly consistently?  If so could you try w/o any addons and see if it
> > > still happens?
> > 
> > I could possibly help out on this bug.  
> > I can reproduce the problem consistently.  
> > 
> > I'm not sure what STR stands for but if there's any guide out there to help
> > the debugging process on Fedora, i can contribute.
> 
> STR stands for steps to reproduce. What is needed here is a webpage or test
> case where we can reliably reproduce the bug.

Alright what i can provide from my experience is that:

1. It does happen in a new profile 
2. Happens without any add-ons installed as well. 
3. It happens for me consistently on the 'News Feed' when i try to scroll down either using the mouse scroll wheel or by pressing the "End" key. Hitting the "END" key multiple times at the bottom of the 'News Feed' when the content box(https://bug1073463.bmoattachments.org/attachment.cgi?id=8665628) appears in a loading state sometimes causes the stall.
4. Probably another possible way to reproduce this error is to leave the news feed in the middle of some story and let it idle-away for some time. After doing that scrolling or "END" key should reproduce the issue. 
Opera somehow keeps loading content(more older stories) even if the 'News Feed'(or Facebook Tab) hasn't been used or idled for a long time.
5. Also sometimes doesn't load the chat area properly leaving it blank(much rare than the News feed problem for me though)


I think its really dependent on the user's connection speed to reproduce this error.
PING facebook.com (31.13.77.6) 56(84) bytes of data.
64 bytes from edge-star-shv-01-sjc2.facebook.com (31.13.77.6): icmp_seq=1 ttl=80 time=293 ms
64 bytes from edge-star-shv-01-sjc2.facebook.com (31.13.77.6): icmp_seq=2 ttl=80 time=293 ms
64 bytes from edge-star-shv-01-sjc2.facebook.com (31.13.77.6): icmp_seq=3 ttl=80 time=290 ms
64 bytes from edge-star-shv-01-sjc2.facebook.com (31.13.77.6): icmp_seq=4 ttl=80 time=293 ms
64 bytes from edge-star-shv-01-sjc2.facebook.com (31.13.77.6): icmp_seq=5 ttl=80 time=297 ms
64 bytes from edge-star-shv-01-sjc2.facebook.com (31.13.77.6): icmp_seq=6 ttl=80 time=289 ms
64 bytes from edge-star-shv-01-sjc2.facebook.com (31.13.77.6): icmp_seq=7 ttl=80 time=293 ms
64 bytes from edge-star-shv-01-sjc2.facebook.com (31.13.77.6): icmp_seq=8 ttl=80 time=291 ms
^C
--- facebook.com ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 7007ms
rtt min/avg/max/mdev = 289.868/292.922/297.583/2.320 ms


The ping data is a bit faster today; mostly on my connection speed it averages to 320-330 ms

The workaround works well though - https://bugzilla.mozilla.org/show_bug.cgi?id=1174899#c14
Some days ago, when I scroll down and the news feed appears blank I went to Configuration -> Developer (I use Nightly 44) and clicked in the Network Tab and I noticed that files with .js extension had a ping +70000ms! (Sorry if my english isn't the properly). The next time it happens, I'll take a screen caption (I tried to export that info but I didn't find where to). Tell me if you need more info about this
Attached image Fkb20151019.jpg —
I see this (or something similar) pretty much every day if I leave Facebook as an open tab.  Eventually my entire browser won't load pages, but starts working if I close the Facebook tab.  I'm on 41.0.2, Mac, El Capitan
We are going to ship 42 with this issue. It would be great to have a potential fix for 43.
We don't seem to have consistent steps to reproduce this, or anyone working on it. Vladan, do you think I should go ahead and wontfix this for 43? Can you help find someone to take a closer look?   We may be able to ask QE for help here too.
Flags: needinfo?(vladan.bugzilla)
If there some tool or software to replicate a slow connection on a desktop probably someone can experience the issue like it affects other 

For reference my connection download speeds can vary from 10KB/s to 65KB/s
If any of the 3 reporters here can still reproduce, can you try this change and let me know if it changes anything?

goto about:config and set network.http.spdy.pull-allowance to 72000

thanks!
Flags: needinfo?(bullionareboy)
Flags: needinfo?(ananuti)
Flags: needinfo?(aklotz)
Unfortunately I am no longer able to reproduce.
Flags: needinfo?(aklotz)
(In reply to Patrick McManus [:mcmanus] from comment #37)
> If any of the 3 reporters here can still reproduce, can you try this change
> and let me know if it changes anything?
> 
> goto about:config and set network.http.spdy.pull-allowance to 72000
> 
> thanks!

the issue still occur for me, changing network.http.spdy.pull-allowance to 72000 doesn't help.
Flags: needinfo?(ananuti)
(In reply to Patrick McManus [:mcmanus] from comment #37)
> If any of the 3 reporters here can still reproduce, can you try this change
> and let me know if it changes anything?
> 
> goto about:config and set network.http.spdy.pull-allowance to 72000
> 
> thanks!

Sad to say it doesn't change anything.  
The same problem happens.
Flags: needinfo?(bullionareboy)
I think I've found a likely problem, thanks to spending some severe quality time with the log provided in comment 13 by ekanan. Thanks Ekanan.

Note, I still can't reproduce (even when using 400ms of latency), so we're going to have to make a test build to confirm this is the issue causing your problem.

at time 4:01:38.773 in the log (line 220607) we see a 10 byte data frame with a fin flag set for stream 0x21. Stream 0x21 was previously canceled and reset by the client. This is a pretty normal dynamic - lots of resources get canceled as you scroll down the page and we don't need them anymore. When you have a high latency its normal for some data packets to continue to arrive after sending the cancel because of the propogation delay involved. We just ignore them and that's what happens in the log.

the next packet arrives at time 4:01:39.438 (line 225082) and it is a window update for stream 0x23. Stream 0x23 has been opened by firefox, has not been canceled, and this is the first packet we've seen addressed to it. This should be fine. but line 225095 says "SetRecvdFin id=0x23" - which is totally bizarre as window_update can't even carry a fin flag.

A little bit later we receive syn_reply and some data frames for 0x23 and we complain "Data arrived for already server closed stream" in the log and proceed to ignore the data.. we also generate a rst for 0x23. fairly soon after we get a data frame with a fin bit on it set for 0x23 (perhaps in response to our 0x23), but because we have declared that stream closed by the server already we ignore the new server close flag. During all of this, we don't ever notify the channel of what's going on - so it just hangs there.

And that's probably why the page hangs. You can see the "already server closed stream" message for 0x17 and 0x1d as well in the attached log. I haven't been able to reproduce.

This happens pretty obviously in SpdySession31 line 545.. when transitioning state back to reading a new frame, it only clears the fin flag when it is already set and when there is a valid pointer to the stream it was set in. (That's because this is the code where it lets the channel know that the fin was found). the flag is not cleared because it arrived on stream 0x21 which doesn't have a valid pointer any more (it was canceled) - so it doesn't need to be processed. So the flag is left over for 0x23 to get hit with incorrectly. Hilarity ensues.

next steps: prepare a patch and a build for confirmation, and determine whether or not this is a problem in h2 as well.
so this won't impact h2 because of how flag processing happens on new data frames in h2.. the fin flag is basically checked for every packet, not just the few it is defined for as in spdy. If it is not allowed to be set in that type, it is explicitly cleared. (2545 of Http2Session.cpp)

however, it has the same problem that it doesn't clear it in ::resetdownstreamstate() - it just doesn't matter. The patch should fix that for completeness.
actually, I have been able to reproduce the pattern in comment 41 a couple of times - it just has been against pretty unimportant streams (e.g. pictures) that wouldn't result in hanging the page. With my fix I cannot find any more "already server closed stream" logs. So I feel pretty good about that.
can you try a build from https://archive.mozilla.org/pub/firefox/try-builds/mcmanus@ducksong.com-39eaac63f118bbcd6fa421e9c3210b5463db6d30/ (will take about an hour+ from now to be complete) and retest? Thanks! (that's the comment 44 try run)

Thanks for your help!
Flags: needinfo?(bullionareboy)
Flags: needinfo?(ananuti)
(In reply to Patrick McManus [:mcmanus] from comment #45)
> can you try a build from
> https://archive.mozilla.org/pub/firefox/try-builds/mcmanus@ducksong.com-
> 39eaac63f118bbcd6fa421e9c3210b5463db6d30/ (will take about an hour+ from now
> to be complete) and retest? Thanks! (that's the comment 44 try run)
> 
> Thanks for your help!


I played around with it and haven't encountered the hang.

That fixes the hang I was seeing.
Flags: needinfo?(ananuti)
Attachment #8690619 - Flags: review?(daniel)
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Comment on attachment 8690619 [details] [diff] [review]
fixup log format strings for spdy/h2

Review of attachment 8690619 [details] [diff] [review]:
-----------------------------------------------------------------

this is a ride along patch - it doesn't fix an operational problem but cleans up some logging problems that became evident when I was reading the attached log closely. It won't need to be backported.
(In reply to Patrick McManus [:mcmanus] from comment #45)
> can you try a build from
> https://archive.mozilla.org/pub/firefox/try-builds/mcmanus@ducksong.com-
> 39eaac63f118bbcd6fa421e9c3210b5463db6d30/ (will take about an hour+ from now
> to be complete) and retest? Thanks! (that's the comment 44 try run)
> 
> Thanks for your help!

Thank you SO much for working on this and also everyone who contributed! :D

The fix seems to be working.  
Its loading all those old posts on the news feed pretty well.  

Only issue i noticed was the picture posts(which Facebook is full of) the images tend to load slower - but nothing of a deal breaker.
Also not sure if its because of connection speed or not.  

Another thing i noticed in the terminal was a bunch of these:
"I/SampleTable( 3529): There are reordered frames present."  

Another time:
"I/SampleTable( 4051): There are reordered frames present."
Flags: needinfo?(bullionareboy)
Comment on attachment 8690620 [details] [diff] [review]
discarded spdy data with fin attributed to wrong stream

Review of attachment 8690620 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/Http2Session.cpp
@@ +593,5 @@
>      LOG3(("  SetRecvdFin id=0x%x\n", mInputFrameDataStream->StreamID()));
>      mInputFrameDataStream->SetRecvdFin(true);
>      MaybeDecrementConcurrent(mInputFrameDataStream);
>    }
> +  mInputFrameFinal = false;

Possibly set the 'mInputFrameFinal' field only once? Now it gets assigned within the conditional block just before this line and here once again? No biggie though. (The spdy version has the same thing.)
Attachment #8690620 - Flags: review?(daniel) → review+
Comment on attachment 8690619 [details] [diff] [review]
fixup log format strings for spdy/h2

Review of attachment 8690619 [details] [diff] [review]:
-----------------------------------------------------------------

Nice fix!
Attachment #8690619 - Flags: review?(daniel) → review+
Thank you everybody!
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #35)
> We don't seem to have consistent steps to reproduce this, or anyone working
> on it. Vladan, do you think I should go ahead and wontfix this for 43? Can
> you help find someone to take a closer look?   We may be able to ask QE for
> help here too.

Looks like this has been fixed :) I'll let Patrick decide on uplifting this fix
Flags: needinfo?(vladan.bugzilla)
https://hg.mozilla.org/mozilla-central/rev/29bb91687133
https://hg.mozilla.org/mozilla-central/rev/4ec0c590ef66
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Patrick what do you think about uplifting this? It's a little bit late for beta (it would be beta 7) but we could potentially take this for aurora 44.
Flags: needinfo?(mcmanus)
Comment on attachment 8690620 [details] [diff] [review]
discarded spdy data with fin attributed to wrong stream

Approval Request Comment
[Feature/regressing bug #]: original spdy bug (ff13?)
[User impact if declined]: uncommon hangs on spdy websites - notably facebook
[Describe test coverage new/current, TreeHerder]: just regression tests - this is a race condition never fully reproduced
[Risks and why]: very low risk - well understood logic bug with 2 line fix. confirmed fixed by reporter
[String/UUID change made/needed]: none

this is a close call for beta, but I would accept it based on it being a very simple patch and the bug being on a high profile use case (though it doesn't seem to impact a huge number of people.)
Flags: needinfo?(mcmanus)
Attachment #8690620 - Flags: approval-mozilla-beta?
Attachment #8690620 - Flags: approval-mozilla-aurora?
Comment on attachment 8690620 [details] [diff] [review]
discarded spdy data with fin attributed to wrong stream

Fix for FB regression, not a lot of testing though. 
Let's uplift it, but please be ready to back this out of beta if it causes any issues.
Attachment #8690620 - Flags: approval-mozilla-beta?
Attachment #8690620 - Flags: approval-mozilla-beta+
Attachment #8690620 - Flags: approval-mozilla-aurora?
Attachment #8690620 - Flags: approval-mozilla-aurora+
Verified fixed on Nightly 45.0a1 20151126030226.
Status: RESOLVED → VERIFIED
Verified fixed on:
  * Aurora 44.0a2 20151127004007
  * Beta 43b7
will this make it to release?  
A bunch of people popped up with facebook related issues - https://www.reddit.com/r/firefox/comments/3uaqdm/facebook_not_loading_properly_on_firefox/
it will be in firefox 43 (this is not a regression in 42 - this is a long time bug)
Isue still present in FF 44.0.2

I realy think of not using firefox anymore. I can t use firefox anymore on facebook. It keeps crashing. Firefox will stop opening any new sites. The cpu will stil  be loaded after I close firefox, the only way is to kill it. Sometimes it takes hours to crash, sometimes it crashes imedietly.

GET https://facebook.com/ [HTTP/2.0 301 Moved Permanently 0ms]
GET https://www.facebook.com/ [HTTP/1.1 200 OK 2266ms]
GET https://static.xx.fbcdn.net/rsrc.php/v2/ym/r/qmjjBAJAWyH.css [HTTP/2.0 200 OK 0ms]
GET https://static.xx.fbcdn.net/rsrc.php/v2/y6/r/fktoXMiJbjU.css [HTTP/2.0 200 OK 0ms]
.........................
GET https://scontent.fomr1-1.fna.fbcdn.net/hprofile-xaf1/v/t1.0-1/c89.171.483.483/s32x32/420706_593076087384418_440627936_n.jpg [HTTP/2.0 200 OK 0ms]
GET https://www.facebook.com/ajax/pagelet/generic.php/LitestandMoreStoriesPagelet 
GET XHR https://www.facebook.com/ajax/pagelet/generic.php/WebEgoPane 
GET https://6-edge-chat.facebook.com/ping 
GET XHR https://6-edge-chat.facebook.com/p 
GET https://6-edge-chat.facebook.com/ping 
.......................

Pinging facebook.com [66.220.156.68] with 32 bytes of data:
Reply from 66.220.156.68: bytes=32 time=138ms TTL=79
Reply from 66.220.156.68: bytes=32 time=134ms TTL=79
Reply from 66.220.156.68: bytes=32 time=134ms TTL=79
Reply from 66.220.156.68: bytes=32 time=134ms TTL=79

Ping statistics for 66.220.156.68:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 134ms, Maximum = 138ms, Average = 135ms

my system i5, win10, 32 bit, 8 Gb of installed RAM, Nvdia video driver. SSD. System cache disabled. 
this problem persist with the old profile, with new (synced)profile, with firefox cache enabled or disabled. with addon enabled, or even in safe mode. When I use a new profile, with no addons and no sync, it takes some time to crash, but eventualy it will. No auto scroll, no hardware acceleration, DRM content disabled. Tried on FF 44.0.2. I have also installed last beta 45.0b2 it didn't crash for some hours, after the first crash than every few minutes. I have noticed that in the GMT night hours it crashes more often than in day hours. It dosent matter if you have other open tabs or not. Video preview disabled in FB setings.

This issue with no loading any sites more is caused also if u press a show comments in a big fan page
...........
GET https://scontent.fomr1-1.fna.fbcdn.net/hprofile-frc3/v/t1.0-1/c5.0.32.32/p32x32/10397828_719967624705134_8049024498570832761_n.jpg [HTTP/1.1 200 OK 0ms]
GET https://scontent.fomr1-1.fna.fbcdn.net/hprofile-xfl1/v/t1.0-1/p32x32/12733969_987417507995979_1352597373467864057_n.jpg [HTTP/1.1 200 OK 0ms]
GET https://5-edge-chat.facebook.com/ping 
GET https://static.xx.fbcdn.net/rsrc.php/v2iT3C3/yw/l/en_US/Ix5NaGVgTxU.js [HTTP/2.0 200 OK 0ms]
GET https://static.xx.fbcdn.net/rsrc.php/v2/yc/r/trX7-QCvzUI.js [HTTP/2.0 200 OK 0ms]
GET https://static.xx.fbcdn.net/rsrc.php/v2/yK/r/Z5t0jQnIhbz.js [HTTP/2.0 200 OK 0ms]
GET https://static.xx.fbcdn.net/rsrc.php/v2/yT/r/cw1s286w_qV.js [HTTP/2.0 200 OK 0ms]
GET XHR https://5-edge-chat.facebook.com/p 
POST XHR https://www.facebook.com/ajax/bz 
POST XHR https://www.facebook.com/ajax/ufi/comment_fetch.php 
POST XHR https://www.facebook.com/ajax/bz 
GET https://5-edge-chat.facebook.com/ping 
GET XHR https://www.facebook.com/ajax/hovercard/page.php 
GET XHR https://5-edge-chat.facebook.com/p 
POST XHR https://www.facebook.com/ajax/bz 
POST XHR https://www.facebook.com/ajax/bz 
POST XHR https://www.facebook.com/ajax/bz 
POST XHR https://www.facebook.com/ajax/like/tooltip.php 
POST XHR 
https://www.facebook.com/ajax/ufi/comment_fetch.php 
POST XHR https://www.facebook.com/ajax/bz 
GET https://6-edge-chat.facebook.com/ping 
..........

if you try to open any page than you will have
GET anysite
and nothing more
I think you issue is bug 1247205.

This one only fixes spdy not http2.

Bug 1247205 is currently only on nightly. We will really appreciate if you can try Nightly.
Thanks!
Flags: needinfo?(caius.b)
Flags: needinfo?(caius.b)
my problem is exactly as described in the first post here. Infinit loop on facebook, now internet conection to firefox, high cpu usage after i close firefox, it will stop enetualy with a crash report, or by killing the process.

I have tried Nightly 47.0a1. The problem is gone, I have managed to brows on FB and other sites for some hours.

I have set "false" on network.http.spdy.enabled.http2 in Firefox 44.0.2 and the issue is gone. I have managed to use the browser for some hours, sometimes if I have much to load in FB, it stays some second to load, but it dosent crash anynmore.

GET https://scontent.fomr1-1.fna.fbcdn.net/hprofile-xap1/v/t1.0-1/p32x32/12644812_952055768212952_3996615422048396562_n.jpg [HTTP/1.1 200 OK 0ms]
GET XHR https://www.facebook.com/ajax/pagelet/generic.php/PagePostsSectionPagelet [HTTP/1.1 200 OK 1088ms]
GET https://scontent.fomr1-1.fna.fbcdn.net/hphotos-xpa1/v/t1.0-0/s480x480/12744496_1741292139434209_2271909222965273822_n.jpg [HTTP/1.1 200 OK 0ms]

or 

GET https://scontent.fomr1-1.fna.fbcdn.net/hprofile-xlt1/v/t1.0-1/c48.48.597.597/s32x32/486909_2410135588293_30522432_n.jpg [HTTP/1.1 200 OK 8ms]
GET https://video.fomr1-1.fna.fbcdn.net/hvideo-xtp1/v/t42.1790-2/12758995_1673029769602468_1587326738_n.mp4 [HTTP/1.1 206 Partial Content 457ms]
GET https://scontent.fomr1-1.fna.fbcdn.net/hvthumb-xpt1/v/t15.0-10/p235x350/12419805_1722193738016839_1991292432_n.jpg [HTTP/2.0 200 OK 0ms]

or

GET https://scontent.fomr1-1.fna.fbcdn.net/hphotos-xpa1/t39.1997-6/p160x160/12056965_787824108010008_1642362681_n.png [HTTP/1.1 200 OK 40ms]
GET https://video.fomr1-1.fna.fbcdn.net/hvideo-xtf1/v/t42.1790-2/12692829_964147603676225_1548757168_n.mp4 [HTTP/1.1 206 Partial Content 377ms]
GET https://scontent.fomr1-1.fna.fbcdn.net/hprofile-xfl1/v/t1.0-1/p32x32/1934329_150437848664744_4445011267210325326_n.jpg [HTTP/2.0 200 OK 0ms]

GET https://scontent.fomr1-1.fna.fbcdn.net/hphotos-xaf1/t39.1997-6/p160x160/851586_126361977548599_392107290_n.png [HTTP/1.1 200 OK 0ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 50066ms]
POST XHR https://5-edge-chat.facebook.com/sub [HTTP/1.1 200 OK 145ms]
POST XHR https://www.facebook.com/ajax/bz [HTTP/1.1 200 OK 152ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 50066ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 98ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 50067ms]
GET XHR https://5-edge-chat.facebook.com/active_ping [HTTP/1.1 200 OK 144ms]
POST XHR https://www.facebook.com/ajax/bz [HTTP/1.1 200 OK 78ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 50067ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 50075ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 50068ms]
POST XHR https://www.facebook.com/ajax/bz [HTTP/1.1 200 OK 72ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 7838ms]
GET XHR https://5-edge-chat.facebook.com/active_ping [HTTP/1.1 200 OK 145ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 50080ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/1.1 200 OK 50076ms]
POST XHR https://www.facebook.com/ajax/like/tooltip.php [HTTP/1.1 200 OK 83ms]
POST XHR https://www.facebook.com/ajax/ufi/comment_fetch.php [HTTP/1.1 200 OK 319ms]
POST XHR https://www.facebook.com/ajax/bz [HTTP/1.1 200 OK 73ms]
GET https://static.xx.fbcdn.net/rsrc.php/v2/ye/r/AFzGggUrTzI.js [HTTP/1.1 200 OK 0ms]
GET https://external.fomr1-1.fna.fbcdn.net/safe_image.php [HTTP/1.1 200 OK 32ms]
GET https://scontent.fomr1-1.fna.fbcdn.net/hphotos-xtp1/t39.1997-6/s118x90/12057017_776940072432833_639121771_n.png [HTTP/1.1 200 OK 33ms]
GET https://scontent.fomr1-1.fna.fbcdn.net/hphotos-xap1/t39.1997-6/s118x90/851558_396469957163777_1480614187_n.png [HTTP/1.1 200 OK 0ms]

and now from nightly with "true" on network.http.spdy.enabled.http2

GET https://scontent.fomr1-1.fna.fbcdn.net/hprofile-xtp1/v/t1.0-1/p32x32/12573681_1663377740588746_8528455517016823373_n.jpg [HTTP/2.0 200 OK 5ms]
GET XHR https://5-edge-chat.facebook.com/pull [HTTP/2.0 200 OK 50065ms]
POST XHR https://www.facebook.com/ajax/like/tooltip.php [HTTP/2.0 200 OK 157ms]
POST XHR https://www.facebook.com/ajax/ufi/comment_fetch.php [HTTP/2.0 200 OK 407ms]
POST XHR https://www.facebook.com/ajax/bz [HTTP/2.0 200 OK 154ms]
GET https://static.xx.fbcdn.net/rsrc.php/v2/ye/r/AFzGggUrTzI.js [HTTP/1.1 200 OK 0ms]
GET https://scontent.fomr1-1.fna.fbcdn.net/hphotos-xfp1/t39.1997-6/s118x90/11409235_1598405817089387_496683729_n.png [HTTP/2.0 200 OK 9ms]
GET https://external.fomr1-1.fna.fbcdn.net/safe_image.php [HTTP/2.0 200 OK 302ms]
GET https://scontent.fomr1-1.fna.fbcdn.net/hprofile-xpf1/v/t1.0-1/c5.12.32.32/p43x43/12376455_1060566230631504_847744236326442096_n.jpg [HTTP/2.0 200 OK 8ms]
GET https://scontent.fomr1-1.fna.fbcdn.net/hprofile-xaf1/v/t1.0-1/c4.0.24.24/p24x24/11880678_1683152765248147_3871919907347191588_n.jpg [HTTP/2.0 200 OK 6ms]



I think that FF crashes on http/2.0 if the request is > some hundred ms
(In reply to cooldeep from comment #69)
> my problem is exactly as described in the first post here. Infinit loop on
> facebook, now internet conection to firefox, high cpu usage after i close
> firefox, it will stop enetualy with a crash report, or by killing the
> process.
> 
> I have tried Nightly 47.0a1. The problem is gone, I have managed to brows on
> FB and other sites for some hours.
> 
> I have set "false" on network.http.spdy.enabled.http2 in Firefox 44.0.2 and
> the issue is gone. I have managed to use the browser for some hours,
> sometimes if I have much to load in FB, it stays some second to load, but it
> dosent crash anynmore.
> 

Thank you!
i had an issue on Linux debian from version 38 , in sites with ads, google ads, taking lot of time to load and eating lke 80%-90% CPU doing nothing just waiting until the ad load after like 20 secs, i tried with old versions (31 and back just working fine), also tried lot of others configs to solve this, until i could understand that was a problem with http or some protocol, and that the thing is solved just setting  network.http.spdy.enabled.v3-1 to false  and the google ad sense ads just take like 50% for 2 sec, the ad is showed and firefox stop loading nothing almost immediately.
so setting it as false by default will improve the time loading and performace drastically on users that dont block ads at least
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: