Closed Bug 1240025 Opened 8 years ago Closed 8 years ago

Cannot load Chrome Platform Status page completely

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox43 --- affected
firefox44 --- affected
firefox45 --- affected
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: xidorn, Assigned: mcmanus)

Details

(Whiteboard: [see comment 3][spdy][necko-active])

Attachments

(5 files, 1 obsolete file)

Can you ask them to set pref network.notify.changed to false and see if problem goes away. Maybe is dup of bug 1235509
Flags: needinfo?(quanxunzhen)
Oh, it suddently occurs to me where I observed the similiar issue.

When I visit this page: https://www.chromestatus.com/metrics/feature/timeline/popularity , there is a very high possibility that the page cannot be fully loaded.

I can reproduce this issue pretty reliably with both 43 release and 45 aurora on my Mac with repeatedly refresh the page with Cmd-Shift-R.

This may or may not be the same issue as what they met. They haven't given me feedback about that pref yet. But my issue doesn't seem to be related to bug 1235509 anyway.
Your issue is not related, but it would be interesting to figure out what is happening.

Can you make http log and a screenshot like the last one, so we can find the problematic loads in the log.
What is a http log? How can I make it?
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #4)
> What is a http log? How can I make it?

Sorry, here is the link:

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

Thanks
It is the logs and screenshot from the latest Developer Edition.
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #2)
> Created attachment 8708408 [details]
> Network panel for chromestatus
> 
> Oh, it suddently occurs to me where I observed the similiar issue.
> 
> When I visit this page:
> https://www.chromestatus.com/metrics/feature/timeline/popularity , there is
> a very high possibility that the page cannot be fully loaded.
>

that was 1236170 - an e10s bug (so not surprising you saw it on aurora) that was very recently fixed.

not sure if that's related to the comment 0 complaint.
I don't think that's bug 1236170 because
1. I can reproduce this on release as well where e10s is not enabled yet
2. I was using the latest aurora build (2016-01-15), which should already include the fix of that bug, right?

BTW, one of the people replied me that the issue seems to go away with that pref change mentioned in comment 1, so probably that issue is a dup of bug 1235509. In that case, we can make this bug about my issue :)
Flags: needinfo?(quanxunzhen)
Same issue as bug 1240025?
(In reply to Josh Matthews [:jdm] from comment #10)
> Same issue as bug 1240025?

are you in wrong bug or is this a wrong bug number?
(In reply to Josh Matthews [:jdm] from comment #12)
> I meant bug 1239194.

The reports from comment #0 are the same bug, but from comment #9 it is something else.
The other person also replied to me that it seems changing the pref fixes the issue. So it seems comment 0 is the same issue as bug 1235509. But then, I have another similiar issue described in comment 3, which doesn't seem to relate to bug 1235509.
Let me change the summary and add some notes in whiteboard.
Summary: HTTPS connection sometimes aborted on some Google sites → Cannot load Chrome Platform Status page completely
Whiteboard: [see comment 3]
Attachment #8708383 - Attachment is obsolete: true
needinfo myself not to lose this bug
Flags: needinfo?(dd.mozilla)
short update:

I can actually reproduce this bug, but i am almost sure that it is not a bug in firefox (looking at in the morning again). 

So chrome connects using ipv6 (at least in my case) and firefox (using release 43) with ipv4, do not know why at the moment. And what i can observe is that the connection from firefox gets reset by the server and between chrome and the server not. I cannot force chrome not to use ipv6 :) to try it there. 

Your log shows the same that the connection is reset:


2016-01-16 00:00:25.981024 UTC - 149790720[1004d14a0]: nsSocketInputStream::Read [this=1849d3188 count=13288]
2016-01-16 00:00:25.981028 UTC - 149790720[1004d14a0]:   calling PR_Read [count=13288]
2016-01-16 00:00:25.981037 UTC - 149790720[1004d14a0]:   PR_Read returned [n=-1]
2016-01-16 00:00:25.981041 UTC - 149790720[1004d14a0]: JIMB: ReleaseFD_Locked: mFDref = 2
2016-01-16 00:00:25.981053 UTC - 149790720[1004d14a0]: ErrorAccordingToNSPR [in=-5961 out=804b0014]
2016-01-16 00:00:25.981057 UTC - 149790720[1004d14a0]: nsSocketTransport::OnMsgInputClosed [this=1849d3000 reason=804b0014]
2016-01-16 00:00:25.981060 UTC - 149790720[1004d14a0]: Http2Session 125847800 data frame read failure 804b0014
2016-01-16 00:00:25.981064 UTC - 149790720[1004d14a0]: nsHttpConnection::CloseTransaction[this=1802c8de0 trans=125847800 reason=804b0014]
2016-01-16 00:00:25.981068 UTC - 149790720[1004d14a0]: Http2Session::Close 125847800 804B0014
2016-01-16 00:00:25.981072 UTC - 149790720[1004d14a0]: Http2Session::CloseStream 125847800 180469aa0 0x6 804B004C
2016-01-16 00:00:25.981075 UTC - 149790720[1004d14a0]: MaybeDecrementConcurrent 125847800 id=0x6 concurrent=0 active=0
2016-01-16 00:00:25.981079 UTC - 149790720[1004d14a0]: Http2Session::CloseStream 125847800 1805e8940 0x0 80004004
2016-01-16 00:00:25.981092 UTC - 149790720[1004d14a0]: MaybeDecrementConcurrent 125847800 id=0x0 concurrent=0 active=0
2016-01-16 00:00:25.981096 UTC - 149790720[1004d14a0]: nsHttpTransaction::Close [this=125d8a000 reason=80004004]
2016-01-16 00:00:25.981104 UTC - 149790720[1004d14a0]: nsHttpTransaction 125d8a000 scheduling context set to null in ReleaseBlockingTransaction() - was 11c22e160
2016-01-16 00:00:25.981114 UTC - 149790720[1004d14a0]: Http2Session::CloseStream 125847800 180469d80 0xa 804B004C
2016-01-16 00:00:25.981117 UTC - 149790720[1004d14a0]: MaybeDecrementConcurrent 125847800 id=0xA concurrent=0 active=0
2016-01-16 00:00:25.981121 UTC - 149790720[1004d14a0]: Http2Session::CloseStream 125847800 1805e85e0 0x0 80004004
2016-01-16 00:00:25.981124 UTC - 149790720[1004d14a0]: MaybeDecrementConcurrent 125847800 id=0x0 concurrent=0 active=0
2016-01-16 00:00:25.981127 UTC - 149790720[1004d14a0]: nsHttpTransaction::Close [this=125d86000 reason=80004004]
2016-01-16 00:00:25.981134 UTC - 149790720[1004d14a0]: nsHttpTransaction 125d86000 scheduling context set to null in ReleaseBlockingTransaction() - was 11c22e160
2016-01-16 00:00:25.981140 UTC - 149790720[1004d14a0]: Http2Session::CloseStream 125847800 180469c10 0x8 804B004C
2016-01-16 00:00:25.981143 UTC - 149790720[1004d14a0]: MaybeDecrementConcurrent 125847800 id=0x8 concurrent=0 active=0
2016-01-16 00:00:25.981147 UTC - 149790720[1004d14a0]: Http2Session::CloseStream 125847800 180469930 0x4 804B004C
2016-01-16 00:00:25.981150 UTC - 149790720[1004d14a0]: MaybeDecrementConcurrent 125847800 id=0x4 concurrent=0 active=0
2016-01-16 00:00:25.981153 UTC - 149790720[1004d14a0]: Stream had active partial read frame on close
2016-01-16 00:00:25.981156 UTC - 149790720[1004d14a0]: Http2Session::ChangeDownstreamState() 125847800 from 4 to 6
2016-01-16 00:00:25.981192 UTC - 149790720[1004d14a0]: Http2Session::GenerateGoAway 125847800 code=2
2016-01-16 00:00:25.981196 UTC - 149790720[1004d14a0]: Http2Session::LogIO 125847800 stream=0 id=0x0 [Generate GoAway]
2016-01-16 00:00:25.981201 UTC - 149790720[1004d14a0]: 00000000: 00 00 08 07 00 00 00 00 00 00 00 00 0A 00 00 00
2016-01-16 00:00:25.981204 UTC - 149790720[1004d14a0]: 00000010: 02
I do not have IPv6 address, so it seems that it isn't related to IPv6.
oh interesting.. we might actually have another bug like this on file if I understand what you are saying dragana.

1] packet arrives from server to tcp stack and sits in a recv buffer. firefox does not yet consume
2] rst arrives from server recv buffer is flushed
4] firefox does recv() and gets the error

am I right that the recv buffer gets flushed there? I'm not 100% certain.

any ideas on why we are getting reset? anything obvious like having received a goaway from the peer? (which is admittedly racy.)

can you try turning off push in the preferences? I think that page famously uses push.. perhaps something around how we handle that is making the server unhappy.

a full http log and pcap of your issue would be interesting. Can probably find someone on google.com gfe team to discuss with.
(In reply to Patrick McManus [:mcmanus] from comment #19)
> oh interesting.. we might actually have another bug like this on file if I
> understand what you are saying dragana.
> 
> 1] packet arrives from server to tcp stack and sits in a recv buffer.
> firefox does not yet consume
> 2] rst arrives from server recv buffer is flushed
> 4] firefox does recv() and gets the error
> 
> am I right that the recv buffer gets flushed there? I'm not 100% certain.
> 

As I remember you are right, not 100% sure.

I do not know if above list is correct, I have not look at this, need to compare wireshark logs, but I am planning to.

> any ideas on why we are getting reset? anything obvious like having received
> a goaway from the peer? (which is admittedly racy.)
> 

No idea yet.

If I turn off push it is working fine - so the server does not like our push :)
I have seen couple of times, around 10% that firefox sends tls encryption alert message. (I need to take a look at a that too)

> can you try turning off push in the preferences? I think that page famously
> uses push.. perhaps something around how we handle that is making the server
> unhappy.
> 
> a full http log and pcap of your issue would be interesting. Can probably
> find someone on google.com gfe team to discuss with.

I can reproduce this so I will make a log and pcap. From a log I can see that some of push streams finish just fine.

I will investigate this further.
Flags: needinfo?(dd.mozilla)
1] so when we're done with a particular pushed stream on this page we send a stream RST with a CANCEL.. which is wrong - our log says its because it hasn't received a FIN (end-stream), but it has.

2] however the GFE seems to respond to that by closing the connection. Ouch! not even a goaway. That's really wrong too - as the CANCEL behavior could simply be racy (we might not have received the fin - though in the log we have, but there's no way for the server to know that.)
Assignee: nobody → mcmanus
the tcp rsts are just a downstream trainwreck of what usually happens when data is still flowing on a connection that is closed.. all the inflight transactions get errors.
Attached file httplog.txt.gz
From 06cdc9f044b6daa3db77baa3f90f9418573c1ad8 Mon Sep 17 00:00:00 2001
---
 netwerk/protocol/http/Http2Session.cpp | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
Attachment #8718036 - Flags: review?(hurley)
Attachment #8718036 - Flags: review?(hurley) → review+
Whiteboard: [see comment 3] → [see comment 3][spdy][necko-active]
https://hg.mozilla.org/mozilla-central/rev/4f7533b147a9
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Comment on attachment 8718036 [details] [diff] [review]
[PATCH]  - incorrect close state on pushed stream

Approval Request Comment
[Feature/regressing bug #]: http2 support original bug
[User impact if declined]: instability with google server doing push (page does not load)
[Describe test coverage new/current, TreeHerder]: may require buggy server to have user impact
[Risks and why]: low risk - very small change 
[String/UUID change made/needed]: none
Attachment #8718036 - Flags: approval-mozilla-aurora?
Comment on attachment 8718036 [details] [diff] [review]
[PATCH]  - incorrect close state on pushed stream

Improve reliability of push connections, ok on m-c. Let's uplift this to aurora.
Attachment #8718036 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.