Switching tabs may send an HTTP/2 PRIORITY frame with stream ID == 0

VERIFIED FIXED in Firefox -esr60

Status

()

defect
P2
normal
VERIFIED FIXED
9 months ago
8 months ago

People

(Reporter: iippolitov, Assigned: u408661)

Tracking

({regression})

61 Branch
mozilla64
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 verified, firefox62 wontfix, firefox63 verified, firefox64 verified)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 attachment)

Reporter

Description

9 months ago
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0
Build ID: 20180704192850

Steps to reproduce:

1) create a long living http2 connection to a site (I post images in 4 streams in that connection)
2) open another http2 site (livejournal feed) and wait for it to fully load
3) switch back to a tab with long running connection


Actual results:

From server logs (nginx) I can see that firefox sends PRIORITY http2 frame with sid:0. Nginx closes connection and firefox silently opens another one

Similar workflow in chromium works well.
Older firefox version also worked well.


Expected results:

The PRIORITY frame always identifies a stream.  If a PRIORITY frame  is received with a stream identifier of 0x0, the recipient MUST respond with a connection error (Section 5.4.1) of type PROTOCOL_ERROR.

Firefox should not send PRIORITY frame with sid == 0.
Component: Untriaged → Networking: HTTP
Product: Firefox → Core
Assignee

Comment 1

9 months ago
Hi reporter, you say older firefox versions work properly. Can you use mozregression (https://mozilla.github.io/mozregression/) to figure out where, exactly, things went wrong for firefox? That can help us track down the change that introduced this behaviour. Thanks.
Flags: needinfo?(iippolitov)
Reporter

Comment 2

9 months ago
Utility told me it has not enough data to bissect. But the first faulty build is 2017-08-31 and the last good is 2017-08-30. So it should be easy to find faulty change.
Also, I noticed, 2017-08-31 sends sid=0 when I simply switch tabs. Looks like no need for another http2 site.

Hope this helps.
Flags: needinfo?(iippolitov)
Assignee

Comment 3

9 months ago
OK, this was caused by bug 1377208 - my best guess is there's a pushed stream somewhere in the mix, as that should be the only way we could accidentally do this. Definitely our bug, though. I just have to figure out which makes more sense - reprioritising the pushed stream, or not.
Assignee: nobody → hurley
Blocks: 1377208
Priority: -- → P2
Whiteboard: [necko-triaged]
Assignee

Comment 4

9 months ago
Reporter, can you try the build at https://index.taskcluster.net/v1/task/gecko.v2.try.revision.2c7f502b2dd5df495f8c53d3de5293840baedc30.firefox.linux64-opt/artifacts/public/build/target.tar.bz2 and see if this fixes the issue? You'll want to use a new profile, as nightly can't share profiles with release versions.

If you're not using linux (I guessed based on your bug report), let me know what platform you're using, and I can point you to the appropriate build for that OS. Thanks.
Flags: needinfo?(iippolitov)
Reporter

Comment 5

9 months ago
Couldn't reproduce the issue, so I believe it is fixed.
Thank you very much!
Flags: needinfo?(iippolitov)
Assignee

Comment 6

9 months ago
Great, thanks! I'll prepare this patch to land, and hopefully get it uplifted to beta, as well.
Summary: A long running HTTP2 connection may send PRIORITY frame with stream ID == 0 → Switching tabs may send an HTTP/2 PRIORITY frame with stream ID == 0
Assignee

Comment 7

9 months ago
Right now, pushed streams don't get put in the priority hierarchy
anyway, and with the code as it was, we were sending an illegal PRIORITY
frame for stream 0. This prevents that from happening. We should
probably think about putting pushed streams in the priority hierarchy at
some point.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment on attachment 9010983 [details]
Do not reprioritize pushed streams on tab switch.

Kershaw Chang [:kershaw] has approved the revision.
Attachment #9010983 - Flags: review+

Comment 9

9 months ago
Pushed by hurley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6341eaf530a7
Do not reprioritize pushed streams on tab switch. r=kershaw

Comment 10

9 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6341eaf530a7
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
What's the impact of this bug? Does it need to be considered for backport?
Flags: needinfo?(hurley)
Assignee

Comment 12

9 months ago
Comment on attachment 9010983 [details]
Do not reprioritize pushed streams on tab switch.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1377208
[User impact if declined]: wasted tcp connections/potentially slower operation on long-running uploads or downloads
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: does a 0-check for http/2 stream ids, and avoids sending priority frames on them (which breaks spec).
[String changes made/needed]: none
Flags: needinfo?(hurley)
Attachment #9010983 - Flags: approval-mozilla-beta?
Assignee

Comment 13

9 months ago
If this makes it through to beta, then we can try pushing the envelope to see how far up the trees (esr) we can get it :)
Comment on attachment 9010983 [details]
Do not reprioritize pushed streams on tab switch.

Low risk minimal patch fixing an old P2 regression. Fix verified by the reporter, uplift accepted for 63 beta 10, thanks.
Attachment #9010983 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee

Comment 16

9 months ago
Comment on attachment 9010983 [details]
Do not reprioritize pushed streams on tab switch.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: see user impact if declined
User impact if declined: potentially slower http/2 downloads, potentially slower or even stalled/killed http/2 uploads
Fix Landed on Version: 64
Risk to taking this patch (and alternatives if risky): no/very little risk
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #9010983 - Flags: approval-mozilla-esr60?
Assignee

Comment 17

9 months ago
Comment on attachment 9010983 [details]
Do not reprioritize pushed streams on tab switch.

Should've requested this same time as I requested esr60. See above for approval request info.
Attachment #9010983 - Flags: approval-mozilla-esr52?
Comment on attachment 9010983 [details]
Do not reprioritize pushed streams on tab switch.

ESR52 is EOL. I honestly have no idea why that flag is still active in Bugzilla.
Attachment #9010983 - Flags: approval-mozilla-esr52?
Comment on attachment 9010983 [details]
Do not reprioritize pushed streams on tab switch.

Fix for intermittent HTTP2 connection issues. Approved for ESR 60.3.
Attachment #9010983 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Flags: qe-verify+
Hi iippolitov, 

Can you please explain the steps to reproduce from comment 0 in a more easily understandable process for a low technical knowledge tester? I need to be able to reproduce it and then verify the fix myself and I have the impression you'll need to put it is really simple terms for me. 

Thank you!
Flags: needinfo?(iippolitov)
Reporter

Comment 22

8 months ago
Daniel,

First of all, steps in comment 0 are a bit too complex. The issue can be reproduced a bit easier: just switch tabs waiting for a reply in an HTTP/2.0 connection.

You'll need any web site opened in a tab and an HTTP/2 request which is processed for a long time. For example, with a form being sent to a server which replies in 20-30 seconds. 

You send the form and while the browser is waiting for a reply you switch tabs back and forth. If the bug is in effect the long waiting connection will be closed earlier than it should (and you'll see error logs in the web server logs).

I think, I can compose a simple self contained nginx configuration for you to test this. Or a less simple shell script. Feel free to ask additional questions if my explanation looks vague to you.
Flags: needinfo?(iippolitov)
The explanation is not that vague, but I'll need you to correct me if I'm wrong. 

These are my steps taken, on Ubuntu 16 OS:
1. Downloaded a build from a date prior to the push of the fix in mozilla-central:
http://archive.mozilla.org/pub/firefox/nightly/2018/09/2018-09-20-22-01-02-mozilla-central/firefox-64.0a1.en-US.linux-x86_64.tar.bz2
2. Opened e random http page in a tab and a http2 page in another
Tab1: https://ro.yahoo.com/?p=us
Tab2: https://http2.akamai.com/
3. I opened the "about:networking#logging" page in a 3rd tab and clicked on "Start Logging" button.
4. I refresh tab 2.
5. I switch between the tabs for 20-30 seconds.
6. I went to the 3rd tab and clicked on the "Stop Logging" button.
7. Retrieved the log: log.txt-main.9202 from where it was saved. Download it from here:
https://drive.google.com/file/d/13-x9V6U-_TRVWdBGQKjl_yYuf1ewlZPP/view?usp=sharing

I do not have the knowledge to read the log, unfortunately. 

@iippolitov:
A. Did I perform the steps correctly? 
B. If yes, did the issue reproduce?
C. If not, can you correct my steps or provide a suitable test case that I could use?

Thank you!
Flags: needinfo?(iippolitov)
Reporter

Comment 24

8 months ago
Daniel,

I'm not really familiar with mozilla network logs, please give me some time to look through it with issue reproduced and without it.
Strangely, I can't reproduce the issue right now, while I'm 100% sure it breaks my uploads every day.
It may take me a couple of day, but I'll get back to you.
Reporter

Comment 25

8 months ago
The message we are looking for looks like this:


>LOG3(("Http2Session::SendPriorityFrame %p Frame 0x%X depends on 0x%X weight %d\n", this, streamID, dependsOn, weight));

In logs with a problem reproduce it looks like this:

> 2018-10-10 12:31:06.978614 UTC - [Parent 2253: Socket Thread]: I/nsHttp Http2Session::SendPriorityFrame 0x7fa9e13be800 Frame 0x31 depends on 0x7 weight 21
> 2018-10-10 12:31:06.978667 UTC - [Parent 2253: Socket Thread]: I/nsHttp Http2Session::SendPriorityFrame 0x7fa9e13be800 Frame 0x0 depends on 0x7 weight 21
> 2018-10-10 12:31:06.978725 UTC - [Parent 2253: Socket Thread]: I/nsHttp Http2Session::SendPriorityFrame 0x7fa9e13be800 Frame 0x35 depends on 0x7 weight 21

Your log doesn't have any 'SendPriorityFrame' messages. Most likely, because your connection is way to fast. Here is step-by-step guide I succeeded with:

Open http2.akamai.com, 
open dev tools (ctrl-shift-i in my environment), 
in dev tools go to network tab, on the top right, set throttle mode to gprs (slowest)
open about:netowrking#logging, start logging
Reload http2.akamai.com, it will take some time. Don't wait till it's loaded!
While it is loading, click several tabs back and forth.
Grep logs with something like:
grep 'SendPriorityFrame [x[:xdigit:]]* Frame 0x0 depends' /tmp/log2.txt-main.2253

If there is no such entry - try a couple more times. The issue is reproduced like 7 out of 10 attempts with http2.akamai.com throttling. It is reproduced better if you have several long running requests, I think.

Feel free to ask for additional info.
Flags: needinfo?(iippolitov)
I have used your steps from comment 25. I think I have reproduced the issue on Windows 10 x64 on Nightly v64.0a1 from 2018-09-15. Also, it seems to me that the issue is also reproducing on Nightly v64.0a1 from 2018-10-10, the latest Nightly, so the issue may not be fixed.

Can you confirm this? These are the logs:
https://drive.google.com/drive/folders/1F1qts-DgZZQb2N0WJZQJ7Nvlf_hclpZj?usp=sharing
Flags: needinfo?(iippolitov)
Reporter

Comment 27

8 months ago
Daniel,

According to logs, a frame for a stream with zero ID was sent in the 'before fix' log:
>2018-10-11 12:11:50.664000 UTC - [Parent 11524: Socket Thread]: I/nsHttp Http2Session::SendPriorityFrame 0000018AB0308800 Frame 0x0 depends on 0x7 weight 20
But there is no such frame in 'after fix' log file. As far as I can see everything is OK.

Could you please share with me, how did you make the conclusion that you have reproduced the bug? What seems to be faulty behaviour?
Flags: needinfo?(iippolitov) → needinfo?(daniel.bodea)
I do not really understand what 'a frame for a stream with zero ID' means so I just compared the examples you gave me in comment 25 with similar ones in the logs.

In any case, this is good news! I will set the firefox64 tag as verified and I'll post other logs extracted from fixed Beta and ESR60 builds (done with the exact same repro steps at least 2 times each) so that you can confirm the fix yourself.
Flags: needinfo?(daniel.bodea)
Reporter

Comment 29

8 months ago
Daniel,

I'm not really an HTTP/2 professional, but this looks to be a simple issue. Let me explain:

HTTP/2 can send data in several streams. Each stream consists of frames. Streams are identified by an ID. Each frame has that streamID inside. By design, that ID can't be zero.
In case a browser sends a frame for a stream with a zero ID the server will drop this connection.

A client may set different priority for streams. This can be done in initial frames or later with a 'set priority' frame.
Firefox is able to send this 'priority frame'. This kind of packet (frame) is logged with following C statement:
>LOG3(("Http2Session::SendPriorityFrame %p Frame 0x%X depends on 0x%X weight %d\n", this, streamID, dependsOn, weight));
I don't really know what LOG3 does, but it looks like it logs a string for the third level of verbosity.
From other C function I guess the text in quotes is a template:
>"Http2Session::SendPriorityFrame %p Frame 0x%X depends on 0x%X weight %d\n"
Then go values to populate those '%'-something places: 'this', 'streamID', 'dependsOn' and 'weight'.
Obviously, streamID looks like a stream ID mentioned above =) if 'streamID' is zero, than the template looks like this:
>"Http2Session::SendPriorityFrame %p Frame 0x0 depends on 0x%X weight %d\n"
%p will be replaces with a hex number, %X and %d will also be replaced with hex and decimal numbers. But 'Frame 0x0' is a problem indicator.

Then you just have to find 'Frame 0x0' string in the log file.

From user perspective this error looks like a network problem: something can't be loaded (because the server shuts connection mid "flight"). Reload and everything works well.

Hope this helps.
I think it did help and it seems (to me) that the Beta v63.0b13 and Release ESR60 build (from taskcluster, that contains the changeset), are now fixed. For certainty, I would like you to have a look at the logs and confirm that the issue does not occur.

The logs are in the same folder like the ones mentioned in comment 26:
https://drive.google.com/drive/u/2/folders/1F1qts-DgZZQb2N0WJZQJ7Nvlf_hclpZj

Thank you!
Flags: needinfo?(iippolitov)
Reporter

Comment 31

8 months ago
Daniel,

Looks good to me. No traces of the issue
Flags: needinfo?(iippolitov)
Considering the comment above, I will mark this issue as verified in firefox63 and firefox-esr60.
Thank you for your contribution, iippolitov!
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.