Closed Bug 1480585 Opened 7 years ago Closed 3 years ago

write to wrong stream when two POST requests are sent at the same time using HTTP2

Categories

(Core :: Networking: HTTP, defect, P2)

61 Branch
defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: emry.egyed, Unassigned)

Details

(Whiteboard: [necko-triaged])

Attachments

(4 files)

Attached image http2.png
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36 Steps to reproduce: Submit a page with two POST requests at the same time when using HTTP/2. Actual results: Firefox mixes up the streams and attempts to write to the incorrect stream. The stream is already closed causing the browser to become unresponsive. Expected results: Both POST requests should have been processed normally. The page works fine when the web server (Apache Tomcat) is changed to use HTTP/1.1.
Attached image http11.png
Nick, this sounds a bit serious but feel free to downgrade if you disagree...
Assignee: nobody → hurley
Priority: -- → P1
Whiteboard: [necko-triaged]
Reporter, I am unable to reproduce this behaviour with my own test server. Can you please attach an HTTP log from a run of Firefox where you reproduce the behaviour (https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging)? That may help narrow it down to either a server issue or a bad interaction between firefox and tomcat caused by firefox. Thanks.
Flags: needinfo?(emry.egyed)
Attaching requested log
Flags: needinfo?(emry.egyed)
Thanks for the log. I've taken a look, following the flow of each of the 2 streams, and I don't see anything that indicates an issue in the h2 stack. For the first stream (corresponding to the path /ords/UPTEST/ps_system.pinn_request.post) what I see happen is, shortly after the transaction is dispatched, it is canceled with the error NS_BINDING_ABORTED. That indicates to me that some layer in Firefox above necko has decided that it needs to cancel the channel, and so necko gladly obliges. For the second stream (corresponding to the path /ords/UPTEST/wwv_flow.accept) there appears to be a server-side issue. Things carry on well for about half a second (with flow control and all going swimmingly), and then we see this line in the log: 2018-08-06 22:57:12.578000 UTC - [13524:Socket Thread]: I/nsHttp Http2Session::RecvGoAway 000001CB0F00B800 GOAWAY Last-Good-ID 0xD9 status 0x2 live streams=1 The important bit here is status 0x2, which corresponds to INTERNAL_ERROR. So, the server has sent Firefox a GOAWAY frame (indicating that it wants to shut down the connection) because it has had some sort of error all its own (in other words, not something Firefox did wrong). For the second stream, I recommend looking at your server and seeing if there's an update available, as it may be a known issue that's fixed in a more recent release. For the first stream (the one that gets canceled from somewhere higher up in firefox), if you let me know how you created the post (form submission, xhr, fetch, ...) I may be able to find someone else who could help you track down why the channel is being canceled.
Flags: needinfo?(emry.egyed)
Priority: P1 → P2
I'm not sure exactly how to determine how the post is created, sorry! I am currently using the latest released version of Tomcat, 9.0.10 so there are no more updates. I have previously reported this as an issue to Tomcat because i thought it might be an issue on Tomcat's side. Here is the bug report i had filed, but then they had closed saying Tomcat is functioning as expected. https://bz.apache.org/bugzilla/show_bug.cgi?id=62358 I also do not believe it is a server side issue because the page functions normally when using HTTP/2 with Chrome or Edge.
Flags: needinfo?(emry.egyed)
Also, to answer your question how the post's are created, one seems to be an xhr and the other is subdocument. At least that is what the network debugger shows in the screenshots i had attached.
Thanks for linking to the tomcat bug report. For what they looked at in the log (I didn't try reading the whole tomcat log, just their engineer's interpretation of it), they're right, there is no tomcat bug. However, what they're reading out of the log is exactly the same case as the first stream above (where something in firefox decides to cancel the channel), and has no bearing on the second stream in the firefox log (the one where tomcat sends a GOAWAY/INTERNAL_ERROR, indicating a tomcat issue). So we're both correct for the data we have :) I also just realized that I can look in your initial screenshot to see the source of the request, and you're right, it's an XHR. I don't know off the top of my head who to ask about that, so let me ask around and I'll get someone appropriate cc'd on this bug. ni?myself so this doesn't drop off my radar.
Flags: needinfo?(hurley)
Actually, so I can be more sure I don't misdirect (and potentially not even have to redirect), is the server publicly accessible? Or, if not, can you make a stripped-down copy available? It would be useful to see the script that is generating the xhr, as that might point us to why the xhr is being canceled (there are a lot of possibilities there).
Flags: needinfo?(hurley) → needinfo?(emry.egyed)
Sorry, the server is currently not publicly accessible. I will work with my networking team to see if I can get one of our public facing web servers accessible for you to debug on.
Flags: needinfo?(emry.egyed)
I've set up tomcat on a different machine so that the same behavior can be experienced. You should be able to access the system using this URL: https://72.225.43.160:8080/ords/f?p=1003:1 You will need to do a bit of navigation to get to a page to reproduce the error because the application depends on the navigation path. To reproduce the error, follow these steps: 1. Log in with the username and password "firefox" without the quotes 2. On the left side of the page, click on HTTP2 Test to expand the menu, then click on Charge Codes 3. On the page that loads, click on the Edit button next to the Code with the name 1FB 4. On the page that is titled Charge Code Profile, click the Save button near the top. This is when the application becomes unresponsive because of the error we've been investigating.
Flags: needinfo?(hurley)
Thanks for the testing endpoint. I can definitely see the hang behaviour, but nothing suspicious shows up in my local logs or the dev tools. I'll have to run under a debugger to see who is calling Cancel() on the http channel.
Flags: needinfo?(hurley)
Has there been any progress in troubleshooting the issue? I'm not sure how much longer I will be able to keep the testing endpoint available. Would it be possible for you to copy the files used by the page to continue reproducing it on your end without needing the endpoint?
Flags: needinfo?(hurley)
Unfortunately, I am no longer able to reproduce the behaviour that I was able to reproduce in comment 12, so I can't track this down any farther. Was there perhaps a code update on your end that may have improved xhr error recovery?
Assignee: hurley → nobody
Flags: needinfo?(hurley)
Priority: P2 → P3
There have not been any code changes on my end. I also just tried to reproduce it and am still able to reproduce it on Firefox 62.
That would indicate to me, then, that something has changed since firefox 62 (I'm testing with nightly). Are you able to reproduce with nightly? Note - for testing with nightly, you will need to use a new profile (as switching channels on the same profile is very much not supported, and known to cause crashes recently).
I was able to reproduce it with nightly as well (64.0a1 (2018-09-20)). It worked properly a few times, but then after trying again it locked up just like it has been.

I have found a consistent way to reproduce the issue.

Using the latest Tomcat release (9.0.17) the issue can be seen on the Tomcat home page. I have attached a screenshot (chrome_and_firefox_differences.png) that demonstrates the issues when loading the web page over HTTP/2. Resources are loaded with a size of 0 and do not display properly. In the screenshot Chrome (version 73.0.3683.86) is on the left and Firefox (version 66.0.1) is on the right. There are visual differences in the Server Status, Manager App and Host Manager buttons as well as the backgrounds for the top few regions. Looking at the networking tab in the debugger show a few files are being retrieved with a file size of 0.

These issues only appear in Firefox when using HTTP/2. When Tomcat is configured to use HTTP 1.1, the issues go away and the page renders identically looking as chrome does in the screenshot.

Selena - I can provide a zip of a preconfigured Tomcat setup already configured with HTTP/2 if that would be helpful when trying to reproduce it. Please let me know if you would like me to provide this.

Flags: needinfo?(sdeckelmann)

Nhi -- Can you help prioritize this given that we have steps to reproduce?

Flags: needinfo?(sdeckelmann) → needinfo?(nhnguyen)

Also of note, this only seems to be occurring when compression is enabled.

"content-encoding: gzip" in the response header

Priority: P3 → P2
Flags: needinfo?(nhnguyen)

Can you reproduce this?

If you can can you make a http log?

Flags: needinfo?(emry.egyed)

Redirect a needinfo that is pending on an inactive user to the triage owner.
:dragana, since the bug has high priority, could you please find another way to get the information or close the bug as INCOMPLETE if it is not actionable?

For more information, please visit auto_nag documentation.

Flags: needinfo?(emry.egyed) → needinfo?(dd.mozilla)
Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(dd.mozilla)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: