Open Bug 1655567 Opened 4 years ago Updated 3 years ago

Not gracefully erroring an h2 stream with more summed data from DATA frames than announced by content-length

Categories

(Core :: Networking, defect, P3)

x86_64
Windows 10
defect

Tracking


Tracking Status
firefox81 --- affected

People

(Reporter: patrick.colbaut, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

For our product running on the Wildfly application server, we have an internal diagnostics servlet that lists the status of our distributed application as an XML-based status tree, prettied up via XSLT. It allows "drilling down" into the detailed status of individual nodes and of the components on those nodes, with each drill-down initiating a new navigation to the servlet, albeit with the appropriately modified parameters.

For some reason, generally after a few navigations through the status tree, affected builds of Firefox start to truncate the server responses, which then leads to the expected XML parsing errors.

I finally took the time to poke around this problem in a completely fresh profile with only a single tab, using network logging and mozregression, and arrived at the following results:

Last good: 2017-03-07, not reproducible after 50+ navigations
First bad: 2017-03-08, issues encountered during ~5 navigations

This bug (if it actually is one) seems to have been present for quite a while (Firefox 55?)...

The truncation of the server responses happens at exactly 16384 bytes, despite receiving the correct (greater) Content-Length header value and the complete data, which I could verify via the attached networking logs (slightly censored to be on the safe side regarding potential exposure of restricted internal data). A hard refresh will usually fix this for at least one navigation attempt, but not always, and the problem will reappear after a few more navigations.

I'll file this bug under Firefox - General for now, as I'm not sure which specific component would be the most suitable; potentially Core - Networking or Networking: HTTP, although actually receiving the data doesn't seem to be the problem here, so the issue might lie in a layer further up?

Component: General → Untriaged

Hi, patrick.colbaut!

Thanks for your contribution! And for submitting such a complete report.

As you mentioned, I've assigned a product and component so the proper team can take a look.

Please let us know if this issue is reproduced in the latest Nightly edition. You can download it from here: https://nightly.mozilla.org/

If you still have the issue, please create a new profile; the steps are here: https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles?redirectlocale=en-US&redirectslug=Managing-profiles#w_starting-the-profile-manager

Once you have all this information, please let us know so we can continue investigating.

Regards,

Component: Untriaged → Networking
Product: Firefox → Core

Thanks for that.

I just verified again that the problem still exists in the latest nightly build in a completely fresh profile and can be quite easily reproduced after a few navigation attempts, even in safe mode (not that it should matter in a completely fresh profile). The attached networking log was also produced in a fresh profile, to prevent any extensions from somehow interfering.

I've never seen an error quite like this in any browsing I've done using Firefox, except with this diagnostic servlet.
As such, this doesn't really seem like a critical problem, although it can certainly be annoying.

If the networking logs didn't explicitly show that all data was received, I absolutely would have assumed that the servlet or the application server itself was doing something wrong in its response handling, like not completely flushing its output buffers before ending the response data stream.
Additionally, the seemingly truncated response data was not reproducible in other browsers, which suggests that Firefox treats the server response differently in this seemingly special situation.

Thanks a lot for the log!

What I can see in it is the following:

2020-07-24 17:45:50.201000 UTC - [Parent 22064: Socket Thread]: E/nsHttp nsHttpTransaction::ProcessData [this=000002234EBBE000 count=840]
2020-07-24 17:45:50.201000 UTC - [Parent 22064: Socket Thread]: V/nsHttp nsHttpTransaction::HandleContent [this=000002234EBBE000 count=840]
2020-07-24 17:45:50.201000 UTC - [Parent 22064: Socket Thread]: E/nsHttp nsHttpTransaction::HandleContent [this=000002234EBBE000 count=840 read=839 mContentRead=17223 mContentLength=17223]
2020-07-24 17:45:50.201000 UTC - [Parent 22064: Socket Thread]: V/nsHttp nsHttpTransaction::RemoveDispatchedAsBlocking this=000002234EBBE000 not blocking
2020-07-24 17:45:50.201000 UTC - [Parent 22064: Socket Thread]: V/nsHttp nsHttpTransaction 000002234EBBE000 request context set to null in ReleaseBlockingTransaction() - was 000002234B249160
2020-07-24 17:45:50.201000 UTC - [Parent 22064: Socket Thread]: V/nsHttp nsHttpConnection::PushBack [this=0000022343189000, length=1]
2020-07-24 17:45:50.201000 UTC - [Parent 22064: Socket Thread]: V/nsHttp nsHttpTransaction::Close [this=000002234EBBE000 reason=8000ffff]

And this happens on the 16384 boundary: we don't deliver those 839 bytes to the channel and the final content consumer.

The h2 server actually sends exactly one byte more in DATA frames than announced by the C-L header on the stream. We try to push it back, but there is already an input overflow pending, so we fail. I can't see when that overflow was set up, because the log was started too late via about:networking.

I think this is a rare server mistake that exposes a bug in our code base. The pushback mechanism is not suitable for h2 streams. Each stream is isolated, and when the server sends more bytes than announced, it's a malformed response and should be handled as such according to the spec. We should gracefully fail the stream (response) with PROTOCOL_ERROR and not just an unexpected error. But I don't think that would change much except for better error propagation to e.g. devtools (not sure these errors are visible there, though).
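As a rough illustration only (a hedged sketch, not Firefox's actual code; the class and method names are made up), the per-stream check described above could look roughly like this:

    // Hypothetical sketch: enforce per stream that the summed DATA frame payloads
    // never exceed the announced content-length, and fail only that stream with
    // PROTOCOL_ERROR (0x1) when they do, instead of a generic unexpected error.
    final class StreamBodyValidator {
        static final int PROTOCOL_ERROR = 0x1;   // h2 error code

        private final long announcedLength;      // content-length header value, -1 if absent
        private long receivedBytes;

        StreamBodyValidator(long announcedLength) {
            this.announcedLength = announcedLength;
        }

        // Called for every DATA frame payload received on this stream.
        void onDataFrame(int payloadLength) {
            receivedBytes += payloadLength;
            if (announcedLength >= 0 && receivedBytes > announcedLength) {
                // Malformed response: reset only this stream, the connection stays usable.
                throw new IllegalStateException("stream carries " + receivedBytes
                        + " bytes but content-length announced " + announcedLength
                        + "; RST_STREAM with error code " + PROTOCOL_ERROR);
            }
        }
    }

In the log above, such a check would trip on the final DATA frame, since the stream delivers 17224 bytes against an announced content-length of 17223, and only that stream (response) would be failed.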

Severity: -- → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3
Summary: Network response truncated at exactly 16384 bytes despite receiving the full data and receiving the correct, greater content-length header → Not gracefully erroring an h2 stream with more summed data from DATA frames than announced by content-length

Thanks for the update!

Now that you mention it, I had already been wondering about the superfluous newline character in the response when I tried to check whether the data itself was received completely and correctly. While I wasn't sure whether this might just be some quirk of the spec or an actual bug in our diagnostic servlet implementation, I figured you'd want to check it out and either fix the potential bug or change the handling to match other browsers in this case.

It does indeed turn out to be a bug in the diagnostic servlet, and a perfectly consistent one at that:

...
response.setContentLength(sb.length());   // content-length is set from the buffered body
out.println(sb.toString());               // println() appends a line terminator on top of that
...

The code responsible sets the content-length correctly but then uses the println method to write the data to the response stream, which of course adds another byte to the output. After changing the code to use the print method, I'm not able to produce the error any more, as expected.
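
For reference, a minimal sketch of the corrected write path (assuming the javax.servlet API; the class and method names are illustrative, not the actual servlet source):

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.http.HttpServletResponse;

    // Hedged sketch of the fix described above, not the actual servlet code.
    // println() appends a line terminator after the body, so the response ends up
    // longer than the announced content-length; print() writes only the body itself.
    final class StatusWriter {
        // Assumes a single-byte response charset, so sb.length() matches the byte count.
        static void writeStatus(HttpServletResponse response, StringBuilder sb) throws IOException {
            response.setContentLength(sb.length());
            PrintWriter out = response.getWriter();
            out.print(sb.toString());   // print(), not println(): nothing written beyond content-length
        }
    }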

As any such response body from the diagnostic servlet should have been a byte too long, could the previously pending input overflow simply be from a previous (seemingly successful for whatever reason) request?

(In reply to Asteaguinity from comment #4)

As any such response body from the diagnostic servlet should have been a byte too long, could the previously pending input overflow simply be from a previous (seemingly successful for whatever reason) request?

Very likely, yes.

Thanks for the quick reply.

Priority: P3 → P2
Whiteboard: [necko-triaged]
Priority: P2 → P3
