Closed Bug 1590299 Opened 5 years ago Closed 5 years ago

Secure websocket fails after 5 minutes

Categories

(Core :: Networking: WebSockets, defect, P2)

69 Branch
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: hstevecooper, Unassigned)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

642.50 KB, application/octet-stream
Details

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0

Steps to reproduce:

  1. Make a secure connection to a websocket server (e.g., 'wss://my-server.com:8443').
  2. Send and receive messages every 30 seconds or so for at least 5 minutes.

Actual results:

After 5 minutes, the websocket closes with error 1006.

Expected results:

The websocket should have remained open.

The bug first appeared in Firefox 65.0 (Windows 32-bit), and affects 69.0.3 as well. Earlier releases 62.0.3, 63.0.3, and 64.0.2 (all Windows 32-bit) work fine.

Also, Chrome works fine with this same websocket server. And, a node.js client connects to and uses the same websocket server without problem. Even Microsoft Edge works.

Only secure (wss) connections are affected in Firefox. If I enable non-secure websocket (config network.websocket.allowInsecureFromHTTPS=true) and try connecting, 69.0.3 works.

The problem is evident immediately upon connecting to the server, in the DevTools -> Network panel:

  • A websocket connection that doesn't fail after 5 minutes shows up with status code 101, a lock icon (indicating wss), and all the protocol upgrade details.

  • A websocket connection that fails after 5 minutes shows up with no status code, no lock icon, no protocol upgrade info.

However, even in the failing case, the first 5 minutes work fine--message traffic is exchanged between client and server without problem.

The websocket server is built with libwebsockets 3.2.0.

Version: 65 Branch → 69 Branch
Component: Untriaged → Networking: WebSockets
Product: Firefox → Core

This sounds like a known issue.
Michal, does this ring any bells to you?

Flags: needinfo?(michal.novotny)

Similar bugs are 1425324 and 1478157.

Steve, could you please provide a log? See https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging and use MOZ_LOG="sync,timestamp,nsWebSocket:5,nsHttp:5,nsSocketTransport:5"

Flags: needinfo?(michal.novotny) → needinfo?(hstevecooper)

After some more investigating, I isolated the problem: WSS over HTTP/2.

Either Firefox's implementation has a bug, or the websocket server (using libwebsockets 3.2.0, with HTTP/2 support enabled) has a bug.

If I turn off HTTP/2 in Firefox (config: network.http.spdy.websockets=false), everything works. Or, if I rebuild the server with HTTP/2 support disabled (libwebsockets build: -DLWS_WITH_HTTP2=0), Firefox works regardless of the config setting.

In pre-65 Firefox, was HTTP/2 support disabled by default? If so, that would explain why pre-65 versions worked, even with an HTTP/2-enabled server. libwebsockets enabled HTTP/2 by default in version 3.1.0, so most servers using it may not be affected yet.

In my testing, I was using my own websocket server; however, I confirmed the libwebsockets-test-server exhibits the same problem with Firefox (i.e., the connection fails after 5 minutes), if HTTP/2 is enabled.

As I mentioned, both Chrome and Edge work with the HTTP/2-enabled server. Chrome was using HTTP/2; I don't know about Edge.

Attached file log-files.zip

Log files as requested.

Flags: needinfo?(hstevecooper)
Flags: needinfo?(dd.mozilla)
Priority: -- → P2
Whiteboard: [necko-triaged]
Status: UNCONFIRMED → NEW
Ever confirmed: true

The http2 session is closed because we receive invalid RST_STREAM frame:

00 00 04 03 00 00 00 00 00

According to spec the stream id must not be 0. If it is, then it's a protocol error.

We believe it's a server side bug and we should probably report it to libwebsockets developers. Mike, could you please find somebody who can take care of this?

Flags: needinfo?(dd.mozilla) → needinfo?(mconca)

Reported to libwebsockets as issue 1752

Flags: needinfo?(mconca)

I pushed a fix for the RST_STREAM sid problem on lws v3.2-stable and master. Thanks for passing on what what wrong with that.

But, if it closes after 5 minutes, it looks like the rst_stream will be part of it closing and is not the root cause.

I found a similar problem 9 months ago, and reported it in Bugzilla here, it was not reproduced by Mozilla and closed

https://bugzilla.mozilla.org/show_bug.cgi?id=1528850

The behaviour is pretty exactly

The problem is evident immediately upon connecting to the server, in the DevTools -> Network panel:
A websocket connection that doesn't fail after 5 minutes shows up with status code 101, a lock icon (indicating wss), and all the protocol upgrade details.

I'm happy if we can find it's something wrong I can fix in lws like the sid thing. But as I said in the original bug, and as the OP finds, this works OK on, eg, Chrome, it seems to just be triggered in firefox. I can't find any clue in the dev tools stuff.

Another clue: as I mentioned on the original bug, this problem manifests itself on and off using https://libwebsockets.org/testserver/ ... on a ws connection that transmits a small message only once at connect... on that ws connection, RX from the server is not seen at the JS inside firefox leading to a) the WSS-H2 decal not being shown, and b) the "Server Info" tab is not filled in. Both of those things happen when a relatively small, ~2-300 byte, JSON ws message sent by the server on connection established is received by the JS.

On other ws connections with continuous traffic from the server, there is no problem with it appearing. So my guess is this is to do with how firefox deals with triggering the message received JS event when it faces isolated, relatively small ws rx over h2; perhaps that in turn is caused by buffering stuff at h2 layer until it reaches some threshold like 1K.

The observation is if I open 4 x tabs (2 or 3 doesn't do the trick) on that, it seems to reliably trigger the delivery of the stalled rx on ALL the tabs. After the fourth tab opens, it is updated correctly and going back to the other tabs, they also have received the JSON ws message that had not been delivered to the JS until then. It's possible the missing data was held at the server by, eg, tx credit exhaustion, but it seems far more likely since the size of the data is just a few hundred bytes (and this is not seen in other browsers) that it was sent and held somehow inside ffox.

Here's a variation on the test server that only opens the one websocket link that shows the symptom (instead of three from the one page).

https://libwebsockets.org/testserver/single.html

With that, the only ws events are ones related to the problem.

In, eg, chrome, it displays two decal boxes on the top right, one set when it receives JSON on the ws link saying "WSS H2". And the box in the lower part of the page is filled in with a list of ips and User Agents connected to the test server. When the JSON is received, it's added to the console log.

In firefox with ws-over-h2 enabled, only one box set by the JS itself is at the top right, and the lower box is empty, both differences are because the JSON sent by the server was not logically received as a ws message. Randomly refreshing the page, or adding more tabs (how many seems to depend on how many people connected, ie, how big the JSON is, but, eg, 3 or 4) will cause all the tabs to receive JSON and work properly afterwards.

See Also: → 1528850

(In reply to Andy Green from comment #9)

I pushed a fix for the RST_STREAM sid problem on lws v3.2-stable and master. Thanks for passing on what what wrong with that.

I cannot reproduce the problem anymore after you pushed this fix. Can you still reproduce it?

I found a similar problem 9 months ago, and reported it in Bugzilla here, it was not reproduced by Mozilla and closed

https://bugzilla.mozilla.org/show_bug.cgi?id=1528850

It doesn't seem to be related to this problem.

Flags: needinfo?(andy)

There are two issues here mixed together... if you read the OP's description:

The problem is evident immediately upon connecting to the server, in the DevTools -> Network panel:
A websocket connection that doesn't fail after 5 minutes shows up with status code 101, a lock icon (indicating wss), and all the protocol upgrade details.

That's related to my bug from 9 months ago and your fix (great news btw).

Flags: needinfo?(andy)

Closing this bug as WFM because the RST_STREAM frame doesn't close the websocket stream anymore.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: