Closed Bug 927481 (WSFrameInspection) Opened 11 years ago Closed 5 years ago

Inspect WebSocket frames

Categories

(Core :: Networking: WebSockets, defect, P3)

x86
macOS
defect

RESOLVED DUPLICATE of bug 1542170

People

(Reporter: canuckistani, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [User Story][necko-backlog])

User Story

As a developer, I would like to be able to inspect individual frames and frame attributes as they are coming in or leaving the client before they are assembled into messages.

Acceptance criteria: 
- Identify/display the HTTP opening handshake (technically not a frame; it is handled via HTTP)
- Identify/display control frames: Close, Ping, Pong
- Identify/display data frames and provide data frame metadata
- Group together data frames that represent a payload assembled over multiple frames
UI challenge: overlaying a JSON object display over a 
- Visualize frames for developers to make them easy to decode. We have to find an intuitive way to represent a frame as shown below:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

Details available at https://tools.ietf.org/html/rfc6455
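The layout above can be decoded mechanically. As a minimal sketch (not Firefox code; the names FrameHeader, parse_frame_header and unmask are hypothetical and chosen only for illustration), the following Python reads the header fields in the order the diagram shows them and unmasks a payload:

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameHeader:
    """Hypothetical container for the fields shown in the diagram."""
    fin: bool
    rsv1: bool
    rsv2: bool
    rsv3: bool
    opcode: int                   # 4 bits
    masked: bool
    payload_len: int              # after resolving the 126/127 escape values
    masking_key: Optional[bytes]  # 4 bytes, present only if MASK is 1
    header_len: int               # bytes of header preceding the payload


def parse_frame_header(buf: bytes) -> FrameHeader:
    """Parse the base framing header found at the start of buf."""
    if len(buf) < 2:
        raise ValueError("a frame header is at least 2 bytes long")
    b0, b1 = buf[0], buf[1]
    masked = bool(b1 & 0x80)
    payload_len = b1 & 0x7F
    offset = 2
    if payload_len == 126:
        # 16-bit extended payload length, network byte order
        if len(buf) < offset + 2:
            raise ValueError("truncated 16-bit extended payload length")
        (payload_len,) = struct.unpack_from("!H", buf, offset)
        offset += 2
    elif payload_len == 127:
        # 64-bit extended payload length, most significant bit must be 0
        if len(buf) < offset + 8:
            raise ValueError("truncated 64-bit extended payload length")
        (payload_len,) = struct.unpack_from("!Q", buf, offset)
        if payload_len >> 63:
            raise ValueError("most significant bit of 64-bit length is set")
        offset += 8
    masking_key = None
    if masked:
        if len(buf) < offset + 4:
            raise ValueError("truncated masking key")
        masking_key = buf[offset:offset + 4]
        offset += 4
    return FrameHeader(
        fin=bool(b0 & 0x80),
        rsv1=bool(b0 & 0x40),
        rsv2=bool(b0 & 0x20),
        rsv3=bool(b0 & 0x10),
        opcode=b0 & 0x0F,
        masked=masked,
        payload_len=payload_len,
        masking_key=masking_key,
        header_len=offset,
    )


def unmask(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the masking key byte at index i mod 4."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))
```

For instance, feeding it the masked single-frame text message from section 5.7 of the RFC (0x81 0x85 0x37 0xfa 0x21 0x3d 0x7f 0x9f 0x4d 0x51 0x58) should yield FIN=1, opcode 0x1, a payload length of 5, and "Hello" after unmasking.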
Devtools would like to build tools that allow web developers to inspect websocket connections, but the current implementation does not support this. I talked with smaug briefly about this at moz summit and he seemed to think it would be relatively easy to do.

Note that to be useful, we would need events for each frame sent or received. A key use case is:

"As the developer of Socket.IO, I want to be able to create an extension that provides a protocol parser for the Socket.IO protocol". 

As I understand it, in order to support protocol parsers we need each frame, not just each message?
Well, we don't support that level of frame detail for HTTP, which doesn't mean we can't do it for WebSockets. In HTTP we provide the already parsed headers and delimited body; this correlates more closely to the messages in WebSockets. I'd be a little concerned about multiple sets of parsers running around. If you want to see the wire directly, then go to Wireshark, but if you want to see what Firefox thinks is on the wire, then devtools is totally the way to go, right?

That being said, I think the Socket.IO request might really just want to see the messages. In WebSocket speak there is a sub-protocol, which is really just shorthand for the meaning of the messages; it doesn't involve the lower-level framing information.

In any event - I added this bug to a couple of "necko work backlog" lists.
That's a good point, and to be honest the frame requirement may be spurious. Chrome does frames (or claims to) but they're missing any extensibility in terms of handling custom protocols. Maybe the requirement should be: let's do messages and see if we really need frames.
Here's a link showing what the WS frame inspection capability looks like in Chrome: http://blog.kaazing.com/2012/05/09/inspecting-websocket-traffic-with-chrome-developer-tools/. What would be more interesting is to actually be able to write custom parsers that could be integrated into our dev tools, look at the frames, and decode the protocol running on top of the frames. So if I were interested in Socket.IO, I could write a Socket.IO parser, and someone writing a Socket.IO client would have a much, much easier time debugging their client.

The reason you would need the extra parser is that in a protocol running on top of webSocket, a message payload could span multiple frames.
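To illustrate the point about payloads spanning frames, here is a rough sketch of how a tool could group data frames back into messages while remembering which frames contributed to each one. It assumes frames have already been parsed into (fin, opcode, payload) tuples; the Message class and group_frames function are hypothetical names, not an existing API:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CONTINUATION, TEXT, BINARY = 0x0, 0x1, 0x2


@dataclass
class Message:
    opcode: int               # opcode of the first frame (text or binary)
    payload: bytes            # concatenated payload of all fragments
    frame_indexes: List[int]  # positions of the contributing frames


def group_frames(frames: List[Tuple[bool, int, bytes]]) -> List[Message]:
    """Group (fin, opcode, payload) data frames into complete messages."""
    messages: List[Message] = []
    current: Optional[Message] = None
    for index, (fin, opcode, payload) in enumerate(frames):
        if opcode & 0x8:
            # Control frames may be interleaved between fragments and
            # never belong to a data message, so they are skipped here.
            continue
        if opcode == CONTINUATION:
            if current is None:
                raise ValueError("continuation frame without a first frame")
            current.payload += payload
            current.frame_indexes.append(index)
        else:
            if current is not None:
                raise ValueError("new message started before FIN was seen")
            current = Message(opcode, payload, [index])
        if fin:
            messages.append(current)
            current = None
    return messages
```

A frame list view could then use frame_indexes to visually bracket the frames that belong to one message.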

Doing a separate Socket.IO inspector would be difficult since the Socket.IO transport can change. This US should focus on just the ability to inspect WS frames; providing an API for a custom inspector could be a separate bug/US.
Blocks: perf-kanban
Whiteboard: [US]
Summary: As a devtools developer I should be able to observe in-content WebSocket frames → observe in-content WebSocket frames
Obviously I added the wrong US to the perf management stack. Will add 885508 instead.
No longer blocks: perf-kanban
Whiteboard: [US] → [User Story]
Correct me if I'm wrong but it looks like this is a dupe of 885508.  That one is a newer bug but also digs more into the actual fix.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Whoops--I meant to dupe to bug 977858, which is a way to get messages (but not frames).

I'll leave this bug open for getting frames.  But I'm not clear from the comments here if we need that for an initial implementation?
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
User Story: (updated)
We need to be clear about a distinction that needs to be made and why frame inspection is important.

https://tools.ietf.org/html/rfc6455 describes the WebSocket protocol itself, while the application-layer WebSocket interface, implemented in the browser as a set of JS APIs, is described by http://www.w3.org/TR/2011/CR-websockets-20111208/#the-websocket-interface. You cannot provide a network picture by looking at the application interface. The inspector should be providing data on the network layer; this is critical and should be part of the minimum viable product.

There are six types of frames defined in the baseline WS transport layer, see https://tools.ietf.org/html/rfc6455. 

As per the RFC:
  "A frame has an associated type.  Each frame belonging to the same
   message contains the same type of data.  Broadly speaking, there are
   types for textual data (which is interpreted as UTF-8 [RFC3629]
   text), binary data (whose interpretation is left up to the
   application), and control frames (which are not intended to carry
   data for the application but instead for protocol-level signaling,
   such as to signal that the connection should be closed).  This
   version of the protocol defines six frame types and leaves ten
   reserved for future use."
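For reference, the six defined frame types and the ten reserved values follow directly from the 4-bit opcode field. A small sketch of how an inspector might label them (the OPCODE_NAMES table and describe_opcode helper are illustrative names only):

```python
# Opcodes defined by RFC 6455; every other 4-bit value is reserved.
OPCODE_NAMES = {
    0x0: "continuation",
    0x1: "text",
    0x2: "binary",
    0x8: "close",
    0x9: "ping",
    0xA: "pong",
}


def describe_opcode(opcode: int) -> str:
    """Return a label such as 'ping (control)' for a 4-bit opcode."""
    if not 0 <= opcode <= 0xF:
        raise ValueError("opcode must fit in 4 bits")
    # The high bit of the opcode separates control frames (0x8-0xF)
    # from non-control frames (0x0-0x7).
    kind = "control" if opcode & 0x8 else "non-control"
    name = OPCODE_NAMES.get(opcode, "reserved")
    return f"{name} ({kind})"
```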

The browser WS interface implementation (http://www.w3.org/TR/2011/CR-websockets-20111208/#the-websocket-interface) should return a message via the onmessage event [for example ws.onmessage = function (event) {], and not an individual frame. The challenge with just providing message data and not frame data is that the network could still be active and be receiving individual frames without the onmessage event being fired at the interface level. Therefore, showing just WS messages seems inappropriate in a network inspector that handles websocket. 

A developer should be able to inspect individual WS frames, and should also be able to see the corresponding messages. 

An individual WebSocket frame, as defined by the RFC 6455 base framing, has a maximum payload size of 2^63 - 1 (9,223,372,036,854,775,807) bytes: the extended payload length is a 64-bit unsigned value whose most significant bit must be 0. However, a WebSocket message can be made up of one or more frames and therefore has no limit imposed on it at the protocol level.

Each WebSocket implementation will handle message and frame limits differently; the design constraints include setting maximum message sizes for memory management. An alternative is to provide a streaming option for large messages. So if a server handles this incorrectly and our tools don't provide a way of seeing issues with incorrect frames, a developer will never know where the problem is coming from.

Andrea, can you chime in on how Firefox handles frames and if and how they are exposed? I would think they would need to be available if one implements a protocol handler that sits on top of websocket. 

We also need to determine some performance tradeoffs between how much we expose and the impact this has on the application and browser performance.
Flags: needinfo?(amarchesini)
User Story: (updated)
One more note: frames are NOT needed to write protocol parsers.
(Nit, we're not trying to implement http://www.w3.org/TR/2011/CR-websockets-20111208/#the-websocket-interface, which is _ancient_. NEVER ever look at http://www.w3.org/TR/* specs if there is anything newer available. 
https://html.spec.whatwg.org/multipage/comms.html#the-websocket-interface is the one we're trying to follow.)
(In reply to Axel Kratel from comment #8)
> One more note: frames are NOT needed to write protocol parsers.

What I get from this is, we don't need frames to provide developers with an initial, productive experience. Devs using websockets only think in terms of messages, and protocols are written on top of messages. Frames are an implementation detail that allows implementers flexibility in how they deliver messages.

Aside: the Chrome tools use the 'frames' label but it's BS, they show messages.
(In reply to Olli Pettay [:smaug] from comment #9)
> (Nit, we're not trying to implement
> http://www.w3.org/TR/2011/CR-websockets-20111208/#the-websocket-interface,
> which is _ancient_. NEVER ever look at http://www.w3.org/TR/* specs if there
> is anything newer available. 
> https://html.spec.whatwg.org/multipage/comms.html#the-websocket-interface is
> the one we're trying to follow.)

The WS interface isn't relevant for this implementation, only the RFC which describes websocket.
(In reply to Jeff Griffiths (:canuckistani) from comment #10)
> (In reply to Axel Kratel from comment #8)
> > One more note: frames are NOT needed to write protocol parsers.
> 
> What I get from this is, we don't need frames to provide developers with an
> initial, productive experience. Devs using websockets only think in terms of
> messages, and protocols are written on top of messages. Frames are an
> implementation detail that allows implementers flexibility in how they
> deliver messages.
> 
> Aside: the Chrome tools use the 'frames' label but it's BS, they show
> messages.

We don't need to instrument the WS API. That's already available to developers: you can set breakpoints on the WS API events. Adding a UI that does the same is not interesting. A network panel needs to instrument the network, not the WS API, and then we need to provide a way to correlate the results back with WS API events. The presentation of the frames is what this bug/user story is for. I will log a separate user story that will provide the requirements for linking the WS APIs back to frames. For example, there are no network frames for an error, but the WS API implementation can throw an error. If it does, the developer needs to be able to correlate the error back to the actual frame that caused the error. Similarly, if a payload is split over separate frames, then a developer should be able to navigate from the ws.onmessage event to the frames that contained the payload.
Basically we have 2 objects: WebSocket (the exposed API) and the WebSocketChannel (necko channel).
The communication between these 2 components is managed by the nsIWebSocketListener interface. This is used to hide all the complexity of the frames and the WebSocket protocol inside necko and to expose an easier-to-manage API with methods such as onMessageAvailable() and/or onBinaryMessageAvailable().

If we want to expose the frames, we must decide to whom. If this is for devtools, maybe we can implement an nsIWebSocketFrameListener interface on the WebSocket objects or extend nsIWebSocketListener.

I would definitely not have such an interface and frame notification constantly enabled, but only when devtools require it. This matters in particular for e10s, where we have WebSocketChannel + WebSocketChannelParent in the parent process sending messages to WebSocketChannelChild in the child process.
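The opt-in idea can be sketched in a language-neutral way. The following Python is only an illustration of the pattern, not a proposal for the actual interface: ChannelSketch, add_frame_listener, remove_frame_listener and on_frame are all made-up names, and the real design would live in necko and XPCOM:

```python
from typing import Callable, List

# A listener receives the direction ("sent" or "received") and raw bytes.
FrameListener = Callable[[str, bytes], None]


class ChannelSketch:
    """Hypothetical stand-in for a channel that can report frames."""

    def __init__(self) -> None:
        self._frame_listeners: List[FrameListener] = []

    def add_frame_listener(self, listener: FrameListener) -> None:
        self._frame_listeners.append(listener)

    def remove_frame_listener(self, listener: FrameListener) -> None:
        self._frame_listeners.remove(listener)

    def on_frame(self, direction: str, raw: bytes) -> None:
        # Fast path: when no tool has registered, no frame notification
        # is built or dispatched at all.
        if not self._frame_listeners:
            return
        for listener in list(self._frame_listeners):
            listener(direction, raw)
```

With this shape, the per-frame cost is a single emptiness check unless a tool has explicitly subscribed.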
Flags: needinfo?(amarchesini)
Let me be 100% clear on this: a minimum viable product MUST instrument frames and not the WS API. The WS API is not the network, it's the application layer that's implemented on top of the network. It is totally redundant to instrument the API, that's basically the same thing as putting a pretty UI on the API. Anything that just instruments the API can be done in the debugger itself, it's a total waste of time to implement. A developer for example will never get any useful data that will tell him why he got an error from the API if he cannot look at the actual data going over the wire.
(In reply to Andrea Marchesini (:baku) from comment #13)
> Basically we have 2 objects: WebSocket (the exposed API) and the
> WebSocketChannel (necko channel).
> The communication between these 2 components is managed by the
> nsIWebSocketListener interface. This is used to hide all the complexity of
> the frames and the WebSocket protocol into necko and to expose a more
> easy-to-manage API with methods such as onMessageAvailable() and/or
> onBinaryMessageAvailable().
> 
> If we want to expose the frames, we must decide to 'who'. If this is for
> devtools, maybe we can implement a nsIWebSocketFrameListener interface to
> the WebSocket objects or extend nsIWebSocketListener.
> 
> Definitely I would not have such interface and frame-notification constantly
> enabled but only when devtools require it. In particular for e10s where we
> have WebSocketChannel + WebSocketChannelParent in the parent process sending
> messages to WebSocketChannelChild in the child process.

Yes, we do not want to expose the API to browser apps, only to the developer tools, i.e. at chrome level. We need to see the frames, and we need to be able to correlate the frames to WS API calls as well. 

We basically need to be able to present all the frames in the network inspector, and then overlay the API events on top. For control frames the correlation is obvious, but we need to provide the correlation for error events and to be able to show which frames belong to a single message send or receive.
Summary: observe in-content WebSocket frames → US: Inspect WebSocket frames
User Story: (updated)
Axel and I had a long discussion about this today, and he's correct:

(In reply to Jeff Griffiths (:canuckistani) from comment #10)
> (In reply to Axel Kratel from comment #8)
> > One more note: frames are NOT needed to write protocol parsers.
> 
> What I get from this is, we don't need frames to provide developers with an
> initial, productive experience. Devs using websockets only think in terms of
> messages, and protocols are written on top of messages. Frames are an
> implementation detail that allows implementers flexibility in how they
> deliver messages.

What frames do get us that is interesting to developers:
* control frames
* if we correlate frames to the messages they produce, interesting network-level metadata about websocket traffic
* more context for websocket errors. If the server sends garbage, the websocket client API just aborts with a generic error and hides the garbage frames from the developer.

> Aside: the Chrome tools use the 'frames' label but it's BS, they show
> messages.

Actually it's still unclear to me what they're doing, the tool is really light on detail.
To emphasize some of Jeff's points:

I want to point out that for 95% of users, a frame is going to be the same as a message, because the maximum frame size is 2^63 - 1 octets, which is bigger than most messages will ever be. Someone messed with this and came up with a max of 80MB; see http://stackoverflow.com/questions/13010354/chunking-websocket-transmission. 

This implies that for the most general case, when a developer looks at a frames-capable network inspector, they will see their messages along with the metadata from the transport. A well designed frames viewer isn't going to obfuscate the messaging itself. And the added benefit is that a developer gets to see what data is actually going over the wire, not what the application API is interpreting. That way, if malformed data is passed through, the developer can see it. 

In the network inspector, we can go the extra length and highlight the frames in such a way that the data frames are clearly labeled and easy to distinguish from the control frames. We can also provide a way to hide ANY of the metadata, so all the control frames get hidden and the data frames don't have any visible metadata. It should not be hard to visualize all this. Since we plan on providing decoders that parse protocols so that they can be visualized, it makes sense to make the first visualization about the basic message dialogue.

Control frames are the other awesome bonus of this approach: there are several scenarios that involve them. The most useful and obvious one is visibility into why a WS connection is closed. The Close control frame is critical for people working on apps that should be able to properly handle intermittent connections. A developer needs to be able to distinguish between a connection closed due to a Close control frame and one closed by a network interruption. 
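As a rough illustration of what an inspector could surface here: per RFC 6455, the body of a Close frame is optional, and when present it starts with a 2-byte status code in network byte order followed by a UTF-8 reason. A minimal sketch, where parse_close_payload is a hypothetical helper name:

```python
import struct
from typing import Optional, Tuple


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Split an unmasked Close frame payload into (status code, reason)."""
    if len(payload) == 0:
        # A Close frame may carry no body at all.
        return None, ""
    if len(payload) == 1:
        raise ValueError("a Close body must hold at least a 2-byte code")
    (code,) = struct.unpack_from("!H", payload, 0)
    reason = payload[2:].decode("utf-8")
    return code, reason
```

If the connection drops without any Close frame being seen, that absence is itself the signal of a network interruption rather than an orderly close.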

There are often scenarios where some error handling isn't done right, and the visibility into the control frames is critical. Ping and Pong frames do not require a WS interface access point according to the spec. Not sure if our implementation provides access to those; I don't believe it does, but it would be nice for a developer who is working on a server implementation to be able to see if the client responds with a pong when it receives a ping. Again, this is important to debug why a server might decide to disconnect. 

Last but not least, the error codes returned by the API aren't going to offer much insight if you can't see the network frames that might have caused the error.
Alias: WSFrameInspection
Depends on: 1203802
Whiteboard: [User Story] → [User Story][necko-backlog]
FWIW we've developed a devtools extension for WS frame inspection.
https://github.com/firebug/websocket-monitor/wiki

Honza
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Is this still not possible in Firefox?
(In reply to Martin Lundberg from comment #21)
> Is this still not possible in Firefox?
Not yet, but it's on our TODO list to refactor the existing extension [1]
and build it on top of the WebExtensions API.

Honza

[1] https://github.com/firebug/websocket-monitor/wiki
Honza, let me know if you need something more than what we currently have for inspecting websocket.
And CC me to the webextension API bugs. Thanks!
If there is such a bug, please add it to the blocking list of this bug.
(In reply to Andrea Marchesini [:baku] from comment #23)
> Honza, let me know if you need something more than what we currently have
> for inspecting websocket.
> And CC me to the webextension API bugs. Thanks!
There is no such bug yet, mainly because there is no WebExtensions API for monitoring WS in Chrome and it isn't clear what one should look like. I worked on a WebExtension experiment (prototype) last year [1], but more work is needed to make a new-API proposal for the Add-ons team.

I'll make sure you are CC-ed as soon as such a bug exists, thanks!

Honza

[1] https://github.com/janodvarko/webext-websocket-monitor
Status: REOPENED → RESOLVED
Closed: 10 years ago → 5 years ago
Resolution: --- → DUPLICATE
Summary: US: Inspect WebSocket frames → Inspect WebSocket frames