I think his analysis is flawed. WebSocket is a message-based protocol that does not specify a maximum message size in the RFC. That alone does not make it a streaming protocol; it only becomes one if an implementation decides to deliver incomplete messages to the end application. Some implementations have done this; many (including all browsers) have not and will not.
Time and time again it has been demonstrated that we are bad at choosing a maximum allowed value for all applications and all future considerations (see: ethernet frame sizes, IP address lengths, operating system address spaces, file system block sizes/counts, etc).
In some cases (many of those previously listed) there were hardware, cost, or technical concerns that led to nailing down a number in an RFC. For WebSocket there is no clear benefit to forever encoding a specific numeric maximum message size. It is a high enough level protocol that there is no technical or cost benefit to make message sizes limited by anything other than individual application needs.
As such, the WebSocket RFC leaves maximum message size implementation defined, and specifically says that an implementation SHOULD implement a reasonable maximum message size for its purpose. A chat application that knows it will only be moving small text messages can set its maximum message threshold small to improve buffer performance and catch invalid messages sooner. An application that finds a business case for sending a large file in one large message can set itself up accordingly. Generic WebSocket parsers should expose a method of setting the maximum message size the application wishes to receive.
I definitely agree that not requiring implementations to return their maximum message size along with the "Message too big" error will make some sorts of interoperability more difficult. However, it also avoids exposing implementation security details and simplifies the core spec (the author has already complained that the spec is too complicated). It is relatively simple for an application to negotiate a maximum message size privately if necessary, and the WebSocket extension mechanism provides a way to standardize doing so if this turns out to be a serious issue in the future.
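To make the application-chosen limit concrete, here is a minimal sketch in Python. The class and method names are hypothetical (not any real library's API); the only spec-derived detail is close status 1009, RFC 6455's "Message Too Big" code. The point is that a receiver can fail fast on its own cap rather than buffering an oversized message to completion.

```python
MESSAGE_TOO_BIG = 1009  # RFC 6455 close status code for "Message Too Big"

class MessageTooBig(Exception):
    """Signals that the connection should be closed with status 1009."""

class BoundedAssembler:
    """Hypothetical receiver enforcing an application-chosen message cap."""

    def __init__(self, max_message_size):
        self.max_message_size = max_message_size
        self._buf = bytearray()

    def feed_frame(self, payload, fin):
        # Fail fast: reject as soon as the running total exceeds the cap,
        # instead of buffering the whole message first.
        if len(self._buf) + len(payload) > self.max_message_size:
            raise MessageTooBig(MESSAGE_TOO_BIG)
        self._buf.extend(payload)
        if fin:
            message, self._buf = bytes(self._buf), bytearray()
            return message
        return None
```

A chat server might construct this with a few kilobytes; a file-transfer endpoint with gigabytes. The cap is the application's call, exactly as the RFC intends.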
I've no problem with the lack of a max message size in the RFC, what could cause problems is the fact that it needs to be passed between client and server "out of band", i.e. at the application protocol layer rather than at the websocket protocol layer. Also bear in mind that this blog entry was written based on Draft HyBi 09 and not the final RFC; the wording has changed somewhat since then.
The draft in question suggested that providing a message based interface to application code was possible and that the parser could/should deliver only complete messages to the application code. That's hard to do if you also want to allow for the 'endless streaming' scenario that others on the working group were fond of. The result was a bit of a mess.
The final RFC addresses some of this, but there's no getting around the fact that the websocket protocol itself can't tell you how big a message is until you get the final frame.
Sure you can work around all of this even for a generic parser but the initial wording in the draft in question could lead you towards the wrong design if you're not careful.
The WebSocket protocol design was hijacked by architecture astronauts who decided that it must have all of these extra features added, instead of remaining a simple, easily implementable and understandable protocol. The original WebSocket protocol was a simple stream of delimited messages, with the only complexity being in the handshake that was necessary to ensure that JavaScript apps couldn't send arbitrary data to arbitrary ports without permission.
The problem is that the original handshake wasn't good enough (there were still security vulnerabilities despite the handshake), and when Ian Hickson decided to hand over control to the IETF, the architecture astronauts took over, adding complex framing with six different frame types, subprotocols, extensions, versions, complex bit twiddling required to parse frame headers, fragmentation of messages into smaller frames (which is what this article is complaining about), control frames interleaved with fragmented messages, numeric status codes and textual close reason strings that "MUST NOT" be shown to the user, masking of data by xor'ing with a random value that changes for each frame, but only for one direction (client->server), a two-way closing handshake on top of the existing TCP mechanisms for closing the connection, pings to test the connection for liveness, and so on. There are six registries defined for IANA to keep track of (http://www.iana.org/assignments/websocket/websocket.xml): extensions, subprotocols, version numbers, close codes, opcodes, and framing bits.
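For a sense of what that bit twiddling and masking actually look like, here is a sketch of parsing a single frame per RFC 6455. It's deliberately minimal: no fragmentation handling, no RSV-bit or error checks, just the header fields and the XOR unmasking that clients must apply.

```python
import struct

def parse_frame(data):
    """Parse one WebSocket frame from bytes; returns (opcode, payload).
    A minimal sketch: ignores RSV bits, fragmentation, and error handling."""
    opcode = data[0] & 0x0F          # low 4 bits of the first byte
    masked = bool(data[1] & 0x80)    # high bit of the second byte
    length = data[1] & 0x7F          # 7-bit length, or 126/127 escape values
    offset = 2
    if length == 126:                # 16-bit extended payload length
        (length,) = struct.unpack(">H", data[2:4])
        offset = 4
    elif length == 127:              # 64-bit extended length (MSB must be 0)
        (length,) = struct.unpack(">Q", data[2:10])
        offset = 10
    if masked:
        key = data[offset:offset + 4]
        offset += 4
        # Unmask by XOR'ing each payload byte with the rotating 4-byte key.
        payload = bytes(b ^ key[i % 4]
                        for i, b in enumerate(data[offset:offset + length]))
    else:
        payload = bytes(data[offset:offset + length])
    return opcode, payload
```

Feeding it RFC 6455's own example (a masked text frame carrying "Hello" with masking key 37 fa 21 3d) recovers opcode 0x1 and the original text.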
And despite all of this over-engineering and attempt at extensibility, all extensions must know about each other, because there is no standard method for delimiting different extensions' data (or even specifying how much data an extension uses), and there are three header bits and 10 frame types that all extensions must share. And I don't really know why there's a need for subprotocols on top of the ability to just encode that information in the URL.
It's kind of sad how what could have been a relatively simple and easy to implement protocol has been taken over by architecture astronauts. Yes, a few of these features are actually required to securely deploy websockets (the handshake and masking). Most of them are people making up features that would be nice in theory, instead of implementing something simple that works. Ian Hickson's original protocol wasn't perfect; it still needed some work by the time he left. But it was simple, and easy to implement, and didn't impose restrictions that couldn't be worked around at a higher level.
Thanks for an extremely illuminating explanation. I may be wrong, but it almost sounds like long polling and other alternatives are preferable to Websockets because of the added complexity. Stories like this also make me feel fortunate that my favorite beacon of simplicity, JSON, didn't get handed over to a "task force".
It seems to me that the hype around Web Sockets has overshadowed the Server Sent Events API (http://dev.w3.org/html5/eventsource/), which for most situations where you don't need a continuous stream of data is a more sensible system. It is purely a message sending system by design.
The really nice thing about SSE is that you can fall back to long polling very easily with exactly the same back end, and as it runs over vanilla http without the upgrade mechanism it is much easier to implement: you just don't close the connection after sending a message. Obviously it's only one way, but we have a well established way of sending messages in the other direction with http POST.
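The simplicity claim is easy to demonstrate: the SSE wire format is just line-oriented text written into a held-open HTTP response. A minimal formatter (illustrative only, covering the common `event`/`id`/`data` fields rather than the full spec):

```python
def sse_event(data, event=None, id=None):
    """Format one Server-Sent Events message for a text/event-stream response."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if id is not None:
        lines.append(f"id: {id}")
    # Each line of the payload gets its own "data:" field; the client
    # rejoins them with newlines.
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event
```

The long-polling fallback really is just "write one of these and close the connection" versus "write one and keep the connection open".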
Of course, the limitation of SSE is that it's not bidirectional - as the name implies, only the server can send messages to the client. SSE is definitely useful, especially when you don't need the full overhead of a WebSockets server, but the "hype around WebSockets" is largely because you can implement two-way connections with it. Yesterday there was actually an HN post where someone created a full TCP proxy using WebSockets.
Agree, Server Sent Events is an incredibly simple API, has none of the problems with proxies, security and complexity that websockets has run into and is perfect for the typical case where Ajax is fine for sending events to the server, but you need a way to stream events in the other direction.
I just launched the beta of a small service for making Server Sent Events painless at http://eventsourcehq.com
It's currently only available for members of Heroku's beta program, but I plan on launching it as a stand alone service as well. In any case, all the code behind it is available on GitHub...
This is some screwed up stuff, as nearly every WebSocket library and tutorial really encourages treating them as discrete messages. This should be fixed post haste, because very few people really want a stream based protocol for web sockets.
I don't think it'll be a problem in practice. Most implementations' primary APIs will be message based and not intended for this streaming case. I'm planning on refactoring my implementation to have a low level streaming API that's used internally and is exposed if you really need it, and on top of that build the message based API that 99.5% of people will use.
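That layering can be sketched in a few lines. All names here are hypothetical, not the commenter's actual implementation: a low-level streaming API surfaces each fragment as it arrives, and the message-based API most users want is built on top by buffering until the final fragment.

```python
class StreamingReceiver:
    """Low-level API: invokes on_chunk(payload, fin) for every fragment."""

    def __init__(self, on_chunk):
        self.on_chunk = on_chunk

    def receive_frame(self, payload, fin):
        self.on_chunk(payload, fin)

class MessageReceiver(StreamingReceiver):
    """Message API layered on top: buffers fragments, delivers whole messages."""

    def __init__(self, on_message):
        super().__init__(self._collect)
        self.on_message = on_message
        self._parts = []

    def _collect(self, chunk, fin):
        self._parts.append(chunk)
        if fin:  # final fragment: hand the reassembled message to the app
            self.on_message(b"".join(self._parts))
            self._parts = []
```

Applications that genuinely need streaming use `StreamingReceiver` directly; everyone else sees only complete messages.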
Hey, this article is better than a rant, if you don't know much about what's going on with the RFC, but are using websockets anyway. :D
Everything you send up or down is a message, or a packet, and the size of that cannot exceed the size of the pipe (with bottlenecks or intermediary restrictions). Call 'em Quantum Packets, or a stream, or a message. The websocket protocol, as imagined by this developer, is meant to allow a continuous lot of Quantum Packets to "flow", without the application level overhead of parsing a bunch of protocol, headers, wackness. I want to get the data into my applications AFAP, cuz I still have to transcode it, analyze it, and all else to make the baby dance.
What we need as developers are minimum-for-reliability standards. No two people in different locations will have the same pipe. As a developer, I consider it my domain to write software on top of, or using, the socket layer to determine the potential throughput of the given socket, and to test such as needed throughout the simulcast. I don't even want the socket-layer-wrapper writers (may God shower them with blessings) intervening at this level, until everybody on Earth has unrestricted 10 Mbps up and down.
If that control is hidden from me, or not an option, or is nullified by protocol, then my app or media could break in ways I could not predict or understand, and so I would have to design my app using the socket layer in a lowest-common-reliability kind of way.
These are not the opinions of a WebSocket RFC acquainted developer.
The WebSocket protocol works for both small and large messages in a single frame (message based), and also small and large frames in multiple fragments (stream based)... It's capable of being used for both. It's a good idea to restrict frame sizes in your application if you know what your limits are.
It's only capable of being used for messages if there's something to guarantee that all hops along the way are going to preserve the message boundaries in ways expected by the application layer. Can the protocol split single messages? Can the protocol merge adjacent messages without reordering?
Unless the protocol specifically guarantees certain behavior and commonly-used systems regularly exercise this guarantee, it's just not going to work reliably when it's needed.
Hearing some of the "works for me" discussion from developers suggests that we're heading for that magic situation where it works 99.9% of the time. I.e., the system looks fine in testing and then fails in mysterious ways (that require deep protocol fixes) in production.
Ideally, implementations of such a protocol would intentionally fragment the messages somewhat if they were not going to guarantee they were atomic. But there are very few developers (and code reviewing managers) enlightened enough to let that kind of thing ship.
Since this post seems to actually have been made in July of this year, does anyone who has been following WebSocket details have any comments on how this situation has changed?
My impression of WebSockets is that it's not actually a "finished" high level protocol. They could have just brought a basic socket style interface into JavaScript and left it at that. (And based on its name, that's what you'd expect at first.) But they decided to add various features, (for better or worse, I don't know yet) on top of that. (I guess part of it is the challenge of working not just on TCP, but sort of within HTTP as well). Just as you wouldn't just pick up TCP and start blowing "data" through it without some additional application specific structure, you're going to need to add your own structure inside WebSocket's framework.
The wording was improved around the suggestion to provide only a message based API.
I think the WebSockets protocol ended up being a little more than it should have been. You have to understand that it was being pulled in all sorts of directions by the working group members and that there are good reasons for all of the parts of the protocol (though some of those parts could work better with other parts IMHO). It had to be finished at some point though and I think the working group did a good job in the end.
Personally I think it would have been better had it been explicitly stream based from a user's perspective, but then I don't have the javascript/browser background to know how foolish that probably sounds.
You don't sound smart for mocking the idea that an 8-exabyte message in a communication protocol is "big enough"; you sound like you're mindlessly parroting ideas you don't fully understand. Yes, 8 exabytes is enough for a single message, and always will be. TCP works on "messages" (packets) in the kilobyte range, for comparison. Communication protocol packet sizes aren't equal to the amount of data the communication protocol can send.
The argument FOR 63-bit message sizes was that you could effectively turn the message based protocol into a stream, except, unfortunately, the "stream" has a limit even if it seems plenty big enough now.
Personally I wouldn't have included the 63-bit message size.
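For context on where that 63-bit figure comes from: RFC 6455 encodes payload lengths in one of three forms, and the most significant bit of the 8-byte form must be zero, giving the 2^63 - 1 byte (~8 EiB) ceiling being debated here. A sketch of the encoding:

```python
import struct

def encode_length(n):
    """Encode a WebSocket payload length per RFC 6455.
    0..125 fit in the 7-bit field; 126 escapes to a 16-bit length;
    127 escapes to a 64-bit length whose top bit must be 0."""
    if n < 126:
        return bytes([n])
    if n < (1 << 16):
        return bytes([126]) + struct.pack(">H", n)
    assert n < (1 << 63), "payload length exceeds protocol maximum"
    return bytes([127]) + struct.pack(">Q", n)
```

So a 5-byte message costs one length byte, a 1000-byte message costs three, and anything over 64 KiB costs nine.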
Out of curiosity, and forgive my ignorance here, but since everyone seems to prefer using event-driven methods in JS, why was a message-based protocol passed over in favor of this stream solution?
It's elsewhere in the comments or the article: a feature of the protocol allows transmitting partial messages, where the message size is unknown. One example might be the result of a slow, unbuffered SQL query, where it's more useful for the server to pass the result to the client incrementally, rather than buffer the full message ahead of time.
Why you'd want to do that is another question entirely. Introducing roundtrips by feeding tiny chunks to TCP is generally a horrible idea, however, it does prevent the server from dedicating a potentially huge chunk of RAM to buffer the result ahead of time.
Because of this feature, and the author's desire to model this feature as part of some client library API (a mistake? you decide), he's concluded that it's in fact a stream-oriented protocol. That's like concluding it's a byte-oriented protocol because TCP can/will further fragment the partial frames due to segment size constraints, etc. (i.e. it's a silly conclusion).
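The sender side of that incremental delivery is simple to sketch. This helper is hypothetical (a real sender would stream one chunk ahead rather than materialize a list, which this sketch does only for brevity): the first fragment carries the real opcode, continuations use opcode 0x0, and only the last fragment sets the FIN bit.

```python
def fragment_message(chunks, opcode=0x2):
    """Split a message into WebSocket-style fragments.
    Returns a list of (fin, opcode, payload) tuples. Illustrative only:
    listing the chunks defeats the unknown-total-size point, but keeps
    the FIN lookahead trivial."""
    chunks = list(chunks)
    frames = []
    for i, chunk in enumerate(chunks):
        frames.append((
            i == len(chunks) - 1,          # FIN: set only on the last fragment
            opcode if i == 0 else 0x0,     # continuations use opcode 0x0
            chunk,
        ))
    return frames
```

This is exactly how a server could push slow SQL rows to the client as they arrive, at the cost of the small-write inefficiency noted above.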
Because generally people want to pass discrete messages rather than an undifferentiated stream of bytes. And APIs implement even streams as discrete chunks of data to process; you don't process a character at a time, or the whole stream at once, but are instead given a buffer of data to process. Now, if you do that, lazy programmers (and many JavaScript and web app programmers are lazy, or don't know any better) may just decide that one buffer is one message, an assumption which generally works. If the client sends a small chunk of data to the server, that will generally be received on the server and delivered to whatever is listening as that same chunk. But occasionally, that assumption can be broken; an IP packet can be fragmented, or a buffer may fill up and need to be dealt with immediately, or something of the sort. This will cause applications which make such assumptions to break.
If, instead, you just explicitly say that it's a message oriented protocol, then the software that implements it (both on the client and server side) can just provide an API that delivers a message at a time, and if anything happens to be fragmented, they deal with buffering and reassembling it, rather than depending on the application author to get that right.
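The reassembly burden described above is small. A generic length-prefixed framing layer (illustrative only; this is not WebSocket's actual wire format) shows the idea: message boundaries survive no matter how the underlying byte stream gets chunked, because the decoder, not the application, does the buffering.

```python
import struct

class LengthPrefixedDecoder:
    """Toy framing layer: each message is a 4-byte big-endian length
    followed by that many payload bytes."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Feed raw bytes from the stream; return complete messages, if any."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= 4:
            (n,) = struct.unpack(">I", self._buf[:4])
            if len(self._buf) < 4 + n:
                break  # message still incomplete; wait for more bytes
            messages.append(bytes(self._buf[4:4 + n]))
            del self._buf[:4 + n]
        return messages
```

Feed it the stream in any slicing, even three bytes at a time, and the same messages come out. The lazy "one buffer == one message" assumption breaks precisely because the transport doesn't promise this alignment.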