RSS

HTML5 WebSocket Liberates Comet From Hacks

23 Sep

A recent set of HTML5 discussions are changing the course of Comet. First, a recap of the last two years of Comet: With long-polling we set the bar to cross-browser push. With XHR streaming and ActiveXObject(’htmlfile’) we raised it to cross-browser streaming. With SSE we’ve been trying to raise the bar to native, cross-browser streaming. And there we’ve sat, hoping that browser vendors actually implement the latest SSE spec.

I say we’ve been selling ourselves short. All this time pushing for a native server->client streaming transport, but we still lack client->server streaming, and anything resembling a standard transport for bi-directional communication. The Holy Grail of Comet development has always been native browser support of a full-duplex, single-connection communication’s channel, otherwise known as a TCP socket. But we’ve been mired down in hacks so long that we’ve lost the vision.

No Longer. The HTML5 specification now offers WebSocket, a full-duplex communications channel that operates over a single socket. I have been listening closely, and in some cases contributing, to the process of ensuring that WebSocket will:

  • Seamlessly traverse firewalls and routers
  • Allow duly authorized cross-domain communication
  • Integrate well with cookie-based authentication
  • Integrate with existing HTTP load balancers
  • Be compatible with binary data

The API of WebSocket is very straightforward. You create a WebSocket and point it at a url:

var conn = new WebSocket("ws://www.example.com/livedemo")

Then you attach three callbacks:

conn.onopen = function(evt) { alert("Conn opened"); }
conn.onread = function(evt) { alert("Read: " + evt.data); }
conn.onclose = function(evt) { alert("Conn closed"); }

And finally, you can send upstream data with a simple function call:

conn.send("Hello World")

The browser will perform an HTTP handshake with the target web server to determine support, and then a direct stream will be exposed via the onread and send functions. The uri scheme “ws://” is used for WebSocket connections, and the “wss://” URI scheme is for secure WebSocket connections.

After the handshake, bi-directional framed communication ensues. Each frame can be either binary or text, thus allowing for swapping the encoding mid-stream. You can find more information about the protocol itself at the network section of the whatwg HTML5 draft page

While the HTML5 specification is not in a finalized stage, the first public draft was published by the W3C in January 2008, and browser vendors have already began targeting features in the specification. The idea of putting a duplex channel into the spec is not a new one; the TCPConnection API and protocol was initially drafted more than two years ago. Unfortunately there were many significant problems with TCPConnection that held back browser adoption. Ian Hickson, the editor of the HTML5 specification, tackled these problems head on and under his guidance the standard has evolved to a usable state with WebSocket. About this new feature, Mr. Hickson comments:

“I’m looking forward to seeing Web Socket implemented in browsers, as I think it’s going to enable all kinds of realtime applications like chatting, remote controls, and the like, without the ridiculous hacks authors have to use today.”

Comet

WebSocket in a browser is terrific because it drastically cuts down the complexity of the Comet server by an order of magnitude. What’s more though, it provides a straightforward, understandable API to JavaScript developers. The most important part of the specification is that developers can wrap their heads around the API in about five seconds. That’s because it looks so much like a socket.

If the future prospect of a native WebSocket isn’t enough good news, I am proud to announce that the Orbited project has implemented WebSocket for all major browsers, today. We do this by communicating over various Comet transports with the browsers, then performing the WebSocket handshake with the remote server, and proxying data in between. This means that today you can write a WebSocket server and application, start Orbited up, and be on your way. Tomorrow, you won’t need to change any of your server or client code whatsoever. Your application will fall forward to the native implementation of WebSocket for improved performance.

TCPSocket

The single most voiced criticism to this specification has been that a WebSocket isn’t quite the same as a raw TCP socket, because a WebSocket server needs to understand a specific handshake in order for browsers to connect directly, and as such a WebSocket can’t connect to existing servers. If we did allow raw TCP sockets in the browser, a malicious site could cause any visitors to open up a TCP connection to an SMTP server, for instance, turning a casual web visitor into a spambot. There are many variations on this scenario, but the general problem is that a raw socket connections in a browser will allow any sites that a user visits to access network services as if they were the user, in the same network security context as the user. We need to therefore make this an opt-in process or we’ll catch existing servers off-guard. Furthermore, very few protocols have any kind of cross-domain authorization or security mechanisms built in. If we were to allow raw TCP, then we would be opening all manner of cross-site security holes. We could fix these by limiting TCP connections to the origin domain and port, (meaning a direct sockets back to the webserver only) but that would limit any usefulness the TCP socket could provide.

I fully understand the criticism though; Earlier this week I discussed exactly why having a raw socket in the browser is so desirable. You could, for instance, quickly prototype a Gmail clone using a raw socket, an IMAP client, and an XMPP client in the browser.

We have a clear problem then: Direct access to existing network servers could greatly simplify application architecture, but due to security restrictions it’s a non-starter; we absolutely must retrofit each network server with the new WebSocket protocol first. I hope that happens, but we can’t count on it, at least not right away. What we really need is a way to allow the server to opt-in without putting it in the protocol, a way to seamlessly layer access control in front of the back-end server. It turns out that this problem has already been solved for traditional networks. That is, if we have two end-points communicating over TCP, and we need transparent access control in between, then we can use a well known device: A firewall. The beauty of a firewall is that server behind it requires no re-programming, or even re-configuration, yet gains all of the access control/security benefits. What we really need in the browser case, is a custom firewall that can listen for WebSocket connections from the browser, enforce access control, and relay TCP to a back-end server.

That is why Orbited provides this feature under the API name TCPSocket. Orbited is the firewall that sits between the back-end server and the browser. It understands WebSocket protocol for browser communication, and uses whitelist security to accept or reject requests to proxy TCP data to and from a back-end server. That’s right, you can fire up a stock XMPP server, and Orbited, and write the XMPP client entirely in JavaScript. This works cross-browser today. We also offer a binary mode that uses an intermediate encoding to allow the browser to read raw bytes (in the form of JavaScript integer arrays) from a remote server. Here is a diagram of the architecture:

The Future

Now its up to the browser developers to implement Websocket. I expect some will be very quick on the uptake, while it may take years for others. I expect to see a common pattern emerge where application servers listen to the WebSocket protocol directly from new browsers, but fallback to Orbited’s emulation layer for legacy browsers. The key here is that we don’t have to wait on the browser vendors to get started. We can all develop these WebSocket applications now, and when browsers have native support, we’ll all get a performance boost.

We will probably never get a native (raw) TCP socket in the browser, for the security reasons I outlined. It’s okay though — we can use the firewall pattern I outlined. For more information about installing and configuring Orbited, check out the documentation and the getting started section.

Advertisements
 
Leave a comment

Posted by on September 23, 2008 in WEBSOCKET

 

Tags: , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: