Asynchronous IO is hard!

23 Sep

The Tomcat 6 developers have proposed an asynchronous IO extension as their solution for Comet and Ajax push. I have long argued that asynchronous handling is needed for servlets (for Comet and other use cases), but that it is very important to distinguish between asynchronous IO and asynchronous handling of requests. Asynchronous programming is hard, and asynchronous IO is even harder. I maintain that asynchronous IO should be implemented by the container and that only asynchronous events should be delivered to an asynchronous servlet. As if to illustrate my point, the example code that Tomcat provides for its asynchronous IO contains some classic bugs and inefficiencies. I examine them here because they illustrate well why we should make every effort to encapsulate asynchronous IO below the level of the servlet API.

What to do with zero or more bytes?

The first error in the Tomcat example is that it does not properly handle the fact that an asynchronous read may not return all the bytes you need to handle the content. Content can arrive in small chunks because of a simple client, a slow network, a busy OS or a malicious attacker. The code that the Tomcat example uses to handle a read event is:

... if (event.getEventType() == CometEvent.EventType.READ) {
    InputStream is = request.getInputStream();
    byte[] buf = new byte[512];
    do {
        int n = is.read(buf); //can throw an IOException
        if (n > 0) {
            log("Read "+n+" bytes: " + new String(buf, 0, n)
            +" for session: "+request.getSession(true).getId());
        } ...
    } while (is.available() > 0);
}

The bug with this code is that it assumes a 1:1 mapping of bytes to characters and that any bytes read can be converted to a String. If the JVM is using UTF-8 as the default encoding, or the example is extended to handle character encodings explicitly, then a read may return only part of a multi-byte character. You can’t convert 2 bytes of a 3-byte Unicode character with new String(…)!
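A minimal sketch of the failure, assuming UTF-8 and using a hypothetical class and method name chosen only for illustration. It decodes a prefix of an encoded string, as a short read might deliver it. When the prefix ends in the middle of a character, new String(…) silently substitutes the replacement character U+FFFD instead of failing:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the Tomcat example.
class PartialCharDemo {
    // Decode only the first n bytes of the UTF-8 encoding of s,
    // simulating a read that returned fewer bytes than were sent.
    static String decodePrefix(String s, int n) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign: 3 bytes in UTF-8 (E2 82 AC)
        System.out.println("all 3 bytes round-trip: "
                + decodePrefix(euro, 3).equals(euro));
        System.out.println("first 2 bytes contain U+FFFD: "
                + decodePrefix(euro, 2).contains("\uFFFD"));
    }
}
```

The data loss is silent: no exception is thrown, so the logging code in the example would simply print garbage.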

So this seemingly simple example would need to become a lot more complex before it could be exposed to the real world. Real-world code would need to do something like:

  • parse the bytes to determine the boundary between content that can be handled and content that must be buffered waiting for more bytes to arrive.
  • persist unused bytes in a buffer
  • handle any of the bytes/characters that can be handled so as to free space in the buffer so a full buffer will not prevent the extra bytes required from being received.
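The steps above can be sketched with java.nio's CharsetDecoder, which leaves the bytes of an incomplete trailing character in the input buffer when told that more input may follow. This is only an illustration under stated assumptions: the class IncrementalUtf8 and its feed method are hypothetical names, the encoding is assumed to be UTF-8, and a fresh buffer is allocated on every call, which is exactly the kind of copying a container could avoid.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one instance would be needed per connection.
class IncrementalUtf8 {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // Unconsumed bytes of an incomplete character, persisted between read events.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    // Feed the first len bytes of chunk; returns only the characters that are complete so far.
    String feed(byte[] chunk, int len) {
        // Join the leftover bytes from the previous call with the new bytes.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + len);
        in.put(pending).put(chunk, 0, len);
        in.flip();

        // UTF-8 never yields more chars than it has bytes, so this is large enough.
        CharBuffer out = CharBuffer.allocate(in.remaining());
        // endOfInput=false: a truncated trailing sequence is left in 'in', not reported.
        CoderResult result = decoder.decode(in, out, false);
        if (result.isError()) {
            throw new IllegalArgumentException("malformed input: " + result);
        }

        pending = in; // its remaining bytes are the partial character, if any
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] all = "h\u20AC!".getBytes(StandardCharsets.UTF_8); // 68 E2 82 AC 21
        IncrementalUtf8 d = new IncrementalUtf8();
        // The first event ends in the middle of the euro sign.
        System.out.println(d.feed(new byte[] {all[0], all[1], all[2]}, 3));
        // The second event delivers the rest.
        System.out.println(d.feed(new byte[] {all[3], all[4]}, 2));
    }
}
```

Even this small sketch keeps per-connection state and copies every byte at least once more than the container already has.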

This is tricky code and the result will be horribly inefficient:

  • Decoding utf-8 is non-trivial
  • Many extra temporary byte buffers will be created
  • If there are many connected users, then there could be many additional buffers persisted between callbacks, consuming significant memory.
  • The container will have already buffered the content in its own efficient buffers, so the data is duplicated moving it to the temp byte buffer.
  • Data that is stored in efficient container buffers must be copied into user memory to be handled as a byte array.  If the content is destined for a file or another network connection, it would be better to let the operating system access the container’s efficient buffers directly and avoid user-space handling entirely.

Asynchronous IO is hard, and it is even harder to make it efficient. The flaw in this example lies both in the actual execution (not handling partial characters) and in the approach of expecting user-supplied code to deal with asynchronous IO in the first place. A far better approach, and the one that I advocate for Servlet 3.0, is to let the container handle asynchronous IO and data conversions. For example, if the application wants the request content as a String, then the container can perform the conversion efficiently without extra buffers or copies.

Did I write or should I go now?

The second bug in the Tomcat example is in the writing of the response content. It is unclear from the supporting text whether the writer is in blocking mode or not, but either way this code is buggy:

// Send any pending message on all the open connections
for (int i = 0; i < connections.size(); i++) {
    try {
        PrintWriter writer = connections.get(i).getWriter();
        for (int j = 0; j < pendingMessages.length; j++) {
            writer.println(pendingMessages[j] + "<br>");
        }
        writer.flush();
        ...
     } ...
}

If the underlying stream is in asynchronous non-blocking mode, then there is no guarantee that the messages will be written, and the PrintWriter.println method has no way to tell the caller that not all the content has been written. Of course, the horrid multi-byte character issue remains, as partial characters can be written and the unwritten bytes need to be buffered. Thus it is probably the case that the stream is in blocking mode, and the problem then becomes that, with a single thread writing the responses to all clients, one slow (or malicious) client can block that thread and prevent all other clients from receiving their messages. Even without the complexities of asynchronous writes, this example would need to be modified to dispatch threads to handle each client, with a thread pool to recycle those threads efficiently. But wait… isn’t that all part of the mechanisms provided by the servlet container? By not doing the work inside Servlet.service, the developer is going to have to reinvent quite a few wheels: buffering, dispatching, thread pools and so on.
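The point about println having no way to report a problem can be shown with a small sketch. PrintWriter's print methods never throw IOException; a failure is only recorded internally and has to be polled with checkError(). The FailingWriter class and the sendFailed method below are hypothetical stand-ins for a broken client connection, not Tomcat code:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

// Hypothetical demo class, not part of the Tomcat example.
class SilentWriteDemo {
    // Stand-in for a connection whose client has gone away: every write fails.
    static class FailingWriter extends Writer {
        @Override
        public void write(char[] cbuf, int off, int len) throws IOException {
            throw new IOException("client went away");
        }
        @Override
        public void flush() {}
        @Override
        public void close() {}
    }

    // Returns true if the message could not be written.
    static boolean sendFailed(String message) {
        PrintWriter writer = new PrintWriter(new FailingWriter());
        // Returns normally even though the underlying write threw.
        writer.println(message + "<br>");
        // checkError() flushes and reports whether any earlier call failed.
        return writer.checkError();
    }

    public static void main(String[] args) {
        System.out.println("send failed: " + sendFailed("hello"));
    }
}
```

A broadcast loop that never calls checkError() cannot tell a delivered message from a lost one, and checkError() reports only that something failed, not how much of the content was written.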

Conclusion

Tomcat has good asynchronous IO buffers, dispatching and thread pooling built into the container, yet when the experienced developers who wrote Tomcat came to write a simple example of using their IO API, they included some significant bugs (or completely oversimplified the real work that needs to be done). Asynchronous IO is hard, and it is harder to make efficient. It is simply not something that we want application or framework developers to have to deal with: if the container developers can’t get it right, what chance do other developers, not versed in the complexities, have? An extensible asynchronous IO API is a good thing to have in a container, but I think it is the wrong API to solve the use cases of Comet, Ajax push or any other asynchronous scheduling concerns that a framework developer may need to deal with.


Posted on September 23, 2008 in COMET

 
