Infallible APIs

Fellow Atlassian Charles Miller recently wrote an amusing post about Java methods and constructors that declare a checked exception even though the specification guarantees that certain calls to them cannot fail. A common example involves string encodings:

try {
    s = new String(byteArray, "UTF-8");
} catch (UnsupportedEncodingException e) {
    throw new Error("UTF-8 is missing??");
}

This code is the result of two conflicting factors. On the one hand, because the constructor accepts an arbitrary character encoding name, it must account for the case where that encoding is unavailable. On the other hand, the vast majority of code that calls this constructor explicitly names a character set that every Java Runtime Environment is required to provide, and that character set's absence would be an error serious enough to justify terminating the VM entirely.

The unnecessary exception-handling code is ugly, and obscures the actual intent of the method in which it appears. Charles jokingly proposes adding a “yoda” statement to Java to tell the JVM, “do, or do not; there is no try.”
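Short of a language change, one common workaround is to confine the "impossible" catch block to a single helper so that call sites stay clean. The class and method names below (`Charsets`, `decodeRequired`) are hypothetical, and the sketch assumes callers only ever pass the name of a charset the JRE is required to support, such as "UTF-8" or "US-ASCII".

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper class; not part of the JDK.
final class Charsets {
    private Charsets() {}

    // Decodes bytes with a charset the platform is required to provide.
    // Assumption: callers pass only required names such as "UTF-8" or
    // "US-ASCII", so a missing charset means the JRE itself is broken.
    static String decodeRequired(byte[] bytes, String requiredCharset) {
        try {
            return new String(bytes, requiredCharset);
        } catch (UnsupportedEncodingException e) {
            // "Do, or do not": treat the impossible case as an internal error.
            throw new Error("Required charset missing: " + requiredCharset, e);
        }
    }
}
```

A call site then shrinks to `String s = Charsets.decodeRequired(byteArray, "UTF-8");`. The catch block still exists, but it is written once instead of at every call.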

Another Solution

Usually, a code smell like this indicates a poorly designed API. If you know a method can be called in a way where failure could only mean an internal error, then that call shouldn't force you to handle a checked exception. As is often the case, you could solve this problem with stronger types:

public class Charset {
    public static Charset findByName(String charsetName)
        throws UnsupportedEncodingException {
        // implementation left as an exercise for the reader...
    }
    // ...
    public static class Standard {
        public static final Charset UTF_8 = //...
        public static final Charset US_ASCII = //...
        // etc.
    }
}

Then give String a new constructor:

    public String(byte[] bytes, Charset charset) {
        // no exception declared!
        //...
    }

Now you’ve got a few different ways to use this. When you know you want to use a built-in encoding:

    String s = new String(byteArray, Charset.Standard.UTF_8);
    // no checked exception here!

When you want to use a variable encoding that may or may not be available in this VM:

    Charset charset;
    try {
        charset = Charset.findByName(charsetName);
    } catch (UnsupportedEncodingException e) {
        System.err.println("Unknown charset: " + charsetName
            + "; falling back to US-ASCII");
        charset = Charset.Standard.US_ASCII;
    }
    String s = new String(byteArray, charset);

And the original String constructor could remain, rewritten on top of the new Charset facilities, as a convenience for the case where you really do want to look up an encoding by name and fail if it isn't present:

    String s = null;
    try {
        s = new String(byteArray, charsetName);
    } catch (UnsupportedEncodingException e) {
        // s stays null if the charset is unknown
        System.err.println("Unknown charset: " + charsetName
            + "; ignoring");
    }
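To make the proposal concrete, here is a minimal sketch that runs on a current JDK. Because nobody outside the JDK can give String a new constructor, the hypothetical `StrictCharset` class below wraps `java.nio.charset.Charset` and offers static `decode` methods standing in for the constructors described above. The names `StrictCharset`, `Standard`, `findByName`, and `decode` are illustrative assumptions, not a real API.

```java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical strongly typed charset: an instance can only exist if the
// underlying charset is actually available in this VM.
final class StrictCharset {
    private final Charset charset;

    private StrictCharset(Charset charset) {
        this.charset = charset;
    }

    // Lookup by name can genuinely fail, so it keeps the checked exception.
    static StrictCharset findByName(String charsetName)
            throws UnsupportedEncodingException {
        try {
            if (Charset.isSupported(charsetName)) {
                return new StrictCharset(Charset.forName(charsetName));
            }
        } catch (IllegalCharsetNameException e) {
            // An illegal name is reported the same way as an unknown one.
        }
        throw new UnsupportedEncodingException(charsetName);
    }

    // Constants for charsets every JRE is required to provide,
    // so initializing them cannot fail on a conforming platform.
    static final class Standard {
        private Standard() {}
        static final StrictCharset UTF_8 =
            new StrictCharset(Charset.forName("UTF-8"));
        static final StrictCharset US_ASCII =
            new StrictCharset(Charset.forName("US-ASCII"));
    }

    // Stand-in for the proposed String(byte[], Charset) constructor:
    // no exception declared, because the charset is known to exist.
    static String decode(byte[] bytes, StrictCharset charset) {
        return charset.charset.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Stand-in for the old name-based constructor, rewritten on top of the
    // typed version; unknown names still surface as a checked exception.
    static String decode(byte[] bytes, String charsetName)
            throws UnsupportedEncodingException {
        return decode(bytes, findByName(charsetName));
    }
}
```

With this in place, the fallback pattern shown earlier carries over directly: look up the name with `StrictCharset.findByName`, fall back to `StrictCharset.Standard.US_ASCII` in the catch block, and pass the result to the exception-free `decode`.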

The Worst Part of it All

As it happens, Sun did add a Charset class to Java in 1.4, as part of the NIO framework. Astonishingly, they still did not define constant Charset instances for the standard charsets that every implementation is required to support. So you still have to look them up by name, and you still have to deal with an exception! Worse yet, they invented some new exceptions for the purpose: UnsupportedCharsetException and IllegalCharsetNameException. These are unchecked runtime exceptions, which spares you from cluttering your code with exception handling when you are using a standard charset, but makes it easier to mishandle the cases where you aren't.
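For comparison, here is how those two exceptions surface with the real `java.nio.charset` API. Both `UnsupportedCharsetException` and `IllegalCharsetNameException` extend `IllegalArgumentException`, so the compiler never reminds you to handle them. The helper `forNameOrDefault` is a hypothetical name, sketching one way to make the fallback explicit.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper class; not part of the JDK.
final class CharsetLookup {
    private CharsetLookup() {}

    // Returns the named charset, or the fallback if the name is unknown to
    // this VM or is not even a syntactically legal charset name.
    static Charset forNameOrDefault(String charsetName, Charset fallback) {
        try {
            return Charset.forName(charsetName);
        } catch (UnsupportedCharsetException | IllegalCharsetNameException e) {
            // Neither catch is required by the compiler; without it, a bad
            // name would propagate as an unchecked exception.
            return fallback;
        }
    }
}
```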

The kicker, though, is that String does not have a constructor that accepts a Charset object! Instead, you're supposed to use the CharsetDecoder class or the decode method of Charset, both of which work in terms of ByteBuffer and CharBuffer objects, so you have to jump through a few hoops to accomplish the same thing:

    String s =
        Charset.forName(charsetName)
               .decode(ByteBuffer.wrap(byteArray))
               .toString();
    // this might throw an exception, but it's unchecked
    // so we don't need to catch it unless we want to handle it

Ick.

Still, the NIO charset handling is a little better than what we had before, and it offers flexibility the old API lacked (for example, a CharsetDecoder lets you choose whether malformed or unmappable input is reported, ignored, or replaced). It wouldn't be too hard to add a few utility methods to smooth over the rough edges here. I'll leave that as an exercise for the reader.
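Taking up that exercise, one possible shape for such utility methods is sketched below; the class and method names (`Decoding`, `decodeLenient`, `decodeStrict`) are hypothetical. The lenient version behaves like the `Charset.decode` chain shown earlier, substituting the charset's replacement string for bad input. The strict version goes through `CharsetDecoder` with `CodingErrorAction.REPORT`, so malformed or unmappable input raises a `CharacterCodingException` instead of being silently replaced. (Later JDKs smoothed over some of this on their own: Java 6 added `String(byte[], Charset)`, and Java 7 added constants in `java.nio.charset.StandardCharsets`.)

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical utility class smoothing over the NIO charset API.
final class Decoding {
    private Decoding() {}

    // Same behavior as Charset.decode: malformed or unmappable input is
    // replaced, so this never throws for a valid Charset.
    static String decodeLenient(byte[] bytes, Charset charset) {
        return charset.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Rejects malformed or unmappable input instead of replacing it.
    // REPORT is already the default for a fresh decoder; it is set
    // explicitly here to document the intent.
    static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }
}
```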
