Follow-up work for Bug 673039. As discussed at http://en.wikipedia.org/wiki/UTF-8 (and also in diagrams within our code, such as the one above _analyzeUtf8 in StringObject.cpp), a code point can be encoded in UTF-8 as a multibyte sequence of up to 6 bytes. We neglected to include any cases beyond the 1-byte case. (Whoops.) Bug 673039 illustrates a 2-byte case, but it would be best to cover each of the 5 cases beyond the single-byte case (and preferably tickle the edges between the cases).
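To make the edges between the cases concrete, here is a rough Python sketch of the original up-to-6-byte scheme described above. It is not the code in StringObject.cpp; the names LENGTH_CLASSES, encode_utf8_legacy and boundary_code_points are made up for illustration. Note that current UTF-8 (RFC 3629) stops at 4 bytes and U+10FFFF, so the 5- and 6-byte forms below exist only in the older definition.

```python
# Illustrative sketch only: hypothetical helper names, not taken from our code.
# Each entry: (largest code point in the length class, lead-byte marker bits).
LENGTH_CLASSES = [
    (0x7F, 0x00),        # 1 byte:  0xxxxxxx
    (0x7FF, 0xC0),       # 2 bytes: 110xxxxx 10xxxxxx
    (0xFFFF, 0xE0),      # 3 bytes: 1110xxxx + 2 continuation bytes
    (0x1FFFFF, 0xF0),    # 4 bytes: 11110xxx + 3 continuation bytes
    (0x3FFFFFF, 0xF8),   # 5 bytes: 111110xx + 4 continuation bytes (legacy only)
    (0x7FFFFFFF, 0xFC),  # 6 bytes: 1111110x + 5 continuation bytes (legacy only)
]


def encode_utf8_legacy(cp):
    """Encode a code point with the original 1..6 byte UTF-8 layout."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("code point out of the 31-bit range")
    # Find the shortest length class that can hold cp.
    for n, (limit, marker) in enumerate(LENGTH_CLASSES, start=1):
        if cp <= limit:
            break
    if n == 1:
        return bytes([cp])
    # Peel off 6 bits at a time for the continuation bytes (10xxxxxx).
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    return bytes([marker | cp] + tail[::-1])


def boundary_code_points():
    """Return the last code point of each class and the first of the next."""
    points = []
    for limit, _ in LENGTH_CLASSES:
        points.append(limit)
        if limit < 0x7FFFFFFF:
            points.append(limit + 1)
    return points


if __name__ == "__main__":
    for cp in boundary_code_points():
        encoded = encode_utf8_legacy(cp)
        print("0x%08X -> %d byte(s): %s" % (cp, len(encoded), encoded.hex()))
```

The boundary list (0x7F/0x80, 0x7FF/0x800, 0xFFFF/0x10000, 0x1FFFFF/0x200000, 0x3FFFFFF/0x4000000, plus 0x7FFFFFFF) is one candidate set of inputs for the edge cases mentioned above.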
Note that the UTF-8 Wikipedia page describes the multibyte encoding cases as covering a range of up to 31 bits, but a Unicode escape sequence \uNNNN can represent at most 16 bits. I'm not a Unicode expert. I'm currently assuming I'll use surrogate pairs to construct cases for 0x10000 and above. (Still reading.)
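For reference, a small sketch of the surrogate pair arithmetic, assuming the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into a high surrogate starting at 0xD800 and a low surrogate starting at 0xDC00). The helper name surrogate_escape is hypothetical and only generates the escape text a test case might contain.

```python
# Illustrative sketch only: surrogate_escape is a made-up helper name.
def surrogate_escape(cp):
    """Return \\uNNNN escape text for cp, using a surrogate pair above 0xFFFF."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("surrogate pairs cannot express code points above 0x10FFFF")
    if cp <= 0xFFFF:
        # Fits in a single 16-bit escape.
        return "\\u%04X" % cp
    offset = cp - 0x10000             # 20-bit value to split
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return "\\u%04X\\u%04X" % (high, low)


if __name__ == "__main__":
    for cp in (0xFFFF, 0x10000, 0x1F600, 0x10FFFF):
        print("0x%06X -> %s" % (cp, surrogate_escape(cp)))
```

One consequence worth keeping in mind: a surrogate pair tops out at 0x10FFFF, which still falls in the 4-byte case, so the 5- and 6-byte cases would presumably need to be built some other way (for example from raw bytes) rather than from \uNNNN escapes.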
Dan, include in your i9 work.