Our string encoding and decoding APIs in XPCOM are a mess. Encoding and decoding are spread across several different files and directories, and half a dozen or so people (including a few who would be expected to know the code and weren't just fixing bugs in passing) have had problems making sure fixes address all the relevant pieces of code. We should consolidate all such code into one or two files in a single location so that the code is easier to understand, simpler to fix, easier to test, and less prone to bugs through incomplete fixes. We should not be interpreting UTF-8 or UTF-16 data in XPCOM in more than one file per encoding, if at all possible.
See also the UTF-8 conversions in intl/uconv, where there are also some SSE and ARM optimizations that should be used here. Also, all the UTF* handling first calculates the length by walking all the characters, and then does the conversion in a second pass.
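For anyone unfamiliar with the pattern, here is a rough sketch of the "measure first, then convert" shape described above. It is not the XPCOM or intl/uconv code: it uses standard-library strings, the names DecodeOne, Utf16Length and Utf8ToUtf16 are made up for illustration, and the error handling (one U+FFFD per malformed sequence) is a simplification. Both passes go through a single decoder, so a fix to the decoding logic only has to land in one place.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not an XPCOM API. Decodes one code point starting at
// s[i] and advances i past it. Malformed input yields U+FFFD.
static char32_t DecodeOne(const std::string& s, std::size_t& i) {
  const unsigned char b0 = static_cast<unsigned char>(s[i++]);
  if (b0 < 0x80) return b0;  // ASCII

  int extra;      // number of continuation bytes expected
  char32_t cp;    // code point being assembled
  char32_t min;   // smallest value allowed for this length (rejects overlongs)
  if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; min = 0x80; }
  else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; min = 0x800; }
  else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; min = 0x10000; }
  else return 0xFFFD;  // stray continuation byte or invalid lead byte

  for (int k = 0; k < extra; ++k) {
    if (i >= s.size()) return 0xFFFD;  // truncated sequence
    const unsigned char b = static_cast<unsigned char>(s[i]);
    if ((b & 0xC0) != 0x80) return 0xFFFD;  // leave the byte for the next call
    cp = (cp << 6) | (b & 0x3F);
    ++i;
  }
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    return 0xFFFD;  // overlong, out of range, or an encoded surrogate
  }
  return cp;
}

// Pass 1: walk every character only to count the UTF-16 code units needed.
static std::size_t Utf16Length(const std::string& utf8) {
  std::size_t units = 0;
  for (std::size_t i = 0; i < utf8.size();) {
    units += DecodeOne(utf8, i) >= 0x10000 ? 2 : 1;
  }
  return units;
}

// Pass 2: allocate once using the count from pass 1, then walk the input
// again and emit the code units.
std::u16string Utf8ToUtf16(const std::string& utf8) {
  std::u16string out;
  out.reserve(Utf16Length(utf8));
  for (std::size_t i = 0; i < utf8.size();) {
    char32_t cp = DecodeOne(utf8, i);
    if (cp >= 0x10000) {
      cp -= 0x10000;  // split into a surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      out.push_back(static_cast<char16_t>(cp));
    }
  }
  return out;
}
```

The cost of this shape is that every input byte is decoded twice; what it buys is a single exact allocation. Whether that trade is right for our string classes is a separate question from where the decoding logic lives.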
With all due respect, I didn't file this bug to optimize the code. I filed it because the current code is scattered, disorganized, and duplicative; that it might not be hyper-efficient is, to me, a much smaller concern than its over-complexity or the security bugs that complexity has engendered. Optimize after the code's cleaned up, or if that's not your cup of tea optimize the current code in separate bugs -- optimization is not a goal of this bug, except insofar as codesize reduction may have minor performance effects at the margin.