Created attachment 546742
convert hyphenation-point offsets correctly to utf16 offsets
Because I misinterpreted the libhyphen API in bug 253317, hyphenation positions are not returned correctly when non-ASCII characters are present. (This doesn't affect the en-US patterns, but showed up once I started testing with more languages for bug 672320.)
The issue is that although the hnj_hyphen_hyphenate2() function takes the text as an 8-bit string, with a length in bytes, the hyphens array that it returns (when using UTF-8 dictionaries) is indexed by Unicode character position, not (as I assumed) by UTF-8 code unit (byte) position in the input string.
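This means that the conversion of hyphenation-point offsets to our UTF-16 text representation is incorrect: it treats the indices as byte offsets, so any multi-byte character earlier in the word shifts the hyphenation points that follow it to the wrong position.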
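To illustrate the intended mapping, here is a minimal standalone sketch (not the code in the attached patch). MapHyphensToUtf16 is a hypothetical helper name. It assumes well-formed UTF-8 input, assumes that an odd value in hyphens[i] marks a permitted break after character i, and places the flag for a non-BMP character on its trailing surrogate, which is an illustrative choice only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: maps a hyphens array that is
// indexed by Unicode character onto UTF-16 code-unit positions.
// Assumption: utf8 is well-formed, and an odd hyphens[i] means a break is
// permitted after character i.
std::vector<bool> MapHyphensToUtf16(const std::string& utf8,
                                    const std::vector<char>& hyphens) {
  std::vector<bool> result;
  std::size_t charIndex = 0;  // index into hyphens (one entry per character)
  for (std::size_t i = 0; i < utf8.size();) {
    unsigned char lead = static_cast<unsigned char>(utf8[i]);
    // Length of this UTF-8 sequence, derived from the lead byte.
    std::size_t bytes = lead < 0x80          ? 1
                        : (lead >> 5) == 0x6 ? 2
                        : (lead >> 4) == 0xE ? 3
                                             : 4;
    // A 4-byte sequence encodes a non-BMP character: two UTF-16 code units.
    std::size_t units = (bytes == 4) ? 2 : 1;
    bool breakAfter =
        charIndex < hyphens.size() && (hyphens[charIndex] & 1) != 0;
    // Only the last code unit of the character can carry the break flag.
    for (std::size_t u = 0; u + 1 < units; ++u) {
      result.push_back(false);
    }
    result.push_back(breakAfter);
    i += bytes;
    ++charIndex;
  }
  return result;
}
```

For "héllo" (6 bytes, 5 characters) with a break flagged at character index 1, the flag lands on UTF-16 index 1, after the "é". Reading the same array as byte-indexed would attach the flag to a byte in the middle of the "é" sequence.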
Pushed to mozilla-inbound: