Bug 503502 notes that mb_strlen() miscounts multibyte strings unless we pass it 'UTF-8' as its encoding. Grepping through our tree shows mb_strlen() used half a dozen times, but never with a specified encoding.

Example File:
    $x = '海';
    echo "no encoding: " . mb_strlen($x) . "\n";
    echo "with encoding: " . mb_strlen($x, 'UTF-8') . "\n";

Example Output (run on khan):
    no encoding: 3
    with encoding: 1
This should be a quick fix, but we'll want to test it too, to make sure nothing depends on the current (broken) character counts.
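For reference, here is a minimal sketch of two ways the call sites could be fixed. It is not necessarily what the attached patch does. The variable names are made up for illustration, and the ISO-8859-1 line is only an assumption about why the unqualified call returned 3: the bug does not say what khan's default internal encoding is.

```php
<?php
// "\xE6\xB5\xB7" is the UTF-8 byte sequence for '海' (U+6D77), written as
// escapes so the result does not depend on how this file itself is saved.
$x = "\xE6\xB5\xB7";

// Option 1: name the encoding explicitly at every call site.
$explicit = mb_strlen($x, 'UTF-8');

// Option 2: set the default once; later calls that omit the encoding use it.
mb_internal_encoding('UTF-8');
$viaDefault = mb_strlen($x);

// For contrast (assumption): a single-byte encoding treats each byte as one
// character, which would match the count of 3 reported in this bug.
$singleByte = mb_strlen($x, 'ISO-8859-1');

echo "explicit: $explicit\n";
echo "default: $viaDefault\n";
echo "single-byte: $singleByte\n";
```

Option 1 keeps each call self-documenting but has to be repeated everywhere. Option 2 is a one-line change, but it alters the behavior of every mb_* call that relies on the default, so it would need the broader testing mentioned above.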
Created attachment 395954: patch v1
If we plan to do thorough testing, maybe we should file a new bug for it.
I know I filed this and asked you to patch it, but in the meantime bug 512766 was filed, and it makes more sense to handle it there, so I'm forward-duping this one.