Open Bug 670837 Opened 11 years ago Updated 4 years ago
maxlength shouldn't count one non-BMP character as two characters
Steps to reproduce: 1. Enter data:text/html,<input maxlength=3> into the location bar. 2. Type (or copy and paste)
Oops, Bugzilla didn't accept non-BMP characters...
Steps to reproduce:
1. Enter data:text/html,<input maxlength=3> into the location bar.
2. Type (or copy and paste) 𠮷野家 into the text box.
Expected result: 𠮷野家
Actual result: 𠮷野
Per the HTML5 spec, maxlength is the maximum allowed value length in code points (not UTF-16 code units). Chrome works as expected.
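To make the two ways of counting concrete, here is a minimal sketch that measures the test string both ways. The helper name utf16_length is made up for illustration; Python strings are sequences of code points, so the UTF-16 length is obtained by encoding.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units needed to store text (hypothetical helper)."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte order mark.
    return len(text.encode("utf-16-le")) // 2


value = "𠮷野家"  # U+20BB7 (non-BMP), U+91CE, U+5BB6

code_points = len(value)          # Python's len() counts code points
code_units = utf16_length(value)  # the non-BMP character takes two units

print(code_points, code_units)
```

Counted in code points the string fits in maxlength=3; counted in UTF-16 code units it does not, which matches the truncated result in the steps to reproduce.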
For what it's worth, WebKit's maxlength impl counts glyph clusters. We should do the same, but we don't have good infrastructure for it.... We probably need a bug on said infrastructure, and also some conversation between Mounir and whoever implements it on how it needs to work for performance to not be hosed.
And I think the HTML5 spec is sort of wrong here. Maxlength should not count combining marks and the like, imo.
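As a rough illustration of what "not counting combining marks" could mean, the sketch below skips every character with a nonzero canonical combining class. This is only an approximation and an assumption on my part: it is not grapheme cluster segmentation, it is not what WebKit does, and some marks have combining class 0 and would still be counted. The function name is hypothetical.

```python
import unicodedata


def length_ignoring_combining_marks(text: str) -> int:
    """Count code points whose canonical combining class is 0 (hypothetical helper)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


decomposed = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(len(decomposed))                              # code points, marks included
print(length_ignoring_combining_marks(decomposed))  # marks skipped
```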
That depends why you want maxlength in the first place. See http://lists.w3.org/Archives/Public/www-international/2011AprJun/0119.html
A more serious case than the STR in comment 1 is entering 野家𠮷 -- this leaves an unpaired surrogate in the input field.
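The unpaired surrogate can be reproduced outside the browser. This sketch assumes the truncation is a plain cut at 3 UTF-16 code units, which is a guess at the behavior and not taken from the editor code; the helper names are made up.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as integers (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit: int) -> bool:
    """True for the first half of a surrogate pair (0xD800..0xDBFF)."""
    return 0xD800 <= unit <= 0xDBFF


units = utf16_units("野家𠮷")  # two BMP characters, then one surrogate pair
naive = units[:3]             # naive cut at maxlength=3 code units

# The cut keeps the high surrogate of 𠮷 but drops its low surrogate.
ends_unpaired = is_high_surrogate(naive[-1])
print([hex(u) for u in naive], ends_unpaired)
```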
The DOM doesn't do anything with maxlength except reflect the attribute; even validation was disabled before Firefox 4 shipped. This should be fixed in the editor. FWIW, we might change maxlength so that it no longer prevents typing but instead makes the field invalid when the text length is greater than maxlength; see bug 613016.
Component: DOM: Core & HTML → Editor
QA Contact: general → editor
Version: unspecified → Trunk
(In reply to comment #8)
> Either way, we shouldn't truncate in mid-supplementary character. If
> maxlength is interpreted as UTF-16 code units, "野家𠮷" with a
> maxlength of 3 should give "野家"

Definitely. We should never allow unpaired surrogates. (Well, I don't think we can prevent people creating them via JS string-munging. So our code needs to handle them robustly anywhere they occur. But wherever possible, we should prevent them arising.)
It really sounds like there are two separate issues here:
1) Fix editor to not truncate in mid-surrogate-pair.
2) Spec discussion that needs to happen (and I'm probably the wrong person to drive it).
Who's willing to take on #2?
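For issue 1, one possible shape of the fix is to keep counting in UTF-16 code units but only ever cut on a code point boundary. This is a sketch under that assumption, not the editor's actual code; truncate_utf16 is a hypothetical name.

```python
def truncate_utf16(text: str, max_units: int) -> str:
    """Keep as many whole code points as fit in max_units UTF-16 code units."""
    kept = []
    used = 0
    for ch in text:
        # Code points above U+FFFF need a surrogate pair (2 units) in UTF-16.
        cost = 2 if ord(ch) > 0xFFFF else 1
        if used + cost > max_units:
            break  # stop before the character that would be split or overflow
        kept.append(ch)
        used += cost
    return "".join(kept)


print(truncate_utf16("野家𠮷", 3))
print(truncate_utf16("𠮷野家", 3))
```

With a limit of 3, "野家𠮷" comes out as "野家", which is the result asked for in comment #8, and no unpaired surrogate is left behind.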
Note that #2 is happening in https://github.com/whatwg/html/issues/1467
I have created a pen that makes this slightly easier to check: https://codepen.io/thany/pen/zmRZKM
My current findings:
Firefox 64 has this bug
Chrome 72 has this bug
Edge 17 does NOT have this bug