670837 - maxlength shouldn't count one non-BMP character as two characters

Masatoshi Kimura [:emk]

Reporter

Description

•

14 years ago

Steps to reproduce: 1. Enter data:text/html,<input maxlength=3> into the location bar. 2. Type (or copy and paste)

Masatoshi Kimura [:emk]

Reporter

Comment 1

•

14 years ago

Oops, Bugzilla didn't accept non-BMP characters...

Steps to reproduce:
1. Enter data:text/html,<input maxlength=3> into the location bar.
2. Type (or copy and paste) 𠮷野家 into the text box.

Expected result: 𠮷野家
Actual result: 𠮷野

Masatoshi Kimura [:emk]

Reporter

Comment 2

•

14 years ago

Per the HTML5 spec, maxlength is the maximum allowed value length in code points (not UTF-16 code units). Chrome works as expected.
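The difference between the two ways of counting can be illustrated with a minimal TypeScript sketch. The helper name `countCodePoints` is hypothetical and is not taken from any browser's implementation; it only shows why the string from comment 1 measures 4 in UTF-16 code units but 3 in code points.

```typescript
// Hypothetical helper (not from the bug or any browser source):
// counts Unicode code points by treating each valid surrogate pair as one.
function countCodePoints(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    // A high surrogate followed by a low surrogate forms a single code point.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low surrogate half of the pair
      }
    }
    count++;
  }
  return count;
}

// "𠮷野家" written with escapes: U+20BB7 is the pair D842 DFB7.
const sampleText = "\uD842\uDFB7\u91CE\u5BB6";
const utf16Units = sampleText.length;            // 4 UTF-16 code units
const codePoints = countCodePoints(sampleText);  // 3 code points
```

With maxlength=3, a limit measured in UTF-16 code units cuts this string after 野, while a limit measured in code points accepts all three characters.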

Boris Zbarsky [:bzbarsky]

Comment 3

•

14 years ago

For what it's worth, WebKit's maxlength impl counts glyph clusters. We should do the same, but we don't have good infrastructure for it.... We probably need a bug on said infrastructure, and also some conversation between Mounir and whoever implements it on how it needs to work for performance to not be hosed.

Boris Zbarsky [:bzbarsky]

Comment 4

•

14 years ago

And I think the HTML5 spec is sort of wrong here. Maxlength should not count combining marks and the like, imo.

Simon Montagu :smontagu

Comment 5

•

14 years ago

That depends why you want maxlength in the first place. See http://lists.w3.org/Archives/Public/www-international/2011AprJun/0119.html A more serious case than the STR in comment 1 is entering 野家𠮷 -- this leaves an unpaired surrogate in the input field.
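The unpaired-surrogate case can be reproduced outside the editor with a short TypeScript sketch. It assumes a naive truncation at a fixed number of UTF-16 code units, which is only an illustration and not a description of the editor's actual code; `hasLoneSurrogate` is a hypothetical helper.

```typescript
// Hypothetical helper: reports whether a string contains a surrogate
// code unit that is not part of a valid high+low pair.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: valid only if a low surrogate follows immediately.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // consume the whole pair
        continue;
      }
      return true;
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // low surrogate with no preceding high surrogate
    }
  }
  return false;
}

// "野家𠮷": the non-BMP character comes last, as the pair D842 DFB7.
const typedText = "\u91CE\u5BB6\uD842\uDFB7";

// Naive truncation to 3 UTF-16 code units splits the surrogate pair.
const naiveResult = typedText.slice(0, 3);
const naiveIsBroken = hasLoneSurrogate(naiveResult); // true
const inputIsBroken = hasLoneSurrogate(typedText);   // false
```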

Mounir Lamouri (:mounir)

Comment 6

•

14 years ago

The DOM doesn't do anything with maxlength except the attribute reflection; even validation was disabled before Firefox 4 shipped. This should be fixed in the editor. FWIW, we might change the maxlength behavior so that it no longer prevents typing and instead just makes the field invalid if the text length is greater than maxlength; see bug 613016.

Component: DOM: Core & HTML → Editor

QA Contact: general → editor

Version: unspecified → Trunk

Jonathan Kew [:jfkthame]

Comment 7

•

14 years ago

(In reply to comment #3)
> For what it's worth, WebKit's maxlength impl counts glyph clusters. We
> should do the same....

I don't think I agree with this, in general. For cases like Latin letters with accents, it seems a reasonable interpretation (although, as Simon mentions, this depends on the use case). However, for Indic scripts, where a "cluster" may consist of multiple conjoined consonants, plus a vowel mark, plus additional marks such as nasalization, I don't believe it makes any sense (to users) to count the "length" of a string in terms of glyph clusters; they are well aware of the constituent characters within such clusters and would expect to count them separately.

Whether "length" (for the purposes of maxlength) is most usefully measured in terms of Unicode characters or UTF-16 code units is a tricky question, but given that JavaScript and DOMStrings expose the UTF-16 encoding form, I think it would be most consistent for maxlength to be expressed as a count of UTF-16 code units, too.

Simon Montagu :smontagu

Comment 8

•

14 years ago

(In reply to comment #7)
> Whether "length" (for the purposes of maxlength) is most usefully measured
> in terms of Unicode characters or UTF16 code units is a tricky question, but
> given that Javascript and DOMStrings expose the UTF16 encoding form, I think
> it would be most consistent for maxlength to be expressed as a count of
> UTF16 units, too.

Either way, we shouldn't truncate in the middle of a supplementary character. If maxlength is interpreted as UTF-16 code units, "野家𠮷" with a maxlength of 3 should give "野家".

Jonathan Kew [:jfkthame]

Comment 9

•

14 years ago

(In reply to comment #8)
> Either way, we shouldn't truncate in mid-supplementary character. If
> maxlength is interpreted as UTF-16 code units, "野家𠮷" with a
> maxlength of 3 should give "野家"

Definitely. We should never allow unpaired surrogates. (Well, I don't think we can prevent people creating them via JS string-munging. So our code needs to handle them robustly anywhere they occur. But wherever possible, we should prevent them arising.)
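The behavior agreed on in comments 8 and 9 can be sketched in TypeScript as follows. This is a minimal illustration that assumes maxlength is measured in UTF-16 code units; `truncateUtf16Safe` is a hypothetical name and the code is not taken from the Gecko editor.

```typescript
// Hypothetical helper: truncates to at most maxUnits UTF-16 code units,
// backing off by one unit if the cut would split a surrogate pair.
function truncateUtf16Safe(s: string, maxUnits: number): string {
  if (maxUnits <= 0) {
    return "";
  }
  if (s.length <= maxUnits) {
    return s; // already within the limit
  }
  let end = maxUnits;
  const last = s.charCodeAt(end - 1); // last unit that would be kept
  const next = s.charCodeAt(end);     // first unit that would be dropped
  const splitsPair =
    last >= 0xd800 && last <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
  if (splitsPair) {
    end--; // drop the whole supplementary character instead of half of it
  }
  return s.slice(0, end);
}

// "野家𠮷" with a limit of 3 gives "野家" rather than "野家" plus half a pair.
const trailingCase = truncateUtf16Safe("\u91CE\u5BB6\uD842\uDFB7", 3);

// "𠮷野家" with a limit of 3 gives "𠮷野": the pair already uses two units.
const leadingCase = truncateUtf16Safe("\uD842\uDFB7\u91CE\u5BB6", 3);
```

Note that under this interpretation the result can be shorter than maxlength by one code unit, and the string from comment 1 is still cut to two visible characters; only the unpaired surrogate is avoided.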

Boris Zbarsky [:bzbarsky]

Comment 10

•

14 years ago

It really sounds like there are two separate issues here:

1) Fix editor to not truncate in mid-surrogate-pair.
2) Spec discussion that needs to happen (and I'm probably the wrong person to drive it).

Who's willing to take on #2?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(Got a cold so that using sick leaves)

Updated

•

11 years ago

Depends on: 1026397

Boris Zbarsky [:bzbarsky]

Comment 11

•

9 years ago

Note that #2 is happening in https://github.com/whatwg/html/issues/1467

Boris Zbarsky [:bzbarsky]

Updated

•

9 years ago

Blocks: 1277820

Martijn

Comment 12

•

7 years ago

I have created a pen that makes it slightly easier to check: https://codepen.io/thany/pen/zmRZKM

My current findings:
Firefox 64 has this bug
Chrome 72 has this bug
Edge 17 does NOT have this bug

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Thomas Wisniewski [:twisniewski]

Comment 13

•

2 years ago

Would fixing this align all browsers on this WPT? https://wpt.fyi/results/html/semantics/forms/constraints/input-maxlength-emoji.html

Flags: needinfo?(masayuki)

Jonathan Kew [:jfkthame]

Comment 14

•

2 years ago

No; it looks like that WPT case fails because of its use of execCommand("InsertHTML", ...), which Firefox doesn't seem to support in an <input type=text> (quite logically, IMO).

Changing the test to use execCommand("InsertText", ...) makes it pass, AFAICS from trying locally.

Thomas Wisniewski [:twisniewski]

Comment 15

•

2 years ago

Oh, interesting.. I don't think there are any WPTs to cover that. Perhaps masayuki knows better..

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(Got a cold so that using sick leaves)

Comment 16

•

2 years ago

Well, I think that the test should be rewritten with insertText because there is no agreement about serialization from an HTML fragment to the plaintext value in <input> and <textarea>.

Flags: needinfo?(masayuki)

Thomas Wisniewski [:twisniewski]

Comment 17

•

2 years ago

I don't disagree, but that still will leave us with the interop problem with insertHTML, won't it? Should we file any issues against the spec/WPTs/etc?

Flags: needinfo?(masayuki)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(Got a cold so that using sick leaves)

Comment 18

•

2 years ago

Yeah, I think so. The problem about insertHTML should be discussed separately from this bug.

Flags: needinfo?(masayuki)

Thomas Wisniewski [:twisniewski]

Updated

•

2 years ago

Categories

(Core :: DOM: Editor, defect)

Tracking

()

People

(Reporter: emk, Unassigned)

References

(Blocks 1 open bug,
URL
)

Details

(Keywords: intl)

Crash Data

Security

(public)
