Closed
Bug 1288975
Opened 9 years ago
Closed 9 years ago
InnerText does not match rendered text when lang attribute affects rendering
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
mozilla51
Tracking | Status | |
---|---|---|
firefox51 | --- | fixed |
People
(Reporter: ejsanders, Assigned: jfkthame)
References
()
Details
Attachments
(2 files)
9.51 KB,
patch
|
MatsPalmgren_bugz
:
review+
|
Details | Diff | Splinter Review |
1.60 KB,
patch
|
MatsPalmgren_bugz
:
review+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36
Steps to reproduce:
Create the following element:
<span lang="tr" style="text-transform: uppercase;">i</span>
This correctly renders as İ (capital I with a dot) due to Turkish capitalisation rules.
Now query this element's .innerText property.
Actual results:
The innerText value is a standard Latin capital 'I'. Although the uppercase transform is applied, it is not done in the correct locale.
Expected results:
MDN on innerText: "As a getter, it approximates the text the user would get if they highlighted the contents of the element with the cursor and then copied to the clipboard."
Therefore the value should be 'İ'. This is the value returned in Chrome.
Comment 1•9 years ago
|
||
Chrome and Safari have the same behaviour. I guess it's a bug. Would you please check, John?
Flags: needinfo?(jdai)
Updated•9 years ago
|
Assignee: nobody → jdai
Flags: needinfo?(jdai)
Updated•9 years ago
|
Comment 2•9 years ago
|
||
Yes, our behaviour is different with Chrome and Safari, because our spec[1] didn't define innerText with lang behaviour. According to ROC spec[1], Getting innerText step 2.5, "If the node is a text node, then for each CSS text box produced by the node, in content order, compute the text of the box after application of the CSS white-space processing rules and text-transform rules, and return a list of the resulting strings."
[1] http://rocallahan.github.io/innerText-spec/index.html#getting-innertext
Comment 3•9 years ago
|
||
What does IE do here, and what about Edge?
innerText has been a great mess always, browsers implementing it in different ways.
But, bug 264412 added code using GetRenderedText. Why that isn't returning the right text.
CCing some folks who are familiar with that part.
Assignee | ||
Comment 4•9 years ago
|
||
(In reply to ejsanders from comment #0)
> MDN on innerText: "As a getter, it approximates the text the user would get
> if they highlighted the contents of the element with the cursor and then
> copied to the clipboard."
Well... actually, in my testing it appears that highlighting the contents and copying to the clipboard (in Firefox) yields "I", not "İ", and so innerText is behaving exactly as MDN suggests.
But I'd agree this is not the most desirable behavior (for either copy/paste or innerText); it would be better to take the lang attribute into account.
Looks like the problem is that GetRenderedText relies on a local TransformChars function[1] which doesn't know anything about language-specific rendering. That behavior is implemented by nsCaseTransformTextRunFactory::TransformString,[2] so if we can make GetRenderedText use that instead, we should get a better result here.
[1] https://dxr.mozilla.org/mozilla-central/rev/ddeb0295df692695b36295177d6790e5393e1f9a/layout/generic/nsTextFrame.cpp#9401
[2] https://dxr.mozilla.org/mozilla-central/rev/ddeb0295df692695b36295177d6790e5393e1f9a/layout/generic/nsTextRunTransformations.cpp#275
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jfkthame)
(In reply to Jonathan Kew (:jfkthame) from comment #4)
> Well... actually, in my testing it appears that highlighting the contents
> and copying to the clipboard (in Firefox) yields "I", not "İ", and so
> innerText is behaving exactly as MDN suggests.
I don't even get an upper case 'I' when copying, just a lowercase 'i', as per bug 35148 (which has been unresolved for 16 years...)
Assignee | ||
Comment 7•9 years ago
|
||
(In reply to ejsanders from comment #5)
> (In reply to Jonathan Kew (:jfkthame) from comment #4)
> > Well... actually, in my testing it appears that highlighting the contents
> > and copying to the clipboard (in Firefox) yields "I", not "İ", and so
> > innerText is behaving exactly as MDN suggests.
>
> I don't even get an upper case 'I' when copying, just a lowercase 'i', as
> per bug 35148 (which has been unresolved for 16 years...)
Huh, interesting... I was pasting it into TextEdit.app, and there, I get uppercase "I"; but pasting the same clipboard contents into TextWrangler.app, I get lowercase "i".
Personally, I am not at all sure I agree that copy/paste *should* apply transforms; I didn't immediately raise that question, as I was misled (by the TextEdit result) into thinking that we'd taken that decision at some point, and it was now a "done deal". But I sympathize with bug 35148 comment 17, for example, and am not convinced we should make a change there.
(In which case the description of innerText may be a bit misleading and should be adjusted.)
Assignee | ||
Comment 8•9 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #7)
> Huh, interesting... I was pasting it into TextEdit.app, and there, I get
> uppercase "I"; but pasting the same clipboard contents into
> TextWrangler.app, I get lowercase "i".
Ah, I see... we put both text and HTML into the system clipboard, and TextEdit interprets the HTML -- including the text-transform:uppercase (but without applying language-specific rules). But the plain text content is indeed un-transformed.
Assignee | ||
Comment 9•9 years ago
|
||
Here's a patch that makes innerText rely on the same (language-aware) transformation code as rendering. Note that this does not affect copy-to-clipboard, which is a separate issue (bug 35148), as mentioned above; it's unclear to me whether we should change that behavior or not.
Attachment #8776276 -
Flags: review?(mats)
Comment 10•9 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #3)
> What does IE do here, and what about Edge?
IE 11.494.10586.0 and Edge 25.10586.0.0
The innerText value is a standard Latin small letter 'i'.
>
> innerText has been a great mess always, browsers implementing it in
> different ways.
>
> But, bug 264412 added code using GetRenderedText. Why that isn't returning
> the right text.
>
> CCing some folks who are familiar with that part.
Comment 11•9 years ago
|
||
Thanks for the comments. Changed the bug assignee to :jfkthame.
Assignee: jdai → jfkthame
Comment 12•9 years ago
|
||
Comment on attachment 8776276 [details] [diff] [review]
Make innerText use nsCaseTransformTextRunFactory::TransformString to implement text-transform, so that it benefits from language-specific behaviors
> layout/generic/nsTextFrame.cpp
>+ nsAutoString origString;
nit: "origString" seems like a misnomer, how about "fragString" instead?
> layout/generic/nsTextRunTransformations.cpp
>+ bool auxiliaryOutputArrays = (aCanBreakBeforeArray && aStyleArray);
The () seems superfluous.
There are two more places that appears to use 'i' as a textrun index:
http://searchfox.org/mozilla-central/rev/f433f0dd7158d7bfc4c4607161fc6baa88b5a9f4/layout/generic/nsTextRunTransformations.cpp#318,583
"aTextRun->mStyles[i]" and "aTextRun->CanBreakBefore(i)"
Shouldn't these also be "aOffsetInTextRun + i" now?
(and if so, then it's probably better to change the loop increment
to "++i, ++aOffsetInTextRun" and simply use aOffsetInTextRun in
these places)
Flags: needinfo?(mats)
Attachment #8776276 -
Flags: review?(mats) → review+
Assignee | ||
Comment 14•9 years ago
|
||
Our current tests for innerText are in WPT, so I suggest we add a couple more examples there to cover this.
Attachment #8782976 -
Flags: review?(mats)
Updated•9 years ago
|
Attachment #8782976 -
Flags: review?(mats) → review+
Assignee | ||
Comment 15•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/eccb85036391ef19eb2088b75373a06a4f04c76b
Bug 1288975 - Make innerText use nsCaseTransformTextRunFactory::TransformString to implement text-transform, so that it benefits from language-specific behaviors. r=mats
https://hg.mozilla.org/integration/mozilla-inbound/rev/54bfcfb26e1179fb9601a177630670964f76caee
Bug 1288975 - Add innerText tests for text-transform support of German double-s and Turkish i vs. dotless-i to web-platform-tests. r=mats
Comment 16•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/eccb85036391
https://hg.mozilla.org/mozilla-central/rev/54bfcfb26e11
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in
before you can comment on or make changes to this bug.
Description
•