Closed Bug 1288975 Opened 9 years ago Closed 9 years ago

InnerText does not match rendered text when lang attribute affects rendering

Categories

(Core :: DOM: Core & HTML, defect)

47 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: ejsanders, Assigned: jfkthame)

References

()

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36 Steps to reproduce: Create the following element: <span lang="tr" style="text-transform: uppercase;">i</span> This correctly renders as İ (capital I with a dot) due to Turkish capitalisation rules. Now query this element's .innerText property. Actual results: The innerText value is a standard Latin capital 'I'. Although the uppercase transform is applied, it is not done in the correct locale. Expected results: MDN on innerText: "As a getter, it approximates the text the user would get if they highlighted the contents of the element with the cursor and then copied to the clipboard." Therefore the value should be 'İ'. This is the value returned in Chrome.
Component: Untriaged → DOM: Core & HTML
Product: Firefox → Core
Chrome and Safari have the same behaviour. I guess it's a bug. Would you please check, John?
Flags: needinfo?(jdai)
Assignee: nobody → jdai
Flags: needinfo?(jdai)
Yes, our behaviour is different with Chrome and Safari, because our spec[1] didn't define innerText with lang behaviour. According to ROC spec[1], Getting innerText step 2.5, "If the node is a text node, then for each CSS text box produced by the node, in content order, compute the text of the box after application of the CSS white-space processing rules and text-transform rules, and return a list of the resulting strings." [1] http://rocallahan.github.io/innerText-spec/index.html#getting-innertext
What does IE do here, and what about Edge? innerText has been a great mess always, browsers implementing it in different ways. But, bug 264412 added code using GetRenderedText. Why that isn't returning the right text. CCing some folks who are familiar with that part.
Blocks: innertext
Flags: needinfo?(mats)
Flags: needinfo?(jfkthame)
(In reply to ejsanders from comment #0) > MDN on innerText: "As a getter, it approximates the text the user would get > if they highlighted the contents of the element with the cursor and then > copied to the clipboard." Well... actually, in my testing it appears that highlighting the contents and copying to the clipboard (in Firefox) yields "I", not "İ", and so innerText is behaving exactly as MDN suggests. But I'd agree this is not the most desirable behavior (for either copy/paste or innerText); it would be better to take the lang attribute into account. Looks like the problem is that GetRenderedText relies on a local TransformChars function[1] which doesn't know anything about language-specific rendering. That behavior is implemented by nsCaseTransformTextRunFactory::TransformString,[2] so if we can make GetRenderedText use that instead, we should get a better result here. [1] https://dxr.mozilla.org/mozilla-central/rev/ddeb0295df692695b36295177d6790e5393e1f9a/layout/generic/nsTextFrame.cpp#9401 [2] https://dxr.mozilla.org/mozilla-central/rev/ddeb0295df692695b36295177d6790e5393e1f9a/layout/generic/nsTextRunTransformations.cpp#275
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jfkthame)
(In reply to Jonathan Kew (:jfkthame) from comment #4) > Well... actually, in my testing it appears that highlighting the contents > and copying to the clipboard (in Firefox) yields "I", not "İ", and so > innerText is behaving exactly as MDN suggests. I don't even get an upper case 'I' when copying, just a lowercase 'i', as per bug 35148 (which has been unresolved for 16 years...)
(In reply to ejsanders from comment #5) > (In reply to Jonathan Kew (:jfkthame) from comment #4) > > Well... actually, in my testing it appears that highlighting the contents > > and copying to the clipboard (in Firefox) yields "I", not "İ", and so > > innerText is behaving exactly as MDN suggests. > > I don't even get an upper case 'I' when copying, just a lowercase 'i', as > per bug 35148 (which has been unresolved for 16 years...) Huh, interesting... I was pasting it into TextEdit.app, and there, I get uppercase "I"; but pasting the same clipboard contents into TextWrangler.app, I get lowercase "i". Personally, I am not at all sure I agree that copy/paste *should* apply transforms; I didn't immediately raise that question, as I was misled (by the TextEdit result) into thinking that we'd taken that decision at some point, and it was now a "done deal". But I sympathize with bug 35148 comment 17, for example, and am not convinced we should make a change there. (In which case the description of innerText may be a bit misleading and should be adjusted.)
(In reply to Jonathan Kew (:jfkthame) from comment #7) > Huh, interesting... I was pasting it into TextEdit.app, and there, I get > uppercase "I"; but pasting the same clipboard contents into > TextWrangler.app, I get lowercase "i". Ah, I see... we put both text and HTML into the system clipboard, and TextEdit interprets the HTML -- including the text-transform:uppercase (but without applying language-specific rules). But the plain text content is indeed un-transformed.
Here's a patch that makes innerText rely on the same (language-aware) transformation code as rendering. Note that this does not affect copy-to-clipboard, which is a separate issue (bug 35148), as mentioned above; it's unclear to me whether we should change that behavior or not.
Attachment #8776276 - Flags: review?(mats)
(In reply to Olli Pettay [:smaug] from comment #3) > What does IE do here, and what about Edge? IE 11.494.10586.0 and Edge 25.10586.0.0 The innerText value is a standard Latin small letter 'i'. > > innerText has been a great mess always, browsers implementing it in > different ways. > > But, bug 264412 added code using GetRenderedText. Why that isn't returning > the right text. > > CCing some folks who are familiar with that part.
Thanks for the comments. Changed the bug assignee to :jfkthame.
Assignee: jdai → jfkthame
Comment on attachment 8776276 [details] [diff] [review] Make innerText use nsCaseTransformTextRunFactory::TransformString to implement text-transform, so that it benefits from language-specific behaviors > layout/generic/nsTextFrame.cpp >+ nsAutoString origString; nit: "origString" seems like a misnomer, how about "fragString" instead? > layout/generic/nsTextRunTransformations.cpp >+ bool auxiliaryOutputArrays = (aCanBreakBeforeArray && aStyleArray); The () seems superfluous. There are two more places that appears to use 'i' as a textrun index: http://searchfox.org/mozilla-central/rev/f433f0dd7158d7bfc4c4607161fc6baa88b5a9f4/layout/generic/nsTextRunTransformations.cpp#318,583 "aTextRun->mStyles[i]" and "aTextRun->CanBreakBefore(i)" Shouldn't these also be "aOffsetInTextRun + i" now? (and if so, then it's probably better to change the loop increment to "++i, ++aOffsetInTextRun" and simply use aOffsetInTextRun in these places)
Flags: needinfo?(mats)
Attachment #8776276 - Flags: review?(mats) → review+
Don't forget to add regression tests.
Flags: in-testsuite?
Our current tests for innerText are in WPT, so I suggest we add a couple more examples there to cover this.
Attachment #8782976 - Flags: review?(mats)
Attachment #8782976 - Flags: review?(mats) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/eccb85036391ef19eb2088b75373a06a4f04c76b Bug 1288975 - Make innerText use nsCaseTransformTextRunFactory::TransformString to implement text-transform, so that it benefits from language-specific behaviors. r=mats https://hg.mozilla.org/integration/mozilla-inbound/rev/54bfcfb26e1179fb9601a177630670964f76caee Bug 1288975 - Add innerText tests for text-transform support of German double-s and Turkish i vs. dotless-i to web-platform-tests. r=mats
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: