As a security precaution, we have turned on the setting "Require API key authentication for API requests" for everyone. If this has broken something, please contact bugzilla-admin@mozilla.org
Last Comment Bug 264412 - (innertext) Add support for element.innerText
(innertext)
: Add support for element.innerText
Status: RESOLVED FIXED
[compat]
: dev-doc-needed, feature
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: Trunk
: All All
: -- normal with 9 votes (vote)
: mozilla45
Assigned To: Robert O'Callahan (:roc) (email my personal email if necessary)
:
: Andrew Overholt [:overholt]
Mentors:
http://www.sharepointcustomization.co...
: 254174 287395 310661 317330 338801 339017 343357 934346 975880 1107872 (view as bug list)
Depends on: 1268833 1221043 1226293 1260025 1288975 1290937
Blocks: 914252 1170774 ringmark-ring1 942989 998390 1130981
  Show dependency treegraph
 
Reported: 2004-10-14 13:14 PDT by Paul_Adams
Modified: 2016-08-02 04:28 PDT (History)
45 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed

MozReview Requests
Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:
Show discarded requests

Attachments
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz (40 bytes, text/x-review-board-request)
2015-10-18 20:48 PDT, Robert O'Callahan (:roc) (email my personal email if necessary)
mats: review+
mzehe: review+
Details | Review
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats (40 bytes, text/x-review-board-request)
2015-10-18 20:48 PDT, Robert O'Callahan (:roc) (email my personal email if necessary)
mats: review+
bugs: review+
Details | Review
MozReview Request: Bug 264412. Optimize GetRenderedText. r=mats (40 bytes, text/x-review-board-request)
2015-10-29 23:27 PDT, Robert O'Callahan (:roc) (email my personal email if necessary)
mats: review+
Details | Review

Description User image Paul_Adams 2004-10-14 13:14:41 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20040913 Firefox/0.10
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20040913 Firefox/0.10

It seems that for some reason, the innerText property is not updatable with the
script in the above URL.

Reproducible: Always
Steps to Reproduce:
1. Go to the URL listed in this bug report
2.
3.

Actual Results:  
The innerText properties of the p, td, div and span elements are undefined when
the javascript starts, and they never get updated by the script.
The input element does actually get updated if the script is downloaded and
modified to take out the null checking.

Expected Results:  
The innerText property of the p, div, span, and td elements should have been
updated via the javascript.

If the innerText property in the 'clock.js' script is replaced by innerHTML,
then everything works OK
Comment 1 User image Phil Ringnalda (:philor) 2004-10-14 13:50:02 PDT
Gecko (and thus the Mozilla suite, Firefox, Netscape, Camino, Galeon, ...) never
has supported innerText -
http://www.mozilla.org/docs/web-developer/upgrade_2.html#dom_unsupp

If you actually meant to file an enhancement bug (on Browser - DOM Level 0) for
innerText support, well, in an era when we support document.all as long as you
don't ask first, who knows?
Comment 2 User image Paul_Adams 2004-10-15 08:10:40 PDT
This does beg the question why do you continue to support innerHTML?!!
Comment 3 User image Gérard Talbot 2005-02-16 10:38:25 PST
*** Bug 282317 has been marked as a duplicate of this bug. ***
Comment 4 User image John Vandenberg 2005-10-01 02:47:19 PDT
*** Bug 310661 has been marked as a duplicate of this bug. ***
Comment 5 User image Phil Ringnalda (:philor) 2006-05-22 17:06:18 PDT
*** Bug 338801 has been marked as a duplicate of this bug. ***
Comment 6 User image :Gavin Sharp [email: gavin@gavinsharp.com] 2006-05-23 13:06:13 PDT
*** Bug 287395 has been marked as a duplicate of this bug. ***
Comment 7 User image :Gavin Sharp [email: gavin@gavinsharp.com] 2006-05-23 13:06:15 PDT
*** Bug 317330 has been marked as a duplicate of this bug. ***
Comment 8 User image :Gavin Sharp [email: gavin@gavinsharp.com] 2006-05-23 13:06:31 PDT
*** Bug 339017 has been marked as a duplicate of this bug. ***
Comment 9 User image :Gavin Sharp [email: gavin@gavinsharp.com] 2006-05-23 13:06:38 PDT
*** Bug 254174 has been marked as a duplicate of this bug. ***
Comment 10 User image Aryeh Gregor (:ayg) (next working March 28-April 26) 2011-01-30 16:24:38 PST
Is Gecko interested in implementing this if a detailed spec is written?  It's complicated, but considerably more useful than textContent for getting a plaintext version of an HTML element, and every other browser has it.  WebKit seems not to be interested in dropping it, and it would be nice if we could get interop here.
Comment 11 User image Olli Pettay [:smaug] (review queue closed until backlog cleared) 2011-01-31 03:19:23 PST
Why innerText is more useful than textContent?
Based on http://msdn.microsoft.com/en-us/library/ms533899%28v=vs.85%29.aspx
it is quite bizarre API.
Comment 12 User image Aryeh Gregor (:ayg) (next working March 28-April 26) 2011-01-31 18:28:39 PST
innerText returns the element's HTML contents converted to plaintext.  Conceptually, it works the same way as Selection stringification -- in fact, WebKit uses the same algorithm for both, and I'm going to specify both as the same algorithm.  textContent just concatenates the text node descendants in tree order, so it preserves indentation and so forth.  innerText collapses runs of whitespace to a single space, inserts newlines after block elements, etc.

I don't have specific use-cases offhand for why anyone would want to use innerText instead of textContent.  The large majority of authors seem to only set it, in which case aliasing it to textContent would work, and in fact Opera treats it similarly to textContent (although they apparently have some compat bugs as a result).  But given that everyone needs to implement the plaintext conversion algorithm for Selection stringification anyway, I don't see any reason not to provide innerText too.  The additional implementation burden should be low, and it makes things easier for authors if all browsers implement it instead of all but Firefox.
Comment 13 User image Boris Zbarsky [:bz] (still a bit busy) 2011-02-02 17:23:34 PST
*** Bug 343357 has been marked as a duplicate of this bug. ***
Comment 14 User image Aryeh Gregor (:ayg) (next working March 28-April 26) 2011-02-03 12:04:54 PST
I got a list of webpages using innerText and went through a whole bunch.  Most of them use it either for setting (where it behaves the same as textContent), or for getting for elements where it behaves basically the same as textContent anyway (e.g., only text node children).  The most interesting use I found was to transform a comment to plaintext for a quote, in two places:

1) http://www.cristianofino.net/post/Comment-Toolbar-plugin-per-WordPress.aspx

This uses innerText for a quotation feature -- it's part of an actively-maintained Wordpress plugin, <http://wordpress.org/extend/plugins/comment-toolbar/>.  The relevant JS is at <http://www.cristianofino.net/User%20Controls/CommentToolbar.js>.  To get the quote when you click the "Quota" button on a comment, it basically stringifies the Selection if there is one, otherwise takes the innerText of the post you clicked on, otherwise takes the textContent.  It so happens that this works okay, because there's no block-level markup in the comments and <br>'s and newlines are paired up exactly.  But given more general markup, Firefox (and Opera and IE9) would look much worse here, with lots of extra whitespace stuck in.

2) http://blog.livedoor.jp/geinin_matome/archives/1080186.html

Similar to (1) -- hover over the links like ">>66" to get a tooltip with a plaintext quote in WebKit or IE.  Again, no interesting markup that would make a big difference between innerText and textContent, but there could have been.
Comment 15 User image Aryeh Gregor (:ayg) (next working March 28-April 26) 2011-07-05 08:45:49 PDT
I filed a bug asking for this to be specced in HTML5 using a relatively trivial algorithm (textContent but with <br> converted to "\n"):

http://www.w3.org/Bugs/Public/show_bug.cgi?id=13145

I did this because I was reminded of the research and discussion I did about this last February:

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-February/030179.html

Notable from that link is that I found some pages that do break on Firefox because of missing innerText, e.g.:

http://beihai.gov.cn/5863/2010_4_28/5863_98289_1272423962000.html?open=1

At the time of this writing, a large chunk of content is missing in Firefox 6.0a2 which is present in Chrome 14 dev and Opera 11.50:

"4月26日上午,北海船检局召开中心组成员《新跨越》专题学习会。     会议根据广西区港航管理局《关于在全区港航系统开展推动广西水运事业新跨越专题学习讨论活动的通知》(港航政治[2010]59号文精神,围绕推动广西水运事业新跨越主题,认真组织学习了广西区人民政府..."

That's because of the following script:

<script>var strCol=document.getElementById("contentID").innerText;if(strCol.length>150){document.write(strCol.substring(0,150)+"...");}else{document.write(strCol);}</script>

The script does nothing in Gecko because the conditional checks the length of undefined, which is a fatal error.  It works in all other browsers.
Comment 16 User image Frank Jiao 2012-03-02 02:19:59 PST Comment hidden (advocacy)
Comment 17 User image bluelettrico 2012-06-09 06:27:26 PDT Comment hidden (advocacy)
Comment 18 User image danya.postfactum 2013-02-02 10:04:04 PST
As I see innerText is in spec TODO list (http://wiki.whatwg.org/wiki/Specs_todo).
Looks like nobody is going to write it. So, we neve get innerText in Firefox ? :(
Comment 19 User image Loic 2013-11-04 11:30:51 PST
*** Bug 934346 has been marked as a duplicate of this bug. ***
Comment 20 User image Boris Zbarsky [:bz] (still a bit busy) 2014-03-10 20:18:28 PDT
*** Bug 975880 has been marked as a duplicate of this bug. ***
Comment 21 User image Robert Longson 2014-12-05 04:01:02 PST
*** Bug 1107872 has been marked as a duplicate of this bug. ***
Comment 22 User image Yuri 2015-01-01 01:21:31 PST
This really needs to be fixed. Here is the test case just in case http://jsfiddle.net/azpnmxuc/
Otherwise people will keep creating duplicates, like they did before.
Comment 23 User image Karl Dubost :karlcow 2015-01-21 16:51:42 PST
WebCompat issue created because of lack of support of innerText 
https://github.com/webcompat/web-bugs/issues/633
Comment 24 User image Karl Dubost :karlcow 2015-01-21 17:07:15 PST
Other previously reported Webcompat issues (some are now invalid or resolved):
* Bug 942989
* Bug 281825
* Bug 187698
* Bug 131793
* Bug 358349 
* Bug 25918
* Bug 148812
* Bug 169598
* Bug 220538 
* Bug 471755
Comment 25 User image Karl Dubost :karlcow 2015-04-03 16:54:53 PDT
Adding for context, an interesting blog post with textContent and innerText, performances computations, etc.
http://perfectionkills.com/the-poor-misunderstood-innerText/
Comment 26 User image Yuri 2015-04-03 17:05:27 PDT
innerText should be faster than innerHtml, because the former doesn't need any parsing. Seems like browsers should be motivated to support it well as a good practice.
Comment 27 User image Juriy "kangax" Zaytsev 2015-04-03 17:17:02 PDT
Would Mozilla be interested implementing innerText after all? As Karl mentioned, I've outlined use-cases in a recent blog post. I believe innerText is more than just a pesky webcompat offender. I'll try to work on spec in the meantime.
Comment 28 User image Karl Dubost :karlcow 2015-05-06 21:42:43 PDT
Another Web Compatibility issue https://webcompat.com/issues/1051
Comment 29 User image Karl Dubost :karlcow 2015-05-14 00:32:11 PDT
Another Web Compatibility issue https://webcompat.com/issues/1071#issuecomment-101837467
Comment 30 User image Mike Taylor [:miketaylr] 2015-06-02 14:49:51 PDT
Apart from the web compatibility gains, Juriy has written a great summary of innerText and a possible way forward in terms of a (pseudo) spec. If Chrome and IE are willing to converge, great. If not, we should pick one of the flavors and implement it.

http://perfectionkills.com/the-poor-misunderstood-innerText/#naive-spec
http://kangax.github.io/jstests/innerText/
Comment 32 User image Hallvord R. M. Steen [:hallvors] 2015-07-13 14:57:06 PDT
Breaks Google Calendar (and probably GMail) navigation rendering: https://bugzilla.mozilla.org/show_bug.cgi?id=914252#c16
Comment 33 User image Hallvord R. M. Steen [:hallvors] 2015-08-21 06:20:10 PDT
Breaks Facebook status update field resizing - bug 1197166
Comment 34 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-09-27 21:24:07 PDT
Breaks clicking on erroneous words in xkcd SimpleWriter. http://blog.xkcd.com/2015/09/22/a-thing-explainer-word-checker/
Comment 35 User image Carlos Alén Silva 2015-09-28 03:56:52 PDT
For what is worth, please note that the original URL ( http://www.sharepointcustomization.com/resources/codesamples/TimeZoneClocks/clock_example.htm ) is gone.
Comment 36 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-09-28 19:20:19 PDT
We need to implement something here, because regardless of whether innerText is a desirable feature, Web developers are using it and sites are broken in Firefox without it.

I propose we implement the last version of Aryeh's proposed spec, here: https://rawgit.com/timdown/rangy/master/fiddlings/spec/innerText.htm

Olli, does that sound OK to you?
Comment 37 User image Olli Pettay [:smaug] (review queue closed until backlog cleared) 2015-09-29 01:15:19 PDT
Given that this is so bizarre feature, I'd like to hear feedback from anne and bz too.

But yes, I agree, we need to implement something here.
Comment 38 User image Anne (:annevk) 2015-09-29 01:52:05 PDT
What makes the most sense to me is that getting behaves equivalently to stringifying a selection. So once selection is defined, this feature would be too. That matches what WebKit is doing if memory serves me.
Comment 39 User image Olli Pettay [:smaug] (review queue closed until backlog cleared) 2015-09-29 09:31:32 PDT
Yeah, that has just the issue that selection isn't properly specified.

How differently would we behave if we just used our selection stringification?
Comment 40 User image Juriy "kangax" Zaytsev 2015-09-29 09:49:12 PDT
Good question. I have no time now but if someone could run it through my test — http://kangax.github.io/jstests/innerText/ (which also shows how other implementations differ among each other; and which I'm planning to update with couple more options soon)
Comment 41 User image Boris Zbarsky [:bz] (still a bit busy) 2015-09-29 09:58:40 PDT
I think in the short term we should do whatever is simplest here.  If just stringifying a selection is simpler for us than implementing Aryeh's somewhat stillborn spec, we should do that.  Compat for this stuff is pretty much nonexistent anyway.  :(
Comment 42 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-09-29 15:17:23 PDT
My priorities are in this order:
0) Don't break stuff that currently works.
1) Implement a reasonably Web-compatible version of innerText in Gecko.
2) Have a spec for innerText that browsers converge on.
3) Make innerText consistent with selection.toString().

I think perhaps the best way to proceed is to extend nsPlainTextSerializer with new options until it supports Aryeh's draft spec, then implement innerText using nsPlainTextSerializer, and iterate on the implementation and draft spec until we've shipped something Web-compatible enough. Then we can try changing selection.toString() to use the same options.

Juriy, do you know of any issues with the 2011 draft? https://rawgit.com/timdown/rangy/master/fiddlings/spec/innerText.htm
Comment 43 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-09-29 15:18:36 PDT
(In reply to Boris Zbarsky [:bz] from comment #41)
> I think in the short term we should do whatever is simplest here.  If just
> stringifying a selection is simpler for us than implementing Aryeh's
> somewhat stillborn spec, we should do that.  Compat for this stuff is pretty
> much nonexistent anyway.  :(

While that's true, I would like to have a spec, and I'm not confident that just pushing our current selection.toString() behavior onto the Web would be good for the Web or us.
Comment 44 User image Juriy "kangax" Zaytsev 2015-09-29 15:44:32 PDT
> Juriy, do you know of any issues with the 2011 draft? 

Unfortunately, I haven't had a chance to look into Aryeh's test thoroughly. Glancing at it now, the algorithm seems correct but it's probably missing few things. I see that Aryeh spec'd tabs between table-cell elements — this mimics webkit behavior but not IE. It also looks like inline-block elements don't create newlines — this has been the case in IE<=10 but not anymore (including webkit). There's no mention of what iframes resolve to (webkit ignores them, IE doesn't); or <canvas>, or <video>, or <select>. Well, <select> would resolve to its content (due to display:inline?) so that's IE's behavior but not WebKit. Finally, visibility:hidden mimics WebKit's but not IE behavior.

Yeah, this is a sad state of affairs.

Like I mentioned in a post, my understanding is that we need to:

1) Write a spec
2) Try to converge IE and WebKit/Blink behavior
3) Implement spec'd behavior in Firefox

The hardest part is — of course — choosing which behavior to go with. Considering drastic difference between IE and WebKit, I doubt there's any serious compat issues lurking here. Most of the problems are from not implementing _any_ version of innerText.

I'd love to see what FF's selection behavior is — is it closer to WK or IE's innerText implementations? Perhaps that's the starting point.
Comment 45 User image Juriy "kangax" Zaytsev 2015-09-29 15:45:14 PDT
*Aryeh's spec thoroughly
Comment 46 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-09-29 15:47:37 PDT
When in doubt we should probably just follow Chrome, which I expect they mostly got from Webkit.
Comment 47 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-09-29 16:19:18 PDT
Here are the results I get on Juriy's tests using selection.toString():

Newlines between paragraphs: 2
Newlines after table caption: 0
Newlines after block, non-paragraph elements (e.g. <div>): 1
Includes display:none elements: no
Includes visibility:hidden elements: no
Includes <select> content: no
Includes <button> content: no
Includes <canvas>, <video> fallback content: canvas no, video yes
Includes <iframe> fallback content: no
Includes <script>, <style> content: no
Creates leading space for anonymous inline boxes (including <li>): yes
Creates newline around inline, block-styled elements (e.g. <span>): no
Respects white-space:pre: yes
Creates (only) spaces between block, inline-styled elements (e.g. <div>): no
Puts tabs between <td> tags: yes
Preserves upper/lower-cased text: no
Comment 48 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-09-29 16:49:20 PDT
Looking at nsPlainTextSerializer, it's pretty complex. Some changes we'd want to make would be quite disruptive, e.g. right now many HTML elements (such as <span>) ignore IsElementBlock() in nsPlainTextSerializer::DoOpenContainer. I'd feel a lot more comfortable about having a separate implementation for innerText that we can evolve independently. I also think it would be good to deploy an implementation of innerText whose behavior is as simple as possible, so that the spec can be as simple as possible while still be being Web-compatible. Maybe we should try picking the simpler behavior everywhere IE and Chrome differ.
Comment 49 User image Karl Dubost :karlcow 2015-09-29 23:41:48 PDT
Roc,

First of all. Very cool. That's a very good news. Thanks for taking this. 

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) (offline Oct 1-4) from comment #43)
> While that's true, I would like to have a spec

That's partly the goal of the opened issue on the webcompat spec
https://github.com/whatwg/compat/issues/5
You are welcome to suggest prose or writing along the implementation and we can adjust step by step. 

I'm pretty sure Mike, Hallvord and I also can dig out old Web Compat issues and see how we fare with regards to them with the implementation you will be doing. That would be nice test cases. And specifically how it would work in the face of code like these ones which try to address the Web Compatibility issue. 


----
> function innerText (elem) {
>   return elem.innerText || elem.textContent;
> }

 module.exports = innerText;
In https://github.com/joshwnj/marki/blob/475b400cb4ad9a2c1d43fe46bebb0f518c397c05/src/inner-text.js

---- 

> if (el.textContent !== undefined)
>   el.textContent = string;
> else
>   el.innerText = string;
In https://github.com/HubSpot/youmightnotneedjquery/blob/ef987223c20e480fcbfb5924d96c11cd928e1226/comparisons/elements/set_text/ie8.js

----

> function text (el, value) {
>   if (arguments.length === 2) {
>     el.innerText = el.textContent = value;
>   }
>   if (typeof el.innerText === 'string') {
>     return el.innerText;
>   }
>   return el.textContent;
> }

In https://github.com/bevacqua/insignia/blob/53742c6929332b7c776d85dade8d7d82f524195b/text.js


----

> // innerText polyfill for Firefox
> ((document) => {
>   const dummy = document.createElement('div');
>     Object.defineProperty(HTMLElement.prototype, 'innerText', {
>       get: function()  { return this.textContent },
>       set: function(v) { this.textContent = v; }
>     });
>   }
> })(document);
In https://github.com/ng-kyoto/ng-kyoto.github.io/blob/9315fb7b24baec5a1614b4692bfedf9c0fc267dc/app/utils/innertext-polyfill.ts
Comment 50 User image Olli Pettay [:smaug] (review queue closed until backlog cleared) 2015-09-30 03:24:19 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) (offline Oct 1-4) from comment #48)
> Looking at nsPlainTextSerializer, it's pretty complex.
It is, which is one reason why we for innerHTML, which is quite well spec'ed, moved out from using ns*Serializer (at least in HTML case). Performance was another reason (bug 744830 comment 51).
FragmentOrElement::GetMarkup does the innerHTML serialization.

> I'd feel a lot more comfortable
> about having a separate implementation for innerText that we can evolve
> independently. I also think it would be good to deploy an implementation of
> innerText whose behavior is as simple as possible, so that the spec can be
> as simple as possible while still be being Web-compatible. Maybe we should
> try picking the simpler behavior everywhere IE and Chrome differ.
This all sounds good
Comment 51 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-04 21:57:40 PDT
(In reply to Karl Dubost :karlcow from comment #49)
> I'm pretty sure Mike, Hallvord and I also can dig out old Web Compat issues
> and see how we fare with regards to them with the implementation you will be
> doing. That would be nice test cases. And specifically how it would work in
> the face of code like these ones which try to address the Web Compatibility
> issue.

So far I have not been able to find any specific Web compat issues that require more than just making innerText an alias for textContent. I suspect they exist since IE's behavior has changed over time; I've asked Travis if he can find out. That's the information we really need.

(I looked at Webkit history for information about how and why their implementation has evolved, but changes to their plaintext serializer (TextIterator) do not typically mention why the change was made, and linked Webkit Bugzilla bugs are not helpful. Webkit's TextIterator is used for clipboard and other features so it's unclear which, if any, of their behavior is actually important for Web apps.)
Comment 52 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-04 21:59:22 PDT
Though actually, we need to not just come up with a spec that's the minimum required for Web compatibility, but something that IE and Chrome at least are willing to implement, so we might need something more complicated than the minimum to satisfy their perceived needs rather than just their actual needs...
Comment 53 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-08 22:56:48 PDT
I have created a proposed spec plus testsuite here: https://github.com/rocallahan/innerText-spec
Comment 54 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-18 20:48:04 PDT
Created attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats
Comment 55 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-18 20:48:07 PDT
Created attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 56 User image Mats Palmgren (:mats) 2015-10-19 15:16:30 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://reviewboard.mozilla.org/r/22369/#review20043

::: layout/generic/nsIFrame.h:1907
(Diff revision 1)
>     * @param aSkippedMaxLength  Maximum number of characters to return

The GetRenderedText() documentation needs to be updated.

::: layout/generic/nsIFrame.h:1922
(Diff revision 1)
> +  virtual RenderedText GetRenderedText(uint32_t aFlags = 0,

It's a bit hard to understand what the zero means at the call site:
GetRenderedText(0, aContentOffset, aContentOffset + 1);
I think you should add a CONTENT_TEXT_OFFSETS to make it clear that the flag passed matches the offsets:
GetRenderedText(CONTENT_TEXT_OFFSETS, aContentOffset, aContentOffset + 1);

Do you think it's likely that we'll add additional flags here in the future?  If not, then I suggest you make it an enum class to avoid typos like:
GetRenderedText(aContentOffset, aContentOffset + 1);
Actually, I think we should do that now anyway.  Then if we need more flags in the future we add "operator|" etc.

Alternatively, could we bundle these three params into an object, DOMOffsets/RenderedOffsets subclassed of SomeOffsets, or something like that? e.g.
GetRenderedText(DOMOffsets(aContentOffset, aContentOffset + 1));
GetRenderedText(RenderedOffsets(aRenderedOffset, aRenderedOffset + 1));

::: layout/generic/nsTextFrame.h:264
(Diff revision 1)
> -  virtual nsresult GetRenderedText(nsAString* aString = nullptr,
> +  virtual RenderedText GetRenderedText(uint32_t aFlags = 0,

aFlags = CONTENT_TEXT_OFFSETS would make this clearer too

::: layout/generic/nsTextFrame.cpp:8888
(Diff revision 1)
> -  uint32_t validCharsLength = 0;
> +  bool setOffsets = false;

|setOffsets| is a confusing name. It's used like so:
if (!setOffsets) {
  // set the offsets
  setOffsets = true;
}
Which appears to be the opposite of what the name suggests.
Perhaps |haveOffsets| is better?
Comment 57 User image Mats Palmgren (:mats) 2015-10-19 15:54:50 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

https://reviewboard.mozilla.org/r/22371/#review20051

r=mats with those nits addressed as you see fit

::: dom/base/nsRange.cpp:3201
(Diff revision 1)
> +  int mRequiredLineBreakCount;

Why use a signed type for mRequiredLineBreakCount?
And why int instead of int32_t?

::: dom/base/nsRange.cpp:3206
(Diff revision 1)
> +    while (mRequiredLineBreakCount > 0) {

This loop seems silly.  Why not just do:

    if (mRequiredLineBreakCount > 0) {
      if (!mString.IsEmpty()) {
        mString.Append('\n');
      }
      mRequiredLineBreakCount = 0;
    }

::: dom/base/nsRange.cpp:3228
(Diff revision 1)
> +static bool IsVisibleAndNotInReplacedElement(nsIFrame* aFrame)

add a newline after |bool| please

::: dom/base/nsRange.cpp:3279
(Diff revision 1)
> +  if (aFrame->GetType() != nsGkAtoms::tableCellFrame) {

And nsGkAtoms::bcTableCellFrame ?

::: dom/base/nsRange.cpp:3325
(Diff revision 1)
> +  nsIContent* currentNode = static_cast<nsIContent*>(mStartParent.get());

Can you use AsContent() here?

::: dom/base/nsRange.cpp:3328
(Diff revision 1)
> +    nsGenericDOMDataNode* t = static_cast<nsGenericDOMDataNode*>(mStartParent.get());

'auto' makes this pattern more readable, IMO.

::: dom/base/nsRange.cpp:3358
(Diff revision 1)
> +          nsIFrame::RenderedText text = f->GetRenderedText(0, 0, UINT32_MAX);

Just GetRenderedText() seems clearer.

::: dom/base/nsRange.cpp:3371
(Diff revision 1)
> +          result.Append(NS_LITERAL_STRING("\n"));

Append('\n') is shorter (three places)

::: dom/base/nsRange.cpp:3373
(Diff revision 1)
> +        uint8_t display = f->StyleDisplay()->mDisplay;

No need to declare a variable for this, just use the expression directly in the 'switch'.
Comment 58 User image Mats Palmgren (:mats) 2015-10-19 16:00:03 PDT
FYI, nsDocumentEncoder has some special code for handling ShadowRoot
and IsHTMLElement(nsGkAtoms::rp) here:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocumentEncoder.cpp#107
did you consider those?
Comment 59 User image Mats Palmgren (:mats) 2015-10-19 16:11:13 PDT
It would be good to have test that checks <table><tfoot>x</tfoot><tbody>y</tbody></table>
since those are reordered in rendering.  And some tests using shadow dom.
Comment 60 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-19 18:35:43 PDT
https://reviewboard.mozilla.org/r/22369/#review20043

> It's a bit hard to understand what the zero means at the call site:
> GetRenderedText(0, aContentOffset, aContentOffset + 1);
> I think you should add a CONTENT_TEXT_OFFSETS to make it clear that the flag passed matches the offsets:
> GetRenderedText(CONTENT_TEXT_OFFSETS, aContentOffset, aContentOffset + 1);
> 
> Do you think it's likely that we'll add additional flags here in the future?  If not, then I suggest you make it an enum class to avoid typos like:
> GetRenderedText(aContentOffset, aContentOffset + 1);
> Actually, I think we should do that now anyway.  Then if we need more flags in the future we add "operator|" etc.
> 
> Alternatively, could we bundle these three params into an object, DOMOffsets/RenderedOffsets subclassed of SomeOffsets, or something like that? e.g.
> GetRenderedText(DOMOffsets(aContentOffset, aContentOffset + 1));
> GetRenderedText(RenderedOffsets(aRenderedOffset, aRenderedOffset + 1));

I thought I would have to add more flags but in the end I didn't. So making this an enum class parameter, and moving it to the end, seems like the way to go.

> |setOffsets| is a confusing name. It's used like so:
> if (!setOffsets) {
>   // set the offsets
>   setOffsets = true;
> }
> Which appears to be the opposite of what the name suggests.
> Perhaps |haveOffsets| is better?

Renamed to haveOffsets.
Comment 61 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-19 19:05:38 PDT
https://reviewboard.mozilla.org/r/22371/#review20065

::: dom/base/nsRange.cpp:3201
(Diff revision 1)
> +  int mRequiredLineBreakCount;

It can only be 0, 1 or 2. I'll make it int8_t.

::: dom/base/nsRange.cpp:3206
(Diff revision 1)
> +    while (mRequiredLineBreakCount > 0) {

Because we need to add as many newlines as mRequiredLineBreakCount says. In particular sometimes (when a <p> is present) we need to add two newlines.

::: dom/base/nsRange.cpp:3279
(Diff revision 1)
> +  if (aFrame->GetType() != nsGkAtoms::tableCellFrame) {

Yes, good catch.

::: dom/base/nsRange.cpp:3325
(Diff revision 1)
> +  nsIContent* currentNode = static_cast<nsIContent*>(mStartParent.get());

Sure.

::: dom/base/nsRange.cpp:3328
(Diff revision 1)
> +    nsGenericDOMDataNode* t = static_cast<nsGenericDOMDataNode*>(mStartParent.get());

OK

::: dom/base/nsRange.cpp:3358
(Diff revision 1)
> +          nsIFrame::RenderedText text = f->GetRenderedText(0, 0, UINT32_MAX);

Sure.

::: dom/base/nsRange.cpp:3371
(Diff revision 1)
> +          result.Append(NS_LITERAL_STRING("\n"));

OK, though I have had to add an Append(char) overload for that.

::: dom/base/nsRange.cpp:3373
(Diff revision 1)
> +        uint8_t display = f->StyleDisplay()->mDisplay;

OK
Comment 62 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-19 19:37:56 PDT
(In reply to Mats Palmgren (:mats) from comment #58)
> FYI, nsDocumentEncoder has some special code for handling ShadowRoot

I don't understand what that code is for. It was added in bug 806506 without explanation. I think we should ignore that for now, but let's ask William. Unless I hear otherwise I'd like to completely ignore shadow DOM, like we're ignoring CSS anonymous content. I'll write some tests to check that it's ignored. Chrome excludes shadow DOM content from "innerText".

> and IsHTMLElement(nsGkAtoms::rp) here:
> http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocumentEncoder.
> cpp#107

Yes ... clearly <rp> contents should be included in innerText, though Chrome doesn't do it. I'll fix that in the spec and here.

> It would be good to have test that checks <table><tfoot>x</tfoot><tbody>y</tbody></table>
> since those are reordered in rendering.  And some tests using shadow dom.

Sure.
Comment 63 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-19 20:13:03 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats
Comment 64 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-19 20:13:06 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 65 User image Mats Palmgren (:mats) 2015-10-20 07:11:39 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://reviewboard.mozilla.org/r/22369/#review20107

r=mats with the issue below fixed

::: layout/generic/nsIFrame.h:1907
(Diff revision 2)
>     * @param aSkippedMaxLength  Maximum number of characters to return

The updated documentation is still missing.
Comment 66 User image Mats Palmgren (:mats) 2015-10-20 07:22:53 PDT
https://reviewboard.mozilla.org/r/22371/#review20111

::: dom/base/nsRange.cpp:3267
(Diff revision 2)
> +static int

You're still using 'int' here.  I guess you want int8_t?

::: dom/base/nsRange.cpp:3386
(Diff revision 2)
> +          result.Append('\n');

Oh, right, 'result' is the accumulator object here.  I misread it as a string type the last time which is why I suggested to use Assign(char).  Adding an overload is fine, but don't feel like you have to if you want to revert that bit.
Comment 67 User image Mats Palmgren (:mats) 2015-10-20 07:23:30 PDT
(It's fine, but I still don't get why you want to use a signed type though.)
Comment 68 User image Mats Palmgren (:mats) 2015-10-20 07:29:33 PDT
We're not compatible with Chrome on this test (table.innerText):
<table>
<tfoot><tr><td>footer</tfoot>
<thead><tr><td style="visibility:collapse">thead</thead>
<tbody><tr><td>tbody</tbody>
</table>

You may want to fix that, and perhaps also update the spec?

Please also add tests with visibility:collapse on (each) row-group, row, and cell.
Comment 69 User image Mats Palmgren (:mats) 2015-10-20 07:56:25 PDT
We should probably also add tests for grid/flex items that use 'order':
<div style="display:grid"><x style="order:1">1</x><x>2</x></div>
<div style="display:flex"><x style="order:1">1</x><x>2</x></div>

"2" is rendered before "1", but the innerText should be "1\n2"
I suppose (since the <x>'s are blockified).

Probably also worth having tests with visibility:collapse on some items,
since it's supported at least for flexbox:
http://www.w3.org/TR/css-flexbox-1/#visibility-collapse
(and Grid might support it too in the future)
Comment 70 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-20 15:14:59 PDT
https://reviewboard.mozilla.org/r/22369/#review20197

::: layout/generic/nsIFrame.h:1907
(Diff revision 2)
>     * @param aSkippedMaxLength  Maximum number of characters to return

Fixed
Comment 71 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-20 15:16:25 PDT
https://reviewboard.mozilla.org/r/22371/#review20199

::: dom/base/nsRange.cpp:3267
(Diff revision 2)
> +static int

Yes.

I'm using a signed type because especially for limited-range types, it's safer. Bugs like for (...; x >= 0; ...) can't bite.
Comment 72 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-20 15:20:11 PDT
(In reply to Mats Palmgren (:mats) from comment #68)
> We're not compatible with Chrome on this test (table.innerText):
> <table>
> <tfoot><tr><td>footer</tfoot>
> <thead><tr><td style="visibility:collapse">thead</thead>
> <tbody><tr><td>tbody</tbody>
> </table>
> 
> You may want to fix that, and perhaps also update the spec?

I'd rather not change anything here. I don't think there's anything clearly wrong with what the spec and our implementation do, changing this to match Chrome would add complexity, and I'd be surprised if it causes Web compatibility problems in practice.

> Please also add tests with visibility:collapse on (each) row-group, row, and
> cell.

Done.

(In reply to Mats Palmgren (:mats) from comment #69)
> We should probably also add tests for grid/flex items that use 'order':
> <div style="display:grid"><x style="order:1">1</x><x>2</x></div>
> <div style="display:flex"><x style="order:1">1</x><x>2</x></div>
> 
> "2" is rendered before "1", but the innerText should be "1\n2"
> I suppose (since the <x>'s are blockified).

Done.

> Probably also worth having tests with visibility:collapse on some items,
> since it's supported at least for flexbox:
> http://www.w3.org/TR/css-flexbox-1/#visibility-collapse
> (and Grid might support it too in the future)

Done.
Comment 73 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-20 15:21:43 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats
Comment 74 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-20 15:21:46 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 75 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-20 15:24:21 PDT
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b4cc6272d38
Comment 76 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-20 16:14:59 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 77 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-20 16:16:45 PDT
https://reviewboard.mozilla.org/r/22371/#review20203

I've updated <rp> handling a little bit to ensure that visiblity:hidden <rp>s don't contribute to the result of innerText.
Comment 78 User image William Chen [:wchen] 2015-10-20 17:04:07 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #62)
> (In reply to Mats Palmgren (:mats) from comment #58)
> > FYI, nsDocumentEncoder has some special code for handling ShadowRoot
> 
> I don't understand what that code is for. It was added in bug 806506 without
> explanation. I think we should ignore that for now, but let's ask William.
> Unless I hear otherwise I'd like to completely ignore shadow DOM, like we're
> ignoring CSS anonymous content. I'll write some tests to check that it's
> ignored. Chrome excludes shadow DOM content from "innerText".
If I recall correctly, the code is to make serializing from a selection in shadow DOM work. In some cases, selection within a shadow DOM is described by a range where the root container is the shadow root and there is code that checks if the root container is visible before serializing the contents. It looks like innerText is supposed to work similar to serializing a selection, and given that selection with shadow DOM is not well defined and behaves poorly in all implementations (spec only says user agents provide a best attempt), it's not clear what behavior we actually want for textContent so whatever is easiest is probably fine for now. It should be more clear once selection and shadow DOM is specified.
Comment 79 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-20 17:15:35 PDT
OK great, thanks.
Comment 80 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-21 18:08:18 PDT
These patches caused some accessibility test failures that I'm fixing up now.
Comment 81 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-24 02:36:37 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab657569f554a1b1dae9575fb95ca5a94949442a
Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://hg.mozilla.org/integration/mozilla-inbound/rev/a396f4262479a5695c918f59bacd542ba21448b0
Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 82 User image Olli Pettay [:smaug] (review queue closed until backlog cleared) 2015-10-24 04:19:39 PDT
Hmm, this landed already? Is my review needed here? 
(Sorry, I've had some other stuff to review too)
Comment 83 User image Guilherme Lima 2015-10-26 12:56:27 PDT
Roc landed by mistake, it seems:
b70e89c03c56	Robert O'Callahan — Revert incorrectly committed changes ab657569f554 and a396f4262479
Comment 84 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-26 15:37:30 PDT
I landed it by mistake and backed it out immediately. So it still needs your review.
Comment 85 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-26 20:13:58 PDT
https://treeherder.mozilla.org/#/jobs?repo=try&revision=326d0927d8a0
Comment 86 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-27 04:27:50 PDT
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bfdc26cd9ea4
Comment 87 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-27 22:50:38 PDT
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4e7a7c3d1a11
Comment 88 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 04:31:05 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment 89 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 04:31:09 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 90 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 04:45:27 PDT
https://reviewboard.mozilla.org/r/22369/#review20995

mats: I'm adding aPushedFrame = true; in nsLineLayout.cpp. This fixes a bug that was revealed by the accessibility tests, where a line was not getting SetLineWrapped(true) even though we had a soft line break, because nsBlockFrame::ReflowInlineFrame did not see a pushed frame. So innerText could strip spaces before soft line breaks when it shouldn't.

This also fixes a bug in GetRenderedText where whitespace trimming was not being taken into account when we compute the length of the rendered text for a frame, and so our rendered text offsets were off sometimes. Also revealed by the accessibility tests.

marcoz: The accessibility test changes are entirely due to the new code trimming trailing whitespace when it's before a hard line break and collapsible.
Comment 91 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 04:48:06 PDT
https://reviewboard.mozilla.org/r/22371/#review20997

The changes here are very minor, just a static-analysis fix and a couple of tests to catch the bug about lines not being marked as having a soft line break.
Comment 92 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 04:50:43 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment 93 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 04:50:47 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 94 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 05:01:22 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment 95 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 05:01:25 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 96 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 05:03:48 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment 97 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 05:03:52 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 98 User image Marco Zehe (:MarcoZ) 2015-10-28 07:41:15 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://reviewboard.mozilla.org/r/22369/#review21029

r=marcoz

::: accessible/generic/HyperTextAccessible.cpp:1989
(Diff revision 4)
> -  // Only get info up to original offset, we know that will be larger than skipped offset
> +  *aRenderedOffset = text.mOffsetWithinNodeRenderedText;

Can we keep the comment, please?

::: accessible/generic/HyperTextAccessible.cpp:2013
(Diff revision 4)
> -  // We only need info up to skipped offset -- that is what we're converting to original offset
> +  *aContentOffset = text.mOffsetWithinNodeText;

Same here.

r=me with the two nits fixed. Thanks!
Comment 99 User image Mats Palmgren (:mats) 2015-10-28 10:51:40 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://reviewboard.mozilla.org/r/22369/#review21059

r=mats

::: layout/generic/nsTextFrame.cpp:9009
(Diff revisions 3 - 4)
> +        NS_ASSERTION(result.mOffsetWithinNodeRenderedText >= offsetInRenderedString &&

Please use MOZ_ASSERT instead.
Comment 100 User image Mats Palmgren (:mats) 2015-10-28 10:51:59 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

https://reviewboard.mozilla.org/r/22371/#review21061
Comment 101 User image Olli Pettay [:smaug] (review queue closed until backlog cleared) 2015-10-28 15:13:17 PDT
+struct InnerTextAccumulator {
{ goes to its own line
+  nsString& mString;
+  int8_t mRequiredLineBreakCount;
Could you put these to the end of the struct.

void Append(char ch)
s/ch/aCh/

+static bool
+ElementIsVisible(nsIContent* aContent)
+{
+  Element* elem = aContent->AsElement();
+  if (!elem) {
+    return false;
+  }
You must check aContent->IsElement() before using AsElement(). AsElement just does cast.
(and in DOM world its name would be GetAsElement() if it might return null.)


+  RefPtr<nsStyleContext> sc = nsComputedDOMStyle::GetStyleContextForElement(
+      elem, nullptr, nullptr);
2 spaces for indentation.


How does ElementIsVisible behave if the element is inside display: none; ? And how should it work?


GetRequiredInnerTextLineBreakCount
So <p<>'s linebreak handling never depends on its styling.


+enum TreeTraversalState {
+  AT_NODE,
+  AFTER_NODE
+};
Could you please add some comment what the states are about.

+void
+nsGenericHTMLElement::GetInnerText(mozilla::dom::DOMString& aValue,
+                                   mozilla::ErrorResult& aError)
you can drop mozilla::dom and mozilla:: here

+{
+  if (!GetPrimaryFrame(Flush_Layout)) {
+    RefPtr<nsStyleContext> sc =
+        nsComputedDOMStyle::GetStyleContextForElementNoFlush(this, nullptr, nullptr);
2 space for indentation

+  RefPtr<nsRange> range;
+  nsresult rv = nsRange::CreateRange(static_cast<nsINode*>(this), 0, this,
+      GetChildCount(), getter_AddRefs(range));
2 spaces for indentation

Hmm, GetInnerText looks a bit malloc heavy.  Did you compare the performance to other engines?
One simple optimization would be to not create Range object all the time. Either reuse one (and let Document to keep it alive) or
don't use Range at all but just pass the right 'startParent' and 'endParent' to GetInnerTextNoFlush.
In fact, since you just pass 'this' as start and end, why do we need Range at all?
GetInnerTextNoFlush could be just a static method taking startParent and endOffset, or if we want to support using if for other cases too, 
make it take start/endParent and also start/endOffset. But no need for creating Range object all the time.

+  nsString str;
+  const char16_t* s = aValue.BeginReading();
+  const char16_t* end = aValue.EndReading();
nsAutoString for str.

+      if (!str.IsEmpty()) {
+        RefPtr<nsTextNode> textContent =
+            new nsTextNode(NodeInfo()->NodeInfoManager());
2 spaces for indentation

+      already_AddRefed<mozilla::dom::NodeInfo> ni =
+          NodeInfo()->NodeInfoManager()->GetNodeInfo(nsGkAtoms::br,
ditto

+      str.SetLength(0);
.Truncate()


So I'd like to see a bit less malloc heavy (especially given that Range is cycle collectable object) setup for the getter, and it should be rather easy to do.
Comment 102 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 15:30:59 PDT
(In reply to Marco Zehe (:MarcoZ) from comment #98)
> ::: accessible/generic/HyperTextAccessible.cpp:1989
> (Diff revision 4)
> > -  // Only get info up to original offset, we know that will be larger than skipped offset
> > +  *aRenderedOffset = text.mOffsetWithinNodeRenderedText;> Can we keep the comment, please?

I don't think we should keep it, because the fact "we know that will be larger than skipped offset" is no longer relevant. The new code doesn't depend on that assumption.

> ::: accessible/generic/HyperTextAccessible.cpp:2013
> (Diff revision 4)
> > -  // We only need info up to skipped offset -- that is what we're converting to original offset
> > +  *aContentOffset = text.mOffsetWithinNodeText;
> 
> Same here.

I don't think this comment really makes sense anymore either.
Comment 103 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 15:32:31 PDT
(In reply to Mats Palmgren (:mats) from comment #99)
> ::: layout/generic/nsTextFrame.cpp:9009
> (Diff revisions 3 - 4)
> > +        NS_ASSERTION(result.mOffsetWithinNodeRenderedText >= offsetInRenderedString &&
> 
> Please use MOZ_ASSERT instead.

I'm not a fan of this change, since I think an error in rendered text offsets is not worth interrupting a test run for, but I don't really care so I'll do it.
Comment 104 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 19:25:56 PDT
(In reply to Olli Pettay [:smaug] from comment #101)
> +struct InnerTextAccumulator {
> { goes to its own line

Done.

> +  nsString& mString;
> +  int8_t mRequiredLineBreakCount;
> Could you put these to the end of the struct.

Done.

> void Append(char ch)
> s/ch/aCh/

Done

> +static bool
> +ElementIsVisible(nsIContent* aContent)
> +{
> +  Element* elem = aContent->AsElement();
> +  if (!elem) {
> +    return false;
> +  }
> You must check aContent->IsElement() before using AsElement(). AsElement
> just does cast.
> (and in DOM world its name would be GetAsElement() if it might return null.)

I made ElementIsVisible take Element* and moved AsElement() to the caller. The caller only calls it if IsHTMLElement(nsGkAtoms::rp) is true, which I guess is enough to justify calling AsElement().

> +  RefPtr<nsStyleContext> sc = nsComputedDOMStyle::GetStyleContextForElement(
> +      elem, nullptr, nullptr);
> 2 spaces for indentation.

Done.

> How does ElementIsVisible behave if the element is inside display: none; ?
> And how should it work?

It returns true. It's only supposed to check the visiblity style. This is because <rp> is normally display:none but we want to include it in innerText --- but we *don't* want to include it if it's visibility:hidden.

> GetRequiredInnerTextLineBreakCount
> So <p<>'s linebreak handling never depends on its styling.

Correct. I'm guessing that making linebreak handling depend on CSS style (e.g. 'margin') would be unnecessarily complicated.

> +enum TreeTraversalState {
> +  AT_NODE,
> +  AFTER_NODE
> +};
> Could you please add some comment what the states are about.

Sure.

> +void
> +nsGenericHTMLElement::GetInnerText(mozilla::dom::DOMString& aValue,
> +                                   mozilla::ErrorResult& aError)
> you can drop mozilla::dom and mozilla:: here

Done.

> +{
> +  if (!GetPrimaryFrame(Flush_Layout)) {
> +    RefPtr<nsStyleContext> sc =
> +        nsComputedDOMStyle::GetStyleContextForElementNoFlush(this, nullptr,
> nullptr);
> 2 space for indentation

Done.

> +  RefPtr<nsRange> range;
> +  nsresult rv = nsRange::CreateRange(static_cast<nsINode*>(this), 0, this,
> +      GetChildCount(), getter_AddRefs(range));
> 2 spaces for indentation

Done.

> Hmm, GetInnerText looks a bit malloc heavy.  Did you compare the performance
> to other engines?

No. I don't know of any use-cases where performance would matter, though I suppose that won't stop people writing silly benchmarks.

> One simple optimization would be to not create Range object all the time.
> Either reuse one (and let Document to keep it alive) or
> don't use Range at all but just pass the right 'startParent' and 'endParent'
> to GetInnerTextNoFlush.
> In fact, since you just pass 'this' as start and end, why do we need Range
> at all?

Because I'd like to reuse this to implement Selection.toString().

> GetInnerTextNoFlush could be just a static method taking startParent and
> endOffset, or if we want to support using if for other cases too, 
> make it take start/endParent and also start/endOffset. But no need for
> creating Range object all the time.

OK, I'll do that.
Comment 105 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 19:48:12 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment 106 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-28 19:48:15 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 107 User image Marco Zehe (:MarcoZ) 2015-10-29 02:37:10 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #102)
> (In reply to Marco Zehe (:MarcoZ) from comment #98)
> > ::: accessible/generic/HyperTextAccessible.cpp:1989
> > (Diff revision 4)
> > > -  // Only get info up to original offset, we know that will be larger than skipped offset
> > > +  *aRenderedOffset = text.mOffsetWithinNodeRenderedText;> Can we keep the comment, please?
> 
> I don't think we should keep it, because the fact "we know that will be
> larger than skipped offset" is no longer relevant. The new code doesn't
> depend on that assumption.

OK, thanks for the clarification! Just wanted to make sure we don't lose any important information. This goes for the second one, too.
Comment 108 User image Olli Pettay [:smaug] (review queue closed until backlog cleared) 2015-10-29 09:53:15 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Note 
  struct RenderedText {
    nsString mString;
    uint32_t mOffsetWithinNodeRenderedText;
    int32_t mOffsetWithinNodeText;
    RenderedText() : mOffsetWithinNodeRenderedText(0),
        mOffsetWithinNodeText(0) {}
  };
that mString shows up pretty badly in a silly testcase like 
http://mozilla.pettay.fi/moztests/innertext_nonsimple_nostyle_test.html
That test has two parts, and it is the latter part where mString handling shows up.
The first 500 runs is reflow bounded (can we do anything to that?), and the latter part is malloc heavy.
Using nsAutoString and not nsString reduces allocations quite a bit. Still behind blink's performance, but at least a bit closer.
Comment 109 User image Olli Pettay [:smaug] (review queue closed until backlog cleared) 2015-10-29 11:09:09 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

+IsVisibleAndNotInReplacedElement(nsIFrame* aFrame)
+{
+  if (!aFrame || !aFrame->StyleVisibility()->IsVisible()) {
+    return false;
+  }
+  for (nsIFrame* f = aFrame->GetParent(); f; f = f->GetParent()) {
+    if (f->GetContent() && f->GetContent()->IsHTMLElement(nsGkAtoms::button)) {
+      continue;
+    }
+    if (f->IsFrameOfType(nsIFrame::eReplaced)) {
+      return false;
+    }
+  }
+  return true;
+}
This shows up in performance profiles quite a bit (less than the nsString though). Anything we can do to it?

+nsRange::GetInnerTextNoFlush(DOMString& aValue, ErrorResult& aError,
+                             nsINode* aStartParent, uint32_t aStartOffset,
+                             nsINode* aEndParent, uint32_t aEndOffset)
I think aStartParent and aEndParent should be nsIContent objects since that is what the algorithm assumes.
(aEndParent->AsContent() isn't safe if aEndParent isn't nsIContent)


+void
+nsGenericHTMLElement::GetInnerText(mozilla::dom::DOMString& aValue,
+                                   mozilla::ErrorResult& aError)
+{
+  if (!GetPrimaryFrame(Flush_Layout)) {
+    RefPtr<nsStyleContext> sc =
+      nsComputedDOMStyle::GetStyleContextForElementNoFlush(this, nullptr, nullptr);
+    if (!sc || sc->StyleDisplay()->mDisplay == NS_STYLE_DISPLAY_NONE ||
+        !GetCurrentDoc()) {
GetCurrentDoc() is deprecated, and I think we want this to deal with shadow dom too, so IsInComposedDoc() should work fine.



+  // Batch possible DOMSubtreeModified events.
+  mozAutoSubtreeModified subtree(nullptr, nullptr);
+
+  // Scope firing mutation events so that we don't carry any state that
+  // might be stale
+  {
+    // We're relying on mozAutoSubtreeModified to keep a strong reference if
+    // needed.
+    nsIDocument* doc = OwnerDoc();
+
+    // Optimize the common case of there being no observers
+    if (nsContentUtils::HasMutationListeners(doc, NS_EVENT_BITS_MUTATION_NODEREMOVED)) {
+      subtree.UpdateTarget(doc, nullptr);
+      // Keep "this" alive during mutation listener firing
+      kungFuDeathGrip = this;
+      nsCOMPtr<nsINode> child;
+      for (child = nsINode::GetFirstChild();
+           child && child->GetParentNode() == this;
+           child = child->GetNextSibling()) {
+        nsContentUtils::MaybeFireNodeRemoved(child, this, doc);
+      }
+    }
+  }

// Batch possible DOMSubtreeModified events.
mozAutoSubtreeModified subtree(doc, nullptr);
FireNodeRemovedForChildren();

should work here.
Comment 110 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-29 20:53:25 PDT
(In reply to Olli Pettay [:smaug] from comment #108)
> Note 
>   struct RenderedText {
>     nsString mString;
>     uint32_t mOffsetWithinNodeRenderedText;
>     int32_t mOffsetWithinNodeText;
>     RenderedText() : mOffsetWithinNodeRenderedText(0),
>         mOffsetWithinNodeText(0) {}
>   };
> that mString shows up pretty badly in a silly testcase like 
> http://mozilla.pettay.fi/moztests/innertext_nonsimple_nostyle_test.html
> That test has two parts, and it is the latter part where mString handling
> shows up.
> The first 500 runs is reflow bounded (can we do anything to that?)

Not the way it's currently specced, no.

> and the
> latter part is malloc heavy.
> Using nsAutoString and not nsString reduces allocations quite a bit. Still
> behind blink's performance, but at least a bit closer.

OK. The question is, is this something worth optimizing?
Comment 111 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-29 21:56:41 PDT
According to my profile 2/3 of the time is actually in nsTextFrame::GetRenderedText for your benchmark. 1/7 is in IsVisibleAndNotInReplacedElement. The string memory overhead (Linux) seems to be quite low, maybe 1/15.
Comment 112 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-29 23:27:16 PDT
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment 113 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-29 23:27:19 PDT
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment 114 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-29 23:27:22 PDT
Created attachment 8681097 [details]
MozReview Request: Bug 264412. Optimize GetRenderedText. r=mats

Bug 264412. Optimize GetRenderedText. r=mats

With these changes we're slightly faster than Chrome on the non-reflowing part of
Olli's testcase.
Comment 115 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-29 23:29:13 PDT
I(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #111)
> According to my profile 2/3 of the time is actually in
> nsTextFrame::GetRenderedText for your benchmark. 1/7 is in
> IsVisibleAndNotInReplacedElement. The string memory overhead (Linux) seems
> to be quite low, maybe 1/15.

I misread the profile. A lot of the time in GetRenderedText was string processing; Append(char) seems to have significant fixed overhead. The third patch here avoids that. I also made changes to IsVisibleAndNotInReplacedElement to make it somewhat faster, and made a tweak so we call it roughly half as often.
Comment 116 User image Mats Palmgren (:mats) 2015-10-30 04:49:14 PDT
Comment on attachment 8681097 [details]
MozReview Request: Bug 264412. Optimize GetRenderedText. r=mats

https://reviewboard.mozilla.org/r/23769/#review21253

::: layout/generic/nsTextFrame.cpp:8906
(Diff revision 1)
> +    if (ch == '\n') {

Perhaps this is more readable?

    if ((ch == '\n' && !aStyle->NewlineIsSignificant(aFrame)) ||
        (ch == '\t' && !aStyle->TabIsSignificant())) {
      ch = ' ';
    }

::: layout/generic/nsTextFrame.cpp:8918
(Diff revision 1)
> +  if (aStyle->mTextTransform != NS_STYLE_TEXT_TRANSFORM_NONE) {

Why exclude TRANSFORM_NONE up front?
The 'switch' is a NOP for that value already so it doesn't seem necessary.
Is it just an optimization for the common case?
If so, you should probably point that out for clarity, and perhaps also
make it an early return instead?

::: layout/generic/nsTextFrame.cpp:9058
(Diff revision 1)
> +          trimmedOffsets.GetEnd() - iter.GetOriginalOffset());

iter.GetOriginalOffset() hides a subtraction and this expression is
repeated in a couple of places (also in a loop).
I think you should make this a local var, say originalOffset.
That should make the std::min fit on one line?
Also, you can avoid the addition inside the loop, like so:

auto end = originalOffset + runLength;
for (int32_t i = originalOffset; i < end; ++i) {
  ... CharAt(i)

It might be worth getting rid of iter.GetOriginalOffset()
in the while-loop condition too unless it complicates
the code too much.
Comment 117 User image Olli Pettay [:smaug] (review queue closed until backlog cleared) 2015-10-30 05:13:15 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #110)
> (In reply to Olli Pettay [:smaug] from comment #108)
> > and the
> > latter part is malloc heavy.
> > Using nsAutoString and not nsString reduces allocations quite a bit. Still
> > behind blink's performance, but at least a bit closer.
> 
> OK. The question is, is this something worth optimizing?
Well, we've spent quite some time to optimize innerHTML for example since many or probably most of
DOM related benchmarks have some silly micro benchmarks for it. 
Now with innerText supported I wouldn't be surprised benchmarks adding tests for it.
The reason I tested the performance here was to make sure we aren't way of the performance of others.
Comment 118 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-30 05:21:49 PDT
(In reply to Mats Palmgren (:mats) from comment #116)
> ::: layout/generic/nsTextFrame.cpp:8906
> (Diff revision 1)
> > +    if (ch == '\n') {
> 
> Perhaps this is more readable?
> 
>     if ((ch == '\n' && !aStyle->NewlineIsSignificant(aFrame)) ||
>         (ch == '\t' && !aStyle->TabIsSignificant())) {
>       ch = ' ';
>     }

Sure

> ::: layout/generic/nsTextFrame.cpp:8918
> (Diff revision 1)
> > +  if (aStyle->mTextTransform != NS_STYLE_TEXT_TRANSFORM_NONE) {
> 
> Why exclude TRANSFORM_NONE up front?
> The 'switch' is a NOP for that value already so it doesn't seem necessary.
> Is it just an optimization for the common case?
> If so, you should probably point that out for clarity, and perhaps also
> make it an early return instead?

Sorry, that was left over from previous work. Removed.

> ::: layout/generic/nsTextFrame.cpp:9058
> (Diff revision 1)
> > +          trimmedOffsets.GetEnd() - iter.GetOriginalOffset());
> 
> iter.GetOriginalOffset() hides a subtraction and this expression is
> repeated in a couple of places (also in a loop).
> I think you should make this a local var, say originalOffset.
> That should make the std::min fit on one line?
> Also, you can avoid the addition inside the loop, like so:
> 
> auto end = originalOffset + runLength;
> for (int32_t i = originalOffset; i < end; ++i) {
>   ... CharAt(i)
> 
> It might be worth getting rid of iter.GetOriginalOffset()
> in the while-loop condition too unless it complicates
> the code too much.

I'd rather not do these things because they complicate the code and I very much doubt there's any measurable performance impact.
Comment 119 User image Mats Palmgren (:mats) 2015-10-30 05:47:24 PDT
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #118)
> I'd rather not do these things because they complicate the code and I very
> much doubt there's any measurable performance impact.

Well, I didn't quite like the formatting here:
      runLength = std::min(runLength,
          trimmedOffsets.GetEnd() - iter.GetOriginalOffset());

so I figured a local var would fix that:
      auto originalOffset = iter.GetOriginalOffset();
      runLength = std::min(runLength, trimmedOffsets.GetEnd() - originalOffset);

and when you have that you might as well use it in the for-loop.

Oh well, it's a matter of opinion I guess. :-)

So r=mats with the TRANSFORM_NONE check removed.
Comment 120 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-30 06:08:42 PDT
Comment on attachment 8681097 [details]
MozReview Request: Bug 264412. Optimize GetRenderedText. r=mats

Bug 264412. Optimize GetRenderedText. r=mats

With these changes we're slightly faster than Chrome on the non-reflowing part of
Olli's testcase.
Comment 121 User image Mats Palmgren (:mats) 2015-10-30 06:45:46 PDT
Comment on attachment 8681097 [details]
MozReview Request: Bug 264412. Optimize GetRenderedText. r=mats

https://reviewboard.mozilla.org/r/23769/#review21259

r=mats if my question below isn't an issue

::: dom/base/nsRange.cpp:3403
(Diff revision 2)
> -    } else {
> +    if (currentNode == endNode && currentState == endState) {
> +      break;

Hmm, doesn't this skip the AFTER_NODE action for the endNode?

::: layout/generic/nsTextFrame.cpp:9050
(Diff revision 2)
> +      runLength = std::min(runLength,
> +          trimmedOffsets.GetEnd() - iter.GetOriginalOffset());

I'd prefer if the second arg to std::min lined up with the first.
Comment 122 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-30 13:33:23 PDT
(In reply to Mats Palmgren (:mats) from comment #121)
> ::: dom/base/nsRange.cpp:3403
> (Diff revision 2)
> > -    } else {
> > +    if (currentNode == endNode && currentState == endState) {
> > +      break;
> 
> Hmm, doesn't this skip the AFTER_NODE action for the endNode?

That's the same behavior as before. We stop as soon as we reach the state endNode/endState. And that's OK. For example, when HTMLElement calls GetInnerTextNoFlush, we pass the subject element and its GetChildCount() for aEndParent/aEndOffset. So we stop when currentNode is aEndParent and currentState is AFTER_NODE, which is before we've done the AFTER_NODE actions for aEndParent, which is correct per spec.
Comment 123 User image Mats Palmgren (:mats) 2015-10-30 14:50:31 PDT
OK, good.
Comment 124 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-10-30 18:18:38 PDT
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f919b665924c
Comment 125 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-11-01 19:26:56 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/95346f49d048f5abb3d60df4beec4fe4ef412017
Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://hg.mozilla.org/integration/mozilla-inbound/rev/5ebc59281c25fbb8ea288f24797b9ece1fdb21a5
Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

https://hg.mozilla.org/integration/mozilla-inbound/rev/9160f08660b8290559e427fd80d617edd86fe2a6
Bug 264412. Optimize GetRenderedText. r=mats
Comment 127 User image Binyamin 2015-11-02 05:16:32 PST
Please update the compatibility table in https://developer.mozilla.org/en-US/docs/Web/API/Node/innerText - supported since Firefox 45.
Comment 128 User image Juriy "kangax" Zaytsev 2015-11-03 00:55:15 PST
I think there's a bug.

<p style="visibility:hidden"><span style="visibility:visible">p</span></p><div>div</div>

This chunk produces "p\ndiv" in FF but "p\n\ndiv" in Chrome. 

Chrome behavior seems to make more sense to me. Also, removing visibility:hidden/visibility:visible from <p> produces more sensible double \n in FF right away.
Comment 129 User image Juriy "kangax" Zaytsev 2015-11-03 01:08:35 PST
Besides the above mentioned combination (which I assume to be a bug), I have so far found 2 differences in Chrome and FF: Chrome doesn't respect "whitespace: pre-line" (actually none of the browsers do besides FF) which I think is a bug in Chrome (and IE/Edge). Chrome puts 1 newline between paragraph and non-paragraph (like div) whereas FF puts 2, which I think is correct, and which IE10+/Edge do too.

The updated compat table — http://kangax.github.io/jstests/innerText/

I'm planning to add few more tests (from the ones Robert added with this commit) to find out if there's more deviations among implementations. And will try to file tickets with Chrome as well.
Comment 130 User image Juriy "kangax" Zaytsev 2015-11-04 02:50:29 PST
Quick note about performance. 

On my test page, I get 25-30ms vs. 35-40ms. in Chrome vs. Firefox (jsperf is unusable so my benchmark is crude) — so a pretty significant/noticeable difference. This might be because Chrome cuts corners.

The reason this is important is because some apps need to request innerText quite frequently on large documents when editing text and having to calculate position of cursor in formatted text.

P.S. Edge seems to perform really bad (~100ms) although I'm checking it through VirtualBox so perhaps that's what's affecting it.
Comment 132 User image Carsten Book [:Tomcat] 2015-11-05 08:51:05 PST
removing the b2g 2.5 flag since this commit has been reverted due to an incorrect merge, sorry for the confusion
Comment 133 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-11-23 16:18:17 PST
(In reply to Juriy "kangax" Zaytsev from comment #128)
> <p style="visibility:hidden"><span
> style="visibility:visible">p</span></p><div>div</div>
> 
> This chunk produces "p\ndiv" in FF but "p\n\ndiv" in Chrome. 
> 
> Chrome behavior seems to make more sense to me. Also, removing
> visibility:hidden/visibility:visible from <p> produces more sensible double
> \n in FF right away.

The FF/spec behavior here is intentional.

Handling CSS 'visibility' in innerText is tricky because in CSS 'visibility' does not affect layout, but that doesn't make sense in the context of 'innerText'. (The most faithful mimicry would be to replace text with spaces, perhaps, but that seems crazy and no browser does it.) So I've chosen to take a simple approach in FF and the spec where visibility:visible elements are just ignored for innerText. Chrome has a halfway approach where visibility:hidden suppresses some related text but not a <p>'s extra newlines *if the <p> has visible descendants* (or something like that).

We could make the spec a little more complicated to match Chrome here, but I'm reluctant to do that without a strong reason. Chrome's <p> handling has some definitely undesirable features, e.g. abc<p>123</p>def gives "abc\n123\n\ndef", i.e. you get a "bottom margin" but not a "top margin", which makes no sense, so even if Chrome's <p>-related behavior had a simple spec (which I doubt), we wouldn't really want to follow it. I very much doubt this case will matter in practice.
Comment 134 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-11-23 16:19:21 PST
(In reply to Juriy "kangax" Zaytsev from comment #129)
> Besides the above mentioned combination (which I assume to be a bug), I have
> so far found 2 differences in Chrome and FF: Chrome doesn't respect
> "whitespace: pre-line" (actually none of the browsers do besides FF) which I
> think is a bug in Chrome (and IE/Edge). Chrome puts 1 newline between
> paragraph and non-paragraph (like div) whereas FF puts 2, which I think is
> correct, and which IE10+/Edge do too.
> 
> The updated compat table — http://kangax.github.io/jstests/innerText/
> 
> I'm planning to add few more tests (from the ones Robert added with this
> commit) to find out if there's more deviations among implementations. And
> will try to file tickets with Chrome as well.

Thanks!!!
Comment 135 User image Robert O'Callahan (:roc) (email my personal email if necessary) 2015-11-23 16:22:57 PST
(In reply to Juriy "kangax" Zaytsev from comment #130)
> On my test page, I get 25-30ms vs. 35-40ms. in Chrome vs. Firefox (jsperf is
> unusable so my benchmark is crude) — so a pretty significant/noticeable
> difference. This might be because Chrome cuts corners.
>
> The reason this is important is because some apps need to request innerText
> quite frequently on large documents when editing text and having to
> calculate position of cursor in formatted text.

OK ... it would be great if you can file a bug with a specific testcase on this.
Comment 136 User image Tantek Çelik 2015-12-09 11:24:34 PST
From a spec standpoint, just to confirm, this from 2011-02-03 is the latest there is right?

https://rawgit.com/timdown/rangy/master/fiddlings/spec/innerText.htm
Comment 137 User image Karl Dubost :karlcow 2015-12-09 11:55:30 PST
Tantek,
here by Roc
https://github.com/rocallahan/innerText-spec
Comment 139 User image devan.shah14 2016-02-04 13:02:00 PST
When we try to fetch innerText when it is not set at all, it used return undefined and now it is returning "" (empty string), can this be changed so that in the case that innerText is not set then it comes as undefined and not as an empty string. Technically if you have not set any thing it should be undefined, or was there a reason to actually have the default be "" empty string.
Comment 140 User image Juriy "kangax" Zaytsev 2016-02-04 13:06:22 PST
innerText is not supposed to return something that was set before. It returns element's textual representation, which always exists and should always be a string.

Note You need to log in before you can comment on or make changes to this bug.