Closed Bug 264412 (innertext) Opened 20 years ago Closed 9 years ago

Add support for element.innerText

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
p11 + ---
firefox45 --- fixed

People

(Reporter: Paul_Adams, Assigned: roc)

References

()

Details

(Keywords: dev-doc-needed, feature, Whiteboard: [compat])

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20040913 Firefox/0.10
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20040913 Firefox/0.10

It seems that for some reason, the innerText property is not updatable with the
script in the above URL.

Reproducible: Always
Steps to Reproduce:
1. Go to the URL listed in this bug report
2.
3.

Actual Results:  
The innerText properties of the p, td, div and span elements are undefined when
the javascript starts, and they never get updated by the script.
The input element does actually get updated if the script is downloaded and
modified to take out the null checking.

Expected Results:  
The innerText property of the p, div, span, and td elements should have been
updated via the javascript.

If the innerText property in the 'clock.js' script is replaced by innerHTML,
then everything works OK
Gecko (and thus the Mozilla suite, Firefox, Netscape, Camino, Galeon, ...) never
has supported innerText -
http://www.mozilla.org/docs/web-developer/upgrade_2.html#dom_unsupp

If you actually meant to file an enhancement bug (on Browser - DOM Level 0) for
innerText support, well, in an era when we support document.all as long as you
don't ask first, who knows?
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID
This does beg the question why do you continue to support innerHTML?!!
*** Bug 282317 has been marked as a duplicate of this bug. ***
*** Bug 310661 has been marked as a duplicate of this bug. ***
*** Bug 338801 has been marked as a duplicate of this bug. ***
*** Bug 287395 has been marked as a duplicate of this bug. ***
*** Bug 317330 has been marked as a duplicate of this bug. ***
*** Bug 339017 has been marked as a duplicate of this bug. ***
*** Bug 254174 has been marked as a duplicate of this bug. ***
Alias: innertext
Status: RESOLVED → VERIFIED
Assignee: bugzilla → nobody
Component: General → DOM
Product: Firefox → Core
QA Contact: general → general
Is Gecko interested in implementing this if a detailed spec is written?  It's complicated, but considerably more useful than textContent for getting a plaintext version of an HTML element, and every other browser has it.  WebKit seems not to be interested in dropping it, and it would be nice if we could get interop here.
Status: VERIFIED → REOPENED
Ever confirmed: true
Resolution: INVALID → ---
Component: DOM → DOM: Core & HTML
OS: Windows XP → All
Hardware: x86 → All
Version: unspecified → Trunk
Why innerText is more useful than textContent?
Based on http://msdn.microsoft.com/en-us/library/ms533899%28v=vs.85%29.aspx
it is quite bizarre API.
innerText returns the element's HTML contents converted to plaintext.  Conceptually, it works the same way as Selection stringification -- in fact, WebKit uses the same algorithm for both, and I'm going to specify both as the same algorithm.  textContent just concatenates the text node descendants in tree order, so it preserves indentation and so forth.  innerText collapses runs of whitespace to a single space, inserts newlines after block elements, etc.

I don't have specific use-cases offhand for why anyone would want to use innerText instead of textContent.  The large majority of authors seem to only set it, in which case aliasing it to textContent would work, and in fact Opera treats it similarly to textContent (although they apparently have some compat bugs as a result).  But given that everyone needs to implement the plaintext conversion algorithm for Selection stringification anyway, I don't see any reason not to provide innerText too.  The additional implementation burden should be low, and it makes things easier for authors if all browsers implement it instead of all but Firefox.
I got a list of webpages using innerText and went through a whole bunch.  Most of them use it either for setting (where it behaves the same as textContent), or for getting for elements where it behaves basically the same as textContent anyway (e.g., only text node children).  The most interesting use I found was to transform a comment to plaintext for a quote, in two places:

1) http://www.cristianofino.net/post/Comment-Toolbar-plugin-per-WordPress.aspx

This uses innerText for a quotation feature -- it's part of an actively-maintained Wordpress plugin, <http://wordpress.org/extend/plugins/comment-toolbar/>.  The relevant JS is at <http://www.cristianofino.net/User%20Controls/CommentToolbar.js>.  To get the quote when you click the "Quota" button on a comment, it basically stringifies the Selection if there is one, otherwise takes the innerText of the post you clicked on, otherwise takes the textContent.  It so happens that this works okay, because there's no block-level markup in the comments and <br>'s and newlines are paired up exactly.  But given more general markup, Firefox (and Opera and IE9) would look much worse here, with lots of extra whitespace stuck in.

2) http://blog.livedoor.jp/geinin_matome/archives/1080186.html

Similar to (1) -- hover over the links like ">>66" to get a tooltip with a plaintext quote in WebKit or IE.  Again, no interesting markup that would make a big difference between innerText and textContent, but there could have been.
I filed a bug asking for this to be specced in HTML5 using a relatively trivial algorithm (textContent but with <br> converted to "\n"):

http://www.w3.org/Bugs/Public/show_bug.cgi?id=13145

I did this because I was reminded of the research and discussion I did about this last February:

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-February/030179.html

Notable from that link is that I found some pages that do break on Firefox because of missing innerText, e.g.:

http://beihai.gov.cn/5863/2010_4_28/5863_98289_1272423962000.html?open=1

At the time of this writing, a large chunk of content is missing in Firefox 6.0a2 which is present in Chrome 14 dev and Opera 11.50:

"4月26日上午,北海船检局召开中心组成员《新跨越》专题学习会。     会议根据广西区港航管理局《关于在全区港航系统开展推动广西水运事业新跨越专题学习讨论活动的通知》(港航政治[2010]59号文精神,围绕推动广西水运事业新跨越主题,认真组织学习了广西区人民政府..."

That's because of the following script:

<script>var strCol=document.getElementById("contentID").innerText;if(strCol.length>150){document.write(strCol.substring(0,150)+"...");}else{document.write(strCol);}</script>

The script does nothing in Gecko because the conditional checks the length of undefined, which is a fatal error.  It works in all other browsers.
Keywords: dev-doc-needed
As I see innerText is in spec TODO list (http://wiki.whatwg.org/wiki/Specs_todo).
Looks like nobody is going to write it. So, we neve get innerText in Firefox ? :(
This really needs to be fixed. Here is the test case just in case http://jsfiddle.net/azpnmxuc/
Otherwise people will keep creating duplicates, like they did before.
WebCompat issue created because of lack of support of innerText 
https://github.com/webcompat/web-bugs/issues/633
Other previously reported Webcompat issues (some are now invalid or resolved):
* Bug 942989
* Bug 281825
* Bug 187698
* Bug 131793
* Bug 358349 
* Bug 25918
* Bug 148812
* Bug 169598
* Bug 220538 
* Bug 471755
Blocks: 1130981
Adding for context, an interesting blog post with textContent and innerText, performances computations, etc.
http://perfectionkills.com/the-poor-misunderstood-innerText/
innerText should be faster than innerHtml, because the former doesn't need any parsing. Seems like browsers should be motivated to support it well as a good practice.
Would Mozilla be interested implementing innerText after all? As Karl mentioned, I've outlined use-cases in a recent blog post. I believe innerText is more than just a pesky webcompat offender. I'll try to work on spec in the meantime.
Another Web Compatibility issue https://webcompat.com/issues/1051
Apart from the web compatibility gains, Juriy has written a great summary of innerText and a possible way forward in terms of a (pseudo) spec. If Chrome and IE are willing to converge, great. If not, we should pick one of the flavors and implement it.

http://perfectionkills.com/the-poor-misunderstood-innerText/#naive-spec
http://kangax.github.io/jstests/innerText/
Summary: innerText property on various elements not updatable with javascript → Add support for element.innerText
Blocks: 1170774
tracking-p11: --- → +
Whiteboard: [compat]
Blocks: 914252
Blocks: 942989
Breaks Google Calendar (and probably GMail) navigation rendering: https://bugzilla.mozilla.org/show_bug.cgi?id=914252#c16
See Also: → 1197166
Breaks Facebook status update field resizing - bug 1197166
Status: REOPENED → NEW
For what is worth, please note that the original URL ( http://www.sharepointcustomization.com/resources/codesamples/TimeZoneClocks/clock_example.htm ) is gone.
We need to implement something here, because regardless of whether innerText is a desirable feature, Web developers are using it and sites are broken in Firefox without it.

I propose we implement the last version of Aryeh's proposed spec, here: https://rawgit.com/timdown/rangy/master/fiddlings/spec/innerText.htm

Olli, does that sound OK to you?
Assignee: nobody → roc
Flags: needinfo?(bugs)
Given that this is so bizarre feature, I'd like to hear feedback from anne and bz too.

But yes, I agree, we need to implement something here.
Flags: needinfo?(bzbarsky)
Flags: needinfo?(annevk)
What makes the most sense to me is that getting behaves equivalently to stringifying a selection. So once selection is defined, this feature would be too. That matches what WebKit is doing if memory serves me.
Flags: needinfo?(annevk)
Yeah, that has just the issue that selection isn't properly specified.

How differently would we behave if we just used our selection stringification?
Flags: needinfo?(bugs)
Good question. I have no time now but if someone could run it through my test — http://kangax.github.io/jstests/innerText/ (which also shows how other implementations differ among each other; and which I'm planning to update with couple more options soon)
I think in the short term we should do whatever is simplest here.  If just stringifying a selection is simpler for us than implementing Aryeh's somewhat stillborn spec, we should do that.  Compat for this stuff is pretty much nonexistent anyway.  :(
Flags: needinfo?(bzbarsky)
My priorities are in this order:
0) Don't break stuff that currently works.
1) Implement a reasonably Web-compatible version of innerText in Gecko.
2) Have a spec for innerText that browsers converge on.
3) Make innerText consistent with selection.toString().

I think perhaps the best way to proceed is to extend nsPlainTextSerializer with new options until it supports Aryeh's draft spec, then implement innerText using nsPlainTextSerializer, and iterate on the implementation and draft spec until we've shipped something Web-compatible enough. Then we can try changing selection.toString() to use the same options.

Juriy, do you know of any issues with the 2011 draft? https://rawgit.com/timdown/rangy/master/fiddlings/spec/innerText.htm
Flags: needinfo?(kangax)
(In reply to Boris Zbarsky [:bz] from comment #41)
> I think in the short term we should do whatever is simplest here.  If just
> stringifying a selection is simpler for us than implementing Aryeh's
> somewhat stillborn spec, we should do that.  Compat for this stuff is pretty
> much nonexistent anyway.  :(

While that's true, I would like to have a spec, and I'm not confident that just pushing our current selection.toString() behavior onto the Web would be good for the Web or us.
> Juriy, do you know of any issues with the 2011 draft? 

Unfortunately, I haven't had a chance to look into Aryeh's test thoroughly. Glancing at it now, the algorithm seems correct but it's probably missing few things. I see that Aryeh spec'd tabs between table-cell elements — this mimics webkit behavior but not IE. It also looks like inline-block elements don't create newlines — this has been the case in IE<=10 but not anymore (including webkit). There's no mention of what iframes resolve to (webkit ignores them, IE doesn't); or <canvas>, or <video>, or <select>. Well, <select> would resolve to its content (due to display:inline?) so that's IE's behavior but not WebKit. Finally, visibility:hidden mimics WebKit's but not IE behavior.

Yeah, this is a sad state of affairs.

Like I mentioned in a post, my understanding is that we need to:

1) Write a spec
2) Try to converge IE and WebKit/Blink behavior
3) Implement spec'd behavior in Firefox

The hardest part is — of course — choosing which behavior to go with. Considering drastic difference between IE and WebKit, I doubt there's any serious compat issues lurking here. Most of the problems are from not implementing _any_ version of innerText.

I'd love to see what FF's selection behavior is — is it closer to WK or IE's innerText implementations? Perhaps that's the starting point.
Flags: needinfo?(kangax)
*Aryeh's spec thoroughly
When in doubt we should probably just follow Chrome, which I expect they mostly got from Webkit.
Here are the results I get on Juriy's tests using selection.toString():

Newlines between paragraphs: 2
Newlines after table caption: 0
Newlines after block, non-paragraph elements (e.g. <div>): 1
Includes display:none elements: no
Includes visibility:hidden elements: no
Includes <select> content: no
Includes <button> content: no
Includes <canvas>, <video> fallback content: canvas no, video yes
Includes <iframe> fallback content: no
Includes <script>, <style> content: no
Creates leading space for anonymous inline boxes (including <li>): yes
Creates newline around inline, block-styled elements (e.g. <span>): no
Respects white-space:pre: yes
Creates (only) spaces between block, inline-styled elements (e.g. <div>): no
Puts tabs between <td> tags: yes
Preserves upper/lower-cased text: no
Looking at nsPlainTextSerializer, it's pretty complex. Some changes we'd want to make would be quite disruptive, e.g. right now many HTML elements (such as <span>) ignore IsElementBlock() in nsPlainTextSerializer::DoOpenContainer. I'd feel a lot more comfortable about having a separate implementation for innerText that we can evolve independently. I also think it would be good to deploy an implementation of innerText whose behavior is as simple as possible, so that the spec can be as simple as possible while still be being Web-compatible. Maybe we should try picking the simpler behavior everywhere IE and Chrome differ.
Roc,

First of all. Very cool. That's a very good news. Thanks for taking this. 

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) (offline Oct 1-4) from comment #43)
> While that's true, I would like to have a spec

That's partly the goal of the opened issue on the webcompat spec
https://github.com/whatwg/compat/issues/5
You are welcome to suggest prose or writing along the implementation and we can adjust step by step. 

I'm pretty sure Mike, Hallvord and I also can dig out old Web Compat issues and see how we fare with regards to them with the implementation you will be doing. That would be nice test cases. And specifically how it would work in the face of code like these ones which try to address the Web Compatibility issue. 


----
> function innerText (elem) {
>   return elem.innerText || elem.textContent;
> }

 module.exports = innerText;
In https://github.com/joshwnj/marki/blob/475b400cb4ad9a2c1d43fe46bebb0f518c397c05/src/inner-text.js

---- 

> if (el.textContent !== undefined)
>   el.textContent = string;
> else
>   el.innerText = string;
In https://github.com/HubSpot/youmightnotneedjquery/blob/ef987223c20e480fcbfb5924d96c11cd928e1226/comparisons/elements/set_text/ie8.js

----

> function text (el, value) {
>   if (arguments.length === 2) {
>     el.innerText = el.textContent = value;
>   }
>   if (typeof el.innerText === 'string') {
>     return el.innerText;
>   }
>   return el.textContent;
> }

In https://github.com/bevacqua/insignia/blob/53742c6929332b7c776d85dade8d7d82f524195b/text.js


----

> // innerText polyfill for Firefox
> ((document) => {
>   const dummy = document.createElement('div');
>     Object.defineProperty(HTMLElement.prototype, 'innerText', {
>       get: function()  { return this.textContent },
>       set: function(v) { this.textContent = v; }
>     });
>   }
> })(document);
In https://github.com/ng-kyoto/ng-kyoto.github.io/blob/9315fb7b24baec5a1614b4692bfedf9c0fc267dc/app/utils/innertext-polyfill.ts
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) (offline Oct 1-4) from comment #48)
> Looking at nsPlainTextSerializer, it's pretty complex.
It is, which is one reason why we for innerHTML, which is quite well spec'ed, moved out from using ns*Serializer (at least in HTML case). Performance was another reason (bug 744830 comment 51).
FragmentOrElement::GetMarkup does the innerHTML serialization.

> I'd feel a lot more comfortable
> about having a separate implementation for innerText that we can evolve
> independently. I also think it would be good to deploy an implementation of
> innerText whose behavior is as simple as possible, so that the spec can be
> as simple as possible while still be being Web-compatible. Maybe we should
> try picking the simpler behavior everywhere IE and Chrome differ.
This all sounds good
(In reply to Karl Dubost :karlcow from comment #49)
> I'm pretty sure Mike, Hallvord and I also can dig out old Web Compat issues
> and see how we fare with regards to them with the implementation you will be
> doing. That would be nice test cases. And specifically how it would work in
> the face of code like these ones which try to address the Web Compatibility
> issue.

So far I have not been able to find any specific Web compat issues that require more than just making innerText an alias for textContent. I suspect they exist since IE's behavior has changed over time; I've asked Travis if he can find out. That's the information we really need.

(I looked at Webkit history for information about how and why their implementation has evolved, but changes to their plaintext serializer (TextIterator) do not typically mention why the change was made, and linked Webkit Bugzilla bugs are not helpful. Webkit's TextIterator is used for clipboard and other features so it's unclear which, if any, of their behavior is actually important for Web apps.)
Though actually, we need to not just come up with a spec that's the minimum required for Web compatibility, but something that IE and Chrome at least are willing to implement, so we might need something more complicated than the minimum to satisfy their perceived needs rather than just their actual needs...
I have created a proposed spec plus testsuite here: https://github.com/rocallahan/innerText-spec
Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Attachment #8675487 - Flags: review?(mats)
Attachment #8675487 - Flags: review?(bugs)
Keywords: feature
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://reviewboard.mozilla.org/r/22369/#review20043

::: layout/generic/nsIFrame.h:1907
(Diff revision 1)
>     * @param aSkippedMaxLength  Maximum number of characters to return

The GetRenderedText() documentation needs to be updated.

::: layout/generic/nsIFrame.h:1922
(Diff revision 1)
> +  virtual RenderedText GetRenderedText(uint32_t aFlags = 0,

It's a bit hard to understand what the zero means at the call site:
GetRenderedText(0, aContentOffset, aContentOffset + 1);
I think you should add a CONTENT_TEXT_OFFSETS to make it clear that the flag passed matches the offsets:
GetRenderedText(CONTENT_TEXT_OFFSETS, aContentOffset, aContentOffset + 1);

Do you think it's likely that we'll add additional flags here in the future?  If not, then I suggest you make it an enum class to avoid typos like:
GetRenderedText(aContentOffset, aContentOffset + 1);
Actually, I think we should do that now anyway.  Then if we need more flags in the future we add "operator|" etc.

Alternatively, could we bundle these three params into an object, DOMOffsets/RenderedOffsets subclassed of SomeOffsets, or something like that? e.g.
GetRenderedText(DOMOffsets(aContentOffset, aContentOffset + 1));
GetRenderedText(RenderedOffsets(aRenderedOffset, aRenderedOffset + 1));

::: layout/generic/nsTextFrame.h:264
(Diff revision 1)
> -  virtual nsresult GetRenderedText(nsAString* aString = nullptr,
> +  virtual RenderedText GetRenderedText(uint32_t aFlags = 0,

aFlags = CONTENT_TEXT_OFFSETS would make this clearer too

::: layout/generic/nsTextFrame.cpp:8888
(Diff revision 1)
> -  uint32_t validCharsLength = 0;
> +  bool setOffsets = false;

|setOffsets| is a confusing name. It's used like so:
if (!setOffsets) {
  // set the offsets
  setOffsets = true;
}
Which appears to be the opposite of what the name suggests.
Perhaps |haveOffsets| is better?
Attachment #8675486 - Flags: review?(mats)
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

https://reviewboard.mozilla.org/r/22371/#review20051

r=mats with those nits addressed as you see fit

::: dom/base/nsRange.cpp:3201
(Diff revision 1)
> +  int mRequiredLineBreakCount;

Why use a signed type for mRequiredLineBreakCount?
And why int instead of int32_t?

::: dom/base/nsRange.cpp:3206
(Diff revision 1)
> +    while (mRequiredLineBreakCount > 0) {

This loop seems silly.  Why not just do:

    if (mRequiredLineBreakCount > 0) {
      if (!mString.IsEmpty()) {
        mString.Append('\n');
      }
      mRequiredLineBreakCount = 0;
    }

::: dom/base/nsRange.cpp:3228
(Diff revision 1)
> +static bool IsVisibleAndNotInReplacedElement(nsIFrame* aFrame)

add a newline after |bool| please

::: dom/base/nsRange.cpp:3279
(Diff revision 1)
> +  if (aFrame->GetType() != nsGkAtoms::tableCellFrame) {

And nsGkAtoms::bcTableCellFrame ?

::: dom/base/nsRange.cpp:3325
(Diff revision 1)
> +  nsIContent* currentNode = static_cast<nsIContent*>(mStartParent.get());

Can you use AsContent() here?

::: dom/base/nsRange.cpp:3328
(Diff revision 1)
> +    nsGenericDOMDataNode* t = static_cast<nsGenericDOMDataNode*>(mStartParent.get());

'auto' makes this pattern more readable, IMO.

::: dom/base/nsRange.cpp:3358
(Diff revision 1)
> +          nsIFrame::RenderedText text = f->GetRenderedText(0, 0, UINT32_MAX);

Just GetRenderedText() seems clearer.

::: dom/base/nsRange.cpp:3371
(Diff revision 1)
> +          result.Append(NS_LITERAL_STRING("\n"));

Append('\n') is shorter (three places)

::: dom/base/nsRange.cpp:3373
(Diff revision 1)
> +        uint8_t display = f->StyleDisplay()->mDisplay;

No need to declare a variable for this, just use the expression directly in the 'switch'.
Attachment #8675487 - Flags: review?(mats) → review+
FYI, nsDocumentEncoder has some special code for handling ShadowRoot
and IsHTMLElement(nsGkAtoms::rp) here:
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocumentEncoder.cpp#107
did you consider those?
It would be good to have test that checks <table><tfoot>x</tfoot><tbody>y</tbody></table>
since those are reordered in rendering.  And some tests using shadow dom.
https://reviewboard.mozilla.org/r/22369/#review20043

> It's a bit hard to understand what the zero means at the call site:
> GetRenderedText(0, aContentOffset, aContentOffset + 1);
> I think you should add a CONTENT_TEXT_OFFSETS to make it clear that the flag passed matches the offsets:
> GetRenderedText(CONTENT_TEXT_OFFSETS, aContentOffset, aContentOffset + 1);
> 
> Do you think it's likely that we'll add additional flags here in the future?  If not, then I suggest you make it an enum class to avoid typos like:
> GetRenderedText(aContentOffset, aContentOffset + 1);
> Actually, I think we should do that now anyway.  Then if we need more flags in the future we add "operator|" etc.
> 
> Alternatively, could we bundle these three params into an object, DOMOffsets/RenderedOffsets subclassed of SomeOffsets, or something like that? e.g.
> GetRenderedText(DOMOffsets(aContentOffset, aContentOffset + 1));
> GetRenderedText(RenderedOffsets(aRenderedOffset, aRenderedOffset + 1));

I thought I would have to add more flags but in the end I didn't. So making this an enum class parameter, and moving it to the end, seems like the way to go.

> |setOffsets| is a confusing name. It's used like so:
> if (!setOffsets) {
>   // set the offsets
>   setOffsets = true;
> }
> Which appears to be the opposite of what the name suggests.
> Perhaps |haveOffsets| is better?

Renamed to haveOffsets.
https://reviewboard.mozilla.org/r/22371/#review20065

::: dom/base/nsRange.cpp:3201
(Diff revision 1)
> +  int mRequiredLineBreakCount;

It can only be 0, 1 or 2. I'll make it int8_t.

::: dom/base/nsRange.cpp:3206
(Diff revision 1)
> +    while (mRequiredLineBreakCount > 0) {

Because we need to add as many newlines as mRequiredLineBreakCount says. In particular sometimes (when a <p> is present) we need to add two newlines.

::: dom/base/nsRange.cpp:3279
(Diff revision 1)
> +  if (aFrame->GetType() != nsGkAtoms::tableCellFrame) {

Yes, good catch.

::: dom/base/nsRange.cpp:3325
(Diff revision 1)
> +  nsIContent* currentNode = static_cast<nsIContent*>(mStartParent.get());

Sure.

::: dom/base/nsRange.cpp:3328
(Diff revision 1)
> +    nsGenericDOMDataNode* t = static_cast<nsGenericDOMDataNode*>(mStartParent.get());

OK

::: dom/base/nsRange.cpp:3358
(Diff revision 1)
> +          nsIFrame::RenderedText text = f->GetRenderedText(0, 0, UINT32_MAX);

Sure.

::: dom/base/nsRange.cpp:3371
(Diff revision 1)
> +          result.Append(NS_LITERAL_STRING("\n"));

OK, though I have had to add an Append(char) overload for that.

::: dom/base/nsRange.cpp:3373
(Diff revision 1)
> +        uint8_t display = f->StyleDisplay()->mDisplay;

OK
(In reply to Mats Palmgren (:mats) from comment #58)
> FYI, nsDocumentEncoder has some special code for handling ShadowRoot

I don't understand what that code is for. It was added in bug 806506 without explanation. I think we should ignore that for now, but let's ask William. Unless I hear otherwise I'd like to completely ignore shadow DOM, like we're ignoring CSS anonymous content. I'll write some tests to check that it's ignored. Chrome excludes shadow DOM content from "innerText".

> and IsHTMLElement(nsGkAtoms::rp) here:
> http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDocumentEncoder.
> cpp#107

Yes ... clearly <rp> contents should be included in innerText, though Chrome doesn't do it. I'll fix that in the spec and here.

> It would be good to have test that checks <table><tfoot>x</tfoot><tbody>y</tbody></table>
> since those are reordered in rendering.  And some tests using shadow dom.

Sure.
Flags: needinfo?(wchen)
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Attachment #8675486 - Flags: review?(mats) → review+
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://reviewboard.mozilla.org/r/22369/#review20107

r=mats with the issue below fixed

::: layout/generic/nsIFrame.h:1907
(Diff revision 2)
>     * @param aSkippedMaxLength  Maximum number of characters to return

The updated documentation is still missing.
https://reviewboard.mozilla.org/r/22371/#review20111

::: dom/base/nsRange.cpp:3267
(Diff revision 2)
> +static int

You're still using 'int' here.  I guess you want int8_t?

::: dom/base/nsRange.cpp:3386
(Diff revision 2)
> +          result.Append('\n');

Oh, right, 'result' is the accumulator object here.  I misread it as a string type the last time which is why I suggested to use Assign(char).  Adding an overload is fine, but don't feel like you have to if you want to revert that bit.
(It's fine, but I still don't get why you want to use a signed type though.)
We're not compatible with Chrome on this test (table.innerText):
<table>
<tfoot><tr><td>footer</tfoot>
<thead><tr><td style="visibility:collapse">thead</thead>
<tbody><tr><td>tbody</tbody>
</table>

You may want to fix that, and perhaps also update the spec?

Please also add tests with visibility:collapse on (each) row-group, row, and cell.
We should probably also add tests for grid/flex items that use 'order':
<div style="display:grid"><x style="order:1">1</x><x>2</x></div>
<div style="display:flex"><x style="order:1">1</x><x>2</x></div>

"2" is rendered before "1", but the innerText should be "1\n2"
I suppose (since the <x>'s are blockified).

Probably also worth having tests with visibility:collapse on some items,
since it's supported at least for flexbox:
http://www.w3.org/TR/css-flexbox-1/#visibility-collapse
(and Grid might support it too in the future)
https://reviewboard.mozilla.org/r/22369/#review20197

::: layout/generic/nsIFrame.h:1907
(Diff revision 2)
>     * @param aSkippedMaxLength  Maximum number of characters to return

Fixed
https://reviewboard.mozilla.org/r/22371/#review20199

::: dom/base/nsRange.cpp:3267
(Diff revision 2)
> +static int

Yes.

I'm using a signed type because especially for limited-range types, it's safer. Bugs like for (...; x >= 0; ...) can't bite.
(In reply to Mats Palmgren (:mats) from comment #68)
> We're not compatible with Chrome on this test (table.innerText):
> <table>
> <tfoot><tr><td>footer</tfoot>
> <thead><tr><td style="visibility:collapse">thead</thead>
> <tbody><tr><td>tbody</tbody>
> </table>
> 
> You may want to fix that, and perhaps also update the spec?

I'd rather not change anything here. I don't think there's anything clearly wrong with what the spec and our implementation do, changing this to match Chrome would add complexity, and I'd be surprised if it causes Web compatibility problems in practice.

> Please also add tests with visibility:collapse on (each) row-group, row, and
> cell.

Done.

(In reply to Mats Palmgren (:mats) from comment #69)
> We should probably also add tests for grid/flex items that use 'order':
> <div style="display:grid"><x style="order:1">1</x><x>2</x></div>
> <div style="display:flex"><x style="order:1">1</x><x>2</x></div>
> 
> "2" is rendered before "1", but the innerText should be "1\n2"
> I suppose (since the <x>'s are blockified).

Done.

> Probably also worth having tests with visibility:collapse on some items,
> since it's supported at least for flexbox:
> http://www.w3.org/TR/css-flexbox-1/#visibility-collapse
> (and Grid might support it too in the future)

Done.
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
https://reviewboard.mozilla.org/r/22371/#review20203

I've updated <rp> handling a little bit to ensure that visiblity:hidden <rp>s don't contribute to the result of innerText.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #62)
> (In reply to Mats Palmgren (:mats) from comment #58)
> > FYI, nsDocumentEncoder has some special code for handling ShadowRoot
> 
> I don't understand what that code is for. It was added in bug 806506 without
> explanation. I think we should ignore that for now, but let's ask William.
> Unless I hear otherwise I'd like to completely ignore shadow DOM, like we're
> ignoring CSS anonymous content. I'll write some tests to check that it's
> ignored. Chrome excludes shadow DOM content from "innerText".
If I recall correctly, the code is to make serializing from a selection in shadow DOM work. In some cases, selection within a shadow DOM is described by a range where the root container is the shadow root and there is code that checks if the root container is visible before serializing the contents. It looks like innerText is supposed to work similar to serializing a selection, and given that selection with shadow DOM is not well defined and behaves poorly in all implementations (spec only says user agents provide a best attempt), it's not clear what behavior we actually want for textContent so whatever is easiest is probably fine for now. It should be more clear once selection and shadow DOM is specified.
Flags: needinfo?(wchen)
These patches caused some accessibility test failures that I'm fixing up now.
Hmm, this landed already? Is my review needed here? 
(Sorry, I've had some other stuff to review too)
Roc landed by mistake, it seems:
b70e89c03c56	Robert O'Callahan — Revert incorrectly committed changes ab657569f554 and a396f4262479
I landed it by mistake and backed it out immediately. So it still needs your review.
Flags: needinfo?(bugs)
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Attachment #8675486 - Attachment description: MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats → MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz
Attachment #8675486 - Flags: review?(mzehe)
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
https://reviewboard.mozilla.org/r/22369/#review20995

mats: I'm adding aPushedFrame = true; in nsLineLayout.cpp. This fixes a bug that was revealed by the accessibility tests, where a line was not getting SetLineWrapped(true) even though we had a soft line break, because nsBlockFrame::ReflowInlineFrame did not see a pushed frame. So innerText could strip spaces before soft line breaks when it shouldn't.

This also fixes a bug in GetRenderedText where whitespace trimming was not being taken into account when we compute the length of the rendered text for a frame, and so our rendered text offsets were off sometimes. Also revealed by the accessibility tests.

marcoz: The accessibility test changes are entirely due to the new code trimming trailing whitespace when it's before a hard line break and collapsible.
https://reviewboard.mozilla.org/r/22371/#review20997

The changes here are very minor, just a static-analysis fix and a couple of tests to catch the bug about lines not being marked as having a soft line break.
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Attachment #8675486 - Flags: review+ → review?(mats)
Attachment #8675487 - Flags: review+ → review?(mats)
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://reviewboard.mozilla.org/r/22369/#review21029

r=marcoz

::: accessible/generic/HyperTextAccessible.cpp:1989
(Diff revision 4)
> -  // Only get info up to original offset, we know that will be larger than skipped offset
> +  *aRenderedOffset = text.mOffsetWithinNodeRenderedText;

Can we keep the comment, please?

::: accessible/generic/HyperTextAccessible.cpp:2013
(Diff revision 4)
> -  // We only need info up to skipped offset -- that is what we're converting to original offset
> +  *aContentOffset = text.mOffsetWithinNodeText;

Same here.

r=me with the two nits fixed. Thanks!
Attachment #8675486 - Flags: review?(mzehe) → review+
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

https://reviewboard.mozilla.org/r/22369/#review21059

r=mats

::: layout/generic/nsTextFrame.cpp:9009
(Diff revisions 3 - 4)
> +        NS_ASSERTION(result.mOffsetWithinNodeRenderedText >= offsetInRenderedString &&

Please use MOZ_ASSERT instead.
Attachment #8675486 - Flags: review?(mats) → review+
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

https://reviewboard.mozilla.org/r/22371/#review21061
Attachment #8675487 - Flags: review?(mats) → review+
+struct InnerTextAccumulator {
{ goes to its own line
+  nsString& mString;
+  int8_t mRequiredLineBreakCount;
Could you put these to the end of the struct.

void Append(char ch)
s/ch/aCh/

+static bool
+ElementIsVisible(nsIContent* aContent)
+{
+  Element* elem = aContent->AsElement();
+  if (!elem) {
+    return false;
+  }
You must check aContent->IsElement() before using AsElement(). AsElement just does cast.
(and in DOM world its name would be GetAsElement() if it might return null.)


+  RefPtr<nsStyleContext> sc = nsComputedDOMStyle::GetStyleContextForElement(
+      elem, nullptr, nullptr);
2 spaces for indentation.


How does ElementIsVisible behave if the element is inside display: none; ? And how should it work?


GetRequiredInnerTextLineBreakCount
So <p<>'s linebreak handling never depends on its styling.


+enum TreeTraversalState {
+  AT_NODE,
+  AFTER_NODE
+};
Could you please add some comment what the states are about.

+void
+nsGenericHTMLElement::GetInnerText(mozilla::dom::DOMString& aValue,
+                                   mozilla::ErrorResult& aError)
you can drop mozilla::dom and mozilla:: here

+{
+  if (!GetPrimaryFrame(Flush_Layout)) {
+    RefPtr<nsStyleContext> sc =
+        nsComputedDOMStyle::GetStyleContextForElementNoFlush(this, nullptr, nullptr);
2 space for indentation

+  RefPtr<nsRange> range;
+  nsresult rv = nsRange::CreateRange(static_cast<nsINode*>(this), 0, this,
+      GetChildCount(), getter_AddRefs(range));
2 spaces for indentation

Hmm, GetInnerText looks a bit malloc heavy.  Did you compare the performance to other engines?
One simple optimization would be to not create Range object all the time. Either reuse one (and let Document to keep it alive) or
don't use Range at all but just pass the right 'startParent' and 'endParent' to GetInnerTextNoFlush.
In fact, since you just pass 'this' as start and end, why do we need Range at all?
GetInnerTextNoFlush could be just a static method taking startParent and endOffset, or if we want to support using if for other cases too, 
make it take start/endParent and also start/endOffset. But no need for creating Range object all the time.

+  nsString str;
+  const char16_t* s = aValue.BeginReading();
+  const char16_t* end = aValue.EndReading();
nsAutoString for str.

+      if (!str.IsEmpty()) {
+        RefPtr<nsTextNode> textContent =
+            new nsTextNode(NodeInfo()->NodeInfoManager());
2 spaces for indentation

+      already_AddRefed<mozilla::dom::NodeInfo> ni =
+          NodeInfo()->NodeInfoManager()->GetNodeInfo(nsGkAtoms::br,
ditto

+      str.SetLength(0);
.Truncate()


So I'd like to see a bit less malloc heavy (especially given that Range is cycle collectable object) setup for the getter, and it should be rather easy to do.
Flags: needinfo?(bugs)
Attachment #8675487 - Flags: review?(bugs) → review-
(In reply to Marco Zehe (:MarcoZ) from comment #98)
> ::: accessible/generic/HyperTextAccessible.cpp:1989
> (Diff revision 4)
> > -  // Only get info up to original offset, we know that will be larger than skipped offset
> > +  *aRenderedOffset = text.mOffsetWithinNodeRenderedText;> Can we keep the comment, please?

I don't think we should keep it, because the fact "we know that will be larger than skipped offset" is no longer relevant. The new code doesn't depend on that assumption.

> ::: accessible/generic/HyperTextAccessible.cpp:2013
> (Diff revision 4)
> > -  // We only need info up to skipped offset -- that is what we're converting to original offset
> > +  *aContentOffset = text.mOffsetWithinNodeText;
> 
> Same here.

I don't think this comment really makes sense anymore either.
Flags: needinfo?(mzehe)
(In reply to Mats Palmgren (:mats) from comment #99)
> ::: layout/generic/nsTextFrame.cpp:9009
> (Diff revisions 3 - 4)
> > +        NS_ASSERTION(result.mOffsetWithinNodeRenderedText >= offsetInRenderedString &&
> 
> Please use MOZ_ASSERT instead.

I'm not a fan of this change, since I think an error in rendered text offsets is not worth interrupting a test run for, but I don't really care so I'll do it.
(In reply to Olli Pettay [:smaug] from comment #101)
> +struct InnerTextAccumulator {
> { goes to its own line

Done.

> +  nsString& mString;
> +  int8_t mRequiredLineBreakCount;
> Could you put these to the end of the struct.

Done.

> void Append(char ch)
> s/ch/aCh/

Done

> +static bool
> +ElementIsVisible(nsIContent* aContent)
> +{
> +  Element* elem = aContent->AsElement();
> +  if (!elem) {
> +    return false;
> +  }
> You must check aContent->IsElement() before using AsElement(). AsElement
> just does cast.
> (and in DOM world its name would be GetAsElement() if it might return null.)

I made ElementIsVisible take Element* and moved AsElement() to the caller. The caller only calls it if IsHTMLElement(nsGkAtoms::rp) is true, which I guess is enough to justify calling AsElement().

> +  RefPtr<nsStyleContext> sc = nsComputedDOMStyle::GetStyleContextForElement(
> +      elem, nullptr, nullptr);
> 2 spaces for indentation.

Done.

> How does ElementIsVisible behave if the element is inside display: none; ?
> And how should it work?

It returns true. It's only supposed to check the visiblity style. This is because <rp> is normally display:none but we want to include it in innerText --- but we *don't* want to include it if it's visibility:hidden.

> GetRequiredInnerTextLineBreakCount
> So <p<>'s linebreak handling never depends on its styling.

Correct. I'm guessing that making linebreak handling depend on CSS style (e.g. 'margin') would be unnecessarily complicated.

> +enum TreeTraversalState {
> +  AT_NODE,
> +  AFTER_NODE
> +};
> Could you please add some comment what the states are about.

Sure.

> +void
> +nsGenericHTMLElement::GetInnerText(mozilla::dom::DOMString& aValue,
> +                                   mozilla::ErrorResult& aError)
> you can drop mozilla::dom and mozilla:: here

Done.

> +{
> +  if (!GetPrimaryFrame(Flush_Layout)) {
> +    RefPtr<nsStyleContext> sc =
> +        nsComputedDOMStyle::GetStyleContextForElementNoFlush(this, nullptr,
> nullptr);
> 2 space for indentation

Done.

> +  RefPtr<nsRange> range;
> +  nsresult rv = nsRange::CreateRange(static_cast<nsINode*>(this), 0, this,
> +      GetChildCount(), getter_AddRefs(range));
> 2 spaces for indentation

Done.

> Hmm, GetInnerText looks a bit malloc heavy.  Did you compare the performance
> to other engines?

No. I don't know of any use-cases where performance would matter, though I suppose that won't stop people writing silly benchmarks.

> One simple optimization would be to not create Range object all the time.
> Either reuse one (and let Document to keep it alive) or
> don't use Range at all but just pass the right 'startParent' and 'endParent'
> to GetInnerTextNoFlush.
> In fact, since you just pass 'this' as start and end, why do we need Range
> at all?

Because I'd like to reuse this to implement Selection.toString().

> GetInnerTextNoFlush could be just a static method taking startParent and
> endOffset, or if we want to support using if for other cases too, 
> make it take start/endParent and also start/endOffset. But no need for
> creating Range object all the time.

OK, I'll do that.
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Attachment #8675487 - Flags: review- → review?(bugs)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #102)
> (In reply to Marco Zehe (:MarcoZ) from comment #98)
> > ::: accessible/generic/HyperTextAccessible.cpp:1989
> > (Diff revision 4)
> > > -  // Only get info up to original offset, we know that will be larger than skipped offset
> > > +  *aRenderedOffset = text.mOffsetWithinNodeRenderedText;> Can we keep the comment, please?
> 
> I don't think we should keep it, because the fact "we know that will be
> larger than skipped offset" is no longer relevant. The new code doesn't
> depend on that assumption.

OK, thanks for the clarification! Just wanted to make sure we don't lose any important information. This goes for the second one, too.
Flags: needinfo?(mzehe)
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Note 
  struct RenderedText {
    nsString mString;
    uint32_t mOffsetWithinNodeRenderedText;
    int32_t mOffsetWithinNodeText;
    RenderedText() : mOffsetWithinNodeRenderedText(0),
        mOffsetWithinNodeText(0) {}
  };
that mString shows up pretty badly in a silly testcase like 
http://mozilla.pettay.fi/moztests/innertext_nonsimple_nostyle_test.html
That test has two parts, and it is the latter part where mString handling shows up.
The first 500 runs is reflow bounded (can we do anything to that?), and the latter part is malloc heavy.
Using nsAutoString and not nsString reduces allocations quite a bit. Still behind blink's performance, but at least a bit closer.
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

+IsVisibleAndNotInReplacedElement(nsIFrame* aFrame)
+{
+  if (!aFrame || !aFrame->StyleVisibility()->IsVisible()) {
+    return false;
+  }
+  for (nsIFrame* f = aFrame->GetParent(); f; f = f->GetParent()) {
+    if (f->GetContent() && f->GetContent()->IsHTMLElement(nsGkAtoms::button)) {
+      continue;
+    }
+    if (f->IsFrameOfType(nsIFrame::eReplaced)) {
+      return false;
+    }
+  }
+  return true;
+}
This shows up in performance profiles quite a bit (less than the nsString though). Anything we can do to it?

+nsRange::GetInnerTextNoFlush(DOMString& aValue, ErrorResult& aError,
+                             nsINode* aStartParent, uint32_t aStartOffset,
+                             nsINode* aEndParent, uint32_t aEndOffset)
I think aStartParent and aEndParent should be nsIContent objects since that is what the algorithm assumes.
(aEndParent->AsContent() isn't safe if aEndParent isn't nsIContent)


+void
+nsGenericHTMLElement::GetInnerText(mozilla::dom::DOMString& aValue,
+                                   mozilla::ErrorResult& aError)
+{
+  if (!GetPrimaryFrame(Flush_Layout)) {
+    RefPtr<nsStyleContext> sc =
+      nsComputedDOMStyle::GetStyleContextForElementNoFlush(this, nullptr, nullptr);
+    if (!sc || sc->StyleDisplay()->mDisplay == NS_STYLE_DISPLAY_NONE ||
+        !GetCurrentDoc()) {
GetCurrentDoc() is deprecated, and I think we want this to deal with shadow dom too, so IsInComposedDoc() should work fine.



+  // Batch possible DOMSubtreeModified events.
+  mozAutoSubtreeModified subtree(nullptr, nullptr);
+
+  // Scope firing mutation events so that we don't carry any state that
+  // might be stale
+  {
+    // We're relying on mozAutoSubtreeModified to keep a strong reference if
+    // needed.
+    nsIDocument* doc = OwnerDoc();
+
+    // Optimize the common case of there being no observers
+    if (nsContentUtils::HasMutationListeners(doc, NS_EVENT_BITS_MUTATION_NODEREMOVED)) {
+      subtree.UpdateTarget(doc, nullptr);
+      // Keep "this" alive during mutation listener firing
+      kungFuDeathGrip = this;
+      nsCOMPtr<nsINode> child;
+      for (child = nsINode::GetFirstChild();
+           child && child->GetParentNode() == this;
+           child = child->GetNextSibling()) {
+        nsContentUtils::MaybeFireNodeRemoved(child, this, doc);
+      }
+    }
+  }

// Batch possible DOMSubtreeModified events.
mozAutoSubtreeModified subtree(doc, nullptr);
FireNodeRemovedForChildren();

should work here.
Attachment #8675487 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #108)
> Note 
>   struct RenderedText {
>     nsString mString;
>     uint32_t mOffsetWithinNodeRenderedText;
>     int32_t mOffsetWithinNodeText;
>     RenderedText() : mOffsetWithinNodeRenderedText(0),
>         mOffsetWithinNodeText(0) {}
>   };
> that mString shows up pretty badly in a silly testcase like 
> http://mozilla.pettay.fi/moztests/innertext_nonsimple_nostyle_test.html
> That test has two parts, and it is the latter part where mString handling
> shows up.
> The first 500 runs is reflow bounded (can we do anything to that?)

Not the way it's currently specced, no.

> and the
> latter part is malloc heavy.
> Using nsAutoString and not nsString reduces allocations quite a bit. Still
> behind blink's performance, but at least a bit closer.

OK. The question is, is this something worth optimizing?
According to my profile 2/3 of the time is actually in nsTextFrame::GetRenderedText for your benchmark. 1/7 is in IsVisibleAndNotInReplacedElement. The string memory overhead (Linux) seems to be quite low, maybe 1/15.
Comment on attachment 8675486 [details]
MozReview Request: Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

Bug 264412. Refactor nsIFrame::GetRenderedText API to be more sane. r=mats,marcoz

The test changes here are to adjust for the fact that
nsTextFrame::GetRenderedText can now trim whitespace from the end of lines
that end in a hard line break.
Comment on attachment 8675487 [details]
MozReview Request: Bug 264412. Implement HTMLElement.innerText. r=smaug,mats

Bug 264412. Implement HTMLElement.innerText. r=smaug,mats
Attachment #8675487 - Flags: review+ → review?(bugs)
Bug 264412. Optimize GetRenderedText. r=mats

With these changes we're slightly faster than Chrome on the non-reflowing part of
Olli's testcase.
Attachment #8681097 - Flags: review?(mats)
I(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #111)
> According to my profile 2/3 of the time is actually in
> nsTextFrame::GetRenderedText for your benchmark. 1/7 is in
> IsVisibleAndNotInReplacedElement. The string memory overhead (Linux) seems
> to be quite low, maybe 1/15.

I misread the profile. A lot of the time in GetRenderedText was string processing; Append(char) seems to have significant fixed overhead. The third patch here avoids that. I also made changes to IsVisibleAndNotInReplacedElement to make it somewhat faster, and made a tweak so we call it roughly half as often.
Comment on attachment 8681097 [details]
MozReview Request: Bug 264412. Optimize GetRenderedText. r=mats

https://reviewboard.mozilla.org/r/23769/#review21253

::: layout/generic/nsTextFrame.cpp:8906
(Diff revision 1)
> +    if (ch == '\n') {

Perhaps this is more readable?

    if ((ch == '\n' && !aStyle->NewlineIsSignificant(aFrame)) ||
        (ch == '\t' && !aStyle->TabIsSignificant())) {
      ch = ' ';
    }

::: layout/generic/nsTextFrame.cpp:8918
(Diff revision 1)
> +  if (aStyle->mTextTransform != NS_STYLE_TEXT_TRANSFORM_NONE) {

Why exclude TRANSFORM_NONE up front?
The 'switch' is a NOP for that value already so it doesn't seem necessary.
Is it just an optimization for the common case?
If so, you should probably point that out for clarity, and perhaps also
make it an early return instead?

::: layout/generic/nsTextFrame.cpp:9058
(Diff revision 1)
> +          trimmedOffsets.GetEnd() - iter.GetOriginalOffset());

iter.GetOriginalOffset() hides a subtraction and this expression is
repeated in a couple of places (also in a loop).
I think you should make this a local var, say originalOffset.
That should make the std::min fit on one line?
Also, you can avoid the addition inside the loop, like so:

auto end = originalOffset + runLength;
for (int32_t i = originalOffset; i < end; ++i) {
  ... CharAt(i)

It might be worth getting rid of iter.GetOriginalOffset()
in the while-loop condition too unless it complicates
the code too much.
Attachment #8681097 - Flags: review?(mats)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #110)
> (In reply to Olli Pettay [:smaug] from comment #108)
> > and the
> > latter part is malloc heavy.
> > Using nsAutoString and not nsString reduces allocations quite a bit. Still
> > behind blink's performance, but at least a bit closer.
> 
> OK. The question is, is this something worth optimizing?
Well, we've spent quite some time to optimize innerHTML for example since many or probably most of
DOM related benchmarks have some silly micro benchmarks for it. 
Now with innerText supported I wouldn't be surprised benchmarks adding tests for it.
The reason I tested the performance here was to make sure we aren't way of the performance of others.
Attachment #8675487 - Flags: review?(bugs) → review+
(In reply to Mats Palmgren (:mats) from comment #116)
> ::: layout/generic/nsTextFrame.cpp:8906
> (Diff revision 1)
> > +    if (ch == '\n') {
> 
> Perhaps this is more readable?
> 
>     if ((ch == '\n' && !aStyle->NewlineIsSignificant(aFrame)) ||
>         (ch == '\t' && !aStyle->TabIsSignificant())) {
>       ch = ' ';
>     }

Sure

> ::: layout/generic/nsTextFrame.cpp:8918
> (Diff revision 1)
> > +  if (aStyle->mTextTransform != NS_STYLE_TEXT_TRANSFORM_NONE) {
> 
> Why exclude TRANSFORM_NONE up front?
> The 'switch' is a NOP for that value already so it doesn't seem necessary.
> Is it just an optimization for the common case?
> If so, you should probably point that out for clarity, and perhaps also
> make it an early return instead?

Sorry, that was left over from previous work. Removed.

> ::: layout/generic/nsTextFrame.cpp:9058
> (Diff revision 1)
> > +          trimmedOffsets.GetEnd() - iter.GetOriginalOffset());
> 
> iter.GetOriginalOffset() hides a subtraction and this expression is
> repeated in a couple of places (also in a loop).
> I think you should make this a local var, say originalOffset.
> That should make the std::min fit on one line?
> Also, you can avoid the addition inside the loop, like so:
> 
> auto end = originalOffset + runLength;
> for (int32_t i = originalOffset; i < end; ++i) {
>   ... CharAt(i)
> 
> It might be worth getting rid of iter.GetOriginalOffset()
> in the while-loop condition too unless it complicates
> the code too much.

I'd rather not do these things because they complicate the code and I very much doubt there's any measurable performance impact.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #118)
> I'd rather not do these things because they complicate the code and I very
> much doubt there's any measurable performance impact.

Well, I didn't quite like the formatting here:
      runLength = std::min(runLength,
          trimmedOffsets.GetEnd() - iter.GetOriginalOffset());

so I figured a local var would fix that:
      auto originalOffset = iter.GetOriginalOffset();
      runLength = std::min(runLength, trimmedOffsets.GetEnd() - originalOffset);

and when you have that you might as well use it in the for-loop.

Oh well, it's a matter of opinion I guess. :-)

So r=mats with the TRANSFORM_NONE check removed.
Comment on attachment 8681097 [details]
MozReview Request: Bug 264412. Optimize GetRenderedText. r=mats

Bug 264412. Optimize GetRenderedText. r=mats

With these changes we're slightly faster than Chrome on the non-reflowing part of
Olli's testcase.
Attachment #8681097 - Flags: review?(mats)
Attachment #8681097 - Flags: review?(mats) → review+
Comment on attachment 8681097 [details]
MozReview Request: Bug 264412. Optimize GetRenderedText. r=mats

https://reviewboard.mozilla.org/r/23769/#review21259

r=mats if my question below isn't an issue

::: dom/base/nsRange.cpp:3403
(Diff revision 2)
> -    } else {
> +    if (currentNode == endNode && currentState == endState) {
> +      break;

Hmm, doesn't this skip the AFTER_NODE action for the endNode?

::: layout/generic/nsTextFrame.cpp:9050
(Diff revision 2)
> +      runLength = std::min(runLength,
> +          trimmedOffsets.GetEnd() - iter.GetOriginalOffset());

I'd prefer if the second arg to std::min lined up with the first.
(In reply to Mats Palmgren (:mats) from comment #121)
> ::: dom/base/nsRange.cpp:3403
> (Diff revision 2)
> > -    } else {
> > +    if (currentNode == endNode && currentState == endState) {
> > +      break;
> 
> Hmm, doesn't this skip the AFTER_NODE action for the endNode?

That's the same behavior as before. We stop as soon as we reach the state endNode/endState. And that's OK. For example, when HTMLElement calls GetInnerTextNoFlush, we pass the subject element and its GetChildCount() for aEndParent/aEndOffset. So we stop when currentNode is aEndParent and currentState is AFTER_NODE, which is before we've done the AFTER_NODE actions for aEndParent, which is correct per spec.
OK, good.
https://hg.mozilla.org/mozilla-central/rev/95346f49d048
https://hg.mozilla.org/mozilla-central/rev/5ebc59281c25
https://hg.mozilla.org/mozilla-central/rev/9160f08660b8
Status: NEW → RESOLVED
Closed: 20 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
Please update the compatibility table in https://developer.mozilla.org/en-US/docs/Web/API/Node/innerText - supported since Firefox 45.
I think there's a bug.

<p style="visibility:hidden"><span style="visibility:visible">p</span></p><div>div</div>

This chunk produces "p\ndiv" in FF but "p\n\ndiv" in Chrome. 

Chrome behavior seems to make more sense to me. Also, removing visibility:hidden/visibility:visible from <p> produces more sensible double \n in FF right away.
Besides the above mentioned combination (which I assume to be a bug), I have so far found 2 differences in Chrome and FF: Chrome doesn't respect "whitespace: pre-line" (actually none of the browsers do besides FF) which I think is a bug in Chrome (and IE/Edge). Chrome puts 1 newline between paragraph and non-paragraph (like div) whereas FF puts 2, which I think is correct, and which IE10+/Edge do too.

The updated compat table — http://kangax.github.io/jstests/innerText/

I'm planning to add few more tests (from the ones Robert added with this commit) to find out if there's more deviations among implementations. And will try to file tickets with Chrome as well.
Depends on: 1221043
Quick note about performance. 

On my test page, I get 25-30ms vs. 35-40ms. in Chrome vs. Firefox (jsperf is unusable so my benchmark is crude) — so a pretty significant/noticeable difference. This might be because Chrome cuts corners.

The reason this is important is because some apps need to request innerText quite frequently on large documents when editing text and having to calculate position of cursor in formatted text.

P.S. Edge seems to perform really bad (~100ms) although I'm checking it through VirtualBox so perhaps that's what's affecting it.
removing the b2g 2.5 flag since this commit has been reverted due to an incorrect merge, sorry for the confusion
Depends on: 1226293
(In reply to Juriy "kangax" Zaytsev from comment #128)
> <p style="visibility:hidden"><span
> style="visibility:visible">p</span></p><div>div</div>
> 
> This chunk produces "p\ndiv" in FF but "p\n\ndiv" in Chrome. 
> 
> Chrome behavior seems to make more sense to me. Also, removing
> visibility:hidden/visibility:visible from <p> produces more sensible double
> \n in FF right away.

The FF/spec behavior here is intentional.

Handling CSS 'visibility' in innerText is tricky because in CSS 'visibility' does not affect layout, but that doesn't make sense in the context of 'innerText'. (The most faithful mimicry would be to replace text with spaces, perhaps, but that seems crazy and no browser does it.) So I've chosen to take a simple approach in FF and the spec where visibility:visible elements are just ignored for innerText. Chrome has a halfway approach where visibility:hidden suppresses some related text but not a <p>'s extra newlines *if the <p> has visible descendants* (or something like that).

We could make the spec a little more complicated to match Chrome here, but I'm reluctant to do that without a strong reason. Chrome's <p> handling has some definitely undesirable features, e.g. abc<p>123</p>def gives "abc\n123\n\ndef", i.e. you get a "bottom margin" but not a "top margin", which makes no sense, so even if Chrome's <p>-related behavior had a simple spec (which I doubt), we wouldn't really want to follow it. I very much doubt this case will matter in practice.
(In reply to Juriy "kangax" Zaytsev from comment #129)
> Besides the above mentioned combination (which I assume to be a bug), I have
> so far found 2 differences in Chrome and FF: Chrome doesn't respect
> "whitespace: pre-line" (actually none of the browsers do besides FF) which I
> think is a bug in Chrome (and IE/Edge). Chrome puts 1 newline between
> paragraph and non-paragraph (like div) whereas FF puts 2, which I think is
> correct, and which IE10+/Edge do too.
> 
> The updated compat table — http://kangax.github.io/jstests/innerText/
> 
> I'm planning to add few more tests (from the ones Robert added with this
> commit) to find out if there's more deviations among implementations. And
> will try to file tickets with Chrome as well.

Thanks!!!
(In reply to Juriy "kangax" Zaytsev from comment #130)
> On my test page, I get 25-30ms vs. 35-40ms. in Chrome vs. Firefox (jsperf is
> unusable so my benchmark is crude) — so a pretty significant/noticeable
> difference. This might be because Chrome cuts corners.
>
> The reason this is important is because some apps need to request innerText
> quite frequently on large documents when editing text and having to
> calculate position of cursor in formatted text.

OK ... it would be great if you can file a bug with a specific testcase on this.
Flags: needinfo?(kangax)
From a spec standpoint, just to confirm, this from 2011-02-03 is the latest there is right?

https://rawgit.com/timdown/rangy/master/fiddlings/spec/innerText.htm
Blocks: 998390
When we try to fetch innerText when it is not set at all, it used return undefined and now it is returning "" (empty string), can this be changed so that in the case that innerText is not set then it comes as undefined and not as an empty string. Technically if you have not set any thing it should be undefined, or was there a reason to actually have the default be "" empty string.
innerText is not supposed to return something that was set before. It returns element's textual representation, which always exists and should always be a string.
Flags: needinfo?(kangax)
Depends on: 1260025
Depends on: 1268833
Depends on: 1290937
See Also: → 1709790
You need to log in before you can comment on or make changes to this bug.