Open Bug 68489 Opened 19 years ago Updated Last year

Savings from ignorable whitespace

Categories

(Core :: Layout, defect, P3)

defect

Tracking

()

ASSIGNED
Future

People

(Reporter: rbs, Assigned: dbaron)

References

Details

Attachments

(1 file, 2 obsolete files)

[Follow-up to bug 68411]

The content model includes whitespace. However, the whitespace can be ignorable
in many contexts so that it is not necessary to create any nsTextFrame for them.

For example, the inter-tag whitespace in <table>\n<tr>\n<td> are ignorable.
Not ignoring them would even be harmful. This is because the first child of the
<table> is expected to be its first row, not the nsTextFrame that would have
been created in between. A similar problem arises in mathml where, for example,
the first child of 
  <mfrac>
    <mi>a</mi>
    <mi>b</mi>
  </mfrac>
is expected to be its numerator. In general therefore, whitespace is ignorable
when there is an embedded ordering of children that is significant. This is the 
case within the <table>...</table> and the <math>...</math> environments.

Currently, apart from table elements (and soon math elements) no special action
is taken to avoid creating nsTextFrame associated to ignorable whitespace.
On the contrary, a number of frame owners have been using special after-the-fact 
hacks to deal with the problem at the frame level.

The patch on bug 68411 now provides a general mechanism for any container frame
to instruct the frame construction code that it doesn't want ignorable
whitespace as children. To do so, the Init() method of the container frame just
has to include the statement:

mState |= NS_FRAME_EXCLUDE_IGNORABLE_WHITESPACE;

(there could be an if clause that causes the addition of the bit to be made
only under certain conditions, e.g., if the style context of the frame doesn't
have 'white-space: pre')

This bug is a request for investigation of other frame classes that can
benefit from the optimization. For each case that will be deemed fit for
exclusion, there will be two aspects:

1) Just adding the bit in the Init() so that the creation of the nsTextFrames
are stopped. This will bring savings in memory. This may also bring about
savings is reflow-time, as it will cut down on the hacks that would be reached.

2) Removing the hacks. This is more involved and would require some familiarity
with the code. It may appeal to reason to postpone this second aspect at a
subsequent stage, when 1) is completed everywhere and has been tested for quite
some time.
We *can't* drop the white-space from <table>.

   table { display: block; white-space: pre; }
   table * { display: inline; }

   <table>
     <tr><td>A</td></tr> <tr><td>B</td></tr>
     <tr><td>C</td></tr> <tr><td>D</td></tr>
   </table>

...should render just like:

   A B
   C D

...and not

   ABCD

...or whatever.
Tested, and it worked okay, as expected.
It is the ignorable inter-tag whitespace <table>\n<tr>\n<td> that are wed out.
                                                ^^    ^^
The ones that may exist elsewhere (e.g., in <td> ...\n... </td>) are left as is.
Some explanations about how the flag work:
First, <table> set the bit to say it doesn't want ignorable whitespace children,
so all the inter-tag whitespace <tr>...</tr>\n<tr>...</tr>\n, etc, are wed out.
Then <tr> set the bit too, so all the inter-tag <td>...<td>\n<td>...<td>\n are
wed out. But <td> doesn't set the bit, so it keeps all of its children.

If we suppose for a moment that only some of the <tr> set the bit, it would
therefore be possible to selectively and independently exclude or include
whitespace at any level of the hierarchy. That's in fact the motivation for
investigating if any win could come from this scheme in the general frame model.
Ian -- we can drop the whitespace if we check first for 'white-space: pre' (and
the 'display' types that will cause that to be recognized).
Ok, my misunderstanding is at which point is this drop done? Is it at the 
frame construction level, redone each time style is reresolved? If so, then yes,
we can just check for the display types, and drop whitespace on table, inline-
table, table-row, table-row-group, table-footer-group, table-header-group, 
table-coloumn, and table-column-group frames (keeping whitespace in table-cell
frames of course). But then we can actually drop _all_ non-table frames there,
since all such content should get wrapped by the appropriate anonymous frames,
and so should never exist there in the first place.

If this is done at the parser/content sink level, though, then we can't do any 
of this (for tables) since we don't know what is a table ahead of time, and it
could change dynamically at any time.

Note that we shouldn't ever do this based on the tagname of HTML elements; it
should be done exclusively on 'display' properties. But then block elements 
never contain whitespace nodes either, theoretically, since those would be 
wrapped in anonymous inline boxes, and inline boxes can (if white-space is set
to 'normal', the initial value) collapse all contiguous whitespace characters to
a single space.

(MathML elements are a different matter, since they have display semantics not
described by CSS. For MathML, this would indeed be based on the namespace and 
tagname of the elements.)
Ian - It's done at frame construction time.  We still want the whitespace in the
content model.  Since it's done with frame construction, if we change the style
change hint for the 'white-space' property to NS_STYLE_HINT_FRAMECHANGE (or
whatever it's called), dynamic changes to the 'white-space' property should be
handled correctly.
Yes, currently changes to whitespace only cause a REFLOW hint, so we need to
modify that as David suggested to make sure we reframe.
Cool, in that case I see no problems. :-)
Depends on: 68411
I'm willing to work on this early this summer, but if someone wants it done
sooner they should take it...
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Priority: P1 → P2
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Target Milestone: mozilla0.9.5 → mozilla0.9.6
I had a look at the existing code.  I don't think it's general enough, since it
seems to me it would cause whitespace to be stripped in something like:

<p><b>hello</b> <i>world</i></p>

if we used it for block frames.  Or did I miss something?

For block frames, we can always (when 'white-space' is 'pre' or 'nowrap') strip
whitespace:
 * at the beginning or the end
 * between child block frames, as opposed to child inline frames
It seems to me a more appropriate name for the current state bit would be
NS_FRAME_EXCLUDE_WHITESPACE.  I can't think of situations other than blocks that
would have similar rules for the ones that would apply to blocks, so I don't
think we need a second general solution in addition to the general one used for
MathML and tables.  However, the savings for blocks would be considerable (in
terms of memory and performance and simplification of the block code), so I
think we should try to do this for the block code.

I guess the other frame classes I could think of that might benefit from the
existing code are the XUL frames, except I think ignorable whitespace is
stripped out in the XUL *content sink*.
It actually would be good to do this for the relevant XUL frames so that XUL
display types work better from non-XUL documents.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Well, this was a little hack to see if I could do this for block frames.  It
actually doesn't seem all that hard -- except that this patch didn't succeed in
getting rid of all of the frames I wanted to (never mind the ones separated by
comments).  It also crashes when I click on a link...
Oh, and I wonder why ProcessChildren and ProcessBlockChildren still need to be
separate...
+        nsCOMPtr<nsIContent> content(*iter);
+        nsCOMPtr<nsITextContent> textContent = do_QueryInterface(content);
+        PRBool onlyWhitespace;
+        if (textContent &&
+            NS_SUCCEEDED(textContent->IsOnlyWhitespace(&onlyWhitespace)) &&
+            onlyWhitespace)

could be s//'ed with the existing /*static PRBool*/ IsOnlyWhitespace(*iter) that
is somewhere in that file.
Target Milestone: mozilla0.9.7 → Future
This patch, after a quick test, seems to do what I intended it do, and is the
first half of getting rid of ignorable whitespace at the beginning and end of
blocks.  What it doesn't deal with is ignorable whitespace within inlines in
blocks, e.g.,

<div><span>    <span>hi</span></span></div>

which we do need to fix if we're going to use this to allow us to simplify the
block code, although only because of block-within-inline cases, such as:

<div><span>    <span style="display: block">hi</span></span></div>


However, this patch should give a bit of the performance and footprint gain (I
suspect the majority) that we should get from this.

I'll look into doing the second half later -- I may just add it to this patch,
although I may decide I want to do enough refactoring to want it to be a
separate patch.
Attachment #59019 - Attachment is obsolete: true
Actually, I realize there are other cases this won't handle, like:

<div><p>foo</p>  <p>foo</p></div>

So maybe it's worth getting just this in for the performance improvement?
The patch looks good enough to me. The case "[block-elt] space [block-elt]" is
pretty hard because it is not obvious to tell if an element is a block without
resolving its style context which requires creating a frame for it in the first
place.

===
typo:
+  } else if (aContent->IsContentOfType(nsIContent::eCOMMENT) ||
+             aContent->IsContentOfType(nsIContent::eCOMMENT)) {

should be: 
+  } else if (aContent->IsContentOfType(nsIContent::eCOMMENT | nsIContent::ePI)

===
   // never create frames for comments on PIs
+  // XXX It would be nice to move this logic into |ChildIterator| --
+  // it's already partly there for |SkipInitialWhitespace| and
+  // |SkipFinalWhitespace|.
   if (tag == nsLayoutAtoms::commentTagName ||
       tag == nsLayoutAtoms::processingInstructionTagName)
-    return rv;
+    return NS_OK;

while there, you could perhaps move this in NeedFrameFor() while using the newly
added IsContentOfType() there.
>typo:
>+  } else if (aContent->IsContentOfType(nsIContent::eCOMMENT) ||
>+             aContent->IsContentOfType(nsIContent::eCOMMENT)) {
>
>should be: 
>+  } else if (aContent->IsContentOfType(nsIContent::eCOMMENT | nsIContent::ePI)

Thanks, although it needs to be two separate calls -- the latter asks whether
the content is both a comment and a processing instruction.


I'd like to find a better way to do child filtering.  We have three types of
filtering that we want to do:
 1. filtering out comments and processing instructions
 2. (1) plus removing all text nodes that are only whitespace
 3. (1) plus removing certain text nodes that are only whitespace -- those at
the beginning of the child list (ignoring things filtered in (1), as are the
following statements), those at the end of the child list, and those in the
middle of the child list preceded or followed by a block (although only handling
the preceded case would catch the cases of extra lines being created -- except
for one inline-block case).

I don't really see a good way to do this, yet.
What about trying a two-pass frame creation scheme (something that ocurred to me
the other day after sending my earlier comment). In the first-pass, only create
frames that are guaranteed to be needed (i.e,. non-textframes, non-comment|pi),
whilst marking the other content nodes that are left out (e.g., by enlisting
them in a voidarray or a similar structure that keeps track of the prev (and/or
next) frame siblings at the insertion points. Then in the second pass, possibly
create other frames that should be created [the caveat is that the child list is
not claimed to be construted until all children are created]. However, if you
are already measuring a sizeable win with the current patch, maybe the current
patch is ok for the time being, and no further gymnastics are needed at the
moment.
Attachment #62595 - Flags: needs-work+
I just removed this from my tree.
Attachment #62595 - Attachment is obsolete: true
QA Contact: chrispetersen → layout
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
You need to log in before you can comment on or make changes to this bug.