Closed Bug 502568 Opened 15 years ago Closed 15 years ago

[HTML5][Patch] Should flush occasionally when loading pure text

Categories

(Core :: DOM: HTML Parser, defect, P2)

x86
macOS
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: roc, Assigned: hsivonen)

References

(Blocks 1 open bug, )

Details

Attachments

(1 file, 4 obsolete files)

The testcase is a million-line file containing <pre> text. The HTML5 parser builds one giant text node, which is fine, but we don't seem to ever flush partial contents of that text node during the load --- nsTextFrame::CharacterDataChanged is never called, and you don't ever see partial results.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
In the creation of a text node relatively expensive? Does it make sense to create a text node to hold text when the parser doesn't know if the node will get inserted or have its content extracted and appended to a pre-existing text node?
Just creating a text node is not very expensive. Creating and reflowing text frames can be very expensive, but we're willing to eat that cost to get incremental display, in general.

We probably should use some kind of timer here. If we read text and then the network load stalls, the user should be able to read that text within some bounded (short) amount of time.
Looking at it another way, if the browser is idle and we have received content from the network that should be visible, but the user can't read it, that is a bug.
Hmm, I guess comment #3 and the second paragraph of comment #2 aren't exactly about this bug.

Even if the network load is never idle, we should still not allow unbounded delay between data being received from the network and being visible to the user.
(In reply to comment #2)
> Creating and reflowing text frames can be very expensive,

Does this happen when a text node is inserted and notified on? Or does it only happen the the document update batch ends? So if a text node is inserted, notified, extended with more text and notified within one doc update, will the text frames be created/reflowed once or twice?
Also, is there a timer API that can fire to a non-main thread?
(In reply to comment #5)
> Does this happen when a text node is inserted and notified on? Or does it only
> happen the the document update batch ends?

Frame construction happens on textnode insertion.

Reflow normally happens asynchronously.

> So if a text node is inserted,
> notified, extended with more text and notified within one doc update, will the
> text frames be created/reflowed once or twice?

Created once and reflowed not at all, I think.
(In reply to comment #6)
> Also, is there a timer API that can fire to a non-main thread?

I don't know.
(In reply to comment #7)
> (In reply to comment #5)
> > So if a text node is inserted,
> > notified, extended with more text and notified within one doc update, will the
> > text frames be created/reflowed once or twice?
> 
> Created once and reflowed not at all, I think.

So from layout perf point of view it would be safe for the off-the-main-thread parser to introduce a text node discontinuity at Necko buffer boundaries and leave it to the tree op executor to decide whether to coalesce the discontinuity within one doc update?

Still, it might be smarter to make the timer control the discontinuity creation off-the-main-thread than making it merely control on-the-main-thread flushing. However, if a timer decides that the parser thread hasn't flushed parser 1 recently enough, it won't be able to flush parser 1 while the same parser thread is working for parser 2.

On the third hand, if the expensive part is reflow so that adding text to text nodes isn't too expensive if it happens before the next reflow has happenend and reflow has its own timer, shouldn't the parser post a flush request on the main thread at the end of each Necko buffer hoping that if another Necko buffer comes in right away and ends up adding text to the last text node the reflow timer hasn't fired yet? (Doing it this way would be the simplest solution from the parser POV.)
(In reply to comment #9)
> So from layout perf point of view it would be safe for the off-the-main-thread
> parser to introduce a text node discontinuity at Necko buffer boundaries and
> leave it to the tree op executor to decide whether to coalesce the
> discontinuity within one doc update?

Yes.

> On the third hand, if the expensive part is reflow so that adding text to text
> nodes isn't too expensive if it happens before the next reflow has happenend
> and reflow has its own timer, shouldn't the parser post a flush request on the
> main thread at the end of each Necko buffer hoping that if another Necko buffer
> comes in right away and ends up adding text to the last text node the reflow
> timer hasn't fired yet? (Doing it this way would be the simplest solution from
> the parser POV.)

That sounds fine. Reflow doesn't happen on a timer, but it will (compositor phase 2).
(In reply to comment #10)
> > On the third hand, if the expensive part is reflow so that adding text to text
> > nodes isn't too expensive if it happens before the next reflow has happenend
> > and reflow has its own timer, shouldn't the parser post a flush request on the
> > main thread at the end of each Necko buffer hoping that if another Necko buffer
> > comes in right away and ends up adding text to the last text node the reflow
> > timer hasn't fired yet? (Doing it this way would be the simplest solution from
> > the parser POV.)
> 
> That sounds fine. Reflow doesn't happen on a timer, but it will (compositor
> phase 2).

OK. I'll implement it this way.

While I'm at it, I'm removing the need to use text nodes as text holders in the tree op queue.

It seems to me that if I ask the text node to take care of the notification, it'll fire mutatation events:
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsGenericDOMDataNode.cpp#356

Yet, the existing sinks as the node to take care of its notifications:
http://mxr.mozilla.org/mozilla-central/source/content/xml/document/src/nsXMLContentSink.cpp#864

Bug in the existing sinks?
s/as/asks/
Attached patch WIP (obsolete) — Splinter Review
Filed bug 528937 about the useless use of nsresult around this part of the code.
Attached patch Better WIP (obsolete) — Splinter Review
Attachment #412596 - Attachment is obsolete: true
Attached patch Potential fix (obsolete) — Splinter Review
I'll dogfood this before requesting review. Also, this needs a .sjs-based test case.
Attachment #412795 - Attachment is obsolete: true
(In reply to comment #11)
> Bug in the existing sinks?

Filed bug 529276.
I know you haven't asked for review yet, but I just happened to notice a crash that suggested this easy change:

diff --git a/parser/html/nsHtml5TreeOperation.cpp b/parser/html/nsHtml5TreeOperation.cpp
--- a/parser/html/nsHtml5TreeOperation.cpp
+++ b/parser/html/nsHtml5TreeOperation.cpp
@@ -130,9 +130,11 @@ nsHtml5TreeOperation::AppendText(PRUnich
                                  nsHtml5TreeOpExecutor* aBuilder)
 {
   nsresult rv = NS_OK;
   nsIContent* previousTextNode = aBuilder->GetPreviousTextNode();
-  if (aParent->GetLastChild() == previousTextNode) {
+  if (previousTextNode &&
+      previousTextNode == aParent->GetLastChild())
+  {
     nsHtml5OtherDocUpdate update(aParent->GetOwnerDoc(),
                                  aBuilder->GetDocument());
     // XXX notifications!
     rv = previousTextNode->AppendText(aBuffer, aLength, PR_FALSE);

I haven't looked into it deeply, but I imagine GetLastChild can return null when the node has no children, and previousTextNode can return null when there haven't been any previous text nodes, so the mere equality of previousTextNode and aParent->GetLastChild() does not imply the existence of previousTextNode.
(In reply to comment #18)
> I know you haven't asked for review yet, but I just happened to notice a crash

... and, I just happened to notice that you've already fixed it!

(I blame the new Bugzilla for auto-hiding obsolete patches and thus leading me to believe there had only been one version of this patch so far.  Hrmph!)
Blocks: 511405
(In reply to comment #19)
> (In reply to comment #18)
> > I know you haven't asked for review yet, but I just happened to notice a crash
> 
> ... and, I just happened to notice that you've already fixed it!

Thanks. Per discussion on public-html, I'm going to remove the previousTextNode parser-insertedness check.

Another finding: Getting rid of the timer looks bad for throughput. I think I'll add a timer that posts runnables to the parser thread that flush (instead of flushing at the end of Necko buffer).

Also, the parser thread shouldn't flush if the current element on the stack is a foster-parenting element. (Since flushing in that case without other bookkeeping would make the tree shape dependent on flush points.)
Priority: -- → P2
Blocks: 482925
Depends on: 530535
Attached patch Fix using a timer after all (obsolete) — Splinter Review
Notes:
* The values of the timer prefs are based on guesswork and a pre-existing comment in nsContentSink saying that the flush interval shouldn't be more frequent than 1/10 second to avoid perf issues. Tuning these values is bug 482925.

* The MaybeFlush setup in nsAHtml5TreeOpSink was a bad premature "optimization", so this patch removes it.

* nsHtml5StreamParser::InitializeStatics() uses a capital 'I', since it is hand-written C++ and not autogenerated from Java.

* The relocated flush timer isn't traversed in cycle collection, because the CC wouldn't walk onwards from the timer anyway.

* Text is never flushed if there's an infinite stream of text after a foster-parenting start tag like <table>. Let's change that only if a top site really uses text-only comet responses with that kind of syntax error.
Attachment #412839 - Attachment is obsolete: true
Attachment #414216 - Flags: review?(bnewman)
Comment on attachment 414216 [details] [diff] [review]
Fix using a timer after all

Sigh. I'm seeing mysterious breakage on Wikipedia with this patch.
Attachment #414216 - Flags: review?(bnewman)
Comment on attachment 414216 [details] [diff] [review]
Fix using a timer after all

Sorry for the spam. There was other bogus stuff in my queue.
Attachment #414216 - Flags: review?(bnewman)
Attachment #414216 - Attachment is obsolete: true
Attachment #414459 - Flags: review?(bnewman)
Attachment #414216 - Flags: review?(bnewman)
Summary: [HTML5] Should flush occasionally when loading pure text → [HTML5][Patch] Should flush occasionally when loading pure text
Comment on attachment 414459 [details] [diff] [review]
Check array index to avoid crashing if timer fires before the root element has been pushed

Looks good to me.

(In reply to comment #21)
> * Text is never flushed if there's an infinite stream of text after a
> foster-parenting start tag like <table>. Let's change that only if a top site
> really uses text-only comet responses with that kind of syntax error.

Yeah, it's hard to get worried about that.
Attachment #414459 - Flags: review?(bnewman) → review+
Thanks for the r.
http://hg.mozilla.org/mozilla-central/rev/aa7ed71f36b7
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: