The default bug view has changed. See this FAQ.

[HTML5][Patch] Should flush occasionally when loading pure text

RESOLVED FIXED

Status

()

Core
HTML: Parser
P2
normal
RESOLVED FIXED
8 years ago
7 years ago

People

(Reporter: roc, Assigned: hsivonen)

Tracking

(Blocks: 1 bug)

Trunk
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment, 4 obsolete attachments)

The testcase is a million-line file containing <pre> text. The HTML5 parser builds one giant text node, which is fine, but we don't seem to ever flush partial contents of that text node during the load --- nsTextFrame::CharacterDataChanged is never called, and you don't ever see partial results.
(Assignee)

Updated

8 years ago
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
(Assignee)

Comment 1

8 years ago
In the creation of a text node relatively expensive? Does it make sense to create a text node to hold text when the parser doesn't know if the node will get inserted or have its content extracted and appended to a pre-existing text node?
Just creating a text node is not very expensive. Creating and reflowing text frames can be very expensive, but we're willing to eat that cost to get incremental display, in general.

We probably should use some kind of timer here. If we read text and then the network load stalls, the user should be able to read that text within some bounded (short) amount of time.
Looking at it another way, if the browser is idle and we have received content from the network that should be visible, but the user can't read it, that is a bug.
Hmm, I guess comment #3 and the second paragraph of comment #2 aren't exactly about this bug.

Even if the network load is never idle, we should still not allow unbounded delay between data being received from the network and being visible to the user.
(Assignee)

Comment 5

8 years ago
(In reply to comment #2)
> Creating and reflowing text frames can be very expensive,

Does this happen when a text node is inserted and notified on? Or does it only happen the the document update batch ends? So if a text node is inserted, notified, extended with more text and notified within one doc update, will the text frames be created/reflowed once or twice?
(Assignee)

Comment 6

8 years ago
Also, is there a timer API that can fire to a non-main thread?
(In reply to comment #5)
> Does this happen when a text node is inserted and notified on? Or does it only
> happen the the document update batch ends?

Frame construction happens on textnode insertion.

Reflow normally happens asynchronously.

> So if a text node is inserted,
> notified, extended with more text and notified within one doc update, will the
> text frames be created/reflowed once or twice?

Created once and reflowed not at all, I think.
(In reply to comment #6)
> Also, is there a timer API that can fire to a non-main thread?

I don't know.
(Assignee)

Comment 9

8 years ago
(In reply to comment #7)
> (In reply to comment #5)
> > So if a text node is inserted,
> > notified, extended with more text and notified within one doc update, will the
> > text frames be created/reflowed once or twice?
> 
> Created once and reflowed not at all, I think.

So from layout perf point of view it would be safe for the off-the-main-thread parser to introduce a text node discontinuity at Necko buffer boundaries and leave it to the tree op executor to decide whether to coalesce the discontinuity within one doc update?

Still, it might be smarter to make the timer control the discontinuity creation off-the-main-thread than making it merely control on-the-main-thread flushing. However, if a timer decides that the parser thread hasn't flushed parser 1 recently enough, it won't be able to flush parser 1 while the same parser thread is working for parser 2.

On the third hand, if the expensive part is reflow so that adding text to text nodes isn't too expensive if it happens before the next reflow has happenend and reflow has its own timer, shouldn't the parser post a flush request on the main thread at the end of each Necko buffer hoping that if another Necko buffer comes in right away and ends up adding text to the last text node the reflow timer hasn't fired yet? (Doing it this way would be the simplest solution from the parser POV.)
(In reply to comment #9)
> So from layout perf point of view it would be safe for the off-the-main-thread
> parser to introduce a text node discontinuity at Necko buffer boundaries and
> leave it to the tree op executor to decide whether to coalesce the
> discontinuity within one doc update?

Yes.

> On the third hand, if the expensive part is reflow so that adding text to text
> nodes isn't too expensive if it happens before the next reflow has happenend
> and reflow has its own timer, shouldn't the parser post a flush request on the
> main thread at the end of each Necko buffer hoping that if another Necko buffer
> comes in right away and ends up adding text to the last text node the reflow
> timer hasn't fired yet? (Doing it this way would be the simplest solution from
> the parser POV.)

That sounds fine. Reflow doesn't happen on a timer, but it will (compositor phase 2).
(Assignee)

Comment 11

8 years ago
(In reply to comment #10)
> > On the third hand, if the expensive part is reflow so that adding text to text
> > nodes isn't too expensive if it happens before the next reflow has happenend
> > and reflow has its own timer, shouldn't the parser post a flush request on the
> > main thread at the end of each Necko buffer hoping that if another Necko buffer
> > comes in right away and ends up adding text to the last text node the reflow
> > timer hasn't fired yet? (Doing it this way would be the simplest solution from
> > the parser POV.)
> 
> That sounds fine. Reflow doesn't happen on a timer, but it will (compositor
> phase 2).

OK. I'll implement it this way.

While I'm at it, I'm removing the need to use text nodes as text holders in the tree op queue.

It seems to me that if I ask the text node to take care of the notification, it'll fire mutatation events:
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsGenericDOMDataNode.cpp#356

Yet, the existing sinks as the node to take care of its notifications:
http://mxr.mozilla.org/mozilla-central/source/content/xml/document/src/nsXMLContentSink.cpp#864

Bug in the existing sinks?
(Assignee)

Comment 12

8 years ago
s/as/asks/
(Assignee)

Comment 13

8 years ago
Created attachment 412596 [details] [diff] [review]
WIP
(Assignee)

Comment 14

8 years ago
Filed bug 528937 about the useless use of nsresult around this part of the code.
(Assignee)

Comment 15

8 years ago
Created attachment 412795 [details] [diff] [review]
Better WIP
Attachment #412596 - Attachment is obsolete: true
(Assignee)

Comment 16

8 years ago
Created attachment 412839 [details] [diff] [review]
Potential fix

I'll dogfood this before requesting review. Also, this needs a .sjs-based test case.
Attachment #412795 - Attachment is obsolete: true
(Assignee)

Comment 17

8 years ago
(In reply to comment #11)
> Bug in the existing sinks?

Filed bug 529276.
I know you haven't asked for review yet, but I just happened to notice a crash that suggested this easy change:

diff --git a/parser/html/nsHtml5TreeOperation.cpp b/parser/html/nsHtml5TreeOperation.cpp
--- a/parser/html/nsHtml5TreeOperation.cpp
+++ b/parser/html/nsHtml5TreeOperation.cpp
@@ -130,9 +130,11 @@ nsHtml5TreeOperation::AppendText(PRUnich
                                  nsHtml5TreeOpExecutor* aBuilder)
 {
   nsresult rv = NS_OK;
   nsIContent* previousTextNode = aBuilder->GetPreviousTextNode();
-  if (aParent->GetLastChild() == previousTextNode) {
+  if (previousTextNode &&
+      previousTextNode == aParent->GetLastChild())
+  {
     nsHtml5OtherDocUpdate update(aParent->GetOwnerDoc(),
                                  aBuilder->GetDocument());
     // XXX notifications!
     rv = previousTextNode->AppendText(aBuffer, aLength, PR_FALSE);

I haven't looked into it deeply, but I imagine GetLastChild can return null when the node has no children, and previousTextNode can return null when there haven't been any previous text nodes, so the mere equality of previousTextNode and aParent->GetLastChild() does not imply the existence of previousTextNode.
(In reply to comment #18)
> I know you haven't asked for review yet, but I just happened to notice a crash

... and, I just happened to notice that you've already fixed it!

(I blame the new Bugzilla for auto-hiding obsolete patches and thus leading me to believe there had only been one version of this patch so far.  Hrmph!)
(Assignee)

Updated

8 years ago
Blocks: 511405
(Assignee)

Comment 20

7 years ago
(In reply to comment #19)
> (In reply to comment #18)
> > I know you haven't asked for review yet, but I just happened to notice a crash
> 
> ... and, I just happened to notice that you've already fixed it!

Thanks. Per discussion on public-html, I'm going to remove the previousTextNode parser-insertedness check.

Another finding: Getting rid of the timer looks bad for throughput. I think I'll add a timer that posts runnables to the parser thread that flush (instead of flushing at the end of Necko buffer).

Also, the parser thread shouldn't flush if the current element on the stack is a foster-parenting element. (Since flushing in that case without other bookkeeping would make the tree shape dependent on flush points.)
(Assignee)

Updated

7 years ago
Priority: -- → P2
(Assignee)

Updated

7 years ago
Blocks: 482925
(Assignee)

Updated

7 years ago
Depends on: 530535
(Assignee)

Comment 21

7 years ago
Created attachment 414216 [details] [diff] [review]
Fix using a timer after all

Notes:
* The values of the timer prefs are based on guesswork and a pre-existing comment in nsContentSink saying that the flush interval shouldn't be more frequent than 1/10 second to avoid perf issues. Tuning these values is bug 482925.

* The MaybeFlush setup in nsAHtml5TreeOpSink was a bad premature "optimization", so this patch removes it.

* nsHtml5StreamParser::InitializeStatics() uses a capital 'I', since it is hand-written C++ and not autogenerated from Java.

* The relocated flush timer isn't traversed in cycle collection, because the CC wouldn't walk onwards from the timer anyway.

* Text is never flushed if there's an infinite stream of text after a foster-parenting start tag like <table>. Let's change that only if a top site really uses text-only comet responses with that kind of syntax error.
Attachment #412839 - Attachment is obsolete: true
Attachment #414216 - Flags: review?(bnewman)
(Assignee)

Comment 22

7 years ago
Comment on attachment 414216 [details] [diff] [review]
Fix using a timer after all

Sigh. I'm seeing mysterious breakage on Wikipedia with this patch.
Attachment #414216 - Flags: review?(bnewman)
(Assignee)

Comment 23

7 years ago
Comment on attachment 414216 [details] [diff] [review]
Fix using a timer after all

Sorry for the spam. There was other bogus stuff in my queue.
Attachment #414216 - Flags: review?(bnewman)
(Assignee)

Comment 24

7 years ago
Created attachment 414459 [details] [diff] [review]
Check array index to avoid crashing if timer fires before the root element has been pushed
Attachment #414216 - Attachment is obsolete: true
Attachment #414459 - Flags: review?(bnewman)
Attachment #414216 - Flags: review?(bnewman)
(Assignee)

Updated

7 years ago
Summary: [HTML5] Should flush occasionally when loading pure text → [HTML5][Patch] Should flush occasionally when loading pure text
Comment on attachment 414459 [details] [diff] [review]
Check array index to avoid crashing if timer fires before the root element has been pushed

Looks good to me.

(In reply to comment #21)
> * Text is never flushed if there's an infinite stream of text after a
> foster-parenting start tag like <table>. Let's change that only if a top site
> really uses text-only comet responses with that kind of syntax error.

Yeah, it's hard to get worried about that.
Attachment #414459 - Flags: review?(bnewman) → review+
(Assignee)

Comment 26

7 years ago
Thanks for the r.
http://hg.mozilla.org/mozilla-central/rev/aa7ed71f36b7
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.