Closed Bug 502568 Opened 15 years ago Closed 15 years ago

[HTML5][Patch] Should flush occasionally when loading pure text

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: roc, Assigned: hsivonen)

References

(Blocks 1 open bug,
URL
)

Details

Attachments

(1 file, 4 obsolete files)

WIP 15 years ago Henri Sivonen (:hsivonen) 36.69 KB, patch		Details \| Diff \| Splinter Review
Better WIP 15 years ago Henri Sivonen (:hsivonen) 36.75 KB, patch		Details \| Diff \| Splinter Review
Potential fix 15 years ago Henri Sivonen (:hsivonen) 40.12 KB, patch		Details \| Diff \| Splinter Review
Fix using a timer after all 15 years ago Henri Sivonen (:hsivonen) 55.36 KB, patch		Details \| Diff \| Splinter Review
Check array index to avoid crashing if timer fires before the root element has been pushed 15 years ago Henri Sivonen (:hsivonen) 55.40 KB, patch	mozilla+ben : review+	Details \| Diff \| Splinter Review

Robert O'Callahan (:roc) (email my personal email if necessary)

Reporter

Description

•

15 years ago

The testcase is a million-line file containing <pre> text. The HTML5 parser builds one giant text node, which is fine, but we don't seem to ever flush partial contents of that text node during the load --- nsTextFrame::CharacterDataChanged is never called, and you don't ever see partial results.

Henri Sivonen (:hsivonen)

Assignee

Updated

•

15 years ago

Assignee: nobody → hsivonen

Status: NEW → ASSIGNED

Henri Sivonen (:hsivonen)

Assignee

Comment 1

•

15 years ago

In the creation of a text node relatively expensive? Does it make sense to create a text node to hold text when the parser doesn't know if the node will get inserted or have its content extracted and appended to a pre-existing text node?

Robert O'Callahan (:roc) (email my personal email if necessary)

Reporter

Comment 2

•

15 years ago

Just creating a text node is not very expensive. Creating and reflowing text frames can be very expensive, but we're willing to eat that cost to get incremental display, in general.

We probably should use some kind of timer here. If we read text and then the network load stalls, the user should be able to read that text within some bounded (short) amount of time.

Robert O'Callahan (:roc) (email my personal email if necessary)

Reporter

Comment 3

•

15 years ago

Looking at it another way, if the browser is idle and we have received content from the network that should be visible, but the user can't read it, that is a bug.

Robert O'Callahan (:roc) (email my personal email if necessary)

Reporter

Comment 4

•

15 years ago

Hmm, I guess comment #3 and the second paragraph of comment #2 aren't exactly about this bug.

Even if the network load is never idle, we should still not allow unbounded delay between data being received from the network and being visible to the user.

Henri Sivonen (:hsivonen)

Assignee

Comment 5

•

15 years ago

(In reply to comment #2)
> Creating and reflowing text frames can be very expensive,

Does this happen when a text node is inserted and notified on? Or does it only happen the the document update batch ends? So if a text node is inserted, notified, extended with more text and notified within one doc update, will the text frames be created/reflowed once or twice?

Henri Sivonen (:hsivonen)

Assignee

Comment 6

•

15 years ago

Also, is there a timer API that can fire to a non-main thread?

Robert O'Callahan (:roc) (email my personal email if necessary)

Reporter

Comment 7

•

15 years ago

(In reply to comment #5)
> Does this happen when a text node is inserted and notified on? Or does it only
> happen the the document update batch ends?

Frame construction happens on textnode insertion.

Reflow normally happens asynchronously.

> So if a text node is inserted,
> notified, extended with more text and notified within one doc update, will the
> text frames be created/reflowed once or twice?

Created once and reflowed not at all, I think.

Robert O'Callahan (:roc) (email my personal email if necessary)

Reporter

Comment 8

•

15 years ago

(In reply to comment #6)
> Also, is there a timer API that can fire to a non-main thread?

I don't know.

Henri Sivonen (:hsivonen)

Assignee

Comment 9

•

15 years ago

(In reply to comment #7)
> (In reply to comment #5)
> > So if a text node is inserted,
> > notified, extended with more text and notified within one doc update, will the
> > text frames be created/reflowed once or twice?
> 
> Created once and reflowed not at all, I think.

So from layout perf point of view it would be safe for the off-the-main-thread parser to introduce a text node discontinuity at Necko buffer boundaries and leave it to the tree op executor to decide whether to coalesce the discontinuity within one doc update?

Still, it might be smarter to make the timer control the discontinuity creation off-the-main-thread than making it merely control on-the-main-thread flushing. However, if a timer decides that the parser thread hasn't flushed parser 1 recently enough, it won't be able to flush parser 1 while the same parser thread is working for parser 2.

On the third hand, if the expensive part is reflow so that adding text to text nodes isn't too expensive if it happens before the next reflow has happenend and reflow has its own timer, shouldn't the parser post a flush request on the main thread at the end of each Necko buffer hoping that if another Necko buffer comes in right away and ends up adding text to the last text node the reflow timer hasn't fired yet? (Doing it this way would be the simplest solution from the parser POV.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Reporter

Comment 10

•

15 years ago

(In reply to comment #9)
> So from layout perf point of view it would be safe for the off-the-main-thread
> parser to introduce a text node discontinuity at Necko buffer boundaries and
> leave it to the tree op executor to decide whether to coalesce the
> discontinuity within one doc update?

Yes.

> On the third hand, if the expensive part is reflow so that adding text to text
> nodes isn't too expensive if it happens before the next reflow has happenend
> and reflow has its own timer, shouldn't the parser post a flush request on the
> main thread at the end of each Necko buffer hoping that if another Necko buffer
> comes in right away and ends up adding text to the last text node the reflow
> timer hasn't fired yet? (Doing it this way would be the simplest solution from
> the parser POV.)

That sounds fine. Reflow doesn't happen on a timer, but it will (compositor phase 2).

Henri Sivonen (:hsivonen)

Assignee

Comment 11

•

15 years ago

(In reply to comment #10)
> > On the third hand, if the expensive part is reflow so that adding text to text
> > nodes isn't too expensive if it happens before the next reflow has happenend
> > and reflow has its own timer, shouldn't the parser post a flush request on the
> > main thread at the end of each Necko buffer hoping that if another Necko buffer
> > comes in right away and ends up adding text to the last text node the reflow
> > timer hasn't fired yet? (Doing it this way would be the simplest solution from
> > the parser POV.)
> 
> That sounds fine. Reflow doesn't happen on a timer, but it will (compositor
> phase 2).

OK. I'll implement it this way.

While I'm at it, I'm removing the need to use text nodes as text holders in the tree op queue.

It seems to me that if I ask the text node to take care of the notification, it'll fire mutatation events:
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsGenericDOMDataNode.cpp#356

Yet, the existing sinks as the node to take care of its notifications:
http://mxr.mozilla.org/mozilla-central/source/content/xml/document/src/nsXMLContentSink.cpp#864

Bug in the existing sinks?

Henri Sivonen (:hsivonen)

Assignee

Comment 12

•

15 years ago

s/as/asks/

Henri Sivonen (:hsivonen)

Assignee

Comment 13

•

15 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

Henri Sivonen (:hsivonen)

Assignee

Comment 14

•

15 years ago

Filed bug 528937 about the useless use of nsresult around this part of the code.

Henri Sivonen (:hsivonen)

Assignee

Comment 15

•

15 years ago

Attached patch Better WIP (obsolete) — Details — Splinter Review

Attachment #412596 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Comment 16

•

15 years ago

Attached patch Potential fix (obsolete) — Details — Splinter Review

I'll dogfood this before requesting review. Also, this needs a .sjs-based test case.

Attachment #412795 - Attachment is obsolete: true

Henri Sivonen (:hsivonen)

Assignee

Comment 17

•

15 years ago

(In reply to comment #11)
> Bug in the existing sinks?

Filed bug 529276.

Ben Newman (:bnewman) (:benjamn)

Comment 18

•

15 years ago

I know you haven't asked for review yet, but I just happened to notice a crash that suggested this easy change:

diff --git a/parser/html/nsHtml5TreeOperation.cpp b/parser/html/nsHtml5TreeOperation.cpp
--- a/parser/html/nsHtml5TreeOperation.cpp
+++ b/parser/html/nsHtml5TreeOperation.cpp
@@ -130,9 +130,11 @@ nsHtml5TreeOperation::AppendText(PRUnich
                                  nsHtml5TreeOpExecutor* aBuilder)
 {
   nsresult rv = NS_OK;
   nsIContent* previousTextNode = aBuilder->GetPreviousTextNode();
-  if (aParent->GetLastChild() == previousTextNode) {
+  if (previousTextNode &&
+      previousTextNode == aParent->GetLastChild())
+  {
     nsHtml5OtherDocUpdate update(aParent->GetOwnerDoc(),
                                  aBuilder->GetDocument());
     // XXX notifications!
     rv = previousTextNode->AppendText(aBuffer, aLength, PR_FALSE);

I haven't looked into it deeply, but I imagine GetLastChild can return null when the node has no children, and previousTextNode can return null when there haven't been any previous text nodes, so the mere equality of previousTextNode and aParent->GetLastChild() does not imply the existence of previousTextNode.

Ben Newman (:bnewman) (:benjamn)

Comment 19

•

15 years ago

(In reply to comment #18)
> I know you haven't asked for review yet, but I just happened to notice a crash

... and, I just happened to notice that you've already fixed it!

(I blame the new Bugzilla for auto-hiding obsolete patches and thus leading me to believe there had only been one version of this patch so far.  Hrmph!)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

15 years ago

Blocks: 511405

Henri Sivonen (:hsivonen)

Assignee

Comment 20

•

15 years ago

(In reply to comment #19)
> (In reply to comment #18)
> > I know you haven't asked for review yet, but I just happened to notice a crash
> 
> ... and, I just happened to notice that you've already fixed it!

Thanks. Per discussion on public-html, I'm going to remove the previousTextNode parser-insertedness check.

Another finding: Getting rid of the timer looks bad for throughput. I think I'll add a timer that posts runnables to the parser thread that flush (instead of flushing at the end of Necko buffer).

Also, the parser thread shouldn't flush if the current element on the stack is a foster-parenting element. (Since flushing in that case without other bookkeeping would make the tree shape dependent on flush points.)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

15 years ago

Priority: -- → P2

Henri Sivonen (:hsivonen)

Assignee

Updated

•

15 years ago

Blocks: 482925

Henri Sivonen (:hsivonen)

Assignee

Updated

•

15 years ago

Depends on: 530535

Henri Sivonen (:hsivonen)

Assignee

Comment 21

•

15 years ago

Attached patch Fix using a timer after all (obsolete) — Details — Splinter Review

Notes:
* The values of the timer prefs are based on guesswork and a pre-existing comment in nsContentSink saying that the flush interval shouldn't be more frequent than 1/10 second to avoid perf issues. Tuning these values is bug 482925.

* The MaybeFlush setup in nsAHtml5TreeOpSink was a bad premature "optimization", so this patch removes it.

* nsHtml5StreamParser::InitializeStatics() uses a capital 'I', since it is hand-written C++ and not autogenerated from Java.

* The relocated flush timer isn't traversed in cycle collection, because the CC wouldn't walk onwards from the timer anyway.

* Text is never flushed if there's an infinite stream of text after a foster-parenting start tag like <table>. Let's change that only if a top site really uses text-only comet responses with that kind of syntax error.

Attachment #412839 - Attachment is obsolete: true

Attachment #414216 - Flags: review?(bnewman)

Henri Sivonen (:hsivonen)

Assignee

Comment 22

•

15 years ago

Comment on attachment 414216 [details] [diff] [review]
Fix using a timer after all

Sigh. I'm seeing mysterious breakage on Wikipedia with this patch.

Attachment #414216 - Flags: review?(bnewman)

Henri Sivonen (:hsivonen)

Assignee

Comment 23

•

15 years ago

Comment on attachment 414216 [details] [diff] [review]
Fix using a timer after all

Sorry for the spam. There was other bogus stuff in my queue.

Attachment #414216 - Flags: review?(bnewman)

Henri Sivonen (:hsivonen)

Assignee

Comment 24

•

15 years ago

Attached patch Check array index to avoid crashing if timer fires before the root element has been pushed — Details — Splinter Review

Attachment #414216 - Attachment is obsolete: true

Attachment #414459 - Flags: review?(bnewman)

Attachment #414216 - Flags: review?(bnewman)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

15 years ago

Summary: [HTML5] Should flush occasionally when loading pure text → [HTML5][Patch] Should flush occasionally when loading pure text

Ben Newman (:bnewman) (:benjamn)

Comment 25

•

15 years ago

Comment on attachment 414459 [details] [diff] [review]
Check array index to avoid crashing if timer fires before the root element has been pushed

Looks good to me.

(In reply to comment #21)
> * Text is never flushed if there's an infinite stream of text after a
> foster-parenting start tag like <table>. Let's change that only if a top site
> really uses text-only comet responses with that kind of syntax error.

Yeah, it's hard to get worried about that.

Attachment #414459 - Flags: review?(bnewman) → review+

Henri Sivonen (:hsivonen)

Assignee

Comment 26

•

15 years ago

Thanks for the r.
http://hg.mozilla.org/mozilla-central/rev/aa7ed71f36b7

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.