Closed Bug 110544 Opened 21 years ago Closed 21 years ago

Setting comment node text is slow


(Core :: DOM: Core & HTML, defect, P3)






(Reporter: hjtoi-bugzilla, Assigned: hjtoi-bugzilla)



(Keywords: dataloss, perf)


(6 files, 9 obsolete files)

739 bytes, text/html
2.88 KB, text/html
14.51 KB, text/html
17.07 KB, patch
: review+
Details | Diff | Splinter Review
599 bytes, patch
: review+
Details | Diff | Splinter Review
2.78 KB, patch
: review+
Details | Diff | Splinter Review
This is probably a minor issue, but I fixed it anyway... In nsCommentNode.cpp
SetText() and StripCommentDelimiters() functions are quite inefficient. They do
unnecessary string copies (and possible allocations) and moves. I rewrote these,
and according to Quantify the new version of SetText(PRUnichar*) is about 8
times faster than the old, while the versions that take char* or nsString are
over 20 times faster. 

It seems that only the PRUnichar* version is called in the browser, though. 

We call this place hundreds of times during startup, and for every comment node
in a page. In XML, there won't be comment delimiters coming into SetText. HTML
is different.

Attaching patch. Reviews?
Keywords: perf
Priority: -- → P3
Whiteboard: [fixinhand]
Target Milestone: --- → mozilla0.9.7
This should be done with Substring() stuff, that way we could do this w/o ever
copying string data or allocating string data when stripping out comment delimiters.

Create two StripCommentDelimiters() functions, one that return |const
nsDependentSubstring|, and one that return |const nsDependentCSubsting|, then
you should be able to simply do mInner.SetText(StripCommentDelim(aString)); w/o
doing any string copying or allocation :-)

Oh, to make this work you probably haveto add a
nsGenericDOMDataNode::SetText(const nsAReadableCString&) method, and probably a
nsTextFragment::operator=(const nsAReadableCString&) method, but those should be
more or less one-liners.
Attached patch Better patch with substrings (obsolete) — Splinter Review
This patch adds a few more checks to make things safer, and also uses
Substring. I don't know about performance yet.

One thing I am not so sure about: "aStr.GetFlatBufferHandle()->DataStart()". If
I replace aStr with an ns[Auto]String that I created I will get a crash in my
simple testcase (haven't run into this in the browser suite). Is there another
way of doing this or do I need a 3rd StripCommentDelimiters() function that
takes nsAReadableString?
You can't use aStr.GetFlatBufferHandle() since all strings that come through
this code are not guaranteed to be flat, in stead of relying on that make
StripCommentDelimiters() take a const nsAString& in stead of a flat buffer and
work with iterators in the implementation, then the code is guaranteed to work
with all types of strings. No need for a 3rd method, make the PRUnichar* callers
wrap the PRUnichar* in a nsDependentString.
As the testcase shows, Mozilla fixes comment delimiters always. IE6 fixes them
only when parsing. The DOM specification does not say that
document.createComment() must fix delimiters, but the documentation for
interface Comment says a comment represents the string between the comment
delimiters. Therefore I am slightly more inclined to say that Mozilla has the
correct implementation, and that we should fix the comment in the comment node.

"Interface Comment

    This interface inherits from CharacterData and represents the content of a
comment, i.e., all the characters between the starting '<!--' and ending '-->'.
Note that this is the definition of a comment in XML, and, in practice, HTML,
although some HTML tools may implement the full SGML comment structure."
No, mozilla is incorrect per the DOM spec, just as comment nodes expose the text
between the delimiters, setting the text sets the text between the delimiters so
if there are delimiters in the text we'll end up with an invalid
(unserializable) comment. This is, believe it or not, how it should work.
Attached file another test (obsolete) —
Ok, this patch removes the stripping of commeent delimiters in the comment
node. Harish, could you make a patch in the parser that would make parser strip
the comment delimiters there so we could test this?

I have a 3rd edition of the original patch otherwise ready except there are two
types of comments that get handled differently in the code done with iterators
and raw buffer. I don't feel like pursuing that at the moment if we are going
to do what IE6 is doing and according to jst is the correct thing to do.
Attachment #58171 - Attachment is obsolete: true
Attachment #58253 - Attachment is obsolete: true
As the "another testcase" shows this is also a dataloss bug (so it is in IE as
well, btw). Given <!12--> we show " 2" as comment, ie shows "2". We replace the
first char with space, IE always eats two characters after "<!".

I think my strict comment parsing is now ok, much easier to read, dunno about
speed (hey that rhymes). Normal comment parsing still needs a little love with
comments that have several sections of "-- ... --" as the test row 15. shows.
Keywords: dataloss
Attached patch Proposed fix to parser (obsolete) — Splinter Review
Attachment #58779 - Attachment is obsolete: true
Attachment #59629 - Attachment is obsolete: true
Attached patch full patch (obsolete) — Splinter Review
This latest patch obsoletes all previous patches. I have one problem that I am
aware of, which seems like a string/iterator bug which manifests itself as hang
when trying to open the testcase that I will attach shortly (Ian's evil comment
test without strict doctype).
Attachment #58800 - Attachment is obsolete: true
Attachment #59651 - Attachment is obsolete: true
Doh, seems like the patch got corrupted in HTMLContentSinkPart. Well, there is
only one change that goes together with the change in GenericDOMDataNode.
Specifically, I noticed that it doesn't seem to make sense to create a comment
node and then AppendData() to it - it should be SetData() in my opinion. Do the
different implementations cause some problems with events etc.? Another thing is
that if you AppendData("") we asserted with the old code. Would it be safe to
just bail out if the string to append is empty, or do we need to do reflows and
fire mutation notifications even though we did not change data?
Target Milestone: mozilla0.9.7 → mozilla0.9.8
harishd noted that in strict comment parsing we should allow comment end with
--\w*> where \w* is whitespace. I also got a workaround from jag for the hang,
so I should have a new patch tomorrow.
Comment on attachment 64864 [details] [diff] [review]
Full corrected patch with jag's string workaround

>Index: htmlparser/src/nsHTMLTokens.cpp
>+      //aString = Substring(beginData, ++current);
>+      // XXX Instead we can do this EVIL HACK (from jag):
>+      PRUint32 len = Distance(beginData, ++current);
>+      aString.SetLength(len);
>+      PRUnichar* dest = NS_CONST_CAST(PRUnichar*, aString.get());
>+      copy_string(beginData, gt, dest);

The 'gt' on the last line should of course be 'current'.

harishd, review?
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Comment on attachment 64864 [details] [diff] [review]
Full corrected patch with jag's string workaround

>+nsresult ConsumeComment(nsScanner& aScanner, nsString& aString) {
>+        if (*current == kMinus && current != beginLastMinus) { // ->
>+          goodComment = PR_TRUE;
>+          --current;
>+          if (*current == kMinus && current != beginLastMinus) { // -->
>+            --current;

harishd noticed that goodComment = PR_TRUE should probably be moved
into the second if to guard against cases like <!-- hello -> there -->.
jag: some bad news. Your workaround works in all other cases except with the
last testcase (attachment 59892 [details]).

I would really love to get this checked in. I just run the page loader test, and
this seems to give 2.5% improvement in page load time!
How does it fail?
Sorry, same problem: hangs doing Distance. Stack is

NTDLL! 77fa018c()
nsDebug::Assertion(const char * 0x1010e8bc `string', const char * 0x1010e8e8
`string', const char * 0x1012d470 `string', int 91) line 290 + 13 bytes
copy_string(nsReadingIterator<unsigned short> & {"????????????"}, const
nsReadingIterator<unsigned short> & {"-????????????"}, CalculateLength<unsigned
short> & {...}) line 91 + 28 bytes
Distance_Impl(const nsReadingIterator<unsigned short> & {" -????????????"},
const nsReadingIterator<unsigned short> & {"-????????????"}) line 91 + 17 bytes
Distance(const nsReadingIterator<unsigned short> & {" -????????????"}, const
nsReadingIterator<unsigned short> & {"-????????????"}) line 99 + 13 bytes
ConsumeComment(nsScanner & {...}, nsString & {""}) line 1136 + 19 bytes
The assertion is "|copy_string| will never terminate".
I was rereading the patch, am I wrong to say that:

+        if (*current == kMinus && current != beginLastMinus) { // ->
+          goodComment = PR_TRUE;
+          --current;
+          if (*current == kMinus && current != beginLastMinus) { // -->
+            --current;
+          }
+        } else if (*current == '!' && current != beginLastMinus) {
+          --current;
+          if (*current == kMinus && current != beginLastMinus) {
+            --current;
+            if (*current == kMinus && current != beginLastMinus) { // --!>
+              --current;
+              goodComment = PR_TRUE;
-          return result;
+        } else if (current == beginLastMinus) {
+          goodComment = PR_TRUE;
+        }

If I understood correctly, the last 'if' in this block is for the special case
of <!-->
This could be refactored as:

+        if (*current == kMinus) {   // ->  
+          goodComment = PR_TRUE;
           if (current != beginLastMinus) { // <!--...-> but not <!-->
+            --current;
+            if (*current == kMinus && current != beginLastMinus) { // -->
+              --current;
+            }
+          }
+        } else if (*current == '!' && current != beginLastMinus) {
+          --current;
+          if (*current == kMinus && current != beginLastMinus) {
+            --current;
+            if (*current == kMinus && current != beginLastMinus) { // --!>
+              --current;
+              goodComment = PR_TRUE;
-          return result;

I don't know is this helps...
jag: any progress on this, 2.5% speedup is pretty significant so I'd really like
to get this checked in...
Whiteboard: [fixinhand] → [fixinhand][2.5% improvement in page load]
Attached patch Fix view source (obsolete) — Splinter Review
View source was missing comment open/close.
Target Milestone: mozilla0.9.9 → mozilla1.0
*sigh* I keep forgetting to post comments in this bug.

So, here's what's going on: we have two iterators on a nsSlidingString with at
least two buffers internally. iter1 is on the first buffer, iter2 is on the
start of the second buffer. --iter2;. If we now test SameFragment(iter1, iter2),
this should say "yes" but it says "no".

Upon the --iter2, normalize_backward() is called, which will get the previous
buffer/fragment. This is the relevant part of

253 if ( result_buffer == mStart.mBuffer )
254   aFragment.mStart = mStart.mPosInBuffer;
255 else
256   aFragment.mStart = result_buffer->DataStart();

In there, since we're on the first buffer now, the aFragment.mStart is set to
mStart.mPosInBuffer. This means that if |mStart| points to just a few chars
before the end of the first buffer, the newly created fragment will start there,
instead of at the beginning of the buffer, so that the check
|iter1.fragment().mStart == iter2.fragment().mStart| fails.

In |Distance|, which calls |copy_string|, which calls

393 return PRUint32(SameFragment(first, last) ? last.get()-first.get() :

This failure of SameFragment() means that first.size_forward() will get N+1
instead of N (where N is last.get()-first.get()) so first is advanced beyond
last, which means the |while (first != last)| in copy_string will never end.

I think the solution is to hand back the same fragment regardless of |Position
mStart, mEnd|, but I'll have to study the code a little more to find out why
we're currently doing this.

Vidur, scc?
Target Milestone: mozilla1.0 → mozilla0.9.9
Whoops, I accidentily changed the milestone this morning, setting it back.
Target Milestone: mozilla0.9.9 → mozilla1.0
Attached patch Scanner patchSplinter Review
Some progress... Vidur found this little piece in the scanner where we perhaps
aren't doing what we should. With this patch all my tests finally pass.

However, with this patch Substring() still does not work. I hit the assertion:
###!!! ASSERTION: You can't |write| into an |nsWritingIterator| with no space!:

'size_forward() > 0', file ..\..\dist\include\string\nsStringIterator.h, line
and if I continue I find myself again with the infamous:

###!!! ASSERTION: |copy_string| will never terminate: 'count_copied > 0', file
.\..\dist\include\string\nsAlgorithm.h, line 91
|mStart.mPosInBuffer| for the sliding string should only change if there's been
a call to |nsSlidingString::DiscardPrefix| between the creation of the iterators
and their subsequent use. My patch ensures that all old iterators are reset
after we call |nsSlidingString::DiscardPrefix|.
Which Substring is that?
Sorry, I should have been more clear. If I comment out your hacks (that now work
with Vidur's patch) and use the correct lines a la 

aString = Substring(foo,bar)

I hit the assertions I mentioned and hang. The assertions happened on that line
but I realized it might have been also the operator = that did this.
Comment on attachment 72854 [details] [diff] [review]
Scanner patch

r=harishd [ it would be great if the sliding string 
itself took care of reintializing the iterator. 
Attachment #72854 - Flags: review+
Comment on attachment 69701 [details] [diff] [review]
Latest combined patch fixing harishd's findings and view source

Do we need to override all those SetText() methods any more? Can't we just
fallback on the nsGenericDOMDataNode ones now that we don't actually do
anything in the overriden methods?
Yup, jst is right, we don't seem to need SetText() explicitly in the comment
Comment on attachment 72854 [details] [diff] [review]
Scanner patch

Attachment #72854 - Flags: superreview+
Comment on attachment 73104 [details] [diff] [review]
no more SetText in comment node

Attachment #73104 - Flags: superreview+
Comment on attachment 73104 [details] [diff] [review]
no more SetText in comment node

Attachment #73104 - Flags: review+
Comment on attachment 69701 [details] [diff] [review]
Latest combined patch fixing harishd's findings and view source

>+  while (FindInReadable(NS_LITERAL_STRING("--"), current, currentEnd)) {

Make a named literal here for the "--" string to avoid creating that string
wrapper for every iteration through the while loop.

>+  // When we get here, we have always already consumed <!
>+  // Skip over possible leading minuses
>+  if (*current == kMinus && current != end) {

Maybe check |current != end| before dereferencing *current in case current
could in some case point to the end of the string.

>+    if (*current == kMinus && current != end) { // <!--

Same here, check |current != end| before touching *current

>Index: htmlparser/src/nsViewSourceHTML.cpp
>-        const nsAReadableString& commentValue = aToken->GetStringValue();
>-        result=WriteTag(mCommentTag,commentValue,0,PR_TRUE);
>+        nsAutoString theStr;
>+        theStr.Assign(NS_LITERAL_STRING("<!--"));
>+        theStr.Append(aToken->GetStringValue());
>+        theStr.Append(NS_LITERAL_STRING("-->"));
>+        result=WriteTag(mCommentTag,theStr,0,PR_TRUE);

Use the overloaded + string operator here to avoid doing 3 separate string

sr=jst with that looked at...
Attachment #69701 - Flags: superreview+
I have fixes for jst's nits in my tree. I was not aware it would be safe to use
operator+, but at least Windows seemed to accept it. Hopefully all the other
compilers accept it as well.
Yeah, operator+ on nsAString n' friends is ok XP, except maybe in some weird
edge cases (this not being one of those).
Comment on attachment 69701 [details] [diff] [review]
Latest combined patch fixing harishd's findings and view source

r=harishd ( ConsumeComment() would be reusable if it accepted iterators instead
of nsScanner. I'll open up a new bug on that )
Attachment #69701 - Flags: review+
Comment on attachment 73104 [details] [diff] [review]
no more SetText in comment node

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73104 - Flags: approval+
Checked in.
Closed: 21 years ago
Resolution: --- → FIXED
Whiteboard: [fixinhand][2.5% improvement in page load]
For some reason page load test numbers did not improve on the Tinderbox
machines. They did not seem to go down either, but I would really like to know
what is going on. I will have to check on Monday with variations of the patch to
see if my initial perf numbers were a fluke or if something significant changed
since running those tests.
I tried the same nsHTMLTokens.cpp changes that I used when getting the 2.5% perf
increase, but still got no measureable difference to current trunk. I did not
try changes in nsCommentNode or nsScanner. nsScanner change is probably the only
thing that could slow us down a bit but I doubt it would have been able to eat
2.5%. Nevertheless, it is the correct fix that we need and there is no better way.

The bottom line is, I am unhappy the perf did not improve but happy that we have
better code that is at least as fast as the old code.

Case closed.
heikki: can you please verify this bug for me 
I shouldn't verify my own fixes ;)

Anyway, you should try the attachments and any links to tests you see here and
verify that it works as you would expect.

I have received 2 regression bugs in strict mode because of this fix.
Technically we are doing the correct thing. I am seeking input on whether or not
they should be fixed in comment parsing. The bugs are bug 132238 and bug 132785.
Verifying this one with testcases attached to bug-report & I think its working 
as expected.
Verified on 2002-05-23-08-trunk.
Component: DOM: Core → DOM: Core & HTML
QA Contact: stummala → general
You need to log in before you can comment on or make changes to this bug.