Closed Bug 110544 Opened 23 years ago Closed 23 years ago

Setting comment node text is slow

Categories

(Core :: DOM: Core & HTML, defect, P3)

defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: hjtoi-bugzilla, Assigned: hjtoi-bugzilla)

References

Details

(Keywords: dataloss, perf)

Attachments

(6 files, 9 obsolete files)

739 bytes, text/html
Details
2.88 KB, text/html
Details
14.51 KB, text/html
Details
17.07 KB, patch
harishd
: review+
Details | Diff | Splinter Review
599 bytes, patch
harishd
: review+
Details | Diff | Splinter Review
2.78 KB, patch
harishd
: review+
Details | Diff | Splinter Review
This is probably a minor issue, but I fixed it anyway... In nsCommentNode.cpp SetText() and StripCommentDelimiters() functions are quite inefficient. They do unnecessary string copies (and possible allocations) and moves. I rewrote these, and according to Quantify the new version of SetText(PRUnichar*) is about 8 times faster than the old, while the versions that take char* or nsString are over 20 times faster. It seems that only the PRUnichar* version is called in the browser, though. We call this place hundreds of times during startup, and for every comment node in a page. In XML, there won't be comment delimiters coming into SetText. HTML is different. Attaching patch. Reviews?
Status: NEW → ASSIGNED
Keywords: perf
Priority: -- → P3
Whiteboard: [fixinhand]
Target Milestone: --- → mozilla0.9.7
This should be done with Substring() stuff, that way we could do this w/o ever copying string data or allocating string data when stripping out comment delimiters. Create two StripCommentDelimiters() functions, one that return |const nsDependentSubstring|, and one that return |const nsDependentCSubsting|, then you should be able to simply do mInner.SetText(StripCommentDelim(aString)); w/o doing any string copying or allocation :-) Oh, to make this work you probably haveto add a nsGenericDOMDataNode::SetText(const nsAReadableCString&) method, and probably a nsTextFragment::operator=(const nsAReadableCString&) method, but those should be more or less one-liners.
Attached patch Better patch with substrings (obsolete) — Splinter Review
This patch adds a few more checks to make things safer, and also uses Substring. I don't know about performance yet. One thing I am not so sure about: "aStr.GetFlatBufferHandle()->DataStart()". If I replace aStr with an ns[Auto]String that I created I will get a crash in my simple testcase (haven't run into this in the browser suite). Is there another way of doing this or do I need a 3rd StripCommentDelimiters() function that takes nsAReadableString?
You can't use aStr.GetFlatBufferHandle() since all strings that come through this code are not guaranteed to be flat, in stead of relying on that make StripCommentDelimiters() take a const nsAString& in stead of a flat buffer and work with iterators in the implementation, then the code is guaranteed to work with all types of strings. No need for a 3rd method, make the PRUnichar* callers wrap the PRUnichar* in a nsDependentString.
As the testcase shows, Mozilla fixes comment delimiters always. IE6 fixes them only when parsing. The DOM specification does not say that document.createComment() must fix delimiters, but the documentation for interface Comment says a comment represents the string between the comment delimiters. Therefore I am slightly more inclined to say that Mozilla has the correct implementation, and that we should fix the comment in the comment node. "Interface Comment This interface inherits from CharacterData and represents the content of a comment, i.e., all the characters between the starting '<!--' and ending '-->'. Note that this is the definition of a comment in XML, and, in practice, HTML, although some HTML tools may implement the full SGML comment structure."
No, mozilla is incorrect per the DOM spec, just as comment nodes expose the text between the delimiters, setting the text sets the text between the delimiters so if there are delimiters in the text we'll end up with an invalid (unserializable) comment. This is, believe it or not, how it should work.
Attached file another test (obsolete) —
Ok, this patch removes the stripping of commeent delimiters in the comment node. Harish, could you make a patch in the parser that would make parser strip the comment delimiters there so we could test this? I have a 3rd edition of the original patch otherwise ready except there are two types of comments that get handled differently in the code done with iterators and raw buffer. I don't feel like pursuing that at the moment if we are going to do what IE6 is doing and according to jst is the correct thing to do.
Attachment #58171 - Attachment is obsolete: true
Attachment #58253 - Attachment is obsolete: true
As the "another testcase" shows this is also a dataloss bug (so it is in IE as well, btw). Given <!12--> we show " 2" as comment, ie shows "2". We replace the first char with space, IE always eats two characters after "<!". I think my strict comment parsing is now ok, much easier to read, dunno about speed (hey that rhymes). Normal comment parsing still needs a little love with comments that have several sections of "-- ... --" as the test row 15. shows.
Keywords: dataloss
Attached patch Proposed fix to parser (obsolete) — Splinter Review
Attachment #58779 - Attachment is obsolete: true
Attachment #59629 - Attachment is obsolete: true
Attached patch full patch (obsolete) — Splinter Review
This latest patch obsoletes all previous patches. I have one problem that I am aware of, which seems like a string/iterator bug which manifests itself as hang when trying to open the testcase that I will attach shortly (Ian's evil comment test without strict doctype).
Attachment #58800 - Attachment is obsolete: true
Attachment #59651 - Attachment is obsolete: true
Doh, seems like the patch got corrupted in HTMLContentSinkPart. Well, there is only one change that goes together with the change in GenericDOMDataNode. Specifically, I noticed that it doesn't seem to make sense to create a comment node and then AppendData() to it - it should be SetData() in my opinion. Do the different implementations cause some problems with events etc.? Another thing is that if you AppendData("") we asserted with the old code. Would it be safe to just bail out if the string to append is empty, or do we need to do reflows and fire mutation notifications even though we did not change data?
Target Milestone: mozilla0.9.7 → mozilla0.9.8
harishd noted that in strict comment parsing we should allow comment end with --\w*> where \w* is whitespace. I also got a workaround from jag for the hang, so I should have a new patch tomorrow.
Comment on attachment 64864 [details] [diff] [review] Full corrected patch with jag's string workaround >Index: htmlparser/src/nsHTMLTokens.cpp >=================================================================== >+ //aString = Substring(beginData, ++current); >+ // XXX Instead we can do this EVIL HACK (from jag): >+ PRUint32 len = Distance(beginData, ++current); >+ aString.SetLength(len); >+ PRUnichar* dest = NS_CONST_CAST(PRUnichar*, aString.get()); >+ copy_string(beginData, gt, dest); The 'gt' on the last line should of course be 'current'. harishd, review?
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Comment on attachment 64864 [details] [diff] [review] Full corrected patch with jag's string workaround >+nsresult ConsumeComment(nsScanner& aScanner, nsString& aString) { >+ if (*current == kMinus && current != beginLastMinus) { // -> >+ goodComment = PR_TRUE; >+ --current; >+ if (*current == kMinus && current != beginLastMinus) { // --> >+ --current; harishd noticed that goodComment = PR_TRUE should probably be moved into the second if to guard against cases like <!-- hello -> there -->.
jag: some bad news. Your workaround works in all other cases except with the last testcase (attachment 59892 [details]). I would really love to get this checked in. I just run the page loader test, and this seems to give 2.5% improvement in page load time!
How does it fail?
Sorry, same problem: hangs doing Distance. Stack is NTDLL! 77fa018c() nsDebug::Assertion(const char * 0x1010e8bc `string', const char * 0x1010e8e8 `string', const char * 0x1012d470 `string', int 91) line 290 + 13 bytes copy_string(nsReadingIterator<unsigned short> & {"????????????"}, const nsReadingIterator<unsigned short> & {"-????????????"}, CalculateLength<unsigned short> & {...}) line 91 + 28 bytes Distance_Impl(const nsReadingIterator<unsigned short> & {" -????????????"}, const nsReadingIterator<unsigned short> & {"-????????????"}) line 91 + 17 bytes Distance(const nsReadingIterator<unsigned short> & {" -????????????"}, const nsReadingIterator<unsigned short> & {"-????????????"}) line 99 + 13 bytes ConsumeComment(nsScanner & {...}, nsString & {""}) line 1136 + 19 bytes ... The assertion is "|copy_string| will never terminate".
I was rereading the patch, am I wrong to say that: + if (*current == kMinus && current != beginLastMinus) { // -> + goodComment = PR_TRUE; + --current; + if (*current == kMinus && current != beginLastMinus) { // --> + --current; + } + } else if (*current == '!' && current != beginLastMinus) { + --current; + if (*current == kMinus && current != beginLastMinus) { + --current; + if (*current == kMinus && current != beginLastMinus) { // --!> + --current; + goodComment = PR_TRUE; } } - return result; + } else if (current == beginLastMinus) { + goodComment = PR_TRUE; + } If I understood correctly, the last 'if' in this block is for the special case of <!--> This could be refactored as: + if (*current == kMinus) { // -> + goodComment = PR_TRUE; if (current != beginLastMinus) { // <!--...-> but not <!--> + --current; + if (*current == kMinus && current != beginLastMinus) { // --> + --current; + } + } + } else if (*current == '!' && current != beginLastMinus) { + --current; + if (*current == kMinus && current != beginLastMinus) { + --current; + if (*current == kMinus && current != beginLastMinus) { // --!> + --current; + goodComment = PR_TRUE; } } - return result; I don't know is this helps...
jag: any progress on this, 2.5% speedup is pretty significant so I'd really like to get this checked in...
Whiteboard: [fixinhand] → [fixinhand][2.5% improvement in page load]
Attached patch Fix view source (obsolete) — Splinter Review
View source was missing comment open/close.
Target Milestone: mozilla0.9.9 → mozilla1.0
*sigh* I keep forgetting to post comments in this bug. So, here's what's going on: we have two iterators on a nsSlidingString with at least two buffers internally. iter1 is on the first buffer, iter2 is on the start of the second buffer. --iter2;. If we now test SameFragment(iter1, iter2), this should say "yes" but it says "no". Upon the --iter2, normalize_backward() is called, which will get the previous buffer/fragment. This is the relevant part of nsSlidingSubstring::GetReadableFragment: 253 if ( result_buffer == mStart.mBuffer ) 254 aFragment.mStart = mStart.mPosInBuffer; 255 else 256 aFragment.mStart = result_buffer->DataStart(); In there, since we're on the first buffer now, the aFragment.mStart is set to mStart.mPosInBuffer. This means that if |mStart| points to just a few chars before the end of the first buffer, the newly created fragment will start there, instead of at the beginning of the buffer, so that the check |iter1.fragment().mStart == iter2.fragment().mStart| fails. In |Distance|, which calls |copy_string|, which calls nsCharSourceTraits<>::readable_distance: 393 return PRUint32(SameFragment(first, last) ? last.get()-first.get() : first.size_forward()); This failure of SameFragment() means that first.size_forward() will get N+1 instead of N (where N is last.get()-first.get()) so first is advanced beyond last, which means the |while (first != last)| in copy_string will never end. I think the solution is to hand back the same fragment regardless of |Position mStart, mEnd|, but I'll have to study the code a little more to find out why we're currently doing this. Vidur, scc?
Target Milestone: mozilla1.0 → mozilla0.9.9
Whoops, I accidentily changed the milestone this morning, setting it back.
Target Milestone: mozilla0.9.9 → mozilla1.0
Attached patch Scanner patchSplinter Review
Some progress... Vidur found this little piece in the scanner where we perhaps aren't doing what we should. With this patch all my tests finally pass. However, with this patch Substring() still does not work. I hit the assertion: ###!!! ASSERTION: You can't |write| into an |nsWritingIterator| with no space!: 'size_forward() > 0', file ..\..\dist\include\string\nsStringIterator.h, line 356 and if I continue I find myself again with the infamous: ###!!! ASSERTION: |copy_string| will never terminate: 'count_copied > 0', file . .\..\dist\include\string\nsAlgorithm.h, line 91
|mStart.mPosInBuffer| for the sliding string should only change if there's been a call to |nsSlidingString::DiscardPrefix| between the creation of the iterators and their subsequent use. My patch ensures that all old iterators are reset after we call |nsSlidingString::DiscardPrefix|.
Which Substring is that?
Sorry, I should have been more clear. If I comment out your hacks (that now work with Vidur's patch) and use the correct lines a la aString = Substring(foo,bar) I hit the assertions I mentioned and hang. The assertions happened on that line but I realized it might have been also the operator = that did this.
Comment on attachment 72854 [details] [diff] [review] Scanner patch r=harishd [ it would be great if the sliding string itself took care of reintializing the iterator. ]
Attachment #72854 - Flags: review+
Comment on attachment 69701 [details] [diff] [review] Latest combined patch fixing harishd's findings and view source Do we need to override all those SetText() methods any more? Can't we just fallback on the nsGenericDOMDataNode ones now that we don't actually do anything in the overriden methods?
Yup, jst is right, we don't seem to need SetText() explicitly in the comment node.
Comment on attachment 72854 [details] [diff] [review] Scanner patch sr=jst
Attachment #72854 - Flags: superreview+
Comment on attachment 73104 [details] [diff] [review] no more SetText in comment node sr=jst
Attachment #73104 - Flags: superreview+
Comment on attachment 73104 [details] [diff] [review] no more SetText in comment node r=harishd
Attachment #73104 - Flags: review+
Comment on attachment 69701 [details] [diff] [review] Latest combined patch fixing harishd's findings and view source >+ while (FindInReadable(NS_LITERAL_STRING("--"), current, currentEnd)) { Make a named literal here for the "--" string to avoid creating that string wrapper for every iteration through the while loop. >+ // When we get here, we have always already consumed <! >+ // Skip over possible leading minuses >+ if (*current == kMinus && current != end) { Maybe check |current != end| before dereferencing *current in case current could in some case point to the end of the string. >+ if (*current == kMinus && current != end) { // <!-- Same here, check |current != end| before touching *current >Index: htmlparser/src/nsViewSourceHTML.cpp [...] >- const nsAReadableString& commentValue = aToken->GetStringValue(); >- result=WriteTag(mCommentTag,commentValue,0,PR_TRUE); >+ nsAutoString theStr; >+ theStr.Assign(NS_LITERAL_STRING("<!--")); >+ theStr.Append(aToken->GetStringValue()); >+ theStr.Append(NS_LITERAL_STRING("-->")); >+ result=WriteTag(mCommentTag,theStr,0,PR_TRUE); Use the overloaded + string operator here to avoid doing 3 separate string appends/assignments. sr=jst with that looked at...
Attachment #69701 - Flags: superreview+
I have fixes for jst's nits in my tree. I was not aware it would be safe to use operator+, but at least Windows seemed to accept it. Hopefully all the other compilers accept it as well.
Yeah, operator+ on nsAString n' friends is ok XP, except maybe in some weird edge cases (this not being one of those).
Comment on attachment 69701 [details] [diff] [review] Latest combined patch fixing harishd's findings and view source r=harishd ( ConsumeComment() would be reusable if it accepted iterators instead of nsScanner. I'll open up a new bug on that )
Attachment #69701 - Flags: review+
Comment on attachment 73104 [details] [diff] [review] no more SetText in comment node a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73104 - Flags: approval+
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: [fixinhand][2.5% improvement in page load]
For some reason page load test numbers did not improve on the Tinderbox machines. They did not seem to go down either, but I would really like to know what is going on. I will have to check on Monday with variations of the patch to see if my initial perf numbers were a fluke or if something significant changed since running those tests.
I tried the same nsHTMLTokens.cpp changes that I used when getting the 2.5% perf increase, but still got no measureable difference to current trunk. I did not try changes in nsCommentNode or nsScanner. nsScanner change is probably the only thing that could slow us down a bit but I doubt it would have been able to eat 2.5%. Nevertheless, it is the correct fix that we need and there is no better way. The bottom line is, I am unhappy the perf did not improve but happy that we have better code that is at least as fast as the old code. Case closed.
heikki: can you please verify this bug for me
I shouldn't verify my own fixes ;) Anyway, you should try the attachments and any links to tests you see here and verify that it works as you would expect. I have received 2 regression bugs in strict mode because of this fix. Technically we are doing the correct thing. I am seeking input on whether or not they should be fixed in comment parsing. The bugs are bug 132238 and bug 132785.
Verifying this one with testcases attached to bug-report & I think its working as expected. Verified on 2002-05-23-08-trunk.
Status: RESOLVED → VERIFIED
Component: DOM: Core → DOM: Core & HTML
QA Contact: stummala → general
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: