Closed Bug 110544 Opened 23 years ago Closed 23 years ago

Setting comment node text is slow

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla1.0

People

(Reporter: hjtoi-bugzilla, Assigned: hjtoi-bugzilla)

References

Details

(Keywords: dataloss, perf)

Attachments

(6 files, 9 obsolete files)

Proposed fix 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 7.10 KB, patch		Details \| Diff \| Splinter Review
Better patch with substrings 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 9.36 KB, patch		Details \| Diff \| Splinter Review
HTML tetscase for setting and getting a comment node 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 739 bytes, text/html		Details
another test 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 1.57 KB, text/html		Details
no stripping of delimiters in comment node 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 2.55 KB, patch		Details \| Diff \| Splinter Review
WIP of parser fix, strict comments should be ok 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 10.61 KB, patch		Details \| Diff \| Splinter Review
another testcase 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 2.88 KB, text/html		Details
Proposed fix to parser 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 11.63 KB, patch		Details \| Diff \| Splinter Review
full patch 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 15.64 KB, patch		Details \| Diff \| Splinter Review
testcase that hangs with latest patch 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 14.51 KB, text/html		Details
Full corrected patch with jag's string workaround 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 16.22 KB, patch		Details \| Diff \| Splinter Review
Fix view source 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 815 bytes, patch		Details \| Diff \| Splinter Review
Latest combined patch fixing harishd's findings and view source 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 17.07 KB, patch	harishd : review+ jst : superreview+	Details \| Diff \| Splinter Review
Scanner patch 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 599 bytes, patch	harishd : review+ jst : superreview+	Details \| Diff \| Splinter Review
no more SetText in comment node 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 2.78 KB, patch	harishd : review+ jst : superreview+ asa : approval+	Details \| Diff \| Splinter Review

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Description

•

23 years ago

This is probably a minor issue, but I fixed it anyway... In nsCommentNode.cpp SetText() and StripCommentDelimiters() functions are quite inefficient. They do unnecessary string copies (and possible allocations) and moves. I rewrote these, and according to Quantify the new version of SetText(PRUnichar*) is about 8 times faster than the old, while the versions that take char* or nsString are over 20 times faster. It seems that only the PRUnichar* version is called in the browser, though. We call this place hundreds of times during startup, and for every comment node in a page. In XML, there won't be comment delimiters coming into SetText. HTML is different. Attaching patch. Reviews?

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 1

•

23 years ago

Attached patch Proposed fix (obsolete) — Details — Splinter Review

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Updated

•

23 years ago

Status: NEW → ASSIGNED

Keywords: perf

Priority: -- → P3

Whiteboard: [fixinhand]

Target Milestone: --- → mozilla0.9.7

Johnny Stenback (:jst)

Comment 2

•

23 years ago

This should be done with Substring() stuff, that way we could do this w/o ever copying string data or allocating string data when stripping out comment delimiters. Create two StripCommentDelimiters() functions, one that return |const nsDependentSubstring|, and one that return |const nsDependentCSubsting|, then you should be able to simply do mInner.SetText(StripCommentDelim(aString)); w/o doing any string copying or allocation :-) Oh, to make this work you probably haveto add a nsGenericDOMDataNode::SetText(const nsAReadableCString&) method, and probably a nsTextFragment::operator=(const nsAReadableCString&) method, but those should be more or less one-liners.

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 3

•

23 years ago

Attached patch Better patch with substrings (obsolete) — Details — Splinter Review

This patch adds a few more checks to make things safer, and also uses Substring. I don't know about performance yet. One thing I am not so sure about: "aStr.GetFlatBufferHandle()->DataStart()". If I replace aStr with an ns[Auto]String that I created I will get a crash in my simple testcase (haven't run into this in the browser suite). Is there another way of doing this or do I need a 3rd StripCommentDelimiters() function that takes nsAReadableString?

Johnny Stenback (:jst)

Comment 4

•

23 years ago

You can't use aStr.GetFlatBufferHandle() since all strings that come through this code are not guaranteed to be flat, in stead of relying on that make StripCommentDelimiters() take a const nsAString& in stead of a flat buffer and work with iterators in the implementation, then the code is guaranteed to work with all types of strings. No need for a 3rd method, make the PRUnichar* callers wrap the PRUnichar* in a nsDependentString.

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 5

•

23 years ago

Attached file HTML tetscase for setting and getting a comment node — Details

As the testcase shows, Mozilla fixes comment delimiters always. IE6 fixes them only when parsing. The DOM specification does not say that document.createComment() must fix delimiters, but the documentation for interface Comment says a comment represents the string between the comment delimiters. Therefore I am slightly more inclined to say that Mozilla has the correct implementation, and that we should fix the comment in the comment node. "Interface Comment This interface inherits from CharacterData and represents the content of a comment, i.e., all the characters between the starting ''. Note that this is the definition of a comment in XML, and, in practice, HTML, although some HTML tools may implement the full SGML comment structure."

Johnny Stenback (:jst)

Comment 6

•

23 years ago

No, mozilla is incorrect per the DOM spec, just as comment nodes expose the text between the delimiters, setting the text sets the text between the delimiters so if there are delimiters in the text we'll end up with an invalid (unserializable) comment. This is, believe it or not, how it should work.

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 7

•

23 years ago

Attached file another test (obsolete) — Details

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 8

•

23 years ago

Attached patch no stripping of delimiters in comment node (obsolete) — Details — Splinter Review

Ok, this patch removes the stripping of commeent delimiters in the comment node. Harish, could you make a patch in the parser that would make parser strip the comment delimiters there so we could test this? I have a 3rd edition of the original patch otherwise ready except there are two types of comments that get handled differently in the code done with iterators and raw buffer. I don't feel like pursuing that at the moment if we are going to do what IE6 is doing and according to jst is the correct thing to do.

Attachment #58171 - Attachment is obsolete: true

Attachment #58253 - Attachment is obsolete: true

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 9

•

23 years ago

Attached patch WIP of parser fix, strict comments should be ok (obsolete) — Details — Splinter Review

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 10

•

23 years ago

Attached file another testcase — Details

See also http://lxr.mozilla.org/mozilla/source/htmlparser/tests/logparse/comments.html for Hixie's Evil tests for stricts comments.

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 11

•

23 years ago

As the "another testcase" shows this is also a dataloss bug (so it is in IE as well, btw). Given <!12--> we show " 2" as comment, ie shows "2". We replace the first char with space, IE always eats two characters after "<!". I think my strict comment parsing is now ok, much easier to read, dunno about speed (hey that rhymes). Normal comment parsing still needs a little love with comments that have several sections of "-- ... --" as the test row 15. shows.

Keywords: dataloss

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 12

•

23 years ago

Attached patch Proposed fix to parser (obsolete) — Details — Splinter Review

Attachment #58779 - Attachment is obsolete: true

Attachment #59629 - Attachment is obsolete: true

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 13

•

23 years ago

Attached patch full patch (obsolete) — Details — Splinter Review

This latest patch obsoletes all previous patches. I have one problem that I am aware of, which seems like a string/iterator bug which manifests itself as hang when trying to open the testcase that I will attach shortly (Ian's evil comment test without strict doctype).

Attachment #58800 - Attachment is obsolete: true

Attachment #59651 - Attachment is obsolete: true

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 14

•

23 years ago

Attached file testcase that hangs with latest patch — Details

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 15

•

23 years ago

Doh, seems like the patch got corrupted in HTMLContentSinkPart. Well, there is only one change that goes together with the change in GenericDOMDataNode. Specifically, I noticed that it doesn't seem to make sense to create a comment node and then AppendData() to it - it should be SetData() in my opinion. Do the different implementations cause some problems with events etc.? Another thing is that if you AppendData("") we asserted with the old code. Would it be safe to just bail out if the string to append is empty, or do we need to do reflows and fire mutation notifications even though we did not change data?

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Updated

•

23 years ago

Depends on: 112943

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Updated

•

23 years ago

Target Milestone: mozilla0.9.7 → mozilla0.9.8

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 16

•

23 years ago

harishd noted that in strict comment parsing we should allow comment end with --\w*> where \w* is whitespace. I also got a workaround from jag for the hang, so I should have a new patch tomorrow.

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 17

•

23 years ago

Attached patch Full corrected patch with jag's string workaround (obsolete) — Details — Splinter Review

Attachment #59889 - Attachment is obsolete: true

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 18

•

23 years ago

Comment on attachment 64864 [details] [diff] [review] Full corrected patch with jag's string workaround >Index: htmlparser/src/nsHTMLTokens.cpp >=================================================================== >+ //aString = Substring(beginData, ++current); >+ // XXX Instead we can do this EVIL HACK (from jag): >+ PRUint32 len = Distance(beginData, ++current); >+ aString.SetLength(len); >+ PRUnichar* dest = NS_CONST_CAST(PRUnichar*, aString.get()); >+ copy_string(beginData, gt, dest); The 'gt' on the last line should of course be 'current'. harishd, review?

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Updated

•

23 years ago

Target Milestone: mozilla0.9.8 → mozilla0.9.9

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 19

•

23 years ago

Comment on attachment 64864 [details] [diff] [review] Full corrected patch with jag's string workaround >+nsresult ConsumeComment(nsScanner& aScanner, nsString& aString) { >+ if (*current == kMinus && current != beginLastMinus) { // -> >+ goodComment = PR_TRUE; >+ --current; >+ if (*current == kMinus && current != beginLastMinus) { // --> >+ --current; harishd noticed that goodComment = PR_TRUE should probably be moved into the second if to guard against cases like .

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 20

•

23 years ago

jag: some bad news. Your workaround works in all other cases except with the last testcase (attachment 59892 [details]). I would really love to get this checked in. I just run the page loader test, and this seems to give 2.5% improvement in page load time!

jag (Peter Annema)

Comment 21

•

23 years ago

How does it fail?

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 22

•

23 years ago

Sorry, same problem: hangs doing Distance. Stack is NTDLL! 77fa018c() nsDebug::Assertion(const char * 0x1010e8bc `string', const char * 0x1010e8e8 `string', const char * 0x1012d470 `string', int 91) line 290 + 13 bytes copy_string(nsReadingIterator<unsigned short> & {"????????????"}, const nsReadingIterator<unsigned short> & {"-????????????"}, CalculateLength<unsigned short> & {...}) line 91 + 28 bytes Distance_Impl(const nsReadingIterator<unsigned short> & {" -????????????"}, const nsReadingIterator<unsigned short> & {"-????????????"}) line 91 + 17 bytes Distance(const nsReadingIterator<unsigned short> & {" -????????????"}, const nsReadingIterator<unsigned short> & {"-????????????"}) line 99 + 13 bytes ConsumeComment(nsScanner & {...}, nsString & {""}) line 1136 + 19 bytes ... The assertion is "|copy_string| will never terminate".

Jerome Lacoste

Comment 23

•

23 years ago

I was rereading the patch, am I wrong to say that: + if (*current == kMinus && current != beginLastMinus) { // -> + goodComment = PR_TRUE; + --current; + if (*current == kMinus && current != beginLastMinus) { // --> + --current; + } + } else if (*current == '!' && current != beginLastMinus) { + --current; + if (*current == kMinus && current != beginLastMinus) { + --current; + if (*current == kMinus && current != beginLastMinus) { // --!> + --current; + goodComment = PR_TRUE; } } - return result; + } else if (current == beginLastMinus) { + goodComment = PR_TRUE; + } If I understood correctly, the last 'if' in this block is for the special case of  + --current; + if (*current == kMinus && current != beginLastMinus) { // --> + --current; + } + } + } else if (*current == '!' && current != beginLastMinus) { + --current; + if (*current == kMinus && current != beginLastMinus) { + --current; + if (*current == kMinus && current != beginLastMinus) { // --!> + --current; + goodComment = PR_TRUE; } } - return result; I don't know is this helps...

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Updated

•

23 years ago

Keywords: nsbeta1

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Updated

•

23 years ago

Keywords: nsbeta1 → nsbeta1+

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 24

•

23 years ago

jag: any progress on this, 2.5% speedup is pretty significant so I'd really like to get this checked in...

Whiteboard: [fixinhand] → [fixinhand][2.5% improvement in page load]

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 25

•

23 years ago

Attached patch Fix view source (obsolete) — Details — Splinter Review

View source was missing comment open/close.

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 26

•

23 years ago

Attached patch Latest combined patch fixing harishd's findings and view source — Details — Splinter Review

Attachment #64864 - Attachment is obsolete: true

Attachment #69688 - Attachment is obsolete: true

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Updated

•

23 years ago

Target Milestone: mozilla0.9.9 → mozilla1.0

jag (Peter Annema)

Comment 27

•

23 years ago

*sigh* I keep forgetting to post comments in this bug. So, here's what's going on: we have two iterators on a nsSlidingString with at least two buffers internally. iter1 is on the first buffer, iter2 is on the start of the second buffer. --iter2;. If we now test SameFragment(iter1, iter2), this should say "yes" but it says "no". Upon the --iter2, normalize_backward() is called, which will get the previous buffer/fragment. This is the relevant part of nsSlidingSubstring::GetReadableFragment: 253 if ( result_buffer == mStart.mBuffer ) 254 aFragment.mStart = mStart.mPosInBuffer; 255 else 256 aFragment.mStart = result_buffer->DataStart(); In there, since we're on the first buffer now, the aFragment.mStart is set to mStart.mPosInBuffer. This means that if |mStart| points to just a few chars before the end of the first buffer, the newly created fragment will start there, instead of at the beginning of the buffer, so that the check |iter1.fragment().mStart == iter2.fragment().mStart| fails. In |Distance|, which calls |copy_string|, which calls nsCharSourceTraits<>::readable_distance: 393 return PRUint32(SameFragment(first, last) ? last.get()-first.get() : first.size_forward()); This failure of SameFragment() means that first.size_forward() will get N+1 instead of N (where N is last.get()-first.get()) so first is advanced beyond last, which means the |while (first != last)| in copy_string will never end. I think the solution is to hand back the same fragment regardless of |Position mStart, mEnd|, but I'll have to study the code a little more to find out why we're currently doing this. Vidur, scc?

Target Milestone: mozilla1.0 → mozilla0.9.9

jag (Peter Annema)

Comment 28

•

23 years ago

Whoops, I accidentily changed the milestone this morning, setting it back.

Target Milestone: mozilla0.9.9 → mozilla1.0

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 29

•

23 years ago

Attached patch Scanner patch — Details — Splinter Review

Some progress... Vidur found this little piece in the scanner where we perhaps aren't doing what we should. With this patch all my tests finally pass. However, with this patch Substring() still does not work. I hit the assertion: ###!!! ASSERTION: You can't |write| into an |nsWritingIterator| with no space!: 'size_forward() > 0', file ..\..\dist\include\string\nsStringIterator.h, line 356 and if I continue I find myself again with the infamous: ###!!! ASSERTION: |copy_string| will never terminate: 'count_copied > 0', file . .\..\dist\include\string\nsAlgorithm.h, line 91

vidur (gone)

Comment 30

•

23 years ago

|mStart.mPosInBuffer| for the sliding string should only change if there's been a call to |nsSlidingString::DiscardPrefix| between the creation of the iterators and their subsequent use. My patch ensures that all old iterators are reset after we call |nsSlidingString::DiscardPrefix|.

jag (Peter Annema)

Comment 31

•

23 years ago

Which Substring is that?

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 32

•

23 years ago

Sorry, I should have been more clear. If I comment out your hacks (that now work with Vidur's patch) and use the correct lines a la aString = Substring(foo,bar) I hit the assertions I mentioned and hang. The assertions happened on that line but I realized it might have been also the operator = that did this.

harishd

Comment 33

•

23 years ago

Comment on attachment 72854 [details] [diff] [review] Scanner patch r=harishd [ it would be great if the sliding string itself took care of reintializing the iterator. ]

Attachment #72854 - Flags: review+

Johnny Stenback (:jst)

Comment 34

•

23 years ago

Comment on attachment 69701 [details] [diff] [review] Latest combined patch fixing harishd's findings and view source Do we need to override all those SetText() methods any more? Can't we just fallback on the nsGenericDOMDataNode ones now that we don't actually do anything in the overriden methods?

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 35

•

23 years ago

Attached patch no more SetText in comment node — Details — Splinter Review

Yup, jst is right, we don't seem to need SetText() explicitly in the comment node.

Johnny Stenback (:jst)

Comment 36

•

23 years ago

Comment on attachment 72854 [details] [diff] [review] Scanner patch sr=jst

Attachment #72854 - Flags: superreview+

Johnny Stenback (:jst)

Comment 37

•

23 years ago

Comment on attachment 73104 [details] [diff] [review] no more SetText in comment node sr=jst

Attachment #73104 - Flags: superreview+

harishd

Comment 38

•

23 years ago

Comment on attachment 73104 [details] [diff] [review] no more SetText in comment node r=harishd

Attachment #73104 - Flags: review+

Johnny Stenback (:jst)

Comment 39

•

23 years ago

Comment on attachment 69701 [details] [diff] [review] Latest combined patch fixing harishd's findings and view source >+ while (FindInReadable(NS_LITERAL_STRING("--"), current, currentEnd)) { Make a named literal here for the "--" string to avoid creating that string wrapper for every iteration through the while loop. >+ // When we get here, we have always already consumed <! >+ // Skip over possible leading minuses >+ if (*current == kMinus && current != end) { Maybe check |current != end| before dereferencing *current in case current could in some case point to the end of the string. >+ if (*current == kMinus && current != end) { // ")); >+ result=WriteTag(mCommentTag,theStr,0,PR_TRUE); Use the overloaded + string operator here to avoid doing 3 separate string appends/assignments. sr=jst with that looked at...

Attachment #69701 - Flags: superreview+

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 40

•

23 years ago

I have fixes for jst's nits in my tree. I was not aware it would be safe to use operator+, but at least Windows seemed to accept it. Hopefully all the other compilers accept it as well.

Johnny Stenback (:jst)

Comment 41

•

23 years ago

Yeah, operator+ on nsAString n' friends is ok XP, except maybe in some weird edge cases (this not being one of those).

harishd

Comment 42

•

23 years ago

Comment on attachment 69701 [details] [diff] [review] Latest combined patch fixing harishd's findings and view source r=harishd ( ConsumeComment() would be reusable if it accepted iterators instead of nsScanner. I'll open up a new bug on that )

Attachment #69701 - Flags: review+

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Updated

•

23 years ago

Depends on: 130410

Asa Dotzler [:asa]

Comment 43

•

23 years ago

Comment on attachment 73104 [details] [diff] [review] no more SetText in comment node a=asa (on behalf of drivers) for checkin to the 1.0 trunk

Attachment #73104 - Flags: approval+

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 44

•

23 years ago

Checked in.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Whiteboard: [fixinhand][2.5% improvement in page load]

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 45

•

23 years ago

For some reason page load test numbers did not improve on the Tinderbox machines. They did not seem to go down either, but I would really like to know what is going on. I will have to check on Monday with variations of the patch to see if my initial perf numbers were a fluke or if something significant changed since running those tests.

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 46

•

23 years ago

I tried the same nsHTMLTokens.cpp changes that I used when getting the 2.5% perf increase, but still got no measureable difference to current trunk. I did not try changes in nsCommentNode or nsScanner. nsScanner change is probably the only thing that could slow us down a bit but I doubt it would have been able to eat 2.5%. Nevertheless, it is the correct fix that we need and there is no better way. The bottom line is, I am unhappy the perf did not improve but happy that we have better code that is at least as fast as the old code. Case closed.

Sivakiran Tummala

Comment 47

•

23 years ago

heikki: can you please verify this bug for me

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 48

•

23 years ago

I shouldn't verify my own fixes ;) Anyway, you should try the attachments and any links to tests you see here and verify that it works as you would expect. I have received 2 regression bugs in strict mode because of this fix. Technically we are doing the correct thing. I am seeking input on whether or not they should be fixed in comment parsing. The bugs are bug 132238 and bug 132785.

Prashant Desale

Comment 49

•

23 years ago

Verifying this one with testcases attached to bug-report & I think its working as expected. Verified on 2002-05-23-08-trunk.

Status: RESOLVED → VERIFIED

timeless

Updated

•

16 years ago

Component: DOM: Core → DOM: Core & HTML

QA Contact: stummala → general

You need to log in before you can comment on or make changes to this bug.