Closed
Bug 110544
Opened 23 years ago
Closed 23 years ago
Setting comment node text is slow
Categories
(Core :: DOM: Core & HTML, defect, P3)
Core
DOM: Core & HTML
Tracking
()
VERIFIED
FIXED
mozilla1.0
People
(Reporter: hjtoi-bugzilla, Assigned: hjtoi-bugzilla)
References
Details
(Keywords: dataloss, perf)
Attachments
(6 files, 9 obsolete files)
739 bytes,
text/html
|
Details | |
2.88 KB,
text/html
|
Details | |
14.51 KB,
text/html
|
Details | |
17.07 KB,
patch
|
harishd
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
599 bytes,
patch
|
harishd
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
2.78 KB,
patch
|
harishd
:
review+
jst
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
This is probably a minor issue, but I fixed it anyway... In nsCommentNode.cpp
SetText() and StripCommentDelimiters() functions are quite inefficient. They do
unnecessary string copies (and possible allocations) and moves. I rewrote these,
and according to Quantify the new version of SetText(PRUnichar*) is about 8
times faster than the old, while the versions that take char* or nsString are
over 20 times faster.
It seems that only the PRUnichar* version is called in the browser, though.
We call this place hundreds of times during startup, and for every comment node
in a page. In XML, there won't be comment delimiters coming into SetText. HTML
is different.
Attaching patch. Reviews?
Assignee | ||
Comment 1•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Keywords: perf
Priority: -- → P3
Whiteboard: [fixinhand]
Target Milestone: --- → mozilla0.9.7
Comment 2•23 years ago
|
||
This should be done with Substring() stuff, that way we could do this w/o ever
copying string data or allocating string data when stripping out comment delimiters.
Create two StripCommentDelimiters() functions, one that return |const
nsDependentSubstring|, and one that return |const nsDependentCSubsting|, then
you should be able to simply do mInner.SetText(StripCommentDelim(aString)); w/o
doing any string copying or allocation :-)
Oh, to make this work you probably haveto add a
nsGenericDOMDataNode::SetText(const nsAReadableCString&) method, and probably a
nsTextFragment::operator=(const nsAReadableCString&) method, but those should be
more or less one-liners.
Assignee | ||
Comment 3•23 years ago
|
||
This patch adds a few more checks to make things safer, and also uses
Substring. I don't know about performance yet.
One thing I am not so sure about: "aStr.GetFlatBufferHandle()->DataStart()". If
I replace aStr with an ns[Auto]String that I created I will get a crash in my
simple testcase (haven't run into this in the browser suite). Is there another
way of doing this or do I need a 3rd StripCommentDelimiters() function that
takes nsAReadableString?
Comment 4•23 years ago
|
||
You can't use aStr.GetFlatBufferHandle() since all strings that come through
this code are not guaranteed to be flat, in stead of relying on that make
StripCommentDelimiters() take a const nsAString& in stead of a flat buffer and
work with iterators in the implementation, then the code is guaranteed to work
with all types of strings. No need for a 3rd method, make the PRUnichar* callers
wrap the PRUnichar* in a nsDependentString.
Assignee | ||
Comment 5•23 years ago
|
||
As the testcase shows, Mozilla fixes comment delimiters always. IE6 fixes them
only when parsing. The DOM specification does not say that
document.createComment() must fix delimiters, but the documentation for
interface Comment says a comment represents the string between the comment
delimiters. Therefore I am slightly more inclined to say that Mozilla has the
correct implementation, and that we should fix the comment in the comment node.
"Interface Comment
This interface inherits from CharacterData and represents the content of a
comment, i.e., all the characters between the starting '<!--' and ending '-->'.
Note that this is the definition of a comment in XML, and, in practice, HTML,
although some HTML tools may implement the full SGML comment structure."
Comment 6•23 years ago
|
||
No, mozilla is incorrect per the DOM spec, just as comment nodes expose the text
between the delimiters, setting the text sets the text between the delimiters so
if there are delimiters in the text we'll end up with an invalid
(unserializable) comment. This is, believe it or not, how it should work.
Assignee | ||
Comment 7•23 years ago
|
||
Assignee | ||
Comment 8•23 years ago
|
||
Ok, this patch removes the stripping of commeent delimiters in the comment
node. Harish, could you make a patch in the parser that would make parser strip
the comment delimiters there so we could test this?
I have a 3rd edition of the original patch otherwise ready except there are two
types of comments that get handled differently in the code done with iterators
and raw buffer. I don't feel like pursuing that at the moment if we are going
to do what IE6 is doing and according to jst is the correct thing to do.
Attachment #58171 -
Attachment is obsolete: true
Attachment #58253 -
Attachment is obsolete: true
Assignee | ||
Comment 9•23 years ago
|
||
Assignee | ||
Comment 10•23 years ago
|
||
See also
http://lxr.mozilla.org/mozilla/source/htmlparser/tests/logparse/comments.html
for Hixie's Evil tests for stricts comments.
Assignee | ||
Comment 11•23 years ago
|
||
As the "another testcase" shows this is also a dataloss bug (so it is in IE as
well, btw). Given <!12--> we show " 2" as comment, ie shows "2". We replace the
first char with space, IE always eats two characters after "<!".
I think my strict comment parsing is now ok, much easier to read, dunno about
speed (hey that rhymes). Normal comment parsing still needs a little love with
comments that have several sections of "-- ... --" as the test row 15. shows.
Keywords: dataloss
Assignee | ||
Comment 12•23 years ago
|
||
Attachment #58779 -
Attachment is obsolete: true
Attachment #59629 -
Attachment is obsolete: true
Assignee | ||
Comment 13•23 years ago
|
||
This latest patch obsoletes all previous patches. I have one problem that I am
aware of, which seems like a string/iterator bug which manifests itself as hang
when trying to open the testcase that I will attach shortly (Ian's evil comment
test without strict doctype).
Attachment #58800 -
Attachment is obsolete: true
Attachment #59651 -
Attachment is obsolete: true
Assignee | ||
Comment 14•23 years ago
|
||
Assignee | ||
Comment 15•23 years ago
|
||
Doh, seems like the patch got corrupted in HTMLContentSinkPart. Well, there is
only one change that goes together with the change in GenericDOMDataNode.
Specifically, I noticed that it doesn't seem to make sense to create a comment
node and then AppendData() to it - it should be SetData() in my opinion. Do the
different implementations cause some problems with events etc.? Another thing is
that if you AppendData("") we asserted with the old code. Would it be safe to
just bail out if the string to append is empty, or do we need to do reflows and
fire mutation notifications even though we did not change data?
Assignee | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Assignee | ||
Comment 16•23 years ago
|
||
harishd noted that in strict comment parsing we should allow comment end with
--\w*> where \w* is whitespace. I also got a workaround from jag for the hang,
so I should have a new patch tomorrow.
Assignee | ||
Comment 17•23 years ago
|
||
Attachment #59889 -
Attachment is obsolete: true
Assignee | ||
Comment 18•23 years ago
|
||
Comment on attachment 64864 [details] [diff] [review]
Full corrected patch with jag's string workaround
>Index: htmlparser/src/nsHTMLTokens.cpp
>===================================================================
>+ //aString = Substring(beginData, ++current);
>+ // XXX Instead we can do this EVIL HACK (from jag):
>+ PRUint32 len = Distance(beginData, ++current);
>+ aString.SetLength(len);
>+ PRUnichar* dest = NS_CONST_CAST(PRUnichar*, aString.get());
>+ copy_string(beginData, gt, dest);
The 'gt' on the last line should of course be 'current'.
harishd, review?
Assignee | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Assignee | ||
Comment 19•23 years ago
|
||
Comment on attachment 64864 [details] [diff] [review]
Full corrected patch with jag's string workaround
>+nsresult ConsumeComment(nsScanner& aScanner, nsString& aString) {
>+ if (*current == kMinus && current != beginLastMinus) { // ->
>+ goodComment = PR_TRUE;
>+ --current;
>+ if (*current == kMinus && current != beginLastMinus) { // -->
>+ --current;
harishd noticed that goodComment = PR_TRUE should probably be moved
into the second if to guard against cases like <!-- hello -> there -->.
Assignee | ||
Comment 20•23 years ago
|
||
jag: some bad news. Your workaround works in all other cases except with the
last testcase (attachment 59892 [details]).
I would really love to get this checked in. I just run the page loader test, and
this seems to give 2.5% improvement in page load time!
Comment 21•23 years ago
|
||
How does it fail?
Assignee | ||
Comment 22•23 years ago
|
||
Sorry, same problem: hangs doing Distance. Stack is
NTDLL! 77fa018c()
nsDebug::Assertion(const char * 0x1010e8bc `string', const char * 0x1010e8e8
`string', const char * 0x1012d470 `string', int 91) line 290 + 13 bytes
copy_string(nsReadingIterator<unsigned short> & {"????????????"}, const
nsReadingIterator<unsigned short> & {"-????????????"}, CalculateLength<unsigned
short> & {...}) line 91 + 28 bytes
Distance_Impl(const nsReadingIterator<unsigned short> & {" -????????????"},
const nsReadingIterator<unsigned short> & {"-????????????"}) line 91 + 17 bytes
Distance(const nsReadingIterator<unsigned short> & {" -????????????"}, const
nsReadingIterator<unsigned short> & {"-????????????"}) line 99 + 13 bytes
ConsumeComment(nsScanner & {...}, nsString & {""}) line 1136 + 19 bytes
...
The assertion is "|copy_string| will never terminate".
Comment 23•23 years ago
|
||
I was rereading the patch, am I wrong to say that:
+ if (*current == kMinus && current != beginLastMinus) { // ->
+ goodComment = PR_TRUE;
+ --current;
+ if (*current == kMinus && current != beginLastMinus) { // -->
+ --current;
+ }
+ } else if (*current == '!' && current != beginLastMinus) {
+ --current;
+ if (*current == kMinus && current != beginLastMinus) {
+ --current;
+ if (*current == kMinus && current != beginLastMinus) { // --!>
+ --current;
+ goodComment = PR_TRUE;
}
}
- return result;
+ } else if (current == beginLastMinus) {
+ goodComment = PR_TRUE;
+ }
If I understood correctly, the last 'if' in this block is for the special case
of <!-->
This could be refactored as:
+ if (*current == kMinus) { // ->
+ goodComment = PR_TRUE;
if (current != beginLastMinus) { // <!--...-> but not <!-->
+ --current;
+ if (*current == kMinus && current != beginLastMinus) { // -->
+ --current;
+ }
+ }
+ } else if (*current == '!' && current != beginLastMinus) {
+ --current;
+ if (*current == kMinus && current != beginLastMinus) {
+ --current;
+ if (*current == kMinus && current != beginLastMinus) { // --!>
+ --current;
+ goodComment = PR_TRUE;
}
}
- return result;
I don't know is this helps...
Assignee | ||
Updated•23 years ago
|
Assignee | ||
Comment 24•23 years ago
|
||
jag: any progress on this, 2.5% speedup is pretty significant so I'd really like
to get this checked in...
Whiteboard: [fixinhand] → [fixinhand][2.5% improvement in page load]
Assignee | ||
Comment 25•23 years ago
|
||
View source was missing comment open/close.
Assignee | ||
Comment 26•23 years ago
|
||
Attachment #64864 -
Attachment is obsolete: true
Attachment #69688 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.9 → mozilla1.0
Comment 27•23 years ago
|
||
*sigh* I keep forgetting to post comments in this bug.
So, here's what's going on: we have two iterators on a nsSlidingString with at
least two buffers internally. iter1 is on the first buffer, iter2 is on the
start of the second buffer. --iter2;. If we now test SameFragment(iter1, iter2),
this should say "yes" but it says "no".
Upon the --iter2, normalize_backward() is called, which will get the previous
buffer/fragment. This is the relevant part of
nsSlidingSubstring::GetReadableFragment:
253 if ( result_buffer == mStart.mBuffer )
254 aFragment.mStart = mStart.mPosInBuffer;
255 else
256 aFragment.mStart = result_buffer->DataStart();
In there, since we're on the first buffer now, the aFragment.mStart is set to
mStart.mPosInBuffer. This means that if |mStart| points to just a few chars
before the end of the first buffer, the newly created fragment will start there,
instead of at the beginning of the buffer, so that the check
|iter1.fragment().mStart == iter2.fragment().mStart| fails.
In |Distance|, which calls |copy_string|, which calls
nsCharSourceTraits<>::readable_distance:
393 return PRUint32(SameFragment(first, last) ? last.get()-first.get() :
first.size_forward());
This failure of SameFragment() means that first.size_forward() will get N+1
instead of N (where N is last.get()-first.get()) so first is advanced beyond
last, which means the |while (first != last)| in copy_string will never end.
I think the solution is to hand back the same fragment regardless of |Position
mStart, mEnd|, but I'll have to study the code a little more to find out why
we're currently doing this.
Vidur, scc?
Target Milestone: mozilla1.0 → mozilla0.9.9
Comment 28•23 years ago
|
||
Whoops, I accidentily changed the milestone this morning, setting it back.
Target Milestone: mozilla0.9.9 → mozilla1.0
Assignee | ||
Comment 29•23 years ago
|
||
Some progress... Vidur found this little piece in the scanner where we perhaps
aren't doing what we should. With this patch all my tests finally pass.
However, with this patch Substring() still does not work. I hit the assertion:
###!!! ASSERTION: You can't |write| into an |nsWritingIterator| with no space!:
'size_forward() > 0', file ..\..\dist\include\string\nsStringIterator.h, line
356
and if I continue I find myself again with the infamous:
###!!! ASSERTION: |copy_string| will never terminate: 'count_copied > 0', file
.
.\..\dist\include\string\nsAlgorithm.h, line 91
Comment 30•23 years ago
|
||
|mStart.mPosInBuffer| for the sliding string should only change if there's been
a call to |nsSlidingString::DiscardPrefix| between the creation of the iterators
and their subsequent use. My patch ensures that all old iterators are reset
after we call |nsSlidingString::DiscardPrefix|.
Comment 31•23 years ago
|
||
Which Substring is that?
Assignee | ||
Comment 32•23 years ago
|
||
Sorry, I should have been more clear. If I comment out your hacks (that now work
with Vidur's patch) and use the correct lines a la
aString = Substring(foo,bar)
I hit the assertions I mentioned and hang. The assertions happened on that line
but I realized it might have been also the operator = that did this.
Comment 33•23 years ago
|
||
Comment on attachment 72854 [details] [diff] [review]
Scanner patch
r=harishd [ it would be great if the sliding string
itself took care of reintializing the iterator.
]
Attachment #72854 -
Flags: review+
Comment 34•23 years ago
|
||
Comment on attachment 69701 [details] [diff] [review]
Latest combined patch fixing harishd's findings and view source
Do we need to override all those SetText() methods any more? Can't we just
fallback on the nsGenericDOMDataNode ones now that we don't actually do
anything in the overriden methods?
Assignee | ||
Comment 35•23 years ago
|
||
Yup, jst is right, we don't seem to need SetText() explicitly in the comment
node.
Comment 36•23 years ago
|
||
Comment on attachment 72854 [details] [diff] [review]
Scanner patch
sr=jst
Attachment #72854 -
Flags: superreview+
Comment 37•23 years ago
|
||
Comment on attachment 73104 [details] [diff] [review]
no more SetText in comment node
sr=jst
Attachment #73104 -
Flags: superreview+
Comment 38•23 years ago
|
||
Comment on attachment 73104 [details] [diff] [review]
no more SetText in comment node
r=harishd
Attachment #73104 -
Flags: review+
Comment 39•23 years ago
|
||
Comment on attachment 69701 [details] [diff] [review]
Latest combined patch fixing harishd's findings and view source
>+ while (FindInReadable(NS_LITERAL_STRING("--"), current, currentEnd)) {
Make a named literal here for the "--" string to avoid creating that string
wrapper for every iteration through the while loop.
>+ // When we get here, we have always already consumed <!
>+ // Skip over possible leading minuses
>+ if (*current == kMinus && current != end) {
Maybe check |current != end| before dereferencing *current in case current
could in some case point to the end of the string.
>+ if (*current == kMinus && current != end) { // <!--
Same here, check |current != end| before touching *current
>Index: htmlparser/src/nsViewSourceHTML.cpp
[...]
>- const nsAReadableString& commentValue = aToken->GetStringValue();
>- result=WriteTag(mCommentTag,commentValue,0,PR_TRUE);
>+ nsAutoString theStr;
>+ theStr.Assign(NS_LITERAL_STRING("<!--"));
>+ theStr.Append(aToken->GetStringValue());
>+ theStr.Append(NS_LITERAL_STRING("-->"));
>+ result=WriteTag(mCommentTag,theStr,0,PR_TRUE);
Use the overloaded + string operator here to avoid doing 3 separate string
appends/assignments.
sr=jst with that looked at...
Attachment #69701 -
Flags: superreview+
Assignee | ||
Comment 40•23 years ago
|
||
I have fixes for jst's nits in my tree. I was not aware it would be safe to use
operator+, but at least Windows seemed to accept it. Hopefully all the other
compilers accept it as well.
Comment 41•23 years ago
|
||
Yeah, operator+ on nsAString n' friends is ok XP, except maybe in some weird
edge cases (this not being one of those).
Comment 42•23 years ago
|
||
Comment on attachment 69701 [details] [diff] [review]
Latest combined patch fixing harishd's findings and view source
r=harishd ( ConsumeComment() would be reusable if it accepted iterators instead
of nsScanner. I'll open up a new bug on that )
Attachment #69701 -
Flags: review+
Comment 43•23 years ago
|
||
Comment on attachment 73104 [details] [diff] [review]
no more SetText in comment node
a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73104 -
Flags: approval+
Assignee | ||
Comment 44•23 years ago
|
||
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: [fixinhand][2.5% improvement in page load]
Assignee | ||
Comment 45•23 years ago
|
||
For some reason page load test numbers did not improve on the Tinderbox
machines. They did not seem to go down either, but I would really like to know
what is going on. I will have to check on Monday with variations of the patch to
see if my initial perf numbers were a fluke or if something significant changed
since running those tests.
Assignee | ||
Comment 46•23 years ago
|
||
I tried the same nsHTMLTokens.cpp changes that I used when getting the 2.5% perf
increase, but still got no measureable difference to current trunk. I did not
try changes in nsCommentNode or nsScanner. nsScanner change is probably the only
thing that could slow us down a bit but I doubt it would have been able to eat
2.5%. Nevertheless, it is the correct fix that we need and there is no better way.
The bottom line is, I am unhappy the perf did not improve but happy that we have
better code that is at least as fast as the old code.
Case closed.
Comment 47•23 years ago
|
||
heikki: can you please verify this bug for me
Assignee | ||
Comment 48•23 years ago
|
||
I shouldn't verify my own fixes ;)
Anyway, you should try the attachments and any links to tests you see here and
verify that it works as you would expect.
I have received 2 regression bugs in strict mode because of this fix.
Technically we are doing the correct thing. I am seeking input on whether or not
they should be fixed in comment parsing. The bugs are bug 132238 and bug 132785.
Comment 49•23 years ago
|
||
Verifying this one with testcases attached to bug-report & I think its working
as expected.
Verified on 2002-05-23-08-trunk.
Status: RESOLVED → VERIFIED
Component: DOM: Core → DOM: Core & HTML
QA Contact: stummala → general
You need to log in
before you can comment on or make changes to this bug.
Description
•