Closed Bug 311566 Opened 19 years ago Closed 17 years ago

nsCSSScanner::GatherIdent does lots of string appending

Categories

(Core :: CSS Parsing and Computation, defect, P3)


Tracking


RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: dbaron)

References

(Blocks 2 open bugs)

Details

(Keywords: perf)

Attachments

(1 file, 20 obsolete files)

5.26 KB, patch
The profiler breakdown here looks like:

45819 385     2335 nsCSSScanner::GatherIdent(unsigned&, int, nsString&)
               1538 nsSubstring::Replace(unsigned, unsigned, unsigned short const*, unsigned)
                256 nsCSSScanner::Read(unsigned&)
                 50 memcpy
                 45 nsSubstring::ReplacePrep(unsigned, unsigned, unsigned)
                 30 __i686.get_pc_thunk.bx
                 27 _init
                  4 StringUnicharInputStream::Read(unsigned short*, unsigned, unsigned*)

This is when the total amount of time spent in CSS parsing was:

45198 705    18681 CSSParserImpl::ParseColorString

At least part of the problem is that we're appending things one character at a time; it would be nice to do better there, either by changing the scanner logic or by making the string code faster somehow (though I'm not sure how).
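A minimal sketch of the chunked-append idea, in illustrative standalone C++ (std::string rather than the actual nsString/nsCSSScanner API): find the whole run of identifier characters first, then copy it in one call instead of one append per character.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Illustrative only: gather a run of identifier characters in one copy.
// The character class here (alnum, '-', '_') is a simplification of the
// real CSS ident rules.
std::string gatherIdent(const std::string& input, size_t& offset) {
    size_t start = offset;
    while (offset < input.size() &&
           (std::isalnum(static_cast<unsigned char>(input[offset])) ||
            input[offset] == '-' || input[offset] == '_')) {
        ++offset;
    }
    // One bulk copy for the whole run, instead of (offset - start)
    // single-character Append calls.
    return input.substr(start, offset - start);
}
```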
Blocks: 106356
Instead of .Append'ing each character of an identifier or string one at a time, first gather as much as possible into a buffer. With a buffer of 256 characters, almost all strings and identifiers can then be appended in a single .Append call.
Also, made mBuffer a fixed member of the struct instead of dynamically allocating it when the nsCSSScanner is created (no risk of a null pointer, and fewer mallocs).

Who can do some testing and performance analysis to see if this really helps?
Attachment #200982 - Flags: review?(dbaron)
We're already reading out of a buffer.  I think we shouldn't have another buffer; the ideal fix should just append from the initial buffer to the final one in larger chunks (though not necessarily all at once).

It also seems like your patch is missing changes (probably to the .h file).
(Boris, was that your intent when filing this?)
The intent was just to have a specific bug on something I saw taking up a large part of CSS property value parsing in some profiles....
Attachment #200982 - Attachment is obsolete: true
Attachment #200982 - Flags: review?(dbaron)
This version directly consumes as much as possible from the read buffer, for both strings and idents. NextURL could possibly get the same treatment, but it is probably not worth the trouble (url("..") is parsed as a string, but url(http..) is not).

Furthermore, the read buffer is now a fixed member of nsCSSScanner.
Also, EatNewline is changed to IsNewline, to better cooperate with the above changes (and fewer Read/Pushback calls).

Any volunteers to do profiling and/or performance testing?
Attachment #200996 - Flags: review?
Attachment #200996 - Flags: review? → review?(dbaron)
Attachment #200996 - Attachment is obsolete: true
Attachment #200996 - Flags: review?(dbaron)
Attachment #200997 - Flags: review?(dbaron)
Profiling on the canvas testcase in bug 297959 mostly shows that jprof's statistical sampling is kinda unreliable for things like this (about 2% of total time)....  That said, it looks like while the time in GatherIdent is smaller the total time under CSSParserImpl::GetToken is not any smaller (and may be bigger) than it used to be.
Attachment #200997 - Flags: review?(dbaron)
Overview of changes:
* Made all of gLexTable, and its constants and macros 'static inline' in nsCSSScanner.cpp (used to be spread all over)
* Made mBuffer a fixed element of nsCSSScanner (prevents a malloc)
* Added 'EnsureBuffer' to read more into the buffer when needed
* Removed //printf's...
* EatNewline -> IsNewline 
* CheckLexTable removed
* ::Next changed to use a 'switch' statement, for faster jumping and to avoid extra 'Peek'/'Read'/'Pushback' calls
* ::ParseEscape: Simplified the parsing of hexnumbers: less 'pushback'
* ::GatherIdent: Append as much as possible in one go
* ::ParseNumber: Directly parse digits into an integer, and only do float parsing if the number has a dot; use 'PR_strtod' directly on a local char buffer, instead of ToFloat (which copies, mallocs, parses, frees, etc). Except for dimensions, this completely skips the mIdent.Append calls.
* ::ParseString: Append as much as possible in one go
* nsCSSToken: Moved PRPackedBool to the end to save 4 bytes in the struct

Sounds like a lot, but these are all simple optimizations that make the parsing simpler and faster. Code size has even shrunk a little:
OLD: .lib: 1.487.868 nsCSSScanner.obj: 23.539
NEW: .lib: 1.487.126 nsCSSScanner.obj: 23.350

Note: I tested 'acid2', lots of websites, and the CSS 2.1 test cases from the W3C, and it all seems to parse OK according to the JavaScript console and the look of the pages.
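The ParseNumber change above can be sketched like this (hypothetical standalone code, not the patch itself; it ignores signs, dimensions, and other tokenizer corner cases, and strtod stands in for PR_strtod): accumulate digits as an integer, and only fall back to a string-to-float conversion when a decimal point appears.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Sketch: integers avoid the float-conversion machinery entirely;
// only a number containing '.' pays for strtod.
double parseNumber(const char* p, const char** end) {
    long intPart = 0;
    const char* q = p;
    while (std::isdigit(static_cast<unsigned char>(*q))) {
        intPart = intPart * 10 + (*q - '0');
        ++q;
    }
    if (*q == '.') {
        // Only do string-to-float conversion when there is a dot.
        return std::strtod(p, const_cast<char**>(end));
    }
    *end = q;
    return static_cast<double>(intPart);
}
```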
Attachment #200997 - Attachment is obsolete: true
Attachment #201096 - Flags: review?
Attachment #201096 - Flags: review? → review?(dbaron)
Test results:
Count		Old:		New:		Opt:	
1		27036		24177		23494	
2		27336		23864		23374	
3		27004		24284		23564	
4		27190		24128		23828	
5		27133		24041		26670	
6		26950		24222		23493	
7		26738		24114		23610	
8		27515		24353		23773	
9		26880		23961		23355	
10		27043		24353		23511	
11		27161		24119		23480	
12		27208		24085		23480	
		27099,5		24141,75	23802,66667	
				10,91%		12%	
		485,5327083	432,5396875	426,4644444	Milliseconds for parsing 1.1Mb css file.
				52,99302083	59,06826389	Milliseconds saved!

Old is the old code, new is the code as in the current patch (v4).
Opt is further optimized, but I will submit that as a separate bug and patch for further improvements on the scanner (such as removal of the 'aErrorCode' argument, and more).

In summary, for parsing a 1.1 MB CSS file, this patch saves 53 milliseconds out of 485.

Or looking at startup and loading of normal pages:
For startup, this patch saves about 5 milliseconds!
For a page load, the estimated saving is about 1 to 2 milliseconds!
If you keep adding pieces to the patch, it'll never get reviewed.  It's also easier to review smaller pieces.  (Also, when you attach a patch, you can request review from a person in one step, you don't need to set to review?, submit, and then fill in a name.)
Is there anything in attachment 200997 [details] [diff] [review] that was corrected (rather than added to) in attachment 201096 [details] [diff] [review]?  I might start by reviewing the former.
Dbaron, I tried to do it in one step, but it didn't work because I didn't fill in a complete email address. Next time better ;-)

Also, the v4 patch is the last one for this bug (except for changes from review comments), and the main difference is the faster parsing of numbers. But if you want to do it step by step, you can do v3 first.
Blocks: 229391
Cleaner and smaller patch coming up, saving the more exotic optimisations (such as number parsing, and moving mToken and mErrorCode into the scanner and out of the parameters of many calls) for a separate bug.
Attachment #201096 - Flags: review?(dbaron)
Attached patch V5: Updated to current trunk (obsolete) — Splinter Review
Summary of changes/fixes:
* Moved all static stuff to nsCSSScanner.cpp as real local static stuff for better inlining, and less member variables.
* Modified Read/Unread/Pushback to use 'EnsureBuffer' so that GatherString can more easily get at the buffer itself (also gets rid of Unread and mLastRead)
* Moved the \r\n handling to within Read itself, so that scanning logic doesn't need to bother about it (gets rid of EatNewline)
* Next: use 'switch' and improve scan logic to only peek when really needed.
* Optimized ParseEscape: tighter loops
* GatherIdent, ParseString: Append as much as possible directly from buffer into the string
* ParseNumber: directly parse the number as an integer/float, skipping the aIdent.Append(c) and the ToFloat/ToInteger calls altogether. ToFloat (and ToInteger?) do all kinds of alloc/memcpy/strtod work in most cases just to convert a '1' or '1px' ...

Some speedtesting using the large cube at http://www.speich.net/computer/3d.php:
New	Old     (IE6.0)
3'095	3'454   2'816 ms
Improvement: 10,4% and very close to IE6.0!
Attachment #202972 - Flags: review?(dbaron)
You removed Unread from the .cpp file but not the header.

It might be nice to see a patch merged with the checkins for bug 316394.
Blocks: 91242
Any chance you could attach a merged patch addressing comment 15?
Creating a new patch; it will take a while to build and test it, though...
Built and tested with SeaMonkey, including the 3D testcase above.
Note: also replaced the nsFlatString with nsString in nsCSSParser.cpp.
Also made EatWhitespace return void, as the return value was never used.
Attachment #201096 - Attachment is obsolete: true
Attachment #202972 - Attachment is obsolete: true
Attachment #208958 - Flags: review?(dbaron)
Attachment #202972 - Flags: review?(dbaron)
I find it somewhat odd that you go out of your way in EnsureBuffer to ensure that it never returns 0.  It seems more natural to make 0 be the "no characters to read" return value, since the return value is the number of characters available.  You'd also then avoid the confusing (although harmless, I think) inconsistency between the way you check return values of EnsureBuffer.  You could also make mOffset and mCount unsigned, I think.

-  PRInt32 ch = Read(aErrorCode);
+  PRInt32 followingChar, nextChar, ch = Read(aErrorCode);

The reverse order of declaration would make a good bit more sense.
+    if (nextChar == '-') {
+      Read(aErrorCode);
+      if (LookAhead(aErrorCode, '>')) {
+        aToken.mType = eCSSToken_HTMLComment;
+        aToken.mIdent.AssignLiteral("-->");
+        return PR_TRUE;
+      }
+    }

You need to Pushback(nextChar) (or '-') in the case where the third char isn't '>', i.e., after the } following the return.

Please add a comment for ParseNumber that it handles '-' signs but not '+' signs.
You changed the behavior of NextURL in the case of URLs that are invalid because of whitespace in the middle -- it used to read up to the closing ')', but now doesn't.  Could you restore the old behavior unless there's a good reason for that change?  (I think our parsing is probably buggy either way, but correct parser code is probably more likely to benefit from reading up to the closing ')'.)
Your changes to the numeric escape parsing in ParseAndAppendEscape are wrong in at least the following ways (I didn't look too closely for other errors):
 * they eat arbitrary amounts of whitespace at the end of an escape when instead precisely one whitespace character is supposed to be eaten (treating CR-LF as one character)
 * they cause escapes terminated by EOF to be dropped (e.g., style="color: gree\6e")
 * they remove the protection against insertion of null bytes
+    // If nothing in pushback, first try get as much as possible in one go

s/try get/try to get/

+      // See how much we can Eat and Append in one go
+      PRInt32 walker = mOffset;

Eat and Append shouldn't be capitalized, and consume is probably better than eat

Also, walker is a bad variable name; a better term would be cursor, but variable names p or i would probably be better.
Your ParseNumber changes handle negative numbers with decimals incorrectly; they treat -7.1 as -6.9.  They also introduce the possibility of rounding error accumulation; you should instead use the standard string to float conversion functions.  I haven't reviewed the ParseNumber changes for any other problems.
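The sign bug described here is easy to reproduce in isolation (illustrative code, not the actual patch): if the integer part is negated before the fraction is added, the fraction pulls the value the wrong way.

```cpp
#include <cassert>
#include <cmath>

// The buggy combination: -7 + 0.1 == -6.9 instead of -7.1.
double buggyCombine(int intPart, double fraction) {
    return intPart + fraction;
}

// Correct: apply the sign to the combined magnitude, -(7 + 0.1) == -7.1.
double fixedCombine(int sign, int magnitude, double fraction) {
    return sign * (magnitude + fraction);
}
```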

Your ParseString changes need the same fixes I described in GatherIdent.
Comment on attachment 208958 [details] [diff] [review]
V6: Updated again (and comment #16)

review- per comment 19 through comment 25
Attachment #208958 - Flags: review?(dbaron) → review-
Attached file Interdiff between version 6 and 8 (obsolete) —
This is a diff between patch 6 and 8 showing how I incorporated the comments
Attachment #210223 - Attachment is patch: false
> I find it somewhat odd that you go out of your way in EnsureBuffer to ensure that it never returns 0.  It seems more natural to make 0 be the "no characters to read" return value, since the return value is the number of characters available.  You'd also then avoid the confusing (although harmless, I think) inconsistency between the way you check return values of EnsureBuffer.  You could also make mOffset and mCount unsigned, I think.
Done, made return 0 for EOF or error, and used PRUint32 for mCount and mOffset.

> -  PRInt32 ch = Read(aErrorCode);
> +  PRInt32 followingChar, nextChar, ch = Read(aErrorCode);
> The reverse order of declaration would make a good bit more sense.
Done, made ch a 'const PRInt32 ch', and moved followingChar and nextChar

> +    if (nextChar == '-') {
> +      Read(aErrorCode);
> +      if (LookAhead(aErrorCode, '>')) {
> +        aToken.mType = eCSSToken_HTMLComment;
> +        aToken.mIdent.AssignLiteral("-->");
> +        return PR_TRUE;
> +      }
> +    }
> 
> You need to Pushback(nextChar) (or '-') in the case where the third char isn't
> '>', i.e., after the } following the return.
Good catch, made the logic as follows:
    nextChar = Read()
    if (nextChar == '-' && LookAhead('>'))
        return HTML comment token
    else if (IsDigit(nextChar) || (nextChar == '.' && IsDigit(Peek())))
        ParseNumber
        negate number
        return
    Pushback(nextChar)
    if (IsIdentStart(nextChar))
        return ParseIdent
    break
This minimizes the number of Pushback calls.

> Please add a comment for ParseNumber that it handles '-' signs but not '+'
> signs.
Changed ParseNumber to not handle '-' either (see above), with less code for positive numbers.

> You changed the behavior of NextURL in the case of URLs that are invalid because of whitespace in the middle -- it used to read up to the closing ')', but now doesn't.  Could you restore the old behavior unless there's a good reason for that change?  (I think our parsing is probably buggy either way, but correct parser code is probably more likely to benefit from reading up to the closing ')'.)
Actually, in both the original code and the patched version, NextURL always eats the whitespace up to the ')', with the ')' Pushback'd.
Because:
-          if (LookAhead(aErrorCode, ')')) {
-            Pushback(')');  // leave the closing symbol
is actually the same as:
+          if (Peek(aErrorCode) != ')') {

> Your changes to the numeric escape parsing in ParseAndAppendEscape are wrong in at least the following ways (I didn't look too closely for other errors):
>  * they eat arbitrary amounts of whitespace at the end of an escape when instead precisely one whitespace character is supposed to be eaten (treating CR-LF as one character)
> * they cause escapes terminated by EOF to be dropped (e.g., style="color: gree\6e")
> * they remove the protection against insertion of null bytes
Fixed to only read one whitespace character; my interpretation of the old code was wrong.
Fixed to not drop escapes terminated by EOF.
Fixed to not insert or push back a null byte.


> +    // If nothing in pushback, first try get as much as possible in one go
> s/try get/try to get/
Fixed

> +      // See how much we can Eat and Append in one go
> +      PRInt32 walker = mOffset;
> Eat and Append shouldn't be capitalized, and consume is probably better than eat
> Also, walker is a bad variable name; a better term would be cursor, but variable names p or i would probably be better.
> Your ParseString changes need the same fixes I described in GatherIdent.
Fixed in both locations, used 'n' as the counter (as in n bytes to consume)

> Your ParseNumber changes handle negative numbers with decimals incorrectly; they treat -7.1 as -6.9.  They also introduce the possibility of rounding error accumulation; you should instead use the standard string to float conversion functions.  I haven't reviewed the ParseNumber changes for any other problems.
Fixed: do integer parsing directly, but use ToFloat for the float conversion (from an nsCAutoString, to prevent a malloc and strcpy). This way, string-to-float conversion is only done when really needed.
Attachment #208958 - Attachment is obsolete: true
Attachment #210224 - Flags: review?(dbaron)
Comment on attachment 210223 [details]
Interdiff between version 6 and 8

Not sure what this interdiff is between, but it's not between v6 and v8, since v6 didn't have FillBuffer at all.
Attachment #210223 - Attachment is obsolete: true
Attachment #210223 - Flags: review-
The replacement of EnsureBuffer with FillBuffer seems bad to me:  if you're going to do the same test before every call to a function, it should be part of the function.  If you want it outside for performance reasons (which I expect is why you made the change), then just make an inline function.  So you should just add an inline function EnsureBuffer in nsCSSScanner.h:

PRBool EnsureBuffer(nsresult &aErrorCode) {
  return mOffset < mCount || FillBuffer(aErrorCode);
}

and use that instead of the way you call FillBuffer?
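As a self-contained sketch, the shape being suggested might look like this (the class and the FillBuffer internals are invented for illustration; only the inline EnsureBuffer fast-path pattern comes from the comment above):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical buffered scanner: a cheap inline fast path (EnsureBuffer)
// that only calls the out-of-line refill (FillBuffer) when the buffer is
// actually exhausted.
class Scanner {
public:
    Scanner(const char* data, size_t len)
        : mData(data), mLen(len), mOffset(0), mCount(0) {}

    // Inline fast path: true while characters are available.
    bool EnsureBuffer() { return mOffset < mCount || FillBuffer(); }

    // Returns the next character, or -1 at end of input.
    int Read() { return EnsureBuffer() ? mBuffer[mOffset++] : -1; }

private:
    // Slow path: copy the next chunk of input into the read buffer.
    bool FillBuffer() {
        size_t n = mLen < sizeof(mBuffer) ? mLen : sizeof(mBuffer);
        if (n == 0)
            return false;
        std::memcpy(mBuffer, mData, n);
        mData += n;
        mLen -= n;
        mOffset = 0;
        mCount = n;
        return true;
    }

    const char* mData;
    size_t mLen, mOffset, mCount;
    char mBuffer[8];
};
```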
Comment on attachment 210224 [details] [diff] [review]
Patch v8: With comments from dbaron incorporated

> #define nsCSSScanner_h___
> 
> #include "nsString.h"
> #include "nsCOMPtr.h"
> class nsIUnicharInputStream;
> 
>+
> // XXX turn this off for minimo builds
> #define CSS_REPORT_PARSE_ERRORS

Please just leave this as it was.
Attachment #211339 - Attachment description: accurate interdiff between v6 and v6 → accurate interdiff between V6 and v8
Thanks for the review.
I have incorporated the last two comments.
Attachment #210224 - Attachment is obsolete: true
Attachment #211483 - Flags: review?(dbaron)
Attachment #210224 - Flags: review?(dbaron)
Attached patch V10: updated to current trunk (obsolete) — Splinter Review
Attachment #211483 - Attachment is obsolete: true
Attachment #211483 - Flags: review?(dbaron)
Comment on attachment 216822 [details] [diff] [review]
V10: updated to current trunk

dbaron, can you please review this patch before it is bitrotted again?
Attachment #216822 - Flags: review?(dbaron)
I'll try to get to it soon.  But see comment 10.
Yes, I know about trying to keep patches small, but this is what I feel to be a consistent patch. There are some other changes pending as well but I keep those for separate bugs.
David, any chance of reviewing this patch?
If not, who else can take this up?
It's a *lot* of work to review this patch.  It's on my list of things to do, but I have to do it carefully, given that:

 * you made a huge number of serious mistakes in the original patch -- a number that means your patches still need careful review and thus that you shouldn't be attaching large combined patches (see comment 10).  Doing good optimization requires understanding when two things are equivalent, when they're not, and when the difference in behavior matters, which is something that review showed I can't yet trust you to do.  (This means it might actually be easier for me to rewrite from scratch patches doing the same thing, from a list of the optimizations made, than to review yours.)

 * you've provided inaccurate diffs (see comment 29), so I can't trust you to provide such diffs and I have to make them myself, which becomes harder and harder as the patches get farther apart in time.

So to be honest I'm not sure I'm ever going to get to it given that it's one huge patch, but it does need my review before it's checked in.
David, I can understand your position, given the history of this bug, but I do have some experience in submitting code (netwerk/cache, modules/libjar, modules/libpr0n).

So, to help reviewing this one, I have:
* updated the patch to current tree
* reduced the size by only doing the following:
  * Code cleanup:
    + Moved the 'static' pieces from the class to the .cpp file itself
    + Made all references to gLexTable a direct reference
    + Replace the CheckLexTable with IsDigit, IsHexDigit, IsWhitespace
  * Ident gathering optimization:
    + EnsureBuffer/Read to make it easier for ParseString/Number/Ident to read directly from the buffer
    + Removed Unread and mLastRead (to make things simpler)
    + Made ParseString/GatherIdent append as much as possible in one go, directly from the read buffer.
    + Simplified ParseAndAppendEscape
    + ParseNumber: only gather the number in ident when we are parsing a float

So, this at least removes a lot of other changes to the scanner logic, and only the GatherIdent/ParseString/ParseNumber/ParseAndAppendEscape functions are really changed.
Attachment #201658 - Attachment is obsolete: true
Attachment #211339 - Attachment is obsolete: true
Attachment #216822 - Attachment is obsolete: true
Attachment #230994 - Flags: review?(dbaron)
Attachment #216822 - Flags: review?(dbaron)
Blocks: 203448
Attachment #230994 - Flags: review?(dbaron) → review?(bzbarsky)
Alfred, I'm pretty swamped in terms of reviews.  I'm not likely to get to this for months, and it would still need sr from dbaron after that, I think...
Attachment #230994 - Flags: review?(bzbarsky) → review?(dbaron)
For what it's worth, I think the best way to get this reviewed would be to completely split apart cleanup and optimization into separate patches.  That would certainly make me both able to get this sooner and happier to r+sr on my own.  Especially for the cleanup patch.

That said, I'd want to know the rationale for the cleanup.  Why is using global statics in the file better than using class statics?
Just do the GatherIdent, ParseNumber and ParseString optimization.
Notes:
* IsDigit and IsHexDigit added to class like IsIdentStart to keep class constants in place. Removed redundant 'GetLexTable'.
* Removed the 'gLexTable' definition in nsCSSScanner.cpp
* Removed unused BUFFER_SIZE define
* Replaced Unread with Pushback to explicitly push back the right character, removing the need for mLastRead
* Made '\r\n' filtering part of the Read function, so that other parsing doesn't have to bother about it.
Attachment #230994 - Attachment is obsolete: true
Attachment #246382 - Flags: review?(bzbarsky)
Attachment #230994 - Flags: review?(dbaron)
> * Removed 'gLexTable' definition in nsCSSSCanner.cpp

Why?  It seems to me like it should still be needed.

What tests have you run on this code?  The first thing that jumped out at me when looking at ParseNumber is that when parsing something like "height: 100.px" the old code will tokenize it as "height:" followed by the valid-integer number token "100", followed by junk that causes the declaration to be thrown out.  Your new code will get to the |+  if ((c == '.') && (IsDigit(c = Read(aErrorCode)))) {| line, set c to be 'p', test false.  So  mIntegerValid will remain true and since c >= 0 and 'p' is definitely an ident start, we'll parse this as a eCSSToken_Dimension with an integer of 100 and a unit of "px", which is wrong per CSS spec.
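The corner case can be isolated like this (hypothetical helper, not the Mozilla code): a '.' only belongs to the number token when a digit immediately follows it, so "100.px" yields the integer token "100" followed by junk.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Length of the numeric token starting at p: digits, plus a fractional
// part only when the '.' is immediately followed by another digit.
size_t numericTokenLength(const char* p) {
    size_t n = 0;
    while (std::isdigit(static_cast<unsigned char>(p[n])))
        ++n;
    if (p[n] == '.' && std::isdigit(static_cast<unsigned char>(p[n + 1]))) {
        ++n;  // consume the '.'
        while (std::isdigit(static_cast<unsigned char>(p[n])))
            ++n;
    }
    return n;
}
```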

Is there any conceivable way to split up the three remaining optimizations into three separate patches?  Again, the smaller a patch, the easier it is to review it; review time grows _very_ nonlinearly with patch size...
Attached patch V13: More minimizations (obsolete) — Splinter Review
Fixed 100.px parsing.
Added back gLexTable define.
Removed some other changes.
Attachment #246382 - Attachment is obsolete: true
Attachment #246382 - Flags: review?(bzbarsky)
Again, it'd make me a lot happier to review this if I knew that every single change here has had a test written for it... Ideally, we'd land such tests with the patch itself.

As things stand, to be happy about it I'd need to first write said tests, then test the patch, then review the code...
QA Contact: ian → style-system
Attached patch Part 1: Scanning cleanups (obsolete) — Splinter Review
This patch contains all the cleanups/adding the functionality necessary for optimizations.

Pushback(ch) is equivalent to Unread() when the character in question was just read.  This patch restructures Read to allow a separate EnsureData call, and simplifies the handling of newlines by centralizing it in Read.
Attachment #271617 - Flags: review?
Attachment #271617 - Flags: review? → review?(bzbarsky)
Same question as for Alfred: what sort of testing have you done?
Eli, see comment 48.
Depends on: 392837
No longer depends on: 392837
Comment on attachment 271617 [details] [diff] [review]
Part 1: Scanning cleanups

Doing the cleanup in separate bugs.
Attachment #271617 - Attachment is obsolete: true
Attachment #271617 - Flags: review?(bzbarsky)
Comment on attachment 246759 [details] [diff] [review]
V13: More minimizations

This patch is obsolete, now that little parts of it have been addressed in separate patches:
3.96	sharparrow1%yahoo.com	2007-08-25 19:20	 	Bug 393499: Clean up character lookup in nsCSSScanner. r=bzbarsky, sr+a=dbaron
3.95	sharparrow1%yahoo.com	2007-08-23 16:01	 	Bug 393286: Make nsCSSScanner::Read regularize newlines. r+sr=bzbarsky, a=dbaron
3.94	sharparrow1%yahoo.com	2007-08-22 10:32	 	Bug 393080: Make count and offset in nsCSSScanner unsigned. r+sr=bzbarsky, a=dbaron
3.93	sharparrow1%yahoo.com	2007-08-21 11:29	 	Bug 392840: Get rid of nsCSSScanner::Unread. r+sr=bzbarsky, r+a=dbaron
3.92	sharparrow1%yahoo.com	2007-08-19 20:39	 	Bug 387511: CSs scanner causes parse error for URLs starting with a codepoint > 255. r+sr=bzbarsky, a=dbaron
Attachment #246759 - Attachment is obsolete: true
After the landing of the other bits, this patch is now really simple.

More optimizations are:
* ParseNumber (as in previous attached patches)
* Use 'switch' in Next() instead of a large number of if statements.
* NextURL: gather URL: Same principle as for GatherIdent, but also other cleanup possible.
* Make the aErrorCode and aToken params part of the nsCSSScanner class.
* Kill EatNewline (only used in one location)
* Simplify parameters to the Report error functions (=step3)
Assignee: dbaron → alfredkayser
Status: NEW → ASSIGNED
Attachment #278633 - Flags: review?(bzbarsky)
Comment on attachment 278633 [details] [diff] [review]
Only do GatherIdent and ParseString

>+      for (;n < mCount; n++) {
>+        PRUnichar nextChar = mReadPointer[n];
>+        if ((nextChar == '\n') || (nextChar == '\r') ||

You also need to check for '\f'.

I'd also suggest taking a look at the mochitest test_parse_rule.html, which would have caught this mistake.  I wrote that test, which includes some tests from CSS testsuites plus some of my own tests.  It's not really complete, but it's a start; I'd appreciate any additions to that test to make it more thorough.

>     PRInt32 ch = Read(aErrorCode);
>-    if (ch < 0) {
>+    if (ch < 0  || ch == '\n') {
>+      aToken.mType = eCSSToken_Error;
>+#ifdef CSS_REPORT_PARSE_ERRORS
>+      ReportUnexpectedToken(aToken, "SEUnterminatedString");
>+#endif

Actually, according to the current WD, for EOF, we should just return PR_TRUE, and pretend that the string was in fact terminated.  (I sent an email to www-style, and this interpretation was confirmed by Ian Hickson.)
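The termination rules under discussion can be sketched as follows (illustrative code; kString/kBadString and the function shape are invented, and escape handling is omitted): an unescaped newline produces an error token, while EOF ends the string as if the closing quote were present.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum StringResult { kString, kBadString };

// in[i] must be the opening quote; escapes are omitted for brevity.
StringResult parseString(const std::string& in, size_t& i, std::string& out) {
    char quote = in[i++];
    while (i < in.size()) {
        char c = in[i++];
        if (c == quote)
            return kString;    // properly terminated
        if (c == '\n')
            return kBadString; // unescaped newline: error token
        out += c;
    }
    return kString;            // EOF: pretend the string was terminated
}
```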
Comment on attachment 278633 [details] [diff] [review]
Only do GatherIdent and ParseString

As David and I have repeatedly asked:

  What tests have you run on this code? 

The fact that this patch fails one of our _existing_ regression tests makes me pretty suspicious off the bat.  I would have to see fairly detailed tests for the functionality the patch is changing, including various corner cases, as part of any patch for this bug.
Attachment #278633 - Flags: review?(bzbarsky) → review-
This is what I get from the mochitest, even with the test for '\f' added as in comment #53:

Passed: 118
Failed: 1
Todo: 4
...
ok - .a::before {content: 'This is \a'}#a::before {content: 'FAIL \ \a'}
ok - #a:before {content: 'This is \a'}
ok - .a::before {content: 'This is \a'}#a:: before {content: 'FAIL'}
ok - .a::before {content: 'This is \a'}#a ::before {content: 'FAIL'}
not ok - #a::before {content: 'This is \a got "\"This is \\A \"", expected "\"This is \\A \""

So, I am not sure what the result is telling me.
The 'got' part is the same as the 'expected', so why is it failing?
I am new to the Mochitest framework, and find it difficult to determine what really went wrong.

(In reply to comment #55)
> So, I am not sure what the result is telling me.

That test tests the unterminated string case; it's currently marked todo.  If you fixed it in your tree, then you should remove the todo marking; in this file, the todo tests are listed at the bottom of each section of tests.
This patch handles \f correctly, and doesn't report an error for strings ending at EOF, and the corresponding test file has been updated.
This patch now runs fine against the mochitest, and ACID2 is also fine.
Attachment #278633 - Attachment is obsolete: true
Attachment #279426 - Flags: review?(bzbarsky)
See comment 54 last sentence.
See http://lxr.mozilla.org/mozilla/source/layout/style/test/test_parse_rule.html
That test file already contains very specific tests for ident and string parsing:

".IdE6n-3t0_6, #a { color: green }",
"#IdE6n-3t0_6, #a { color: green }",
"._ident, #a { color: green }",
"#_ident, #a { color: green }",
".-ident, .a { color: green; }", // Testsuite has incorrect version
"#怀ident, .a { color: green }",
"#iden怀t怀, .a { color: green }",
"#\\6000ident, .a { color: green }",
"#iden\\6000t\\6000, .a { color: green }",
".怀ident, .a { color: green }",
".iden怀t怀, .a { color: green }",
".\\6000ident, .a { color: green }",
".iden\\6000t\\6000, .a { color: green }",
base + "#6ident, #a {color: red }",
".id4ent6, .a { color: green }",
"#\\ident, .a { color: green; }",
"#ide\\n\\t, .a { color: green; }",
".\\6ident, .a { color: green; }",
".\\--ident, .a { color: green; }",

// Tests for strings and pseudos
{base : base = ".a::before {content: 'This is \\a'}",
 tests : [
// CSS 2.1 section 4.1.3
"#a::before {content: 'This is \\a '}",
"#a::before {content: 'This is \\A '}",
"#a::before {content: 'This is \\0000a '}",
"#a::before {content: 'This is \\00000a '}",
"#a::before {content: 'This is \\\n\\00000a '}",
"#a::before {content: 'This is \\\015\012\\00000a '}",
"#a::before {content: 'This is \\\015\\00000a '}",
"#a::before {content: 'This is \\\f\\00000a '}",
"#a::before {content: 'This is\\20\f\\a'}",
"#a::before {content: 'This is\\20\r\\a'}",
"#a::before {content: 'This is\\20\n\\a'}",
"#a::before {content: 'This is\\20\r\n\\a'}",
base + "#a::before {content: 'FAIL \f\\a'}",
base + "#a::before {content: 'FAIL \\\n\r\\a'}",
"#a:before {content: 'This is \\a'}",
"#a::before {content: 'This is \\a",

base + "#a:: before {content: 'FAIL'}",
base + "#a ::before {content: 'FAIL'}",
"#a::before {content: 'This is \\a",

], prop: "content", pseudo: "::before"
},


With the last patch, I could even move one 'todo' to 'ok' status.
I doubt I'll be able to review this in any sort of reasonable timeframe.  I'll see what I can do, but no promises.  :(
Attachment #279426 - Flags: review?(bzbarsky) → review?(dbaron)
OS: Linux → All
Hardware: PC → All
Any news on this?
Comment on attachment 279426 [details] [diff] [review]
V2: Handle \f and do update the Mochi test

bz, do you have some time now to review this perf win patch?
Attachment #279426 - Flags: superreview?(bzbarsky)
Flags: blocking1.9?
The only change from comment 60 is that I have less Mozilla time available now than I did then.  It'll only get worse as time goes on, up through May.
(In reply to comment #63)
> The only change from comment 60 is that I have less Mozilla time available now
> than I did then.  It'll only get worse as time goes on, up through May.
> 

Can you recommend another reviewer?
(In reply to comment #64)
> (In reply to comment #63)
> > The only change from comment 60 is that I have less Mozilla time available now
> > than I did then.  It'll only get worse as time goes on, up through May.
> > 
> 
> Can you recommend another reviewer?
> 

Roc you want to take this?
I think this should get review from me or Boris, and the work needed to review it is roughly equivalent to the work needed to write the patch (and tests) from scratch, given the prior issues with the patches in this bug.
(In reply to comment #66)
> I think this should get review from me or Boris, and the work needed to review
> it is roughly equivalent to the work needed to write the patch (and tests) from
> scratch, given the prior issues with the patches in this bug.
> 

Can I convince you to take the bug :-)?
--> DBaron.  Yell if you disagree :)
Assignee: alfredkayser → dbaron
Status: ASSIGNED → NEW
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Comment on attachment 279426 [details] [diff] [review]
V2: Handle \f and do update the Mochi test

+        n++;

+      for (;n < mCount; n++) {

+          mColNumber++;

When using increments as a statement, prefer pre-increment over
post-increment.  It's conceptually simpler and more similar to other
things (e.g., =, +=), and when you're working with C++ operator
overloading it can be computationally simpler as well, so it's a good
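(As an aside, the cost difference is easy to see with any iterator-like class type; the toy Counter below is invented for illustration and is not from the patch.  Post-increment has to hand back the old state, so it must copy, while pre-increment just mutates in place:)

```cpp
#include <cassert>

// Toy iterator-like type (hypothetical, for illustration only).
// Post-increment must return the old state, so it makes a copy;
// pre-increment just mutates in place and returns *this.
struct Counter {
  int value = 0;
  static int copies;  // counts copy-constructor invocations
  Counter() = default;
  Counter(const Counter& other) : value(other.value) { ++copies; }
  Counter& operator++() { ++value; return *this; }                      // pre
  Counter operator++(int) { Counter old(*this); ++value; return old; }  // post
};
int Counter::copies = 0;
```

For a plain int the compiler generates identical code either way; the habit matters because the statement form should not silently get slower when the type later becomes a class.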

+      // Count number of Ident characters that can be processed

In ParseString, this comment is incorrect.  I'd just remove "Ident ".

+    if (ch == '\n') {
+      aToken.mType = eCSSToken_Error;
+#ifdef CSS_REPORT_PARSE_ERRORS
+      ReportUnexpectedToken(aToken, "SEUnterminatedString");
+#endif
+      return PR_FALSE;

returning false here is incorrect.  False means EOF.  You should return
PR_TRUE like the old code did.

An example of what this would break is probably:

p { color: green; foo: { "bar
; color: red; } }

(It would make paragraphs red, whereas the color:red should be part of
the declaration.)

If I'm correct that this is a testcase for what you broke, you should
add this to the parsing testcases; if not, you should figure out a
testcase.
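For what it's worth, the contract can be sketched like this (a rough, self-contained sketch with invented names -- ScanString, Token, and the token types below are not the real nsCSSScanner API): the boolean result is reserved for "we are at EOF", while a string cut off by a newline still produces a token, just an error token, so the parser can keep going and recover.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum TokenType { Token_String, Token_Error, Token_EOF };

struct Token {
  TokenType type;
  std::string value;
};

// Hypothetical mini-scanner illustrating the contract discussed above.
// Precondition: input[pos] is the opening '"' (or pos is past the end).
bool ScanString(const std::string& input, size_t& pos, Token& tok) {
  if (pos >= input.size()) {
    tok.type = Token_EOF;
    return false;  // false is reserved for actual end of input
  }
  ++pos;  // skip the opening quote
  tok.value.clear();
  while (pos < input.size()) {
    char ch = input[pos++];
    if (ch == '"') {
      tok.type = Token_String;
      return true;
    }
    if (ch == '\n') {
      tok.type = Token_Error;  // unterminated string: an error token,
      return true;             // but NOT end of input
    }
    tok.value += ch;
  }
  tok.type = Token_String;  // ran off the end: still return the token;
  return true;              // EOF is reported on the next call
}
```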


+    if (ch < 0 || ch == aStop)
       break;

Follow local style and keep the braces around the break statement.


# +"#a::before {content: 'This is \\a",
#  
#  base + "#a:: before {content: 'FAIL'}",
#  base + "#a ::before {content: 'FAIL'}",
#  "#a::before {content: 'This is \\a",

The line you're adding is already at the bottom of the quoted 5 lines,
no need to add it again.  The todo list is a list of items already
listed elsewhere; just remove from the todo and don't add.


You'll get r+sr=dbaron if you fix *exactly* these things and don't
change anything else.
Attachment #279426 - Flags: superreview?(bzbarsky)
Attachment #279426 - Flags: superreview-
Attachment #279426 - Flags: review?(dbaron)
Attachment #279426 - Flags: review-
Ran the Mochitests successfully with this patch applied.
Attachment #279426 - Attachment is obsolete: true
Attachment #304183 - Flags: superreview?(dbaron)
Attachment #304183 - Flags: review?(dbaron)
You missed the following items from my previous review:

1:
> +        n++;

2:
> +          mColNumber++;

3:
> If I'm correct that this is a testcase for what you broke, you should
> add this to the parsing testcases; if not, you should figure out a
> testcase.

You also introduced two new errors:

4. You added:
+    if (ch < 0) {
+      return PR_FALSE;
+    }
Whereas it should have stayed a |break;|, so that GatherIdent returns true, since there is an identifier to return.  (It may be the last thing in the file, but the file's not over until after it's been parsed.)

Please add a test for this as well.

5. You added a trailing comma:

-], prop: "content", pseudo: "::before"
+], prop: "content", pseudo: "::before",

I'm not sure whether or not this breaks the test, although I suspect it does.
Attachment #304183 - Flags: superreview?(dbaron)
Attachment #304183 - Flags: superreview-
Attachment #304183 - Flags: review?(dbaron)
Attachment #304183 - Flags: review-
(In reply to comment #72)
> Please add a test for this as well.

Then again, I'd think we already have plenty of tests testing style="color:green", so it probably breaks some existing tests.  If there's already one that it breaks, no need to add another.
1: > +        n++;
Done.

2: > +          mColNumber++;
Done.

3: > If I'm correct that this is a testcase for what you broke, you should
> add this to the parsing testcases; if not, you should figure out a
> testcase.
No testcase added, as the existing cases already test all combinations of string termination.

4. You added:
+    if (ch < 0) {
+      return PR_FALSE;
+    }
Done. GatherIdent returns true, even when the identifier is terminated by EOF.

> Please add a test for this as well.
Existing testcases do cover this already.

5. You added a trailing comma:
Removed.
Attachment #304183 - Attachment is obsolete: true
Attachment #304433 - Flags: superreview?(dbaron)
Comment on attachment 304433 [details] [diff] [review]
V4: Addressed all comments of comment #72

Sorry, my point number 4 wasn't clear -- it should have referred to ParseString rather than GatherIdent -- and it was code that you were putting back (i.e., the diff was smaller, but it showed up in the interdiff) -- that's what I get for reading interdiffs.

In any case, it's an error in either place.  Putting back the old code that returns 0 removed your fix for the test you're unmarking todo, so the tests fail with this version of the patch.  That change should remain as it was in patch V2 so that the test failure is actually fixed.

Could I ask why I should believe you when you say that other tests already cover something if you haven't actually run the tests?  Next time, please actually run the tests before posting a new patch.

The change from "return PR_TRUE;" to "break;" between V3 and V4 is OK, but it was yet *more* for me to review.
Attachment #304433 - Flags: superreview?(dbaron)
Attachment #304433 - Flags: superreview-
Attachment #304433 - Flags: review-
By the way, I understand that what I say may not be perfectly clear:  I make typos, mistakes, and am sometimes unclear despite those.  But please don't pretend you've addressed comments if you don't understand what I'm asking.  Ask me what I meant.
(In reply to comment #74)
> 3: > If I'm correct that this is a testcase for what you broke, you should
> > add this to the parsing testcases; if not, you should figure out a
> > testcase.
> No testcase added, as the existing cases already test all combinations of
> string termination.

I reintroduced the mistake, and none of the existing testcases failed.  This isn't surprising, since they mostly test parsing of just one declaration.
This addresses the remaining comments; I'll land it shortly.  I'm tired of iterating here.
Attachment #304433 - Attachment is obsolete: true
Checked in to trunk.

Unfortunately I think I'm going to have to mark this bug fixed at this point even if it's not fully fixed.  The main problem Boris pointed out is fixed, but I think there are further things we could do to reduce string appending; I filed bug 418952.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Didn't seem to have any noticeable effect on the talos Ts numbers.
(In reply to comment #80)
> Didn't seem to have any noticeable effect on the talos Ts numbers.
> 

That's strange. I just measured the effect of this patch because it is similar to bug #121341, where a comparable change made startup faster. Here are the numbers from my Linux box:

without patch #304872
startup 572ms
valgrind Ir:
  GatherIdent 16121515
  ParseString 7846740

with patch #304872
startup 566ms
valgrind Ir:
  GatherIdent 7328073
  ParseString 1460635