Closed
Bug 311566
Opened 19 years ago
Closed 17 years ago
nsCSSScanner::GatherIdent does lots of string appending
Categories
(Core :: CSS Parsing and Computation, defect, P3)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: dbaron)
References
(Blocks 1 open bug)
Details
(Keywords: perf)
Attachments
(1 file, 20 obsolete files)
5.26 KB,
patch
|
Details | Diff | Splinter Review |
The profiler breakdown here looks like:
45819 385 2335 nsCSSScanner::GatherIdent(unsigned&, int, nsString&)
1538 nsSubstring::Replace(unsigned, unsigned, unsigned short
const*, unsigned)
256 nsCSSScanner::Read(unsigned&)
50 memcpy
45 nsSubstring::ReplacePrep(unsigned, unsigned, unsigned)
30 __i686.get_pc_thunk.bx
27 _init
4 StringUnicharInputStream::Read(unsigned short*, unsigned,
unsigned*)
This is when the total amount of time spent in CSS parsing was:
45198 705 18681 CSSParserImpl::ParseColorString
At least part of the problem is we're appending things one char at a time; it
would be nice if we could do better there, either by changing the scanner logic
or by having strings deal faster somehow (though I'm not sure how).
Comment 1•19 years ago
|
||
Instead of .Append'ing each character of an identifier or string, use a buffer to gather first as much as possible. With a buffer of 256, generally almost all strings and identifiers will then be .Append'ed in one .Append call.
Also, made mBuffer instead of dynamically allocated at creation of nsCSSScanner, a fixed member of the struct. (no risk of null pointer, and less malloc's).
Who can do some testing and performance analysis if this really helps?
Updated•19 years ago
|
Attachment #200982 -
Flags: review?(dbaron)
Assignee | ||
Comment 2•19 years ago
|
||
We're already reading out of a buffer. I think we shouldn't have another buffer; the ideal fix should just append from the initial buffer to the final one in larger chunks (though not necessarily all at once).
It also seems like your patch is missing changes (probably to the .h file).
Assignee | ||
Comment 3•19 years ago
|
||
(Boris, was that your intent when filing this?)
Reporter | ||
Comment 4•19 years ago
|
||
The intent was just to have a specific bug on something I saw taking up a large part of CSS property value parsing in some profiles....
Updated•19 years ago
|
Attachment #200982 -
Attachment is obsolete: true
Attachment #200982 -
Flags: review?(dbaron)
Comment 5•19 years ago
|
||
This version directly eats as much as possible from the read buffer, for both strings and ident's. NextURL could possibly have the same treatment, but is probably not worth the trouble. (url("..") are parsed as strings, but url(http..) not).
Furthermore, make the readbuffer a fixed member of nsCSSScanner.
And, EatNewline changed to IsNewline, to better cooperate with the above changes (and less Read/Pushback calls).
Any volunteers to do profiling and/or performance testing?
Attachment #200996 -
Flags: review?
Updated•19 years ago
|
Attachment #200996 -
Flags: review? → review?(dbaron)
Comment 6•19 years ago
|
||
Attachment #200996 -
Attachment is obsolete: true
Attachment #200996 -
Flags: review?(dbaron)
Updated•19 years ago
|
Attachment #200997 -
Flags: review?(dbaron)
Reporter | ||
Comment 7•19 years ago
|
||
Profiling on the canvas testcase in bug 297959 mostly shows that jprof's statistical sampling is kinda unreliable for things like this (about 2% of total time).... That said, it looks like while the time in GatherIdent is smaller the total time under CSSParserImpl::GetToken is not any smaller (and may be bigger) than it used to be.
Updated•19 years ago
|
Attachment #200997 -
Flags: review?(dbaron)
Comment 8•19 years ago
|
||
Overview of changes:
* Made all of gLexTable, and its constants and macros 'static inline' in nsCSSScanner.cpp (used to be spread all over)
* Made mBuffer a fixed element of nsCSSScanner (prevents a malloc)
* Added 'EnsureBuffer' to read more into the byffer if needed
* Removed //printf's...
* EatNewline -> IsNewline
* CheckLexTable removed
* ::Next changed to used 'switch' statement, for faster jumping, and for preventing 'Peek'/'Read'/'Pushback'
* ::ParseEscape: Simplified the parsing of hexnumbers: less 'pushback'
* ::GatherIdent: Append as much as possible in one go
* ::ParseNumber: Direct parse digits into integer, and only do float parsing if it has a dot, and use 'PR_strtod' directly on a local char buffer, instead of ToFloat (which copies, mallocs, parses, frees, etc). Except for dimensions, this then completely skips mIdent.Append stuff.
* ::ParseString: Append as much as possible in one go
* nsCSSToken: Moved PRPackedBool to the end to save 4 bytes in the struct
Sounds like a lot, but all are simple optimizations making the parsing simpler and faster. Codesize has even shrunk a little:
OLD: .lib: 1.487.868 nsCSSScanner.obj: 23.539
NEW: .lib: 1.487.126 nsCSSScanner.obj: 23.350
Note, I tested 'acid2', lots of websites, and the css2.1 test cases from w3c, and it all seems to parse OK, according to javascript console and the look of the pages.
Attachment #200997 -
Attachment is obsolete: true
Attachment #201096 -
Flags: review?
Updated•19 years ago
|
Attachment #201096 -
Flags: review? → review?(dbaron)
Comment 9•19 years ago
|
||
Testresults:
Count Old: New: Opt:
1 27036 24177 23494
2 27336 23864 23374
3 27004 24284 23564
4 27190 24128 23828
5 27133 24041 26670
6 26950 24222 23493
7 26738 24114 23610
8 27515 24353 23773
9 26880 23961 23355
10 27043 24353 23511
11 27161 24119 23480
12 27208 24085 23480
27099,5 24141,75 23802,66667
10,91% 12%
485,5327083 432,5396875 426,4644444 Milliseconds for parsing 1.1Mb css file.
52,99302083 59,06826389 Milliseconds saved!
Old is the old code, new is the code as in the current patch (v4).
Opt is further optimized, but I will submit this as a separate bug&patch for further improvements on the scanner (such as removal of a 'aErrorCode' argument and more).
In summary, this patch saves for parsing of 1.1Mb css file 53 milliseconds, from 485 milliseconds.
Or looking at startup and loading of normal pages:
For startup this patch results in 5milliseconds savings!
For a pageload estimated saving is about 1 to 2 millisecond!
Assignee | ||
Comment 10•19 years ago
|
||
If you keep adding pieces to the patch, it'll never get reviewed. It's also easier to review smaller pieces. (Also, when you attach a patch, you can request review from a person in one step, you don't need to set to review?, submit, and then fill in a name.)
Assignee | ||
Comment 11•19 years ago
|
||
Is there anything in attachment 200997 [details] [diff] [review] that was corrected (rather than added to) in attachment 201096 [details] [diff] [review]? I might start by reviewing the former.
Comment 12•19 years ago
|
||
Dbaron, I tried to do it in one step, but it didn't work because I didn't fill in a complete email address. Next time better ;-)
Also the v4 patch is for this bug the last one (except for changes for review comments), and the main difference is the faster parsing of numbers. But I you want to do it step by step, you can do V3 first.
Comment 13•19 years ago
|
||
Cleaner and smaller patch coming up (saving the more exotic optimisations (such as number parsing, move mToken, mErrorCode to nsScanner (and out of parameters of many calls to a separate bug).
Updated•19 years ago
|
Attachment #201096 -
Flags: review?(dbaron)
Comment 14•19 years ago
|
||
Summary of changes/fixes:
* Moved all static stuff to nsCSSScanner.cpp as real local static stuff for better inlining, and less member variables.
* Modified Read/Unread/Pushback to use 'EnsureBuffer' so that GatherString can easier get to the buffer itself (also gets rid of Unread and mLastRead)
* Moved the \r\n handling to within Read itself, so that scanning logic doesn't need to bother about it (gets rid of EatNewline)
* Next: use 'switch' and improve scan logic to only peek when really needed.
* Optimized ParseEscape: tigher loops
* GatherIdent, ParseString: Append as much as possible directly from buffer into the string
* ParseNumber: directly parse number as integer/float, skipping the aIdent.Append(c) and the ToFloat/ToInteger alltogether. ToFloat (and ToInteger?) do all kinds of allocs/memcpy/strtod stuff for most of the cases to convert a '1' or '1px' ...
Some speedtesting using the large cube at http://www.speich.net/computer/3d.php:
New Old (IE6.0)
3'095 3'454 2'816 ms
Improvement: 10,4% and very close to IE6.0!
Attachment #202972 -
Flags: review?(dbaron)
Assignee | ||
Comment 15•19 years ago
|
||
You removed Unread from the .cpp file but not the header.
It might be nice to see a patch merged with the checkins for bug 316394.
Assignee | ||
Comment 16•19 years ago
|
||
Any chance you could attach a merged patch addressing comment 15?
Comment 17•19 years ago
|
||
Creating a new patch, it will take some while to build and test it though...
Comment 18•19 years ago
|
||
Build and tested with seamonkey, including the 3d testcase above.
Note, also replaced the nsFlatString with nsString in nsCSSParser.cpp.
Also made EatWhitespace return void, as the return value was never used.
Attachment #201096 -
Attachment is obsolete: true
Attachment #202972 -
Attachment is obsolete: true
Attachment #208958 -
Flags: review?(dbaron)
Attachment #202972 -
Flags: review?(dbaron)
Assignee | ||
Comment 19•19 years ago
|
||
I find it somewhat odd that you go out of your way in EnsureBuffer to ensure that it never returns 0. It seems more natural to make 0 be the "no characters to read" return value, since the return value is the number of characters available. You'd also then avoid the confusing (although harmless, I think) inconsistency between the way you check return values of EnsureBuffer. You could also make mOffset and mCount unsigned, I think.
Assignee | ||
Comment 20•19 years ago
|
||
- PRInt32 ch = Read(aErrorCode);
+ PRInt32 followingChar, nextChar, ch = Read(aErrorCode);
The reverse order of declaration would make a good bit more sense.
Assignee | ||
Comment 21•19 years ago
|
||
+ if (nextChar == '-') {
+ Read(aErrorCode);
+ if (LookAhead(aErrorCode, '>')) {
+ aToken.mType = eCSSToken_HTMLComment;
+ aToken.mIdent.AssignLiteral("-->");
+ return PR_TRUE;
+ }
+ }
You need to Pushback(nextChar) (or '-') in the case where the third char isn't '>', i.e., after the } following the return.
Please add a comment for ParseNumber that it handles '-' signs but not '+' signs.
Assignee | ||
Comment 22•19 years ago
|
||
You changed the behavior of NextURL in the case of URLs that are invalid because of whitespace in the middle -- it used to read up to the closing ')', but now doesn't. Could you restore the old behavior unless there's a good reason for that change? (I think our parsing is probably buggy either way, but correct parser code is probably more likely to benefit from reading up to the closing ')'.)
Assignee | ||
Comment 23•19 years ago
|
||
Your changes to the numeric escape parsing in ParseAndAppendEscape are wrong in at least the following ways (I didn't look too closely for other errors):
* they eat arbitrary amounts of whitespace at the end of an escape when instead precisely one whitespace character is supposed to be eaten (treating CR-LF as one character)
* they cause escapes terminated by EOF to be dropped (e.g., style="color: gree\6e")
* they remove the protection against insertion of null bytes
Assignee | ||
Comment 24•19 years ago
|
||
+ // If nothing in pushback, first try get as much as possible in one go
s/try get/try to get/
+ // See how much we can Eat and Append in one go
+ PRInt32 walker = mOffset;
Eat and Append shouldn't be capitalized, and consume is probably better than eat
Also, walker is a bad variable name; a better term would be cursor, but variable names p or i would probably be better.
Assignee | ||
Comment 25•19 years ago
|
||
Your ParseNumber changes handle negative numbers with decimals incorrectly; they treat -7.1 as -6.9. They also introduce the possibility of rounding error accumulation; you should instead use the standard string to float conversion functions. I haven't reviewed the ParseNumber changes for any other problems.
Your ParseString changes need the same fixes I described in GatherIdent.
Assignee | ||
Comment 26•19 years ago
|
||
Comment on attachment 208958 [details] [diff] [review]
V6: Updated again (and comment #16)
review- per comment 19 through comment 25
Attachment #208958 -
Flags: review?(dbaron) → review-
Comment 27•19 years ago
|
||
This is a diff between patch 6 and 8 showing how I incorporated the comments
Updated•19 years ago
|
Attachment #210223 -
Attachment is patch: false
Comment 28•19 years ago
|
||
> I find it somewhat odd that you go out of your way in EnsureBuffer to ensure that it never returns 0. It seems more natural to make 0 be the "no charactersto read" return value, since the return value is the number of characters available. You'd also then avoid the confusing (although harmless, I think) inconsistency between the way you check return values of nsureBuffer. You could also make mOffset and mCount unsigned, I think.
Done, made return 0 for EOF or error, and used PRUint32 for mCount and mOffset.
> - PRInt32 ch = Read(aErrorCode);
> + PRInt32 followingChar, nextChar, ch = Read(aErrorCode);
> The reverse order of declaration would make a good bit more sense.
Done, made ch a 'const PRInt32 ch', and moved followingChar and nextChar
> + if (nextChar == '-') {
> + Read(aErrorCode);
> + if (LookAhead(aErrorCode, '>')) {
> + aToken.mType = eCSSToken_HTMLComment;
> + aToken.mIdent.AssignLiteral("-->");
> + return PR_TRUE;
> + }
> + }
>
> You need to Pushback(nextChar) (or '-') in the case where the third char isn't
> '>', i.e., after the } following the return.
Good catch, made logic as follows:
nextChar = Read()
if (nextChar =='-' and LookAhead('>') return HTML Comment
else if (isDigit(nextChar) or nextChar=='.' and IsDigit(Peek))
parsenumber
negate number
return
PushBack(nextChar)
if (IsIdentStart(nextChar))
return ParseIdent
break;
This minimizes PushBack
> Please add a comment for ParseNumber that it handles '-' signs but not '+'
> signs.
Made ParseNumber to also not handle '-', see above, and less code for positive numbers
> You changed the behavior of NextURL in the case of URLs that are invalid because of whitespace in the middle -- it used to read up to the closing ')', but now doesn't. Could you restore the old behavior unless there's a good reason for that change? (I think our parsing is probably buggy either way, but correct parser code is probably more likely to benefit from reading up to the closing ')'.)
Actually in both the original code and the patched version, NextURL always eats the whitespace, upto the ')', with the ')' PushBack'd.
Because:
- if (LookAhead(aErrorCode, ')')) {
- Pushback(')'); // leave the closing symbol
is actually the same as:
+ if (Peek(aErrorCode) != ')') {
> Your changes to the numeric escape parsing in ParseAndAppendEscape are wrong in at least the following ways (I didn't look too closely for other errors):
> * they eat arbitrary amounts of whitespace at the end of an escape when instead precisely one whitespace character is supposed to be eaten (treating CR-LF as one character)
> * they cause escapes terminated by EOF to be dropped (e.g., style="color: gree\6e")
> * they remove the protection against insertion of null bytes
Fixed to only read ony whitespace character, my wrong interpretation of the old code
Fixed to not drop escapes terminated by EOF
Fixed to not insert or pushback a null byte
> + // If nothing in pushback, first try get as much as possible in one go
> s/try get/try to get/
Fixed
> + // See how much we can Eat and Append in one go
> + PRInt32 walker = mOffset;
> Eat and Append shouldn't be capitalized, and consume is probably better than eat
> Also, walker is a bad variable name; a better term would be cursor, but variable names p or i would probably be better.
> Your ParseString changes need the same fixes I described in GatherIdent.
Fixed in both locations, used 'n' as the counter (as in n bytes to consume)
> Your ParseNumber changes handle negative numbers with decimals incorrectly; they treat -7.1 as -6.9. They also introduce the possibility of rounding error accumulation; you should instead use the standard string to float conversion functions. I haven't reviewed the ParseNumber changes for any other problems.
Fixed, do Integer parsing directly, but used ToFloat to do float conversion (from nsCAutoString to prevent malloc and strcpy). This way string to float conversion is only done when really needed.
Attachment #208958 -
Attachment is obsolete: true
Attachment #210224 -
Flags: review?(dbaron)
Assignee | ||
Comment 29•19 years ago
|
||
Comment on attachment 210223 [details]
Interdiff between version 6 and 8
Not sure what this interdiff is between, but it's not between v6 and v8, since v6 didn't have FillBuffer at all.
Attachment #210223 -
Attachment is obsolete: true
Attachment #210223 -
Flags: review-
Assignee | ||
Comment 30•19 years ago
|
||
The replacement of EnsureBuffer with FillBuffer seems bad to me: if you're going to do the same test before every call to a function, it should be part of the function. If you want it outside for performance reasons (which I expect is why you made the change), then just make an inline function. So you should just add an inline function EnsureBuffer in nsCSSScanner.h:
PRBool EnsureBuffer(nsresult &aErrorCode) {
return mOffset < mCount || FillBuffer(aErrorCode);
}
and use that instead of the way you call FillBuffer?
Assignee | ||
Comment 31•19 years ago
|
||
Assignee | ||
Comment 32•19 years ago
|
||
Comment on attachment 210224 [details] [diff] [review]
Patch v8: With comments from dbaron incorporated
> #define nsCSSScanner_h___
>
> #include "nsString.h"
> #include "nsCOMPtr.h"
> class nsIUnicharInputStream;
>
>+
> // XXX turn this off for minimo builds
> #define CSS_REPORT_PARSE_ERRORS
Please just leave this as it was.
Assignee | ||
Updated•19 years ago
|
Attachment #211339 -
Attachment description: accurate interdiff between v6 and v6 → accurate interdiff between V6 and v8
Comment 33•19 years ago
|
||
Thanks for the review,
I have incorporated the last two comments.
Attachment #210224 -
Attachment is obsolete: true
Attachment #211483 -
Flags: review?(dbaron)
Attachment #210224 -
Flags: review?(dbaron)
Comment 34•19 years ago
|
||
Attachment #211483 -
Attachment is obsolete: true
Attachment #211483 -
Flags: review?(dbaron)
Comment 35•19 years ago
|
||
Comment on attachment 216822 [details] [diff] [review]
V10: updated to current trunk
dbaron, can you please review this patch before it is bitrotted again?
Attachment #216822 -
Flags: review?(dbaron)
Assignee | ||
Comment 36•19 years ago
|
||
I'll try to get to it soon. But see comment 10.
Comment 37•19 years ago
|
||
Yes, I know about trying to keep patches small, but this is what I feel to be a consistent patch. There are some other changes pending as well but I keep those for separate bugs.
Comment 38•18 years ago
|
||
David, any chance of reviewing this patch?
If not, who else can take this up?
Assignee | ||
Comment 39•18 years ago
|
||
It's a *lot* of work to review this patch. It's on my list of things to do, but I have to do it carefully, given that:
* you made a huge number of serious mistakes in the original patch -- a number that means your patches still need careful review and thus that you shouldn't be attaching large combined patches (see comment 10). Doing good optimization requires understanding when two things are equivalent, when they're not, and when the difference in behavior matters, which is something that review showed I can't yet trust you to do. (This means it might actually be easier for me to rewrite from scratch patches doing the same thing, from a list of the optimizations made, than to review yours.)
* you've provided inaccurate diffs (see comment 29), so I can't trust you to provide such diffs and I have to make them myself, which becomes harder and harder as the patches get farther apart in time.
So to be honest I'm not sure I'm ever going to get to it given that it's one huge patch, but it does need my review before it's checked in.
Comment 40•18 years ago
|
||
David, I can understand your position, given the history of this bug, but I do have some experience in submitting code (netwerk/cache, modules/libjar, modules/libpr0n).
So, to help reviewing this one, I have:
* updated the patch to current tree
* reduced the size by only doing the following:
* Code cleanup:
+ Moved the 'static' pieces from the class to the .cpp file itself
+ Made all references to gLexTable a direct reference
+ Replace the CheckLexTable with IsDigit, IsHexDigit, IsWhitespace
* Ident gathering optimization:
+ EnsureBuffer/Read to make it easier for ParseString/Number/Ident to read directly from the buffer
+ Removed Unread and mLastRead (to make things simpler)
+ Made ParseString/GatherIdent to append as much as possible in one go directly from the read buffer.
+ Simplified ParseAndAppendEscape
+ ParseNumber: only gather the number in ident when we are parsing a float
So, this at least removes a lot of other changes to the scanner logic, and only the GatherIdent/ParseString/ParseNumber/ParseAndAppendEscape functions are really changed.
Attachment #201658 -
Attachment is obsolete: true
Attachment #211339 -
Attachment is obsolete: true
Attachment #216822 -
Attachment is obsolete: true
Attachment #230994 -
Flags: review?(dbaron)
Attachment #216822 -
Flags: review?(dbaron)
Updated•18 years ago
|
Attachment #230994 -
Flags: review?(dbaron) → review?(bzbarsky)
Reporter | ||
Comment 41•18 years ago
|
||
Alfred, I'm pretty swamped in terms of reviews. I'm not likely to get to this for months, and it would still need sr from dbaron after that, I think...
Updated•18 years ago
|
Attachment #230994 -
Flags: review?(bzbarsky) → review?(dbaron)
Reporter | ||
Comment 42•18 years ago
|
||
For what it's worth, I think the best way to get this reviewed would be to completely split apart cleanup and optimization into separate patches. That would certainly make me both able to get this sooner and happier to r+sr on my own. Especially for the cleanup patch.
That said, I'd want to know the rationale for the cleanup. Why is using global statics in the file better than using class statics?
Comment 43•18 years ago
|
||
Just do the GatherIdent, ParseNumber and ParseString optimization.
Notes:
* IsDigit and IsHexDigit added to class like IsIdentStart to keep class constants in place. Removed redundant 'GetLexTable'.
* Removed 'gLexTable' definition in nsCSSSCanner.cpp
* Removed unused BUFFER_SIZE define
* Replaced Unread with Pushback to explicitely push back the right character, removing the need for mLastRead
* Made '\r\n' filtering part of the Read function, so that other parsing doesn't have to bother about it.
Attachment #230994 -
Attachment is obsolete: true
Attachment #246382 -
Flags: review?(bzbarsky)
Attachment #230994 -
Flags: review?(dbaron)
Reporter | ||
Comment 44•18 years ago
|
||
> * Removed 'gLexTable' definition in nsCSSSCanner.cpp
Why? It seems to me like it should still be needed.
What tests have you run on this code? The first thing that jumped out at me when looking at ParseNumber is that when parsing something like "height: 100.px" the old code will tokenize it as "height:" followed by the valid-integer number token "100", followed by junk that causes the declaration to be thrown out. Your new code will get to the |+ if ((c == '.') && (IsDigit(c = Read(aErrorCode)))) {| line, set c to be 'p', test false. So mIntegerValid will remain true and since c >= 0 and 'p' is definitely an ident start, we'll parse this as a eCSSToken_Dimension with an integer of 100 and a unit of "px", which is wrong per CSS spec.
Is there any conceivable way to split up the three remaining optimizations into three separate patches? Again, the smaller a patch, the easier it is to review it; review time grows _very_ nonlinearly with patch size...
Comment 45•18 years ago
|
||
Fixed 100.px parsing.
Added back gLexTable define.
Removed some other changes.
Attachment #246382 -
Attachment is obsolete: true
Attachment #246382 -
Flags: review?(bzbarsky)
Reporter | ||
Comment 46•18 years ago
|
||
Again, it'd make me a lot happier to review this if I knew that every single change here has had a test written for it... Ideally, we'd land such tests with the patch itself.
As things stand, to be happy about it I'd need to first write said tests, then test the patch, then review the code...
Assignee | ||
Updated•17 years ago
|
QA Contact: ian → style-system
Comment 47•17 years ago
|
||
This patch contains all the cleanups/adding the functionality necessary for optimizations.
Pushback(ch) is equivalent to Unread() when the character in question was just read. This patch restructures Read to allow a separate EnsureData call, and simplifies the handling of newlines by centralizing it in Read.
Attachment #271617 -
Flags: review?
Updated•17 years ago
|
Attachment #271617 -
Flags: review? → review?(bzbarsky)
Reporter | ||
Comment 48•17 years ago
|
||
Same question as for Alfred: what sort of testing have you done?
Reporter | ||
Comment 49•17 years ago
|
||
Eli, see comment 48.
Comment 50•17 years ago
|
||
Comment on attachment 271617 [details] [diff] [review]
Part 1: Scanning cleanups
Doing the cleanup in separate bugs.
Attachment #271617 -
Attachment is obsolete: true
Attachment #271617 -
Flags: review?(bzbarsky)
Comment 51•17 years ago
|
||
Comment on attachment 246759 [details] [diff] [review]
V13: More minimizations
This patch is obsolete, now that little parts of it have been addressed in separate patches:
3.96 sharparrow1%yahoo.com 2007-08-25 19:20 Bug 393499: Clean up character lookup in nsCSSScanner. r=bzbarsky, sr+a=dbaron
3.95 sharparrow1%yahoo.com 2007-08-23 16:01 Bug 393286: Make nsCSSScanner::Read regularize newlines. r+sr=bzbarsky, a=dbaron
3.94 sharparrow1%yahoo.com 2007-08-22 10:32 Bug 393080: Make count and offset in nsCSSScanner unsigned. r+sr=bzbarsky, a=dbaron
3.93 sharparrow1%yahoo.com 2007-08-21 11:29 Bug 392840: Get rid of nsCSSScanner::Unread. r+sr=bzbarsky, r+a=dbaron
3.92 sharparrow1%yahoo.com 2007-08-19 20:39 Bug 387511: CSs scanner causes parse error for URLs starting with a codepoint > 255. r+sr=bzbarsky, a=dbaron
Attachment #246759 -
Attachment is obsolete: true
Comment 52•17 years ago
|
||
After the submits of the other bits, this parch is now really simple.
More optimizations are:
* ParseNumber (as in previous attached patches)
* Use 'switch' in Next(): instead of large number of if statements.
* NextURL: gather URL: Same principle as for GatherIdent, but also other cleanup possible.
* Make aErrorCcode and aToken params part of nsCSSScanner class.
* Kill EatNewline (only used in one location)
* Simplify parameters to the Report error functions (=step3)
Comment 53•17 years ago
|
||
Comment on attachment 278633 [details] [diff] [review]
Only do GatherIdent and ParseString
>+ for (;n < mCount; n++) {
>+ PRUnichar nextChar = mReadPointer[n];
>+ if ((nextChar == '\n') || (nextChar == '\r') ||
You also need to check for '\f'.
I'd also suggest taking a look at the mochitest test_parse_rule.html, which would have caught this mistake. I wrote that test, which includes some tests from CSS testsuites plus some of my own tests. It's not really complete, but it's a start; I'd appreciate any additions to that test to make it more thorough.
> PRInt32 ch = Read(aErrorCode);
>- if (ch < 0) {
>+ if (ch < 0 || ch == '\n') {
>+ aToken.mType = eCSSToken_Error;
>+#ifdef CSS_REPORT_PARSE_ERRORS
>+ ReportUnexpectedToken(aToken, "SEUnterminatedString");
>+#endif
Actually, according to the current WD, for EOF, we should just return PR_TRUE, and pretend that the string was in fact terminated. (I sent an email to www-style, and this interpretation was confirmed by Ian Hickson.)
Reporter | ||
Comment 54•17 years ago
|
||
Comment on attachment 278633 [details] [diff] [review]
Only do GatherIdent and ParseString
As David and I have repeatedly asked:
What tests have you run on this code?
The fact that this patch fails one of our _existing_ regression tests makes me pretty suspicious off the bat. I would have to see fairly detailed tests for the functionality the patch is changing, including various corner cases, as part of any patch for this bug.
Attachment #278633 -
Flags: review?(bzbarsky) → review-
Comment 55•17 years ago
|
||
This what I get from the mochutest, even with the test for '\f' added as in comment #53
Toggle passed tests Toggle failed tests
Passed: 118
Failed: 1
Todo: 4
...
ok - .a::before {content: 'This is \a'}#a::before {content: 'FAIL \ \a'}
ok - #a:before {content: 'This is \a'}
ok - .a::before {content: 'This is \a'}#a:: before {content: 'FAIL'}
ok - .a::before {content: 'This is \a'}#a ::before {content: 'FAIL'}
not ok - #a::before {content: 'This is \a got "\"This is \\A \"", expected "\"This is \\A \""
So, I am not sure what the result is telling me.
The 'got' part is the same the 'expected'. So why is it failing.
I am new with the Mochitest framework, and find it difficult to determine what really went wrong.
Comment 56•17 years ago
|
||
(In reply to comment #55)
> So, I am not sure what the result is telling me.
That test tests the unterminated string case; it's currently marked todo. If you fixed it in your tree, then you should remove the todo marking; in this file, the todo tests are listed at the bottom of each section of tests.
Comment 57•17 years ago
|
||
This patch handles \f correctly, and doesn't report an error for strings ending at EOF, and the corresponding test file has been updated.
This patch now runs fine against the Mochi test and also ACID2 is fine.
Attachment #278633 -
Attachment is obsolete: true
Attachment #279426 -
Flags: review?(bzbarsky)
Reporter | ||
Comment 58•17 years ago
|
||
See comment 54 last sentence.
Comment 59•17 years ago
|
||
See http://lxr.mozilla.org/mozilla/source/layout/style/test/test_parse_rule.html
That test file already contains very specific tests for ident and string parsing:
".IdE6n-3t0_6, #a { color: green }",
"#IdE6n-3t0_6, #a { color: green }",
"._ident, #a { color: green }",
"#_ident, #a { color: green }",
".-ident, .a { color: green; }", // Testsuite has incorrect version
"#怀ident, .a { color: green }",
"#iden怀t怀, .a { color: green }",
"#\\6000ident, .a { color: green }",
"#iden\\6000t\\6000, .a { color: green }",
".怀ident, .a { color: green }",
".iden怀t怀, .a { color: green }",
".\\6000ident, .a { color: green }",
".iden\\6000t\\6000, .a { color: green }",
base + "#6ident, #a {color: red }",
".id4ent6, .a { color: green }",
"#\\ident, .a { color: green; }",
"#ide\\n\\t, .a { color: green; }",
".\\6ident, .a { color: green; }",
".\\--ident, .a { color: green; }",
// Tests for strings and pseudos
{base : base = ".a::before {content: 'This is \\a'}",
tests : [
// CSS 2.1 section 4.1.3
"#a::before {content: 'This is \\a '}",
"#a::before {content: 'This is \\A '}",
"#a::before {content: 'This is \\0000a '}",
"#a::before {content: 'This is \\00000a '}",
"#a::before {content: 'This is \\\n\\00000a '}",
"#a::before {content: 'This is \\\015\012\\00000a '}",
"#a::before {content: 'This is \\\015\\00000a '}",
"#a::before {content: 'This is \\\f\\00000a '}",
"#a::before {content: 'This is\\20\f\\a'}",
"#a::before {content: 'This is\\20\r\\a'}",
"#a::before {content: 'This is\\20\n\\a'}",
"#a::before {content: 'This is\\20\r\n\\a'}",
base + "#a::before {content: 'FAIL \f\\a'}",
base + "#a::before {content: 'FAIL \\\n\r\\a'}",
"#a:before {content: 'This is \\a'}",
"#a::before {content: 'This is \\a",
base + "#a:: before {content: 'FAIL'}",
base + "#a ::before {content: 'FAIL'}",
"#a::before {content: 'This is \\a",
], prop: "content", pseudo: "::before"
},
With the last patch, I could even move one 'todo' to 'ok' status.
Reporter | ||
Comment 60•17 years ago
|
||
I doubt I'll be able to review this in any sort of reasonable timeframe. I'll see what I can do, but no promises. :(
Updated•17 years ago
|
Attachment #279426 -
Flags: review?(bzbarsky) → review?(dbaron)
Updated•17 years ago
|
OS: Linux → All
Hardware: PC → All
Comment 61•17 years ago
|
||
Any news on this?
Comment 62•17 years ago
|
||
Comment on attachment 279426 [details] [diff] [review]
V2: Handle \f and do update the Mochi test
bz, do you have some time now to review this perf win patch?
Attachment #279426 -
Flags: superreview?(bzbarsky)
Updated•17 years ago
|
Flags: blocking1.9?
Reporter | ||
Comment 63•17 years ago
|
||
The only change from comment 60 is that I have less Mozilla time available now than I did then. It'll only get worse as time goes on, up through May.
Comment 64•17 years ago
|
||
(In reply to comment #63)
> The only change from comment 60 is that I have less Mozilla time available now
> than I did then. It'll only get worse as time goes on, up through May.
>
Can you recommend another reviewer?
Comment 65•17 years ago
|
||
(In reply to comment #64)
> (In reply to comment #63)
> > The only change from comment 60 is that I have less Mozilla time available now
> > than I did then. It'll only get worse as time goes on, up through May.
> >
>
> Can you recommend another reviewer?
>
Roc you want to take this?
Assignee | ||
Comment 66•17 years ago
|
||
I think this should get review from me or Boris, and the work needed to review it is roughly equivalent to the work needed to write the patch (and tests) from scratch, given the prior issues with the patches in this bug.
Comment 67•17 years ago
|
||
(In reply to comment #66)
> I think this should get review from me or Boris, and the work needed to review
> it is roughly equivalent to the work needed to write the patch (and tests) from
> scratch, given the prior issues with the patches in this bug.
>
Can I convince you to take the bug :-)?
Assignee | ||
Comment 68•17 years ago
|
||
The reason for the attention seems to be the data in bug 406810, including http://www.allpeers.com/download/michal/new/callgraph/FF3/nsCSSScanner+GatherIdent.png
Comment 69•17 years ago
|
||
--> DBaron. Yell if you disagree :)
Assignee: alfredkayser → dbaron
Status: ASSIGNED → NEW
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Assignee | ||
Updated•17 years ago
|
Priority: P2 → P3
Assignee | ||
Comment 70•17 years ago
|
||
Comment on attachment 279426 [details] [diff] [review]
V2: Handle \f and do update the Mochi test
+ n++;
+ for (;n < mCount; n++) {
+ mColNumber++;
When using increments as a statement, prefer pre-increment over
post-increment. It's conceptually simpler and more similar to other
things (e.g., =, +=), and when you're working with C++ operator
overloading can be computationally simpler as well, so it's a good
habit.
+ // Count number of Ident characters that can be processed
In ParseString, this comment is incorrect. I'd just remove "Ident ".
+ if (ch == '\n') {
+ aToken.mType = eCSSToken_Error;
+#ifdef CSS_REPORT_PARSE_ERRORS
+ ReportUnexpectedToken(aToken, "SEUnterminatedString");
+#endif
+ return PR_FALSE;
returning false here is incorrect. False means EOF. You should return
PR_TRUE like the old code did.
An example of what this would break is probably:
p { color: green; foo: { "bar
; color: red; } }
(It would make paragraphs red, whereas the color:red should be part of
the declaration.)
If I'm correct that this is a testcase for what you broke, you should
add this to the parsing testcases; if not, you should figure out a
testcase.
+ if (ch < 0 || ch == aStop)
break;
Follow local style and keep the braces around the break statement.
# +"#a::before {content: 'This is \\a",
#
# base + "#a:: before {content: 'FAIL'}",
# base + "#a ::before {content: 'FAIL'}",
# "#a::before {content: 'This is \\a",
The line you're adding is already at the bottom of the quoted 5 lines,
no need to add it again. The todo list is a list of items already
listed elsewhere; just remove from the todo and don't add.
You'll get r+sr=dbaron if you fix *exactly* these things and don't
change anything else.
Attachment #279426 -
Flags: superreview?(bzbarsky)
Attachment #279426 -
Flags: superreview-
Attachment #279426 -
Flags: review?(dbaron)
Attachment #279426 -
Flags: review-
Comment 71•17 years ago
|
||
Performed and succeeded in running the Mochitests with this patch applied.
Attachment #279426 -
Attachment is obsolete: true
Attachment #304183 -
Flags: superreview?(dbaron)
Updated•17 years ago
|
Attachment #304183 -
Flags: review?(dbaron)
Assignee | ||
Comment 72•17 years ago
|
||
You missed the following items from my previous review:
1:
> + n++;
2:
> + mColNumber++;
3:
> If I'm correct that this is a testcase for what you broke, you should
> add this to the parsing testcases; if not, you should figure out a
> testcase.
You also introduced two new errors:
4. You added:
+ if (ch < 0) {
+ return PR_FALSE;
+ }
Whereas it should have stayed a |break;|, so that GatherIdent returns true, since there is an identifier to return. (It may be the last thing in the file, but the file's not over until after it's been parsed.)
Please add a test for this as well.
5. You added a trailing comma:
-], prop: "content", pseudo: "::before"
+], prop: "content", pseudo: "::before",
I'm not sure whether or not this breaks the test, although I suspect it does.
Assignee | ||
Updated•17 years ago
|
Attachment #304183 -
Flags: superreview?(dbaron)
Attachment #304183 -
Flags: superreview-
Attachment #304183 -
Flags: review?(dbaron)
Attachment #304183 -
Flags: review-
Assignee | ||
Comment 73•17 years ago
|
||
(In reply to comment #72)
> Please add a test for this as well.
Then again, I'd think we already have plenty of tests testing style="color:green", so it probably breaks some existing tests. If there's already one that it breaks, no need to add another.
Comment 74•17 years ago
|
||
1: > + n++;
Done.
2: > + mColNumber++;
Done.
3: > If I'm correct that this is a testcase for what you broke, you should
> add this to the parsing testcases; if not, you should figure out a
> testcase.
No testcase added, as the existing cases already test all combinations of string termination.
4. You added:
+ if (ch < 0) {
+ return PR_FALSE;
+ }
Done. GatherIdent returns true, even when identifier is terminated by EOF.
> Please add a test for this as well.
Existing testcases do cover this already.
5. You added a trailing comma:
Removed.
Attachment #304183 -
Attachment is obsolete: true
Attachment #304433 -
Flags: superreview?(dbaron)
Assignee | ||
Comment 75•17 years ago
|
||
Comment on attachment 304433 [details] [diff] [review]
V4: Addressed all comments of comment #72
Sorry, my point number 4 wasn't clear -- it should have referred to ParseString rather than GatherIdent -- and it was code that you were putting back (i.e., the diff was smaller, but it showed up in the interdiff) -- that's what I get for reading interdiffs.
In any case, it's an error in either place. Putting the old code to return 0 back removed your fix for the test you're unmarking todo, so the tests fail with this version of the patch. That change should remain as it was in patch V2 so that the test failure fix is actually fixed.
Could I ask why I should believe you when you say that other tests already cover something if you haven't actually run the tests? Next time, please actually run the tests before posting a new patch.
The change from "return PR_TRUE;" to "break;" between V3 and V4 is OK, but it was yet *more* for me to review.
Attachment #304433 -
Flags: superreview?(dbaron)
Attachment #304433 -
Flags: superreview-
Attachment #304433 -
Flags: review-
Assignee | ||
Comment 76•17 years ago
|
||
By the way, I understand that what I say may not be perfectly clear: I make typos, mistakes, and am sometimes unclear despite those. But please don't pretend you've addressed comments if you don't understand what I'm asking. Ask me what I meant.
Assignee | ||
Comment 77•17 years ago
|
||
(In reply to comment #74)
> 3: > If I'm correct that this is a testcase for what you broke, you should
> > add this to the parsing testcases; if not, you should figure out a
> > testcase.
> No testcase added, as the existing cases already test all combinations of
> string termination.
I reintroduced the mistake, and none of the existing testcases failed. This isn't surprising, since they mostly test parsing of just one declaration.
Assignee | ||
Comment 78•17 years ago
|
||
This addresses the remaining comments; I'll land it shortly. I'm tired of iterating here.
Attachment #304433 -
Attachment is obsolete: true
Assignee | ||
Comment 79•17 years ago
|
||
Checked in to trunk.
Unfortunately I think I'm going to have to mark this bug fixed at this point even if it's not fully fixed. The main problem Boris pointed out is fixed, but I think there are further things we could do to reduce string appending; I filed bug 418952.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 80•17 years ago
|
||
Didn't seem to have any noticeable effect on the talos Ts numbers.
Comment 81•17 years ago
|
||
(In reply to comment #80)
> Didn't seem to have any noticeable effect on the talos Ts numbers.
>
That's strange. I just tried the effect of this patch because it is similar to bug #121341 where the startup was then faster. Here are the numbers from my linux box:
without patch #304872
startup 572ms
valgrind Ir:
GatherIdent 16121515
ParseString 7846740
with patch #304872
startup 566ms
valgrind Ir:
GatherIdent 7328073
ParseString 1460635
You need to log in
before you can comment on or make changes to this bug.
Description
•