Closed Bug 311566 Opened 20 years ago Closed 17 years ago

nsCSSScanner::GatherIdent does lots of string appending

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: dbaron)

References

Details

(Keywords: perf)

Attachments

(1 file, 20 obsolete files)

Quick patch to use a local buffer to gather strings and identifiers 20 years ago Alfred Kayser 6.90 KB, patch		Details \| Diff \| Splinter Review
V2: Directy append string/ident chunks from the readbuffer 20 years ago Alfred Kayser 10.39 KB, patch		Details \| Diff \| Splinter Review
Patch complete with the include file changes! 20 years ago Alfred Kayser 12.74 KB, patch		Details \| Diff \| Splinter Review
V4: More improvements, especially on number and escape parsing 20 years ago Alfred Kayser 30.25 KB, patch		Details \| Diff \| Splinter Review
spreadsheet with some performance testing results 20 years ago Alfred Kayser 17.00 KB, application/octet-stream		Details
V5: Updated to current trunk 20 years ago Alfred Kayser 28.74 KB, patch		Details \| Diff \| Splinter Review
V6: Updated again (and comment #16) 20 years ago Alfred Kayser 35.46 KB, patch	dbaron : review-	Details \| Diff \| Splinter Review
Interdiff between version 6 and 8 20 years ago Alfred Kayser 16.59 KB, text/plain	dbaron : review-	Details
Patch v8: With comments from dbaron incorporated 20 years ago Alfred Kayser 39.55 KB, patch		Details \| Diff \| Splinter Review
accurate interdiff between V6 and v8 19 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 22.71 KB, patch		Details \| Diff \| Splinter Review
V9: With the last two comments incorporated 19 years ago Alfred Kayser 39.16 KB, patch		Details \| Diff \| Splinter Review
V10: updated to current trunk 19 years ago Alfred Kayser 39.33 KB, patch		Details \| Diff \| Splinter Review
V11: Updated to current trunk and reduced to half 19 years ago Alfred Kayser 24.62 KB, patch		Details \| Diff \| Splinter Review
V12: Minimial changes to just optimize string and number parsing 19 years ago Alfred Kayser 18.90 KB, patch		Details \| Diff \| Splinter Review
V13: More minimizations 19 years ago Alfred Kayser 16.56 KB, patch		Details \| Diff \| Splinter Review
Part 1: Scanning cleanups 18 years ago Eli Friedman 15.03 KB, patch		Details \| Diff \| Splinter Review
Only do GatherIdent and ParseString 18 years ago Alfred Kayser 3.35 KB, patch	bzbarsky : review-	Details \| Diff \| Splinter Review
V2: Handle \f and do update the Mochi test 18 years ago Alfred Kayser 4.66 KB, patch	dbaron : review- dbaron : superreview-	Details \| Diff \| Splinter Review
V3: Incorporate exactly the comments from David Baron 17 years ago Alfred Kayser 4.32 KB, patch	dbaron : review- dbaron : superreview-	Details \| Diff \| Splinter Review
V4: Addressed all comments of comment #72 17 years ago Alfred Kayser 4.40 KB, patch	dbaron : review- dbaron : superreview-	Details \| Diff \| Splinter Review
V5: address comments 17 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 5.26 KB, patch		Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

20 years ago

The profiler breakdown here looks like: 45819 385 2335 nsCSSScanner::GatherIdent(unsigned&, int, nsString&) 1538 nsSubstring::Replace(unsigned, unsigned, unsigned short const*, unsigned) 256 nsCSSScanner::Read(unsigned&) 50 memcpy 45 nsSubstring::ReplacePrep(unsigned, unsigned, unsigned) 30 __i686.get_pc_thunk.bx 27 _init 4 StringUnicharInputStream::Read(unsigned short*, unsigned, unsigned*) This is when the total amount of time spent in CSS parsing was: 45198 705 18681 CSSParserImpl::ParseColorString At least part of the problem is we're appending things one char at a time; it would be nice if we could do better there, either by changing the scanner logic or by having strings deal faster somehow (though I'm not sure how).

Alfred Kayser

Updated

•

20 years ago

Blocks: 106356

Alfred Kayser

Comment 1

•

20 years ago

Attached patch Quick patch to use a local buffer to gather strings and identifiers (obsolete) — Details — Splinter Review

Instead of .Append'ing each character of an identifier or string, use a buffer to gather first as much as possible. With a buffer of 256, generally almost all strings and identifiers will then be .Append'ed in one .Append call. Also, made mBuffer instead of dynamically allocated at creation of nsCSSScanner, a fixed member of the struct. (no risk of null pointer, and less malloc's). Who can do some testing and performance analysis if this really helps?

Alfred Kayser

Updated

•

20 years ago

Attachment #200982 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 2

•

20 years ago

We're already reading out of a buffer. I think we shouldn't have another buffer; the ideal fix should just append from the initial buffer to the final one in larger chunks (though not necessarily all at once). It also seems like your patch is missing changes (probably to the .h file).

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 3

•

20 years ago

(Boris, was that your intent when filing this?)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 4

•

20 years ago

The intent was just to have a specific bug on something I saw taking up a large part of CSS property value parsing in some profiles....

Alfred Kayser

Updated

•

20 years ago

Attachment #200982 - Attachment is obsolete: true

Attachment #200982 - Flags: review?(dbaron)

Alfred Kayser

Comment 5

•

20 years ago

Attached patch V2: Directy append string/ident chunks from the readbuffer (obsolete) — Details — Splinter Review

This version directly eats as much as possible from the read buffer, for both strings and ident's. NextURL could possibly have the same treatment, but is probably not worth the trouble. (url("..") are parsed as strings, but url(http..) not). Furthermore, make the readbuffer a fixed member of nsCSSScanner. And, EatNewline changed to IsNewline, to better cooperate with the above changes (and less Read/Pushback calls). Any volunteers to do profiling and/or performance testing?

Attachment #200996 - Flags: review?

Alfred Kayser

Updated

•

20 years ago

Attachment #200996 - Flags: review? → review?(dbaron)

Alfred Kayser

Comment 6

•

20 years ago

Attached patch Patch complete with the include file changes! (obsolete) — Details — Splinter Review

Attachment #200996 - Attachment is obsolete: true

Attachment #200996 - Flags: review?(dbaron)

Alfred Kayser

Updated

•

20 years ago

Attachment #200997 - Flags: review?(dbaron)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 7

•

20 years ago

Profiling on the canvas testcase in bug 297959 mostly shows that jprof's statistical sampling is kinda unreliable for things like this (about 2% of total time).... That said, it looks like while the time in GatherIdent is smaller the total time under CSSParserImpl::GetToken is not any smaller (and may be bigger) than it used to be.

Alfred Kayser

Updated

•

20 years ago

Attachment #200997 - Flags: review?(dbaron)

Alfred Kayser

Comment 8

•

20 years ago

Attached patch V4: More improvements, especially on number and escape parsing (obsolete) — Details — Splinter Review

Overview of changes: * Made all of gLexTable, and its constants and macros 'static inline' in nsCSSScanner.cpp (used to be spread all over) * Made mBuffer a fixed element of nsCSSScanner (prevents a malloc) * Added 'EnsureBuffer' to read more into the byffer if needed * Removed //printf's... * EatNewline -> IsNewline * CheckLexTable removed * ::Next changed to used 'switch' statement, for faster jumping, and for preventing 'Peek'/'Read'/'Pushback' * ::ParseEscape: Simplified the parsing of hexnumbers: less 'pushback' * ::GatherIdent: Append as much as possible in one go * ::ParseNumber: Direct parse digits into integer, and only do float parsing if it has a dot, and use 'PR_strtod' directly on a local char buffer, instead of ToFloat (which copies, mallocs, parses, frees, etc). Except for dimensions, this then completely skips mIdent.Append stuff. * ::ParseString: Append as much as possible in one go * nsCSSToken: Moved PRPackedBool to the end to save 4 bytes in the struct Sounds like a lot, but all are simple optimizations making the parsing simpler and faster. Codesize has even shrunk a little: OLD: .lib: 1.487.868 nsCSSScanner.obj: 23.539 NEW: .lib: 1.487.126 nsCSSScanner.obj: 23.350 Note, I tested 'acid2', lots of websites, and the css2.1 test cases from w3c, and it all seems to parse OK, according to javascript console and the look of the pages.

Attachment #200997 - Attachment is obsolete: true

Attachment #201096 - Flags: review?

Alfred Kayser

Updated

•

20 years ago

Attachment #201096 - Flags: review? → review?(dbaron)

Alfred Kayser

Comment 9

•

20 years ago

Attached file spreadsheet with some performance testing results (obsolete) — Details

Testresults: Count Old: New: Opt: 1 27036 24177 23494 2 27336 23864 23374 3 27004 24284 23564 4 27190 24128 23828 5 27133 24041 26670 6 26950 24222 23493 7 26738 24114 23610 8 27515 24353 23773 9 26880 23961 23355 10 27043 24353 23511 11 27161 24119 23480 12 27208 24085 23480 27099,5 24141,75 23802,66667 10,91% 12% 485,5327083 432,5396875 426,4644444 Milliseconds for parsing 1.1Mb css file. 52,99302083 59,06826389 Milliseconds saved! Old is the old code, new is the code as in the current patch (v4). Opt is further optimized, but I will submit this as a separate bug&patch for further improvements on the scanner (such as removal of a 'aErrorCode' argument and more). In summary, this patch saves for parsing of 1.1Mb css file 53 milliseconds, from 485 milliseconds. Or looking at startup and loading of normal pages: For startup this patch results in 5milliseconds savings! For a pageload estimated saving is about 1 to 2 millisecond!

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 10

•

20 years ago

If you keep adding pieces to the patch, it'll never get reviewed. It's also easier to review smaller pieces. (Also, when you attach a patch, you can request review from a person in one step, you don't need to set to review?, submit, and then fill in a name.)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 11

•

20 years ago

Is there anything in attachment 200997 [details] [diff] [review] that was corrected (rather than added to) in attachment 201096 [details] [diff] [review]? I might start by reviewing the former.

Alfred Kayser

Comment 12

•

20 years ago

Dbaron, I tried to do it in one step, but it didn't work because I didn't fill in a complete email address. Next time better ;-) Also the v4 patch is for this bug the last one (except for changes for review comments), and the main difference is the faster parsing of numbers. But I you want to do it step by step, you can do V3 first.

Alfred Kayser

Updated

•

20 years ago

Blocks: 229391

Alfred Kayser

Comment 13

•

20 years ago

Cleaner and smaller patch coming up (saving the more exotic optimisations (such as number parsing, move mToken, mErrorCode to nsScanner (and out of parameters of many calls to a separate bug).

Alfred Kayser

Updated

•

20 years ago

Attachment #201096 - Flags: review?(dbaron)

Alfred Kayser

Comment 14

•

20 years ago

Attached patch V5: Updated to current trunk (obsolete) — Details — Splinter Review

Summary of changes/fixes: * Moved all static stuff to nsCSSScanner.cpp as real local static stuff for better inlining, and less member variables. * Modified Read/Unread/Pushback to use 'EnsureBuffer' so that GatherString can easier get to the buffer itself (also gets rid of Unread and mLastRead) * Moved the \r\n handling to within Read itself, so that scanning logic doesn't need to bother about it (gets rid of EatNewline) * Next: use 'switch' and improve scan logic to only peek when really needed. * Optimized ParseEscape: tigher loops * GatherIdent, ParseString: Append as much as possible directly from buffer into the string * ParseNumber: directly parse number as integer/float, skipping the aIdent.Append(c) and the ToFloat/ToInteger alltogether. ToFloat (and ToInteger?) do all kinds of allocs/memcpy/strtod stuff for most of the cases to convert a '1' or '1px' ... Some speedtesting using the large cube at http://www.speich.net/computer/3d.php: New Old (IE6.0) 3'095 3'454 2'816 ms Improvement: 10,4% and very close to IE6.0!

Attachment #202972 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 15

•

20 years ago

You removed Unread from the .cpp file but not the header. It might be nice to see a patch merged with the checkins for bug 316394.

Alfred Kayser

Updated

•

20 years ago

Blocks: 91242

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 16

•

20 years ago

Any chance you could attach a merged patch addressing comment 15?

Alfred Kayser

Comment 17

•

20 years ago

Creating a new patch, it will take some while to build and test it though...

Alfred Kayser

Comment 18

•

20 years ago

Attached patch V6: Updated again (and comment #16) (obsolete) — Details — Splinter Review

Build and tested with seamonkey, including the 3d testcase above. Note, also replaced the nsFlatString with nsString in nsCSSParser.cpp. Also made EatWhitespace return void, as the return value was never used.

Attachment #201096 - Attachment is obsolete: true

Attachment #202972 - Attachment is obsolete: true

Attachment #208958 - Flags: review?(dbaron)

Attachment #202972 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 19

•

20 years ago

I find it somewhat odd that you go out of your way in EnsureBuffer to ensure that it never returns 0. It seems more natural to make 0 be the "no characters to read" return value, since the return value is the number of characters available. You'd also then avoid the confusing (although harmless, I think) inconsistency between the way you check return values of EnsureBuffer. You could also make mOffset and mCount unsigned, I think.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 20

•

20 years ago

- PRInt32 ch = Read(aErrorCode); + PRInt32 followingChar, nextChar, ch = Read(aErrorCode); The reverse order of declaration would make a good bit more sense.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 21

•

20 years ago

+ if (nextChar == '-') { + Read(aErrorCode); + if (LookAhead(aErrorCode, '>')) { + aToken.mType = eCSSToken_HTMLComment; + aToken.mIdent.AssignLiteral("-->"); + return PR_TRUE; + } + } You need to Pushback(nextChar) (or '-') in the case where the third char isn't '>', i.e., after the } following the return. Please add a comment for ParseNumber that it handles '-' signs but not '+' signs.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 22

•

20 years ago

You changed the behavior of NextURL in the case of URLs that are invalid because of whitespace in the middle -- it used to read up to the closing ')', but now doesn't. Could you restore the old behavior unless there's a good reason for that change? (I think our parsing is probably buggy either way, but correct parser code is probably more likely to benefit from reading up to the closing ')'.)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 23

•

20 years ago

Your changes to the numeric escape parsing in ParseAndAppendEscape are wrong in at least the following ways (I didn't look too closely for other errors): * they eat arbitrary amounts of whitespace at the end of an escape when instead precisely one whitespace character is supposed to be eaten (treating CR-LF as one character) * they cause escapes terminated by EOF to be dropped (e.g., style="color: gree\6e") * they remove the protection against insertion of null bytes

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 24

•

20 years ago

+ // If nothing in pushback, first try get as much as possible in one go s/try get/try to get/ + // See how much we can Eat and Append in one go + PRInt32 walker = mOffset; Eat and Append shouldn't be capitalized, and consume is probably better than eat Also, walker is a bad variable name; a better term would be cursor, but variable names p or i would probably be better.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 25

•

20 years ago

Your ParseNumber changes handle negative numbers with decimals incorrectly; they treat -7.1 as -6.9. They also introduce the possibility of rounding error accumulation; you should instead use the standard string to float conversion functions. I haven't reviewed the ParseNumber changes for any other problems. Your ParseString changes need the same fixes I described in GatherIdent.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 26

•

20 years ago

Comment on attachment 208958 [details] [diff] [review] V6: Updated again (and comment #16) review- per comment 19 through comment 25

Attachment #208958 - Flags: review?(dbaron) → review-

Alfred Kayser

Comment 27

•

20 years ago

Attached file Interdiff between version 6 and 8 (obsolete) — Details

This is a diff between patch 6 and 8 showing how I incorporated the comments

Alfred Kayser

Updated

•

20 years ago

Attachment #210223 - Attachment is patch: false

Alfred Kayser

Comment 28

•

20 years ago

Attached patch Patch v8: With comments from dbaron incorporated (obsolete) — Details — Splinter Review

> I find it somewhat odd that you go out of your way in EnsureBuffer to ensure that it never returns 0. It seems more natural to make 0 be the "no charactersto read" return value, since the return value is the number of characters available. You'd also then avoid the confusing (although harmless, I think) inconsistency between the way you check return values of nsureBuffer. You could also make mOffset and mCount unsigned, I think. Done, made return 0 for EOF or error, and used PRUint32 for mCount and mOffset. > - PRInt32 ch = Read(aErrorCode); > + PRInt32 followingChar, nextChar, ch = Read(aErrorCode); > The reverse order of declaration would make a good bit more sense. Done, made ch a 'const PRInt32 ch', and moved followingChar and nextChar > + if (nextChar == '-') { > + Read(aErrorCode); > + if (LookAhead(aErrorCode, '>')) { > + aToken.mType = eCSSToken_HTMLComment; > + aToken.mIdent.AssignLiteral("-->"); > + return PR_TRUE; > + } > + } > > You need to Pushback(nextChar) (or '-') in the case where the third char isn't > '>', i.e., after the } following the return. Good catch, made logic as follows: nextChar = Read() if (nextChar =='-' and LookAhead('>') return HTML Comment else if (isDigit(nextChar) or nextChar=='.' and IsDigit(Peek)) parsenumber negate number return PushBack(nextChar) if (IsIdentStart(nextChar)) return ParseIdent break; This minimizes PushBack > Please add a comment for ParseNumber that it handles '-' signs but not '+' > signs. Made ParseNumber to also not handle '-', see above, and less code for positive numbers > You changed the behavior of NextURL in the case of URLs that are invalid because of whitespace in the middle -- it used to read up to the closing ')', but now doesn't. Could you restore the old behavior unless there's a good reason for that change? (I think our parsing is probably buggy either way, but correct parser code is probably more likely to benefit from reading up to the closing ')'.) Actually in both the original code and the patched version, NextURL always eats the whitespace, upto the ')', with the ')' PushBack'd. Because: - if (LookAhead(aErrorCode, ')')) { - Pushback(')'); // leave the closing symbol is actually the same as: + if (Peek(aErrorCode) != ')') { > Your changes to the numeric escape parsing in ParseAndAppendEscape are wrong in at least the following ways (I didn't look too closely for other errors): > * they eat arbitrary amounts of whitespace at the end of an escape when instead precisely one whitespace character is supposed to be eaten (treating CR-LF as one character) > * they cause escapes terminated by EOF to be dropped (e.g., style="color: gree\6e") > * they remove the protection against insertion of null bytes Fixed to only read ony whitespace character, my wrong interpretation of the old code Fixed to not drop escapes terminated by EOF Fixed to not insert or pushback a null byte > + // If nothing in pushback, first try get as much as possible in one go > s/try get/try to get/ Fixed > + // See how much we can Eat and Append in one go > + PRInt32 walker = mOffset; > Eat and Append shouldn't be capitalized, and consume is probably better than eat > Also, walker is a bad variable name; a better term would be cursor, but variable names p or i would probably be better. > Your ParseString changes need the same fixes I described in GatherIdent. Fixed in both locations, used 'n' as the counter (as in n bytes to consume) > Your ParseNumber changes handle negative numbers with decimals incorrectly; they treat -7.1 as -6.9. They also introduce the possibility of rounding error accumulation; you should instead use the standard string to float conversion functions. I haven't reviewed the ParseNumber changes for any other problems. Fixed, do Integer parsing directly, but used ToFloat to do float conversion (from nsCAutoString to prevent malloc and strcpy). This way string to float conversion is only done when really needed.

Attachment #208958 - Attachment is obsolete: true

Attachment #210224 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 29

•

19 years ago

Comment on attachment 210223 [details] Interdiff between version 6 and 8 Not sure what this interdiff is between, but it's not between v6 and v8, since v6 didn't have FillBuffer at all.

Attachment #210223 - Attachment is obsolete: true

Attachment #210223 - Flags: review-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 30

•

19 years ago

The replacement of EnsureBuffer with FillBuffer seems bad to me: if you're going to do the same test before every call to a function, it should be part of the function. If you want it outside for performance reasons (which I expect is why you made the change), then just make an inline function. So you should just add an inline function EnsureBuffer in nsCSSScanner.h: PRBool EnsureBuffer(nsresult &aErrorCode) { return mOffset < mCount || FillBuffer(aErrorCode); } and use that instead of the way you call FillBuffer?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 31

•

19 years ago

Attached patch accurate interdiff between V6 and v8 (obsolete) — Details — Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 32

•

19 years ago

Comment on attachment 210224 [details] [diff] [review] Patch v8: With comments from dbaron incorporated > #define nsCSSScanner_h___ > > #include "nsString.h" > #include "nsCOMPtr.h" > class nsIUnicharInputStream; > >+ > // XXX turn this off for minimo builds > #define CSS_REPORT_PARSE_ERRORS Please just leave this as it was.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Updated

•

19 years ago

Attachment #211339 - Attachment description: accurate interdiff between v6 and v6 → accurate interdiff between V6 and v8

Alfred Kayser

Comment 33

•

19 years ago

Attached patch V9: With the last two comments incorporated (obsolete) — Details — Splinter Review

Thanks for the review, I have incorporated the last two comments.

Attachment #210224 - Attachment is obsolete: true

Attachment #211483 - Flags: review?(dbaron)

Attachment #210224 - Flags: review?(dbaron)

Alfred Kayser

Comment 34

•

19 years ago

Attached patch V10: updated to current trunk (obsolete) — Details — Splinter Review

Attachment #211483 - Attachment is obsolete: true

Attachment #211483 - Flags: review?(dbaron)

Alfred Kayser

Comment 35

•

19 years ago

Comment on attachment 216822 [details] [diff] [review] V10: updated to current trunk dbaron, can you please review this patch before it is bitrotted again?

Attachment #216822 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 36

•

19 years ago

I'll try to get to it soon. But see comment 10.

Alfred Kayser

Comment 37

•

19 years ago

Yes, I know about trying to keep patches small, but this is what I feel to be a consistent patch. There are some other changes pending as well but I keep those for separate bugs.

Alfred Kayser

Comment 38

•

19 years ago

David, any chance of reviewing this patch? If not, who else can take this up?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 39

•

19 years ago

It's a *lot* of work to review this patch. It's on my list of things to do, but I have to do it carefully, given that: * you made a huge number of serious mistakes in the original patch -- a number that means your patches still need careful review and thus that you shouldn't be attaching large combined patches (see comment 10). Doing good optimization requires understanding when two things are equivalent, when they're not, and when the difference in behavior matters, which is something that review showed I can't yet trust you to do. (This means it might actually be easier for me to rewrite from scratch patches doing the same thing, from a list of the optimizations made, than to review yours.) * you've provided inaccurate diffs (see comment 29), so I can't trust you to provide such diffs and I have to make them myself, which becomes harder and harder as the patches get farther apart in time. So to be honest I'm not sure I'm ever going to get to it given that it's one huge patch, but it does need my review before it's checked in.

Alfred Kayser

Comment 40

•

19 years ago

Attached patch V11: Updated to current trunk and reduced to half (obsolete) — Details — Splinter Review

David, I can understand your position, given the history of this bug, but I do have some experience in submitting code (netwerk/cache, modules/libjar, modules/libpr0n). So, to help reviewing this one, I have: * updated the patch to current tree * reduced the size by only doing the following: * Code cleanup: + Moved the 'static' pieces from the class to the .cpp file itself + Made all references to gLexTable a direct reference + Replace the CheckLexTable with IsDigit, IsHexDigit, IsWhitespace * Ident gathering optimization: + EnsureBuffer/Read to make it easier for ParseString/Number/Ident to read directly from the buffer + Removed Unread and mLastRead (to make things simpler) + Made ParseString/GatherIdent to append as much as possible in one go directly from the read buffer. + Simplified ParseAndAppendEscape + ParseNumber: only gather the number in ident when we are parsing a float So, this at least removes a lot of other changes to the scanner logic, and only the GatherIdent/ParseString/ParseNumber/ParseAndAppendEscape functions are really changed.

Attachment #201658 - Attachment is obsolete: true

Attachment #211339 - Attachment is obsolete: true

Attachment #216822 - Attachment is obsolete: true

Attachment #230994 - Flags: review?(dbaron)

Attachment #216822 - Flags: review?(dbaron)

Markus Hübner

Updated

•

19 years ago

Blocks: 203448

Alfred Kayser

Updated

•

19 years ago

Attachment #230994 - Flags: review?(dbaron) → review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 41

•

19 years ago

Alfred, I'm pretty swamped in terms of reviews. I'm not likely to get to this for months, and it would still need sr from dbaron after that, I think...

Alfred Kayser

Updated

•

19 years ago

Attachment #230994 - Flags: review?(bzbarsky) → review?(dbaron)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 42

•

19 years ago

For what it's worth, I think the best way to get this reviewed would be to completely split apart cleanup and optimization into separate patches. That would certainly make me both able to get this sooner and happier to r+sr on my own. Especially for the cleanup patch. That said, I'd want to know the rationale for the cleanup. Why is using global statics in the file better than using class statics?

Alfred Kayser

Comment 43

•

19 years ago

Attached patch V12: Minimial changes to just optimize string and number parsing (obsolete) — Details — Splinter Review

Just do the GatherIdent, ParseNumber and ParseString optimization. Notes: * IsDigit and IsHexDigit added to class like IsIdentStart to keep class constants in place. Removed redundant 'GetLexTable'. * Removed 'gLexTable' definition in nsCSSSCanner.cpp * Removed unused BUFFER_SIZE define * Replaced Unread with Pushback to explicitely push back the right character, removing the need for mLastRead * Made '\r\n' filtering part of the Read function, so that other parsing doesn't have to bother about it.

Attachment #230994 - Attachment is obsolete: true

Attachment #246382 - Flags: review?(bzbarsky)

Attachment #230994 - Flags: review?(dbaron)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 44

•

19 years ago

> * Removed 'gLexTable' definition in nsCSSSCanner.cpp Why? It seems to me like it should still be needed. What tests have you run on this code? The first thing that jumped out at me when looking at ParseNumber is that when parsing something like "height: 100.px" the old code will tokenize it as "height:" followed by the valid-integer number token "100", followed by junk that causes the declaration to be thrown out. Your new code will get to the |+ if ((c == '.') && (IsDigit(c = Read(aErrorCode)))) {| line, set c to be 'p', test false. So mIntegerValid will remain true and since c >= 0 and 'p' is definitely an ident start, we'll parse this as a eCSSToken_Dimension with an integer of 100 and a unit of "px", which is wrong per CSS spec. Is there any conceivable way to split up the three remaining optimizations into three separate patches? Again, the smaller a patch, the easier it is to review it; review time grows _very_ nonlinearly with patch size...

Alfred Kayser

Comment 45

•

19 years ago

Attached patch V13: More minimizations (obsolete) — Details — Splinter Review

Fixed 100.px parsing. Added back gLexTable define. Removed some other changes.

Attachment #246382 - Attachment is obsolete: true

Attachment #246382 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 46

•

19 years ago

Again, it'd make me a lot happier to review this if I knew that every single change here has had a test written for it... Ideally, we'd land such tests with the patch itself. As things stand, to be happy about it I'd need to first write said tests, then test the patch, then review the code...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Updated

•

18 years ago

QA Contact: ian → style-system

Eli Friedman

Comment 47

•

18 years ago

Attached patch Part 1: Scanning cleanups (obsolete) — Details — Splinter Review

This patch contains all the cleanups/adding the functionality necessary for optimizations. Pushback(ch) is equivalent to Unread() when the character in question was just read. This patch restructures Read to allow a separate EnsureData call, and simplifies the handling of newlines by centralizing it in Read.

Attachment #271617 - Flags: review?

Eli Friedman

Updated

•

18 years ago

Attachment #271617 - Flags: review? → review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 48

•

18 years ago

Same question as for Alfred: what sort of testing have you done?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 49

•

18 years ago

Eli, see comment 48.

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

18 years ago

Depends on: 392837

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

18 years ago

No longer depends on: 392837

Eli Friedman

Comment 50

•

18 years ago

Comment on attachment 271617 [details] [diff] [review] Part 1: Scanning cleanups Doing the cleanup in separate bugs.

Attachment #271617 - Attachment is obsolete: true

Attachment #271617 - Flags: review?(bzbarsky)

Alfred Kayser

Comment 51

•

18 years ago

Comment on attachment 246759 [details] [diff] [review] V13: More minimizations This patch is obsolete, now that little parts of it have been addressed in separate patches: 3.96 sharparrow1%yahoo.com 2007-08-25 19:20 Bug 393499: Clean up character lookup in nsCSSScanner. r=bzbarsky, sr+a=dbaron 3.95 sharparrow1%yahoo.com 2007-08-23 16:01 Bug 393286: Make nsCSSScanner::Read regularize newlines. r+sr=bzbarsky, a=dbaron 3.94 sharparrow1%yahoo.com 2007-08-22 10:32 Bug 393080: Make count and offset in nsCSSScanner unsigned. r+sr=bzbarsky, a=dbaron 3.93 sharparrow1%yahoo.com 2007-08-21 11:29 Bug 392840: Get rid of nsCSSScanner::Unread. r+sr=bzbarsky, r+a=dbaron 3.92 sharparrow1%yahoo.com 2007-08-19 20:39 Bug 387511: CSs scanner causes parse error for URLs starting with a codepoint > 255. r+sr=bzbarsky, a=dbaron

Attachment #246759 - Attachment is obsolete: true

Alfred Kayser

Comment 52

•

18 years ago

Attached patch Only do GatherIdent and ParseString (obsolete) — Details — Splinter Review

After the submits of the other bits, this parch is now really simple. More optimizations are: * ParseNumber (as in previous attached patches) * Use 'switch' in Next(): instead of large number of if statements. * NextURL: gather URL: Same principle as for GatherIdent, but also other cleanup possible. * Make aErrorCcode and aToken params part of nsCSSScanner class. * Kill EatNewline (only used in one location) * Simplify parameters to the Report error functions (=step3)

Assignee: dbaron → alfredkayser

Status: NEW → ASSIGNED

Attachment #278633 - Flags: review?(bzbarsky)

Eli Friedman

Comment 53

•

18 years ago

Comment on attachment 278633 [details] [diff] [review] Only do GatherIdent and ParseString >+ for (;n < mCount; n++) { >+ PRUnichar nextChar = mReadPointer[n]; >+ if ((nextChar == '\n') || (nextChar == '\r') || You also need to check for '\f'. I'd also suggest taking a look at the mochitest test_parse_rule.html, which would have caught this mistake. I wrote that test, which includes some tests from CSS testsuites plus some of my own tests. It's not really complete, but it's a start; I'd appreciate any additions to that test to make it more thorough. > PRInt32 ch = Read(aErrorCode); >- if (ch < 0) { >+ if (ch < 0 || ch == '\n') { >+ aToken.mType = eCSSToken_Error; >+#ifdef CSS_REPORT_PARSE_ERRORS >+ ReportUnexpectedToken(aToken, "SEUnterminatedString"); >+#endif Actually, according to the current WD, for EOF, we should just return PR_TRUE, and pretend that the string was in fact terminated. (I sent an email to www-style, and this interpretation was confirmed by Ian Hickson.)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 54

•

18 years ago

Comment on attachment 278633 [details] [diff] [review] Only do GatherIdent and ParseString As David and I have repeatedly asked: What tests have you run on this code? The fact that this patch fails one of our _existing_ regression tests makes me pretty suspicious off the bat. I would have to see fairly detailed tests for the functionality the patch is changing, including various corner cases, as part of any patch for this bug.

Attachment #278633 - Flags: review?(bzbarsky) → review-

Alfred Kayser

Comment 55

•

18 years ago

This what I get from the mochutest, even with the test for '\f' added as in comment #53 Toggle passed tests Toggle failed tests Passed: 118 Failed: 1 Todo: 4 ... ok - .a::before {content: 'This is \a'}#a::before {content: 'FAIL \ \a'} ok - #a:before {content: 'This is \a'} ok - .a::before {content: 'This is \a'}#a:: before {content: 'FAIL'} ok - .a::before {content: 'This is \a'}#a ::before {content: 'FAIL'} not ok - #a::before {content: 'This is \a got "\"This is \\A \"", expected "\"This is \\A \"" So, I am not sure what the result is telling me. The 'got' part is the same the 'expected'. So why is it failing. I am new with the Mochitest framework, and find it difficult to determine what really went wrong.

Eli Friedman

Comment 56

•

18 years ago

(In reply to comment #55) > So, I am not sure what the result is telling me. That test tests the unterminated string case; it's currently marked todo. If you fixed it in your tree, then you should remove the todo marking; in this file, the todo tests are listed at the bottom of each section of tests.

Alfred Kayser

Comment 57

•

18 years ago

Attached patch V2: Handle \f and do update the Mochi test (obsolete) — Details — Splinter Review

This patch handles \f correctly, and doesn't report an error for strings ending at EOF, and the corresponding test file has been updated. This patch now runs fine against the Mochi test and also ACID2 is fine.

Attachment #278633 - Attachment is obsolete: true

Attachment #279426 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 58

•

18 years ago

See comment 54 last sentence.

Alfred Kayser

Comment 59

•

18 years ago

See http://lxr.mozilla.org/mozilla/source/layout/style/test/test_parse_rule.html That test file already contains very specific tests for ident and string parsing: ".IdE6n-3t0_6, #a { color: green }", "#IdE6n-3t0_6, #a { color: green }", "._ident, #a { color: green }", "#_ident, #a { color: green }", ".-ident, .a { color: green; }", // Testsuite has incorrect version "#æ€€ident, .a { color: green }", "#idenæ€€tæ€€, .a { color: green }", "#\\6000ident, .a { color: green }", "#iden\\6000t\\6000, .a { color: green }", ".æ€€ident, .a { color: green }", ".idenæ€€tæ€€, .a { color: green }", ".\\6000ident, .a { color: green }", ".iden\\6000t\\6000, .a { color: green }", base + "#6ident, #a {color: red }", ".id4ent6, .a { color: green }", "#\\ident, .a { color: green; }", "#ide\\n\\t, .a { color: green; }", ".\\6ident, .a { color: green; }", ".\\--ident, .a { color: green; }", // Tests for strings and pseudos {base : base = ".a::before {content: 'This is \\a'}", tests : [ // CSS 2.1 section 4.1.3 "#a::before {content: 'This is \\a '}", "#a::before {content: 'This is \\A '}", "#a::before {content: 'This is \\0000a '}", "#a::before {content: 'This is \\00000a '}", "#a::before {content: 'This is \\\n\\00000a '}", "#a::before {content: 'This is \\\015\012\\00000a '}", "#a::before {content: 'This is \\\015\\00000a '}", "#a::before {content: 'This is \\\f\\00000a '}", "#a::before {content: 'This is\\20\f\\a'}", "#a::before {content: 'This is\\20\r\\a'}", "#a::before {content: 'This is\\20\n\\a'}", "#a::before {content: 'This is\\20\r\n\\a'}", base + "#a::before {content: 'FAIL \f\\a'}", base + "#a::before {content: 'FAIL \\\n\r\\a'}", "#a:before {content: 'This is \\a'}", "#a::before {content: 'This is \\a", base + "#a:: before {content: 'FAIL'}", base + "#a ::before {content: 'FAIL'}", "#a::before {content: 'This is \\a", ], prop: "content", pseudo: "::before" }, With the last patch, I could even move one 'todo' to 'ok' status.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 60

•

18 years ago

I doubt I'll be able to review this in any sort of reasonable timeframe. I'll see what I can do, but no promises. :(

Alfred Kayser

Updated

•

18 years ago

Attachment #279426 - Flags: review?(bzbarsky) → review?(dbaron)

Alfred Kayser

Updated

•

18 years ago

OS: Linux → All

Hardware: PC → All

Ryan VanderMeulen [:RyanVM]

Comment 61

•

18 years ago

Any news on this?

Reed Loden [:reed]

Comment 62

•

18 years ago

Comment on attachment 279426 [details] [diff] [review] V2: Handle \f and do update the Mochi test bz, do you have some time now to review this perf win patch?

Attachment #279426 - Flags: superreview?(bzbarsky)

Reed Loden [:reed]

Updated

•

18 years ago

Flags: blocking1.9?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 63

•

18 years ago

The only change from comment 60 is that I have less Mozilla time available now than I did then. It'll only get worse as time goes on, up through May.

Mike Schroepfer

Comment 64

•

18 years ago

(In reply to comment #63) > The only change from comment 60 is that I have less Mozilla time available now > than I did then. It'll only get worse as time goes on, up through May. > Can you recommend another reviewer?

Mike Schroepfer

Comment 65

•

18 years ago

(In reply to comment #64) > (In reply to comment #63) > > The only change from comment 60 is that I have less Mozilla time available now > > than I did then. It'll only get worse as time goes on, up through May. > > > > Can you recommend another reviewer? > Roc you want to take this?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 66

•

18 years ago

I think this should get review from me or Boris, and the work needed to review it is roughly equivalent to the work needed to write the patch (and tests) from scratch, given the prior issues with the patches in this bug.

Mike Schroepfer

Comment 67

•

18 years ago

(In reply to comment #66) > I think this should get review from me or Boris, and the work needed to review > it is roughly equivalent to the work needed to write the patch (and tests) from > scratch, given the prior issues with the patches in this bug. > Can I convince you to take the bug :-)?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 68

•

18 years ago

The reason for the attention seems to be the data in bug 406810, including http://www.allpeers.com/download/michal/new/callgraph/FF3/nsCSSScanner+GatherIdent.png

Mike Schroepfer

Comment 69

•

18 years ago

--> DBaron. Yell if you disagree :)

Assignee: alfredkayser → dbaron

Status: ASSIGNED → NEW

Flags: blocking1.9? → blocking1.9+

Priority: -- → P2

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Updated

•

18 years ago

Priority: P2 → P3

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 70

•

17 years ago

Comment on attachment 279426 [details] [diff] [review] V2: Handle \f and do update the Mochi test + n++; + for (;n < mCount; n++) { + mColNumber++; When using increments as a statement, prefer pre-increment over post-increment. It's conceptually simpler and more similar to other things (e.g., =, +=), and when you're working with C++ operator overloading can be computationally simpler as well, so it's a good habit. + // Count number of Ident characters that can be processed In ParseString, this comment is incorrect. I'd just remove "Ident ". + if (ch == '\n') { + aToken.mType = eCSSToken_Error; +#ifdef CSS_REPORT_PARSE_ERRORS + ReportUnexpectedToken(aToken, "SEUnterminatedString"); +#endif + return PR_FALSE; returning false here is incorrect. False means EOF. You should return PR_TRUE like the old code did. An example of what this would break is probably: p { color: green; foo: { "bar ; color: red; } } (It would make paragraphs red, whereas the color:red should be part of the declaration.) If I'm correct that this is a testcase for what you broke, you should add this to the parsing testcases; if not, you should figure out a testcase. + if (ch < 0 || ch == aStop) break; Follow local style and keep the braces around the break statement. # +"#a::before {content: 'This is \\a", # # base + "#a:: before {content: 'FAIL'}", # base + "#a ::before {content: 'FAIL'}", # "#a::before {content: 'This is \\a", The line you're adding is already at the bottom of the quoted 5 lines, no need to add it again. The todo list is a list of items already listed elsewhere; just remove from the todo and don't add. You'll get r+sr=dbaron if you fix *exactly* these things and don't change anything else.

Attachment #279426 - Flags: superreview?(bzbarsky)

Attachment #279426 - Flags: superreview-

Attachment #279426 - Flags: review?(dbaron)

Attachment #279426 - Flags: review-

Alfred Kayser

Comment 71

•

17 years ago

Attached patch V3: Incorporate exactly the comments from David Baron (obsolete) — Details — Splinter Review

Performed and succeeded in running the Mochitests with this patch applied.

Attachment #279426 - Attachment is obsolete: true

Attachment #304183 - Flags: superreview?(dbaron)

Reed Loden [:reed]

Updated

•

17 years ago

Attachment #304183 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 72

•

17 years ago

You missed the following items from my previous review: 1: > + n++; 2: > + mColNumber++; 3: > If I'm correct that this is a testcase for what you broke, you should > add this to the parsing testcases; if not, you should figure out a > testcase. You also introduced two new errors: 4. You added: + if (ch < 0) { + return PR_FALSE; + } Whereas it should have stayed a |break;|, so that GatherIdent returns true, since there is an identifier to return. (It may be the last thing in the file, but the file's not over until after it's been parsed.) Please add a test for this as well. 5. You added a trailing comma: -], prop: "content", pseudo: "::before" +], prop: "content", pseudo: "::before", I'm not sure whether or not this breaks the test, although I suspect it does.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Updated

•

17 years ago

Attachment #304183 - Flags: superreview?(dbaron)

Attachment #304183 - Flags: superreview-

Attachment #304183 - Flags: review?(dbaron)

Attachment #304183 - Flags: review-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 73

•

17 years ago

(In reply to comment #72) > Please add a test for this as well. Then again, I'd think we already have plenty of tests testing style="color:green", so it probably breaks some existing tests. If there's already one that it breaks, no need to add another.

Alfred Kayser

Comment 74

•

17 years ago

Attached patch V4: Addressed all comments of comment #72 (obsolete) — Details — Splinter Review

1: > + n++; Done. 2: > + mColNumber++; Done. 3: > If I'm correct that this is a testcase for what you broke, you should > add this to the parsing testcases; if not, you should figure out a > testcase. No testcase added, as the existing cases already test all combinations of string termination. 4. You added: + if (ch < 0) { + return PR_FALSE; + } Done. GatherIdent returns true, even when identifier is terminated by EOF. > Please add a test for this as well. Existing testcases do cover this already. 5. You added a trailing comma: Removed.

Attachment #304183 - Attachment is obsolete: true

Attachment #304433 - Flags: superreview?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 75

•

17 years ago

Comment on attachment 304433 [details] [diff] [review] V4: Addressed all comments of comment #72 Sorry, my point number 4 wasn't clear -- it should have referred to ParseString rather than GatherIdent -- and it was code that you were putting back (i.e., the diff was smaller, but it showed up in the interdiff) -- that's what I get for reading interdiffs. In any case, it's an error in either place. Putting the old code to return 0 back removed your fix for the test you're unmarking todo, so the tests fail with this version of the patch. That change should remain as it was in patch V2 so that the test failure fix is actually fixed. Could I ask why I should believe you when you say that other tests already cover something if you haven't actually run the tests? Next time, please actually run the tests before posting a new patch. The change from "return PR_TRUE;" to "break;" between V3 and V4 is OK, but it was yet *more* for me to review.

Attachment #304433 - Flags: superreview?(dbaron)

Attachment #304433 - Flags: superreview-

Attachment #304433 - Flags: review-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 76

•

17 years ago

By the way, I understand that what I say may not be perfectly clear: I make typos, mistakes, and am sometimes unclear despite those. But please don't pretend you've addressed comments if you don't understand what I'm asking. Ask me what I meant.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 77

•

17 years ago

(In reply to comment #74) > 3: > If I'm correct that this is a testcase for what you broke, you should > > add this to the parsing testcases; if not, you should figure out a > > testcase. > No testcase added, as the existing cases already test all combinations of > string termination. I reintroduced the mistake, and none of the existing testcases failed. This isn't surprising, since they mostly test parsing of just one declaration.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 78

•

17 years ago

Attached patch V5: address comments — Details — Splinter Review

This addresses the remaining comments; I'll land it shortly. I'm tired of iterating here.

Attachment #304433 - Attachment is obsolete: true

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 79

•

17 years ago

Checked in to trunk. Unfortunately I think I'm going to have to mark this bug fixed at this point even if it's not fully fixed. The main problem Boris pointed out is fixed, but I think there are further things we could do to reduce string appending; I filed bug 418952.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Assignee

Comment 80

•

17 years ago

Didn't seem to have any noticeable effect on the talos Ts numbers.

Michal Novotny [:michal]

Comment 81

•

17 years ago

(In reply to comment #80) > Didn't seem to have any noticeable effect on the talos Ts numbers. > That's strange. I just tried the effect of this patch because it is similar to bug #121341 where the startup was then faster. Here are the numbers from my linux box: without patch #304872 startup 572ms valgrind Ir: GatherIdent 16121515 ParseString 7846740 with patch #304872 startup 566ms valgrind Ir: GatherIdent 7328073 ParseString 1460635

You need to log in before you can comment on or make changes to this bug.