Sample quantify run starting up with about:blank. Function: Tokenizer_HandleExternalEntityRef Calls: 33 Function time: 8.48 usec (0.00% of Focus) F+D time: 283,385.23 usec (0.02% of Focus) Avg F time: 0.26 usec Min F time: 0.26 usec Max F time: 0.26 usec Module: y:\mozilla\dist\WIN32_O.OBJ\bin\components\gkparser.dll Source File: y:\mozilla\htmlparser\src\nsExpatTokenizer.cpp Measurement: Line Hidden functions: (None)
Unfortunately the patch does not avoid the copy, it just moves it to somewhere else. However, I believe this has the advange that if somebody some day figures a way to do this without a copy they can do that easily. There is a real performance improvement in the patch though, regarding how we parse XML declaration and get the encoding. The old code needs to do string copy & allocation almost every time (we end up here 22 times in startup, doing allocation 18 times of those). This is 0.00% of startup focus according to Purify, but the new code takes just a fraction of the time the old code does. It is also more correct. Finally there are some cleanups. Reviews?
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.6
Comment on attachment 53122 [details] [diff] [review] Cleanup & speedup Needs more commenting. Especially places where you look for 'g' & 'n'.
Attachment #53122 - Flags: review+
Ok, I added this for 'n' and 'g': // Want to avoid string comparisons, hence looking for 'g' // and only if found check the string leading to it. Not // foolproof, but fast. I see it extremely unlikely that someone would give us something like: <?xml fooversion"1" barencoding"UTF-8"?> which would pass this detection as "valid". All this happens before we pass the data to Expat, we only use this routine to convert the data we see to format that we want to feed Expat. If Expat barfed on the XML declaration (it does not since it is not a validating parser) the user would get an error page explaining that the XML file is not correct. We also use the knowledge of earliest possible offset into our advantage in the routine, and this is explained by saying something like: // The shortest string so far (strlen==5): // <?xml after which we start the loop at position 6 and so forth.
Comment on attachment 53122 [details] [diff] [review] Cleanup & speedup sr=vidur Not a big deal, but nsMemory::Free() is preferred over nsCRT::free(). Also, could you not just use an nsXPIDLString for the filename in the nsExpatTokenizer part of the patch? For future reference, there's a local class in nsParser.cpp called CWordTokenizer that finds word boundaries and could probably be used to simplify the logic of the XML decl parsing code that you have.
Attachment #53122 - Flags: superreview+
I changed to nsXPIDLCString. I tried to simplify the obfuscated code by using CWordTokenizer<char> but became blocked by bug 104479. Marking fixed.
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED
Marking verified on the Oct 22nd builds.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.