Closed
Bug 98214
Opened 23 years ago
Closed 23 years ago
Avoid unnecessary string copy in xml parser
Categories
(Core :: XML, defect, P3)
Tracking
()
VERIFIED
FIXED
mozilla0.9.6
People
(Reporter: harishd, Assigned: hjtoi-bugzilla)
Details
(Keywords: perf, Whiteboard: [fixinhand])
Attachments
(1 file)
14.41 KB,
patch
|
harishd
:
review+
vidur
:
superreview+
|
Details | Diff | Splinter Review |
Sample quantify run starting up with about:blank. Function: Tokenizer_HandleExternalEntityRef Calls: 33 Function time: 8.48 usec (0.00% of Focus) F+D time: 283,385.23 usec (0.02% of Focus) Avg F time: 0.26 usec Min F time: 0.26 usec Max F time: 0.26 usec Module: y:\mozilla\dist\WIN32_O.OBJ\bin\components\gkparser.dll Source File: y:\mozilla\htmlparser\src\nsExpatTokenizer.cpp Measurement: Line Hidden functions: (None)
Assignee | ||
Comment 1•23 years ago
|
||
Assignee | ||
Comment 2•23 years ago
|
||
Unfortunately the patch does not avoid the copy, it just moves it to somewhere else. However, I believe this has the advange that if somebody some day figures a way to do this without a copy they can do that easily. There is a real performance improvement in the patch though, regarding how we parse XML declaration and get the encoding. The old code needs to do string copy & allocation almost every time (we end up here 22 times in startup, doing allocation 18 times of those). This is 0.00% of startup focus according to Purify, but the new code takes just a fraction of the time the old code does. It is also more correct. Finally there are some cleanups. Reviews?
Status: NEW → ASSIGNED
Priority: -- → P3
Whiteboard: [fixinhand]
Target Milestone: --- → mozilla0.9.6
Comment on attachment 53122 [details] [diff] [review] Cleanup & speedup Needs more commenting. Especially places where you look for 'g' & 'n'.
Attachment #53122 -
Flags: review+
Assignee | ||
Comment 4•23 years ago
|
||
Ok, I added this for 'n' and 'g': // Want to avoid string comparisons, hence looking for 'g' // and only if found check the string leading to it. Not // foolproof, but fast. I see it extremely unlikely that someone would give us something like: <?xml fooversion"1" barencoding"UTF-8"?> which would pass this detection as "valid". All this happens before we pass the data to Expat, we only use this routine to convert the data we see to format that we want to feed Expat. If Expat barfed on the XML declaration (it does not since it is not a validating parser) the user would get an error page explaining that the XML file is not correct. We also use the knowledge of earliest possible offset into our advantage in the routine, and this is explained by saying something like: // The shortest string so far (strlen==5): // <?xml after which we start the loop at position 6 and so forth.
Comment 5•23 years ago
|
||
Comment on attachment 53122 [details] [diff] [review] Cleanup & speedup sr=vidur Not a big deal, but nsMemory::Free() is preferred over nsCRT::free(). Also, could you not just use an nsXPIDLString for the filename in the nsExpatTokenizer part of the patch? For future reference, there's a local class in nsParser.cpp called CWordTokenizer that finds word boundaries and could probably be used to simplify the logic of the XML decl parsing code that you have.
Attachment #53122 -
Flags: superreview+
Assignee | ||
Comment 6•23 years ago
|
||
I changed to nsXPIDLCString. I tried to simplify the obfuscated code by using CWordTokenizer<char> but became blocked by bug 104479. Marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•