Closed Bug 98214 Opened 23 years ago Closed 23 years ago

Avoid unnecessary string copy in xml parser

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla0.9.6

People

(Reporter: harishd, Assigned: hjtoi-bugzilla)

Details

(Keywords: perf, Whiteboard: [fixinhand])

Attachments

(1 file)

Cleanup & speedup 23 years ago Heikki Toivonen (remove -bugzilla when emailing directly) 14.41 KB, patch	harishd : review+ vidur : superreview+	Details \| Diff \| Splinter Review

harishd

Reporter

Description

•

23 years ago

Sample quantify run starting up with about:blank.

Function:
Tokenizer_HandleExternalEntityRef
Calls:
33
Function time:	8.48 usec (0.00% of Focus)
F+D time:	283,385.23 usec (0.02% of Focus)
Avg F time:	0.26 usec
Min F time:	0.26 usec
Max F time:	0.26 usec
Module:
y:\mozilla\dist\WIN32_O.OBJ\bin\components\gkparser.dll
Source File:	y:\mozilla\htmlparser\src\nsExpatTokenizer.cpp
Measurement:
Line
Hidden functions:	(None)

harishd

Reporter

Updated

•

23 years ago

Keywords: perf

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 1

•

23 years ago

Attached patch Cleanup & speedup — Details — Splinter Review

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 2

•

23 years ago

Unfortunately the patch does not avoid the copy, it just moves it to somewhere
else. However, I believe this has the advange that if somebody some day figures
a way to do this without a copy they can do that easily.

There is a real performance improvement in the patch though, regarding how we
parse XML declaration and get the encoding. The old code needs to do string copy
& allocation almost every time (we end up here 22 times in startup, doing
allocation 18 times of those). This is 0.00% of startup focus according to
Purify, but the new code takes just a fraction of the time the old code does. It
is also more correct.

Finally there are some cleanups. Reviews?

Status: NEW → ASSIGNED

Priority: -- → P3

Whiteboard: [fixinhand]

Target Milestone: --- → mozilla0.9.6

harishd

Reporter

Comment 3

•

23 years ago

Comment on attachment 53122 [details] [diff] [review]
Cleanup & speedup

Needs more commenting. Especially places where you
look for 'g' & 'n'.

Attachment #53122 - Flags: review+

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 4

•

23 years ago

Ok, I added this for 'n' and 'g':

// Want to avoid string comparisons, hence looking for 'g'
// and only if found check the string leading to it. Not
// foolproof, but fast.

I see it extremely unlikely that someone would give us something like:

<?xml fooversion"1" barencoding"UTF-8"?>

which would pass this detection as "valid". All this happens before we pass the
data to Expat, we only use this routine to convert the data we see to format
that we want to feed Expat. If Expat barfed on the XML declaration (it does not
since it is not a validating parser) the user would get an error page explaining
that the XML file is not correct.

We also use the knowledge of earliest possible offset into our advantage in the
routine, and this is explained by saying something like:

// The shortest string so far (strlen==5):
// <?xml

after which we start the loop at position 6 and so forth.

vidur (gone)

Comment 5

•

23 years ago

Comment on attachment 53122 [details] [diff] [review]
Cleanup & speedup

sr=vidur

Not a big deal, but nsMemory::Free() is preferred over nsCRT::free(). Also, could you not just use an nsXPIDLString for the filename in the nsExpatTokenizer part of the patch?

For future reference, there's a local class in nsParser.cpp called CWordTokenizer that finds word boundaries and could probably be used to simplify the logic of the XML decl parsing code that you have.

Attachment #53122 - Flags: superreview+

Heikki Toivonen (remove -bugzilla when emailing directly)

Assignee

Comment 6

•

23 years ago

I changed to nsXPIDLCString.

I tried to simplify the obfuscated code by using CWordTokenizer<char> but became
blocked by bug 104479.

Marking fixed.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Chris Petersen

Comment 7

•

23 years ago

Marking verified on the Oct 22nd builds.

Status: RESOLVED → VERIFIED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Avoid unnecessary string copy in xml parser

Categories

(Core :: XML, defect, P3)

Tracking

()

People

(Reporter: harishd, Assigned: hjtoi-bugzilla)

References

Details

(Keywords: perf, Whiteboard: [fixinhand])

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Attachment

General

Description

File Name

Content Type