Closed Bug 235500 Opened 21 years ago Closed 9 years ago

improve nsIParserNode interface to reduce string copies

Categories

(Core :: DOM: HTML Parser, defect)

Type: defect
Priority: Not set
Severity: normal

RESOLVED WORKSFORME

People

(Reporter: darin.moz, Unassigned)

Details

(Keywords: perf, Whiteboard: [WONTFIX?])

nsIParserNode::GetText implies that the nsIParserNode has an nsAString member that it can return. This requires the node implementation to allocate a buffer for this data. We should consider changing this API so that any copying or buffer allocation can be deferred. We have many options; I'm not sure what the best solution is yet.
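One way to read "defer copying" is sketched below. Instead of returning a reference to a string the node owns, the node hands out a non-owning view into the scanner's shared buffer, and a consumer copies only when it needs owned storage. This is a hedged sketch, not the actual Gecko API: `ParserNodeSketch`, `TextView()` and `AppendTextTo()` are hypothetical names, and standard `std::u16string`/`std::u16string_view` stand in for the XPCOM string classes.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical parser node that refers to a slice of the scanner's buffer
// instead of owning a copy of its text.
class ParserNodeSketch {
 public:
  ParserNodeSketch(const std::u16string& scannerBuffer, std::size_t start,
                   std::size_t length)
      : mBuffer(scannerBuffer), mStart(start), mLength(length) {}

  // No allocation: the returned view points into the scanner buffer.
  std::u16string_view TextView() const {
    return std::u16string_view(mBuffer).substr(mStart, mLength);
  }

  // The copy happens here, only when a consumer actually asks for it.
  void AppendTextTo(std::u16string& out) const { out.append(TextView()); }

 private:
  const std::u16string& mBuffer;  // assumption: the buffer outlives the node
  std::size_t mStart;
  std::size_t mLength;
};
```

The trade-off is lifetime: a view is only safe while the scanner buffer is still alive and unmodified, which is presumably part of why the options were left open.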
There are a couple of things that happen today during parsing on the road from the scanner buffer to the nsTextFragment in the resulting nsTextNode:

1. The scanner looks at every character in a piece of text, looking for the end of the text fragment.
2. The data is copied from the shared scanner buffer in the parse node (token) into an intermediate buffer in the HTML sink (SinkContext::mText). During this copy, newlines are normalized (using nsContentUtils::CopyNewlineNormalizedUnicodeTo()).
3. The data is copied from the intermediate buffer into the text node (a malloc'ed buffer, except for single \n text fragments). Before the data is copied, we walk the string checking whether it is ASCII; if it is, we convert to ASCII while copying to the final buffer, and if not, we just copy.

It'd be good to eliminate some of this string-walking and string copying if at all possible.

Ideally the scanner, which already looks at every character in a text fragment, would check whether the data is ASCII and store that along with the data. The scanner could also keep track of what's needed to know how long the string will be after newline normalization. That way, the sink could use an API on nsIParserNode to check whether a parser node contains ASCII-only or Unicode data, create a text node, and tell it to pre-allocate storage large enough for the whole string after newlines have been normalized. With an API on nsIParserNode to copy its data into a pre-allocated buffer, we'd be all set: no extra copies, no extra string-walking to normalize newlines before copying, no more SinkContext::mText, or any of that.

Does this sound at all reasonable? This is related to bug 235255, btw.
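The proposal above could look roughly like the following minimal sketch. The scanner's single pass records an ASCII flag and the post-normalization length. The sink then sizes the text node's storage once and copies straight into it, normalizing newlines on the way. All names (`TextStats`, `ScanText`, `CopyNormalizedInto`, `TextNodeStorage`, `MakeTextNode`) are hypothetical. The sketch assumes CRLF and lone CR both become LF, and it uses standard strings rather than nsTextFragment.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical summary the scanner could record during the pass it already makes.
struct TextStats {
  bool isAscii = true;               // every code unit is below 0x80
  std::size_t normalizedLength = 0;  // length after CRLF/CR -> LF
};

TextStats ScanText(std::u16string_view text) {
  TextStats stats;
  for (std::size_t i = 0; i < text.size(); ++i) {
    char16_t c = text[i];
    if (c >= 0x80) stats.isAscii = false;
    // A CRLF pair collapses to one LF, so the CR is not counted.
    if (c == u'\r' && i + 1 < text.size() && text[i + 1] == u'\n') continue;
    ++stats.normalizedLength;
  }
  return stats;
}

// Copies into a buffer pre-sized to stats.normalizedLength, normalizing newlines.
// CharT is char for ASCII-only text and char16_t otherwise.
template <typename CharT>
void CopyNormalizedInto(std::u16string_view text, CharT* dest) {
  std::size_t out = 0;
  for (std::size_t i = 0; i < text.size(); ++i) {
    char16_t c = text[i];
    if (c == u'\r') {
      // CRLF: drop the CR; the LF is written on the next iteration.
      if (i + 1 < text.size() && text[i + 1] == u'\n') continue;
      c = u'\n';  // lone CR becomes LF
    }
    dest[out++] = static_cast<CharT>(c);
  }
}

// Hypothetical text node storage: 8-bit or 16-bit, allocated exactly once.
struct TextNodeStorage {
  bool is8Bit = false;
  std::string narrow;
  std::u16string wide;
};

TextNodeStorage MakeTextNode(std::u16string_view tokenText) {
  TextStats stats = ScanText(tokenText);  // in the proposal, the scanner has this already
  TextNodeStorage node;
  node.is8Bit = stats.isAscii;
  if (stats.isAscii) {
    node.narrow.resize(stats.normalizedLength);
    CopyNormalizedInto(tokenText, node.narrow.data());
  } else {
    node.wide.resize(stats.normalizedLength);
    CopyNormalizedInto(tokenText, node.wide.data());
  }
  return node;
}
```

With the statistics gathered up front, the intermediate SinkContext::mText buffer and the separate ASCII-detection walk both disappear. Only one copy remains, from the token into the final storage.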
Assignee: parser → nobody
QA Contact: parser
This bug seems moot now.
Whiteboard: [WONTFIX?]
(In reply to Henri Sivonen (:hsivonen) from comment #2)
> This bug seems moot now.

Is the same true for bug 17191 and bug 235255?
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #3)
> (In reply to Henri Sivonen (:hsivonen) from comment #2)
> > This bug seems moot now.
>
> Is the same true for bug 17191

Possibly, but, if so, for reasons independent of the parser.

> and bug 235255?

Likely yes.

Resolving this as WFM, since the code this bug was reported about is no longer relevant.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME