Closed
Bug 235500
Opened 21 years ago
Closed 9 years ago
improve nsIParserNode interface to reduce string copies
Categories
(Core :: DOM: HTML Parser, defect)
RESOLVED
WORKSFORME
People
(Reporter: darin.moz, Unassigned)
Details
(Keywords: perf, Whiteboard: [WONTFIX?])
nsIParserNode::GetText implies that the nsIParserNode has an nsAString member
that it can return. This requires the node implementation to allocate a buffer
for this data. We should consider changing this API so that any copying or
buffer allocation can be deferred. We have many options. I'm not sure what the
best solution is yet.
Comment 1•21 years ago
There are a couple of things that happen today during parsing on the road from
scanner buffer to nsTextFragment in the resulting nsTextNode...
1. The scanner looks at every character in a piece of text, looking for the end
of the text fragment.
2. The data is copied from the shared scanner buffer in the parse node (token)
into an intermediate buffer in the html sink (SinkContext::mText). During this
copy newlines are normalized (using
nsContentUtils::CopyNewlineNormalizedUnicodeTo()).
3. The data is copied from the intermediate buffer into the text node (malloced
buffer, except for single \n text fragments). Before the data is copied, we walk
the string to check whether it is ASCII. If it is, we convert to ASCII while
copying to the final buffer; if not, we just copy.
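To make the cost of the three steps above concrete, here is a rough standalone sketch of the same shape of pipeline. It is not the Gecko code: FindTextRunEnd, CopyNewlineNormalized, StoreInTextNode and StoredText are hypothetical names, std::u16string stands in for the Mozilla string classes, a text run is assumed to end at '<', and newline normalization is assumed to mean that CRLF and lone CR become LF. The special case for single \n fragments is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the text node's final storage.
struct StoredText {
  bool is8Bit = false;
  std::string narrow;    // used when the text is ASCII-only
  std::u16string wide;   // used otherwise
};

// Step 1 (scanner): walk every character to find the end of the text run.
// Assumption for this sketch: a run ends at '<' or at the end of the buffer.
std::size_t FindTextRunEnd(const std::u16string& buf, std::size_t start) {
  assert(start <= buf.size());
  std::size_t i = start;
  while (i < buf.size() && buf[i] != u'<') ++i;
  return i;
}

// Step 2 (sink): second walk, first copy. Copies into an intermediate
// buffer while turning CRLF and lone CR into LF.
std::u16string CopyNewlineNormalized(const std::u16string& buf,
                                     std::size_t start, std::size_t end) {
  assert(start <= end && end <= buf.size());
  std::u16string out;
  out.reserve(end - start);
  for (std::size_t i = start; i < end; ++i) {
    if (buf[i] == u'\r') {
      out.push_back(u'\n');
      if (i + 1 < end && buf[i + 1] == u'\n') ++i;  // swallow the LF of CRLF
    } else {
      out.push_back(buf[i]);
    }
  }
  return out;
}

// Step 3 (text node): third walk to test for ASCII, then a second copy
// into the final buffer, narrowing when possible.
StoredText StoreInTextNode(const std::u16string& text) {
  StoredText node;
  bool ascii = true;
  for (char16_t c : text) {
    if (c > 0x7F) { ascii = false; break; }
  }
  node.is8Bit = ascii;
  if (ascii) {
    node.narrow.reserve(text.size());
    for (char16_t c : text) node.narrow.push_back(static_cast<char>(c));
  } else {
    node.wide = text;
  }
  return node;
}
```

In this sketch an ASCII run is walked four times (scan, normalize, ASCII check, narrowing copy) and copied twice before it reaches its final buffer.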
It'd be good to eliminate some of this string-walking and string copying if at
all possible. Ideally the scanner, which already looks at every character in a
text fragment, would check if the data is ASCII or not and store that along with
the data, and the scanner could also keep track of what's needed to know how
long the string will be after the newline normalization has been performed. That
way, we could have an API on nsIParserNode that the sink could use to check if a
parser node contains ASCII-only data or Unicode, and the sink could create a
text node, tell it to pre-allocate storage, enough to fit the whole string,
after newlines have been normalized. And with an API on nsIParserNode to copy
its data into a pre-allocated buffer, we'd be all set. No extra copies, no extra
string-walking to normalize newlines before copying, no more SinkContext::mText,
or any of that.
Does this sound at all reasonable?
This is related to bug 235255, btw.
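A minimal sketch of what the proposal in this comment could look like, under stated assumptions. None of these names exist on nsIParserNode: TextRun, ScanTextRun, CopyNormalizedTo, TextNode and MakeTextNode are hypothetical, std::string and std::u16string stand in for the text node's storage, a run is assumed to end at '<', and normalization is assumed to mean CRLF and lone CR become LF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical parser node: a view into the scanner buffer plus the
// metadata the scanner collected during its single walk.
struct TextRun {
  const char16_t* data;          // not owned; points into the scanner buffer
  std::size_t length;            // raw length, before normalization
  std::size_t normalizedLength;  // length after CRLF/CR -> LF
  bool isAscii;                  // true if every code unit is <= 0x7F
};

// The scanner's one pass: finds the end of the run and records whether it
// is ASCII-only and how long it will be once newlines are normalized.
TextRun ScanTextRun(const char16_t* buf, std::size_t size, std::size_t start) {
  assert(start <= size);
  TextRun run{buf + start, 0, 0, true};
  std::size_t i = start;
  for (; i < size && buf[i] != u'<'; ++i) {
    char16_t c = buf[i];
    if (c > 0x7F) run.isAscii = false;
    // The LF of a CRLF pair disappears during normalization: do not count it.
    bool lfAfterCr = c == u'\n' && i > start && buf[i - 1] == u'\r';
    if (!lfAfterCr) ++run.normalizedLength;
  }
  run.length = i - start;
  return run;
}

// Copies the run into caller-owned storage of at least normalizedLength
// elements, normalizing newlines on the way. Use CharT = char only for
// ASCII-only runs.
template <typename CharT>
void CopyNormalizedTo(const TextRun& run, CharT* dest) {
  assert(sizeof(CharT) > 1 || run.isAscii);
  for (std::size_t i = 0; i < run.length; ++i) {
    char16_t c = run.data[i];
    if (c == u'\r') {
      *dest++ = static_cast<CharT>('\n');
      if (i + 1 < run.length && run.data[i + 1] == u'\n') ++i;
    } else {
      *dest++ = static_cast<CharT>(c);
    }
  }
}

// Hypothetical text node storage.
struct TextNode {
  bool is8Bit = false;
  std::string narrow;
  std::u16string wide;
};

// Sink side: one exact-size allocation and one copy, no intermediate buffer.
TextNode MakeTextNode(const TextRun& run) {
  TextNode node;
  node.is8Bit = run.isAscii;
  if (run.isAscii) {
    node.narrow.resize(run.normalizedLength);
    CopyNormalizedTo(run, &node.narrow[0]);
  } else {
    node.wide.resize(run.normalizedLength);
    CopyNormalizedTo(run, &node.wide[0]);
  }
  return node;
}
```

The point of the sketch is that the ASCII check and the normalized length fall out of the walk the scanner already does, so the sink needs no scratch buffer like SinkContext::mText and the data is copied once.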
Updated•16 years ago
Assignee: parser → nobody
QA Contact: parser
Comment 3•9 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #2)
> This bug seems moot now.
Is the same true for bug 17191 and bug 235255 ?
Comment 4•9 years ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #3)
> (In reply to Henri Sivonen (:hsivonen) from comment #2)
> > This bug seems moot now.
>
> Is the same true for bug 17191
Possibly, but, if so, for reasons independent of the parser.
> and bug 235255 ?
Likely yes.
Resolving this as WFM, since the code this bug was reported about is no longer relevant.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME