Bug 368015 (Open): Opened 18 years ago, Updated 2 years ago

nsScannerBufferList::AllocBuffer shows up in profiles

Categories

(Core :: XML, defect)

Platform

x86 Linux

People

(Reporter: bzbarsky, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Comment 0:

Well, more precisely, its call to malloc() does. Are buffer length distributions such that we could use some sort of arena allocation here?
Blocks: 350228
Comment 1:

(In reply to comment #0)
> Are buffer length distributions such that we could use some sort of arena
> allocation here?

I just took a quick look; the answer is yes. For the testcase in bug 350228, the allocated buffer was almost always exactly the same size (a few bytes over 8KB). In fact, I think it's almost always less than or equal to that size (at least given the encoding of the page), so the buffer could probably be recycled.
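Since the buffers are almost all the same size, the recycling idea above can be as simple as a free list that hands old buffers back out instead of calling malloc() each time. This is an illustrative sketch, not Mozilla's actual allocator; the class name and interface are hypothetical:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical fixed-size buffer recycler: malloc() is only hit when the
// free list is empty, so steady-state parsing reuses the same few buffers.
class BufferRecycler {
 public:
  explicit BufferRecycler(size_t bufSize) : mBufSize(bufSize) {}
  ~BufferRecycler() {
    for (void* buf : mFreeList) free(buf);
  }
  void* Alloc() {
    if (!mFreeList.empty()) {
      void* buf = mFreeList.back();  // recycled: no malloc() call
      mFreeList.pop_back();
      return buf;
    }
    return malloc(mBufSize);
  }
  void Free(void* buf) { mFreeList.push_back(buf); }
  size_t FreeCount() const { return mFreeList.size(); }

 private:
  size_t mBufSize;
  std::vector<void*> mFreeList;
};
```

This only works because the size distribution is so uniform; a general-purpose allocator can't make that assumption.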
Comment 2:

Hmm. That's a problem right there -- allocating something that's a tad over a multiple of the page size is silly. Why are we doing that?
Comment 3:

(In reply to comment #2)
> Hmm. That's a problem right there -- allocating something that's a tad over a
> multiple of the page size is silly. Why are we doing that?

The reason is that the data comes out of the input stream in 4KB chunks. The data (which I think is assumed to be ISO-8859-1) therefore decodes to 4096 characters. That, plus a null terminator and the overhead of a buffer object, adds up to a little over 8KB. We could possibly improve the situation for ISO-8859-1 by fixing things not to require the null terminator and having the buffer object point to the data, but that immediately breaks down for UTF-8, which requires a buffer one byte bigger. If the page sizes really make a difference, it might be worth some extra calls into nsIUnicodeDecoder::Convert to put the data into page-sized buffers rather than decoding the full 4KB chunks all at once. That would also make recycling much easier.
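The arithmetic above can be written out. Only the 4KB chunk size and the UTF-16 character width come from the comment; the buffer-object overhead here is an assumed placeholder, not the real nsScannerBuffer layout:

```cpp
#include <cstddef>

constexpr size_t kChunkBytes = 4096;            // raw stream chunk size
constexpr size_t kHeaderBytes = 24;             // hypothetical object header
constexpr size_t kCharSize = sizeof(char16_t);  // UTF-16 code unit

// ISO-8859-1 decodes one byte to one UTF-16 unit, so a 4KB chunk becomes
// 4096 characters, plus a null terminator, plus the header: just over 8KB.
constexpr size_t DecodedAllocation(size_t chunkBytes) {
  return kHeaderBytes + (chunkBytes + 1) * kCharSize;
}
```

With the assumed 24-byte header, a 4096-byte chunk needs 24 + 4097 * 2 = 8218 bytes, which is the "a few bytes over 8KB" pattern observed in the profiles.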
Comment 4:

That wouldn't do quite enough, since we want a bit less than 4KB of data (because we want the malloc overhead plus the data size to total about 4KB). I guess we should try arena-allocating this stuff and see.
Comment 5:

(In reply to comment #4)
> That wouldn't do quite enough, since we want a bit less than 4KB of data
> (because we want the malloc overhead plus data size to be about 4KB).

Okay, I've looked at the code a bit more carefully (and come up with a code cleanup patch that makes nsScanner a lot simpler/clearer; but that's for a different bug). The flow works like this: first the parser gets notified about 64KB of content (usually, although not always; I'm not completely sure why). Then, in ParserWriteFunc, all of it gets appended to the scanner in 4KB chunks. Then the parser resumes and parses the data that just got converted (128KB, stored in malloc'ed chunks of slightly over 8KB). The allocated memory gets freed as the data is parsed. Arena allocation would probably help; see bug 105138.
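A bump-pointer arena along the lines suggested would fit the "convert a batch, parse it, discard it" flow described above: pointers are handed out from large pre-allocated blocks and everything is released at once. This is a generic sketch, not the Mozilla arena referenced in the follow-up bugs; the block size and interface are illustrative:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Generic bump-pointer arena sketch (alignment is ignored for brevity).
class Arena {
 public:
  // Hands back the next n bytes of the current block, grabbing a fresh
  // block when the current one can't fit the request. No per-allocation
  // malloc() once a block exists.
  void* Alloc(size_t n) {
    if (mBlocks.empty() || mUsed + n > kBlockSize) {
      mBlocks.emplace_back(new char[kBlockSize]);
      mUsed = 0;
    }
    void* p = mBlocks.back().get() + mUsed;
    mUsed += n;
    return p;
  }

  // Releases every block at once, matching the parser's batch lifetime.
  void Reset() {
    mBlocks.clear();
    mUsed = 0;
  }

 private:
  static constexpr size_t kBlockSize = 64 * 1024;
  std::vector<std::unique_ptr<char[]>> mBlocks;
  size_t mUsed = 0;
};
```

The win is that a 64KB batch of slightly-over-8KB buffers costs a handful of block allocations instead of one malloc() per buffer, and freeing is a single Reset().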
Comment 6:

(In reply to comment #5)
> Arena allocation would probably help; see Bug 105138.

I meant bug 116023.
Comment 7:

(In reply to comment #5)
> Okay, I've looked at the code a bit more carefully (and come up with a code
> cleanup patch that makes nsScanner a lot simpler/clearer; but that's for a
> different bug).

I'll gladly review this!
Assignee: mrbkap → nobody
Component: HTML: Parser → XML
Severity: normal → S3