Open Bug 489822 Opened 15 years ago Updated 2 years ago

[HTML5] Read input once and write to tree builder directly

Categories

(Core :: DOM: HTML Parser, enhancement, P5)

Other Branch
enhancement

People

(Reporter: hsivonen, Unassigned)

References

(Depends on 1 open bug)

Details

As an artifact of SAX, the HTML5 tokenizer tries to batch its character data transfers from the input buffer into the tree builder's accumulation buffer by flushing runs of characters via memcpy.

However, by the time a run is flushed, the tokenizer has already read every character in it once. If writing each character into the accumulation buffer as it is read were cheaper than the amortized per-character cost of the memcpy (which re-reads and then writes the same characters), it would be worthwhile to write characters into the accumulation buffer one by one.
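To make the trade-off concrete, here is a minimal sketch of the two strategies. It is not the Gecko code: `AccumulationBuffer`, `collectTextBatched`, and `collectTextDirect` are hypothetical names, and the "tokenizer" is reduced to a toy loop that treats everything between `<` and `>` as markup and everything else as character data.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the tree builder's accumulation buffer.
struct AccumulationBuffer {
  std::u16string chars;

  // Batched path: copy a whole run with one memcpy (re-reads the run).
  void appendRun(const char16_t* src, std::size_t len) {
    if (len == 0) return;
    std::size_t old = chars.size();
    chars.resize(old + len);
    std::memcpy(&chars[old], src, len * sizeof(char16_t));
  }

  // Per-character path: small enough for the compiler to inline.
  void appendChar(char16_t c) { chars.push_back(c); }
};

// Current approach (sketch): the loop reads every character to find the
// end of a text run, then memcpy reads the same characters a second time.
std::u16string collectTextBatched(const std::u16string& input) {
  AccumulationBuffer buf;
  bool inTag = false;
  std::size_t runStart = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    char16_t c = input[i];
    if (!inTag && c == u'<') {
      buf.appendRun(input.data() + runStart, i - runStart);  // flush run
      inTag = true;
    } else if (inTag && c == u'>') {
      inTag = false;
      runStart = i + 1;  // next run starts after the tag
    }
  }
  if (!inTag) {
    buf.appendRun(input.data() + runStart, input.size() - runStart);
  }
  return buf.chars;
}

// Proposed approach (sketch): each character is read once and written
// straight into the accumulation buffer.
std::u16string collectTextDirect(const std::u16string& input) {
  AccumulationBuffer buf;
  bool inTag = false;
  for (char16_t c : input) {
    if (inTag) {
      if (c == u'>') inTag = false;
    } else if (c == u'<') {
      inTag = true;
    } else {
      buf.appendChar(c);
    }
  }
  return buf.chars;
}
```

Both functions produce the same text; which one is faster would have to be measured, since it depends on how cheap the per-character write turns out to be relative to the amortized memcpy.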

This means the tokenizer/tree builder boundary can't become fully virtual (for sanitization layers or similar), since the per-character write should be inlineable.
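The inlining concern can be illustrated with a small sketch, again using hypothetical names (`CharacterSink`, `RecordingSink`, `DirectBuilder`, `feedVirtual`, `feedStatic`) rather than the parser's real interfaces. With a virtual boundary, every character generally costs an indirect call unless the compiler manages to devirtualize it; with the concrete builder type known at compile time, the per-character write can be inlined into the tokenizer loop.

```cpp
#include <string>

// Virtual boundary (sketch): any layer, such as a sanitizer, could be
// plugged in, but each character goes through a virtual call.
class CharacterSink {
 public:
  virtual ~CharacterSink() = default;
  virtual void characters(char16_t c) = 0;
};

class RecordingSink final : public CharacterSink {
 public:
  void characters(char16_t c) override { text.push_back(c); }
  std::u16string text;
};

void feedVirtual(const std::u16string& input, CharacterSink& sink) {
  for (char16_t c : input) {
    sink.characters(c);  // indirect call per character
  }
}

// Static boundary (sketch): the builder type is a template parameter,
// so the compiler sees the body of characters() and may inline it.
struct DirectBuilder {
  std::u16string text;
  void characters(char16_t c) { text.push_back(c); }
};

template <typename Builder>
void feedStatic(const std::u16string& input, Builder& builder) {
  for (char16_t c : input) {
    builder.characters(c);  // direct call, inlineable
  }
}
```

Templates are only one way to keep the call statically bound; the point is that the tokenizer must know the concrete write operation at compile time for the per-character path to pay off.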

This would probably require bug 489820 and bug 489821 as prerequisites.
Priority: -- → P5
Severity: normal → S3