[HTML5] Read input once and write to tree builder directly

NEW
Unassigned

Status


Core
HTML: Parser
P5
enhancement
9 years ago
8 years ago

People

(Reporter: hsivonen, Unassigned)

Tracking

(Depends on: 1 bug)


Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

9 years ago
As an artifact of its SAX heritage, the HTML5 tokenizer batches character data transfers from the input buffer into the tree builder's accumulation buffer: it flushes runs of characters via memcpy.

However, by the time a run is flushed, the tokenizer has already read each of those characters once. If writing to the accumulation buffer one character at a time were cheaper than the amortized per-character cost of the memcpy (a second read plus a write), it would be worthwhile to write characters one by one into the accumulation buffer.
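To make the comparison concrete, here is a minimal sketch of the two strategies. It is not the parser's actual code: `AccumulationBuffer`, `tokenizeBatched`, and `tokenizeDirect` are hypothetical names, and treating `<` as the only character that ends a text run is a simplifying assumption.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the tree builder's accumulation buffer.
struct AccumulationBuffer {
  std::vector<char16_t> data;

  // Batched path: copy a whole run of characters with one memcpy.
  void appendRun(const char16_t* src, std::size_t len) {
    if (len == 0) return;
    std::size_t old = data.size();
    data.resize(old + len);
    std::memcpy(data.data() + old, src, len * sizeof(char16_t));
  }

  // Per-character path: small enough for the compiler to inline.
  void appendChar(char16_t c) { data.push_back(c); }
};

// Strategy A: scan for the end of a text run, then flush the run.
// Every character is read once by the scan and again by the memcpy.
void tokenizeBatched(const std::u16string& in, AccumulationBuffer& out) {
  std::size_t runStart = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == u'<') {  // assumed run delimiter, dropped from the output
      out.appendRun(in.data() + runStart, i - runStart);
      runStart = i + 1;
    }
  }
  out.appendRun(in.data() + runStart, in.size() - runStart);
}

// Strategy B: write each character as soon as it has been read.
// Every character is read exactly once.
void tokenizeDirect(const std::u16string& in, AccumulationBuffer& out) {
  for (char16_t c : in) {
    if (c != u'<') out.appendChar(c);
  }
}
```

Both functions leave the same contents in the buffer; which one is faster is exactly the open question here and would need to be measured.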

Writing per character means the tokenizer/tree builder boundary can't become fully virtual (for sanitization layers and the like), since the per-character write should be inlineable.
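A rough illustration of that trade-off, with hypothetical names throughout (`CharSink`, `VirtualTreeBuilder`, `DirectTreeBuilder`, `feedVirtual`, `feedDirect`): a virtual boundary lets a sanitizing layer be interposed at run time but generally costs an indirect call per character, while a statically typed boundary lets the compiler inline the write.

```cpp
#include <string>

// Hypothetical virtual boundary. A sanitizer could implement CharSink and
// wrap another sink, but each character goes through an indirect call.
struct CharSink {
  virtual ~CharSink() = default;
  virtual void appendChar(char16_t c) = 0;
};

struct VirtualTreeBuilder final : CharSink {
  std::u16string text;
  void appendChar(char16_t c) override { text.push_back(c); }
};

// Hypothetical concrete tree builder with a non-virtual write.
struct DirectTreeBuilder {
  std::u16string text;
  void appendChar(char16_t c) { text.push_back(c); }
};

// The callee is only known at run time, so the call is generally not inlined.
void feedVirtual(const std::u16string& in, CharSink& sink) {
  for (char16_t c : in) sink.appendChar(c);
}

// The builder type is a template parameter, so appendChar is resolved at
// compile time and can be inlined into the loop.
template <typename Builder>
void feedDirect(const std::u16string& in, Builder& builder) {
  for (char16_t c : in) builder.appendChar(c);
}
```

The templated form keeps the boundary swappable at compile time only, which is the sense in which it can't be "fully virtual".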

This would probably require bug 489820 and bug 489821 as prerequisites.
(Reporter)

Updated

8 years ago
Priority: -- → P5