Closed Bug 489820 Opened 11 years ago Closed 4 years ago

[HTML5] Avoid repeated conditionals upon buffer append by sizing buffers for worst case

Categories

(Core :: DOM: HTML Parser, enhancement, P4)

Other Branch
enhancement

Tracking


RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Blocks 3 open bugs)

Details

Attachments

(4 files, 17 obsolete files)

5.42 KB, patch
hsivonen
: review+
Details | Diff | Splinter Review
9.05 KB, patch
hsivonen
: review+
Details | Diff | Splinter Review
15.00 KB, patch
wchen
: review+
Details | Diff | Splinter Review
34.99 KB, patch
wchen
: review+
Details | Diff | Splinter Review
Various buffers are grown dynamically on a per-append basis. This involves an available-space check on every append.

A given call into the tokenizer cannot cause the buffers to grow by more than the length of the input buffer. Thus, it would be possible to use more RAM but execute fewer conditionals by reserving buffer space for the current size plus the length of the tokenizer input buffer.
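The trade-off described above can be sketched as follows. This is a minimal illustration with invented names, not the actual nsHtml5* code: the left-hand approach branches on every append, the right-hand approach branches once per tokenizer call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: per-append capacity checks vs. reserving
// the worst case (current size + input length) once per tokenizer call.
class CharBuffer {
public:
  // Per-append check: one branch for every character appended.
  void AppendChecked(char16_t c) {
    if (mLen == mBuf.size()) {
      mBuf.resize(mBuf.empty() ? 16 : mBuf.size() * 2);
    }
    mBuf[mLen++] = c;
  }

  // Worst-case reservation: a single branch per tokenizer call.  In the
  // simple model of comment 0, one call cannot grow the buffer by more
  // than the input length, so one resize up front suffices.
  void ReserveWorstCase(std::size_t aInputLength) {
    if (mLen + aInputLength > mBuf.size()) {
      mBuf.resize(mLen + aInputLength);
    }
  }
  void AppendUnchecked(char16_t c) { mBuf[mLen++] = c; }

  std::size_t Length() const { return mLen; }

private:
  std::vector<char16_t> mBuf;
  std::size_t mLen = 0;
};
```

The cost is potential over-allocation; the benefit is that the hot append path has no conditional at all.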
Blocks: 489822
Priority: -- → P4
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
http://hg.mozilla.org/mozilla-central/rev/a0f0fde99844
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
The fix here was broken. See bug 555462.

It was broken because it didn't take into account that there can be appendable data baked into the current tokenizer state (either a failed character reference in strBuf or something like a single '<' as part of the state itself).

Rather than trying to make things more clever here, I think the right fix for now is to back this out instead of digging the hole deeper. Then, if we really want this later, it should be re-attempted as a well-isolated landing.
Depends on: 555462
Note to self: This solution needs special casing for the U+0000 → U+FFFD mapping if the tokenizer is ever changed to operate on UTF-8 instead of UTF-16.
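The reason for that note: the U+0000 → U+FFFD replacement preserves length in UTF-16 (both are one code unit) but expands in UTF-8 (one byte becomes three). A small sketch of the encoding arithmetic, assuming nothing about the parser itself:

```cpp
#include <cstddef>

// Code-unit counts per scalar value.  These are plain UTF-8/UTF-16
// encoding facts, not parser code: U+0000 is 1 byte in UTF-8 while its
// replacement U+FFFD is 3 bytes, so the 0x00 -> U+FFFD mapping can grow
// a UTF-8 buffer by up to 3x; in UTF-16 both occupy one code unit.
constexpr std::size_t Utf8Bytes(char32_t c) {
  return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
}
constexpr std::size_t Utf16Units(char32_t c) {
  return c < 0x10000 ? 1 : 2;
}
```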
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 695313
Blocks: 569328
Blocks: 1027585
Blocks: 1029671
Depends on: 1173818
The patch for bug 1176668 goes between part 1 and part 2 here, because I thought part 1 here had broken NCR handling. By the time I realized I had found an ancient bug, the patches were already entangled enough that it's not worthwhile to disentangle them.
Status: REOPENED → ASSIGNED
Attachment #8625268 - Flags: review?(wchen)
Attachment #8625269 - Flags: review?(wchen)
Blocks: 1176681
Comment on attachment 8625277 [details] [diff] [review]
m-c part 3: Size the buffers to worst case before each tokenization call

Oops. Gecko-specific code in part 3 is still wrong.
Attachment #8625277 - Flags: review?(wchen)
So the key insight here is that (when the input to the HTML parser is UTF-16 and the output is UTF-16, as is the case in Gecko) the HTML parsing algorithm never expands strings: it either keeps their length the same or contracts them. Specifically, various markup does not end up in subsequent buffers at all, a single character reference is always at least 4 UTF-16 code units long in the input and at most two UTF-16 code units in the output, and while U+0000 can turn into U+FFFD, the two are equally long (one code unit) in UTF-16.

Part 3 of the patch set still crashes, because I forgot about View Source. Oops...
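The contraction invariant claimed above can be spot-checked with a few hand-picked markup/output pairs. The table below is illustrative only, not the parser's own data, and the helper names are invented:

```cpp
#include <cstddef>

// (input as written, character data it produces); the claim is that the
// output is never longer than the input in UTF-16 code units.
struct Sample {
  const char16_t* input;
  const char16_t* output;
};

inline std::size_t Units(const char16_t* s) {
  std::size_t n = 0;
  while (s[n]) ++n;
  return n;
}

inline bool NeverExpands(const Sample* aSamples, std::size_t aCount) {
  for (std::size_t i = 0; i < aCount; ++i) {
    if (Units(aSamples[i].output) > Units(aSamples[i].input)) {
      return false;
    }
  }
  return true;
}
```

For example, "&amp;" is 5 code units in and 1 out, "&#x1F600;" is 9 in and 2 out (a surrogate pair), and tag markup like "<b>" contributes no character data at all.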
It's a bit unfortunate that this makes tokenizeBuffer() a footgun in the sense that you must consistently call another method right before it or memory unsafety ensues. Still, this solution seems to be the most sensible one given two realities: the existing code was written to constraints that assumed the allocator subsystem would take care of stopping the parser when needed, and that part of the "infallible" allocator plan has since been dropped.
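One hypothetical way to blunt that footgun is an API shape that pairs the reservation with the tokenize call so callers cannot skip it. All names below are invented for the sketch; this is not the Gecko nsHtml5Tokenizer API:

```cpp
#include <cstddef>

// Make the unchecked tokenize entry point private and expose only a
// method that performs the worst-case reservation first, so the two can
// never be decoupled by a caller.
class SafeTokenizer {
 public:
  // Returns false on allocation failure instead of proceeding unsafely.
  bool TokenizeBuffer(const char16_t* aBuf, std::size_t aLength) {
    if (!EnsureBufferSpace(aLength)) {
      return false;
    }
    TokenizeBufferUnchecked(aBuf, aLength);
    return true;
  }

  std::size_t Consumed() const { return mConsumed; }

 private:
  bool EnsureBufferSpace(std::size_t) { return true; }  // stub for the sketch
  void TokenizeBufferUnchecked(const char16_t*, std::size_t aLength) {
    mConsumed += aLength;  // stand-in for real tokenization
  }
  std::size_t mConsumed = 0;
};
```

The design choice is the one the comment above alludes to: with infallible allocation no longer stopping the parser, the boolean return becomes the signal that tokenization must not proceed.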
Also: I have noticed the opportunity to save memory by unifying the buffer in the tokenizer and the buffer in the tree builder. However, this patch queue is risky enough as-is and there's no way I'd get the unification done before I go out of office for 3 weeks, so I'm leaving the unification as a follow-up.
Attachment #8625261 - Attachment is obsolete: true
Attachment #8625261 - Flags: review?(wchen)
Comment on attachment 8625679 [details] [diff] [review]
htmlparser repo Part 1: Consolidate strBuf and longStrBuf (comments fixed in a follow-up), v2

Moving the buffer consolidation to bug 559303 where it belongs to make it easier to explain the order in which this patch queue needs to land.
Attachment #8625679 - Attachment is obsolete: true
Attachment #8625680 - Attachment is obsolete: true
Attachment #8625269 - Attachment is obsolete: true
Attachment #8625269 - Flags: review?(wchen)
Attachment #8625268 - Attachment is obsolete: true
Attachment #8625268 - Flags: review?(wchen)
(In reply to Henri Sivonen (:hsivonen) from comment #6)
> The patch for bug 1176668 goes between part 1 and part 2 here, because I
> thought part 1 here had broken NCR handling. By the time I realized I had
> found an ancient bug, the patches were already entangled enough that it's
> not worthwhile to disentangle them.

This no longer applies. Now the order is simply: bug 559303, bug 1176668, this bug.
Attachment #8625701 - Flags: review?(wchen)
Attachment #8625702 - Flags: review?(wchen)
Attachment #8625703 - Flags: review?(wchen)
Comment on attachment 8625705 [details] [diff] [review]
m-c part 2: Size the buffers to worst case before each tokenization call, vN

See bug 1029671 comment 36 for the big picture of the whole queue and about landing.
Attachment #8625705 - Flags: review?(wchen)
Attachment #8625701 - Flags: review?(wchen) → review+
Attachment #8625702 - Flags: review?(wchen) → review+
Comment on attachment 8625703 [details] [diff] [review]
htmlparser repo part 2: Size the buffers to worst case before each tokenization call

Review of attachment 8625703 [details] [diff] [review]:
-----------------------------------------------------------------

I took a look at the crashes and it seems like it's essentially the same problem as comment 2, and the insight of comment 13 doesn't hold.

The W2 test that is causing the crash looks like this (I've simplified the example):

var s = "<script></a";
for (var i=0; i<s.length; i++) {
  document.write(s[i]);
}

Here is what I think is happening: it's writing to the tokenizer one character at a time, so we don't grow the buffer by more than 2 characters (mozilla::RoundUpPow2(1)). At the same time, this is causing a change in tokenizer state. When the tokenizer consumes the final "a", it's in a state that will cause it to output "</a", but the buffer can't hold that many characters and overflows.
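The failure mode can be modeled with a toy version of the script-data end-tag handling (invented names, not the real tokenizer): inside <script>, '<' and '/' are buffered as tokenizer state, and a single further character of input can flush all three at once, so sizing the buffer by the current input length alone undercounts the worst case.

```cpp
#include <string>

// Toy reproduction of the crash scenario: feeding one code unit of
// input can emit up to three code units of output once pending state
// ("</") is flushed.
struct ScriptDataToy {
  std::u16string stateBuf;  // "</" held as tokenizer state

  // Returns the characters emitted for one input character; the result
  // may be longer than one code unit.
  std::u16string Feed(char16_t c) {
    if (c == u'<' || (c == u'/' && stateBuf == u"<")) {
      stateBuf += c;  // buffered, nothing emitted yet
      return u"";
    }
    std::u16string out = stateBuf;  // flush pending "</"
    out += c;
    stateBuf.clear();
    return out;
  }
};
```

The correct per-call bound therefore has to include the code units already baked into the tokenizer state, not just the length of the incoming buffer.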
Attachment #8625703 - Flags: review?(wchen) → review-
Comment on attachment 8625705 [details] [diff] [review]
m-c part 2: Size the buffers to worst case before each tokenization call, vN

Review of attachment 8625705 [details] [diff] [review]:
-----------------------------------------------------------------

::: parser/html/nsHtml5TreeBuilderCppSupplement.h
@@ +894,5 @@
>  
>  void
>  nsHtml5TreeBuilder::accumulateCharacters(const char16_t* aBuf, int32_t aStart, int32_t aLength)
>  {
> +  memcpy(charBuffer + charBufferLen, aBuf + aStart, sizeof(char16_t) * aLength);

We overflow here. There isn't an assertion here, which is why you weren't getting anything from debug builds.
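A minimal sketch of the kind of assertion that would have caught this before the unchecked memcpy. The field names mirror the quoted code, but the struct below is a standalone illustration, not the actual nsHtml5TreeBuilder:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assert the invariant that worst-case sizing is supposed to establish
// before doing the unchecked copy, so a sizing bug trips immediately in
// debug builds instead of silently corrupting memory.
struct ToyTreeBuilder {
  char16_t* charBuffer;
  int32_t charBufferLen;
  int32_t charBufferCapacity;

  void AccumulateCharacters(const char16_t* aBuf, int32_t aStart,
                            int32_t aLength) {
    // The invariant the worst-case reservation is meant to guarantee:
    assert(charBufferLen + aLength <= charBufferCapacity);
    memcpy(charBuffer + charBufferLen, aBuf + aStart,
           sizeof(char16_t) * aLength);
    charBufferLen += aLength;
  }
};
```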
Attachment #8625705 - Flags: review?(wchen) → review-
Comment on attachment 8626084 [details] [diff] [review]
m-c part 3 - undo part of part 2 to avoid crashing on web-platform-tests-2

Review of attachment 8626084 [details] [diff] [review]:
-----------------------------------------------------------------

This patch shouldn't be necessary if you fix the previously mentioned bug.
Attachment #8626084 - Flags: review?(wchen) → review-
Attachment #8626085 - Flags: review?(wchen)
Attachment #8626084 - Attachment is obsolete: true
Attachment #8626085 - Attachment is obsolete: true
Thanks. Attaching patch 1 with an updated hg comment to keep track of stuff.
Attachment #8625701 - Attachment is obsolete: true
Attachment #8644266 - Flags: review+
Thanks for finding the bug! Attaching revised part 2.
Attachment #8625703 - Attachment is obsolete: true
Attachment #8644268 - Flags: review?(wchen)
Attachment #8644269 - Flags: review?(wchen)
Comment on attachment 8644268 [details] [diff] [review]
htmlparser repo part 2: Size the buffers to worst case before each tokenization call, revised

Looks good. Patch introduces some new trailing whitespace which should be removed.
Attachment #8644268 - Flags: review?(wchen) → review+
Comment on attachment 8644269 [details] [diff] [review]
m-c part 2: Size the buffers to worst case before each tokenization call, revised

Trailing whitespace
Attachment #8644269 - Flags: review?(wchen) → review+
Thanks. Landed with the trailing whitespace fixed. I think I'll file a follow-up to zap all trailing whitespace from the Java code, after which I can make my editor enforce it.
https://hg.mozilla.org/mozilla-central/rev/c5c201250f2a
https://hg.mozilla.org/mozilla-central/rev/bb89c4d2d004
Status: ASSIGNED → RESOLVED
Closed: 10 years ago → 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Depends on: 1286911
Depends on: 1309834