Closed Bug 1410848 Opened 7 years ago Closed 7 years ago

Upgrade all nsHtml5TreeOpExecutor::mFlushState checks to actually enforce correct state in all types of builds

Categories

(Core :: DOM: HTML Parser, defect)

Product:

Component:

Type:

defect

Priority:

Not set

Severity:

normal

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla58

Tracking Flags:

Tracking

Status

firefox-esr52

---

wontfix

firefox56

---

wontfix

firefox57

---

wontfix

firefox58

---

fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Blocks 1 open bug)

Details

(Keywords: sec-audit, Whiteboard: [adv-main58-][post-critsmash-triage])

Attachments

(1 file, 2 obsolete files)

Use RAII and MOZ_RELEASE_ASSERT 7 years ago Henri Sivonen (:hsivonen) 18.28 KB, patch		Details \| Diff \| Splinter Review
Use MOZ_RELEASE_ASSERT and RAII 7 years ago Henri Sivonen (:hsivonen) 21.07 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
Patch with C++11isms reverted 7 years ago Henri Sivonen (:hsivonen) 21.37 KB, patch	smaug : review+	Details \| Diff \| Splinter Review

Henri Sivonen (:hsivonen)

Assignee

Description

•

7 years ago

Use-after-free avoidance on nsHtml5TreeOpExecutor::mOpQueue is based on the checks of nsHtml5TreeOpExecutor::mFlushState doing the right thing. The problem is that some of those checks are mere NS_ASSERTION or NS_PRECONDITION. Those don't actually enforce anything. We should upgrade NS_ASSERTION/NS_PRECONDITION checks of mFlushState to MOZ_RELEASE_ASSERT. In some cases, we might be able to avoid MOZ_CRASHing by making them non-crashing early returns instead. Bug 1364399 is the closest we have to steps to reproduce a violation of these constraints.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 1

•

7 years ago

But MOZ_RELEASE_ASSERT just makes us crash safely. We should ensure the assertion never ever fires.

Henri Sivonen (:hsivonen)

Assignee

Updated

•

7 years ago

Assignee: nobody → hsivonen

Status: NEW → ASSIGNED

Henri Sivonen (:hsivonen)

Assignee

Comment 2

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6897ae76a012fe8a7c3d95b4d3e8cb129d374a85

Henri Sivonen (:hsivonen)

Assignee

Comment 3

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #1) > But MOZ_RELEASE_ASSERT just makes us crash safely. We should ensure the > assertion never ever fires. Here's how it's supposed to work: * RunFlushLoop() returns right way if another copy is already on the stack, thanks to mRunFlushLoopOnStack. * FlushDocumentWrite() returns after flushing speculative loads if there is already another flush loop executing ("if (mFlushState != eNotFlushing)"). * Ops can be added to the queue via: * When moving speculations over from the parser thread only after the last <script> has finished after all document.write layers have finished. (This is indeed rather hand-wavy.) * From within RunFlushLoop() before entering the actual flush loop. * From document.write(), in a manner which - Happens right before FlushDocumentWrite() is called. - Is not supposed to happen during RunFlushLoop() except as a tail call (after the mOpQueue processing) when executing a <script>. The interaction of doc update pauses with the last point is the most questionable thing.

Henri Sivonen (:hsivonen)

Assignee

Comment 4

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0ccd07230d454d17d3bc3fb906fd5c7bd9d4df8

Henri Sivonen (:hsivonen)

Assignee

Comment 5

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b48ef5dd7eb5f0df600ea62e421a574335aa2ea

Henri Sivonen (:hsivonen)

Assignee

Comment 6

•

7 years ago

The screenshot orange is likely a known problem from Tuesday. Trying again (this time browser-chrome is known orange): https://treeherder.mozilla.org/#/jobs?repo=try&revision=21849249e91435ae97da1f38e8ddd57c2b90791e

Henri Sivonen (:hsivonen)

Assignee

Comment 7

•

7 years ago

Attached patch Use RAII and MOZ_RELEASE_ASSERT (obsolete) — Details — Splinter Review

The assertions are upgraded. RAII is used to increase confidence in mFlushState transitions not getting forgotten. C++11 loops are used to enhance code readability.

Attachment #8922258 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 8

•

7 years ago

ranged-for loop are less safe than old style loops. Though, it shouldn't cause anymore security issues, since we'll just crash if the array is modified https://bugzilla.mozilla.org/show_bug.cgi?id=1299489

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 9

•

7 years ago

Comment on attachment 8922258 [details] [diff] [review] Use RAII and MOZ_RELEASE_ASSERT >+class MOZ_RAII nsHtml5AutoFlush final >+{ >+private: >+ RefPtr<nsHtml5TreeOpExecutor> mExecutor; >+ size_t mOpsToRemove; >+ >+public: >+ explicit nsHtml5AutoFlush(nsHtml5TreeOpExecutor* aExecutor) >+ : mExecutor(aExecutor) >+ , mOpsToRemove(aExecutor->OpQueueLength()) I don't understand this mOpsToRemove part. This looks like as if we were expecting the op queue to get more entries while nsHtml5AutoFlush is on stack. But modifying the opqueue array while iterating is not safe. I think I'm missing something here. What guarantees the op queue isn't modified while it is being iterated? Please explain and ask review again.

Attachment #8922258 - Flags: review?(bugs)

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Keywords: sec-audit

Henri Sivonen (:hsivonen)

Assignee

Comment 10

•

7 years ago

Comment on attachment 8922258 [details] [diff] [review] Use RAII and MOZ_RELEASE_ASSERT (In reply to Olli Pettay [:smaug] from comment #9) > Comment on attachment 8922258 [details] [diff] [review] > Use RAII and MOZ_RELEASE_ASSERT > > >+class MOZ_RAII nsHtml5AutoFlush final > >+{ > >+private: > >+ RefPtr<nsHtml5TreeOpExecutor> mExecutor; > >+ size_t mOpsToRemove; > >+ > >+public: > >+ explicit nsHtml5AutoFlush(nsHtml5TreeOpExecutor* aExecutor) > >+ : mExecutor(aExecutor) > >+ , mOpsToRemove(aExecutor->OpQueueLength()) > I don't understand this mOpsToRemove part. By default, the destructor of nsHtml5AutoFlush removes all ops in mOpQueue, but when we exit the flush loop early due to the having taken too much wall-clock time in the loop, we need to remove only the ops that we had time to loop over. That's why before exiting the loop in that case, mOpsToRemove is set to a smaller number. The number of ops in the queue can't change between the constructor and destructor of nsHtml5AutoFlush, so the default case is handled by initializing mOpsToRemove to the number of ops. mOpsToRemove exists so that the time-based interrupt case can be handled with RAII. (If the time-based interrupt didn't exist, mOpsToRemove wouldn't exist and the destructor would call ClearOpQueue().) > This looks like as if we were expecting the op queue to get more entries > while nsHtml5AutoFlush is on stack. No, there's an assertion checking that if mOpsToRemove is modified after the construction of nsHtml5AutoFlush, it is set to a smaller number than it was initialized to. > But modifying the opqueue array while iterating is not safe. Correct. > I think I'm missing something here. What guarantees the op queue isn't > modified while it is being iterated? All modifications MOZ_RELEASE_ASSERT on mFlushState == eNotFlushing. mFlushState is always set to != eNotFlushing while iterating over the queue (using nsHtml5AutoFlush around iteration).

Attachment #8922258 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 11

•

7 years ago

MOZ_*_ASSERT don't guarantee anything. I mean, sure we crash safely, but they should never fire. If one would remove the assertions, is there any way mOpQueue might change while it is being iterated?

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: core-security → dom-core-security

Henri Sivonen (:hsivonen)

Assignee

Comment 12

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #11) > MOZ_*_ASSERT don't guarantee anything. MOZ_RELEASE_ASSERT before each modification guarantees that if there's a way to try to cause a bad modification, it won't be a security bug. > I mean, sure we crash safely, but they should never fire. > If one would remove the assertions, is there any way mOpQueue might change > while it is being iterated? Per comment 3, there is not supposed to be, and there isn't anything in our test suite that triggers the MOZ_RELEASE_ASSERTs added here. As for the last remark in comment 3, here's how the document.write() part works: The nsHtml5AutoFlush in this patch sets mFlushState to not eNotFlushing. nsHTMLDocument, upon document.write(), calls nsHtml5Parser::IsInsertionPointDefined(), which return false when mFlushState is not eNotFlushing. If the insertion point isn't defined, document.write() blows away the document instead of adding ops to the op queue. (There are kungfuDeathGrips to prevent the nsHtml5TreeOpExecutor getting deleted immediately during the mOpQueue iteration when the document is blown away.) However, bug 1364399 indicates that some edge case existed when that bug was filed despite the reasoning in comment 3 and the paragraph above. I suggest we land this patch and fix the edge case if the MOZ_RELEASE_ASSERTs ever fire and give us steps to reproduce.

Henri Sivonen (:hsivonen)

Assignee

Comment 13

•

7 years ago

Comment on attachment 8922258 [details] [diff] [review] Use RAII and MOZ_RELEASE_ASSERT Going to rebase now that bug 1378079 has landed.

Attachment #8922258 - Attachment is obsolete: true

Attachment #8922258 - Flags: review?(bugs)

Henri Sivonen (:hsivonen)

Assignee

Comment 14

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c5a58c81f2c7d1b45c46a08c8049ac5b4376ba4

Henri Sivonen (:hsivonen)

Assignee

Comment 15

•

7 years ago

Attached patch Use MOZ_RELEASE_ASSERT and RAII (obsolete) — Details — Splinter Review

Henri Sivonen (:hsivonen)

Assignee

Updated

•

7 years ago

Attachment #8922691 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 16

•

7 years ago

FWIW, I'm not worried about document.write, but the case when data comes from network while we spin event loop inside some parser operation (because of running scripts there)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 17

•

7 years ago

I think http://searchfox.org/mozilla-central/rev/40e8eb46609dcb8780764774ec550afff1eed3a5/parser/html/nsHtml5TreeOpExecutor.cpp#829 is the code I'm worried. What guarantees that never runs when we're processing op queue?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 18

•

7 years ago

And saying that there is an assertion isn't the answer ;)

Flags: needinfo?(hsivonen)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 19

•

7 years ago

The worry with just adding release asserts I have is that the possible security critical issue there is happens only if the current array allocation isn't enough for taking more elements to the array, so that the array capacity needs to be increased. If that that happens very very rarely, but we do still append to the array somewhat often when we shouldn't, we end up causing quite a few crashes (technically even in cases which aren't unsafe). So I think we really should try to find the reason for bug 1364399, and also figure out if the parser thread can push more ops to the queue while main thread is spinning the event loop nestedly.

Henri Sivonen (:hsivonen)

Assignee

Comment 20

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #17) > I think > http://searchfox.org/mozilla-central/rev/ > 40e8eb46609dcb8780764774ec550afff1eed3a5/parser/html/nsHtml5TreeOpExecutor. > cpp#829 is the code I'm worried. > What guarantees that never runs when we're processing op queue? This runs in two cases, AFAICT, 1) document.write(). 2) When scripts are done and the main thread checks if there are pending speculations waiting in nsHtml5StreamParser. The document.write() case is covered in comment 12. Case 2 happens from within nsHtml5TreeOpExecutor::RunFlushLoop() after it has checked that there isn't another copy of itself on the stack and before it has entered the loop that reads the op queue (and before it has instantiated nsHtml5AutoFlush). The case of FlushDocumentWrite() being called in response to document.write from within an inline script is OK, because in that case nsHtml5TreeOpExecutor::RunFlushLoop() is on stack and can't be re-entered per the above paragraph. However, when FlushDocumentWrite() is being called in response to document.write() form within an external script and there's a nested event loop, we may have a bug! I guess I'll need to pay more attention to that case. (In reply to Olli Pettay [:smaug] from comment #19) > So I think we really should try to find the reason for bug 1364399, I agree, but can we still land this on its own? Bug 1364399 seems to be related to an async script blowing away the document using document.write(), so even if the bug persists, sites shouldn't be triggering it legitimately, so we shouldn't expect this patch to cause a lot of MOZ_CRASHes in the wild for legitimate sites. > and also > figure out if the parser thread can push more ops to the queue while main > thread is spinning the event loop nestedly. The parser thread can't push to mOpQueue directly. It pushes to mStage and nsHtml5TreeOpExecutor::RunFlushLoop() moves the ops from there to mOpQueue when nsHtml5TreeOpExecutor::RunFlushLoop() evidently isn't iterating over mOpQueue.

Henri Sivonen (:hsivonen)

Assignee

Comment 21

•

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #20) > However, when FlushDocumentWrite() is being called in response to > document.write() form within an external script and there's a nested event > loop, we may have a bug! I guess I'll need to pay more attention to that > case. This covered by if (mFlushState != eNotFlushing) { // XXX Can this happen? In case it can, let's avoid crashing. return; } in nsHtml5TreeOpExecutor::RunFlushLoop().

Flags: needinfo?(hsivonen)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 22

•

7 years ago

ok, thanks for clarifying. I'm still not quite sure if MOZ_RELEASE_ASSERT is the right thing to do, or should we just bail out and stop parsing or something if we're in a weird state. But I'll review tomorrow.

Henri Sivonen (:hsivonen)

Assignee

Comment 23

•

7 years ago

I'm guessing bug 1364399 relates to step 5. at https://html.spec.whatwg.org/#dom-document-close

Henri Sivonen (:hsivonen)

Assignee

Comment 24

•

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #23) > I'm guessing bug 1364399 relates to data: URLs loading synchronously seems like a never-ending source of bugs.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 25

•

7 years ago

Might be worth to land this stuff only once bug 1364399 is fixed.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 26

•

7 years ago

Comment on attachment 8922691 [details] [diff] [review] Use MOZ_RELEASE_ASSERT and RAII >+ // autoFlush clears mOpQueue in its destructor unless >+ // SetNumberOfOpsToRemove is called first, in which case only >+ // some ops from the start of the queue are cleared. >+ nsHtml5AutoFlush autoFlush(this); >+ >+ auto first = mOpQueue.begin(); >+ auto last = mOpQueue.end() - 1; >+ for (auto iter = first;; ++iter) { (huh, I hate auto usage. Makes code so much harder to read when one needs to explicitly go and check what kind of type the auto actually is. But I guess we've agreed that in case of iteration and static_cast and such use of auto is ok)

Attachment #8922691 - Flags: review?(bugs) → review+

Henri Sivonen (:hsivonen)

Assignee

Comment 27

•

7 years ago

(In reply to Olli Pettay [:smaug] from comment #25) > Might be worth to land this stuff only once bug 1364399 is fixed. OK. (In reply to Olli Pettay [:smaug] from comment #26) > (huh, I hate auto usage. Makes code so much harder to read when one needs to > explicitly > go and check what kind of type the auto actually is. > But I guess we've agreed that in case of iteration and static_cast and such > use of auto is ok) I guess I should revert the "auto" bits. Logically they don't really belong in this patch anyway.

Henri Sivonen (:hsivonen)

Assignee

Comment 28

•

7 years ago

Attached patch Patch with C++11isms reverted — Details — Splinter Review

Attachment #8922691 - Attachment is obsolete: true

Attachment #8923499 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

7 years ago

Attachment #8923499 - Flags: review?(bugs) → review+

Henri Sivonen (:hsivonen)

Assignee

Comment 29

•

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #23) > I'm guessing bug 1364399 relates to step 5. at > https://html.spec.whatwg.org/#dom-document-close Step 5 is OK. (It's in nsHtml5Parser.cpp instead of being in nsHTMLDocument.cpp.) The actual root cause is in bug 1364399 comment 11.

Henri Sivonen (:hsivonen)

Assignee

Comment 30

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/3660e2cf9fdbb05e7890f70e9a35572a0d714694

Ryan VanderMeulen [:RyanVM]

Comment 31

•

7 years ago

https://hg.mozilla.org/mozilla-central/rev/3660e2cf9fdb I assume this is fine riding the 58 train and doesn't require backport anywhere.

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox56: --- → wontfix

status-firefox57: --- → wontfix

status-firefox58: --- → fixed

status-firefox-esr52: --- → wontfix

Resolution: --- → FIXED

Target Milestone: --- → mozilla58

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: dom-core-security → core-security-release

Al Billings [:abillings - ex-MoCo]

Updated

•

7 years ago

Whiteboard: [adv-main58-]

Matt Wobensmith [:mwobensmith][:matt:]

Updated

•

7 years ago

Flags: qe-verify-

Whiteboard: [adv-main58-] → [adv-main58-][post-critsmash-triage]

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: core-security-release

You need to log in before you can comment on or make changes to this bug.