Closed
Bug 1410848
Opened 7 years ago
Closed 7 years ago
Upgrade all nsHtml5TreeOpExecutor::mFlushState checks to actually enforce correct state in all types of builds
Categories
(Core :: DOM: HTML Parser, defect)
Core
DOM: HTML Parser
Tracking
()
RESOLVED
FIXED
mozilla58
People
(Reporter: hsivonen, Assigned: hsivonen)
References
(Blocks 1 open bug)
Details
(Keywords: sec-audit, Whiteboard: [adv-main58-][post-critsmash-triage])
Attachments
(1 file, 2 obsolete files)
21.37 KB,
patch
|
smaug
:
review+
|
Details | Diff | Splinter Review |
Use-after-free avoidance on nsHtml5TreeOpExecutor::mOpQueue is based on the checks of nsHtml5TreeOpExecutor::mFlushState doing the right thing.
The problem is that some of those checks are mere NS_ASSERTION or NS_PRECONDITION. Those don't actually enforce anything.
We should upgrade NS_ASSERTION/NS_PRECONDITION checks of mFlushState to MOZ_RELEASE_ASSERT. In some cases, we might be able to avoid MOZ_CRASHing by making them non-crashing early returns instead.
Bug 1364399 is the closest we have to steps to reproduce a violation of these constraints.
Comment 1•7 years ago
|
||
But MOZ_RELEASE_ASSERT just makes us crash safely. We should ensure the assertion never ever fires.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•7 years ago
|
||
Assignee | ||
Comment 3•7 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #1)
> But MOZ_RELEASE_ASSERT just makes us crash safely. We should ensure the
> assertion never ever fires.
Here's how it's supposed to work:
* RunFlushLoop() returns right way if another copy is already on the stack, thanks to mRunFlushLoopOnStack.
* FlushDocumentWrite() returns after flushing speculative loads if there is already another flush loop executing ("if (mFlushState != eNotFlushing)").
* Ops can be added to the queue via:
* When moving speculations over from the parser thread only after the last <script> has finished after all document.write layers have finished. (This is indeed rather hand-wavy.)
* From within RunFlushLoop() before entering the actual flush loop.
* From document.write(), in a manner which
- Happens right before FlushDocumentWrite() is called.
- Is not supposed to happen during RunFlushLoop() except as a tail call (after the mOpQueue processing) when executing a <script>.
The interaction of doc update pauses with the last point is the most questionable thing.
Assignee | ||
Comment 4•7 years ago
|
||
Assignee | ||
Comment 5•7 years ago
|
||
Assignee | ||
Comment 6•7 years ago
|
||
The screenshot orange is likely a known problem from Tuesday. Trying again (this time browser-chrome is known orange):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=21849249e91435ae97da1f38e8ddd57c2b90791e
Assignee | ||
Comment 7•7 years ago
|
||
The assertions are upgraded. RAII is used to increase confidence in mFlushState transitions not getting forgotten. C++11 loops are used to enhance code readability.
Attachment #8922258 -
Flags: review?(bugs)
Comment 8•7 years ago
|
||
ranged-for loop are less safe than old style loops.
Though, it shouldn't cause anymore security issues, since we'll just crash if the array is modified
https://bugzilla.mozilla.org/show_bug.cgi?id=1299489
Comment 9•7 years ago
|
||
Comment on attachment 8922258 [details] [diff] [review]
Use RAII and MOZ_RELEASE_ASSERT
>+class MOZ_RAII nsHtml5AutoFlush final
>+{
>+private:
>+ RefPtr<nsHtml5TreeOpExecutor> mExecutor;
>+ size_t mOpsToRemove;
>+
>+public:
>+ explicit nsHtml5AutoFlush(nsHtml5TreeOpExecutor* aExecutor)
>+ : mExecutor(aExecutor)
>+ , mOpsToRemove(aExecutor->OpQueueLength())
I don't understand this mOpsToRemove part.
This looks like as if we were expecting the op queue to get more entries while nsHtml5AutoFlush is on stack.
But modifying the opqueue array while iterating is not safe.
I think I'm missing something here. What guarantees the op queue isn't modified while it is being iterated?
Please explain and ask review again.
Attachment #8922258 -
Flags: review?(bugs)
Assignee | ||
Comment 10•7 years ago
|
||
Comment on attachment 8922258 [details] [diff] [review]
Use RAII and MOZ_RELEASE_ASSERT
(In reply to Olli Pettay [:smaug] from comment #9)
> Comment on attachment 8922258 [details] [diff] [review]
> Use RAII and MOZ_RELEASE_ASSERT
>
> >+class MOZ_RAII nsHtml5AutoFlush final
> >+{
> >+private:
> >+ RefPtr<nsHtml5TreeOpExecutor> mExecutor;
> >+ size_t mOpsToRemove;
> >+
> >+public:
> >+ explicit nsHtml5AutoFlush(nsHtml5TreeOpExecutor* aExecutor)
> >+ : mExecutor(aExecutor)
> >+ , mOpsToRemove(aExecutor->OpQueueLength())
> I don't understand this mOpsToRemove part.
By default, the destructor of nsHtml5AutoFlush removes all ops in mOpQueue, but when we exit the flush loop early due to the having taken too much wall-clock time in the loop, we need to remove only the ops that we had time to loop over. That's why before exiting the loop in that case, mOpsToRemove is set to a smaller number.
The number of ops in the queue can't change between the constructor and destructor of nsHtml5AutoFlush, so the default case is handled by initializing mOpsToRemove to the number of ops. mOpsToRemove exists so that the time-based interrupt case can be handled with RAII. (If the time-based interrupt didn't exist, mOpsToRemove wouldn't exist and the destructor would call ClearOpQueue().)
> This looks like as if we were expecting the op queue to get more entries
> while nsHtml5AutoFlush is on stack.
No, there's an assertion checking that if mOpsToRemove is modified after the construction of nsHtml5AutoFlush, it is set to a smaller number than it was initialized to.
> But modifying the opqueue array while iterating is not safe.
Correct.
> I think I'm missing something here. What guarantees the op queue isn't
> modified while it is being iterated?
All modifications MOZ_RELEASE_ASSERT on mFlushState == eNotFlushing.
mFlushState is always set to != eNotFlushing while iterating over the queue (using nsHtml5AutoFlush around iteration).
Attachment #8922258 -
Flags: review?(bugs)
Comment 11•7 years ago
|
||
MOZ_*_ASSERT don't guarantee anything. I mean, sure we crash safely, but they should never fire.
If one would remove the assertions, is there any way mOpQueue might change while it is being iterated?
Updated•7 years ago
|
Group: core-security → dom-core-security
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #11)
> MOZ_*_ASSERT don't guarantee anything.
MOZ_RELEASE_ASSERT before each modification guarantees that if there's a way to try to cause a bad modification, it won't be a security bug.
> I mean, sure we crash safely, but they should never fire.
> If one would remove the assertions, is there any way mOpQueue might change
> while it is being iterated?
Per comment 3, there is not supposed to be, and there isn't anything in our test suite that triggers the MOZ_RELEASE_ASSERTs added here.
As for the last remark in comment 3, here's how the document.write() part works:
The nsHtml5AutoFlush in this patch sets mFlushState to not eNotFlushing. nsHTMLDocument, upon document.write(), calls nsHtml5Parser::IsInsertionPointDefined(), which return false when mFlushState is not eNotFlushing. If the insertion point isn't defined, document.write() blows away the document instead of adding ops to the op queue. (There are kungfuDeathGrips to prevent the nsHtml5TreeOpExecutor getting deleted immediately during the mOpQueue iteration when the document is blown away.)
However, bug 1364399 indicates that some edge case existed when that bug was filed despite the reasoning in comment 3 and the paragraph above. I suggest we land this patch and fix the edge case if the MOZ_RELEASE_ASSERTs ever fire and give us steps to reproduce.
Assignee | ||
Comment 13•7 years ago
|
||
Comment on attachment 8922258 [details] [diff] [review]
Use RAII and MOZ_RELEASE_ASSERT
Going to rebase now that bug 1378079 has landed.
Attachment #8922258 -
Attachment is obsolete: true
Attachment #8922258 -
Flags: review?(bugs)
Assignee | ||
Comment 14•7 years ago
|
||
Assignee | ||
Comment 15•7 years ago
|
||
Assignee | ||
Updated•7 years ago
|
Attachment #8922691 -
Flags: review?(bugs)
Comment 16•7 years ago
|
||
FWIW, I'm not worried about document.write, but the case when data comes from network while we spin event loop inside some parser operation (because of running scripts there)
Comment 17•7 years ago
|
||
I think http://searchfox.org/mozilla-central/rev/40e8eb46609dcb8780764774ec550afff1eed3a5/parser/html/nsHtml5TreeOpExecutor.cpp#829 is the code I'm worried.
What guarantees that never runs when we're processing op queue?
Comment 18•7 years ago
|
||
And saying that there is an assertion isn't the answer ;)
Flags: needinfo?(hsivonen)
Comment 19•7 years ago
|
||
The worry with just adding release asserts I have is that the possible security critical issue there is happens only if the current array allocation isn't enough for taking more elements to the array, so that the array capacity needs to be increased. If that that happens very very rarely, but we do
still append to the array somewhat often when we shouldn't, we end up causing quite a few crashes (technically even in cases which aren't unsafe).
So I think we really should try to find the reason for bug 1364399, and also
figure out if the parser thread can push more ops to the queue while main thread is spinning the event loop nestedly.
Assignee | ||
Comment 20•7 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #17)
> I think
> http://searchfox.org/mozilla-central/rev/
> 40e8eb46609dcb8780764774ec550afff1eed3a5/parser/html/nsHtml5TreeOpExecutor.
> cpp#829 is the code I'm worried.
> What guarantees that never runs when we're processing op queue?
This runs in two cases, AFAICT,
1) document.write().
2) When scripts are done and the main thread checks if there are pending speculations waiting in nsHtml5StreamParser.
The document.write() case is covered in comment 12.
Case 2 happens from within nsHtml5TreeOpExecutor::RunFlushLoop() after it has checked that there isn't another copy of itself on the stack and before it has entered the loop that reads the op queue (and before it has instantiated nsHtml5AutoFlush).
The case of FlushDocumentWrite() being called in response to document.write from within an inline script is OK, because in that case nsHtml5TreeOpExecutor::RunFlushLoop() is on stack and can't be re-entered per the above paragraph.
However, when FlushDocumentWrite() is being called in response to document.write() form within an external script and there's a nested event loop, we may have a bug! I guess I'll need to pay more attention to that case.
(In reply to Olli Pettay [:smaug] from comment #19)
> So I think we really should try to find the reason for bug 1364399,
I agree, but can we still land this on its own? Bug 1364399 seems to be related to an async script blowing away the document using document.write(), so even if the bug persists, sites shouldn't be triggering it legitimately, so we shouldn't expect this patch to cause a lot of MOZ_CRASHes in the wild for legitimate sites.
> and also
> figure out if the parser thread can push more ops to the queue while main
> thread is spinning the event loop nestedly.
The parser thread can't push to mOpQueue directly. It pushes to mStage and nsHtml5TreeOpExecutor::RunFlushLoop() moves the ops from there to mOpQueue when nsHtml5TreeOpExecutor::RunFlushLoop() evidently isn't iterating over mOpQueue.
Assignee | ||
Comment 21•7 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #20)
> However, when FlushDocumentWrite() is being called in response to
> document.write() form within an external script and there's a nested event
> loop, we may have a bug! I guess I'll need to pay more attention to that
> case.
This covered by
if (mFlushState != eNotFlushing) {
// XXX Can this happen? In case it can, let's avoid crashing.
return;
}
in nsHtml5TreeOpExecutor::RunFlushLoop().
Flags: needinfo?(hsivonen)
Comment 22•7 years ago
|
||
ok, thanks for clarifying. I'm still not quite sure if MOZ_RELEASE_ASSERT is the right thing to do, or should we just bail out and stop parsing or something if we're in a weird state.
But I'll review tomorrow.
Assignee | ||
Comment 23•7 years ago
|
||
I'm guessing bug 1364399 relates to step 5. at
https://html.spec.whatwg.org/#dom-document-close
Assignee | ||
Comment 24•7 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #23)
> I'm guessing bug 1364399 relates to
data: URLs loading synchronously seems like a never-ending source of bugs.
Comment 25•7 years ago
|
||
Might be worth to land this stuff only once bug 1364399 is fixed.
Comment 26•7 years ago
|
||
Comment on attachment 8922691 [details] [diff] [review]
Use MOZ_RELEASE_ASSERT and RAII
>+ // autoFlush clears mOpQueue in its destructor unless
>+ // SetNumberOfOpsToRemove is called first, in which case only
>+ // some ops from the start of the queue are cleared.
>+ nsHtml5AutoFlush autoFlush(this);
>+
>+ auto first = mOpQueue.begin();
>+ auto last = mOpQueue.end() - 1;
>+ for (auto iter = first;; ++iter) {
(huh, I hate auto usage. Makes code so much harder to read when one needs to explicitly
go and check what kind of type the auto actually is.
But I guess we've agreed that in case of iteration and static_cast and such use of auto is ok)
Attachment #8922691 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 27•7 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #25)
> Might be worth to land this stuff only once bug 1364399 is fixed.
OK.
(In reply to Olli Pettay [:smaug] from comment #26)
> (huh, I hate auto usage. Makes code so much harder to read when one needs to
> explicitly
> go and check what kind of type the auto actually is.
> But I guess we've agreed that in case of iteration and static_cast and such
> use of auto is ok)
I guess I should revert the "auto" bits. Logically they don't really belong in this patch anyway.
Assignee | ||
Comment 28•7 years ago
|
||
Attachment #8922691 -
Attachment is obsolete: true
Attachment #8923499 -
Flags: review?(bugs)
Updated•7 years ago
|
Attachment #8923499 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 29•7 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #23)
> I'm guessing bug 1364399 relates to step 5. at
> https://html.spec.whatwg.org/#dom-document-close
Step 5 is OK. (It's in nsHtml5Parser.cpp instead of being in nsHTMLDocument.cpp.)
The actual root cause is in bug 1364399 comment 11.
Assignee | ||
Comment 30•7 years ago
|
||
Comment 31•7 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/3660e2cf9fdb
I assume this is fine riding the 58 train and doesn't require backport anywhere.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → wontfix
status-firefox57:
--- → wontfix
status-firefox58:
--- → fixed
status-firefox-esr52:
--- → wontfix
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Updated•7 years ago
|
Group: dom-core-security → core-security-release
Updated•7 years ago
|
Whiteboard: [adv-main58-]
Updated•7 years ago
|
Flags: qe-verify-
Whiteboard: [adv-main58-] → [adv-main58-][post-critsmash-triage]
Updated•6 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•