Closed Bug 1753298 Opened 2 years ago Closed 2 years ago

Assertion failure: aTerminated || mDocument->GetReadyStateEnum() == Document::READYSTATE_LOADING (Bad readyState), at src/dom/base/nsContentSink.cpp:789

Categories

(Core :: Audio/Video, defect, P2)

defect

Tracking

()

RESOLVED FIXED
100 Branch
Tracking Status
firefox-esr91 100+ fixed
firefox98 --- wontfix
firefox99 --- wontfix
firefox100 + fixed

People

(Reporter: tsmith, Assigned: kinetik)

References

(Blocks 1 open bug)

Details

(Keywords: assertion, sec-moderate, Whiteboard: [post-critsmash-triage][adv-main100+r][adv-esr91.9+r])

Attachments

(3 files)

Found while fuzzing m-c 20220103-1cb2015e6fbc (--enable-debug --enable-fuzzing)

A reliable rescued testcase is unavailable. I was able to get an rr trace and I will attach a Pernosco session shortly.

Assertion failure: aTerminated || mDocument->GetReadyStateEnum() == Document::READYSTATE_LOADING (Bad readyState), at src/dom/base/nsContentSink.cpp:789

#0 0x7fca8f612cb7 in nsContentSink::DidBuildModelImpl(bool) src/dom/base/nsContentSink.cpp:787:3
#1 0x7fca8eb94bd9 in nsHtml5TreeOpExecutor::DidBuildModel(bool) src/parser/html/nsHtml5TreeOpExecutor.cpp:178:3
#2 0x7fca8eb965af in nsHtml5TreeOpExecutor::RunFlushLoop() src/parser/html/nsHtml5TreeOpExecutor.cpp:680:7
#3 0x7fca8eb9dfca in nsHtml5ExecutorReflusher::Run() src/parser/html/nsHtml5TreeOpExecutor.cpp:78:16
#4 0x7fca8d5e1ed2 in mozilla::SchedulerGroup::Runnable::Run() src/xpcom/threads/SchedulerGroup.cpp:140:20
#5 0x7fca8d6121de in mozilla::RunnableTask::Run() src/xpcom/threads/TaskController.cpp:467:16
#6 0x7fca8d5ec066 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) src/xpcom/threads/TaskController.cpp:770:26
#7 0x7fca8d5ead28 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) src/xpcom/threads/TaskController.cpp:606:15
#8 0x7fca8d5eafa3 in mozilla::TaskController::ProcessPendingMTTask(bool) src/xpcom/threads/TaskController.cpp:390:36
#9 0x7fca8d615216 in operator() src/xpcom/threads/TaskController.cpp:124:37
#10 0x7fca8d615216 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_0>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:531:5
#11 0x7fca8d600933 in nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:1195:16
#12 0x7fca8d607a1a in NS_ProcessNextEvent(nsIThread*, bool) src/xpcom/threads/nsThreadUtils.cpp:467:10
#13 0x7fca8e0a7ce6 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:85:21
#14 0x7fca8dfcca47 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:331:10
#15 0x7fca8dfcc952 in RunHandler src/ipc/chromium/src/base/message_loop.cc:324:3
#16 0x7fca8dfcc952 in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:306:3
#17 0x7fca92291ed8 in nsBaseAppShell::Run() src/widget/nsBaseAppShell.cpp:137:27
#18 0x7fca942e9633 in XRE_RunAppShell() src/toolkit/xre/nsEmbedFunctions.cpp:870:20
#19 0x7fca8e0a8bda in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:235:9
#20 0x7fca8dfcca47 in MessageLoop::RunInternal() src/ipc/chromium/src/base/message_loop.cc:331:10
#21 0x7fca8dfcc952 in RunHandler src/ipc/chromium/src/base/message_loop.cc:324:3
#22 0x7fca8dfcc952 in MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:306:3
#23 0x7fca942e8c6c in XRE_InitChildProcess(int, char**, XREChildData const*) src/toolkit/xre/nsEmbedFunctions.cpp:707:34
#24 0x558975a8f009 in content_process_main src/browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
#25 0x558975a8f009 in main src/browser/app/nsBrowserApp.cpp:327:18
#26 0x7fcaa240c0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
#27 0x558975a6a79c in _start (/home/worker/builds/m-c-20220130093554-fuzzing-debug/firefox-bin+0x1579c)

A Pernosco session is available here: https://pernos.co/debug/eKh9ZmlVAbHsViiD5V-TTg/index.html

Flags: needinfo?(hsivonen)

The mForegroundCount of the load group seems to underflow before the parse starts. Still looking further.

As an oddity that may be a distraction or may be important, it seems that Necko reports the content of the HTML document via OnDataAvailable twice (to the same parser instance, so it doesn't look like a reload).

Group: core-security
Flags: needinfo?(hsivonen)

(In reply to Henri Sivonen (:hsivonen) from comment #3)

The underflow happens from here:
https://searchfox.org/mozilla-central/rev/38652b98c6dd3bf42403eeb8c5305902b9a6e938/netwerk/ipc/DocumentChannelChild.cpp#442

Nevermind. The root cause underflow happens much earlier.

Some observations:

  1. The key lines for tracking the problem are nsLoadGroup.cpp lines 461 and 602 with this==0x5650e4b23750. (See the Pernosco Notebook for more.)
  2. When things go wrong, we have a pending WebM request whose removal underflows mForegroundCount.
  3. Necko appears to use the same nsIRequest address for a bunch of different subresources. Am I reading things wrong? Is this normal?
  4. When a media element encounters a decode error, it takes the request out of the load group, changes flags, and puts it back.
  5. It seems bad that we have a single hashtable for background and foreground loads and a counter for tracking how many are foreground loads. Would it be bad to have two hashtables: one for background requests and another for foreground requests, and instead looking at mForegroundCount we'd look at the size of the foreground hashtable? It would be harder to confuse that kind of setup.

Needinfoing Valentin for point 3, Bryce for point 4, and smaug for point 5 for ideas.

Anyway, AFAICT, this isn't a parser bug but a bug earlier in subresource bookkeeping that manifests upon the parse finishing.

As for the reproducibility: The test case seems to repeatedly blow away the document by document.write implying document.open, followed by window.location.reload(). Eventually the timing of the subresources differs enough to trigger the bug.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(bvandyk)
Flags: needinfo?(bugs)

(In reply to Henri Sivonen (:hsivonen) from comment #5)

  1. Necko appears to use the same nsIRequest address for a bunch of different subresources. Am I reading things wrong? Is this normal?

I haven't looked at the pernosco trace, but I assume the channel gets freed and another one gets allocated at the same address.

Flags: needinfo?(valentin.gosu)
Group: core-security → dom-core-security

Sounds scary, but not very reliable. I'll mark it sec-moderate for now.

Keywords: sec-moderate
Flags: needinfo?(bugs) → needinfo?(hsivonen)

This is suspicious:
Media code modifies the load flags
https://searchfox.org/mozilla-central/rev/ad7ecfa618ec3a65db8405d9f1125059fe4a6a15/dom/media/ChannelMediaResource.cpp#327,342 which calls
https://searchfox.org/mozilla-central/rev/ad7ecfa618ec3a65db8405d9f1125059fe4a6a15/dom/media/BaseMediaResource.cpp#143

That is done, I think, after the foreground counter has been decreased already in
https://searchfox.org/mozilla-central/source/netwerk/base/nsLoadGroup.cpp#602,614,624
but before the request is removed from the hashtable. So when the code in BaseMediaResource calls RemoveRequest, it decreases the counter again. (the pernosco session isn't available atm, so can't verify this from there.)
Anyhow, this feels like something very unusual happening in the media loading code.

Component: DOM: Core & HTML → Audio/Video
Flags: needinfo?(hsivonen)
Group: dom-core-security → media-core-security
Blocks: media-triage
Flags: needinfo?(bvandyk)
No longer blocks: media-triage
Flags: needinfo?(kinetik)

I wonder if one issue is that BaseMediaResource::ModifyLoadFlags handles the load group removal/addition incorrectly here - if loadGroup is non-null but RemoveRequest fails, ModifyLoadFlags will go ahead and modify the channel's load flags while the channel is still in the load group. See also bug 910329 for a related instance of this.

It also appears to be unsafe to modify load group membership from OnStopRequest, which is violated by ChannelMediaResource here. In the Pernosco trace, I see a ChannelMediaResource::OnStopRequest call leading to the BaseMediaResource::ModifyLoadFlags failure case I mentioned above, where the channel's status and the result of RemoveRequest are both NS_BINDING_ABORTED. Solving that may be as simple as dispatching a new event to the main thread to call BaseMediaResource::ModifyLoadFlags.

Flags: needinfo?(kinetik)
Assignee: nobody → kinetik
Status: NEW → ASSIGNED

The patch in comment 11 addresses the ModifyLoadFlags error handling mentioned in the first paragraph. I'm not sure if the issue mentioned in the second paragraph also needs to be addressed to solve this.

The severity field is not set for this bug.
:jimm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)
Severity: -- → S4
Flags: needinfo?(jmathies)
Priority: -- → P2

(In reply to Matthew Gregan [:kinetik] from comment #12)

The patch in comment 11 addresses the ModifyLoadFlags error handling mentioned in the first paragraph. I'm not sure if the issue mentioned in the second paragraph also needs to be addressed to solve this.

Probably worth a shot, but I'm not sure what to do to exercise this.

I shouldn't have used NS_ASSERTION for the case in ChannelMediaResource::OnStopRequest, since we expect that to fail in this situation. I'll push a small fix.

At least this proves this code path is triggered in CI, so we have some test coverage for the fix.

Flags: needinfo?(kinetik)
Group: media-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]

(In reply to Matthew Gregan [:kinetik] from comment #16)

I shouldn't have used NS_ASSERTION for the case in ChannelMediaResource::OnStopRequest, since we expect that to fail in this situation. I'll push a small fix.

This was also true for the assert added in BaseMediaResource::SetLoadInBackground, resulting in bug 1761865. I've changed that into a warning also.

Comment on attachment 9272573 [details]
Bug 1753298 - Improve error handling in BaseMediaResource::ModifyLoadFlags. r=padenot

Beta/Release Uplift Approval Request

  • User impact if declined: Browser crash when a media load fails, possibly with confusing/unrelated stacks due to the load group being corrupted - possible UAF.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Low risk - small change adding error handling to channel flag handling.
  • String changes made/needed: no
Attachment #9272573 - Attachment description: Bug 1753298 - Improve error handling in BaseMediaResource::ModifyLoadFlags. r=padenot → (esr91) Bug 1753298 - Improve error handling in BaseMediaResource::ModifyLoadFlags. r=padenot
Attachment #9272573 - Flags: approval-mozilla-beta?
Attachment #9272573 - Attachment description: (esr91) Bug 1753298 - Improve error handling in BaseMediaResource::ModifyLoadFlags. r=padenot → Bug 1753298 - Improve error handling in BaseMediaResource::ModifyLoadFlags. r=padenot

:kinetik The nominated patch failed to import. can you take a look?

Flags: needinfo?(kinetik)

Comment on attachment 9272573 [details]
Bug 1753298 - Improve error handling in BaseMediaResource::ModifyLoadFlags. r=padenot

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Simple/safe fix to address a crash and potential security issue.
  • User impact if declined: Browser crash when a media load fails, possibly with confusing/unrelated stacks due to the load group being corrupted - possible UAF.
  • Fix Landed on Version: 100
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Low risk - small change adding error handling to channel flag handling.
Flags: needinfo?(kinetik)
Attachment #9272573 - Flags: approval-mozilla-beta? → approval-mozilla-esr91?

(In reply to Dianna Smith [:diannaS] from comment #22)

:kinetik The nominated patch failed to import. can you take a look?

Apologies, I requested the wrong approval - this patch is for esr91, not beta.

Comment on attachment 9272573 [details]
Bug 1753298 - Improve error handling in BaseMediaResource::ModifyLoadFlags. r=padenot

Approved for 91.9esr

Attachment #9272573 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main100+r][adv-esr1.9100+r]
Whiteboard: [post-critsmash-triage][adv-main100+r][adv-esr1.9100+r] → [post-critsmash-triage][adv-main100+r][adv-esr91.9+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: