Closed Bug 1715755 Opened 3 years ago Closed 3 years ago

Parent crash [@ headerT<IPC::Message::Header>] after receiving malformed message from child

Categories

(Core :: IPC, defect)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
93 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox-esr91 93+ fixed
firefox91 --- wontfix
firefox92 --- wontfix
firefox93 + fixed

People

(Reporter: decoder, Assigned: nika, NeedInfo)

Details

(Keywords: sec-audit, sec-want, Whiteboard: [adv-main93+r][adv-esr91.2+r])

Crash Data

Attachments

(4 files)

Attached file test.zip

Researchers from RUB have developed a VM snapshot based fuzzing approach and got a proof-of-concept to work with Firefox. They found 3 crashes pretty quickly (in all cases, this is the child sending something to the parent and the parent crashes as a result).

It would be incredibly useful for us if someone from the IPC team could assess these crashes in terms of validity. Afterwards, we can discuss how to move this project forward during the course of H2 and how we can best help the IPC team with this new tool.

Reproducing requires little effort as the researchers provided a small shim to LD_PRELOAD that can reproduce the bug even with our existing fuzzing builds (fuzzing flag is required to disable IPC sentinels). I looked at the first crash and was able to reproduce it on fuzzing builds up to 2021-05-20, then it stopped reproducing (which could simply be because something in the ipdl changed, or maybe we fixed the bug).

In order to reproduce, the following steps should work:

  1. Unpack test.zip as attached and change into the resulting test directory.

  2. Compile the shim to preload

./compile.sh

  1. Fetch the right Firefox build (e.g. with fuzzfetch) from TC:

python3 -mfuzzfetch --asan --fuzzing --build 2021-05-20 -n firefox

  1. Reproduce the bug

./run1.sh

Logs for each run are also included in the zip file. We are also looking into providing Pernosco sessions for these crashes.

Please keep this bug confidential, it contains code/approaches not yet published by the authors and we don't want to draw attention to our new IPC fuzzing attempts.

Flags: needinfo?(nika)
Crash Signature: [@ headerT<IPC::Message::Header>][IPC::Message::set_seqno] → [@ headerT<IPC::Message::Header>][@ IPC::Message::set_seqno]
Flags: needinfo?(davidp99)
Group: core-security → dom-core-security

Needinfo for a Pernosco session for bug #1 (and if possible bug #3. Bug #2 apparently is an intended MOZ_CRASH). Thanks!

Flags: needinfo?(twsmith)

The Pernosco sessions are available:

Unfortunately the build that was used (2021-05-20) does not have a corresponding no-opt build so I'm not sure how useful these will be. Bug 1681209 was fixed a day or two after (so close!). Anything newer we can replay with a fuzzing no-opt build and it will work much better with rr/Pernosco.

Can the test cases can be updated to work with a newer build?

Flags: needinfo?(twsmith)

I finally found some time to look into these issues, and figure out what was going on. I ended up decoding the payloads from the reproducer.c file into the relevant message headers to do this. It seems like issues 1 and 3 are bugs, and issue 2 is an explicit crash which we should relax for fuzzing purposes. These bugs were not caught by previous fuzzing attempts, as they are exercising more of MessageChannel, which was previously bypassed by FuzzProtocol, as it directly called OnMessageReceived on the underlying protocol (https://searchfox.org/mozilla-central/rev/a9851cca236e03d088d7528dd27445080475a68e/tools/fuzzing/ipc/ProtocolFuzzer.h#95-100). I've put my explorations below:

Bug1

.payload_size = 0x4
.routing = 0x54
.type = 0x6d003f
.flags = 0x1d (SYNC_BIT | MEDIUMHIGH_PRIORITY | NOT_NESTED)
.num_fds = 0x0
.txid = 0xffffffff
.interrupt_local_stack_depth = 0xffffffff
.seqno = 0x0
[0, 0, 0, 0]

This case appears to be caused by an issue with sync IPC which I added on accident 2 years ago in bug 1547085. In that bug, I changed behaviour so that if an IPC message is received for an unknown actor, it discards the message rather than causing a crash. Unfortunately, it seems that I also made this behaviour apply to sync IPC messages. With sync IPC there is an aReply outparameter from the OnMessageReceived callback (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2117), which is expected to always be initialized with a reply message if OnMessageReceived() returns a success value (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2130-2132). This message has the SYNC_BIT flag set, and the target routing ID of 0x54 does not exist. This means that the aReply value is never filled, and the attempt to set_seqno on it (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2133) will hit a nullptr crash.

We should probably make the check a bit more resillient to getting a nullptr aReply, as well as removing the dropped message trick for sync IPC, as it only really makes sense for async IPC.

Bug2

.payload_size = 0x4
.routing = 0x80000000 (MSG_ROUTING_NONE)
.type = 0x180005
.flags = 0x2 (REPLY_BIT)
.num_fds = 0x0
.txid = 0xffffffff
.interrupt_local_stack_depth = 0xffffffff
.seqno = 0x0
[0, 0, 0, 0]

This is an explicit crash happening on this line: https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2143. It's happening because the given routing ID of -1 is MSG_ROUTING_NONE, and that routing ID is always expected to be used with special messages, but the type ID of 0x180005 is not a recognized special message. We should adjust MessageChannel to not hard crash, and instead treat it as a normal IPC message error.

Bug3

.payload_size = 0x4
.routing = 0x54
.type = 0x3a0018
.flags = 0x80 (INTERRUPT_BIT)
.num_fds = 0x0
.interrupt_remote_stack_depth_guess = 0xffffffff (-1)
.interrupt_local_stack_depth = 0xffffffff (-1)
.seqno = 0x0
[0, 0, 0, 0]

This is an interrupt message, due to the INTERRUPT_BIT flag, and it appears to be failing due to a bug in how we handle interrupt races. The interrupt_remote_stack_depth_guess field is -1, while our value for RemoteViewOfStackDepth is 0, so the equality check here (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2208-2214) fails, and we think we've detected a race. We then incorrectly assume that this check failing means that mInterruptStack is non-empty, and try to read an element from it here: https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2221. This ends up crashing with a high address somewhere in the std::stack::top() implementation, as we violated its precondition. We should probably fix this by making sure to check if mInterruptStack is empty, and recovering in some manner.

Flags: needinfo?(nika)

When this change was first implemented, it ignored dead actors for all types of
messages, but for messages with replies they cannot be ignored, as a reply must
be sent. This should fix that oversight.

Assignee: nobody → nika
Status: NEW → ASSIGNED

This change instead recovers from unhandled special messages by reporting a
normal IPC error, which should be handled using the normal IPC error
mechanisms.

Depends on D123148

Previously this would misbehave as it couldn't get the interrupt stack.

Depends on D123149

Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch
Flags: qe-verify-

Please nominate this for ESR91 approval when you get a chance.

Flags: needinfo?(nika)

Comment on attachment 9237099 [details]
Bug 1715755 - Part 1: Don't ignore dead actors for sync and intr messages, r=handyman

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Security bug with low risk
  • User impact if declined: Potential security issues discovered by fuzzing unfixed on esr
  • Fix Landed on Version: 93
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Small changes well-tested on Nightly
  • String or UUID changes made by this patch: None
Flags: needinfo?(nika)
Attachment #9237099 - Flags: approval-mozilla-esr91?
Attachment #9237100 - Flags: approval-mozilla-esr91?
Attachment #9237101 - Flags: approval-mozilla-esr91?

Comment on attachment 9237099 [details]
Bug 1715755 - Part 1: Don't ignore dead actors for sync and intr messages, r=handyman

Approved for ESR91.2, thanks

Attachment #9237099 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
Attachment #9237100 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
Attachment #9237101 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
Whiteboard: [adv-main93+r]
Whiteboard: [adv-main93+r] → [adv-main93+r][adv-esr91.2+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: