Closed Bug 1715755 Opened 3 years ago Closed 3 years ago

Parent crash [@ headerT<IPC::Message::Header>] after receiving malformed message from child

Tracking

()

Status:

RESOLVED FIXED

Milestone:

93 Branch

Tracking Flags:

Tracking

Status

firefox-esr78

---

wontfix

firefox-esr91

93+

fixed

firefox91

---

wontfix

firefox92

---

wontfix

firefox93

fixed

People

(Reporter: decoder, Assigned: nika, NeedInfo)

Details

(Keywords: sec-audit, sec-want, Whiteboard: [adv-main93+r][adv-esr91.2+r])

Crash Data

Attachments

(4 files)

test.zip 3 years ago Christian Holler (:decoder) 1.82 MB, application/zip		Details
Bug 1715755 - Part 1: Don't ignore dead actors for sync and intr messages, r=handyman 3 years ago Nika Layzell [:nika] (ni? for response) 48 bytes, text/x-phabricator-request	pascalc : approval-mozilla-esr91+	Details \| Review
Bug 1715755 - Part 2: Recover from unhandled special messages without crashing, r=handyman 3 years ago Nika Layzell [:nika] (ni? for response) 48 bytes, text/x-phabricator-request	pascalc : approval-mozilla-esr91+	Details \| Review
Bug 1715755 - Part 3: Don't defer messages if the interrupt stack is empty, r=handyman 3 years ago Nika Layzell [:nika] (ni? for response) 48 bytes, text/x-phabricator-request	pascalc : approval-mozilla-esr91+	Details \| Review

Multiple Authors

Description

•

3 years ago

•

Edited

Attached file test.zip — Details

Researchers from RUB have developed a VM snapshot based fuzzing approach and got a proof-of-concept to work with Firefox. They found 3 crashes pretty quickly (in all cases, this is the child sending something to the parent and the parent crashes as a result).

It would be incredibly useful for us if someone from the IPC team could assess these crashes in terms of validity. Afterwards, we can discuss how to move this project forward during the course of H2 and how we can best help the IPC team with this new tool.

Reproducing requires little effort as the researchers provided a small shim to LD_PRELOAD that can reproduce the bug even with our existing fuzzing builds (fuzzing flag is required to disable IPC sentinels). I looked at the first crash and was able to reproduce it on fuzzing builds up to 2021-05-20, then it stopped reproducing (which could simply be because something in the ipdl changed, or maybe we fixed the bug).

In order to reproduce, the following steps should work:

Unpack test.zip as attached and change into the resulting test directory.
Compile the shim to preload

./compile.sh

Fetch the right Firefox build (e.g. with fuzzfetch) from TC:

python3 -mfuzzfetch --asan --fuzzing --build 2021-05-20 -n firefox

Reproduce the bug

./run1.sh

Logs for each run are also included in the zip file. We are also looking into providing Pernosco sessions for these crashes.

Please keep this bug confidential, it contains code/approaches not yet published by the authors and we don't want to draw attention to our new IPC fuzzing attempts.

Flags: needinfo?(nika)

Christian Holler (:decoder)

Reporter

Updated

•

3 years ago

Crash Signature: [@ headerT<IPC::Message::Header>][IPC::Message::set_seqno] → [@ headerT<IPC::Message::Header>][@ IPC::Message::set_seqno]

Christian Holler (:decoder)

Reporter

Updated

•

3 years ago

Flags: needinfo?(davidp99)

Daniel Veditz [:dveditz]

Updated

•

3 years ago

Group: core-security → dom-core-security

Christian Holler (:decoder)

Reporter

Comment 1

•

3 years ago

Needinfo for a Pernosco session for bug #1 (and if possible bug #3. Bug #2 apparently is an intended MOZ_CRASH). Thanks!

Flags: needinfo?(twsmith)

Tyson Smith [:tsmith]

Comment 2

•

3 years ago

The Pernosco sessions are available:

Unfortunately the build that was used (2021-05-20) does not have a corresponding no-opt build so I'm not sure how useful these will be. Bug 1681209 was fixed a day or two after (so close!). Anything newer we can replay with a fuzzing no-opt build and it will work much better with rr/Pernosco.

Can the test cases can be updated to work with a newer build?

Flags: needinfo?(twsmith)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 3

•

3 years ago

I finally found some time to look into these issues, and figure out what was going on. I ended up decoding the payloads from the reproducer.c file into the relevant message headers to do this. It seems like issues 1 and 3 are bugs, and issue 2 is an explicit crash which we should relax for fuzzing purposes. These bugs were not caught by previous fuzzing attempts, as they are exercising more of MessageChannel, which was previously bypassed by FuzzProtocol, as it directly called OnMessageReceived on the underlying protocol (https://searchfox.org/mozilla-central/rev/a9851cca236e03d088d7528dd27445080475a68e/tools/fuzzing/ipc/ProtocolFuzzer.h#95-100). I've put my explorations below:

Bug1

.payload_size = 0x4
.routing = 0x54
.type = 0x6d003f
.flags = 0x1d (SYNC_BIT | MEDIUMHIGH_PRIORITY | NOT_NESTED)
.num_fds = 0x0
.txid = 0xffffffff
.interrupt_local_stack_depth = 0xffffffff
.seqno = 0x0
[0, 0, 0, 0]

This case appears to be caused by an issue with sync IPC which I added on accident 2 years ago in bug 1547085. In that bug, I changed behaviour so that if an IPC message is received for an unknown actor, it discards the message rather than causing a crash. Unfortunately, it seems that I also made this behaviour apply to sync IPC messages. With sync IPC there is an aReply outparameter from the OnMessageReceived callback (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2117), which is expected to always be initialized with a reply message if OnMessageReceived() returns a success value (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2130-2132). This message has the SYNC_BIT flag set, and the target routing ID of 0x54 does not exist. This means that the aReply value is never filled, and the attempt to set_seqno on it (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2133) will hit a nullptr crash.

We should probably make the check a bit more resillient to getting a nullptr aReply, as well as removing the dropped message trick for sync IPC, as it only really makes sense for async IPC.

Bug2

.payload_size = 0x4
.routing = 0x80000000 (MSG_ROUTING_NONE)
.type = 0x180005
.flags = 0x2 (REPLY_BIT)
.num_fds = 0x0
.txid = 0xffffffff
.interrupt_local_stack_depth = 0xffffffff
.seqno = 0x0
[0, 0, 0, 0]

This is an explicit crash happening on this line: https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2143. It's happening because the given routing ID of -1 is MSG_ROUTING_NONE, and that routing ID is always expected to be used with special messages, but the type ID of 0x180005 is not a recognized special message. We should adjust MessageChannel to not hard crash, and instead treat it as a normal IPC message error.

Bug3

.payload_size = 0x4
.routing = 0x54
.type = 0x3a0018
.flags = 0x80 (INTERRUPT_BIT)
.num_fds = 0x0
.interrupt_remote_stack_depth_guess = 0xffffffff (-1)
.interrupt_local_stack_depth = 0xffffffff (-1)
.seqno = 0x0
[0, 0, 0, 0]

This is an interrupt message, due to the INTERRUPT_BIT flag, and it appears to be failing due to a bug in how we handle interrupt races. The interrupt_remote_stack_depth_guess field is -1, while our value for RemoteViewOfStackDepth is 0, so the equality check here (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2208-2214) fails, and we think we've detected a race. We then incorrectly assume that this check failing means that mInterruptStack is non-empty, and try to read an element from it here: https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2221. This ends up crashing with a high address somewhere in the std::stack::top() implementation, as we violated its precondition. We should probably fix this by making sure to check if mInterruptStack is empty, and recovering in some manner.

Flags: needinfo?(nika)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 4

•

3 years ago

Attached file Bug 1715755 - Part 1: Don't ignore dead actors for sync and intr messages, r=handyman — Details

When this change was first implemented, it ignored dead actors for all types of
messages, but for messages with replies they cannot be ignored, as a reply must
be sent. This should fix that oversight.

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → nika

Status: NEW → ASSIGNED

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 5

•

3 years ago

Attached file Bug 1715755 - Part 2: Recover from unhandled special messages without crashing, r=handyman — Details

This change instead recovers from unhandled special messages by reporting a
normal IPC error, which should be handled using the normal IPC error
mechanisms.

Depends on D123148

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 6

•

3 years ago

Attached file Bug 1715755 - Part 3: Don't defer messages if the interrupt stack is empty, r=handyman — Details

Previously this would misbehave as it couldn't get the interrupt stack.

Depends on D123149

Ryan VanderMeulen [:RyanVM]

Comment 7

•

3 years ago

https://hg.mozilla.org/mozilla-central/rev/2175b7a03b09
https://hg.mozilla.org/mozilla-central/rev/e78e7cd3067c
https://hg.mozilla.org/mozilla-central/rev/6c984a259bdc

Group: dom-core-security → core-security-release

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox91: affected → wontfix

status-firefox92: --- → wontfix

status-firefox93: --- → fixed

status-firefox-esr78: --- → wontfix

status-firefox-esr91: --- → affected

tracking-firefox93: --- → +

tracking-firefox-esr91: --- → 93+

Resolution: --- → FIXED

Target Milestone: --- → 93 Branch

Mihai Boldan, Desktop QA [:mboldan]

Updated

•

3 years ago

Flags: qe-verify-

Ryan VanderMeulen [:RyanVM]

Comment 8

•

3 years ago

Please nominate this for ESR91 approval when you get a chance.

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

Flags: needinfo?(nika)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 9

•

3 years ago

Comment on attachment 9237099 [details]
Bug 1715755 - Part 1: Don't ignore dead actors for sync and intr messages, r=handyman

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration: Security bug with low risk
User impact if declined: Potential security issues discovered by fuzzing unfixed on esr
Fix Landed on Version: 93
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): Small changes well-tested on Nightly
String or UUID changes made by this patch: None

Flags: needinfo?(nika)

Attachment #9237099 - Flags: approval-mozilla-esr91?

Nika Layzell [:nika] (ni? for response)

Assignee

Updated

•

3 years ago

Attachment #9237100 - Flags: approval-mozilla-esr91?

Attachment #9237101 - Flags: approval-mozilla-esr91?

Pascal Chevrel:pascalc

Comment 10

•

3 years ago

Comment on attachment 9237099 [details]
Bug 1715755 - Part 1: Don't ignore dead actors for sync and intr messages, r=handyman

Approved for ESR91.2, thanks

Attachment #9237099 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

Pascal Chevrel:pascalc

Updated

•

3 years ago

Attachment #9237100 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

Pascal Chevrel:pascalc

Updated

•

3 years ago

Attachment #9237101 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

Pascal Chevrel:pascalc

Comment 11

•

3 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr91/rev/13d948df9e81
https://hg.mozilla.org/releases/mozilla-esr91/rev/c44b947e5236
https://hg.mozilla.org/releases/mozilla-esr91/rev/f8ef5376d115

status-firefox-esr91: affected → fixed

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Updated

•

3 years ago

Whiteboard: [adv-main93+r]

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Updated

•

3 years ago

Whiteboard: [adv-main93+r] → [adv-main93+r][adv-esr91.2+r]

Daniel Veditz [:dveditz]

Updated

•

2 years ago

Group: core-security-release

You need to log in before you can comment on or make changes to this bug.