Parent crash [@ headerT<IPC::Message::Header>] after receiving malformed message from child
Categories
(Core :: IPC, defect)
Tracking
()
People
(Reporter: decoder, Assigned: nika, NeedInfo)
Details
(Keywords: sec-audit, sec-want, Whiteboard: [adv-main93+r][adv-esr91.2+r])
Crash Data
Attachments
(4 files)
1.82 MB,
application/zip
|
Details | |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-esr91+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-esr91+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-esr91+
|
Details | Review |
Researchers from RUB have developed a VM snapshot based fuzzing approach and got a proof-of-concept to work with Firefox. They found 3 crashes pretty quickly (in all cases, this is the child sending something to the parent and the parent crashes as a result).
It would be incredibly useful for us if someone from the IPC team could assess these crashes in terms of validity. Afterwards, we can discuss how to move this project forward during the course of H2 and how we can best help the IPC team with this new tool.
Reproducing requires little effort as the researchers provided a small shim to LD_PRELOAD
that can reproduce the bug even with our existing fuzzing builds (fuzzing flag is required to disable IPC sentinels). I looked at the first crash and was able to reproduce it on fuzzing builds up to 2021-05-20, then it stopped reproducing (which could simply be because something in the ipdl changed, or maybe we fixed the bug).
In order to reproduce, the following steps should work:
-
Unpack test.zip as attached and change into the resulting test directory.
-
Compile the shim to preload
./compile.sh
- Fetch the right Firefox build (e.g. with fuzzfetch) from TC:
python3 -mfuzzfetch --asan --fuzzing --build 2021-05-20 -n firefox
- Reproduce the bug
./run1.sh
Logs for each run are also included in the zip file. We are also looking into providing Pernosco sessions for these crashes.
Please keep this bug confidential, it contains code/approaches not yet published by the authors and we don't want to draw attention to our new IPC fuzzing attempts.
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Updated•3 years ago
|
Reporter | ||
Comment 1•3 years ago
|
||
Needinfo for a Pernosco session for bug #1 (and if possible bug #3. Bug #2 apparently is an intended MOZ_CRASH). Thanks!
Comment 2•3 years ago
|
||
The Pernosco sessions are available:
- 1 - https://pernos.co/debug/nbhsWNjzGuuH1HOMAkpCdA/index.html
- 3 - https://pernos.co/debug/hXs5gncFInC7VZkAV_rb2w/index.html
Unfortunately the build that was used (2021-05-20) does not have a corresponding no-opt build so I'm not sure how useful these will be. Bug 1681209 was fixed a day or two after (so close!). Anything newer we can replay with a fuzzing no-opt build and it will work much better with rr/Pernosco.
Can the test cases can be updated to work with a newer build?
Assignee | ||
Comment 3•3 years ago
|
||
I finally found some time to look into these issues, and figure out what was going on. I ended up decoding the payloads from the reproducer.c
file into the relevant message headers to do this. It seems like issues 1 and 3 are bugs, and issue 2 is an explicit crash which we should relax for fuzzing purposes. These bugs were not caught by previous fuzzing attempts, as they are exercising more of MessageChannel
, which was previously bypassed by FuzzProtocol
, as it directly called OnMessageReceived
on the underlying protocol (https://searchfox.org/mozilla-central/rev/a9851cca236e03d088d7528dd27445080475a68e/tools/fuzzing/ipc/ProtocolFuzzer.h#95-100). I've put my explorations below:
Bug1
.payload_size = 0x4
.routing = 0x54
.type = 0x6d003f
.flags = 0x1d (SYNC_BIT | MEDIUMHIGH_PRIORITY | NOT_NESTED)
.num_fds = 0x0
.txid = 0xffffffff
.interrupt_local_stack_depth = 0xffffffff
.seqno = 0x0
[0, 0, 0, 0]
This case appears to be caused by an issue with sync IPC which I added on accident 2 years ago in bug 1547085. In that bug, I changed behaviour so that if an IPC message is received for an unknown actor, it discards the message rather than causing a crash. Unfortunately, it seems that I also made this behaviour apply to sync IPC messages. With sync IPC there is an aReply
outparameter from the OnMessageReceived
callback (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2117), which is expected to always be initialized with a reply message if OnMessageReceived()
returns a success value (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2130-2132). This message has the SYNC_BIT
flag set, and the target routing ID of 0x54 does not exist. This means that the aReply
value is never filled, and the attempt to set_seqno
on it (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2133) will hit a nullptr crash.
We should probably make the check a bit more resillient to getting a nullptr aReply
, as well as removing the dropped message trick for sync IPC, as it only really makes sense for async IPC.
Bug2
.payload_size = 0x4
.routing = 0x80000000 (MSG_ROUTING_NONE)
.type = 0x180005
.flags = 0x2 (REPLY_BIT)
.num_fds = 0x0
.txid = 0xffffffff
.interrupt_local_stack_depth = 0xffffffff
.seqno = 0x0
[0, 0, 0, 0]
This is an explicit crash happening on this line: https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2143. It's happening because the given routing ID of -1 is MSG_ROUTING_NONE
, and that routing ID is always expected to be used with special messages, but the type ID of 0x180005 is not a recognized special message. We should adjust MessageChannel to not hard crash, and instead treat it as a normal IPC message error.
Bug3
.payload_size = 0x4
.routing = 0x54
.type = 0x3a0018
.flags = 0x80 (INTERRUPT_BIT)
.num_fds = 0x0
.interrupt_remote_stack_depth_guess = 0xffffffff (-1)
.interrupt_local_stack_depth = 0xffffffff (-1)
.seqno = 0x0
[0, 0, 0, 0]
This is an interrupt message, due to the INTERRUPT_BIT
flag, and it appears to be failing due to a bug in how we handle interrupt races. The interrupt_remote_stack_depth_guess
field is -1
, while our value for RemoteViewOfStackDepth
is 0
, so the equality check here (https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2208-2214) fails, and we think we've detected a race. We then incorrectly assume that this check failing means that mInterruptStack
is non-empty, and try to read an element from it here: https://searchfox.org/mozilla-central/rev/19f47e75ab79318caf3a8f23964137d96ae23c89/ipc/glue/MessageChannel.cpp#2221. This ends up crashing with a high address somewhere in the std::stack::top()
implementation, as we violated its precondition. We should probably fix this by making sure to check if mInterruptStack
is empty, and recovering in some manner.
Assignee | ||
Comment 4•3 years ago
|
||
When this change was first implemented, it ignored dead actors for all types of
messages, but for messages with replies they cannot be ignored, as a reply must
be sent. This should fix that oversight.
Updated•3 years ago
|
Assignee | ||
Comment 5•3 years ago
|
||
This change instead recovers from unhandled special messages by reporting a
normal IPC error, which should be handled using the normal IPC error
mechanisms.
Depends on D123148
Assignee | ||
Comment 6•3 years ago
|
||
Previously this would misbehave as it couldn't get the interrupt stack.
Depends on D123149
Comment 7•3 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/2175b7a03b09
https://hg.mozilla.org/mozilla-central/rev/e78e7cd3067c
https://hg.mozilla.org/mozilla-central/rev/6c984a259bdc
Updated•3 years ago
|
Comment 8•3 years ago
|
||
Please nominate this for ESR91 approval when you get a chance.
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
Comment on attachment 9237099 [details]
Bug 1715755 - Part 1: Don't ignore dead actors for sync and intr messages, r=handyman
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Security bug with low risk
- User impact if declined: Potential security issues discovered by fuzzing unfixed on esr
- Fix Landed on Version: 93
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Small changes well-tested on Nightly
- String or UUID changes made by this patch: None
Assignee | ||
Updated•3 years ago
|
Comment 10•3 years ago
|
||
Comment on attachment 9237099 [details]
Bug 1715755 - Part 1: Don't ignore dead actors for sync and intr messages, r=handyman
Approved for ESR91.2, thanks
Updated•3 years ago
|
Updated•3 years ago
|
Comment 11•3 years ago
|
||
uplift |
Updated•3 years ago
|
Updated•3 years ago
|
Updated•2 years ago
|
Description
•