Closed Bug 1857607 Opened 2 years ago Closed 1 year ago

PlainOldDataSerializer is a footgun for undefined behavior and data leaks

Categories

(Core :: IPC, defect, P1)

defect

Tracking

()

RESOLVED FIXED
130 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox-esr128 130+ fixed
firefox128 --- wontfix
firefox129 --- wontfix
firefox130 + fixed

People

(Reporter: truber, Assigned: jgilbert)

References

Details

(6 keywords, Whiteboard: [fuzzblocker][adv-main130+r][adv-esr128.2+r])

Attachments

(2 files, 2 obsolete files)

In experimental IPC fuzzing, we found the following crash on mozilla-central revision 20231004-c559095402a2 (fuzzing-asan-nyx-opt build):

/dom/canvas/WebGLContextGL.cpp:1407:30: runtime error: load of value 149, which is not a valid value for type 'bool'
    #0 0x7fffdfaf0a9a in mozilla::WebGLContext::CreateOpaqueFramebuffer(mozilla::webgl::OpaqueFramebufferOptions const&) /dom/canvas/WebGLContextGL.cpp:1407:30
    #1 0x7fffdf82377d in mozilla::HostWebGLContext::CreateOpaqueFramebuffer(unsigned long, mozilla::webgl::OpaqueFramebufferOptions const&) /dom/canvas/HostWebGLContext.cpp:127:20
    #2 0x7fffdfb5564e in mozilla::dom::WebGLParent::RecvCreateOpaqueFramebuffer(unsigned long, mozilla::webgl::OpaqueFramebufferOptions const&, bool*) /dom/canvas/WebGLParent.cpp:233:17
    #3 0x7fffdfcc9df3 in mozilla::dom::PWebGLParent::OnMessageReceived(IPC::Message const&, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>&) /builds/worker/workspace/obj-build/ipc/ipdl/PWebGLParent.cpp:844:79
    #4 0x7fffdb713197 in mozilla::gfx::PCanvasManagerParent::OnMessageReceived(IPC::Message const&, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>&) /builds/worker/workspace/obj-build/ipc/ipdl/PCanvasManagerParent.cpp:485:32
    #5 0x7fffd9adf0ee in mozilla::ipc::MessageChannel::DispatchSyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>&) /ipc/glue/MessageChannel.cpp:1767:25
    #6 0x7fffd9ada928 in mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>) /ipc/glue/MessageChannel.cpp:1723:9
    #7 0x7fffd9adc0bd in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&) /ipc/glue/MessageChannel.cpp:1525:3
    #8 0x7fffd9addc44 in mozilla::ipc::MessageChannel::MessageTask::Run() /ipc/glue/MessageChannel.cpp:1623:14
    #9 0x7fffd7777f40 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1192:16
    #10 0x7fffd7789d76 in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:480:10
    #11 0x7fffd9aebc64 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:330:5
    #12 0x7fffd989413f in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:370:10
    #13 0x7fffd989413f in MessageLoop::RunHandler() /ipc/chromium/src/base/message_loop.cc:363:3
    #14 0x7fffd989413f in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:345:3
    #15 0x7fffd776c951 in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:370:10
    #16 0x7ffff7f5d5dd in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
    #17 0x5555556bf41a in asan_thread_start(void*) _asan_rtl_:31
    #18 0x7ffff7f83608 in ?? ??:0
    #19 0x7fffd0a71292 in getxattr ??:0:0

The attached testcase can be reproduced using a special build to inject IPC messages (--enable-snapshot-fuzzing --enable-undefined-sanitizer, then set environment variables MOZ_FUZZ_TESTFILE=test.bin NYX_FUZZER="IPC_Generic" when launching).

Attached file Testcase โ€”

This looks like an UBSan error. It is happening on this line:

uint32_t samples = options.antialias ? StaticPrefs::webgl_msaa_samples() : 0;

I'm not really sure what is going on at a glance.

Group: core-security → gfx-core-security
Component: DOM: Content Processes → Graphics: CanvasWebGL

My guess is that what is going on here is that OpaqueFramebufferOptions uses PlainOldDataSerializer and is defined like this:

struct OpaqueFramebufferOptions final {
  bool depthStencil = true;
  bool antialias = true;
  uint32_t width = 0;
  uint32_t height = 0;
};

The POD serializer just pulls out raw data. Presumably due to alignment or whatever there's some extra padding in there, so when we read the antialias field it reads out a full byte and can read random extra junk that a hostile sender can stick in there. I don't think this is a security issue though.

Makes sense. I think that it is difficult to exploit unexpected bits in the padding of booleans although this stackoverflow question is a good illustration of an optimizing compiler relying on a boolean's padding bits to be zero and doing some arithmetic using the bits instead of a branch. If we are unlucky and the arithmetic is used for sizing buffers or computing unchecked offsets it could be turned into out of bounds accesses.

We could go through all serializable structures and replace booleans with something that sanitizes all of the bits.

I'm surprised that fuzzing at the IPC level hasn't run into this type of things before (or has it?).

(In reply to Nicolas Silva [:nical] from comment #5)

I'm surprised that fuzzing at the IPC level hasn't run into this type of things before (or has it?).

I suspect that Jesse has just started running the IPC fuzzer using UBSan, as I don't see any other bugs blocking the IPC fuzzing meta bug (bug 1723912) that have UndefinedBehaviorSanitizer in the summary. Only UBSan would notice an issue like this. And yeah, I'd expect this is an issue for any PlainOldDataSerializer thing with booleans.

There are "only" around 30 subclasses of PlainOldDataSerializer, so it wouldn't be too hard to audit, though really we'd want some kind of static analysis (or is there a way to detect this with template magic?) if we decide to ban this. I'll see what Nika thinks after she gets back from holiday.

I guess bool fields for PlainOldDataSerializer classes also leaks a few bits of memory to the other process, so that's not ideal either. jemalloc poisoning will help, in at least some cases.

This feels like a general IPC problem.

asuth mentioned has_unique_object_representations which at a glance looks like it would fit the bill. I tried adding this as a requirement for PlainOldDataSerializer and I got compilation errors for

  • mozilla::wr::FontInstanceOptions: the render_mode field is an enum with 4 values in a byte.
  • mozilla::webgl::InitContextDesc, mozilla::webgl::OpaqueFramebufferOptions, mozilla::webgl::Limits: bools as fields
  • mozilla::wr::LayoutSize, mozilla::wr::LayoutRect, mozilla::wr::LayoutPoint: not sure what the issue is, but I can't find the definition of LayoutPixel which these include.

LayoutPixel is a phantom type that doesn't get represented so that can't be the problem.

LayoutSize is just a pair of floats, LayoutRect is a pair of pair of floats, and LayoutPoint is a pair of floats, so I'm not sure why those are failing.

Ah, right, float has multiple representations of some values, so that's why it fails. So this isn't quite the ideal "no bits of padding" check we want.

Group: gfx-core-security → dom-core-security
Component: Graphics: CanvasWebGL → IPC
OS: Linux → All
Hardware: x86_64 → All

This is csectype-undefined because a process receiving a message from a compromised process could potentially end up with bad undefined behavior depending on compiler optimizations (see an example of what this could be in comment 5). It is also csectype-sandbox escape because that vector would also be a sandbox escape.

I was worried about a compromised process receiving one of these data structures and getting uninitialized memory, but we're talking about sub-byte amounts of memory, so I'm guessing the compiler will just initialize the entire byte because that's more efficient than somehow writing out a single byte...

(In reply to Andrew McCreight [:mccr8] from comment #11

I was worried about a compromised process receiving one of these data structures and getting uninitialized memory, but we're talking about sub-byte amounts of memory, so I'm guessing the compiler will just initialize the entire byte because that's more efficient than somehow writing out a single byte...

The UBSan errors we saw in fuzzing actually vary in the loaded value. So at least this is not constant or null-initialized and highly likely real uninitialized memory.

Yeah, I missed that OpaqueFramebufferOptions has full bytes of padding (Nika mentioned that to me), so it makes sense there could be uninitialized memory here.

Nika is familiar with this issue and had some thoughts, so I'll leave a needinfo for her to put more information in here when she has a chance.

Flags: needinfo?(nika)
Whiteboard: [fuzzblocker]

It looks like Nika hasn't had a chance to reply yet, so I'll try to write down what I remember from what she said when we discussed this issue, plus a few older issues I found. I looked for bugs with PlainOldDataSerializer in the first comment and I found a few previously fixed security issues involving the use of PlainOldDataSerializer (found by Valgrind): bug 1613009, bug 1620719, and bug 1773363.

The basic problem is that PlainOldDataSerializer is not sound, as we've seen here. In bug 1778808, Kelsey wrote "is_trivially_copyable does not mean deserializing with memcpy is safe. In particular PlainOldDataSerializer should be removed". In that bug, Kelsey introduced a new mechanism "TiedFields" which I think is akin to the has_unique_object_representations I used above, except that it is actually sound.

Fundamentally, I think somebody needs to finish what Kelsey started and get rid of PlainOldDataSerializer entirely. In the last few years, we've fixed at least 4 uses of PlainOldDataSerializer (the 3 bugs I linked in the first paragraph in this comment), and my hacky investigation in comment 4 found four more problematic looking uses (FontInstanceOptions, InitContextDesc, OpaqueFramebufferOptions, and Limits), and there may be more. We've marked all of these sec-moderate, but they are all potential sandbox escapes, so they are probably more dangerous than the average sec-moderate.

I see at least 32 different uses of PlainOldDataSerializer, and all but 2 are in graphics code. My understanding is that PlainOldDataSerializer was added because graphics people didn't want to deal with the weird IPDL struct cruft, so while the PlainOldDataSerializer code lives in the ipc/ directory, I think this is something that should be addressed by graphics people.

Bob, could you find somebody to work on this? Thank you.

Group: dom-core-security → gfx-core-security
Component: IPC → Graphics
Flags: needinfo?(bhood)
See Also: → 1778808
Component: Graphics → Graphics: CanvasWebGL
Flags: needinfo?(bhood)
Assignee: nobody → jgilbert
Severity: -- → S3
Priority: -- → P1

This bug prevents fuzzing from making progress; however, it has low severity. It is important for fuzz blocker bugs to be addressed in a timely manner (see here why?).
:jgilbert, could you consider increasing the severity?

For more information, please visit BugBot documentation.

Flags: needinfo?(jgilbert)
See Also: → 1613009, 1620719, 1773363

Actually this would be a great bug for me to guide someone to build knowledge of this important part of robust (webgl) serialization.

Flags: needinfo?(jgilbert)

I'm going to increase the severity of this from sec-moderate to sec-high, and make the summary about the more general issue. While each individual instance of this that we've seen doesn't look too severe, there are currently 32 uses of PlainOldDataSerializer in the tree so it is hard to say what exactly could be done across all of those classes. PlainOldDataSerializer is a fundamentally dangerous approach to IPC serialization that we should remove.

On a more positive note, I looked through all of the instances of "public PlainOldDataSerializer" in the tree, and none have been added since this bug was filed 5 months ago, so this is at least not an expanding problem.

Keywords: sec-moderatesec-high
Summary: UndefinedBehaviorSanitizer: /dom/canvas/WebGLContextGL.cpp:1407:30: runtime error: load of value 149, which is not a valid value for type 'bool' → Undefined behavior and data leaks with PlainOldDataSerializer

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:jgilbert, could you consider increasing the severity of this security bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(jgilbert)
Flags: needinfo?(jgilbert)
Summary: Undefined behavior and data leaks with PlainOldDataSerializer → PlainOldDataSerializer is a footgun for undefined behavior and data leaks

In that case, this is no longer a WebGL issue, but rather an IPC issue.

Component: Graphics: CanvasWebGL → IPC

This entire facility only exists because graphics people want it.

Thank you for writing a patch to remove uses of it.

We have the TiedFields stuff I wrote, which is about as robust as possible, and easy to use so switching to it should not be bad.

Manually writing ParamTrait serialization is inherently unsafe because there's no warning when you add a new field but not its serialization.

Flags: needinfo?(nika)

(In reply to Andrew McCreight [:mccr8] from comment #8)

This feels like a general IPC problem.

asuth mentioned has_unique_object_representations which at a glance looks like it would fit the bill. I tried adding this as a requirement for PlainOldDataSerializer and I got compilation errors for

  • mozilla::wr::FontInstanceOptions: the render_mode field is an enum with 4 values in a byte.
  • mozilla::webgl::InitContextDesc, mozilla::webgl::OpaqueFramebufferOptions, mozilla::webgl::Limits: bools as fields
  • mozilla::wr::LayoutSize, mozilla::wr::LayoutRect, mozilla::wr::LayoutPoint: not sure what the issue is, but I can't find the definition of LayoutPixel which these include.

For completeness I will mention that I did go on a quest to try to figure out a better solution, including has_unique_object_representations. Unfortunately, even has_unique_object_representations isn't fit for our usage. We needed something new if we were going to improve on manually writing serializers.

In WebGL, I added these:

  • [Queue]ParamTraits_TiedFields
    • Robust due to static_assert(mozilla::AssertTiedFieldsAreExhaustive<T>());.
  • QueueParamTraits_IsEnumCase
    • Robust due to -Werror=switch in dom/canvas/*.

These make serialization of objects and enums robust to even additions of fields and enum cases, and are easy to use.

Depends on: 1885028
Attachment #9390877 - Attachment is obsolete: true
Attachment #9390009 - Attachment is obsolete: true

I'm going to be making public bugs for these, because it's plausible that we want to switch for public reasons, and we're not looking to backport this.

Switching to TiedFields-based serialization for webrender cbindgen types is somewhat more involved than just fixing webgl, so I'm fixing webgl first.

Depends on: 1885245

In addition to the WebRender uses of PlainOldDataSerializer mentioned by Kelsey, there are also a few uses in WebVR. I tried looking at fixing those, but those data types use a large number of fixed-length C-style arrays, eg VRControllerState mControllerState[kVRControllerMaxCount]. In total, I see at least 16 different arrays across VRDisplayInfo, VRDisplayState, VRHMDSensorState, and VRPose (all of those types are contained within VRDisplayInfo). Trying to get an lvalue of one of these things for std::tie seems to drop the length, so you end up with something useless. Looking at the other instances of the TiedFields stuff, it seems like the right way to do this would be to convert things to std::array, eg std::array<VRControllerState, kVRControllerMaxCount> mControllerState. I converted a couple of these fields before I realized how many there were. It doesn't seem like a great use of time to convert all of them, honestly. That time would probably be better spent making sure WebVR IPC is well and truly disabled (last time I looked, at least some of it was still possibly active).

I was also vaguely wondering if there was some way to "inline" the array using template magic, so that say int32_t mFoo[3] could be turned into mFoo[0], mFoo[1], mFoo[2] in the MutTiedFields definition but I'm not sure if that's actually possible.

Bug 1885245 has now landed, so this is fixed on Nightly. I'm not sure how reasonable this would be to backport. At least the VR code hasn't changed much but I don't know about the rest of it.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

The dependent bugs would need a bit of rebasing even for ESR128. Given comment 18, it sounds like the existing uses weren't particularly dangerous - it was more a concern about being a potential footgun down the road? Seems like a wontfix on backports would make sense if we're not concerned about any of the existing usage. Seems like comment 27 was leaning that way also.

Group: gfx-core-security → core-security-release
Flags: needinfo?(continuation)
Target Milestone: --- → 130 Branch

There's definitely existing problems. Kelsey's patch added padding fields in a number of places, which is where we might be leaking parent process memory (depending on what direction the message goes), and there are many enums that are a danger in the other direction, though it did look like at least some of the code already had checks for those. Each individual thing isn't that bad, so it is hard to say whether it is worth the backporting. I'm fine with wontfixing it.

Flags: needinfo?(continuation)

[Tracking Requested - why for this release]: Because ESR128 will be with us a while, we should backport this (and its dependencies) to ESR128.

ESR115 and Beta129 will go away soon, so I don't think it's worth it to try to backport this prophylactically.
I think we should look at this as an investment in making sure that backports to esr128 over its lifetime won't need to be rewritten due to not having backported 1885245. (since that adds both engineering cost and some risk to the backporting process)

QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-

We uplifted bug 1909019 to ESR128, but it looks like we need at least bug 1909018 and bug 1885245 also (both of which graft cleanly)? Any others?

Flags: needinfo?(jgilbert)

Nope those are it!

Flags: needinfo?(jgilbert)

All dependencies have been uplifted for 128.2esr now.

Whiteboard: [fuzzblocker] → [fuzzblocker][adv-main130+r][adv-esr128.2+r]

Thanks all!

Thank you, Kelsey, for driving this large and annoying project to completion!

Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: