Closed Bug 1762367 Opened 3 years ago Closed 3 years ago

Crash in [@ free | webrender::render_backend::RenderBackend::process_transaction]

Categories

(Core :: Graphics: WebRender, defect)

RESOLVED INCOMPLETE

People

(Reporter: gsvelto, Unassigned)

References

Details

(Keywords: crash, csectype-uaf, sec-high)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/6c1ea7d1-0590-4c84-a3fc-c9b380211007

Reason: SIGSEGV / SEGV_ACCERR

Top 9 frames of crashing thread:

0 libpthread.so.0 __GI___pthread_mutex_lock 
1 firefox-bin free memory/build/malloc_decls.h:54
2 libxul.so webrender::render_backend::RenderBackend::process_transaction gfx/wr/webrender/src/render_backend.rs:1001
3 libxul.so webrender::render_backend::RenderBackend::process_api_msg gfx/wr/webrender/src/render_backend.rs:1194
4 libxul.so std::sys_common::backtrace::__rust_begin_short_backtrace library/std/src/sys_common/backtrace.rs:125
5 libxul.so core::ops::function::FnOnce::call_once{{vtable.shim}} library/core/src/ops/function.rs:227
6 libxul.so std::sys::unix::thread::Thread::new::thread_start library/std/src/sys/unix/thread.rs:74
7 libpthread.so.0 start_thread 
8 libc.so.6 __GI___clone 

This appears to be a double-free caught by PHC. The allocation stack was the following:

#0    malloc (firefox-bin)
#1    webrender::render_api::RenderApi::send_transaction (libxul.so)
#2    wr_api_send_transaction (libxul.so)
#3    mozilla::layers::WebRenderBridgeParent::MaybeGenerateFrame(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, bool) (libxul.so)
#4    mozilla::layers::WebRenderBridgeParent::CompositeToTarget(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::gfx::DrawTarget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const*) (libxul.so)
#5    mozilla::layers::CompositorVsyncScheduler::Composite(mozilla::VsyncEvent const&) (libxul.so)
#6    mozilla::detail::RunnableMethodImpl<mozilla::layers::CompositorVsyncScheduler*, void (mozilla::layers::CompositorVsyncScheduler::*)(mozilla::VsyncEvent const&), true, (mozilla::RunnableKind)1, mozilla::VsyncEvent>::Run() (libxul.so)
#7    nsThread::ProcessNextEvent(bool, bool*) (libxul.so)
#8    mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (libxul.so)
#9    MessageLoop::Run() (libxul.so)
#10    nsThread::ThreadFunc(void*) (libxul.so)
#11    _pt_root (libnspr4.so)
#12    start_thread (libpthread.so.0)
#13    __GI___clone (libc.so.6)
#14    ??? (firefox-bin)

And the free stack is this one:

#0    free (firefox-bin)
#1    webrender::render_backend::RenderBackend::process_transaction (libxul.so)
#2    webrender::render_backend::RenderBackend::process_api_msg (libxul.so)
#3    std::sys_common::backtrace::__rust_begin_short_backtrace (libxul.so)
#4    core::ops::function::FnOnce::call_once{{vtable.shim}} (libxul.so)
#5    std::sys::unix::thread::Thread::new::thread_start (libxul.so)
#6    start_thread (libpthread.so.0)
#7    __GI___clone (libc.so.6)
#8    ??? (firefox-bin)

Other crashes under this signature have the relevant assertion, like this one which has the MozCrashReason annotation set to:

MOZ_RELEASE_ASSERT((run->mRegionsMask[elm] & (1U << bit)) == 0) (Double-free?)

Note that PHC deals with UAFs but in this case the second use was the free() call.
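To make the asserted condition concrete, here is a minimal sketch of a bitmask-based double-free check. It is not the allocator's actual code: `Run`, `regions_mask` and `FreeError` are hypothetical names, and the sketch assumes that a set bit marks a free region, which is what the wording of the assertion suggests.

```rust
/// Toy model of a run of fixed-size regions tracked by a bitmask.
/// Assumption: a set bit means "region is free", so freeing a region
/// whose bit is already set is a double-free.
pub struct Run {
    regions_mask: Vec<u32>,
}

#[derive(Debug, PartialEq)]
pub enum FreeError {
    DoubleFree { region: usize },
}

impl Run {
    /// Creates a run in which every region starts out free.
    pub fn new(words: usize) -> Self {
        Run { regions_mask: vec![u32::MAX; words] }
    }

    /// Marks the lowest free region as allocated and returns its index.
    pub fn alloc(&mut self) -> Option<usize> {
        for (elm, word) in self.regions_mask.iter_mut().enumerate() {
            if *word != 0 {
                let bit = word.trailing_zeros() as usize;
                *word &= !(1u32 << bit);
                return Some(elm * 32 + bit);
            }
        }
        None
    }

    /// Marks a region as free, rejecting a second free of the same region.
    pub fn free(&mut self, region: usize) -> Result<(), FreeError> {
        let (elm, bit) = (region / 32, region % 32);
        // Same shape as the asserted condition: the bit must still be clear.
        if (self.regions_mask[elm] & (1u32 << bit)) != 0 {
            return Err(FreeError::DoubleFree { region });
        }
        self.regions_mask[elm] |= 1u32 << bit;
        Ok(())
    }
}
```

In this model, allocating a region from `Run::new(1)` and freeing it once returns `Ok(())`, while freeing the same region again returns `FreeError::DoubleFree`, which corresponds to the point where the release assertion would fire.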

Component: Graphics → Graphics: WebRender
Blocks: gfx-triage

Jeff, could you examine this and see if you can provide any insights? It seems rather odd that we'd be getting a double-free in the Rust code; perhaps this is in the C bindings?

Flags: needinfo?(jmuizelaar)

Having inline call stacks would make it more obvious what was actually being freed.

Depends on: 1398533

This doesn't happen on Windows, so looking at a minidump is not very practical, which makes this not very actionable.

Flags: needinfo?(jmuizelaar)
No longer blocks: gfx-triage

Assigning to get sec bugs owned. Feel free to hand off to someone else as needed.

Assignee: nobody → gwatson
Assignee: gwatson → nobody

If I click the link to the crash reports in the initial comment it says it's not found - do old crash reports expire and get removed? Do we know if this is still occurring, and if so, if there's anything we can action on this?

Flags: needinfo?(jmuizelaar)
Flags: needinfo?(gsvelto)

(In reply to Glenn Watson [:gw] from comment #5)

> If I click the link to the crash reports in the initial comment it says it's not found - do old crash reports expire and get removed?

Indeed they do, after 6 months. That is also reflected in the graph shown for crash-signature frequency in the "Crash Data" section up at the top. One catch there is that the graph reflects the signatures entered into the signatures field, which someone might forget to do, or more commonly a bug might be about a subset of crashes with the same signature (for example, ones with a specific calling function up the stack).

> Do we know if this is still occurring, and if so, if there's anything we can action on this?

Clicking on the crash signature line and then modifying the date range shows only 9 crashes in 6 months; most are on ESR-91. Three are on then-current(ish) Release versions, the most recent being 101 in June.

A couple of those stacks look similar to what gsvelto described up top, like bp-8a4c11de-7a87-4581-9893-bf4f10220607. But note the specific crash he was reporting was caught by PHC, and that's going to be rare in any case. Like ASAN reports, PHC finds problems that wouldn't necessarily be detected "in the wild", or at least not necessarily at that same location. It's quite possible the problem is still there even though the crash volume is low, because the probability of PHC catching it is low. (None of the currently recorded crashes appear to involve PHC.)

Because of that, the extra PHC stack information provided in comment 0 is still the best thing to look at.

I only see the MOZ_CRASH_REASON mentioned in comment 0 in two crashes, both ESR-91, such as bp-d380853a-76c5-4556-b1dc-3065e0220321.

I think the issue we've had is that from looking at the stack and the code in question, we couldn't see how this could occur (not saying it doesn't, just that we've taken a look at the code and it wasn't clear to us how it could).

I imagine it's probably something related to some unsafety in the FFI / bindings layer between WR and Gecko, but it's going to be hard to do anything about it without a repro since we couldn't come up with a theory on how it might happen from inspecting the code.
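As a purely illustrative sketch of the kind of FFI ownership hazard meant here (not WebRender's or Gecko's actual code): once a `Box` has been turned into a raw pointer with `Box::into_raw`, nothing stops `Box::from_raw` from being called on it twice, and the second call frees the same allocation again. The hypothetical `ForeignHandle` wrapper below shows one way to make the second reclaim harmless; `Transaction` is a stand-in type, and all names are assumptions.

```rust
/// Hypothetical payload standing in for a transaction message.
pub struct Transaction {
    pub ops: Vec<u8>,
}

/// Owning handle for a pointer that crossed an FFI-style boundary.
/// Reclaiming takes the pointer out, so a second reclaim yields `None`
/// instead of freeing the same allocation again.
pub struct ForeignHandle {
    raw: Option<*mut Transaction>,
}

impl ForeignHandle {
    /// Moves the transaction to the heap and gives up Rust's ownership,
    /// as one would before passing the pointer to C or C++.
    pub fn new(txn: Transaction) -> Self {
        ForeignHandle { raw: Some(Box::into_raw(Box::new(txn))) }
    }

    /// Rebuilds the `Box` at most once; later calls return `None`.
    pub fn reclaim(&mut self) -> Option<Box<Transaction>> {
        // SAFETY: the pointer came from `Box::into_raw` in `new`, and
        // `take()` guarantees it is converted back at most once.
        self.raw.take().map(|ptr| unsafe { Box::from_raw(ptr) })
    }
}

impl Drop for ForeignHandle {
    fn drop(&mut self) {
        // Frees the allocation if nobody reclaimed it, avoiding a leak.
        drop(self.reclaim());
    }
}
```

With this wrapper the first `reclaim()` returns the boxed transaction and any later call returns `None`. A bare raw pointer shared across the boundary offers no such guard, which is why a stray second free there would only be noticed by the allocator or by PHC.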

Is that summary roughly correct, Jeff?

Yeah, we purge crashes after 6 months since they contain sensitive data. Given this was detected by PHC but only one crash ever had PHC annotations, it's possible that it was either a false positive (maybe caused by bad hardware?) or something that got fixed along the way. If it were a valid bug we'd see at least some volume, and definitely a few more caught by PHC. If it were up to me I'd close this as INVALID.

Flags: needinfo?(gsvelto)
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(jmuizelaar)

In the past 6 months only 11 crashes were recorded in the wild, and 8 of them were ESR-91 builds. Only one of the 11 crashes has the MOZ_RELEASE_ASSERT mentioned in comment 0 (an ESR-91 crash from March). I agree this bug isn't doing us a lot of good and could be resolved INCOMPLETE or WORKSFORME.

> If it were a valid bug we'd see at least some volume, and definitely a few more caught by PHC.

Would we? I only see 13 PHC-triggered crashes of any kind in the last month; it seems like we have the probabilities turned way down. I searched for "PHC Kind exists" -- is there something better?

Flags: needinfo?(gsvelto)

(In reply to Daniel Veditz [:dveditz] from comment #9)

> In the past 6 months only 11 crashes were recorded in the wild, and 8 of them were ESR-91 builds. Only one of the 11 crashes has the MOZ_RELEASE_ASSERT mentioned in comment 0 (an ESR-91 crash from March). I agree this bug isn't doing us a lot of good and could be resolved INCOMPLETE or WORKSFORME.

I agree.

> Would we? I only see 13 PHC-triggered crashes of any kind in the last month; it seems like we have the probabilities turned way down. I searched for "PHC Kind exists" -- is there something better?

In the past we saw PHC consistently catching bugs; it's possible that something changed and we're now getting relatively few crashes.

Flags: needinfo?(gsvelto)
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jmuizelaar)
Resolution: --- → INCOMPLETE
Group: gfx-core-security