Closed Bug 1811637 Opened 2 years ago Closed 2 years ago

Use-after-free crash in [@ g_type_check_instance_cast]

Categories

(Core :: Widget: Gtk, defect, P1)

defect

Tracking

()

RESOLVED FIXED
112 Branch
Tracking Status
firefox-esr102 111+ fixed
firefox109 --- wontfix
firefox110 --- wontfix
firefox111 + fixed
firefox112 + fixed

People

(Reporter: mccr8, Assigned: stransky)

References

(Regression, )

Details

(4 keywords, Whiteboard: [adv-main111+r][adv-esr102.9+r])

Crash Data

Attachments

(2 files)

Crash report: https://crash-stats.mozilla.org/report/index/76fdcfa7-05c8-4b27-8cb9-2cc250230120

Reason: SIGSEGV / SI_KERNEL

Top 10 frames of crashing thread:

0  libgobject-2.0.so.0  g_type_check_instance_cast  /usr/src/debug/glib2-2.74.1-2.fc37.x86_64/gobject/gtype.c:4127
1  libgtk-3.so.0  gtk_container_accessible_get_n_children  /usr/src/debug/gtk3-3.24.36-1.fc37.x86_64/gtk/a11y/gtkcontaineraccessible.c:51
2  libatk-bridge-2.0.so.0  impl_get_ChildCount  /usr/src/debug/at-spi2-atk-2.38.0-5.fc37.x86_64/atk-adaptor/adaptors/accessible-adaptor.c:139
3  libatk-bridge-2.0.so.0  impl_prop_GetSet  /usr/src/debug/at-spi2-atk-2.38.0-5.fc37.x86_64/droute/droute.c:364
4  libatk-bridge-2.0.so.0  handle_properties  /usr/src/debug/at-spi2-atk-2.38.0-5.fc37.x86_64/droute/droute.c:443
4  libatk-bridge-2.0.so.0  handle_message  /usr/src/debug/at-spi2-atk-2.38.0-5.fc37.x86_64/droute/droute.c:597
5  libdbus-1.so.3  _dbus_object_tree_dispatch_and_unlock  /usr/src/debug/dbus-1.14.4-1.fc37.x86_64/dbus/dbus-object-tree.c:1021
5  libdbus-1.so.3  <name omitted>  /usr/src/debug/dbus-1.14.4-1.fc37.x86_64/dbus/dbus-connection.c:4742
5  libdbus-1.so.3  dbus_connection_dispatch  /usr/src/debug/dbus-1.14.4-1.fc37.x86_64/dbus/dbus-connection.c:4574
6  libatspi.so.0  message_queue_dispatch  /usr/src/debug/at-spi2-core-2.44.1-2.fc37.x86_64/atspi/atspi-gmain.c:89

This looks like a UAF inside GTK accessibility code. There are 28 crashes like that in the last week, which isn't a huge number in absolute terms, but seems like it must be a lot for Linux accessibility code.

Summary: Crash in [@ g_type_check_instance_cast] → Use-after-free crash in [@ g_type_check_instance_cast]

I'm not sure if this should go here or in Widget: GTK.

I wonder if this is related to bug 1760819? It's really hard to debug this without any idea of what the GTK a11y code was even trying to access.

Severity: -- → S2

I checked more than 10 of these reports. All of them were during shutdown and all with Wayland.

The fact that these occur during shutdown makes me think an Accessible is being destroyed somehow before the widget is destroyed. I don't see how that could happen, though. The RootAccessible associated with the widget should have been destroyed when the PresShell was destroyed. That said, I don't have a solid understanding of the order of events when a root widget shuts down. Is it possible for the PresShell to outlive the widget? I wouldn't think so, but if it is, maybe that could cause a problem?

Alternatively, maybe we're destroying things correctly but not sending some notification we should be sending. I'm somewhat unclear as to how our ATK code notifies the OS about created and destroyed Accessibles. There are create and destroy signals, but we don't seem to use them. We do unref ATK objects when we destroy them, but maybe that isn't enough somehow?

I also can't figure out how we associate our custom ATKObject with the widget (if we do at all), nor how the OS learns about our ApplicationAccessible (we fire children changed notifications on that, but how does the system learn about it in the first place?).

Eitan, do you have any thoughts on any of this?

Flags: needinfo?(eitan)

Please run with MOZ_LOG="Widget:5 WidgetPopup:5" and you get info when Gtk widgets are deleted (it's by nsWindow::Destroy()).

From the code:

static dbus_bool_t
impl_get_ChildCount (DBusMessageIter * iter, void *user_data)
{
  AtkObject *object = (AtkObject *) user_data;
  int childCount;

  g_return_val_if_fail (ATK_IS_OBJECT (user_data), FALSE);

  childCount = (ATK_IS_SOCKET (object) && atk_socket_is_occupied (ATK_SOCKET (object)))
               ? 1
               : atk_object_get_n_accessible_children (object);
  return droute_return_v_int32 (iter, childCount);
}

It looks like ATK has stored already released object (the void *user_data param).

(In reply to Martin Stránský [:stransky] (ni? me) from comment #4)

Please run with MOZ_LOG="Widget:5 WidgetPopup:5" and you get info when Gtk widgets are deleted (it's by nsWindow::Destroy()).

Thanks. However, it's still unclear to me how this fits in with the PresShell, which is where the RootAccessible is held.

(In reply to Martin Stránský [:stransky] (ni? me) from comment #5)

It looks like ATK has stored already released object (the void *user_data param).

Which leads to the question: is that object something created by Gecko or something created by GTK? I don't see anywhere we tell the widget to use our AtkObject instead of its default implementation, but that seems odd to me, so I must be missing something.

Can you use ASAN build or rr to trace it? You can but breakpoint to impl_get_ChildCount() and trace it back where the object is created.

I can't reproduce it - I'm not even set up to run Linux builds currently (but even people on my team using a11y on Linux don't appear to see this) - so unfortunately no.

I'm going to try to reproduce this with a local asan build and record it in rr.

Flags: needinfo?(eitan)

Martin,

This looks like a bug in GTK3, if a widget is finalized before it is destroyed it de-references and drops the pointer to its accessible. But the accessible still keeps a reference to the widget. So if the accessible is still alive and queried via dbus it uses the freed address space where the widget used to be.

GtkWidget's destroy method does this correctly:
https://gitlab.gnome.org/GNOME/gtk/-/blob/3.24.36/gtk/gtkwidget.c#L12335-12340

But the finalize method unilaterally drops the accessible pointer value without clearing the widget's pointer value in the accessible:
https://gitlab.gnome.org/GNOME/gtk/-/blob/3.24.36/gtk/gtkwidget.c#L12374

Note that this is a problem because the accessible does not hold a strong reference to the widget:
https://gitlab.gnome.org/GNOME/gtk/-/blob/3.24.36/gtk/gtkaccessible.c#L169-172

I believe our MozContainer is finalized before it is destroyed and leaves a dangling reference to itself in its corresponding accessible that outlives it.

To fix the bug on our end we need to override some assured cleanup method in MozContainer (finalize?) to remove the MozContainer reference from the accessible.

As for an upstream fix, this is a UAF, so I don't know what their protocol is for that kind of thing.

Flags: needinfo?(stransky)

Thanks for the investigation, will look at it.

Gtk4 seems to have it fixed and doesn't delete accessible reference from finalize. From my understanding finalize is called when reference to GObject drops to 0.

Assignee: nobody → stransky
Status: NEW → ASSIGNED

Comment on attachment 9316555 [details]
Bug 1811637 [Linux] Detach accessible from moz_container before it's deleted r?emilio

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: It reliably crashes when any Firefox window is closed due to access to freed memory. The crash happens because ATK toolkit access (via DBus callback) to already deleted GtkWidget which holds Firefox main window.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: Seems to be caused by Gtk3 library which doesn't clear GtkWidget reference from ATK object. It happens for custom GtkWidgets which doesn't call stock (and private) gtk_widget_real_destroy() but only gtk_widget_finalize(). gtk_widget_finalize() doesn't clear back reference from ATK object to GtkWidget so when ATK object is used (after GtkWidget free) the freed GtkWidget is accessed.
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?: No risk, we just clear GtkWidget reference from ATK object so the callback is not performed. It's the same for ESR line.
  • How likely is this patch to cause regressions; how much testing does it need?: Should not cause a regression, at already crashes whole browser.
  • Is Android affected?: No
Flags: needinfo?(stransky)
Attachment #9316555 - Flags: sec-approval?
Component: Disability Access APIs → Widget: Gtk
Priority: -- → P1
Duplicate of this bug: 1760819

Guys, this is easy and simple fix, can we land it now also for release? Otherwise I'm going to ship it as Fedora/RHEL downstream patch.

Flags: needinfo?(dveditz)

Copying crash signatures from duplicate bugs.

Crash Signature: [@ g_type_check_instance_cast] → [@ g_type_check_instance_cast] [@ gtk_container_foreach]

The following field has been copied from a duplicate bug:

Field Value Source
Regressed by bug 1530052 bug 1760819

For more information, please visit auto_nag documentation.

Keywords: regression
Regressed by: 1530052

Set release status flags based on info from the regressing bug 1530052

Comment on attachment 9316555 [details]
Bug 1811637 [Linux] Detach accessible from moz_container before it's deleted r?emilio

Approved to request uplift and land

Attachment #9316555 - Flags: sec-approval? → sec-approval+
Crash Signature: [@ g_type_check_instance_cast] [@ gtk_container_foreach] → [@ g_type_check_instance_cast] [@ gtk_container_foreach]

(In reply to Martin Stránský [:stransky] (ni? me) from comment #17)

Guys, this is easy and simple fix, can we land it now also for release? Otherwise I'm going to ship it as Fedora/RHEL downstream patch.

Hey Martin, sorry I didn't see this when I did the sec-approval - sec-approvals get paused once the uplift period ends, which was on February 2nd, and resume after the merge. (Future dates are visible on fx-trains.) If there is something after the last uplift that is needed urgently in the release then the best thing to do is to contact relman - the Release Owner for a given release is on fx-trains - you would check who the owner is for the beta you want to uplift it into.

Flags: needinfo?(dveditz)
Group: layout-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch

The patch landed in nightly and beta is affected.
:stransky, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox111 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(stransky)

Comment on attachment 9316555 [details]
Bug 1811637 [Linux] Detach accessible from moz_container before it's deleted r?emilio

Beta/Release Uplift Approval Request

  • User impact if declined: Firefox crashes if any opened Firefox window is closed and accessibility is enabled (screen reader for instance).
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Should not cause a regression, it already crashes whole browser when accessibility is enabled.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(stransky)
Attachment #9316555 - Flags: approval-mozilla-beta?

Do we need this on ESR102 also?

Flags: needinfo?(stransky)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #26)

Do we need this on ESR102 also?

Yes, it's affected too.

Flags: needinfo?(stransky)

Comment on attachment 9316555 [details]
Bug 1811637 [Linux] Detach accessible from moz_container before it's deleted r?emilio

Approved for 111.0b5

Attachment #9316555 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Please attach a rebased patch for ESR102 and nominate it for uplift when you get a chance.

Flags: needinfo?(stransky)
Attached patch esr102.patchSplinter Review

Sorry, I'm unable to submit as phabricator patch due to python error:
CommandError: command 'status' failed to complete successfully
so attaching as plain diff here.

Flags: needinfo?(stransky)
Attachment #9320436 - Flags: approval-mozilla-esr102?

Comment on attachment 9320436 [details] [diff] [review]
esr102.patch

Approved for 102.9esr.

Attachment #9320436 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+
QA Whiteboard: [post-critsmash-triage]
Whiteboard: [adv-main111+r]
Whiteboard: [adv-main111+r] → [adv-main111+r][adv-esr102.9+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: