Open Bug 1954023
Opened 1 month ago, updated 1 month ago

[Linux/Wayland/HDR] Thread lock detection eats 100% of cpu and blocks UI

Categories: Core :: XPCOM, defect, P3

People: Reporter: stransky; Assignee: Unassigned

References: Blocks 1 open bug

Details

The Linux HDR debug build is unusable. Thread lock detection eats 100% of the CPU, blocks the UI, and prints many messages like this to the terminal:

###!!! ERROR: Potential deadlock detected:
=== Cyclical dependency starts at
--- Mutex : WaylandSurface calling context
  [stack trace unavailable]

--- Next dependency:
--- Mutex : WaylandSurface (currently acquired)
 calling context
  [stack trace unavailable]

=== Cycle completed at
--- Mutex : WaylandSurface calling context
  [stack trace unavailable]

Deadlock may happen for some other execution

The non-debug build works fine.

Steps to reproduce:

  1. Download the latest Nightly debug build and run it on Linux/Wayland.
  2. Set gfx.webrender.compositor.force-enabled and gfx.webrender.compositor to true.
  3. Restart the browser and try to scroll any page.

When I stop Firefox in gdb, I see the main thread looping in the thread lock detection code, with a very deep call stack.

Is there any way to disable the detector? I'd like to run under TSAN, but AFAIK that uses a debug build, which is unusable.

IIRC it's possible to do a non-debug TSAN build, which might be what you want here. That said, this does suggest that there may be lock-inversion issues with the "WaylandSurface" mutex (https://searchfox.org/mozilla-central/rev/4ce36232b265b53de4fb7eb754430f94e262bbbe/widget/gtk/WaylandSurface.h#386), which should be fixed to avoid potential deadlocks. My guess is that some code paths hold the locks of multiple WaylandSurface objects at the same time, acquired in an order that is not globally consistent. That would both produce a giant dependency tree (slowing down the deadlock detector) and could potentially lead to an actual deadlock in some cases.
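
The usual cure for this kind of lock inversion is a globally consistent lock order. Below is a minimal C++ sketch, assuming a hypothetical `Surface` type that owns one `std::mutex` (this is not the real `WaylandSurface` interface, and `CommitBoth` / `RunOpposingOrders` are illustrative names). Whenever two surfaces must be locked together, their locks are taken in one fixed order, here by object address. As a result, no thread can hold B while waiting for A at the same time another thread holds A while waiting for B.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an object that owns its own lock.
// This is NOT the real WaylandSurface class.
struct Surface {
  std::mutex lock;
  int frames = 0;
};

// Lock two surfaces in one global order (by address), so every thread
// that needs both locks acquires them in the same sequence.
// std::less yields a total order on pointers, even for unrelated objects.
void CommitBoth(Surface& a, Surface& b) {
  if (&a == &b) {  // Same object: std::mutex is not recursive, lock once.
    std::lock_guard<std::mutex> guard(a.lock);
    ++a.frames;
    return;
  }
  Surface* first = std::less<Surface*>()(&a, &b) ? &a : &b;
  Surface* second = (first == &a) ? &b : &a;
  std::lock_guard<std::mutex> g1(first->lock);
  std::lock_guard<std::mutex> g2(second->lock);
  ++a.frames;
  ++b.frames;
}

// Half the threads pass (parent, child) and half pass (child, parent).
// Without the address ordering in CommitBoth this pattern can deadlock.
bool RunOpposingOrders(int numThreads, int iterations) {
  Surface parent, child;
  std::vector<std::thread> workers;
  for (int i = 0; i < numThreads; ++i) {
    workers.emplace_back([&, i] {
      for (int n = 0; n < iterations; ++n) {
        if (i % 2) CommitBoth(parent, child);
        else CommitBoth(child, parent);
      }
    });
  }
  for (auto& t : workers) t.join();
  const int expected = numThreads * iterations;
  return parent.frames == expected && child.frames == expected;
}
```

Calling `RunOpposingOrders(8, 10000)` drives half of the threads with the arguments swapped. Because `CommitBoth` normalizes the order, the call should return `true` instead of hanging. Address order is only one possible rule. A parent-before-child rule or a per-surface rank would work just as well, provided every code path that takes two of these locks follows it.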

Unfortunately, I don't think we have a separate flag for the deadlock detector; it's gated only on #ifdef DEBUG. So if you want the detector disabled, you'll need to use a non-debug build or refactor the code in BlockingResourceBase.{h,cpp}.
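
A toy lock-order checker can illustrate why a large, tangled lock graph makes this kind of detection expensive. What follows is a hypothetical, simplified model, not Mozilla's BlockingResourceBase or deadlock-detector code, and `LockOrderGraph` / `OnAcquire` are illustrative names. Each lock object is a node identified by an integer. Every acquisition adds edges from the locks already held to the new lock, and a depth-first search checks whether the new edge closes a cycle, which is the situation the "Cyclical dependency" log above reports.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified lock-order checker. It is NOT Mozilla's
// BlockingResourceBase/DeadlockDetector code; it only shows the idea:
// record "held -> acquired" edges and report when a new edge closes a cycle.
// Each lock object is its own node, so two different locks that share a
// name would still be distinct nodes here.
class LockOrderGraph {
 public:
  // Call when lock `acquiring` is taken while every lock in `held` is held.
  // Returns true if the new edges close a cycle (a potential deadlock).
  bool OnAcquire(const std::vector<int>& held, int acquiring) {
    bool cycle = false;
    for (int h : held) {
      // Edge h -> acquiring closes a cycle if acquiring already reaches h.
      if (h == acquiring || Reaches(acquiring, h)) cycle = true;
      edges_[h].insert(acquiring);
    }
    return cycle;
  }

 private:
  // Iterative depth-first search over the recorded order graph. It runs on
  // every acquisition, and its cost grows with the reachable nodes and edges.
  bool Reaches(int from, int to) const {
    std::set<int> seen;
    std::vector<int> stack{from};
    while (!stack.empty()) {
      int cur = stack.back();
      stack.pop_back();
      if (cur == to) return true;
      if (!seen.insert(cur).second) continue;
      auto it = edges_.find(cur);
      if (it == edges_.end()) continue;
      for (int next : it->second) stack.push_back(next);
    }
    return false;
  }

  std::map<int, std::set<int>> edges_;
};
```

Taking lock 1 and then lock 2 records the order 1 -> 2. Later taking lock 1 while holding lock 2 closes the cycle, so `OnAcquire` returns `true`. In this sketch the search runs on every acquisition and can walk every edge reachable from the new lock. Many interrelated locks taken in inconsistent orders therefore make each check more expensive, which is consistent with the slowdown described in this bug.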

Severity: -- → S3