Hang while running APZCBasicTester.OverScroll_Bug1152051a (infinite loop in BlockingResourceBase::Release())

RESOLVED FIXED in Firefox 49

Status

()

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: botond, Assigned: erahm)

Tracking

Trunk
mozilla49
Points:
---

Firefox Tracking Flags

(firefox49 fixed)

Details

(Whiteboard: [gfx-noted])

Attachments

(3 attachments, 1 obsolete attachment)

(Reporter)

Description

2 years ago
Created attachment 8749780 [details]
Partial backtrace

bd339 reported (in bug 1170062 comment 7) an issue where running APZ gtests resulted in a hang while running a particular test, APZCBasicTester.OverScroll_Bug1152051a

I can't reproduce the hang, but bd339 sent me a partial backtrace:

[14:20] <bd339> #0  0x00007f05d0debee2 in mozilla::BlockingResourceBase::Release (this=0x7f05c8ae71e8) at /xpcom/glue/BlockingResourceBase.cpp:339
[14:20] <bd339> #1  0x00007f05d0dec27d in mozilla::ReentrantMonitor::Exit (this=0x7f05c8ae71e8) at /xpcom/glue/BlockingResourceBase.cpp:444
[14:20] <bd339> #2  0x00007f05d0843793 in mozilla::ReentrantMonitorAutoEnter::~ReentrantMonitorAutoEnter (this=0x7ffeed596c00, __in_chrg=<optimized out>)
[14:20] <bd339>     at /obj-x86_64-pc-linux-gnu/dist/include/mozilla/ReentrantMonitor.h:185
[14:20] <bd339> #3  0x00007f05d2287e47 in mozilla::layers::StateChangeNotificationBlocker::StateChangeNotificationBlocker (this=0x7ffeed596c70, aApzc=0x7f05c8ae7000)
[14:20] <bd339>     at /gfx/layers/apz/src/AsyncPanZoomController.cpp:418

Not sure what's going on here... infinite loop in BlockingResourceBase::Release() maybe?

Attached is a list of the other threads that are running.
(Reporter)

Comment 1

2 years ago
(In reply to Botond Ballo [:botond] from comment #0)
> Not sure what's going on here... infinite loop in
> BlockingResourceBase::Release() maybe?

Yeah - bd339 said one CPU core is maxed out during the hang. So it's not a hang, but an infinite loop, seemingly here:

https://dxr.mozilla.org/mozilla-central/rev/e5a10bc7dac4ee2453d8319165c1f6578203eac7/xpcom/glue/BlockingResourceBase.cpp#364
(Reporter)

Comment 2

2 years ago
Nathan, have you seen this sort of infinite loop in BlockingResourceBase::Release() before? Do you have any suggestions for how to further diagnose / debug one?
Flags: needinfo?(nfroyd)
Summary: Hang while running APZCBasicTester.OverScroll_Bug1152051a → Hang while running APZCBasicTester.OverScroll_Bug1152051a (infinite loop in BlockingResourceBase::Release())
I haven't seen anything like this, though Eric has done some work with the deadlock detector, maybe he's seen something like this?

The warning for that case is a little worrisome:

    NS_WARNING("Resource acquired at calling context\n");
    NS_WARNING("  [stack trace unavailable]\n");
    NS_WARNING("\nis being released in non-LIFO order; why?");

I'd be curious to see a fuller stack trace, especially if there happen to be traces from other threads involved.
Flags: needinfo?(nfroyd) → needinfo?(erahm)
(Reporter)

Comment 4

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #3)
> I'd be curious to see a fuller stack trace, especially if there happen to be
> traces from other threads involved.

bd339, could you obtain and post stack traces for all threads? You can get them with a single command, "thread apply all bt".
Flags: needinfo?(bd339)
(Assignee)

Comment 5

2 years ago
In theory to get stacks you have to enable them at build time: https://dxr.mozilla.org/mozilla-central/rev/e5a10bc7dac4ee2453d8319165c1f6578203eac7/xpcom/glue/BlockingResourceBase.h#20-21, but it looks like we missed adding them back to this spot in bug 1050445.

I'm guessing re-entrant + async stuff is wreaking havoc. It predates me, but I'm probably the most up to date on it so I'll take a look at the Release-out-of-order logic.

Comment 6

2 years ago
Created attachment 8749846 [details]
stack trace for all threads during hang
Flags: needinfo?(bd339)
(Assignee)

Comment 7

2 years ago
Created attachment 8755567 [details] [diff] [review]
Add stack traces for blocking resource base warnings

Through code inspection I'm not seeing how we could get into an infinite loop
there. This at least re-enables stack printing in the monitor case.
Attachment #8755567 - Flags: review?(nfroyd)
(Assignee)

Updated

2 years ago
Assignee: nobody → erahm
Status: NEW → ASSIGNED
(Assignee)

Updated

2 years ago
Flags: needinfo?(erahm)
Comment on attachment 8755567 [details] [diff] [review]
Add stack traces for blocking resource base warnings

Review of attachment 8755567 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, but I'd like to double-check the addition of the second Print call below.

::: xpcom/glue/BlockingResourceBase.cpp
@@ +418,5 @@
>          NS_WARNING(
> +          "Re-entering ReentrantMonitor after acquiring other resources.");
> +
> +        nsCString tmp;
> +        Print(tmp);

I don't think this one is necessary, as CheckAcquire() below should Print() this via the PrintCycle call?
Attachment #8755567 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 9

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #8)
> Comment on attachment 8755567 [details] [diff] [review]
> Add stack traces for blocking resource base warnings
> 
> Review of attachment 8755567 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me, but I'd like to double-check the addition of the second Print call
> below.
> 
> ::: xpcom/glue/BlockingResourceBase.cpp
> @@ +418,5 @@
> >          NS_WARNING(
> > +          "Re-entering ReentrantMonitor after acquiring other resources.");
> > +
> > +        nsCString tmp;
> > +        Print(tmp);
> 
> I don't think this one is necessary, as CheckAcquire() below should Print()
> this via the PrintCycle call?

Yeah I'll remove the Print call.
(Assignee)

Comment 10

2 years ago
Created attachment 8755606 [details] [diff] [review]
Add stack traces for blocking resource base warnings
(Assignee)

Updated

2 years ago
Attachment #8755567 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8755606 - Flags: review+
(Assignee)

Updated

2 years ago
Keywords: checkin-needed

Comment 12

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/2fd48137efe5
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox49: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
You need to log in before you can comment on or make changes to this bug.