Closed Bug 1270938 Opened 9 years ago Closed 9 years ago

Hang while running APZCBasicTester.OverScroll_Bug1152051a (infinite loop in BlockingResourceBase::Release())

Categories

(Core :: Panning and Zooming, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: botond, Assigned: erahm)

Details

(Whiteboard: [gfx-noted])

Attachments

(3 files, 1 obsolete file)

Attached file Partial backtrace
bd339 reported (in bug 1170062 comment 7) an issue where running APZ gtests resulted in a hang while running a particular test, APZCBasicTester.OverScroll_Bug1152051a I can't reproduce the hang, but bd339 sent me a partial backtrace: [14:20] <bd339> #0 0x00007f05d0debee2 in mozilla::BlockingResourceBase::Release (this=0x7f05c8ae71e8) at /xpcom/glue/BlockingResourceBase.cpp:339 [14:20] <bd339> #1 0x00007f05d0dec27d in mozilla::ReentrantMonitor::Exit (this=0x7f05c8ae71e8) at /xpcom/glue/BlockingResourceBase.cpp:444 [14:20] <bd339> #2 0x00007f05d0843793 in mozilla::ReentrantMonitorAutoEnter::~ReentrantMonitorAutoEnter (this=0x7ffeed596c00, __in_chrg=<optimized out>) [14:20] <bd339> at /obj-x86_64-pc-linux-gnu/dist/include/mozilla/ReentrantMonitor.h:185 [14:20] <bd339> #3 0x00007f05d2287e47 in mozilla::layers::StateChangeNotificationBlocker::StateChangeNotificationBlocker (this=0x7ffeed596c70, aApzc=0x7f05c8ae7000) [14:20] <bd339> at /gfx/layers/apz/src/AsyncPanZoomController.cpp:418 Not sure what's going on here... infinite loop in BlockingResourceBase::Release() maybe? Attached is a list of the other threads that are running.
(In reply to Botond Ballo [:botond] from comment #0) > Not sure what's going on here... infinite loop in > BlockingResourceBase::Release() maybe? Yeah - bd339 said one CPU core is maxed out during the hang. So it's not a hang, but an infinite loop, seemingly here: https://dxr.mozilla.org/mozilla-central/rev/e5a10bc7dac4ee2453d8319165c1f6578203eac7/xpcom/glue/BlockingResourceBase.cpp#364
Nathan, have you seen this sort of infinite loop in BlockingResourceBase::Release() before? Do you have any suggestions for how to further diagnose / debug one?
Flags: needinfo?(nfroyd)
Summary: Hang while running APZCBasicTester.OverScroll_Bug1152051a → Hang while running APZCBasicTester.OverScroll_Bug1152051a (infinite loop in BlockingResourceBase::Release())
I haven't seen anything like this, though Eric has done some work with the deadlock detector, maybe he's seen something like this? The warning for that case is a little worrisome: NS_WARNING("Resource acquired at calling context\n"); NS_WARNING(" [stack trace unavailable]\n"); NS_WARNING("\nis being released in non-LIFO order; why?"); I'd be curious to see a fuller stack trace, especially if there happen to be traces from other threads involved.
Flags: needinfo?(nfroyd) → needinfo?(erahm)
(In reply to Nathan Froyd [:froydnj] from comment #3) > I'd be curious to see a fuller stack trace, especially if there happen to be > traces from other threads involved. bd339, could you obtain and post stack traces for all threads? You can get them with a single command, "thread apply all bt".
Flags: needinfo?(bd339)
In theory to get stacks you have to enable them at build time: https://dxr.mozilla.org/mozilla-central/rev/e5a10bc7dac4ee2453d8319165c1f6578203eac7/xpcom/glue/BlockingResourceBase.h#20-21, but it looks like we missed adding them back to this spot in bug 1050445. I'm guessing re-entrant + async stuff is wreaking havoc. It predates me, but I'm probably the most up to date on it so I'll take a look at the Release-out-of-order logic.
Flags: needinfo?(bd339)
Through code inspection I'm not seeing how we could get into an infinite loop there. This at least re-enables stack printing in the monitor case.
Attachment #8755567 - Flags: review?(nfroyd)
Assignee: nobody → erahm
Status: NEW → ASSIGNED
Flags: needinfo?(erahm)
Comment on attachment 8755567 [details] [diff] [review] Add stack traces for blocking resource base warnings Review of attachment 8755567 [details] [diff] [review]: ----------------------------------------------------------------- r=me, but I'd like to double-check the addition of the second Print call below. ::: xpcom/glue/BlockingResourceBase.cpp @@ +418,5 @@ > NS_WARNING( > + "Re-entering ReentrantMonitor after acquiring other resources."); > + > + nsCString tmp; > + Print(tmp); I don't think this one is necessary, as CheckAcquire() below should Print() this via the PrintCycle call?
Attachment #8755567 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #8) > Comment on attachment 8755567 [details] [diff] [review] > Add stack traces for blocking resource base warnings > > Review of attachment 8755567 [details] [diff] [review]: > ----------------------------------------------------------------- > > r=me, but I'd like to double-check the addition of the second Print call > below. > > ::: xpcom/glue/BlockingResourceBase.cpp > @@ +418,5 @@ > > NS_WARNING( > > + "Re-entering ReentrantMonitor after acquiring other resources."); > > + > > + nsCString tmp; > > + Print(tmp); > > I don't think this one is necessary, as CheckAcquire() below should Print() > this via the PrintCycle call? Yeah I'll remove the Print call.
Attachment #8755567 - Attachment is obsolete: true
Attachment #8755606 - Flags: review+
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: