Constant main-thread hangs in parent process when top-level modal windows opened since bug 1368286 landed

VERIFIED FIXED in Firefox 55

Status

()

Core
XPCOM
P1
normal
VERIFIED FIXED
5 months ago
5 months ago

People

(Reporter: mconley, Unassigned)

Tracking

unspecified
mozilla55
Points:
---

Firefox Tracking Flags

(firefox53 unaffected, firefox54 unaffected, firefox55 fixed)

Details

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:55.0) Gecko/20100101 Firefox/55.0

Build ID: 20170531030204

STR (at least for me):

1) Log in to Treeherder
2) Go to any build on Mozilla central
3) Click on the little dropdown arrow to the right of the build symbols, and choose "BuildAPI"

ER:

A new tab with a modal dialog should open, and the browser should still be usable.

AR:

The modal dialog opens in the new tab, but the dialog will not accept user input.

I bisected this, and got to bug 1368286:

14:19.87 INFO: No more inbound revisions, bisection finished.
14:19.87 INFO: Last good revision: ad742747a8d096ec7dad94bafb81567f40c6e80d
14:19.87 INFO: First bad revision: 0b814165b471a980ad792d2495f9589f5aab6c84
14:19.87 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=ad742747a8d096ec7dad94bafb81567f40c6e80d&tochange=0b814165b471a980ad792d2495f9589f5aab6c84

14:21.20 INFO: Looks like the following bug has the  changes which introduced the regression:
https://bugzilla.mozilla.org/show_bug.cgi?id=1368286


This bug is absolutely killing my productivity - I'm having to force-kill Firefox every time the master password dialog or a HTTP authentication dialog comes up. :(
ni? to Ehsan for ideas on why his patch caused regressions.
Flags: needinfo?(ehsan)
Priority: -- → P1

Comment 2

5 months ago
Seems maybe related to mac specific spin loop in bug 1369093?  I do see HasPendingEvents() in the profile over there.
See Also: → bug 1369093
See Also: → bug 1368603

Comment 3

5 months ago
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #2)
> Seems maybe related to mac specific spin loop in bug 1369093?  I do see
> HasPendingEvents() in the profile over there.

Do you mean https://perfht.ml/2qCwSBd?  This doesn't look immediately related to me.  It's hard to tell as that bug has no description besides the profile.

Comment 4

5 months ago
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #3)
> (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #2)
> > Seems maybe related to mac specific spin loop in bug 1369093?  I do see
> > HasPendingEvents() in the profile over there.
> 
> Do you mean https://perfht.ml/2qCwSBd?  This doesn't look immediately
> related to me.  It's hard to tell as that bug has no description besides the
> profile.

Actually I think that profile is related but the information is hidden.  I think what's happening is we get to here with something in our idle queue <https://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/widget/nsBaseAppShell.cpp#104> and NS_HasPendingEvents() returns true, and on Cocoa this takes us to <https://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/widget/cocoa/nsAppShell.mm#503>.  I don't really understand this code and can't test any of this because I don't have a Mac right now.

We also have a favor performance mode which interacts with the priority of handling native events, but I don't really understand it all that well.

I should also say that I couldn't really find the evidence behind bug 1368286 comment 5 in the code (IOW the reason for why the dummy event is still needed.  But this code is super complex and I don't claim to understand it very well.  Removing that event certainly didn't break anything on the try server, but neither did the patch which landed for that bug.  :/

I'm not sure how to proceed here.  We can back out the patch that landed but AFAICT that would mean that we can indefinitely get blocked waiting to receive a native message when we have idle runnables to run which is bad also.  It would help if someone who has a Mac can help debug this.

Olli, what do you think?
Flags: needinfo?(ehsan) → needinfo?(bugs)
Definitely back out now and investigate later.
Flags: needinfo?(bugs)

Comment 6

5 months ago
OK, bug I'm actually interested in what you think with regards to how to deal with the problem. :-)
Flags: needinfo?(bugs)

Comment 7

5 months ago
You could also just disable the change on mac os for now.

Comment 8

5 months ago
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #7)
> You could also just disable the change on mac os for now.

I backed out bug 1368286 for now.  We should figure out a proper fix for 55 because otherwise idle tasks will potentially get starved...  Let's see what Olli suggests, and I'll see if I can catch this in the debugger tomorrow.
Confirmed that this is fixed in today's Nightly with bug 1368286 backed out.
Status: NEW → RESOLVED
Last Resolved: 5 months ago
status-firefox53: --- → unaffected
status-firefox54: --- → unaffected
status-firefox55: --- → fixed
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla55
(Thanks!)

Updated

5 months ago
Flags: needinfo?(bugs)

Updated

5 months ago
Duplicate of this bug: 1369093
You need to log in before you can comment on or make changes to this bug.