Closed Bug 1422855 Opened 7 years ago Closed 3 years ago

MacOS - Firefox sometimes freezes until the process is restarted when switching users.

Categories

(Core :: Graphics, defect, P1)

All
macOS
defect

Tracking

()

VERIFIED FIXED
87 Branch
Tracking Status
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- wontfix
firefox86 --- verified
firefox87 --- verified

People

(Reporter: jon.hansen.home, Assigned: mstange)

References

Details

(Keywords: hang, regression, Whiteboard: [mac:hang])

Attachments

(7 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/57.0
Build ID: 20171128222554

Steps to reproduce:

Assumption:
- You are running the newest software release of MacOS (Currently High Sierra Version 10.13.1)
- You have two user accounts created with firefox accessible on each.

Steps to Reproduce:
1. Open firefox on User 1
2. Using the account toggle feature in macOS (typically your user's name is shown in the top right corner of the menu bar at the top of your screen), switch to User 2
3. Open firefox on User 2
4. Switch back to User 1 and attempt to use firefox


Actual results:

User 1 firefox is completely frozen. If you try to click anything on the gui, nothing happens. If you then unfocus the firefox window, you will see firefox show an update relating to the action you attempted to do (for example, click back button -> nothing happens -> unfocus -> [Firefox will update] -> refocus and attempt another action -> Rinse/Repeat).

This occurs with a moderate to high frequency (50%+) (NOTE: NOT 100% reproducible) 

Additional Details:
I don't log my users out, I simply switch accounts. I work on my profile, and my wife works on hers. Often, one of our firefox clients freezes and we have to close it completely and reopen it to clear the problem. I have not personally noticed any specific action or commonality being required to trigger the problem, I just know I have to close firefox and reopen it pretty much every day.


Expected results:

I believe that should be obvious in this case.

This is quite frustrating because I have three monitors and I have different browser windows on all three monitors for various reasons. When this happens I have to reconfigure all three monitors and it wastes a lot of time.
Summary: macOS High Sierra v10.13.1 -- Firefox 57.0.1 (64-bit) → macOS High Sierra v10.13.1 -- Firefox 57.0.1 (64-bit) - Firefox Freezes until the process is restarted.
Severity: normal → critical
Keywords: hang
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0
Firefox: 59.0a1, Build ID 20171206221407

I have managed to reproduce this issue on Mac 10.12 and Mac 10.13, with Firefox release (57.0.1) and the latest Nightly (59.0a1) build, using the steps provided in the description. However I have not managed to reproduce the issue on Firefox (55.0.2).
Considering this, I tried to find a regression window using the Mozregression tool, but it failed to download the last few builds, so I only managed to get the following pushlog:

Last good revision: e916ab827babb677ce5ab2cac0390c1401eaca0e
First bad revision: edb7e1ddd9b61e2af2a75cfe5baa0f92a54a2716
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e916ab827babb677ce5ab2cac0390c1401eaca0e&tochange=edb7e1ddd9b61e2af2a75cfe5baa0f92a54a2716

Because Firefox hangs, I used the Gecko Profiler add-on and got the following report: https://goo.gl/LS9sbe

Mike, can you please look over the pushlog and see if anything stands out?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(mconley)
OS: Unspecified → Mac OS X
Hardware: Unspecified → All
Nothing stands out, I'm afraid. I think we need to narrow the regression range further.

cmarius, when you said the download failed during your mozregression session, can you be more specific? What error did it give you?
Flags: needinfo?(mconley) → needinfo?(marius.coman)
@Mike, I've rerun the regression on two user accounts simultaneously to make sure that it was the same Firefox build on both accounts. Here are the console outputs: https://goo.gl/LpFWGn. 

When I've tried to find the regression range using the changesets the following error was displayed:
"INFO: Getting mozilla-inbound builds between c01aa84ded7e and 85c7a1f9a5b4
ERROR: The url 'https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?fromchange=c01aa84ded7e&tochange=85c7a1f9a5b4' contains no pushlog. Maybe use another range ?"

Also when I've tried to use the "autoland" repo the following error was displayed: 
"INFO: Getting autoland builds between c01aa84ded7e and 85c7a1f9a5b4
ERROR: No such branch 'autoland'"
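
For reference, the pushlog URLs quoted in this thread all follow the same shape (repository path, view name, then `fromchange` and `tochange`). The following is only a minimal sketch of how such a URL can be assembled by hand to double-check a range; the helper name `pushlog_url` is hypothetical and is not part of mozregression.

```python
from urllib.parse import urlencode

# Base host taken from the URLs quoted in this bug.
HG_BASE = "https://hg.mozilla.org"


def pushlog_url(repo_path: str, fromchange: str, tochange: str,
                view: str = "pushloghtml") -> str:
    """Build a pushlog URL for a repository path and changeset range.

    `pushlog_url` is a hypothetical helper for illustration only.
    `view` is "pushloghtml" for the browsable page or "json-pushes"
    for the JSON variant that appears in the mozregression error above.
    """
    query = urlencode({"fromchange": fromchange, "tochange": tochange})
    return f"{HG_BASE}/{repo_path}/{view}?{query}"


# The mozilla-central range from the first bisection attempt.
central = pushlog_url(
    "mozilla-central",
    "e916ab827babb677ce5ab2cac0390c1401eaca0e",
    "edb7e1ddd9b61e2af2a75cfe5baa0f92a54a2716",
)

# The mozilla-inbound range that mozregression reported as empty.
inbound = pushlog_url(
    "integration/mozilla-inbound", "c01aa84ded7e", "85c7a1f9a5b4",
    view="json-pushes",
)
```

Opening the resulting URL in a browser shows whether the range contains any pushes at all, which is what the "contains no pushlog" error is complaining about.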
Flags: needinfo?(marius.coman)
Thanks cmarius.

Do you know what cmarius is running into, wlach? Are there some extra arguments he needs to supply, or do we just not have those builds available to test?
Flags: needinfo?(wlachance)
I suspect that the command line arguments to mozregression aren't quite correct. How are you launching it :cmarius?
Flags: needinfo?(wlachance) → needinfo?(marius.coman)
We have Mozregression installed in a virtual environment. To access it I used the following command: "source /users/<user_name>/mozregression/bin/activate". After that I used the "mozregression --good (date) --bad (date)" command.
Also, when I tried to use the autoland repo, I used the "mozregression --good (changeset) --bad (changeset) --repo autoland" command, which worked in the past.
Flags: needinfo?(marius.coman)
pong-ing back to wlach for comment 6.
Flags: needinfo?(wlachance)
Those arguments look correct to me. I'm not sure why it's not recognizing autoland though, it seems to be working for me.

Looking more closely at the logs, it doesn't look like you were able to go past mozilla-central in your initial bisection?

I see that you ended up with this range:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=92dc60b522d81862e52bff5cdb1b698eb5608658&tochange=c01aa84ded7eb0b3e691f8bcc5cd887c960a779e

It doesn't seem like there's anything there that could be responsible for this bug (no integration branch merges either). I think the regression range is wrong...
Flags: needinfo?(wlachance)
Let me know if you need any more information from me. I am monitoring.
Additional information:

I managed to unfreeze my firefox. I dunno if I just needed to wait or what the deal is, but what I did to unfreeze it is:

1. Opened a new tab (in one of the all frozen windows)
2. I then toggled back and forth between a hangouts message window that was popped out (Using the firefox addon Messenger for Google™ Hangouts) (Popped out: arrow button on top right corner of a hangouts message)

Result: my firefox began functioning normally after I had toggled back and forth between a window and one of these hangouts popouts a few times.
I've tried again to reproduce the issue and now it is more intermittent than before (~1 in 20-30 attempts). On further investigation I've also found that Bug 1425127 describes the same behavior.
Considering this I will assign a component to this issue in order to get the engineering team involved.
Component: Untriaged → Widget: Cocoa
Product: Firefox → Core
See Also: → 1425127
Still happening to me, and more frequently than what Marius is reporting. I would guess at least once every other day if not more often.
Priority: -- → P2
Does this continue to be an issue with the latest version of macOS?
Flags: needinfo?(jon.hansen.home)
(In reply to Stephen A Pohl [:spohl] from comment #14)
> Does this continue to be an issue with the latest version of macOS?

As of macOS High Sierra Version 10.13.4 this is still an issue.
Flags: needinfo?(jon.hansen.home)
Since you can reproduce more often, can you try to run mozregression?
Flags: needinfo?(jon.hansen.home)
(In reply to Marco Castelluccio [:marco] from comment #16)
> Since you can reproduce more often, can you try to run mozregression?

I looked at the mozregression video but it seems to indicate there should be a good version. So far I have not seen a good version. I can run mozregression, but I will need to be provided a how to as I am not familiar with the tool.
Flags: needinfo?(jon.hansen.home)
(In reply to Jon from comment #17)
> (In reply to Marco Castelluccio [:marco] from comment #16)
> > Since you can reproduce more often, can you try to run mozregression?
> 
> I looked at the mozregression video but it seems to indicate there should be
> a good version. So far I have not seen a good version. I can run
> mozregression, but I will need to be provided a how to as I am not familiar
> with the tool.

Try a very old date as a "good" version. If you still see the issue (so it is not "good"), try an even older date.
Sorry guys, I am not sure I am going to be much help. It seems very complicated and I am very busy. If you give me a step by step procedure on Mac High Sierra I will give it a try but otherwise I am not going to be able to help. I attempted to install the stuff, but I get stuck on the step to: 
sudo pip2 install -U mozregression


Output:
  Downloading https://files.pythonhosted.org/packages/dd/96/b05c6d357f8d6932bea2b360537360517d1154b82cc71b8eccb70b28bdde/slugid-1.0.7.tar.gz
Requirement not upgraded as not directly required: cffi>=1.7; platform_python_implementation != "PyPy" in /Library/Python/2.7/site-packages (from cryptography>=1.3.4; extra == "security"->requests[security]==2.15.1->mozregression) (1.11.5)
Requirement not upgraded as not directly required: enum34; python_version < "3" in /Library/Python/2.7/site-packages (from cryptography>=1.3.4; extra == "security"->requests[security]==2.15.1->mozregression) (1.1.6)
Requirement not upgraded as not directly required: asn1crypto>=0.21.0 in /Library/Python/2.7/site-packages (from cryptography>=1.3.4; extra == "security"->requests[security]==2.15.1->mozregression) (0.24.0)
Requirement not upgraded as not directly required: ipaddress; python_version < "3" in /Library/Python/2.7/site-packages (from cryptography>=1.3.4; extra == "security"->requests[security]==2.15.1->mozregression) (1.0.22)
Requirement not upgraded as not directly required: pycparser in /Library/Python/2.7/site-packages (from cffi>=1.7; platform_python_implementation != "PyPy"->cryptography>=1.3.4; extra == "security"->requests[security]==2.15.1->mozregression) (2.18)
matplotlib 1.3.1 requires nose, which is not installed.
matplotlib 1.3.1 requires tornado, which is not installed.
awscli 1.10.44 has requirement colorama<=0.3.3,>=0.2.5, but you'll have colorama 0.3.7 which is incompatible.
Installing collected packages: pyOpenSSL, requests, mozinstall, mozversion, mohawk, slugid, taskcluster, configobj, mozregression
  Found existing installation: pyOpenSSL 0.13.1
Cannot uninstall 'pyOpenSSL'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
FYI this is still happening as of v62
63.0.3 still happening
Have you ever been able to install and run mozregression? If not, could you try running the following commands and see if that works?

sudo easy_install pip
sudo pip2 install -U mozregression --ignore-installed
mozregression
Flags: needinfo?(jon.hansen.home)
That worked, it's installed. Can you give me the procedure for getting you the data you need now?
Flags: needinfo?(jon.hansen.home)
Once you type `mozregression` in Terminal, a number of Firefox versions will open in succession to narrow down when this started occurring. Simply type "good" or "bad" in Terminal based on whether or not a build reproduces the bug. Once there are no more builds available, it should give you a range when this issue first started occurring. All we would need is that last bit of information in Terminal that shows the date and build range for the regression, if this does indeed turn out to be a regression.
Flags: needinfo?(jon.hansen.home)
I ran mozregression. The 2009 app was a blast from the past, but crashed straight after fast user switching back. I marked it as bad, but it got grumpy with me:
	**********
	You should use a config file. Please use the --write-config command line flag to help you create one.
	**********

	 0:00.67 INFO: No 'bad' option specified, using 2018-12-05
	 0:00.67 INFO: No 'good' option specified, using 2009-01-01
	 0:03.22 WARNING: Skipping build 2018-12-05: Unable to find build info for 2018-12-05
	 0:08.50 INFO: Testing good and bad builds to ensure that they are really good and bad...
	 0:08.50 INFO: Downloading build from: https://archive.mozilla.org/pub/firefox/nightly/2009/01/2009-01-01-02-mozilla-central/firefox-3.2a1pre.en-US.mac.dmg
	===== Downloaded 100% =====
	 0:16.35 INFO: Running mozilla-central build for 2009-01-01
	 0:23.48 INFO: Launching /private/var/folders/13/21bxw3ps27bfpp14txmbpcnr0000gq/T/tmpMpleBw/Minefield.app/Contents/MacOS/firefox-bin
	 0:23.48 INFO: Application command: /private/var/folders/13/21bxw3ps27bfpp14txmbpcnr0000gq/T/tmpMpleBw/Minefield.app/Contents/MacOS/firefox-bin -foreground -profile /var/folders/13/21bxw3ps27bfpp14txmbpcnr0000gq/T/tmpOubfhI.mozrunner
	 0:23.49 INFO: application_buildid: 20090101020547
	 0:23.49 INFO: application_changeset: e807ec425ad7
	 0:23.49 INFO: application_copyright: Copyright (c) 1998 - 2009 mozilla.org
	 0:23.49 INFO: application_name: Firefox
	 0:23.49 INFO: application_repository: http://hg.mozilla.org/mozilla-central
	 0:23.49 INFO: application_version: 3.2a1pre
	Was this nightly build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry' or 'exit' and press Enter): bad
	 1:14.06 ERROR: Build was expected to be good! The initial good/bad range seems incorrect.

I then ran it and tried skip, but it crashed without even user-switching.
2009 may be too old. You can try with a known good date of let's say 2014 like so:

$ mozregression --good 2014-12-25

More documentation about mozregression can be found here: https://mozilla.github.io/mozregression/quickstart.html
Flags: needinfo?(mozilla05-12-18)
I was able to track the problem down to 2016-07-12 (aac8ff1024c553d9c92b85b8b6ba90f65de2ed08).

I couldn't get versions from about 2014-06-01 till 2016-07-12 to run: Firefox process takes 100% CPU but the application window isn't displayed.

To recap:

2016-07-12 - present: confirm the bug exists
~2014-06-01 - 2016-07-12: can't verify, the app doesn't start
~2014-06-01 and earlier: can't reproduce the bug
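
The recap above is what a bisection ends up with when a whole stretch of builds cannot be judged either way. As a rough illustration only (this is not mozregression's actual algorithm), here is a sketch of a bisection that tolerates "skip" verdicts. The names `bisect_with_skips`, `EXAMPLE_BUILDS` and `example_probe` are hypothetical, the build dates are made-up sample points, and treating 2014-06-01 as good and 2016-07-12 as bad is an assumption based on the approximate boundaries above.

```python
from typing import Callable, Sequence, Tuple


def bisect_with_skips(builds: Sequence[str],
                      probe: Callable[[str], str]) -> Tuple[str, str]:
    """Narrow the (last good, first bad) pair over chronologically ordered builds.

    Assumes builds[0] is good and builds[-1] is bad. probe() returns
    "good", "bad", or "skip" (the build could not be tested at all).
    """
    lo, hi = 0, len(builds) - 1
    skipped = set()
    while True:
        # Builds strictly inside the current range that are still untested.
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            return builds[lo], builds[hi]
        mid = (lo + hi) // 2
        # Test the remaining build closest to the midpoint.
        idx = min(candidates, key=lambda i: abs(i - mid))
        verdict = probe(builds[idx])
        if verdict == "good":
            lo = idx
        elif verdict == "bad":
            hi = idx
        elif verdict == "skip":
            skipped.add(idx)
        else:
            raise ValueError(f"unexpected verdict: {verdict!r}")


# Hypothetical sample of build dates spanning the ranges in the recap.
EXAMPLE_BUILDS = [
    "2014-01-01", "2014-06-01", "2015-01-01", "2015-07-01",
    "2016-01-01", "2016-07-12", "2017-01-01", "2018-01-01",
]


def example_probe(build_date: str) -> str:
    """Stand-in for a manual test, mirroring the three ranges in the recap."""
    if build_date <= "2014-06-01":
        return "good"   # bug could not be reproduced
    if build_date >= "2016-07-12":
        return "bad"    # bug confirmed
    return "skip"       # build does not start, so no verdict


last_good, first_bad = bisect_with_skips(EXAMPLE_BUILDS, example_probe)
```

With every build in the middle skipped, the range cannot shrink below the two outer boundaries, which is why the window stays at roughly two years.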

I don't want this to be misleading, but I noticed that version 35.0 released on 2015-01-13 introduced "tiled rendering on OS X". Could that be the cause? The problem looks like a rendering issue: Firefox keeps running normally after switching accounts, it just can't re-draw its window unless it goes out of focus.
An addition with regards to the test case: you do not have to launch Firefox on the other account to run into this bug. Using Safari on the other account also triggers this bug. However, for some reason, I did not encounter it before upgrading to Mojave. Maybe there was a change in Safari with Mojave which now affects Firefox, or maybe this is just incidental.
Having same issue on MacOS 10.14.2 and Firefox 64.0

Same issue here since Firefox 57, currently running mac OS 10.14.2

Firefox 65.01 and MacOS 10.14.2 (18C54) - Still happening. Annoying that this hasn't been fixed in so long.

Same problem here. This seems to depend on how long I am using the other user. E.g. if I have Firefox open as user A, switch to user B for only 10 seconds and back to A, usually nothing happens. If I do this for 2-5 minutes, I can revive the first Firefox of user A by right-clicking on the icon in the task bar several times. For longer periods, Firefox A is hopelessly lost, but maybe could be revived if I just waited long enough. It seems to me that there is some kind of queue, which is filled up regularly, but can only be processed when the user of this Firefox is in the foreground.

And, additional info, this can be prevented if I minimize firefox before switching to user B. Firefox 65.01 on Mojave.

I can confirm with FF 65.0.1 (64-bit) \ macOS 10.14.4 Beta (18E194d).

I can also confirm that minimizing Firefox before locking my screen appears to work around the hang.

I can confirm this bug as well with FF 65.01/macOS 10.14.3
I use also the old FF ESR 52.9.0 because I need Google Hangout Plugin. Both FFs are affected. Interesting thing is that the old FF ESR 52.9.0 is the same version which I have used before the upgrade from macOS El Capitan and on the previous OS switching account didn't affect FF at all.

I can also confirm that the bug occurrence somehow depends on how long I use the other OS account. The longer I spend as user 2, the bigger the chance that FF on user 1 will be frozen after switching accounts and back to user 1.

I cannot bring FF back to work by minimizing windows, opening/closing new windows, or creating new tabs - the only way is to stop/kill FF and run it again.

Since this appears to be a regression, it would be great to get a regression range using mozregression[1] from someone who can reproduce reliably. To install and run mozregression, simply type the following three commands in a Terminal window:

sudo easy_install pip
sudo pip2 install -U mozregression --ignore-installed
mozregression

A number of Firefox versions will open in succession to narrow down when this started occurring. Simply type "good" or "bad" in Terminal based on whether or not a build reproduces the bug.

[1] https://mozilla.github.io/mozregression/

I tested it with mozregression already, see the results in my comment above.

(In reply to Andrey Fedoseev from comment #41)
> I tested it with mozregression already, see the results in my comment above.

I didn't mean to dismiss those results. I was hoping to get a more precise regression range than ~2014-06-01 - 2016-07-12 if someone can run builds in that range.

I just tried mozregression, too, unfortunately with the same result: any build between 2014-06-01 and 2016-07-12 takes 99-100% CPU and does not open any window. Since I used Firefox during that time on my very same MacBook, I think that these versions might have a problem with Mojave. Unfortunately I don't have any older boot images to test with Sierra or so. Are you interested in stack dumps of these non-working versions or similar? Maybe it's easy to fix?

Hi, the versions in question hang here:
(lldb) thread b all

  • thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    • #0: 0x00007fff72c5baee in __mprotect + 10
      #1: 0x00007fff72cd1abe in malloc_zone_register_while_locked + 372
      #2: 0x00007fff72cdbfec in malloc_zone_register + 58
      #3: 0x0000000100010668 in register_zone + 360
      #4: 0x0000000105a8bcc8 in ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 518
      #5: 0x0000000105a8bec6 in ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
      #6: 0x0000000105a870da in ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 358
      #7: 0x0000000105a8706d in ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 249
      #8: 0x0000000105a86254 in ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 134
      #9: 0x0000000105a862e8 in ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 74
      #10: 0x0000000105a75774 in dyld::initializeMainExecutable() + 199
      #11: 0x0000000105a7a78f in dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 6237
      #12: 0x0000000105a744f6 in dyldbootstrap::start(macho_header const*, int, char const**, long, macho_header const*, unsigned long*) + 1154
      #13: 0x0000000105a74036 in _dyld_start + 54

Does that ring any bell?

Another thing: I could reproduce the same bug with Thunderbird, so it's probably in some shared code. Maybe I'll have more luck running mozregression for Thunderbird?

same with --app=thunderbird :-(

I found an old bootable disk with MacOS Mavericks on it and here I can start mozregression versions between 2014-06-01 to 2016-07-12. But unfortunately there the problem does not exist here at all, even the newest Mozilla works fine.
I had another image with Sierra, and here it is the same as with Mojave.
I tried on another machine, which has El Capitan on it, and it seems there all versions work fine, too, although I have only VNC access there.
So it looks like the problem somehow came with Sierra support. By 2016-07-12 the development versions were already out, so probably someone fixed something for Sierra then so that Firefox worked at all. Unfortunately this means that we cannot find the bug with mozregression, unless someone backports that fix to older versions and recompiles. I don't know if that's easily possible.

I have noticed this happening reliably, more than half of the time, on macOS Mojave 10.14.4 with Firefox Release 66.0.3. Can I help to move this bug forward? Are there any logs that would be helpful?

(In reply to subscribe from comment #37)
> And, additional info, this can be prevented if I minimize firefox before switching to user B. Firefox 65.01 on Mojave.

This does not prevent the bug, but it definitely does reduce the occurrence of it.

-- Adding my additional observations since I still deal with this on a regular basis, though less so now that I know minimizing firefox reduces the occurrence of this issue greatly

  • Length of time spent on the opposite user directly correlates with the likelihood of the bug occurring.
    • User A is the active user
    • User B is the inactive user - this user's firefox will bug out eventually for the longer period of time User A is active.
  • Minimizing firefox on the inactive user results in requiring the active user to be logged in longer prior to the bug occurring.
    • User A is active user
    • User B is the inactive user with firefox minimized - User A must be logged in significantly longer prior to bug occurrence.
Flags: needinfo?(jon.hansen.home)

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

The situation seems to have improved under Mac OS X Catalina (10.15). Are you still able to reproduce the bug with the latest Mac OS X version ?

(In reply to Matthieu Beaumel from comment #51)
> The situation seems to have improved under Mac OS X Catalina (10.15). Are you still able to reproduce the bug with the latest Mac OS X version ?

I'm still on 10.14, but also haven't had it in a while now.

I'm on MacOS 10.14.6, Firefox 70.0.1 and don't appear to be having the issue anymore.

It seems to be still present but very rare now. I've only seen it happen once in the last month.

reproducing 100%
Firefox 71.0 stable channel, MacOS 10.13.6.
using firefox in separate work & personal accounts - very annoying bug.

Extremely reliable when switching between users with Firefox 71 and OS X 10.14.6. Only one of the users is running Firefox.

Still an issue on 10.15.3 Catalina and FF 73.

Can run some tests to help.

Summary: macOS High Sierra v10.13.1 -- Firefox 57.0.1 (64-bit) - Firefox Freezes until the process is restarted. → MacOS - Firefox sometimes freezes until the process is restarted when switching users.

Would it be useful for someone to collect more info from one of these hung processes using activity monitor's "sample process" or a similar tool? What other steps could we try to get this bug into a more actionable state?

Flags: needinfo?(spohl.mozilla.bugs)

It would be interesting to see if the Firefox Profiler can shed any light on what's going on here:

https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler#Profiling_a_hung_process

Flags: needinfo?(spohl.mozilla.bugs)

On OSX, you can also get a stack trace with the Activity Monitor. Click on the process of interest, then click on the gear and choose "Sample Process".

Attached file about:support contents
New to this thread, been having this problem since version 66. I have two local users, only one of which uses Firefox. In terms of reproducing the problem, it seems more prone to happen when I am switched to the non-Firefox user profile for a significant period of time (several hours). When I switch back to my macOS user with Firefox, Firefox will be extremely sluggish to all input - even checking for updates in the "About" window. Like everyone else, only quitting Firefox and re-launching brings it back to life. Neither safe mode nor 'cleaning' Firefox changes the reported behavior.

I haven't created a stack trace, but will do so next time it happens. For the sake of provided information, here are the contents of my "about:support" page:

See Also: → 1567018

I was able to reproduce this, but haven't uncovered anything so far. I didn't see any suspicious stacks in Activity Monitor and the parent process CPU utilization was low. One content process was running with spiking CPU load, but I didn't see anything in Activity Monitor. The build was a Release build which wasn't debuggable (due to Notarization/hardened runtime). I'll try to reproduce this with a developer Nightly build.

I have found that the hang can be recovered from by using the new window keyboard shortcut. A new window will then open and the original window(s) will become responsive again.

I attached an Activity Monitor sample from while in the hung state and after the "fix". Both samples are in the same file.

I had another occurrence of the problem today. Tried the 'new window' trick from #65, but it didn't work. I sampled the Firefox process during this hang period. Didn't quit Firefox, but switched back to my other user to do more work there. After several hours, I switched back to my Firefox user and Firefox was not hung. I've sampled the process then, too, for comparison.

Thank you for Activity Monitor logs. I'm going through those, but don't see anything concrete yet.

One observation: this might be a "red herring", but in the samples provided and in my own logs, in the hang case, there is always a single CVDisplayLink thread where all the samples are from the following stack. For the non-hang case, there is a CVDisplayLink thread with at least one sample doing something else.

    1224 Thread_1102533: CVDisplayLink
    + 1224 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fffde11f08d]
    +   1224 _pthread_start  (in libsystem_pthread.dylib) + 286  [0x7fffde11f887]
    +     1224 _pthread_body  (in libsystem_pthread.dylib) + 180  [0x7fffde11f93b]
    +       1224 CVDisplayLink::runIOThread()  (in CoreVideo) + 520  [0x7fffc9dbf762]
    +         1224 CVDisplayLink::waitUntil(unsigned long long)  (in CoreVideo) + 233  [0x7fffc9dbf977]
    +           1224 _pthread_cond_wait  (in libsystem_pthread.dylib) + 769  [0x7fffde120833]
    +             1224 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fffde034bf2]
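
A check like this can be automated across several sample files. The sketch below is only an illustration under stated assumptions: it parses the call-graph layout exactly as it appears in the excerpt above (`+` and space indentation only), and the names `leaf_samples` and `SAMPLE_EXCERPT` are hypothetical. Full Activity Monitor output may use other indentation markers or header layouts that this does not handle.

```python
import re
from collections import Counter

# "<count> Thread_<id>: <name>" header, as in the excerpt above.
THREAD_RE = re.compile(r"^\s*(\d+)\s+Thread_(\w+)(.*)$")
# "+ <count> <symbol>  (in <library>) ..." frame line; group 1 gives the depth.
FRAME_RE = re.compile(r"^(\s*\+\s*)(\d+)\s+(.+?)\s+\(in ")

SAMPLE_EXCERPT = """\
    1224 Thread_1102533: CVDisplayLink
    + 1224 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fffde11f08d]
    +   1224 _pthread_start  (in libsystem_pthread.dylib) + 286  [0x7fffde11f887]
    +     1224 _pthread_body  (in libsystem_pthread.dylib) + 180  [0x7fffde11f93b]
    +       1224 CVDisplayLink::runIOThread()  (in CoreVideo) + 520  [0x7fffc9dbf762]
    +         1224 CVDisplayLink::waitUntil(unsigned long long)  (in CoreVideo) + 233  [0x7fffc9dbf977]
    +           1224 _pthread_cond_wait  (in libsystem_pthread.dylib) + 769  [0x7fffde120833]
    +             1224 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fffde034bf2]
"""


def leaf_samples(sample_text: str, thread_name: str) -> Counter:
    """Count samples per leaf frame for threads whose header mentions thread_name."""
    leaves = Counter()
    in_thread = False
    prev = None  # (depth, count, symbol) of the previous frame line

    for line in sample_text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            # A new thread starts: the previous frame, if any, was a leaf.
            if in_thread and prev:
                leaves[prev[2]] += prev[1]
            in_thread = thread_name in header.group(3)
            prev = None
            continue
        if not in_thread:
            continue
        frame = FRAME_RE.match(line)
        if not frame:
            continue
        depth = len(frame.group(1))
        # If this frame is not deeper than the previous one, the previous was a leaf.
        if prev and depth <= prev[0]:
            leaves[prev[2]] += prev[1]
        prev = (depth, int(frame.group(2)), frame.group(3))

    if in_thread and prev:
        leaves[prev[2]] += prev[1]
    return leaves


leaves = leaf_samples(SAMPLE_EXCERPT, "CVDisplayLink")
# True when every leaf sample of the thread is parked in the condition wait.
only_waiting = set(leaves) == {"__psynch_cvwait"}
```

Running this over a hung and a non-hung sample and comparing `only_waiting` is one way to test whether the pattern holds beyond the few logs inspected by hand.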

Additionally, there are some docs online for Fast User Switching. From https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPMultipleUsers/Concepts/FastUserSwitching.html

Processes in a switched-out login session continue running as before. They can continue processing data, communicating with the system, and drawing to the screen buffer as before. However, because they are switched out, they do not receive input from the keyboard and mouse. Similarly, if they were to check, the monitor would appear to be in sleep mode. As a result, it may benefit some applications to adjust their behavior while in a switched-out state to avoid wasting system resources.

Here's a profile I collected on Nightly. https://perfht.ml/2Xp896m

I started Nightly with a new profile. I had 1 window open with 4 JS-heavy tabs. I switched to another user and then switched back (after less than one minute). The browser was not responding to track pad scrolling. I hit Cmd-shift-1 to start the profiler which resulted in no user-visible feedback. After some number of seconds I hit Cmd-shift-2 to stop the profiler. Eventually the profiler window appeared and the browser was responsive again.

One possible clue here is the content process compositor thread recording a CONTENT_FRAME_TIME for ~15 seconds.

See comment 68 about the monitor appearing to be in sleep mode when a login session is switched out.

Edit: jrmuizel looked at the profile and commented that a long CONTENT_FRAME_TIME indicates we're taking a long time waiting for a paint to show up at the GPU.

More debugging needed.

Flags: needinfo?(u629422)
Assignee: nobody → haftandilian

I don't think I can take this any further without more graphics compositing expertise.

What I suspect is happening is that when we enter user switching, access to the GPU is shut down in some way, leaving our compositing code waiting for an update. In profile https://perfht.ml/2Xp896m the first 9 seconds are during the hung period and the graphics hang coincides with the long CONTENT_FRAME_TIME marker. Keyboard events are handled during the graphics hang, allowing me to start the profiler, create new windows, etc.

The online documentation for CVDisplayLinkStop states that in macOS 10.4 and later, the display link thread is automatically stopped if the user employs Fast User Switching. The display link is restarted when switching back to the original user.

I'm attaching a patch that registers for the fast user switching notifications in case that is useful for testing the fix. I tried calling GPUProcessManager::SimulateDeviceReset() on these notifications (both switch out and switch back), but could still reproduce the problem.

Assignee: haftandilian → nobody
Component: Widget: Cocoa → XUL

Re-triage needed.

Component: XUL → Graphics
Flags: needinfo?(jbonisteel)

I don't think this has been mentioned before, but it might be useful. I've been seeing this quite a bit recently.

On one of my accounts I have both Firefox (nightly) and Thunderbird (68esr). When I switch back and hit this, I've only ever seen it be one of them that has frozen, and I'm pretty sure it is always the one that was in the foreground. I'm not quite sure if having another application in the foreground when switching away prevents it or not, but will try and monitor.

I can confirm that this bug has been present since I updated to Mojave (now 10.14.6) from Sierra, both for Firefox (stable) and Thunderbird (still running 60.9.1). It only occurs when the application is in the foreground when switching to a different user, and can be circumvented by putting a non-affected program in the foreground before switching users.

Flags: needinfo?(jessiebonisteel)

Markus, would you be able to help here, or direct to someone who could? See comment 70.

Flags: needinfo?(mstange.moz)

Just chiming in to say that this bug happens to me on macOS 10.14.6 and Firefox 84.0, which severely impacts my use-case.

I've got it 100% reproducible now with Firefox 84 on Big Sur mac mini (ARM):

  1. Open Firefox
  2. Hibernate (Apple -> Sleep)
  3. Wake mac up
  4. Try to interact with Firefox (e.g. click different tab in header)
  5. No visual feedback
  6. Tab switch to different app
  7. Tab switch back to FF
  8. Suddenly the other tab is active and rendered
  9. Go back to step 4 and loop

I'm commenting as the submitter of Bug 1684322. I can reliably reproduce the above comment 81 on my setup (Big Sur 11.1, Macbook Pro 13" M1 2020, Firefox 84.0.1, Pro Display XDR).

Some notes:

  1. I can ONLY reproduce it when connected to my Pro Display XDR. I have tried several times to reproduce it with just my laptop screen and have been unable to do so.
  2. I'm not sure of the mechanics of Apple's sleep, but the bug only reliably happens when the laptop has been asleep for a while. If I put it to sleep (Apple -> Sleep) and then wake it up within a minute or two, everything works fine. But if it's been asleep for longer than that, this bug is reliably reproduced.
  3. I do not have additional users on this laptop, so I'm not sure if this is a different manifestation of the same bug, or a different bug altogether.
  4. Even though step #8 ("Suddenly the other tab is active and rendered") happens, the tab is non-responsive (no scrolling, text input, etc). Firefox must still be restarted.

Markus, can you try the steps to reproduce?

This is 100% reproducible with comment 81 instructions on mac mini Big Sur 11.1 (M1) using latest nightly 86.0a1 (2021-01-13) (64-bit). Fresh install, fresh profile.

I have the same issue as comment 81 on my Mac mini (M1, 2020).
The issue occurs when Rosetta is turned off.
When I turn Rosetta on, the issue does not occur.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(mstange.moz)
See Also: → 1688037

I haven't been able to reproduce this yet, but I'll try some more on Monday.

See Also: → 1688149
See Also: 1688149 → 1415923

I have been able to reproduce the unresponsiveness after sleep, which I will investigate in bug 1682713.
I have not yet been able to reproduce the unresponsiveness after user switching.

Flags: needinfo?(mstange.moz)
Depends on: 1682713

I have been able to reproduce this bug, and I have found a fix. I will make test builds soon.

Assignee: nobody → mstange.moz
Status: NEW → ASSIGNED
Priority: P2 → P1
Whiteboard: [mac:hang]
Version: 57 Branch → Trunk

Is that the wake-from-sleep bug or the user-switching bug?

Wake from sleep is bug 1682713, but might be fixed by the same fix.

(In reply to Markus Stange [:mstange] from comment #95)

Wake from sleep is bug 1682713, but might be fixed by the same fix.

Nice, I hope for a fix very soon ;) Thanks.

This fixes a problem where the callback just wouldn't be called for the duration
of about a minute after fast user switching. I'm hoping it'll also help with a
similar problem after screen lock and after sleep (bug 1682713).

The documentation for CVDisplayLinkStop says the following:

In macOS 10.4 and later, the display link thread is automatically stopped if
the user employs Fast User Switching. The display link is restarted when
switching back to the original user.

This probably works for display links that were created before fast user
switching. However, we sometimes create a new CVDisplayLink while our user is
"in the background", and this new display link happily keeps running while we're
in the background. Then, when switching back to the original user, that's when
the display link is stopped. And then it eventually starts again. I'm not sure
what causes it to re-start.
Creating a CVDisplayLink while the machine is fast user switched to a different
user is probably not a well-exercised codepath. Things might work more reliably
if we keep reusing the same CVDisplayLink instance and just stop and start it
as needed. But that's a more risky change that I don't want to uplift.
Also, starting to listen for vsync while a different user is the "current" user
is probably a mistake anyway. We should find out if there's a way to suspend
compositing and drawing in that state. However, it seems that the window doesn't
enter the occluded state during this time, because in that case we would
de-activate the content docshell and not run requestAnimationFrame callbacks.
But the profiler clearly shows rAF running during the switched-away time.

I'm seeing the following display reconfiguration callbacks during fast user
switching, 2077750265 being the display link's current display:

[1] DisplayReconfiguration for 2077750265: BeginConfiguration
[2] DisplayReconfiguration for 1104977158: BeginConfiguration
[3] DisplayReconfiguration for 2077750265: Remove Disabled
[4] DisplayReconfiguration for 1104977158: Moved SetMain SetMode Add Enabled

[5] DisplayReconfiguration for 1104977158: BeginConfiguration
[6] DisplayReconfiguration for 2077750265: BeginConfiguration
[7] DisplayReconfiguration for 1104977158: Remove Disabled DesktopShapeChanged
[8] DisplayReconfiguration for 2077750265: Moved SetMain SetMode Add Enabled DesktopShapeChanged

With this patch, we restart the display link at notification 4 and 8.

In the future, we should switch to per-monitor (or per-window) vsync rather than
global vsync, and clean this whole situation up a little.
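As a rough illustration of which of the notifications above would trigger a restart, here is a minimal Python sketch that parses the log lines and picks out the "after reconfiguration" events. The filtering rule (skip BeginConfiguration, restart when a display is reported as added) is only an assumption chosen because it matches notifications 4 and 8 in this log; it is not taken from the patch itself, and the helper names are hypothetical.

```python
import re

# Log lines copied from the display reconfiguration output above.
LOG = """\
[1] DisplayReconfiguration for 2077750265: BeginConfiguration
[2] DisplayReconfiguration for 1104977158: BeginConfiguration
[3] DisplayReconfiguration for 2077750265: Remove Disabled
[4] DisplayReconfiguration for 1104977158: Moved SetMain SetMode Add Enabled
[5] DisplayReconfiguration for 1104977158: BeginConfiguration
[6] DisplayReconfiguration for 2077750265: BeginConfiguration
[7] DisplayReconfiguration for 1104977158: Remove Disabled DesktopShapeChanged
[8] DisplayReconfiguration for 2077750265: Moved SetMain SetMode Add Enabled DesktopShapeChanged
"""

LINE_RE = re.compile(r"\[(\d+)\] DisplayReconfiguration for (\d+): (.+)")


def parse(log):
    """Return a list of (index, display_id, set_of_flag_names) tuples."""
    events = []
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            index = int(match.group(1))
            display_id = int(match.group(2))
            flags = set(match.group(3).split())
            events.append((index, display_id, flags))
    return events


def should_restart(flags):
    """Hypothetical rule: ignore the 'before' phase (BeginConfiguration)
    and restart the display link once a display is reported as added."""
    return "BeginConfiguration" not in flags and "Add" in flags


def restart_points(log):
    """Indices of the notifications at which a restart would happen."""
    return [index for index, _display, flags in parse(log) if should_restart(flags)]


if __name__ == "__main__":
    print(restart_points(LOG))
```

Under that assumed rule, the sketch selects notifications 4 and 8, i.e. the point in each user switch where the newly active display has been added and enabled.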

Hi, we can't open it: "it seems to be damaged"...

Oh, shoot. That's probably because Apple Silicon has stricter requirements for signed apps. Let me ask around.

of course it's sure ;)

(In reply to tonerre26 from comment #99)

Hi, we can't open it: "it seems to be damaged"...

Here's a workaround you can use. Apple Silicon devices are more strict in that they require all executables to be signed. In the developer target.dmg, some files are not signed and this appears to be tripping it up. You can self-sign the build to get around this (codesign -s -). These steps worked for me. After copying Firefox Nightly from target.dmg to the Desktop:

$ cd ~/Desktop
$ find Firefox\ Nightly.app/ -type f -exec codesign -f -s - {} \;
$ codesign -f -s - Firefox\ Nightly.app
$ open ~/Desktop
  1. Right-click on Firefox Nightly and select Open (this will not work the first time - click OK)
  2. Right-click on Firefox Nightly and select Open and then click Open again in the dialog.

Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/c27aac8c84b7
Tickle the CVDisplayLink after display reconfigurations, to help it get unstuck. r=mattwoodrow

(In reply to Haik Aftandilian [:haik] from comment #102)

I'll wait for a fix in a future release... soon?

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch

Markus, would it make sense to request uplift for the patch so it might be available in the 87 release? We are just about to start the new cycle, and it seems to be important enough.

Flags: needinfo?(mstange.moz)

Oh, please drop. The merge didn't happen yet so it already landed for 87.

Flags: needinfo?(mstange.moz)
Blocks: 1682713
No longer depends on: 1682713
Flags: qe-verify+
No longer blocks: gfx-triage

Comment on attachment 9204410 [details]
Bug 1422855 - Tickle the CVDisplayLink after display reconfigurations, to help it get unstuck. r=jrmuizel,mattwoodrow

Beta/Release Uplift Approval Request

  • User impact if declined: Needed if we want to uplift bug 1682713 to 86.
    On its own, this patch fixes a freeze on Intel machines after user switching. Combined with bug 1682713, it fixes a much more severe freeze on M1 machines after system sleep (and also after user switching).
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: (QE testing should happen in bug 1682713)
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Very tightly-scoped fix.
  • String changes made/needed: none
Attachment #9204410 - Flags: approval-mozilla-release?
QA Whiteboard: [qa-triaged]

This issue was verified on bug 1682713. Please see comment 111 and comment 114 for more information. Based on those and comment 108, I'm going to close this as verified.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Comment on attachment 9204410 [details]
Bug 1422855 - Tickle the CVDisplayLink after display reconfigurations, to help it get unstuck. r=jrmuizel,mattwoodrow

Approved for 86.0.1, thanks.

Attachment #9204410 - Flags: approval-mozilla-release? → approval-mozilla-release+

Hello! The issue is no longer reproducible with Firefox 86.0.1 (20210310152336) when following the steps from bug 1682713. I tried several times on an M1 Mac mini running macOS 11.2.3, switching users with fast user switching, and Firefox is working as expected.
