Closed Bug 1368231 Opened 5 years ago Closed 3 years ago

Firefox on Windows 10 becomes unresponsive with near zero cpu usage with Malwarebytes installed

Categories

(External Software Affecting Firefox :: Other, defect)

x86_64
Windows 10
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: ptressel, Unassigned)

Details

(Keywords: hang, Whiteboard: [AV:Malwarebytes])

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0
Build ID: 20170518000419

Steps to reproduce:

Nothing specific leads to the issue.  It seems to happen after some time passes, on the order of 15 minutes to an hour after Firefox is started.


Actual results:

At some point, Firefox will become unresponsive.  Windows will flag this with "(Not Responding)" in the window title bar.  Looking at the Firefox process with Task Manager shows almost no CPU usage.  There is a pattern to the CPU usage -- a cycle of about a second, during which the CPU use is mostly zero, then there will briefly be a tiny amount of usage.  I've waited for up to 20 minutes -- Firefox does not recover.  I then have to kill the process.

The rest of the system and other apps are fine.

This started with version 53.0.2, I believe, and is still happening in 53.0.3.  This is the 32-bit version of Firefox, running on Windows 10 64-bit.  I have disabled Malwarebytes web protection and deinstalled McAfee Web Advisor -- the hang occurs even without them.  I've set the network connection to unmetered -- this makes no difference.  I see nothing in the event log.  I found nothing that looks like a log for Firefox.  Note I tend to have more windows and tabs open than is typical -- usually about 5-10 windows, with about 10-200 tabs each.  Most of these tabs are inactive -- I need them for occasional reference.  In case it's relevant, 53.0.2 automated update failed, and I had to install it manually.  I didn't allow the automated update of 53.0.3 to run, but went straight to a manual install.

(Note Firefox hung while I was trying to file this bug report.  I'm now writing it in a separate editor, and will try to post it right away after starting Firefox, or I'll have to use [some other browser] to do so.)

If you can tell me what to do, I'll be happy to help collect information that might help determine what Firefox is stuck waiting for.  (FWIW, I'm a software engineer with about 40 years experience.)


Expected results:

Not to become permanently unresponsive.
At the time automated updates began failing (error was during installation, not download), I changed the update option from "automatically install" to "check for updates".  Other update settings were left at default.  This time, after I started Firefox, I got suspicious, and changed the update setting to "never check for updates".  Firefox has been up for some 3 hours without becoming unresponsive since that change.  This could be just chance, and not relevant...
>> At the time automated updates began failing (error was during installation, not download), I changed the update option from "automatically install" to "check for updates".

Actually, I take that part back -- I think I made that change earlier, before 53.0.2.  If that option -- "check for updates" -- goes along with seeing the popup offering each new version, then I made the change earlier.  But the switch from "check for updates" to "never check for updates" was during this session, and Firefox has not hung yet.
Ok, that was a red herring -- it just hung again.  This time I looked at it with Process Explorer (SysInternals).  What little CPU usage there is, is (almost) all in one thread:

Start address: xul.dll+0x4ec077
State: Wait:WrLpcReply
Stack:
ntdll.dll!NtRequestWaitReplyPort+0x14
wow64.dll!Wow64AllocThreadHeap+0x3eab
wow64.dll!Wow64SystemServiceEx+0x155
wow64cpu.dll!TurboDispatchJumpAddressEnd+0xb
wow64.dll!Wow64KiUserCallbackDispatcher+0x4151
wow64.dll!Wow64LdrpInitialize+0x120
ntdll.dll!LdrInitializeThunk+0x16d
ntdll.dll!LdrInitializeThunk+0xe

I did see a very brief bit of CPU use in another thread:

Start address: firefox.exe!GetHandleVerifier+0x27ff
State: Wait:Suspended
Stack:
ntdll.dll!RtlEnterCriticalSection+0x18
mbae.dll+0x1cf24
Summary: Firefox 53.0.2 and 53.0.3 on Windows 10 becomes unresponsive with zero cpu usage → Firefox 53.0.2 and 53.0.3 on Windows 10 becomes unresponsive with near zero cpu usage
I expect you'd need the arguments to the LPC call to know what it's trying to talk to.  Again, I'm willing to help get information.

With checking for updates disabled, Firefox is staying responsive quite a bit longer than with update checks enabled.  Perhaps there is more than one action that provokes the hang.  Note the stack trace above was from a hang with updates disabled.
I'm seeing about the same thing as Pat. I'm also using the 32-bit version of Firefox, running on Windows 10 64-bit (V1703), and no different addons/extension from earlier Firefox versions.

When I see the hang, I don't usually see any reporting by Windows of Firefox not responding. What I see is no response in the current tab, and clicking on other tabs results in them being either non-responsive if there is content being rendered, or nothing visible if the tab has not been visited since the last Firefox restart. Pressing F5 or clicking reload has no effect. Restarting Firefox always cures the hang until the next time. The hang seems to come after several hours of use. I'll need to keep better track of that. But, I've had to restart Firefox at least once a day now. Coming out of hibernation doesn't seem to be a cause.
https://developer.mozilla.org/en-US/docs/How_to_Report_a_Hung_Firefox
Severity: normal → critical
Keywords: hang
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64
Thanks, Yang?

I'll run crashfirefox on the next hang, which will give us a sample without the update check running, then I'll turn the update check back on and get another dump. I can get set up for debugging if that would help.

Thanks for the report, Tim!

What I mean by Windows flagging Firefox as unresponsive is that it puts the text "(Not Responding)" in the title bar to the right of whatever else is there. Also, when I have the mouse cursor over any Firefox window, within the frame, the cursor switches to its "waiting" circle form. Clicking on tabs does nothing, so I don't see pages not rendering. That may mean your state is not quite the same. Do you have SysInternals Process Explorer (https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx)? If so, run that, select (click) the firefox.exe line, in the top menu go to Process -> Properties, look for threads that have some cpu activity, select one, click the Trace button. Do they look like the same traces as mine above? Or, collect a dump as per Yang's link.

I, too, have no changes in addons, and since I've deinstalled McAfee Web Advisor and turned off Malwarebytes web protection, there shouldn't be anything reaching in and blocking stuff. I don't have any ad blocking or other intrusive addons -- just Firebug, ChatZilla, SQLite Manager.  (Shockwave Flash is installed but disabled.)
(In reply to Pat Tressel from comment #7)

> Thanks, Yang?

So much for proofreading. Not sure how ? got substituted for !, but that should have been:

Thanks, Yang!

Don't see a way to edit comments...
Ok, it just hung again, and I ran crashfirefox.exe, but...it was apparently unable to kill Firefox.  In its cmd window, it listed the various processes it found while looking for Firefox, then stopped when it got to firefox.exe.  But nothing happened after that -- the Firefox windows were still up, and the process was still present in Task Manager.  So it seems it's hung in a way that whatever crashfirefox.exe is doing cannot interrupt.

The two other options, given in the "how to report a hung Firefox" document, that apply to Windows, are to use the Crash Me Now addon, and to run under a debugger.  The former won't work, as it requires Firefox to be responding in order to invoke the addon.  Looks like it's time to start getting set up for the latter option...
(In reply to Pat Tressel from comment #9)
> Ok, it just hung again, and I ran crashfirefox.exe, but...it was apparently
> unable to kill Firefox.  In its cmd window, it listed the various processes
> it found while looking for Firefox, then stopped when it got to firefox.exe.
> But nothing happened after that -- the Firefox windows were still up, and
> the process was still present in Task Manager.  So it seems it's hung in a
> way that whatever crashfirefox.exe is doing cannot interrupt.

Try running crashfirefox [Process ID].
Yang --

Ok, did that.  As one would expect, it was not able to stop Firefox that way either.  It was fairly clear that it found the correct process the first time, so specifying the process wasn't going to help it.
Here's another way to approach this.  With update check enabled, the hang appears to happen within an hour of starting Firefox.  With update check disabled, the hang does not happen for about 3-4 hours -- I have not been keeping track of the precise time.  So the question is, what procedure is update check using to contact whatever service it is using?  What else uses that same procedure?  Given that we know what update check is using, can we cause that to be invoked, and see if that gets a hang?  Then we could add logging to that path.  This has the advantage of not interfering with the possibility that this involves multiple threads.

Another option would be to run under debugging, with a breakpoint in update check.  Turn update check back on, preferably set the interval to something less than the default.  From the breakpoint, attempt to step or continue to where the hang is.  This has the disadvantage of attempting debugging with multiple threads.
I'm seeing a slightly different behavior now, perhaps because I'm using Firefox closer to when it starts to go south. In the threads tab of properties of Firefox from Process Explorer, I see that the thread GetHandleVerifier is using between 20% and 25% of the CPU. All other threads use less than 1% of CPU.

Rendering is fragmented; showing only parts of the window while the rest is black. Result is the same when I switch tabs. I had 15 tabs open when this happened today. This is happening about once a day. Restarting Firefox cures the problem for about a day. Not sure there is any correlation, but it's been happening about 2-4 hours after coming out of hibernation in the mornings.
Tim -- I just saw a hang in which the thread with an address referencing GetHandleVerifier was the only thread with a bit of CPU usage.  This hang happened within about 10 minutes of starting Firefox, so this is a change from what I'd seen before.

There was also a small amount of network traffic, in addition to the very small amount of CPU use.

One caution with the appearance of named locations in the Process Explorer thread display or trace:  We don't have symbols, so the only names available will be globals.  Those might not be the nearest symbols if we had the complete symbol table available.  So we can't say with certainty that it's actually in GetHandleVerifier when it hangs.

I'm wondering if the one thread that remains active has taken out a lock and isn't giving it up, and if other things eventually end up waiting on that lock.  Maybe whatever it's waiting for is expected to be quick, and it doesn't have any "give up for now and release the lock" fallback if whatever-it-is isn't responding quickly.

So the question still is, what's the best way to get more information about what it's doing when it hangs?
I just had the first hang event since yesterday morning. With this one, I was able to successfully crash it with crashfirefox.exe and sent a report referencing this bug. CPU usage on the GetHandleVerifier thread was as before.

Pat, do you have any insight on how Process Explorer might display a thread name that was not correct? I would have thought all reported symbols would be correct. Hopefully, the information in the crash report will shed some light on the thread states.
Tim --

That's great that you got a dump!!

Re. Process Explorer's trace:  It's not that the thread address info is wrong, but that it's of little use.  If the executable has only global symbols, then the only way Process Explorer can show the addresses is as (large) offsets from the nearest preceding global location (i.e. an address label that happens to be a global).  There might be many local addresses in between that global and where the location on the call stack is.  So we don't know what (local) function the call is in.
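
To illustrate the point, here is a minimal Python sketch of how a tool with only global symbols would label an address. This is not Process Explorer's actual code; the symbol tables, the names local_helper_a, local_helper_b and SomeOtherExport, and all the offsets are made up for the example. Only the "nearest preceding symbol" idea comes from the explanation above.

```python
import bisect

# Hypothetical table of global (exported) symbols: (offset in module, name).
# The offsets are invented for illustration only.
EXPORTS = sorted([
    (0x1000, "GetHandleVerifier"),
    (0x9000, "SomeOtherExport"),
])

# Hypothetical full table, as it might look if local symbols were available.
FULL = sorted(EXPORTS + [
    (0x2400, "local_helper_a"),
    (0x3600, "local_helper_b"),
])


def resolve(offset, symbols):
    """Label an offset as 'name+0xNN' using the nearest symbol at or below it."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        # No symbol precedes this offset; fall back to the raw value.
        return hex(offset)
    start, name = symbols[i]
    return f"{name}+{hex(offset - start)}"


# With globals only, the label is a large offset from an unrelated export.
print(resolve(0x37FF, EXPORTS))  # GetHandleVerifier+0x27ff
# With local symbols, the same address lands in a different function.
print(resolve(0x37FF, FULL))     # local_helper_b+0x1ff
```

The same address gets two different labels depending on which table is available, which is why a large offset from a global name says little about the function that is actually executing.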

Now, because we have the build number for the copies of Firefox we're using, we could either get the symbols for that version from the Mozilla symbol server (if they're available), or we could build exactly that version.  With that, we could find out what function label is closest to each location shown by Process Explorer.  Or, if we could run a copy that was built with local symbols, then maybe Process Explorer could show the addresses as offsets from the nearby local address labels.  But if we're going to do that, we may as well get set up for debugging.  Then when it hangs, we could connect to the running process.  That would probably allow us to get not just the call stack addresses, but also the call arguments.

Since crashfirefox worked on your hung Firefox, that's another hint that we may have somewhat different situations.  It seems that crashfirefox depends on having code run *within* Firefox, i.e. the crash reporter.  Mine seems to be so solidly hung that that can't happen.  Now, Windows can kill it, so it's not hung in some way that Windows can't stop it (and, yes, that's possible).
So, I'm looking for a hint as to the next step, given that I cannot get a dump.  The "how to report a hung Firefox" page says:

> Instead of crashing Firefox, more advanced developers who know how to use a debugger can try attaching to the Firefox process and debugging in place. This is only useful if you have Firefox debug symbols, which are available on Windows from the Mozilla symbol server.

So, yes, I've used lots of debuggers.  ;-)  Is this what the above refers to?

https://developer.mozilla.org/en-US/docs/Tools/Remote_Debugging/Debugging_Firefox_Desktop

There are several potential problems.  First, that requires a listener to be running in the target Firefox.  Given how thoroughly Firefox is hung, that listener may be unable to respond after the target hangs.  Second, the WebIDE allows connecting to a tab.  I have at the moment 6 windows with on average 50 tabs.  I want to examine a specific *thread* that may not be a tab.  In particular, I want to examine the thread that appears to be waking and using a little CPU time periodically.  Third, this is not something that we can expect to happen in a bare session with just a tab or two active -- it may require a very large number of windows or tabs, and may even require some particular site open or some particular setting.  So, I'd have to use my real profile and session.  So I can't expect to get the hang if I don't have those hundreds of tabs open.

Now, in other forms of remote debugging, one can examine a separate *process's memory* from a debugger.  The process does not need to be capable of communicating with the debugger.  This is essentially the same as debugging a dump, except one is looking at an in-memory process.  This might be more appropriate for a hung process.

Any advice would be appreciated.
Another option might be to turn on logging, then after some hangs, look at the end of the log, and see if there was something distinctive that happened.

Early this morning, Firefox hung repeatedly, much more quickly after restart than it has been.  That implies either there is an element of chance, or that there is something happening at that time that triggered the hangs, perhaps activity caused by a particular web page.

Tim -- Is your dump attached to this bug report? or was the bug ID included when you submitted the dump?  If not, it might be useful to attach the dump here.  Also, you could vote for the bug, in case that determines how much attention it gets.
(In reply to Pat Tressel from comment #18)
> Tim -- Is your dump attached to this bug report? or was the bug ID included
> when you submitted the dump?  If not, it might be useful to attach the dump
> here.  Also, you could vote for the bug, in case that determines how much
> attention it gets.

I've submitted two crash reports so far, and referenced this bug in both.
Thanks, Tim!!

I got a full dump using (SysInternals) Process Explorer, and also have the thread id and stack trace for the one active thread (for the same hung process as the dump).  I'll see if I can get symbols for this version (52.0.3), and look into how to examine the dump, with the goal of getting function names and arguments on the stack.  Again, I'm assuming that the one active thread is waiting for something that's not happening (and is maybe holding a lock that other things need).  If that doesn't make sense, then let me know -- it will mean we'd have to look at other threads.  I did see one other thread very briefly active while I was examining the process in Process Explorer.  The dump is 1.5MB, so I won't try to attach it unless requested.

It hung once (with the same signature as the other hangs) shortly after starting, and before it displayed the session recovery form.  That implies the hang does not depend on behavior of the specific pages that are open.
(In reply to Pat Tressel from comment #20)

> I got a full dump using (SysInternals) Process Explorer, and also have the
> thread id and stack trace for the one active thread (for the same hung
> process as the dump).  I'll see if I can get symbols for this version
> (52.0.3), and look into how to examine the dump, with the goal of getting
> function names and arguments on the stack.

I mean...53.0.3.
It's gotten much worse over the past two days -- it now hangs within half an hour of starting.  I turned off all the automatic updating that I could find (and that I hadn't already), and that didn't help.  So it's being provoked by something else.  The signature isn't quite the same as before.
Tim points out that new version 54 contains changes that might be relevant.  (Thanks, Tim, for researching the relevant changes and settings!)  We've both installed it.  I did a fresh install as per instructions on the Mozilla site -- renamed the installation folder, then ran the installer.

Tim has dom.ipc.processCount set to 4.  I left my settings at the installed values; dom.ipc.processCount is 1, as I have an incompatible extension (SQLite Manager).  (And besides, I have it installed because I use it...)  Rather than deinstall that, I've left multiprocess disabled, so we have examples of both cases.

So...yes, it hung.  Signature was different from previous version hangs.  This time, there were three xul.dll threads using a small amount of CPU time.  Stack was the same in all cases, but one was in state Wait:DelayExecution, and the other two in Wait:UserRequest.  I have a minidump and the TIDs of the three xul.dll threads.

I had a look at the threads while FF was running normally.  (The normal behavior has also changed in v54.)  Most xul.dll threads do not remain running -- they have a very brief amount of runtime, then done.  One of them seems to be running fairly continuously.  There is a thread with address firefox.exe!GetHandleVerifier+0x3673 that has the most runtime.  When hung, this thread has no runtime.

If someone would like the minidump, let me know how to submit it.
Tim, are you getting any Firefox crash reports (not just MS reports)? Would you please let us know the Crash ID of each of those, including the bp- prefix? Entering that in a bug comment creates a link to the processed report itself.


It could be worth resorting to a process of elimination here, as we can be fairly certain this issue is not seen by the majority of Firefox users, or we would see a flood of bug reports and other reports. Install a new copy of the latest Firefox Nightly as an additional browser. Run that in Firefox's safe mode and in Windows safe mode. Is it possible to recreate the unresponsiveness like that? If not, at least it gives a few leads as to potential causes. To me, 32-bit Fx52 and 53 seem to be a bit of a memory hog on Windows 10, but that is not producing unresponsiveness, other than presumably related to memory use.

There is an option to attach files to bug reports. It should have a link towards the top of the page, or use https://bugzilla.mozilla.org/attachment.cgi?bugid=1368231&action=enter . I will not understand the minidump, but others may like to see it.

There is a tool for tracking down regressions. However, if it takes 15 minutes to an hour to discover whether or not Firefox is affected, that is not going to help; it would take too long. Perhaps, though, this could be tested in both Nightly and ESR. That may at least go some way to working out which major versions are affected, IF this is a Firefox regression.
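
As a rough back-of-the-envelope check on "too long", the sketch below estimates the cost of a binary search over builds. The build counts and the 60 minutes per test are assumptions chosen for illustration, not measurements from this bug, and the function name bisect_cost is made up.

```python
import math


def bisect_cost(num_builds, minutes_per_test):
    """Return (steps, total_minutes) for a binary search over num_builds builds.

    Assumes each step needs one full observation window to decide whether
    the build is affected (an assumption, not a property of any tool).
    """
    steps = math.ceil(math.log2(num_builds))
    return steps, steps * minutes_per_test


# Hypothetical: 128 candidate builds, waiting an hour on each one.
print(bisect_cost(128, 60))  # (7, 420)
# Hypothetical: 200 candidate builds, same wait.
print(bisect_cost(200, 60))  # (8, 480)
```

This is a lower bound. A build can only be called "good" after waiting longer than the slowest observed time to hang, and later comments in this bug mention hangs that took several hours to appear.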
(In reply to John Hesling [:John99] (NeedInfo me) from comment #24)
> Tim are you getting any Firefox crash reports (Not just MS reports) would
> you please let us know the Crash ID of each of those including the bp-
> Entering that in a bug comment creates a link to the processed report itself.
> 

John, for Firefox versions 53.x I submitted a few crash reports after forcing the crash. The first two of these have IDs of

     bp-7db8c870-19fc-4304-b2db-dc9b10170610
     bp-1e2b35f7-3bc9-449e-af99-9ca3b0170614

I've since moved to v 54.0 (32 bit) and set the dom.ipc.processCount to 4. At first I was getting occasional very unresponsive behavior, but I noticed that coincided with 100% disk usage shown in the Windows Task Manager. Some of that was traced to my Nod32 antivirus program doing start-up scans even after coming out of hibernation. I changed the Nod32 start-up scan to exclude e-mail files and archives. This has significantly reduced the amount of time I see 100% disk usage, and I have not seen Firefox at this version show the unresponsiveness I'd seen with v 53. I have been running the 54.0 version now for over 3 days with no problems.

For a final test of v 54 with regard to this bug, I'm going to drop the processCount to 1, and enable the 3 add-ons which were not compatible with multiple process Firefox. If that does not bring back the unresponsive issue, I'll consider this issue fixed, at least for me, in v 54.

Pat, what's your experience with v 54 now that you've had some time running it, that is, since your last post here? Is your hang coincident with high disk usage?
Tim, John --

V 54 hangs anywhere from 10 minutes to several hours after being started.  No, there is no high disk usage -- there is a *tiny* amount of *network* traffic, just as the hang "sets in".  Previously, under v 53, I disabled both McAfee and Malwarebytes scan of web activity, but that made no apparent difference in whether there was a hang or how soon.  McAfee is still disabled, but I've enabled Malwarebytes web protection again.  It sounds like John's issue is different, and is related to his antivirus software.

All threads are in some form of wait state.  Typically, one thread is accruing a bit of cpu time, but whenever I look at it, it's back in wait.  That thread is usually in an LPC wait, such as Wait:WrLpcReply.  In other software where I've seen this pattern, it's due to a global lock being acquired and not released, and other threads eventually needing that lock, and waiting on it.  Since this involves LPC, that thread may be trying to make a request of some other local process or a thread in the same process.  Perhaps that other service is failing to respond.  This is, of course, just guessing...

The point of getting a Windows dump (since crashfirefox was unable to get a dump) is to find out what each thread is waiting on.  Especially the one thread that has some runtime is of interest.  Interpreting this doesn't require special Windows knowledge -- we just need local symbols so we can see what was being called.  I believe the distinction between a minidump and a full dump is whether the program memory is included.  That's not needed if one has the matching image file.

Perhaps this hang is frequent enough to notice, for me, because I've got a lot of windows and tabs open.  I don't know what's typical.  I usually have about 6-10 windows, and about 50-100 tabs per window.  On the other hand, it did hang once on startup, *before it even displayed the session recovery page*.  If I run Nightly, I could copy the profile, so that it has the same session.  At the moment, it seems simpler to find out what Firefox is trying to do while it's hung.  Since this has not gotten other reports, it won't be something people are trying to fix, so it's unlikely that any changes in Nightly have fixed it.
(In reply to Pat Tressel from comment #26)
> Tim, John --
> 
>...It sounds
> like John's issue is different, and is related to his antivirus software.
>
Pat, I think you meant my issue. But, just to be clear, I believe my Nod32 antivirus is only affecting my Firefox performance in that it can contribute to high disk usage. In my issue, high disk usage seems to starve Firefox of its access to the disk, and leads to the problem I was seeing. Now that I have solved my high disk usage issue, Firefox is not exhibiting a performance problem for me.

I've now limited Firefox's processes to one (down from four), and enabled the add-ons that were incompatible with multiprocess. I'll be looking for any change in performance, and will report my findings here.
Hi Pat,

I tested this with Fx53 and latest nightly, but couldn't manage to reproduce.
Could you attach your "about:support" information? Thanks
Flags: needinfo?(tressel)
My experiment with v54 (now 54.0.1) with only one Firefox process is now concluded. Firefox only remained usable for about a day before it became completely non-responsive, and I would have to kill it. I began the experiment with all of my add-ons enabled, including those that were incompatible with the multiprocess Firefox. After each session became non-responsive, I disabled one more of the incompatible add-ons, and restarted. The last session to become non-responsive today had no incompatible add-ons enabled.

In the last session, there were 15 active tabs. There was very little CPU usage from Firefox or any of the other processes running. One tab that was streaming audio content continued to deliver the audio. Disk usage was very light.

Visual symptom was a black content area for each tab. All of the add-ons incompatible with multiprocess were disabled, so it is not related to them.

I forced a crash and the Crash ID: bp-f2c3caed-3513-46cc-912f-3ad730170707.

I will now go back to running 4 Firefox processes, which seemed to be very stable for me.
Flags: needinfo?(tressel)
Summary: Firefox 53.0.2 and 53.0.3 on Windows 10 becomes unresponsive with near zero cpu usage → Firefox 53.0.2, 53.0.3, 54.0, 54.0.1 on Windows 10 becomes unresponsive with near zero cpu usage
At this point, Firefox is hanging frequently, sometimes only a few minutes after restart.  I've closed most windows -- there are just three open at the moment, with on the order of 50 tabs each.

A few other observations:

Focus sometimes switches from one window to another without user action.  There are some intervals during which Firefox becomes unresponsive, but then recovers.

Some session cookies seem to be getting corrupted.  These are on sites that may be actively writing cookies at the time Firefox hangs.  Symptom is that session login does not persist across browser restart, but deleting the cookie and thus having it re-created allows login to persist.  I wonder if there is some other stored information that might be corrupted.

There is no CPU or disk contention -- the system is mostly idle.  As mentioned, Firefox has some periodic CPU use while it is unresponsive, and also a small amount of network use.  It would be really nice to know what it is waiting for, or what it is trying to do.  It still looks like LPC is involved.

I have scanned with both McAfee and Malwarebytes.

I should note I'm now (as of yesterday) running Windows 10 Creators Update, v 1703.  I reinstalled Firefox earlier, but not since the upgrade.

I will try the inverse of Tim's test:  I will disable the SQLite Manager plugin, and enable multiple processes.
Here's an odd, but consistent, thing:  If Firefox is sluggish and behaves oddly during session restore, then it will very likely hang quickly.  Also, it will experience episodes of non-response that clear up, before finally hanging.  In the current session, by contrast, the startup proceeded normally, and did not take much longer than expected.  It hasn't hung yet, and it's been several hours.  So perhaps some event happens that later causes the hang, and sometimes that event happens right away.  I'm mentioning this because it impacts the test I was intending to try -- if this is triggered by an event, it might not happen, by chance, while I'm conducting the test.  On the other hand, I do still expect an eventual hang.
Hi Pat,
I added all the add-ons from comment 30, but could not manage to reproduce it.
Is this still an issue with the latest build? Thanks
Flags: needinfo?(tressel)
Apologies for the delay -- I was having problems with the Windows 10 Creators Update (Explorer crashes, and printer not recognized), and have had to revert it.

Abe, the only relation to the add-ons is that one of them precludes enabling multiple processes. None of the add-ons is in use when there is a hang. It may take quite a while to observe the problem. How long do you have Firefox running before restarting it?

I'll update Firefox later today, but will not enable multiple processes, then wait to see if there is a hang. I can usually count on a hang once a day, but I've taken to restarting Firefox without waiting for it to hang, so I'll have to leave it up for longer than I have been.
Hmm.  The Firefox download page says I'm running the latest version.  I have 54.0.

Abe -- Did you mean you didn't see the problem with Nightly, rather than the latest released version?

I still need to make the test with multiple processes enabled.

I should mention, I did see the hang when I had the Windows Creators Update installed.
Ok, I finally got pushed to disable the one plugin that was incompatible with multiprocessing when Firefox was hanging every 10 minutes or so.  With that plugin disabled, I restarted Firefox, and it now says multiprocessing is enabled.

The behavior has changed, but it is still hanging.  The difference is that *the UI* is still responsive -- I can still use the menus, open new tabs, and such.  However, after about the same amount of time, Firefox stopped being able to load pages.  If I open a new tab, enter a URL in the location bar, hit return, the blue spinner appears in the tab, but nothing else happens.  If I open a tab that was already in the session, but that I hadn't yet opened (which usually triggers at least displaying the cached copy of the page) there is a spinner on the blank page (not the blue one -- this has radial lines).  It is not showing the status messages telling what it's doing.
Flags: needinfo?(tressel)
I ran crashfirefox, and this time, it was able to send a crash report.  I included a link to this bug, so one hopes that dump can be associated with this.
Ok, found the dump ID.

https://crash-stats.mozilla.com/report/index/189661fe-2873-43c6-9cb4-9af360170809

(And a good thing, too, because the crash reporter, or whatever records the crash, had *removed the bugzilla URL* that I included in the comments, so there would be nothing to tie that report to this bug.)
I should also mention:  I restarted Firefox after the above episode.  When I exited from Firefox later (using File -> Exit), one of the processes did not exit.  I've waited about 5 minutes, and it's still there, using no CPU time.  I wonder if that might be a leftover from the hung Firefox...?

(Normally, I wait for Firefox to exit before trying to shut the system down, else shutdown may stop, waiting for Firefox.)
I was hoping the hang with multiprocessing enabled was just some unrelated quirk, but no, it's just happened again.  I have four windows open.  The main process is exhibiting the same pattern of periodic small CPU use and small amounts of network activity.  None of the pages in any window are active.  The UI for a window (menus, tabs) is still operable, but any UI that is related to an individual page is not -- in particular, scroll bars do not work.

Anyhow, enabling multiprocessing does not fix the problem, it just obscures it.
To be clear, by "main process" I mean the parent process of the four window processes.  There was also a background process.
The UI is sluggish in the multiprocess version, enough so that I can't tell if it's hung, or just that it's going to take a minute to act on some attempted operation.  Also, it's harder to tell when it's hung, so I waste several minutes fruitlessly trying to scroll or poke links before catching on -- with multiprocessing off, at least I know it's hung quickly.  I've turned multiprocessing back off.

I'm going to install 55.0.1 and see if there's a difference.  At this point, since there isn't much interest in this issue, my best bet is to see if I can find version 53.0.1 and just stop there.
I've been running 55.0.1 for about half a day, and there are some odd things...  First is that although multiprocessing is *not* enabled, there are five processes -- a parent process and one per window.  There is no background process.  Second is that several times, it has "almost hung" -- it became unresponsive for several minutes, then eventually came back to life.  But unlike the hang that is the subject of this issue, the new form of "temporary" hang goes with *high* CPU usage -- it's using one entire core.
Hi Pat,

Thanks again for taking time on this. Could you provide the memory report? 
Steps are described here: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/about:memory
Flags: needinfo?(tressel)
Hi, Abe!  Sorry about the delay in responding!

One problem is that I can't get the memory info that way *after* Firefox is hung, which is when I suppose it might be most useful.  I'll get the about:memory report now.  Windows Task Manager shows about the same memory usage as when Firefox hangs -- it does not do anything that looks like a memory leak.

When Firefox is hung, it generally is not using a huge amount of memory.  I'm down to four windows now, with maybe 50-100 tabs per window.  The machine has 8GB RAM, and Firefox is generally using less than 1GB when it hangs.

It's hung three times today so far.  (I'm starting to remember to type text into a separate editor and paste it...I'm doing that right now, for this very post.  :D )

I should check for an update to Firefox.  I have not been auto-updating, because the auto-update process tends to fail.  At need, I can install Nightly.  I can also try crashfirefox.exe again, but it could not stop the single-process version of Firefox.

I'll attach the about:memory report shortly.
Flags: needinfo?(tressel)
Attached file Output of about:memory
Hi Eric, comment or suggestion on this memory report, comment 46? Thanks
Flags: needinfo?(erahm)
Summary: Firefox 53.0.2, 53.0.3, 54.0, 54.0.1 on Windows 10 becomes unresponsive with near zero cpu usage → Firefox 53.0.2, 53.0.3, 54.0, 54.0.1, 55.0.1 on Windows 10 becomes unresponsive with near zero cpu usage
(In reply to Abe - QA (:Abe_LV) from comment #47)
> Hi Eric, comment or suggestion on this memory report, comment 46? Thanks

>     0.00 MB ── gfx-surface-quartz
> 1,123.48 MB ── gfx-textures
> 2,855.58 MB ── gfx-textures-peak
>      14,473 ── page-faults-hard
>  11,215,608 ── page-faults-soft
> 1,889.56 MB ── resident
> 2,424.27 MB ── resident-peak
>   495.23 MB ── resident-unique
> 6,777.04 MB ── vsize

We're using a large amount of graphics memory; I'm not sure if that's a leak or if you're doing something graphics-intensive. The *peak* values are also pretty high. There's a large number of page faults, which can definitely lead to slowdown. My guess is that Firefox is competing with other applications for memory and ends up swapping in and out of the pagefile rather aggressively, but in earlier comments you indicated there was no disk activity, so I might be wrong here.

Also interesting: you stated you're using 32-bit Firefox on 64-bit Windows, but we have a `vsize` of 6,777.04 MB, which should not be possible for a 32-bit process. Additionally, I'd expect there to be a `vsize-max-contiguous` number, but that's missing. Can you confirm the version of Firefox you're using? Also, the latest release is 56; can you confirm that this still reproduces? (I assume it does, but it helps to double check.)
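For anyone who wants to repeat the sanity check above, here is a minimal sketch that parses the quoted about:memory lines and compares `vsize` against the 4 GiB ceiling of a 32-bit address space. The names `parse_report` and `vsize_exceeds_32bit` are made up for illustration and are not part of any Firefox tooling; the input is just the excerpt quoted in this comment.

```python
import re

# Lines copied from the about:memory excerpt quoted above.
REPORT = """\
    0.00 MB ── gfx-surface-quartz
1,123.48 MB ── gfx-textures
2,855.58 MB ── gfx-textures-peak
     14,473 ── page-faults-hard
 11,215,608 ── page-faults-soft
1,889.56 MB ── resident
2,424.27 MB ── resident-peak
  495.23 MB ── resident-unique
6,777.04 MB ── vsize
"""

# number (with thousands separators), optional "MB" unit, separator, name
LINE_RE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s*(MB)?\s*──\s*(\S+)\s*$")

# Assumption: a 32-bit process can address at most 4 GiB of virtual memory.
MAX_32BIT_VSIZE_MB = 4096.0


def parse_report(text):
    """Return {name: value}; MB values become floats, plain counts ints."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a "value ── name" line
        number, unit, name = match.groups()
        number = number.replace(",", "")
        values[name] = float(number) if unit else int(number)
    return values


def vsize_exceeds_32bit(values):
    """True if the reported vsize cannot fit in a 32-bit address space."""
    return values["vsize"] > MAX_32BIT_VSIZE_MB


values = parse_report(REPORT)
print(values["vsize"], vsize_exceeds_32bit(values))
```

A `True` result here only says the report is inconsistent with a 32-bit process; it does not say which of the two statements (the build bitness or the report) is the wrong one.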

My general recommendations here would be:

#1 - See if you can reproduce in safe mode [1]
#2 - Try a Firefox refresh [2]

mconley any other thoughts?

[1] https://support.mozilla.org/kb/troubleshoot-firefox-issues-using-safe-mode
[2] https://support.mozilla.org/kb/refresh-firefox-reset-add-ons-and-settings
Flags: needinfo?(tressel)
Flags: needinfo?(mconley)
Flags: needinfo?(erahm)
Summary: Firefox 53.0.2, 53.0.3, 54.0, 54.0.1, 55.0.1 on Windows 10 becomes unresponsive with near zero cpu usage → Firefox hangs on Windows 10
Flags: needinfo?(tressel)
Summary: Firefox hangs on Windows 10 → Firefox on Windows 10 becomes unresponsive with near zero cpu usage
A performance profile on Nightly might help us see what's slowing your Firefox down, assuming the content processes are sluggish and not completely hung:

https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem
Flags: needinfo?(mconley) → needinfo?(tressel)
Eric --

Don't hit me, but I had to change the bug title back.  ;-)  Firefox does not "hang" in the usual sense, of being completely frozen.  There is still both CPU activity *and* network activity.  From the pattern of activity -- small amounts of activity with a fixed periodicity -- it seems to me that Firefox is *waiting for something*, and *retrying or polling for a response*.  If we can find out what it's waiting for, then we will be a lot closer to solving this.  Just sayin'...

(If you need the word "hang" to appear in the title for search purposes, then we can add it.  But I'd really like to make it clear that this is not the usual meaning of hang.)

I'd be happy to blame this on my somewhat flaky network connection (via my cell phone's wifi hotspot), but it also happens when I'm on other networks.  I don't currently have access to a wired connection, but I think this started before I switched to using my phone.

My Windows 10 is current.  I briefly had the Windows Creators' Update installed, but had to revert that due to missing drivers.

I'm currently running 55.0.1, and it's definitely the 32-bit version.  I don't let it attempt to auto-update, because that fails.  I'll install 56.  I was going to ask:  Should I try the 64-bit version?
Mike --

Ok, I'll install Nightly.  Any special instructions, e.g. to enable debugging info?  Should I stick with the 32-bit version?

Once it gets in the unresponsive state, it does not revive.  I've let it sit for hours.  It doesn't seem to be a performance issue.  It seems to be off doing some one thing, and ignoring other things.  Again, if we could find out what it's waiting for...

Hmm.  I could look for network traffic.  The fact that there is periodic network activity doesn't necessarily mean it's trying to contact some external site -- it could be using a network protocol to contact another local process.  But if it is trying to contact some other site, that would be a clue.
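As a rough sketch of that local-versus-external distinction: once the remote addresses of Firefox's connections have been collected by some other tool, Python's standard `ipaddress` module can tell loopback endpoints from the rest. The helper name `is_local_endpoint` and the sample addresses are illustrative only, not captured from this machine.

```python
import ipaddress


def is_local_endpoint(address):
    """True if the address is a loopback address (i.e. the same machine)."""
    return ipaddress.ip_address(address).is_loopback


# Illustrative addresses only; 192.0.2.10 comes from a documentation range.
for addr in ("127.0.0.1", "::1", "192.0.2.10"):
    kind = "local process" if is_local_endpoint(addr) else "external host"
    print(addr, "->", kind)
```

If every periodic connection turned out to be loopback, that would point toward another local process rather than a remote site.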

I should mention that none of Malwarebytes, McAfee, or Windows Defender finds any signs of infection.
Hm, normally I'd ask you to use crashfirefox.exe to try to kill the content process to see where it's stuck, but it sounds like you've already tried that, and it (for some reason or another) fails to kill the content process.

Does using crashfirefox64.exe allow you to crash the Not Responding content process?: https://ftp.mozilla.org/pub/utilities/crashfirefox-intentionally/crashfirefox64.exe
Mike --

crashfirefox didn't work on the (old) single-process version, one presumes, because the process is not able to service whatever signal it's sent.  It did work when I installed a later, multiple-process version, with multiprocessing enabled, and I attached that dump.  Right now, I have multiprocessing disabled because I have SQLite Manager installed, and it's not yet been revised for multiprocessing.  But I can disable it to run crashfirefox.  However, it may be that only the UI process is able to respond to crashfirefox after the content process gets in the unresponsive state.

Does the reference to crashfirefox64.exe mean you'd like me to install the 64-bit version?
(In reply to Pat Tressel from comment #53)

I see.

Could you please keep multi-process disabled, and use crashfirefox64.exe to attempt to crash the parent process when in this state?
(In reply to Mike Conley (:mconley) (:⚙️) - Backlogged on reviews from comment #54)

Mike --

Ok, I have 56.0 32-bit installed, and multiprocessing is off.

Some questions...

-- Does crashfirefox64 imply that it wants the 64-bit version of Firefox, or does it merely mean that's what's used with a 64-bit version of Windows?  I think I ran just plain crashfirefox.exe before...

-- Is Nightly (using the installer) built with local symbols?  If so, then I could look at it with Process Explorer, as Tim and I were doing earlier, find the thread(s) that still have small (apparently periodic) CPU and network activity, and get a better idea of where they are in the code.  Also, Task Manager has an "analyze wait chain" feature, intended for hung processes, that we could try.

-- I'll need a separate profile for Nightly -- that gives us an opportunity to see if the profile is part of the problem.  Two options:  Start with a copy of the current profile, which may have something to do with the not-quite-hang, or start with a fresh profile, and start adding load til there's a problem.  Is one of these a better option?  Starting with a fresh profile might give us a clue whether this is something to do with specific websites -- I could do a "binary search" over which tabs I have open.  :D  :D  :D
This minidump was captured using SysInternals Process Explorer.  It is Firefox v56.0 32-bit, and was "not responsive" for several minutes before the dump was captured.
(In reply to Mike Conley (:mconley) (:⚙️) - Backlogged on reviews from comment #54)

Mike --

Ok, v56.0 went unresponsive, and I tried running crashfirefox64, but it didn't stop the process.  Instead, I used Process Explorer to get a minidump, which I've attached.
Hi Pat, thanks for the minidump - that was very helpful.

I see the main thread stuck waiting for something inside mbae.dll, which (apparently) belongs to Malwarebytes.

So Malwarebytes, even though you disabled it (comment 7), still seems to be injecting itself into the Firefox process.

Do you have the most up-to-date version of Malwarebytes?
Hi, Mike!  Sorry about the delay (excuses: politics, hackathons, fixing a friend's laptop,...).

Oho!!!  So we have a potential culprit -- good sleuthing!  Malwarebytes reinstalled itself on 10/3 -- the current version is 3.2.2.2029.  That didn't end the hangs.  I have deinstalled it.  There has been at least one hang per day recently, so if there are no hangs for a few days, that "potential culprit" can be elevated to "culprit".  Note I have a paid version of Malwarebytes, which, one presumes, means I get to whine about it.

(I'm leaving "needinfo" turned on, as I'll need to report back on the outcome.)
> (I'm leaving "needinfo" turned on, as I'll need to report back on the
> outcome.)

Pat, any update on this?
Abe --

Sorry about the delay!  I was hoping to collect one more bit of information, but what I have now is pretty definitive:  With Malwarebytes deinstalled, there have not been any hangs.  I was intending to get the latest version and install that, to see if it's still happening.

But it seems close to certain that this is Malwarebytes' fault -- it's not returning from whatever it was trying to do.  So we can close this.

So, I should now reinstall Malwarebytes, and if the problem comes back, let Malwarebytes know about it.  Would you like me to post any information I get back from Malwarebytes?  Is it ok to post here, even after this is closed?
Flags: needinfo?(tressel)
Maybe we know some folks who work on Malwarebytes who can look into this. Hey astevenson, do we have anybody working on Malwarebytes we can point this report at?
Flags: needinfo?(astevenson)
Not currently but they are on my list. Will reach out to them and highlight this bug report.
Flags: needinfo?(astevenson)
Is this a consumer version of MWB or is it enterprise?
Flags: needinfo?(tressel)
Summary: Firefox on Windows 10 becomes unresponsive with near zero cpu usage → Firefox on Windows 10 becomes unresponsive with near zero cpu usage with Malwarebytes installed
Component: Untriaged → Other
Product: Firefox → External Software Affecting Firefox
Version: 53 Branch → unspecified
Whiteboard: [AV:Malwarebytes]
(In reply to Wayne Mery (:wsmwk) from comment #64)
> Is this a consumer version of MWB or is it enterprise?

Hi, Wayne!

This is the consumer version.  I still need to download and install a new version to try out, in case there has been a change.  I should be able to do that tomorrow.
Flags: needinfo?(tressel)
Flags: needinfo?(tressel)
Hello Pat, we are in contact with Malwarebytes. In order to debug this issue, they would need a full dump.

Under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps registry key, set DumpType to the value 2 (full dump).
You can find more info at https://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx.
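If it helps, the registry change can also be made from an elevated command prompt with `reg add`. The sketch below only builds and prints that command line rather than running it; the helper names `reg_add_dump_type` and `as_command_line` are made up for illustration, and the REG_DWORD type is taken from the Microsoft documentation linked above rather than from this comment.

```python
# Key path as given above; HKLM is the abbreviation for
# HKEY_LOCAL_MACHINE that reg.exe accepts.
LOCAL_DUMPS_KEY = (
    r"HKLM\SOFTWARE\Microsoft\Windows"
    r"\Windows Error Reporting\LocalDumps"
)


def reg_add_dump_type(dump_type=2):
    """Build the reg.exe argument list that sets DumpType (2 = full dump)."""
    if dump_type not in (0, 1, 2):
        raise ValueError("DumpType must be 0 (custom), 1 (mini) or 2 (full)")
    return [
        "reg", "add", LOCAL_DUMPS_KEY,
        "/v", "DumpType",
        "/t", "REG_DWORD",
        "/d", str(dump_type),
        "/f",  # overwrite an existing value without prompting
    ]


def as_command_line(args):
    """Quote arguments that contain spaces so the line can be pasted."""
    return " ".join(f'"{a}"' if " " in a else a for a in args)


print(as_command_line(reg_add_dump_type()))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the printed line would still need to be run as an administrator on the affected machine.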

We can't do much without additional information from the original reporter. Closing.

Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Flags: needinfo?(ptressel)