Closed Bug 570316 Opened 14 years ago Closed 3 years ago

sporadic hang in futex_wait_queue_me state on Linux

Categories

(Core :: General, defect)

Hardware: All
OS: Linux
Type: defect
Priority: Not set
Severity: critical

RESOLVED WORKSFORME

People

(Reporter: tonymec, Unassigned)

Details

(Keywords: hang)

Attachments

(1 file)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.3a5pre) Gecko/20100510 Lightning/1.1a1pre SeaMonkey/2.1a1pre - Build ID: 20100510003927

Sporadically (let's say no oftener than about once a day, unpredictably, and also when I'm sleeping in the other room) SeaMonkey gets hung, consuming no CPU time AFAICT, and I cannot wake it up. I can still shut it down with kill -15. When switching virtual desktops to give it focus, the titlebar of the window gets drawn, but not the chrome and not the content, however long I wait. During such hangs the ps program reports "futex_wait_queue_me" in the WCHAN column for the seamonkey-bin process.
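The same WCHAN information that ps shows can also be read per thread from procfs. The following is a minimal sketch, assuming a Linux system where each thread exposes its wait channel at /proc/<pid>/task/<tid>/wchan. The function name `thread_wait_channels` and the `proc_root` parameter are made up for illustration and are not part of any Mozilla tool.

```python
import os


def thread_wait_channels(pid, proc_root="/proc"):
    """Return {tid: wait_channel} for every thread of process `pid`.

    Assumes Linux procfs. `proc_root` is a hypothetical parameter that
    exists only so the function can be pointed at a test directory.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    channels = {}
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "wchan")) as fh:
                # The kernel reports "0" for a thread that is not sleeping.
                channels[int(tid)] = fh.read().strip() or "0"
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return channels
```

Calling `thread_wait_channels(pid)` with the PID of seamonkey-bin during a hang should list futex_wait_queue_me for at least one thread. Worker threads of a healthy process can legitimately sit in a futex wait too, so the per-thread view is only a starting point.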

Once I tried to start a second SeaMonkey without -no-remote and nothing happened. No "SeaMonkey is running but not responding" popup. The second instance would not even close on kill -15 (kill -9 was necessary); the first one still did.
Hey, sh**! Just happened shortly after I filed this bug: I was rereading comment #0 and scrolling stopped working. Sure enough, Sm was in futex_wait_queue_me again. Never had it so soon after startup. (I reported this bug shortly after restarting because of the bug.)
Happened again while typing a mail. This time I used kill -4 i.e. SIGILL to get «some sort» of a report: bp-d06674a4-fb70-4b0e-a9ab-e905a2100608 as shown below. Don't know if useful but I couldn't think of anything else.

Signature	linux-gate.so@0x424
UUID	d06674a4-fb70-4b0e-a9ab-e905a2100608
Time 	2010-06-08 15:27:17.77316
Uptime	29080
Last Crash	136035 seconds before submission
Product	SeaMonkey
Version	2.1a1pre
Build ID	20100510003927
Branch	1.9.3
OS	Linux
OS Version	0.0.0 Linux 2.6.31.12-0.2-desktop #1 SMP PREEMPT 2010-03-16 21:25:39 +0100 i686
CPU	x86
CPU Info	AuthenticAMD family 6 model 7 stepping 1
Crash Reason	SIGILL
Crash Address	0xca4
User Comments	futex_wait_queue_me (the SIGILL is an artefact to get "some kind" of a stack trace).

Crashing Thread
Frame 	Module 	Signature 	Source
0 	linux-gate.so 	linux-gate.so@0x424 	
1 	libmozjs.so 	js_SuspendThread 	js/src/jslock.cpp:1073
2 	libmozjs.so 	js_LockTitle 	js/src/jslock.cpp:1114
3 	libmozjs.so 	js_LockObj 	js/src/jslock.cpp:1337
4 	libnspr4.so 	PR_AtomicDecrement 	nsprpub/pr/src/misc/pratom.c:312
5 	libmozjs.so 	js_LookupPropertyWithFlags 	js/src/jsobj.cpp:4255
6 	libmozjs.so 	js_Enumerate 	js/src/jsobj.cpp:5383
7 	seamonkey-bin 	XPC_WN_JSOp_Enumerate 	js/src/xpconnect/src/xpcwrappednativejsops.cpp:55
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.3a6pre) Gecko/20100620 SeaMonkey/2.1a2pre
Build ID:   20100620004012

happened again some 12 hours ago or more; I was away from the computer and forgot to log it immediately.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:2.0b2pre) Gecko/20100629 Lightning/1.1a1pre SeaMonkey/2.1a3pre - Build ID: 20100629004158

Happened at startup, against high use of RAM for "cache" by the kernel. This is the first "rv:2.0b2pre" nightly, freshly installed; but same SeaMonkey version as previously. On restart from kill -15, twice as many tabs as normal. Restarted again (Ctrl+Q followed by relaunch; "Are you sure you want to close all these tabs" popup didn't appear, which is unusual; I use: at browser startup, show my [multitab] home page), started OK this time.
P.S. I'm on openSUSE Linux 11.2 and "uname -a" replies with:
Linux linux 2.6.31.12-0.2-desktop #1 SMP PREEMPT 2010-03-16 21:25:39 +0100 i686 athlon i386 GNU/Linux
P.P.S. This time SeaMonkey was using _high_ CPU, with at least one-third of it "system" time.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:2.0b2pre) Gecko/20100701 Lightning/1.1a1pre SeaMonkey/2.1a3pre - Build ID: 20100701014002

Happened again (low CPU and no way to wake it up short of a KILL); I had gone to bed after a reboot due to accidentally pulling the plug.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:2.0b2pre) Gecko/20100701 Lightning/1.1a1pre SeaMonkey/2.1a3pre - Build ID: 20100701101320

happened soon after startup
Mozilla/5.0 (X11; Linux i686; rv:2.0b8pre) Gecko/20101108 Firefox/4.0b8pre SeaMonkey/2.1b2pre - Build ID: 20101108012856

happened shortly after startup. High CPU but I forgot to check if it was SeaMonkey or cron-weekly (which happened to be running at the same time), or which proportion of both.
Mozilla/5.0 (X11; Linux i686; rv:2.0b8pre) Gecko/20101110 Firefox/4.0b8pre SeaMonkey/2.1b2pre - Build ID: 20101110024719

Happened shortly but not immediately after startup. All my home tabs had either been loaded or timed out, and I had had time to "individually reload" a couple of the timed-out ones.

System quite busy (SeaMonkey compile in progress, with "nice -n 15" but also "make -j2").
I have the impression that changing panes in the addons manager has a higher than usual risk of producing this bug. (But not every time, and the bug also happens in other circumstances.)
Build identifier: Mozilla/5.0 (X11; Linux i686; rv:2.0b9pre) Gecko/20101217 Firefox/4.0b9pre SeaMonkey/2.1b2pre

Now it's more than an impression: I've been trying several times to open the "Extensions" tab of about:addons (usually from "Available Updates", sometimes from "Get Add-ons") and every time I get this hang... except in Safe Mode.

I'll try to disable extensions selectively to see if I can circumvent the problem.
Hm. Disabling MR-Tech Toolkit 6.0.4 (which does not advertise support of anything later than Fx 3.7a5pre, Sm 2.1a1pre, Tb 3.2a1pre, Sunbird 1.0pre, Mobile 1.1a2pre) and enabling Nightly Tester Tools 3.1 instead (which advertises support of current trunk) "seems" to resolve the problem - but for a sporadic bug, let's not be too hasty.

I'll leave it this way for the time being, and if the problem does not reappear I'll resolve this bug INVALID.
Whiteboard: [CLOSEME 2011-01-01 INVA]
Happened with MTT disabled. I had clicked "Extensions" in the addons manager: the Extensions label got selected in the left pane but the right pane remained blank, etc.
Whiteboard: [CLOSEME 2011-01-01 INVA]
In reply to comment #14
After kill -15, restarting the Mailer alone went OK, but Tools => Add-on Manager produced the hang. Still OK in Safe Mode (kill again, start Mail, invoke EM from there). I wonder what it can be; I don't see any "obvious" culprit among my extensions... Let's try commenting away the userContent.css section for about:addons
Related to bug 227168?
oh, and I forgot, on 1st restart at comment #15 the hang happened with high CPU rather than the usual zero CPU use
On reload after comment #15 I could add the two comments above. Add-ons manager was on "Recent updates". Clicking "Extensions" in the left pane triggered the hang. One strange thing in the display then: a grey vertical bar at right about where a scrollbar would be; "cogwheel" popdown button a little bit too far to the right; search input box clipped (more or less in the middle) by the above-mentioned grey bar.
...and at the hang described in comment #18 there was still one plugin-container process running at about 3% CPU (seamonkey-bin process at 0% AFAICT).
OK, I don't know what to try next. Here's my about:support dump. If something there looks obvious to you, please advise: I am stumped.
Related to Firefox bug 555912?
Happened during shutdown (after Ctrl+Q) and with high CPU% (a lot of which was "system" time) in 2011-01-21 nightly of SeaMonkey 2.1b2pre.
Mozilla/5.0 (X11; Linux x86_64; rv:2.0b13pre) Gecko/20110309 Firefox/4.0b13pre SeaMonkey/2.1b3pre ID:20110309003002

After upgrading to x86_64, I still see this. One particular action which seems to trigger it (almost) always consists of selecting the "Extensions" tab in the addons manager.
Hardware: x86 → All
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
REOPENED: bug 637286 is FIXED, yet I still had a futex_wait_queue_me hang at closedown in:
Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110417 Firefox/6.0a1 SeaMonkey/2.2a1pre 
ID:20110417003006
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Status: REOPENED → NEW
I was experiencing almost constant futex_wait_queue_me hangs with SeaMonkey 2.0.14 yesterday, beginning around 6 p.m. local time when I turned the fully patched Ubuntu 10.04.01 LTS computer on after it had been turned off for several days.  Sometimes the hangs would happen before the application window could even be painted the first time and I'd have to kill it with the System Monitor.  Other hangs would happen within a few seconds of startup, or a few minutes at most, even while viewing Web pages that had never caused problems before.  I tried starting SeaMonkey in safe mode ("seamonkey -safe-mode") from a terminal window to see if one of the add-ons might be responsible, to no avail.  In frustration, I switched to Firefox 3.6.17 (Mozilla Firefox for Ubuntu canonical 1.0) and it, too, was now behaving this way.  Finally, I started Opera 11.5, and was at least able to use it for a few hours to begin searching for solutions or hints. A couple of times when SeaMonkey stopped with futex_wait_queue_me in the Waiting Channel and I killed it, it hosed the system so badly that a cold restart was needed.  Strangely, after midnight the SeaMonkey browser stopped hanging and as I write this it has been running error-free for nearly 10 hours.

This behavior is telling me a few things:
1. There may be a memory leak in SeaMonkey and Firefox, at least in the Linux versions of these products.  If the program isn't writing a value out-of-bounds, it may at least be reading an out-of-bounds value that is random and unpredictable.
2. The problem that causes the futex_wait_queue_me hang is common to both products since SeaMonkey was revamped at release 2.0 to use the Firefox Gecko engine.
3. The problem may be due to a variable involved in calculating time-of-day not being initialized correctly or not being initialized at all.  (How about one part of the program using local time while another is using UTC, causing a nonsense result in time difference calculations, and possibly a negative or huge time value being used in a wait loop?)
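Hypothesis 3 can be made concrete with a small sketch. It is purely illustrative: nothing in this bug shows that Gecko or NSPR computes timeouts this way, and the function names `buggy_timeout` and `as_uint32` are invented for the example. It shows how subtracting a local-time "now" from a UTC deadline gives a negative wait, and how that negative value turns into a huge one if it is stored in an unsigned 32-bit field.

```python
def buggy_timeout(deadline_utc, now_utc, local_offset_seconds):
    """Deliberately wrong: mixes a UTC deadline with a local-time 'now'."""
    now_local = now_utc + local_offset_seconds  # one code path uses local time
    return deadline_utc - now_local             # the other assumes UTC


def as_uint32(seconds):
    """What a (possibly negative) value becomes in an unsigned 32-bit field."""
    return seconds & 0xFFFFFFFF


# A 5-second wait requested in a UTC+2 time zone.
now_utc = 1_000_000
wait = buggy_timeout(now_utc + 5, now_utc, 2 * 3600)
print(wait)             # -7195: the deadline appears to be in the past
print(as_uint32(wait))  # 4294960101: an effectively endless wait
```

A correct version would take both timestamps from the same clock, preferably a monotonic one, so that time-zone offsets never enter the calculation.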

I don't run SeaMonkey 2.x or Firefox 3.x on any Microsoft Windows system regularly, so I don't know if there is a corresponding hang phenomenon on those operating systems. If Windows doesn't use a futex scheme for managing threads, the problem may not appear there at all.

(I tried SeaMonkey 2.1 for Linux about two weeks ago and had to remove it from the system and revert to 2.0.14, because the newer version was crashing constantly, so I can't say if the newer version still has the futex_wait_queue_me problem.)
I thought this hang was (for me) a thing of the past, but it just happened a few minutes ago in SeaMonkey 2.8a1 linux-x86_64 nightly dated 2011-12-17, halfway during closedown (all windows already closed but about half the memory still in use).
I've had it in SeaMonkey linux-x86_64 trunk nightlies (2.13a1 and maybe 2.12a1 before the latest aurora merge) a couple of times this week, during closedown as in comment #27. Still no STR and still a hang thus no crash backtrace.
Confirming this in linux x86  32 bit, ubuntu 12.04, firefox 14.1.

Happens to me a few times a week.


Computer is a core2 mac mini - Dual core w hyperthreading.
(In reply to Stuart Axon from comment #29)
> Confirming this in linux x86  32 bit, ubuntu 12.04, firefox 14.1.
> 
> Happens to me a few times a week.
> 
> 
> Computer is a core2 mac mini - Dual core w hyperthreading.

This is oftener than I can claim nowadays — but — you mean you're running Ubuntu Linux on a Mac computer? ATM I'm running SeaMonkey 2.15a1 (64 bit) on openSUSE 12.1 on a twin-core Pentium, uh, 4HT I think (the Intel logo comes only at bootup).

Since you're seeing on Firefox the problem I have on SeaMonkey, I guess it's a "Core" problem, but for the life of me I couldn't say where in Core. Let's hope the Core developers will have some idea of where this bug belongs.
Product: SeaMonkey → Core
P.S. It has happened -rarely- that I've seen the futex wait clear itself spontaneously (but if it clears itself spontaneously I would usually not think of looking at ps at the "right" time). But I've also seen it remain "forever" (well, let's say I notice it, I play a game which takes between 25 minutes and half an hour, then I come back and it's still there, so I kill SeaMonkey).

Selecting "Extensions" in about:addons (comment #23) doesn't trigger it anymore for me.
Just happened to me again ... (Firefox 16.0.1) 

I also found an instance of somebody with the same symptom in Thunderbird
http://us.generation-nt.com/answer/when-does-linux-get-rid-damned-futex-call-help-206238872.html

So it may well be in core.


I'm mostly in Chrome at work now because of this and Firebug performance.

However, it would be nice to help debug this. I can run Firefox under strace and try to collect information, if anybody can tell me what to look for.
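One possible way to collect information from a process that is already hung, sketched in Python under the assumption that gdb and strace are installed. The helper names `gdb_backtrace_command` and `strace_attach_command` are hypothetical. The first builds a command that dumps a user-space backtrace of every thread, the second a command that attaches strace to all threads of the process and logs timestamped system calls to a file.

```python
import shlex


def gdb_backtrace_command(pid):
    """Command line that prints a backtrace of every thread, then detaches."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def strace_attach_command(pid, logfile):
    """Command line that attaches to all threads of `pid` and logs syscalls."""
    return ["strace", "-f", "-tt", "-p", str(pid), "-o", logfile]


if __name__ == "__main__":
    # 12345 is a placeholder PID; substitute the PID of the hung process.
    print(shlex.join(gdb_backtrace_command(12345)))
    print(shlex.join(strace_attach_command(12345, "firefox-hang.strace")))
```

Either list can be passed to `subprocess.run()`. Attaching to a running process may require root or relaxed ptrace restrictions, depending on the distribution. Without debug symbols the gdb output will mostly show raw addresses, but the thread that holds the lock everybody else is waiting for may still be identifiable.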
Tony, do you still hang in this way?
Flags: needinfo?(antoine.mechelynck)
The motherboard in my AMD processor based machine that was experiencing the numerous futex_wait_queue hangs failed in August 2012 and I've been running on an Intel-based IBM ThinkCentre since then.  I haven't seen any futex_wait_queue hangs since then.  Shortly before the previous machine died, the frequency of hangs increased greatly, and then began affecting other applications besides SeaMonkey.  It got to the point that the machine wouldn't even boot reliably, so to preserve my data I removed the hard drive and temporarily installed it in the ThinkCentre machine to perform a drive-to-drive transfer of files.  A few files apparently got corrupted before this, but I rescued over 99.9%.

The motherboard in the old AMD-based machine had been manufactured around 2004, using the cheapest, off-brand electrolytic capacitors.  My assumption is that many of the capacitors had dried out and were no longer supplying enough peak current to the RAM modules required by intensive memory access.  Initially, the hangs only happened when SeaMonkey was running, because its memory usage was most demanding of the various programs I was running, but later other programs were producing the same symptom.

For SeaMonkey users who continue to see futex_wait_queue hangs, especially if the frequency of hangs appears to be gradually increasing, I would suggest considering it an early symptom of motherboard failure and plan to replace the affected machine as soon as practical, or send the motherboard to a repair service to have the electrolytic capacitors replaced.
(In reply to Wayne Mery (:wsmwk) from comment #33)
> Tony, do you still hang in this way?

Not recently, but at the mo' I'm on bad hardware: the processor can get locked for no apparent reason (often when scrolling in SeaMonkey or Firefox, but a few times it has even happened before the boot loader had gone up), or reboot spontaneously, especially when the air is cold. I just checked the thermometer in this room, and it is now at 25°C (77°F) which is rather warm for people but the 'puter likes it.

As can be seen in the past comments, this hang has been very sporadic at times, disappearing for long periods just to reappear afterwards. No one seems to have experienced it in the past year or so though, or maybe they didn't bother to mention it, or didn't want to spam the bug with a "me too".

I wonder what would be the right thing to do with this bug. Degrading to UNCONFIRMED seems inadequate since it has (in its time) been seen by several people on several programs. I hesitate between WORKSFORME and INCOMPLETE. INCOMPLETE might (?) be better since its ordinary connotation is "feel free to reopen if you can provide more info". Probably not FIXED unless someone can put his finger on the exact changeset that fixed it.
Flags: needinfo?(antoine.mechelynck)
P.S. I regularly observe futex_wait_queue_me states at startup in the process oosplash (part of LibreOffice) but it is not a bug there, since later on it changes to pipe_wait, and AFAICT LibreOffice always reacts normally. Not Mozilla-related, but something to keep in mind before becoming alarmed on account of comment #34.
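The distinction drawn here, a futex wait that later changes versus one that never does, can be sketched as a check over repeated WCHAN samples. This is a rough illustration and not a reliable hang detector. The function name `looks_hung` and the default of five consecutive samples are assumptions chosen for the example.

```python
def looks_hung(samples, channel="futex_wait_queue_me", min_consecutive=5):
    """True if the most recent `min_consecutive` samples all equal `channel`.

    `samples` is a list of wait-channel names taken at regular intervals,
    oldest first (for example one `ps` reading per minute).
    """
    if len(samples) < min_consecutive:
        return False
    return all(s == channel for s in samples[-min_consecutive:])


# A process that starts in a futex wait and then moves on, as oosplash does:
transient = ["futex_wait_queue_me"] * 3 + ["pipe_wait"] * 5
# A process that never leaves the futex wait:
stuck = ["futex_wait_queue_me"] * 8

print(looks_hung(transient))  # False
print(looks_hung(stuck))      # True
```

A persistent futex wait is only suspicious in combination with the other symptoms reported in this bug (window not repainting, no response to input), since a thread may also wait on a futex for a long time for perfectly good reasons.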
After upgrading my PC to an Intel system, I also have the problem that Firefox and Thunderbird occasionally hang. Here is the output of /proc/`pidof firefox`/stack:

[<ffffffff810ab47a>] futex_wait_queue_me+0xca/0x130
[<ffffffff810ac215>] futex_wait+0x165/0x250
[<ffffffff810ad96d>] do_futex+0xdd/0xa70
[<ffffffff810ae36c>] SyS_futex+0x6c/0x150
[<ffffffff815a4412>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

The stack trace is always the same whenever the hang occurs, and it is identical for Firefox and Thunderbird.
If this happens I have to kill Firefox/Thunderbird. The program doesn't come back even if I wait for an hour. I never had any hang like this in the past years using my old AMD system. Maybe it is an Intel specific problem?

My configuration:
Intel Celeron G1840
Gentoo Linux, Kernel 3.16.5-gentoo
Firefox ESR 31.2.0/Thunderbird 31.2.0 using the binary packages provided by Mozilla
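For comparing kernel stacks like the one quoted above across several hangs, the text of /proc/<pid>/stack can be reduced to a list of function names. A minimal sketch, assuming the `[<address>] symbol+0xoffset/0xsize` line format shown in this comment; the name `parse_kernel_stack` is hypothetical.

```python
import re

# Matches lines such as "[<ffffffff810ab47a>] futex_wait_queue_me+0xca/0x130".
_FRAME = re.compile(
    r"^\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>[^\s+]+)"
    r"(?:\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+))?$"
)


def parse_kernel_stack(text):
    """Return the function names from /proc/<pid>/stack text, innermost first."""
    frames = []
    for line in text.splitlines():
        match = _FRAME.match(line.strip())
        if not match:
            continue
        symbol = match.group("sym")
        if symbol.startswith("0x"):
            # Terminator line "[<ffffffffffffffff>] 0xffffffffffffffff".
            continue
        frames.append(symbol)
    return frames
```

Reading /proc/<pid>/stack usually requires root. Note that this file shows only the main thread of the process; the other threads have their own files under /proc/<pid>/task/<tid>/stack.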
See the possible cause here:
https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0

I ran into the same problem on Centos6. Upgrading that system to kernel-2.6.32-504.16.2 solved the issue.
See the Centos bug report here:
https://bugs.centos.org/view.php?id=8703
(In reply to johnds@cbs.dtu.dk from comment #38)
> See the possible cause here:
> https://github.com/torvalds/linux/commit/
> 76835b0ebf8a7fe85beb03c75121419a7dec52f0

Thanks. I have no longer seen any hang since I upgraded to kernel 3.19.0-gentoo about 3 months ago, so I think you are right and everybody using kernel >=3.18 (or an older version with the backported patch) should be safe.
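The ">= 3.18" rule of thumb from this comment can be checked mechanically. A minimal sketch, assuming kernel release strings of the form reported by `uname -r`. The names `kernel_version_tuple` and `has_upstream_futex_fix` are made up, and the 3.18 threshold is taken from the comment above, not verified independently.

```python
import platform
import re


def kernel_version_tuple(release):
    """Parse '3.16.5-gentoo' into (3, 16, 5); a missing patch level counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def has_upstream_futex_fix(release, fixed_in=(3, 18, 0)):
    """True if `release` is at least `fixed_in` (assumed threshold: 3.18).

    A False result is not conclusive: distribution kernels may carry the
    patch as a backport, which the version number alone cannot reveal.
    """
    return kernel_version_tuple(release) >= fixed_in


if __name__ == "__main__":
    print(platform.release(), has_upstream_futex_fix(platform.release()))
```

With the releases mentioned in this bug, `3.16.5-gentoo` gives False and `3.19.0-gentoo` gives True. The CentOS kernel 2.6.32-504.16.2 also gives False although it reportedly contains the backported fix, which is exactly the limitation noted in the docstring.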
I am having the same problem with icedove 31.7.0-1 (Thunderbird) on Debian stretch (testing).
The problem appears with kernels 3.16.0 from Debian stable and 4.0.0 from testing.
On Debian jessie (stable) all was fine, even with the same version of icedove installed from backports.
On my system, the bug seemed to have disappeared, but it has resurfaced this past week on shutdown. I think it is correlated with a very high number of browser tabs (several hundreds). When it happens, the CPU% falls down to an extremely low level.
(In reply to Tony Mechelynck [:tonymec] from comment #41)
> On my system, the bug seemed to have disappeared, but it has resurfaced this
> past week on shutdown. I think it is correlated with a very high number of
> browser tabs (several hundreds). When it happens, the CPU% falls down to an
> extremely low level.

P.S. This is in SeaMonkey 2.43a1 (Gecko 46.0a1).
...and my Linux kernel version is 4.1.13 (from openSUSE Leap 42.1, and "uname -a" answers "Linux linux-2iyu 4.1.13-5-default #1 SMP PREEMPT Thu Nov 26 16:35:17 UTC 2015 (49475c3) x86_64 x86_64 x86_64 GNU/Linux").

Hey Tony,
Do you still experience this hang?

Flags: needinfo?(antoine.mechelynck)

(In reply to Andrei Purice from comment #44)
> Hey Tony,
> Do you still experience this hang?

Not that I know of, but a little less than a year ago I got a new computer with a total of 12 virtual CPUs (or 6 double ones IIUC) "Intel(R) Core(TM) i5-10500T CPU @ 2.30GHz" clocked by the kernel at something like 3.7 GHz. SeaMonkey sporadically hogs 100% (or almost) of one of the virtual CPUs at a time but I haven't checked recently what ps says of it. When it happens, I reload the browser.

Today I'm using SeaMonkey 2.53.9b1pre 20210627210003 and Linux openSUSE kernel 5.3.18-lp152.78-default but that could change any day depending on new builds being published.

Flags: needinfo?(antoine.mechelynck)

Marking this as Resolved > Worksforme based on the reporter's last comment.
If anyone can still reproduce the issue feel free to re-open the bug or file a new one.

Status: NEW → RESOLVED
Closed: 13 years ago → 3 years ago
Resolution: --- → WORKSFORME