Closed Bug 835106 Opened 11 years ago Closed 11 years ago

Frequent linux32 shutdown hang in mochitest-chrome with mozilla::dom::workers::RuntimeService::Cleanup() on the stack [@ linux-gate.so + 0x424]

Categories: Core :: DOM: Workers, defect
Platform: x86, Linux
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED WORKSFORME
Reporter: philor
Assignee: Unassigned
Keywords: intermittent-failure

Regression range is
https://hg.mozilla.org/projects/profiling/pushloghtml?fromchange=4a6adedaaa31&tochange=bf5b5084cdcd
which is
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2876e73c9b6f&tochange=35e0c12f4332
which is
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=bedb55033fe7&tochange=35e0c12f4332
plus bug 780561.

https://tbpl.mozilla.org/php/getParsedLog.php?id=19071922&tree=Profiling
Rev3 Fedora 12 profiling opt test mochitest-other on 2013-01-23 16:48:45 PST for push bf5b5084cdcd
slave: talos-r3-fed-020

41840 INFO TEST-END | chrome://mochitests/content/chrome/xpfe/appshell/src/test/test_hiddenPrivateWindow.xul | finished in 96ms
41841 INFO TEST-START | Shutdown
41842 INFO Passed: 39333
41843 INFO Failed: 0
41844 INFO Todo:   135
41845 INFO SimpleTest FINISHED
41846 INFO TEST-INFO | Ran 0 Loops
41847 INFO SimpleTest FINISHED
NOTE: child process received `Goodbye', closing down
TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
args: ['/home/cltbld/talos-slave/test/build/bin/screentopng']
(screenshot of no Firefox window, only the terminal window)
INFO | automation.py | Application ran for: 0:17:14.036192
INFO | automation.py | Reading PID log: /tmp/tmpjt9Ei0pidlog
==> process 2208 launched child process 2250
==> process 2208 launched child process 2263
==> process 2208 launched child process 2273
==> process 2208 launched child process 2276
==> process 2208 launched child process 2279
==> process 2208 launched child process 2283
INFO | automation.py | Checking for orphan process with PID: 2250
INFO | automation.py | Checking for orphan process with PID: 2263
INFO | automation.py | Checking for orphan process with PID: 2273
INFO | automation.py | Checking for orphan process with PID: 2276
INFO | automation.py | Checking for orphan process with PID: 2279
INFO | automation.py | Checking for orphan process with PID: 2283
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/profiling-linux/1358964072/firefox-21.0a1.en-US.linux-i686.crashreporter-symbols.zip
PROCESS-CRASH | Shutdown | application crashed [@ linux-gate.so + 0x424]
Crash dump filename: /tmp/tmpKTnl9b/minidumps/40d47c33-6ace-ae93-44808811-0e750d18.dmp
Operating system: Linux
                  0.0.0 Linux 2.6.31.5-127.fc12.i686.PAE #1 SMP Sat Nov 7 21:25:57 EST 2009 i686
CPU: x86
     GenuineInte family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGABRT
Crash address: 0x88c

Thread 0 (crashed)
 0  linux-gate.so + 0x424
    eip = 0x00eb8424   esp = 0xbfb5cac0   ebp = 0xbfb5cb28   ebx = 0xb76f0a08
    esi = 0x00000000   edi = 0x0000009b   eax = 0xfffffffc   ecx = 0x00000080
    edx = 0x0000009b   efl = 0x00200282
    Found by: given as instruction pointer in context
 1  libnspr4.so!PR_Wait [ptsynch.c : 582 + 0x11]
    eip = 0x00fc8107   esp = 0xbfb5cb30   ebp = 0xbfb5cb68
    Found by: previous frame's frame pointer
 2  libxul.so!mozilla::ReentrantMonitor::Wait(unsigned int) [ReentrantMonitor.h : 89 + 0x6]
    eip = 0x0140e056   esp = 0xbfb5cb70   ebp = 0xbfb5ce94   ebx = 0x02d89820
    esi = 0xb720a06c   edi = 0xbfb5cc04
    Found by: call frame info
 3  libxul.so!nsEventQueue::GetEvent(bool, nsIRunnable**) [ReentrantMonitor.h : 192 + 0x9]
    eip = 0x02010609   esp = 0xbfb5cb90   ebp = 0xbfb5ce94   ebx = 0x02d89820
    esi = 0xb720a06c   edi = 0xbfb5cc04
    Found by: call frame info
 4  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp : 619 + 0xe]
    eip = 0x02011b1f   esp = 0xbfb5cbc0   ebp = 0xbfb5ce94   ebx = 0x02d89820
    esi = 0xb720a040   edi = 0x00000000
    Found by: call frame info
 5  libxul.so!NS_ProcessNextEvent_P(nsIThread*, bool) [nsThreadUtils.cpp : 238 + 0x12]
    eip = 0x01fe016c   esp = 0xbfb5cc30   ebp = 0xbfb5ce94   ebx = 0x02d89820
    esi = 0x00000001   edi = 0xaf146064
    Found by: call frame info
 6  libxul.so!mozilla::dom::workers::RuntimeService::Cleanup() [RuntimeService.cpp : 1192 + 0xc]
    eip = 0x019102a2   esp = 0xbfb5cc60   ebp = 0xbfb5ce94   ebx = 0x02d89820
    esi = 0xaf146060   edi = 0xaf146064
    Found by: call frame info
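
For triage, it can help to check mechanically whether a given function is on the crashed thread's stack. What follows is a minimal Python sketch, not part of the Mozilla automation: the names `FRAME_RE`, `parse_frames` and `has_function_on_stack` are hypothetical, and the regular expression only assumes the frame layout visible in the dump above (frame index, `module!function [file : line + offset]`, or `module + offset` for frames without symbols).

```python
import re

# Frame header lines look like either of these (taken from the dump above):
#   " 0  linux-gate.so + 0x424"
#   " 6  libxul.so!mozilla::dom::workers::RuntimeService::Cleanup() [RuntimeService.cpp : 1192 + 0xc]"
# Register lines ("eip = ...") and "Found by:" lines do not match and are skipped.
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<module>[^\s!]+)"
    r"(?:!(?P<function>.+?)\s+\[(?P<file>[^\]:]+?)\s*:\s*(?P<line>\d+)"
    r"\s*\+\s*0x[0-9a-fA-F]+\]"
    r"|\s*\+\s*(?P<offset>0x[0-9a-fA-F]+))\s*$"
)


def parse_frames(stack_text):
    """Return one dict per frame header line found in stack_text."""
    frames = []
    for raw in stack_text.splitlines():
        m = FRAME_RE.match(raw)
        if m is None:
            continue
        line = m.group("line")
        frames.append({
            "index": int(m.group("index")),
            "module": m.group("module"),
            "function": m.group("function"),  # None for symbol-less frames
            "file": m.group("file"),
            "line": int(line) if line is not None else None,
            "offset": m.group("offset"),      # set only for symbol-less frames
        })
    return frames


def has_function_on_stack(frames, needle):
    """True if any symbolicated frame's function name contains needle."""
    return any(f["function"] and needle in f["function"] for f in frames)


# Three frames copied from the dump above, including the lines to be skipped.
SAMPLE = """\
 0  linux-gate.so + 0x424
    eip = 0x00eb8424   esp = 0xbfb5cac0   ebp = 0xbfb5cb28   ebx = 0xb76f0a08
    Found by: given as instruction pointer in context
 1  libnspr4.so!PR_Wait [ptsynch.c : 582 + 0x11]
    eip = 0x00fc8107   esp = 0xbfb5cb30   ebp = 0xbfb5cb68
    Found by: previous frame's frame pointer
 6  libxul.so!mozilla::dom::workers::RuntimeService::Cleanup() [RuntimeService.cpp : 1192 + 0xc]
    eip = 0x019102a2   esp = 0xbfb5cc60   ebp = 0xbfb5ce94   ebx = 0x02d89820
    Found by: call frame info
"""

frames = parse_frames(SAMPLE)
on_stack = has_function_on_stack(frames, "RuntimeService::Cleanup")
```

Matching on the function name rather than on the top frame is deliberate: the top frame here is only `linux-gate.so + 0x424`, which says nothing about where the main thread was waiting.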

In https://tbpl.mozilla.org/?tree=Profiling&rev=5eb8c01e6277 I got 4 of 5 failures in non-PGO, and 0 of 5 in PGO, which is nice since it means we're not likely to see it on the aurora tree in three weeks.
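
As a rough sanity check on the non-PGO versus PGO comparison, the sketch below computes how likely 0 of 5 failures would be if PGO builds failed at the same rate observed in non-PGO. It assumes independent runs and treats 4 of 5 as the true failure rate, which five runs can only estimate loosely; the function name `prob_all_green` is hypothetical.

```python
def prob_all_green(failure_rate, runs):
    """Probability that `runs` independent runs all pass,
    when each run fails with probability `failure_rate`."""
    return (1.0 - failure_rate) ** runs


# Failure rate observed on non-PGO: 4 failures in 5 runs.
observed_rate = 4 / 5

# Chance of seeing 0 failures in 5 PGO runs if PGO shared that rate.
p = prob_all_green(observed_rate, 5)
print(f"{p:.5f}")  # prints 0.00032
```

Under those assumptions, five straight greens on PGO would be very unlikely if PGO were affected as often as non-PGO. That is consistent with PGO hitting the hang much less often, though it does not show that PGO is unaffected.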
Ed, can you please see if you can bisect this?
This might have become permaorange. I got curious and triggered 5 runs without a green, but we're so perpetually out of linux32 slaves that it can take days to get a significant number of runs.
Crash Signature: [@ linux-gate.so + 0x424]
Summary: Frequent linux32 non-pgo --disable-profiling shutdown hang in mochitest-chrome with mozilla::dom::workers::RuntimeService::Cleanup() on the stack → Frequent linux32 non-pgo --disable-profiling shutdown hang in mochitest-chrome with mozilla::dom::workers::RuntimeService::Cleanup() on the stack [@ linux-gate.so + 0x424]
The only places we do non-PGO --disable-profiling are Profiling, Profiling and Profiling, so if you think you see this somewhere else, you don't.

However, as long as we're making it easy to tbplbot-star near permaorange...
Summary: Frequent linux32 non-pgo --disable-profiling shutdown hang in mochitest-chrome with mozilla::dom::workers::RuntimeService::Cleanup() on the stack [@ linux-gate.so + 0x424] → PROFILING BRANCH ONLY: Frequent linux32 non-pgo --disable-profiling shutdown hang in mochitest-chrome with mozilla::dom::workers::RuntimeService::Cleanup() on the stack [@ linux-gate.so + 0x424]
I've debugged this, and it has nothing to do with the profiling branch or configure flags (unfortunately). This problem exists everywhere; we're just magically hitting it more on these profiling builds.
Summary: PROFILING BRANCH ONLY: Frequent linux32 non-pgo --disable-profiling shutdown hang in mochitest-chrome with mozilla::dom::workers::RuntimeService::Cleanup() on the stack [@ linux-gate.so + 0x424] → Frequent linux32 shutdown hang in mochitest-chrome with mozilla::dom::workers::RuntimeService::Cleanup() on the stack [@ linux-gate.so + 0x424]
This should be fixed now.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
The intermittent timeout also seems to appear on other platforms.
Hmm?  We haven't seen this in 3 months ...
I keep hitting them in my testsuite. Of course, it might be my fault: https://tbpl.mozilla.org/?tree=Try&rev=4ec12185c3a8