Closed Bug 889869 Opened 11 years ago Closed 10 years ago

Intermittent LOOK AT THE STACK ipc | application timed out after 330 seconds with no output | application crashed [@ linux-gate.so + 0x424] | libxcb.so.1.1.0 + 0x811f

Categories

(Core :: Graphics: Layers, defect, P5)

x86
Linux
defect

Tracking

RESOLVED DUPLICATE of bug 975216

People

(Reporter: RyanVM, Unassigned)

Details

(Keywords: crash, intermittent-failure)

Attachments

(1 file)

Attached image screenshot
https://tbpl.mozilla.org/php/getParsedLog.php?id=24886335&tree=Mozilla-Central

Ubuntu VM 12.04 mozilla-central opt test crashtest-ipc on 2013-07-03 07:38:18 PDT for push 2cae857c17cb
slave: tst-linux32-ec2-019

07:48:43     INFO -  REFTEST TEST-START | file:///builds/slave/test/build/tests/reftest/tests/layout/style/crashtests/645951-1.html
07:48:43     INFO -  REFTEST TEST-LOAD | file:///builds/slave/test/build/tests/reftest/tests/layout/style/crashtests/645951-1.html | 1815 / 2487 (72%)
07:54:13  WARNING -  TEST-UNEXPECTED-FAIL | file:///builds/slave/test/build/tests/reftest/tests/layout/style/crashtests/645951-1.html | application timed out after 330 seconds with no output
07:54:13     INFO -  args: ['/builds/slave/test/build/tests/bin/screentopng']
07:54:13     INFO -  Xlib:  extension "RANDR" missing on display ":0".
07:54:22     INFO -  SCREENSHOT: <see attached>
07:54:22     INFO -  INFO | automation.py | Application ran for: 0:11:23.070918
07:54:22     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpFtLN8wpidlog
07:54:22     INFO -  ==> process 2286 launched child process 2317
07:54:22     INFO -  ==> process 2317 launched child process 2348
07:54:22     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 2317
07:54:22     INFO -  INFO | zombiecheck | Checking for orphan process with PID: 2348
07:54:22     INFO -  mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1372857248/firefox-25.0a1.en-US.linux-i686.crashreporter-symbols.zip
07:54:22     INFO -  Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1372857248/firefox-25.0a1.en-US.linux-i686.crashreporter-symbols.zip
07:55:03  WARNING -  PROCESS-CRASH | file:///builds/slave/test/build/tests/reftest/tests/layout/style/crashtests/645951-1.html | application crashed [@ linux-gate.so + 0x424]
07:55:03     INFO -  Crash dump filename: /tmp/tmp2mR9tf/minidumps/2d56bc86-7869-1260-7f6b698b-7d170286.dmp
07:55:03     INFO -  Operating system: Linux
07:55:03     INFO -                    0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686
07:55:03     INFO -  CPU: x86
07:55:03     INFO -       GenuineIntel family 6 model 45 stepping 7
07:55:03     INFO -       1 CPU
07:55:03     INFO -  Crash reason:  SIGABRT
07:55:03     INFO -  Crash address: 0x8eb
07:55:03     INFO -  Thread 0 (crashed)
07:55:03     INFO -   0  linux-gate.so + 0x424
07:55:03     INFO -      eip = 0xb7702424   esp = 0xbfd21a50   ebp = 0x00000000   ebx = 0xbfd21aa8
07:55:03     INFO -      esi = 0x00000000   edi = 0xb758fff4   eax = 0xfffffffc   ecx = 0x00000001
07:55:03     INFO -      edx = 0xffffffff   efl = 0x00200282
07:55:03     INFO -      Found by: given as instruction pointer in context
07:55:03     INFO -   1  libc-2.15.so + 0xdc37f
07:55:03     INFO -      eip = 0xb74cb380   esp = 0xbfd21a60   ebp = 0x00000000
07:55:03     INFO -      Found by: stack scanning
07:55:03     INFO -   2  libxcb.so.1.1.0 + 0x1fff3
07:55:03     INFO -      eip = 0xb3773ff4   esp = 0xbfd21a74   ebp = 0x00000000
07:55:03     INFO -      Found by: stack scanning
07:55:03     INFO -   3  libxcb.so.1.1.0 + 0x811f
07:55:03     INFO -      eip = 0xb375c120   esp = 0xbfd21a80   ebp = 0x00000000
07:55:03     INFO -      Found by: stack scanning
07:55:03     INFO -   4  libxcb.so.1.1.0 + 0x9932
07:55:03     INFO -      eip = 0xb375d933   esp = 0xbfd21a90   ebp = 0x00000000
07:55:03     INFO -      Found by: stack scanning
07:55:03     INFO -   5  libxcb.so.1.1.0 + 0x1fff3
07:55:03     INFO -      eip = 0xb3773ff4   esp = 0xbfd21ac0   ebp = 0x00000000
07:55:03     INFO -      Found by: stack scanning
07:55:03     INFO -   6  libxcb.so.1.1.0 + 0x9a4f
07:55:03     INFO -      eip = 0xb375da50   esp = 0xbfd21ad0   ebp = 0x00000000
07:55:03     INFO -      Found by: stack scanning
07:55:03     INFO -   7  libxcb.so.1.1.0 + 0x1fff3
07:55:03     INFO -      eip = 0xb3773ff4   esp = 0xbfd21ae4   ebp = 0x00000000
07:55:03     INFO -      Found by: stack scanning
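The module offsets above can be cross-checked against the raw eip values before feeding them to addr2line. Below is a minimal Python sketch. It assumes that the stackwalker reports caller frames (the ones "Found by: stack scanning") at eip - 1, so that the offset lands inside the call instruction, and that frame 0 ("given as instruction pointer in context") uses eip unchanged. The Frame type and helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Frame(NamedTuple):
    module: str
    offset: int         # the "+ 0x..." value printed by the stackwalker
    eip: int            # the eip printed on the line below it
    from_context: bool  # True only for frame 0 ("given as instruction pointer in context")


def module_base(frame: Frame) -> int:
    """Load address implied by one frame.

    Assumption: for caller frames the printed offset is computed from eip - 1,
    i.e. it points into the call instruction rather than at the return address.
    """
    adjust = 0 if frame.from_context else 1
    return frame.eip - adjust - frame.offset


def bases_by_module(frames: List[Frame]) -> Dict[str, Set[int]]:
    """Collect every load address implied for each module."""
    bases: Dict[str, Set[int]] = defaultdict(set)
    for f in frames:
        bases[f.module].add(module_base(f))
    return dict(bases)


# Frames 0-7 copied from the crash log above.
FRAMES = [
    Frame("linux-gate.so", 0x424, 0xB7702424, True),
    Frame("libc-2.15.so", 0xDC37F, 0xB74CB380, False),
    Frame("libxcb.so.1.1.0", 0x1FFF3, 0xB3773FF4, False),
    Frame("libxcb.so.1.1.0", 0x811F, 0xB375C120, False),
    Frame("libxcb.so.1.1.0", 0x9932, 0xB375D933, False),
    Frame("libxcb.so.1.1.0", 0x1FFF3, 0xB3773FF4, False),
    Frame("libxcb.so.1.1.0", 0x9A4F, 0xB375DA50, False),
    Frame("libxcb.so.1.1.0", 0x1FFF3, 0xB3773FF4, False),
]

if __name__ == "__main__":
    for module, bases in bases_by_module(FRAMES).items():
        print(module, [hex(b) for b in sorted(bases)])
```

Under that assumption each module implies exactly one page-aligned base (0xb7702000 for linux-gate.so, 0xb73ef000 for libc-2.15.so, 0xb3754000 for libxcb.so.1.1.0). That consistency is what makes the minus-one reading plausible. It says nothing about whether a scanned value such as 0x1fff3 is a real return address.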
Crash Signature: [@ linux-gate.so@0x424]
Looking at the stack frame candidates in the log of comment 20 and using
addr2line -if -e with dbg packages from precise, the application appears to
be waiting in poll for the X server to reply with this stack.

3
_xcb_conn_wait
/build/buildd/libxcb-1.8.1/build/src/../../src/xcb_conn.c:400

4
_xcb_in_wake_up_next_reader
/build/buildd/libxcb-1.8.1/build/src/../../src/xcb_in.c:621

6
wait_for_reply
/build/buildd/libxcb-1.8.1/build/src/../../src/xcb_in.c:390

11
xcb_wait_for_reply
/build/buildd/libxcb-1.8.1/build/src/../../src/xcb_in.c:420

16
_XReply
/build/buildd/libx11-1.4.99.1/build/src/../../src/xcb_io.c:601

22
XScreenSaverQueryInfo
/build/buildd/libxss-1.2.1/build/src/../../src/XScrnSaver.c:220 (discriminator 2)

There isn't much remarkable about that.  I might have suspected an X server hang,
except that the screenshot took only 9 seconds to capture (from the server).

That leaves only speculation.  Perhaps a file descriptor problem.
Perhaps https://bugs.freedesktop.org/show_bug.cgi?id=56508
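For reference, `addr2line -f -i` prints a function-name line followed by a path:line line for each frame, with `-i` adding extra pairs when the address falls inside inlined code. That is the shape of the list in the comment above, including the "(discriminator N)" suffix. The hypothetical Python parser below assumes the text has already been captured from an invocation such as `addr2line -if -e <debug file> 0x811f`, where `<debug file>` stands for whichever -dbg file matches the library. Unresolved entries come back as `??` and `??:0` (or `??:?`).

```python
import re
from typing import List, NamedTuple, Optional


class SourceLocation(NamedTuple):
    function: str
    path: str
    line: Optional[int]           # None when addr2line prints "?"
    discriminator: Optional[int]  # from a trailing "(discriminator N)", if any


# "path:line" optionally followed by " (discriminator N)".
_LOCATION_RE = re.compile(
    r"^(?P<path>.*):(?P<line>\d+|\?)(?: \(discriminator (?P<disc>\d+)\))?$"
)


def parse_addr2line(output: str) -> List[SourceLocation]:
    """Parse `addr2line -f -i` output into (function, location) records.

    With -f each frame is two lines (function name, then path:line); with -i
    an address inside inlined code yields several such pairs.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected function/location line pairs")
    records = []
    for function, location in zip(lines[0::2], lines[1::2]):
        m = _LOCATION_RE.match(location)
        if m is None:
            raise ValueError(f"unrecognised location line: {location!r}")
        line_no = None if m["line"] == "?" else int(m["line"])
        disc = int(m["disc"]) if m["disc"] is not None else None
        records.append(SourceLocation(function, m["path"], line_no, disc))
    return records
```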
No clues here.
Do we have a Linux machine in the test farm that reproduces this more often than others? We may have to remote into that box to debug. I see these 3 machines in the logs:

slave: tst-linux32-ec2-019
slave: tst-linux32-ec2-084
slave: tst-linux32-ec2-090

Note that the symptoms Karl noted in comment 22 seem identical to this resolved bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=555352#c31
Summary: Intermittent LOOK AT THE STACK 645951-1.html | application timed out after 330 seconds with no output | application crashed [@ linux-gate.so + 0x424] → Intermittent LOOK AT THE STACK ipc 645951-1.html | application timed out after 330 seconds with no output | application crashed [@ linux-gate.so + 0x424]
Summary: Intermittent LOOK AT THE STACK ipc 645951-1.html | application timed out after 330 seconds with no output | application crashed [@ linux-gate.so + 0x424] → Intermittent LOOK AT THE STACK ipc | application timed out after 330 seconds with no output | application crashed [@ linux-gate.so + 0x424] | libxcb.so.1.1.0 + 0x811f
Priority: -- → P5
This only happens on ipc tests.
The X server seems to be responding to other clients, but the client here is waiting for a reply, which suggests that libxcb may be confused about its state.

Could browser.tabs.remote cause Xlib to be used from more than one thread?
Component: Layout → Graphics: Layers
Blocks: 910488
(In reply to Karl Tomlinson (:karlt) from comment #61)
> Could browser.tabs.remote cause Xlib to be used from more than one thread?

Fwiw, that was my impression too on the duped bug 943241.
Can you explain in small simple words I can understand what makes a test failure _this bug_, or alternately what makes it not this bug?

Right now, you have it set up to get tbplbot spammed for every single 330 seconds without output hang on Linux, since they all get a SIGABRT in linux-gate.so + 0x424. Soon, tbpl is going to blacklist that signature, because we do just throw every single unfiled timeout into whatever bug happens to have that in the summary.
The significant things to match here are:

1) This happens only in reftest-ipc and crashtest-ipc
2) libxcb.so.1.1.0 + 0x811f is on the stack

I don't know what libxcb.so.1.1.0 + 0x1fff3 is, but given that it occurs multiple times on the stack and addr2line can't resolve it, it may not be a return address at all but just a pointer on the stack to some data, which the stack-scanning algorithm thought might be interesting.

Yes, linux-gate.so + 0x424 just means a system call afaik.
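Those two criteria can be written down as a small check. The following is a hypothetical Python helper, not anything tbpl actually runs. It takes the suite name (for example crashtest-ipc, as in the first log) and the raw log lines, extracts the "module + 0xoffset" frames, and keys only on the libxcb frame, because linux-gate.so + 0x424 only marks a system call.

```python
import re
from typing import Iterable, List

# Suites in which this bug has been seen.
IPC_SUITES = frozenset({"reftest-ipc", "crashtest-ipc"})
# The frame that distinguishes this bug from other 330-second timeouts.
SIGNATURE_FRAME = "libxcb.so.1.1.0 + 0x811f"
# Matches frame lines such as "07:55:03     INFO -   3  libxcb.so.1.1.0 + 0x811f".
_FRAME_RE = re.compile(r"\s\d+\s+(?P<frame>\S+ \+ 0x[0-9a-fA-F]+)\s*$")


def stack_frames(log_lines: Iterable[str]) -> List[str]:
    """Return the 'module + 0xoffset' frames found in a mozcrash stack dump."""
    frames = []
    for line in log_lines:
        m = _FRAME_RE.search(line)
        if m:
            frames.append(m.group("frame"))
    return frames


def matches_this_bug(suite: str, log_lines: Iterable[str]) -> bool:
    """Hypothetical triage check: an ipc suite plus the libxcb frame on the stack.

    linux-gate.so + 0x424 is deliberately ignored: it only marks a system call
    and shows up in unrelated hangs as well.
    """
    return suite in IPC_SUITES and SIGNATURE_FRAME in stack_frames(log_lines)
```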
Bug 910488 may be the same issue.
It was semi-reproducible before graffiti covered over the bug.
Blocks: 934827
https://tbpl.mozilla.org/php/getParsedLog.php?id=34930082&tree=Mozilla-Inbound

Karl and Matt, I know we did a little IRC chatting about this bug last night. Any chance we could summarize that here? This is getting more frequent now that we're running reftests on B2G desktop builds and will get even more so once we start running more than just reftest-sanity on them. I think I could justifiably argue that this should block that work, actually.
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(karlt)
Fwiw, getting more reftests running on b2g desktop is pretty low priority for me atm, for two reasons:

1) I'm going to shift focus towards intermittent issues/log mangling
2) b2g desktop is being replaced by mulet and I'd like to avoid triaging reftest failures on two separate platforms if at all possible.
All we decided was that b2g reftests, along with Linux Cipc/Ripc, are places where we access Xlib from multiple threads.

This should be ok (we initialize Xlib in threaded mode for trunk builds), but it's plausible that we're hitting bugs that are specific to this code path.
Any suggestions for how to proceed? This is quickly climbing the ranks.
I wonder whether this could be the same issue that http://cgit.freedesktop.org/xcb/libxcb/commit/?id=23911a707b8845bff52cd7853fc5d59fb0823cef addressed.
I expect the fix is in 1.8.1-1ubuntu0.1
https://bugs.launchpad.net/ubuntu/+source/libxcb/+bug/1059276/comments/26
Probably worth going to 1.8.1-1ubuntu0.2 to fix the memory safety bug too
http://changelogs.ubuntu.com/changelogs/pool/main/libx/libxcb/libxcb_1.8.1-1ubuntu0.2/changelog
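Whether a given test slave already has the fixed package comes down to a Debian version comparison against 1.8.1-1ubuntu0.2. The authoritative check on the machine itself is `dpkg --compare-versions <installed> ge 1.8.1-1ubuntu0.2`. The Python below is only a sketch that re-implements dpkg's ordering rules: epoch first, then upstream version, then Debian revision. Within each part, digit runs compare numerically, letters sort before other punctuation, and ~ sorts before everything, even the end of the string. It applies to Ubuntu/Debian version strings only, not to RPM ones such as 1.5-1.el6.

```python
import string
from typing import Tuple

_DIGITS = frozenset(string.digits)
_LETTERS = frozenset(string.ascii_letters)


def _order(c: str) -> int:
    """Weight of one character in a non-digit run ("" means end of string)."""
    if c in _DIGITS:
        return 0
    if c in _LETTERS:
        return ord(c)
    if c == "~":
        return -1             # ~ sorts before everything, including the end
    if c:
        return ord(c) + 256   # other punctuation sorts after letters
    return 0


def _char(s: str, k: int) -> str:
    return s[k] if k < len(s) else ""


def _verrevcmp(a: str, b: str) -> int:
    """Compare an upstream version or a revision string following dpkg's rules."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Non-digit run: character by character using _order().
        while (i < len(a) and a[i] not in _DIGITS) or (j < len(b) and b[j] not in _DIGITS):
            ac, bc = _order(_char(a, i)), _order(_char(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: numeric comparison, ignoring leading zeros.
        while _char(a, i) == "0":
            i += 1
        while _char(b, j) == "0":
            j += 1
        while _char(a, i) in _DIGITS and _char(b, j) in _DIGITS:
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _char(a, i) in _DIGITS:
            return 1          # a has the longer number
        if _char(b, j) in _DIGITS:
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> Tuple[int, str, str]:
    """Split "[epoch:]upstream[-revision]" into its three parts."""
    epoch = 0
    if ":" in version:
        head, version = version.split(":", 1)
        epoch = int(head)
    if "-" in version:
        upstream, revision = version.rsplit("-", 1)
    else:
        upstream, revision = version, ""
    return epoch, upstream, revision


def compare_debian_versions(a: str, b: str) -> int:
    """Negative if a < b, zero if equal, positive if a > b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
```

For example, `compare_debian_versions("1.8.1-1ubuntu0.1", "1.8.1-1ubuntu0.2")` is negative, so a machine still on 0.1 would not pass a `>= 1.8.1-1ubuntu0.2` check.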
Flags: needinfo?(karlt)
By my perusal of the logs:
libxcb-devel.i686 0:1.5-1.el6

Ouch. I'll file a RelEng bug for getting that updated.
Flags: needinfo?(matt.woodrow)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #91)
> libxcb-devel.i686 0:1.5-1.el6

I'm guessing that is on a CentOS build machine,
but we need the update on the Ubuntu test machines.
Depends on: 975216
Status: NEW → RESOLVED
Closed: 10 years ago
No longer depends on: 975216
Resolution: --- → DUPLICATE