Created attachment 770835 [details]
screenshot

https://tbpl.mozilla.org/php/getParsedLog.php?id=24886335&tree=Mozilla-Central
Ubuntu VM 12.04 mozilla-central opt test crashtest-ipc on 2013-07-03 07:38:18 PDT for push 2cae857c17cb
slave: tst-linux32-ec2-019

07:48:43 INFO - REFTEST TEST-START | file:///builds/slave/test/build/tests/reftest/tests/layout/style/crashtests/645951-1.html
07:48:43 INFO - REFTEST TEST-LOAD | file:///builds/slave/test/build/tests/reftest/tests/layout/style/crashtests/645951-1.html | 1815 / 2487 (72%)
07:54:13 WARNING - TEST-UNEXPECTED-FAIL | file:///builds/slave/test/build/tests/reftest/tests/layout/style/crashtests/645951-1.html | application timed out after 330 seconds with no output
07:54:13 INFO - args: ['/builds/slave/test/build/tests/bin/screentopng']
07:54:13 INFO - Xlib: extension "RANDR" missing on display ":0".
07:54:22 INFO - SCREENSHOT: <see attached>
07:54:22 INFO - INFO | automation.py | Application ran for: 0:11:23.070918
07:54:22 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmpFtLN8wpidlog
07:54:22 INFO - ==> process 2286 launched child process 2317
07:54:22 INFO - ==> process 2317 launched child process 2348
07:54:22 INFO - INFO | zombiecheck | Checking for orphan process with PID: 2317
07:54:22 INFO - INFO | zombiecheck | Checking for orphan process with PID: 2348
07:54:22 INFO - mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1372857248/firefox-25.0a1.en-US.linux-i686.crashreporter-symbols.zip
07:54:22 INFO - Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1372857248/firefox-25.0a1.en-US.linux-i686.crashreporter-symbols.zip
07:55:03 WARNING - PROCESS-CRASH | file:///builds/slave/test/build/tests/reftest/tests/layout/style/crashtests/645951-1.html | application crashed [@ linux-gate.so + 0x424]
07:55:03 INFO - Crash dump filename: /tmp/tmp2mR9tf/minidumps/2d56bc86-7869-1260-7f6b698b-7d170286.dmp
07:55:03 INFO - Operating system: Linux
07:55:03 INFO - 0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686
07:55:03 INFO - CPU: x86
07:55:03 INFO - GenuineIntel family 6 model 45 stepping 7
07:55:03 INFO - 1 CPU
07:55:03 INFO - Crash reason: SIGABRT
07:55:03 INFO - Crash address: 0x8eb
07:55:03 INFO - Thread 0 (crashed)
07:55:03 INFO - 0 linux-gate.so + 0x424
07:55:03 INFO - eip = 0xb7702424 esp = 0xbfd21a50 ebp = 0x00000000 ebx = 0xbfd21aa8
07:55:03 INFO - esi = 0x00000000 edi = 0xb758fff4 eax = 0xfffffffc ecx = 0x00000001
07:55:03 INFO - edx = 0xffffffff efl = 0x00200282
07:55:03 INFO - Found by: given as instruction pointer in context
07:55:03 INFO - 1 libc-2.15.so + 0xdc37f
07:55:03 INFO - eip = 0xb74cb380 esp = 0xbfd21a60 ebp = 0x00000000
07:55:03 INFO - Found by: stack scanning
07:55:03 INFO - 2 libxcb.so.1.1.0 + 0x1fff3
07:55:03 INFO - eip = 0xb3773ff4 esp = 0xbfd21a74 ebp = 0x00000000
07:55:03 INFO - Found by: stack scanning
07:55:03 INFO - 3 libxcb.so.1.1.0 + 0x811f
07:55:03 INFO - eip = 0xb375c120 esp = 0xbfd21a80 ebp = 0x00000000
07:55:03 INFO - Found by: stack scanning
07:55:03 INFO - 4 libxcb.so.1.1.0 + 0x9932
07:55:03 INFO - eip = 0xb375d933 esp = 0xbfd21a90 ebp = 0x00000000
07:55:03 INFO - Found by: stack scanning
07:55:03 INFO - 5 libxcb.so.1.1.0 + 0x1fff3
07:55:03 INFO - eip = 0xb3773ff4 esp = 0xbfd21ac0 ebp = 0x00000000
07:55:03 INFO - Found by: stack scanning
07:55:03 INFO - 6 libxcb.so.1.1.0 + 0x9a4f
07:55:03 INFO - eip = 0xb375da50 esp = 0xbfd21ad0 ebp = 0x00000000
07:55:03 INFO - Found by: stack scanning
07:55:03 INFO - 7 libxcb.so.1.1.0 + 0x1fff3
07:55:03 INFO - eip = 0xb3773ff4 esp = 0xbfd21ae4 ebp = 0x00000000
07:55:03 INFO - Found by: stack scanning
Looking at the stack frame candidates in the log of comment 20 and using addr2line -if -e with dbg packages from precise, the application appears to be waiting in poll for the X server to reply, with this stack:

 3 _xcb_conn_wait /build/buildd/libxcb-1.8.1/build/src/../../src/xcb_conn.c:400
 4 _xcb_in_wake_up_next_reader /build/buildd/libxcb-1.8.1/build/src/../../src/xcb_in.c:621
 6 wait_for_reply /build/buildd/libxcb-1.8.1/build/src/../../src/xcb_in.c:390
11 xcb_wait_for_reply /build/buildd/libxcb-1.8.1/build/src/../../src/xcb_in.c:420
16 _XReply /build/buildd/libx11-126.96.36.199/build/src/../../src/xcb_io.c:601
22 XScreenSaverQueryInfo /build/buildd/libxss-1.2.1/build/src/../../src/XScrnSaver.c:220 (discriminator 2)

There isn't much remarkable about that. I might have suspected an X server hang, except that the screenshot took only 9 seconds to capture (from the server).

That leaves only speculation. Perhaps a file descriptor problem. Perhaps https://bugs.freedesktop.org/show_bug.cgi?id=56508

No clues here.
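For anyone wanting to repeat the symbol lookup described above, here is a minimal Python sketch of building the addr2line invocation for module-relative offsets taken from the log. The function name and the debug file path in the usage note are hypothetical; only the -i, -f and -e flags come from the comment.

```python
def addr2line_command(debug_file, offsets):
    """Build an addr2line argument list for module-relative offsets.

    debug_file is whatever file holds the debug symbols for the module;
    which file that is on a given machine is an assumption left to the caller.
    """
    # -i: also report inlined frames, -f: print function names,
    # -e: the file to read symbols from.
    return ["addr2line", "-i", "-f", "-e", debug_file] + [hex(o) for o in offsets]
```

The resulting list could be passed to subprocess.run(cmd, capture_output=True, text=True), for example with offsets such as 0x811f and 0x9932 from the libxcb frames and a placeholder path like "libxcb.so.1.1.0.debug" standing in for the file from the precise dbg package.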
Do we have a Linux machine in the test farm that reproduces this more often than others? We may have to remote into that box to debug. I see these 3 machines in the logs:

slave: tst-linux32-ec2-019
slave: tst-linux32-ec2-084
slave: tst-linux32-ec2-090

Note that the symptoms Karl noted in comment 22 seem identical to this resolved bug: https://bugzilla.mozilla.org/show_bug.cgi?id=555352#c31
This only happens on ipc tests. The X server seems to be responding to other clients, but the client here is waiting for a reply, suggesting that libxcb may be confused about its state. Could browser.tabs.remote cause Xlib to be used from more than one thread?
(In reply to Karl Tomlinson (:karlt) from comment #61)
> Could browser.tabs.remote cause Xlib to be used from more than one thread?

Fwiw, that was my impression too on the duped bug 943241.
Can you explain in small simple words I can understand what makes a test failure _this bug_, or alternately what makes it not this bug? Right now, you have it set up to get tbplbot spammed for every single 330 seconds without output hang on Linux, since they all get a SIGABRT in linux-gate.so + 0x424. Soon, tbpl is going to blacklist that signature, because we do just throw every single unfiled timeout into whatever bug happens to have that in the summary.
The significant things to match here are:
1) This happens only in reftest-ipc and crashtest-ipc
2) libxcb.so.1.1.0 + 0x811f is on the stack

I don't know what libxcb.so.1.1.0 + 0x1fff3 is, but given it occurs multiple times on the stack and addr2line doesn't know, it may not be a return address but just a pointer on the stack to some data, which the stack scanning algorithm thought might be interesting.

Yes, linux-gate.so + 0x424 just means a system call afaik.
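The two matching criteria above could be checked mechanically. The following is only a rough Python sketch, not anything tbpl actually runs: the function names are hypothetical, and it assumes the suite name and the "module + offset" frame lines appear in the log text the way they do in the log quoted in this bug.

```python
import re
from collections import Counter

# Frame lines in the stackwalk output look like "3 libxcb.so.1.1.0 + 0x811f".
FRAME_RE = re.compile(r"\b\d+\s+(\S+)\s+\+\s+(0x[0-9a-fA-F]+)")

# Assumption: the suite name shows up verbatim in the log header.
IPC_SUITES = ("reftest-ipc", "crashtest-ipc")
SIGNATURE_FRAME = ("libxcb.so.1.1.0", 0x811F)


def parse_frames(log_text):
    """Return (module, offset) pairs for every frame candidate in the log."""
    return [(m.group(1), int(m.group(2), 16)) for m in FRAME_RE.finditer(log_text)]


def matches_this_bug(log_text):
    """True only if both criteria hold: an ipc suite and the libxcb frame."""
    is_ipc = any(suite in log_text for suite in IPC_SUITES)
    return is_ipc and SIGNATURE_FRAME in parse_frames(log_text)


def repeated_frames(log_text):
    """Frame candidates seen more than once; these may just be data pointers
    picked up by stack scanning rather than real return addresses."""
    counts = Counter(parse_frames(log_text))
    return [frame for frame, n in counts.items() if n > 1]
```

Note that linux-gate.so + 0x424 is deliberately not part of the check, since it only indicates a system call and shows up in unrelated timeouts too.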
Bug 910488 may be the same issue. It was semi-reproducible before graffiti covered over the bug.
https://tbpl.mozilla.org/php/getParsedLog.php?id=34930082&tree=Mozilla-Inbound Karl and Matt, I know we did a little IRC chatting about this bug last night. Any chance we could summarize that here? This is getting more frequent now that we're running reftests on B2G desktop builds and will get even more so once we start running more than just reftest-sanity on them. I think I could justifiably argue that this should block that work, actually.
Fwiw, getting more reftests running on b2g desktop is pretty low priority for me atm, for two reasons: 1) I'm going to shift focus towards intermittent issues/log mangling 2) b2g desktop is being replaced by mulet and I'd like to avoid triaging reftest failures on two separate platforms if at all possible.
All we decided was that b2g reftests, along with linux Cipc/Ripc, were places where we are accessing Xlib from multiple threads. This should be ok (we initialize Xlib in threaded mode for trunk builds), but it's plausible that we're hitting bugs that are specific to this code path.
Any suggestions for how to proceed? This is quickly climbing the ranks.
I wonder whether this could be the same issue that http://cgit.freedesktop.org/xcb/libxcb/commit/?id=23911a707b8845bff52cd7853fc5d59fb0823cef addressed. I expect the fix is in 1.8.1-1ubuntu0.1 https://bugs.launchpad.net/ubuntu/+source/libxcb/+bug/1059276/comments/26 Probably worth going to 1.8.1-1ubuntu0.2 to fix the memory safety bug too http://changelogs.ubuntu.com/changelogs/pool/main/libx/libxcb/libxcb_1.8.1-1ubuntu0.2/changelog
By my perusal of the logs: libxcb-devel.i686 0:1.5-1.el6 Ouch. I'll file a RelEng bug for getting that updated.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #91)
> libxcb-devel.i686 0:1.5-1.el6

I'm guessing that is on a CentOS build machine, but we need the update on the Ubuntu test machines.