Closed Bug 494671 Opened 16 years ago Closed 15 years ago

[SeaMonkey-Ports, MacOSX] cb-seamonkey-osx-*: "mochitest-plain: T-FAIL CRASH L-FAIL", possibly related to 'libSystem.B.dylib'

Categories

(SeaMonkey :: Release Engineering, defect)

x86
macOS
defect
Not set
major

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 528817

People

(Reporter: sgautherie, Unassigned)

Details

(Keywords: crash, Whiteboard: [Needs a Parallels fix])

These boxes are always orange, with either this bug or a more random(!?) "T-FAIL + timeout"; this bug is about the CRASH case, which I may have found a "pattern" for.
***
The crashed thread stack varies and can be +/- long, but it always ends (with different register values) at the same instruction(s):
{
Operating system: Mac OS X
                  10.5.6 9G71
CPU: x86
     GenuineIntel family 6 model 7 stepping 6
     2 CPUs
Crash reason:  EXC_ARITHMETIC / EXC_I386_DIV
Crash address: 0xffff0315
Thread 0 (crashed)
 0  0xffff0315
    eip = 0xffff0315   esp = 0xbfffd0e0   ebp = 0xbfffd0e8   ebx = 0x4a191a57
    esi = 0x00006877   edi = 0x00000000   eax = 0xfffd9563   ecx = 0x3b9aca00
    edx = 0xffffffff   efl = 0x00210246
 1  libSystem.B.dylib + 0x29e78
    eip = 0x91635e79   esp = 0xbfffd0f0   ebp = 0xbfffd128
[...]
}
Maybe these boxes have a bad "libSystem.B.dylib" or the like?
(In reply to comment #0)
> Maybe these boxes have a bad "libSystem.B.dylib" or the like?
All (but 1) of the other threads (whose number varies too) are also running 'libSystem.B.dylib'...
The other thread (running code) seems to be +/- random:
{
(I didn't look at older builds...)
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1242728092.1242735151.31723.gz&fulltext=1
OS X 10.5 comm-1.9.1 unit test on 2009/05/19 03:14:52
 0  libgklayout.dylib!nsLineLayout::CombineTextDecorations(nsPresContext*, unsigned char, nsIFrame*, nsRect&, int, float) [nsLineLayout.cpp:e49c05fc9122 : 98 + 0x1]
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1242757522.1242764852.14409.gz&fulltext=1
OS X 10.5 comm-1.9.1 unit test on 2009/05/19 11:25:22
 0  libgklayout.dylib!nsTextFrameUtils::TransformText(unsigned char const*, unsigned int, unsigned char*, nsTextFrameUtils::CompressionMode, unsigned char*, gfxSkipCharsBuilder*, unsigned int*) [nsTextFrameUtils.cpp:82d4f0cd8238 : 211 + 0x0]
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1242950367.1242956368.31389.gz
OS X 10.5 comm-1.9.1 unit test on 2009/05/21 16:59:27
 0  libxpconnect.dylib!WrappedNativeJSGCThingTracer [xpcwrappednativescope.cpp:e8fc03d9a29e : 356 + 0x4]
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243116825.1243122991.1031.gz
OS X 10.5 comm-1.9.1 unit test on 2009/05/23 15:13:45
 0  libxpcom_core.dylib!NS_LogDtor_P [nsTraceRefcntImpl.cpp:a7eb03446bed : 274 + 0x3b]
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243127633.1243131058.12166.gz
OS X 10.5 comm-1.9.1 unit test on 2009/05/23 18:13:53
 (no "other" thread)
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243155596.1243159462.20067.gz&fulltext=1
OS X 10.5 comm-1.9.1 unit test on 2009/05/24 01:59:56
 0  libgklayout.dylib!oggz_get_stream [oggz.c:a7eb03446bed : 328 + 0xe]
}
Summary: [SeaMonkey-Ports, MacOSX] cb-seamonkey-osx-*: "mochitest-plain: T-FAIL CRASH L-FAIL" → [SeaMonkey-Ports, MacOSX] cb-seamonkey-osx-*: "mochitest-plain: T-FAIL CRASH L-FAIL", possibly related to 'libSystem.B.dylib'
Oh, I forgot the crashing tests, in the same order:
*** 43138 INFO Running /tests/layout/base/tests/test_bug441782-2e.html...
*** 43305 INFO Running /tests/layout/base/tests/test_bug441782-1e.html...
*** 43554 INFO Running /tests/layout/base/tests/test_bug441782-5b.html...
*** 43533 INFO Running /tests/layout/base/tests/test_bug441782-2c.html...
*** 13110 INFO Running /tests/content/canvas/test/test_2d.drawImage.9arg.sourcesize.html...
*** 27974 INFO TEST-PASS | /tests/content/media/video/test/test_timeupdate1.html | Check currentTime of 0.7329999804496765 is greater than last time of 0.6990000009536743
(It looked like we would have a culprit, but not anymore.)
This time it crashed on the "other" thread:
{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243197733.1243203288.28479.gz&fulltext=1
OS X 10.5 comm-1.9.1 unit test on 2009/05/24 13:42:13
Crash reason:  EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash address: 0x165cb4
Thread 0 (crashed)
 0  libmozjs.dylib!JS_CallTracer [jsgc.cpp:e376af3a7490 : 1130 + 0x0]
}
Even the hangs we're seeing frequently seem to almost always be in video tests somewhere. I wonder what it means that both hangs and crashes are at different places but almost always in video mochitests.
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243403266.1243407750.548.gz&fulltext=1 has:
Crash reason:  EXC_ARITHMETIC / EXC_I386_DIV
Crash address: 0xffff0315
Thread 2 (crashed)
 0  0xffff0315
    eip = 0xffff0315   esp = 0xb0206dd0   ebp = 0xb0206dd8   ebx = 0x4a1ce42b
    esi = 0x0000926b   edi = 0x00000000   eax = 0xfa99b621   ecx = 0x3b9aca00
    edx = 0xffffffff   efl = 0x00010246
 1  libSystem.B.dylib + 0x29e78
    eip = 0x90b7ee79   esp = 0xb0206de0   ebp = 0xb0206e18
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243424728.1243430169.9162.gz&fulltext=1 has:
Crash reason:  EXC_ARITHMETIC / EXC_I386_DIV
Crash address: 0xffff0315
Thread 22 (crashed)
 0  0xffff0315
    eip = 0xffff0315   esp = 0xb1b9cd90   ebp = 0xb1b9cd98   ebx = 0x4a1d3bc6
    esi = 0x000078f3   edi = 0x00000000   eax = 0xeba436bf   ecx = 0x3b9aca00
    edx = 0xffffffff   efl = 0x00010246
 1  libSystem.B.dylib + 0x29e78
    eip = 0x91635e79   esp = 0xb1b9cda0   ebp = 0xb1b9cdd8
I still suspect this to be largely a Parallels virtualization issue.
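A quick sanity check on the register dumps quoted so far, as a minimal sketch. It assumes (this is not confirmed anywhere in this bug) that the instruction at 0xffff0315 is a 32-bit unsigned `div` by ecx. On x86 such a divide takes the 64-bit value edx:eax as dividend and raises a divide error when the divisor is zero or the quotient does not fit in 32 bits, which would be consistent with EXC_ARITHMETIC / EXC_I386_DIV. The helper name `div32_overflows` is made up for illustration; the register values are copied from the dumps in comment 0 and comment 5.

```python
# Hypothetical helper, not part of any Mozilla or Apple tool.
# Models the fault condition of an x86 32-bit unsigned DIV:
# dividend is EDX:EAX, quotient must fit in 32 bits.
def div32_overflows(edx: int, eax: int, divisor: int) -> bool:
    if divisor == 0:
        return True  # divide by zero also raises a divide error
    dividend = (edx << 32) | eax
    return dividend // divisor > 0xFFFFFFFF


# (edx, eax, ecx) taken from the three crashed-thread dumps quoted so far.
SAMPLES = [
    (0xFFFFFFFF, 0xFFFD9563, 0x3B9ACA00),
    (0xFFFFFFFF, 0xFA99B621, 0x3B9ACA00),
    (0xFFFFFFFF, 0xEBA436BF, 0x3B9ACA00),
]

for edx, eax, ecx in SAMPLES:
    print(hex(edx), hex(eax), "divisor", ecx, "overflow:", div32_overflows(edx, eax, ecx))
```

In every sample ecx is 0x3b9aca00 (1,000,000,000) and edx is 0xffffffff. Under the assumption above the quotient cannot fit in 32 bits, because any edx greater than or equal to the divisor overflows.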
(In reply to comment #5)
> I still suspect this to be largely a Parallels virtualization issue.
In order to help figure out what the situation is with this bug (and more globally these boxes), I suggest disabling mochitest-plain for a while...
So you're crashing inside gettimeofday, which sucks:
http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/md/unix/unix.c#3020
Looking at the symbols we have on Socorro, the 10.5.6 libSystem.B.dylib.sym confirms that:
PUBLIC 29e47 0 gettimeofday
I can only think this is a Parallels issue or an OS X issue exposed by running on Parallels.
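For reference, the lookup behind that conclusion can be sketched as follows: take the frame offset (libSystem.B.dylib + 0x29e78) and find the last PUBLIC record at or below it. This is only an illustration. The function names are hypothetical, and the excerpt holds just the one record quoted in this bug, whereas a real .sym file has many records and the nearest one must be picked among all of them.

```python
import bisect

# The only record quoted in this bug; a real .sym file contains many more.
SYM_EXCERPT = "PUBLIC 29e47 0 gettimeofday\n"


def parse_public_records(sym_text):
    """Return sorted (address, name) pairs from 'PUBLIC addr param_size name' lines."""
    records = []
    for line in sym_text.splitlines():
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[0] == "PUBLIC":
            records.append((int(parts[1], 16), parts[3]))
    records.sort()
    return records


def resolve(records, offset):
    """Return (name, offset within symbol) for the nearest record at or below offset."""
    addresses = [addr for addr, _ in records]
    i = bisect.bisect_right(addresses, offset) - 1
    if i < 0:
        return None  # offset lies before the first known symbol
    addr, name = records[i]
    return name, offset - addr


records = parse_public_records(SYM_EXCERPT)
name, delta = resolve(records, 0x29E78)
print(f"{name} + {delta:#x}")  # prints "gettimeofday + 0x31"
```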
Ted, thanks, that's very good to know, might be just what we need to file a ticket with the Parallels people. Phong, can you look into doing that?
After a bit more digging through OSX source:
gettimeofday calls __commpage_gettimeofday:
http://www.opensource.apple.com/source/Libc/Libc-498.1.5/sys/gettimeofday.c
which is a little stub that calls into a fixed address:
http://www.opensource.apple.com/source/Libc/Libc-498.1.5/i386/sys/i386_gettimeofday.s
which is in the "comm page" to talk to the kernel:
http://www.opensource.apple.com/source/xnu/xnu-1228.9.59/osfmk/i386/cpu_capabilities.h
#define _COMM_PAGE_GETTIMEOFDAY (_COMM_PAGE_START_ADDRESS+0x2e0) /* used by gettimeofday() */
_COMM_PAGE_START_ADDRESS is 0xFFFF0000 on i386, so _COMM_PAGE_GETTIMEOFDAY = 0xFFFF02E0. Your crash is at 0xffff0315, and the next address defined in that header isn't until 0x4e0, so it sure looks like you're crashing inside the code at that address.
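The address arithmetic in the comment above, spelled out as a small sketch. The constants are the ones quoted there (start address 0xFFFF0000, gettimeofday at +0x2e0, next defined entry at +0x4e0); the Python names and the `in_gettimeofday_slot` helper are made up for illustration.

```python
COMM_PAGE_START_ADDRESS = 0xFFFF0000   # i386 value quoted above
COMM_PAGE_GETTIMEOFDAY = COMM_PAGE_START_ADDRESS + 0x2E0
NEXT_DEFINED_ENTRY = COMM_PAGE_START_ADDRESS + 0x4E0  # next address in the header, per the comment

CRASH_ADDRESS = 0xFFFF0315


def in_gettimeofday_slot(address: int) -> bool:
    """True if address lies between the gettimeofday entry and the next defined entry."""
    return COMM_PAGE_GETTIMEOFDAY <= address < NEXT_DEFINED_ENTRY


print(hex(COMM_PAGE_GETTIMEOFDAY))                  # 0xffff02e0
print(in_gettimeofday_slot(CRASH_ADDRESS))          # True
print(hex(CRASH_ADDRESS - COMM_PAGE_GETTIMEOFDAY))  # 0x35
```

So the crash address sits 0x35 bytes past the start of the comm page gettimeofday routine, well before the next defined entry.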
Here's the source for the code that lives at that comm page address: http://www.opensource.apple.com/source/xnu/xnu-1228.9.59/osfmk/i386/commpage/commpage_gettimeofday.s My assembler-fu is weak, so I'll leave it at that.
I disabled bug 494769 yesterday, to hopefully (help) work around this too, ftb.
There has been (only) one occurrence of this since then:
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243563863.1243574516.25364.gz&fulltext=1
OS X 10.5 comm-1.9.1 unit test on 2009/05/28 19:24:23
{
35093 INFO TEST-PASS | nodeCloneFalseNoCopyTextAssert1
35095 INFO Running /tests/dom/tests/mochitest/dom-level1-core/test_hc_nodeclonegetparentnull.html...
NEXT ERROR TEST-UNEXPECTED-FAIL | (automation.py) | Exited with code 3 during test run
INFO | (automation.py) | Application ran for: 0:24:28.216440
NEXT ERROR TEST-UNEXPECTED-FAIL | (automation.py) | Browser crashed (minidump found)
[...]
Thread 0 (crashed)
 0  0xffff0315
 1  libSystem.B.dylib + 0x29e78
 2  libnspr4.dylib!_PR_UNIX_GetInterval [unix.c:560662a707ba : 3020 + 0x12]
 3  libgklayout.dylib!PresShell::ProcessReflowCommands(int) [nsPresShell.cpp:560662a707ba : 6740 + 0x4]
}
with no "other" thread.
Depends on: 494769
(In reply to comment #11)
> I disabled bug 494769 yesterday, to hopefully (help) work around this too, ftb.
Scratch that: it seems that bug helped bug 493450, but not this one (at all).
> There was one occurrence (only/yet) of this since then:
Actually, all builds had this failure :-<
No longer depends on: 494769
{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243644913.1243649587.1904.gz
OS X 10.5 comm-1.9.1 unit test on 2009/05/29 17:55:13
17374 INFO Running /tests/content/canvas/test/test_size.attributes.style.html...
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243663806.1243671111.1552.gz
OS X 10.5 comm-1.9.1 unit test on 2009/05/29 23:10:06
43529 INFO Running /tests/layout/base/tests/test_bug441782-3c.html...
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1243677738.1243684848.27313.gz
OS X 10.5 comm-1.9.1 unit test on 2009/05/30 03:02:18
28055 INFO TEST-PASS | /tests/content/media/video/test/test_wav_ended2.html | Expect at least one playing event
}
Let's wait for Parallels to be fixed!
Whiteboard: [Needs future Parallels fix/upgrade]
We'll need to watch this for a bit more, but it looks like the Parallels and system upgrades in bug 494462 might have fixed this. I'd like to see a day or so of non-crash data before closing the bug here though.
(In reply to comment #14) Ftr, the last occurrence of this bug was: http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1244638160.1244643373.14097.gz OS X 10.5 comm-1.9.1 unit test on 2009/06/10 05:49:20
Depends on: 494462
No longer depends on: 493450
Flags: in-testsuite-
Whiteboard: [Needs future Parallels fix/upgrade]
Target Milestone: --- → seamonkey2.0b1
Blocks: 493450
Hrm, looks like our old "friend" is still here :(
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1244757887.1244764036.27165.gz&fulltext=1
Operating system: Mac OS X
                  10.5.7 9J61
CPU: x86
     GenuineIntel family 6 model 7 stepping 6
     2 CPUs
Crash reason:  EXC_ARITHMETIC / EXC_I386_DIV
Crash address: 0xffff0315
Thread 16 (crashed)
 0  0xffff0315
    eip = 0xffff0315   esp = 0xb1497df0   ebp = 0xb1497df8   ebx = 0x4a319621
    esi = 0x0000c463   edi = 0x00000000   eax = 0xfec1fc3f   ecx = 0x3b9aca00
    edx = 0xffffffff   efl = 0x00010246
 1  libSystem.B.dylib + 0x29f38
    eip = 0x94d63f39   esp = 0xb1497e00   ebp = 0xb1497e38
 2  libclient.dylib + 0x11813e
    eip = 0x10eb113f   esp = 0xb1497e40   ebp = 0xb1497ed8
 3  libclient.dylib + 0x5f5ae
    eip = 0x10df85af   esp = 0xb1497ee0   ebp = 0xb1497f18
 4  libclient.dylib + 0x5f0f9
    eip = 0x10df80fa   esp = 0xb1497f20   ebp = 0xb1497f98
 5  libclient.dylib + 0x2c4934
    eip = 0x1105d935   esp = 0xb1497fa0   ebp = 0xb1497fc8
 6  libSystem.B.dylib + 0x7d1ff
    eip = 0x94db7200   esp = 0xb1497fd0   ebp = 0xb1497fe8
(In reply to comment #8)
> might be just what we need to file a ticket with the Parallels people.
> Phong, can you look into doing that?
Did you?
No longer depends on: 494462
Whiteboard: [Needs a Parallels fix]
Target Milestone: seamonkey2.0b1 → ---
Serge: Please let me drive this, I tend to communicate with people outside the bugs as well, so such a question here might be redundant and result in just noise. Phong and I agreed to wait for the updates we just did this week before actually filing this, as we had some hope they may have fixed this. Meanwhile, we had more crashes with the same signature as comment #16: http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1244777538.1244787463.2153.gz&fulltext=1 http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1244784829.1244791814.10506.gz&fulltext=1 This appears to just be the 10.5.7 version of what ted investigated with 10.5.6 before. Phong, based on that, can you please file a ticket with Parallels on this issue as well?
I have an open ticket with Parallels about the issue.
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Ports/1244829896.1244835872.12801.gz
OS X 10.5 comm-1.9.1 unit test on 2009/06/12 11:04:56
Fwiw, it's not often, but the following extra 'Invalid memory access' log is (still) happening too:
{
[...]
43744 INFO Running /tests/layout/base/tests/test_bug467672-3f.html...
2009-06-12 12:37:22.168 seamonkey-bin[84275:11503] Invalid memory access of location 00000000 eip=ffff0315
}
(In reply to comment #21)
> The crashes continue to happen, though they are less frequently now that the
> VMs have been reduced to one CPU, but they are still there and still have the
> same signature:
Fwiw, note that some of these crashed at a _different_ location...
(In reply to comment #22)
> Fwiw, note that some of these crashed at a _different_ location...
Not really. We have moved from 10.5.6 to 10.5.7, and with that the location looks different, but I'm pretty sure it's still the same location in terms of code; it just moved in the binary with the patched kernel.
Blocks: 510788
Looks like this is more than just Seamonkey?
OS X 10.5.2 mozilla-central test opt mochitests on 2009/10/01 09:54:21
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254416061.1254416350.18354.gz&fulltext=1#err2
(In reply to comment #24)
> Looks like this is more than just Seamonkey?
Supposedly unlikely: Firefox VMs don't run on Parallels, do they?
> OS X 10.5.2 mozilla-central test opt mochitests on 2009/10/01 09:54:21
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1254416061.1254416350.18354.gz&fulltext=1#err2
And not the same crash:
{
Crash reason:  EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash address: 0xc
Thread 31 (crashed)
 0  XUL + 0x6c9db0
}
Depends on: 522682
I'm not sure what this bug status is atm, but bug 522682 might help wrt comment 3 and comment 25...
Looks like bug 528817 fixed this; as long as bug 537308 does not show any regression, we can consider this one fixed.
Depends on: 528817, 537308
Dupe of bug 528817, but in the meantime we abandoned Parallels for OSX VMs completely.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
No longer depends on: 528817
Component: Project Organization → Release Engineering
QA Contact: organization → release