Closed
Bug 781643
Opened 12 years ago
Closed 12 years ago
paas-dea1.webapp.scl3 has died several times
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mburns, Assigned: dumitru)
References
Details
(Whiteboard: SeaMicro C-2391 case)
Attachments
(1 file)
99.19 KB,
image/png
[13:38:52] <nagios-scl3> Thu 13:38:51 PDT [548] paas-dea1.webapp.scl3.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%

This box already paged as DOWN earlier this week. On boot, it drops to single-user mode and forces a manual fsck.
Reporter
Comment 1•12 years ago
Comment 2•12 years ago
Last crash yesterday. I downtimed the host in nagios until we figure out what is going on.
Reporter
Comment 3•12 years ago
Happened again Saturday night, 19:48:48 PDT [540]
Comment 4•12 years ago
Aug 18 20:06:23 paas-dea1 kernel: ------------[ cut here ]------------
Aug 18 20:06:23 paas-dea1 kernel: WARNING: at fs/buffer.c:677 __set_page_dirty+0xcb/0xf0() (Tainted: G B --------------- )
Aug 18 20:06:23 paas-dea1 kernel: Hardware name: SM10000-XE
Aug 18 20:06:23 paas-dea1 kernel: Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Aug 18 20:06:23 paas-dea1 kernel: Pid: 24897, comm: puppet Tainted: G B --------------- 2.6.32-279.2.1.el6.x86_64 #1
Aug 18 20:06:23 paas-dea1 kernel: Call Trace:
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff8106b79a>] ? warn_slowpath_null+0x1a/0x20
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff811adbeb>] ? __set_page_dirty+0xcb/0xf0
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff811ade48>] ? __set_page_dirty_buffers+0x88/0xc0
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff81128419>] ? set_page_dirty+0x39/0x60
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff8113d09f>] ? unmap_vmas+0x9df/0xc30
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff81142d27>] ? exit_mmap+0x87/0x170
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff8106897c>] ? mmput+0x6c/0x120
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff811814d4>] ? flush_old_exec+0x484/0x690
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff811d1b6d>] ? load_elf_binary+0x3ad/0x1b10
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff8113adff>] ? follow_page+0x31f/0x470
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff81140110>] ? __get_user_pages+0x110/0x430
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff811ceadc>] ? load_misc_binary+0xac/0x3e0
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff811404c9>] ? get_user_pages+0x49/0x50
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff81182abb>] ? search_binary_handler+0x11b/0x360
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff81183c49>] ? do_execve+0x239/0x340
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff810d53ae>] ? __audit_getname+0xbe/0xd0
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff810095ea>] ? sys_execve+0x4a/0x80
Aug 18 20:06:23 paas-dea1 kernel: [<ffffffff8100b54a>] ? stub_execve+0x6a/0xc0
Aug 18 20:06:23 paas-dea1 kernel: ---[ end trace f2bd86a0c32c30c2 ]---
Assignee
Updated•12 years ago
Assignee: server-ops → dgherman
Assignee
Updated•12 years ago
Whiteboard: SeaMicro C-2391 case
Assignee
Comment 5•12 years ago
A SeaMicro technician logged into the chassis and had a look. There's nothing to prove that the issue is related to SeaMicro. We did see, however, that the server's system clock was set to UTC rather than PDT, while the chassis's time is in PDT. Per his suggestion, I changed the server's timezone to PDT (wondering how it got set to UTC in the first place), and we'll see if it goes down again.
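For lining up the Nagios alerts (reported in PDT) against log timestamps from a host whose clock is on UTC, a small conversion helper can avoid off-by-seven-hours mistakes. This is only a sketch: the function name `pdt_to_utc` is made up for illustration, and the year has to be supplied by the caller because syslog-style stamps do not carry one (2012 in the usage line is an assumption).

```python
from datetime import datetime, timedelta, timezone

# Pacific Daylight Time is a fixed UTC-7 offset.
PDT = timezone(timedelta(hours=-7), "PDT")


def pdt_to_utc(stamp, year):
    """Convert a syslog-style 'Mon DD HH:MM:SS' stamp in PDT to UTC.

    `year` is an explicit argument because the stamp itself has no year.
    """
    local = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return local.replace(tzinfo=PDT).astimezone(timezone.utc)


if __name__ == "__main__":
    # Hypothetical usage; the year is assumed, not taken from the logs.
    print(pdt_to_utc("Aug 27 10:38:11", 2012).isoformat())
```

A fixed offset is used instead of a named zone so the snippet has no dependency on the system timezone database; it is only valid for dates when daylight time is in effect.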
Assignee
Comment 6•12 years ago
Seeing something else now, but maybe that was the initial cause:

Message from syslogd@paas-dea1 at Aug 27 10:38:11 ...
kernel:BUG: soft lockup - CPU#6 stuck for 67s! [flush-8:0:483]
Message from syslogd@paas-dea1 at Aug 27 10:38:34 ...
kernel:BUG: soft lockup - CPU#1 stuck for 67s! [rhsmcertd-worke:3212]
Message from syslogd@paas-dea1 at Aug 27 10:39:10 ...
kernel:BUG: soft lockup - CPU#3 stuck for 67s! [sosreport:3171]
Message from syslogd@paas-dea1 at Aug 27 10:39:35 ...
kernel:BUG: soft lockup - CPU#6 stuck for 67s! [flush-8:0:483]

seamicro-a# server console connect 56
Using local telnet client for loopback connection to server: 56.
Standard telnet commands apply.
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connecting to server 56... Success!
000000036381370
FS: 00007f685882b700(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6852045000 CR3: 00000004335e0000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sosreport (pid: 3171, threadinfo ffff8804326ea000, task ffff880432559500)
Stack:
ffff8804326ebbc8 ffffffff81113ffe ffff88043319ef00 ffff8804326ebc78
<d> ffff8804326ebc38 ffffffff8111550b ffff880432559500 ffff88042df88de0
<d> 0000000000000002 0000002881127061 ffff8804326ebc18 ffff88043319ef70
Call Trace:
[<ffffffff81113ffe>] ? find_get_page+0x1e/0xa0
[<ffffffff8111550b>] ? filemap_fault+0x8b/0x500
[<ffffffff8113ed44>] ? __do_fault+0x54/0x510
[<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50
[<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110
[<ffffffff81048ac7>] ? pte_alloc_one+0x37/0x50
[<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0
[<ffffffff81044479>] ? __do_page_fault+0x139/0x480
[<ffffffff810d358d>] ? audit_filter_rules+0x2d/0xdd0
[<ffffffff81145fca>] ? do_mmap_pgoff+0x33a/0x380
[<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0
[<ffffffff81500755>] ?
page_fault+0x25/0x30 Code: 00 8b 45 e8 48 83 c4 10 5b 41 5c c9 c3 90 90 90 55 48 8d 47 08 48 8b 7f 08 48 89 e5 48 85 ff 74 54 40 f6 c7 01 74 49 48 83 e7 fe <8b> 17 89 d0 48 3b 34 c5 a0 57 c0 81 77 3c 8d 0c 52 8d 4c 09 fa Call Trace: [<ffffffff81113ffe>] ? find_get_page+0x1e/0xa0 [<ffffffff8111550b>] ? filemap_fault+0x8b/0x500 [<ffffffff8113ed44>] ? __do_fault+0x54/0x510 [<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50 [<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110 [<ffffffff81048ac7>] ? pte_alloc_one+0x37/0x50 [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0 [<ffffffff81044479>] ? __do_page_fault+0x139/0x480 [<ffffffff810d358d>] ? audit_filter_rules+0x2d/0xdd0 [<ffffffff81145fca>] ? do_mmap_pgoff+0x33a/0x380 [<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0 [<ffffffff81500755>] ? page_fault+0x25/0x30 BUG: soft lockup - CPU#6 stuck for 67s! [flush-8:0:483] Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] CPU 6 Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 483, comm: flush-8:0 Tainted: G B --------------- 2.6.32-279.2.1.el6.x86_64 #1 SeaMicro SM10000-XE/Type2 - Board Product Name1 RIP: 0010:[<ffffffff81113d1b>] [<ffffffff81113d1b>] find_get_pages_tag+0x5b/0x120 RSP: 0018:ffff880430735940 EFLAGS: 00000246 RAX: ffff880435fa6708 RBX: ffff880430735990 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffea000e990ac8 RBP: ffffffff8100bc0e R08: 0000000000000000 R09: 0000000000000002 R10: 000000000000000e R11: ffff880435fa68f8 R12: ffffffff81faff00 R13: 0000000000000000 R14: ffffffff81faff00 R15: 0000000000000000 FS: 0000000000000000(0000) 
GS:ffff880028380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000010bf0f0 CR3: 0000000001a85000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process flush-8:0 (pid: 483, threadinfo ffff880430734000, task ffff880430a16ae0) Stack: ffffea000ea53130 ffffea000000000d ffff880430735970 ffff88042df88de8 <d> 0000000000000000 ffff880430735a00 0000000000000000 ffff88042df88de0 <d> 0000000000008000 ffff880430735a00 ffff8804307359b0 ffffffff8112a965 Call Trace: [<ffffffff8112a965>] ? pagevec_lookup_tag+0x25/0x40 [<ffffffffa0083f9a>] ? ext4_num_dirty_pages+0xda/0x260 [ext4] [<ffffffff811b34a0>] ? blkdev_get_block+0x0/0x70 [<ffffffff811af820>] ? block_write_full_page_endio+0xe0/0x120 [<ffffffff8112b7e6>] ? __pagevec_release+0x26/0x40 [<ffffffffa0088906>] ? ext4_da_writepages+0x416/0x620 [ext4] [<ffffffff81271a29>] ? cpumask_next_and+0x29/0x50 [<ffffffff81056a64>] ? find_busiest_group+0x244/0x9f0 [<ffffffff81129b11>] ? do_writepages+0x21/0x40 [<ffffffff811a513d>] ? writeback_single_inode+0xdd/0x2c0 [<ffffffff811a557e>] ? writeback_sb_inodes+0xce/0x180 [<ffffffff811a56db>] ? writeback_inodes_wb+0xab/0x1b0 [<ffffffff811a5a7b>] ? wb_writeback+0x29b/0x3f0 [<ffffffff814fd960>] ? thread_return+0x4e/0x76e [<ffffffff8107eb42>] ? del_timer_sync+0x22/0x30 [<ffffffff811a5d69>] ? wb_do_writeback+0x199/0x240 [<ffffffff811a5e73>] ? bdi_writeback_task+0x63/0x1b0 [<ffffffff81091f97>] ? bit_waitqueue+0x17/0xd0 [<ffffffff81138770>] ? bdi_start_fn+0x0/0x100 [<ffffffff811387f6>] ? bdi_start_fn+0x86/0x100 [<ffffffff81138770>] ? bdi_start_fn+0x0/0x100 [<ffffffff81091d66>] ? kthread+0x96/0xa0 [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [<ffffffff81091cd0>] ? kthread+0x0/0xa0 [<ffffffff8100c140>] ? 
child_rip+0x0/0x20 Code: 7d c8 48 89 de 45 89 e8 44 89 f1 e8 10 41 16 00 85 c0 89 c6 0f 84 b0 00 00 00 49 89 df 31 d2 31 c9 0f 1f 80 00 00 00 00 49 8b 07 <48> 8b 38 40 f6 c7 01 75 c6 48 85 ff 74 3c 48 83 ff ff 74 bb 44 Call Trace: [<ffffffff8112a965>] ? pagevec_lookup_tag+0x25/0x40 [<ffffffffa0083f9a>] ? ext4_num_dirty_pages+0xda/0x260 [ext4] [<ffffffff811b34a0>] ? blkdev_get_block+0x0/0x70 [<ffffffff811af820>] ? block_write_full_page_endio+0xe0/0x120 [<ffffffff8112b7e6>] ? __pagevec_release+0x26/0x40 [<ffffffffa0088906>] ? ext4_da_writepages+0x416/0x620 [ext4] [<ffffffff81271a29>] ? cpumask_next_and+0x29/0x50 [<ffffffff81056a64>] ? find_busiest_group+0x244/0x9f0 [<ffffffff81129b11>] ? do_writepages+0x21/0x40 [<ffffffff811a513d>] ? writeback_single_inode+0xdd/0x2c0 [<ffffffff811a557e>] ? writeback_sb_inodes+0xce/0x180 [<ffffffff811a56db>] ? writeback_inodes_wb+0xab/0x1b0 [<ffffffff811a5a7b>] ? wb_writeback+0x29b/0x3f0 [<ffffffff814fd960>] ? thread_return+0x4e/0x76e [<ffffffff8107eb42>] ? del_timer_sync+0x22/0x30 [<ffffffff811a5d69>] ? wb_do_writeback+0x199/0x240 [<ffffffff811a5e73>] ? bdi_writeback_task+0x63/0x1b0 [<ffffffff81091f97>] ? bit_waitqueue+0x17/0xd0 [<ffffffff81138770>] ? bdi_start_fn+0x0/0x100 [<ffffffff811387f6>] ? bdi_start_fn+0x86/0x100 [<ffffffff81138770>] ? bdi_start_fn+0x0/0x100 [<ffffffff81091d66>] ? kthread+0x96/0xa0 [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [<ffffffff81091cd0>] ? kthread+0x0/0xa0 [<ffffffff8100c140>] ? child_rip+0x0/0x20 BUG: soft lockup - CPU#1 stuck for 67s! 
[rhsmcertd-worke:3212] Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] CPU 1 Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 3212, comm: rhsmcertd-worke Tainted: G B --------------- 2.6.32-279.2.1.el6.x86_64 #1 SeaMicro SM10000-XE/Type2 - Board Product Name1 RIP: 0010:[<ffffffff81277170>] [<ffffffff81277170>] radix_tree_lookup_slot+0x0/0x70 RSP: 0000:ffff880432e03bb0 EFLAGS: 00000246 RAX: ffffea000e990ac7 RBX: ffff880432e03bc8 RCX: ffff880435fa6708 RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88042df88de8 RBP: ffffffff8100bc0e R08: 0000000000000002 R09: 0000000000000028 R10: ffff88042df88de0 R11: 0000000000000002 R12: ffff880432e03ba8 R13: ffffffff8100bc0e R14: ffff880000038b08 R15: 0000000000000000 FS: 00007f81d7667700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f81d082e000 CR3: 000000042f54a000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rhsmcertd-worke (pid: 3212, threadinfo ffff880432e02000, task ffff8804307d1540) Stack: ffffffff81113ffe ffff88042f580300 ffff880432e03c78 ffff880432e03c38 <d> ffffffff8111550b ffff8804307d1540 ffff88042df88de0 0000000000000002 <d> 0000002881127061 ffff88042fe49380 ffff88042f580370 ffffea000e976c40 Call Trace: [<ffffffff81113ffe>] ? find_get_page+0x1e/0xa0 [<ffffffff8111550b>] ? filemap_fault+0x8b/0x500 [<ffffffff8113ed44>] ? __do_fault+0x54/0x510 [<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50 [<ffffffff8115c30a>] ? 
alloc_pages_current+0xaa/0x110 [<ffffffff81048ac7>] ? pte_alloc_one+0x37/0x50 [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0 [<ffffffff81044479>] ? __do_page_fault+0x139/0x480 [<ffffffff810d358d>] ? audit_filter_rules+0x2d/0xdd0 [<ffffffff81145fca>] ? do_mmap_pgoff+0x33a/0x380 [<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0 [<ffffffff81500755>] ? page_fault+0x25/0x30 Code: f6 48 c7 c2 20 e5 fc 81 e8 5e f5 00 00 85 c0 74 dc 4c 89 e7 89 45 e8 e8 ff f4 00 00 8b 45 e8 48 83 c4 10 5b 41 5c c9 c3 90 90 90 <55> 48 8d 47 08 48 8b 7f 08 48 89 e5 48 85 ff 74 54 40 f6 c7 01 Call Trace: [<ffffffff81113ffe>] ? find_get_page+0x1e/0xa0 [<ffffffff8111550b>] ? filemap_fault+0x8b/0x500 [<ffffffff8113ed44>] ? __do_fault+0x54/0x510 [<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50 [<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110 [<ffffffff81048ac7>] ? pte_alloc_one+0x37/0x50 [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0 [<ffffffff81044479>] ? __do_page_fault+0x139/0x480 [<ffffffff810d358d>] ? audit_filter_rules+0x2d/0xdd0 [<ffffffff81145fca>] ? do_mmap_pgoff+0x33a/0x380 [<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0 [<ffffffff81500755>] ? page_fault+0x25/0x30 BUG: soft lockup - CPU#3 stuck for 67s! 
[sosreport:3171] Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] CPU 3 Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 3171, comm: sosreport Tainted: G B --------------- 2.6.32-279.2.1.el6.x86_64 #1 SeaMicro SM10000-XE/Type2 - Board Product Name1 RIP: 0010:[<ffffffff812771c1>] [<ffffffff812771c1>] radix_tree_lookup_slot+0x51/0x70 RSP: 0000:ffff8804326ebba8 EFLAGS: 00000282 RAX: ffff880435fa6708 RBX: ffff8804326ebba8 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffffea000e990ac8 RBP: ffffffff8100bc0e R08: 0000000000000002 R09: 0000000000000028 R10: ffff88042df88de0 R11: 0000000000000002 R12: ffff880000038b08 R13: 0000000000000000 R14: 00000040ffffffff R15: 0000000036381370 FS: 00007f685882b700(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6852045000 CR3: 00000004335e0000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process sosreport (pid: 3171, threadinfo ffff8804326ea000, task ffff880432559500) Stack: ffff8804326ebbc8 ffffffff81113ffe ffff88043319ef00 ffff8804326ebc78 <d> ffff8804326ebc38 ffffffff8111550b ffff880432559500 ffff88042df88de0 <d> 0000000000000002 0000002881127061 ffff8804326ebc18 ffff88043319ef70 Call Trace: [<ffffffff81113ffe>] ? find_get_page+0x1e/0xa0 [<ffffffff8111550b>] ? filemap_fault+0x8b/0x500 [<ffffffff8113ed44>] ? __do_fault+0x54/0x510 [<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50 [<ffffffff8115c30a>] ? 
alloc_pages_current+0xaa/0x110 [<ffffffff81048ac7>] ? pte_alloc_one+0x37/0x50 [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0 [<ffffffff81044479>] ? __do_page_fault+0x139/0x480 [<ffffffff810d358d>] ? audit_filter_rules+0x2d/0xdd0 [<ffffffff81145fca>] ? do_mmap_pgoff+0x33a/0x380 [<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0 [<ffffffff81500755>] ? page_fault+0x25/0x30 Code: 81 77 3c 8d 0c 52 8d 4c 09 fa eb 09 66 0f 1f 44 00 00 83 e9 06 48 89 f0 48 d3 e8 83 e0 3f 48 8d 44 c7 18 48 8b 38 48 85 ff 74 14 <83> ea 01 75 e2 c9 c3 0f 1f 84 00 00 00 00 00 48 85 f6 74 f1 31 Call Trace: [<ffffffff81113ffe>] ? find_get_page+0x1e/0xa0 [<ffffffff8111550b>] ? filemap_fault+0x8b/0x500 [<ffffffff8113ed44>] ? __do_fault+0x54/0x510 [<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50 [<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110 [<ffffffff81048ac7>] ? pte_alloc_one+0x37/0x50 [<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0 [<ffffffff81044479>] ? __do_page_fault+0x139/0x480 [<ffffffff810d358d>] ? audit_filter_rules+0x2d/0xdd0 [<ffffffff81145fca>] ? do_mmap_pgoff+0x33a/0x380 [<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0 [<ffffffff81500755>] ? page_fault+0x25/0x30 BUG: soft lockup - CPU#6 stuck for 67s! 
[flush-8:0:483] Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] CPU 6 Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 483, comm: flush-8:0 Tainted: G B --------------- 2.6.32-279.2.1.el6.x86_64 #1 SeaMicro SM10000-XE/Type2 - Board Product Name1 RIP: 0010:[<ffffffff81113d24>] [<ffffffff81113d24>] find_get_pages_tag+0x64/0x120 RSP: 0018:ffff880430735940 EFLAGS: 00000246 RAX: ffff880435fa6708 RBX: ffff880430735990 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffea000e990ac8 RBP: ffffffff8100bc0e R08: 0000000000000000 R09: 0000000000000002 R10: 000000000000000e R11: ffff880435fa68f8 R12: ffffffff81faff00 R13: 0000000000000000 R14: ffffffff81faff00 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880028380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000010bf0f0 CR3: 0000000001a85000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process flush-8:0 (pid: 483, threadinfo ffff880430734000, task ffff880430a16ae0) Stack: ffffea000ea53130 ffffea000000000d ffff880430735970 ffff88042df88de8 <d> 0000000000000000 ffff880430735a00 0000000000000000 ffff88042df88de0 <d> 0000000000008000 ffff880430735a00 ffff8804307359b0 ffffffff8112a965 Call Trace: [<ffffffff8112a965>] ? pagevec_lookup_tag+0x25/0x40 [<ffffffffa0083f9a>] ? ext4_num_dirty_pages+0xda/0x260 [ext4] [<ffffffff811b34a0>] ? blkdev_get_block+0x0/0x70 [<ffffffff811af820>] ? block_write_full_page_endio+0xe0/0x120 [<ffffffff8112b7e6>] ? 
__pagevec_release+0x26/0x40 [<ffffffffa0088906>] ? ext4_da_writepages+0x416/0x620 [ext4] [<ffffffff81271a29>] ? cpumask_next_and+0x29/0x50 [<ffffffff81056a64>] ? find_busiest_group+0x244/0x9f0 [<ffffffff81129b11>] ? do_writepages+0x21/0x40 [<ffffffff811a513d>] ? writeback_single_inode+0xdd/0x2c0 [<ffffffff811a557e>] ? writeback_sb_inodes+0xce/0x180 [<ffffffff811a56db>] ? writeback_inodes_wb+0xab/0x1b0 [<ffffffff811a5a7b>] ? wb_writeback+0x29b/0x3f0 [<ffffffff814fd960>] ? thread_return+0x4e/0x76e [<ffffffff8107eb42>] ? del_timer_sync+0x22/0x30 [<ffffffff811a5d69>] ? wb_do_writeback+0x199/0x240 [<ffffffff811a5e73>] ? bdi_writeback_task+0x63/0x1b0 [<ffffffff81091f97>] ? bit_waitqueue+0x17/0xd0 [<ffffffff81138770>] ? bdi_start_fn+0x0/0x100 [<ffffffff811387f6>] ? bdi_start_fn+0x86/0x100 [<ffffffff81138770>] ? bdi_start_fn+0x0/0x100 [<ffffffff81091d66>] ? kthread+0x96/0xa0 [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [<ffffffff81091cd0>] ? kthread+0x0/0xa0 [<ffffffff8100c140>] ? child_rip+0x0/0x20 Code: 89 f1 e8 10 41 16 00 85 c0 89 c6 0f 84 b0 00 00 00 49 89 df 31 d2 31 c9 0f 1f 80 00 00 00 00 49 8b 07 48 8b 38 40 f6 c7 01 75 c6 <48> 85 ff 74 3c 48 83 ff ff 74 bb 44 8b 47 08 45 85 c0 74 e3 45 Call Trace: [<ffffffff8112a965>] ? pagevec_lookup_tag+0x25/0x40 [<ffffffffa0083f9a>] ? ext4_num_dirty_pages+0xda/0x260 [ext4] [<ffffffff811b34a0>] ? blkdev_get_block+0x0/0x70 [<ffffffff811af820>] ? block_write_full_page_endio+0xe0/0x120 [<ffffffff8112b7e6>] ? __pagevec_release+0x26/0x40 [<ffffffffa0088906>] ? ext4_da_writepages+0x416/0x620 [ext4] [<ffffffff81271a29>] ? cpumask_next_and+0x29/0x50 [<ffffffff81056a64>] ? find_busiest_group+0x244/0x9f0 [<ffffffff81129b11>] ? do_writepages+0x21/0x40 [<ffffffff811a513d>] ? writeback_single_inode+0xdd/0x2c0 [<ffffffff811a557e>] ? writeback_sb_inodes+0xce/0x180 [<ffffffff811a56db>] ? writeback_inodes_wb+0xab/0x1b0 [<ffffffff811a5a7b>] ? wb_writeback+0x29b/0x3f0 [<ffffffff814fd960>] ? thread_return+0x4e/0x76e [<ffffffff8107eb42>] ? 
del_timer_sync+0x22/0x30 [<ffffffff811a5d69>] ? wb_do_writeback+0x199/0x240 [<ffffffff811a5e73>] ? bdi_writeback_task+0x63/0x1b0 [<ffffffff81091f97>] ? bit_waitqueue+0x17/0xd0 [<ffffffff81138770>] ? bdi_start_fn+0x0/0x100 [<ffffffff811387f6>] ? bdi_start_fn+0x86/0x100 [<ffffffff81138770>] ? bdi_start_fn+0x0/0x100 [<ffffffff81091d66>] ? kthread+0x96/0xa0 [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [<ffffffff81091cd0>] ? kthread+0x0/0xa0 [<ffffffff8100c140>] ? child_rip+0x0/0x20 BUG: soft lockup - CPU#1 stuck for 66s! [rhsmcertd-worke:3212] Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] CPU 1 Modules linked in: bridge bonding 8021q garp stp llc ipv6 microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 3212, comm: rhsmcertd-worke Tainted: G B --------------- 2.6.32-279.2.1.el6.x86_64 #1 SeaMicro SM10000-XE/Type2 - Board Product Name1 RIP: 0010:[<ffffffff812771ae>] [<ffffffff812771ae>] radix_tree_lookup_slot+0x3e/0x70 RSP: 0000:ffff880432e03ba8 EFLAGS: 00000297 RAX: 0000000000000002 RBX: ffff880432e03ba8 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff880435fa66e0 RBP: ffffffff8100bc0e R08: 0000000000000002 R09: 0000000000000028 R10: ffff88042df88de0 R11: 0000000000000002 R12: ffff880000038b08 R13: 0000000000000000 R14: 00000040ffffffff R15: 00000000361cb570 FS: 00007f81d7667700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f81d082e000 CR3: 000000042f54a000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rhsmcertd-worke (pid: 3212, 
threadinfo ffff880432e02000, task ffff8804307d1540)
Stack:
ffff880432e03bc8 ffffffff81113ffe ffff88042f580300 ffff880432e03c78
<d> ffff880432e03c38 ffffffff8111550b ffff8804307d1540 ffff88042df88de0
<d> 0000000000000002 0000002881127061 ffff88042fe49380 ffff88042f580370
Call Trace:
[<ffffffff81113ffe>] ? find_get_page+0x1e/0xa0
[<ffffffff8111550b>] ? filemap_fault+0x8b/0x500
[<ffffffff8113ed44>] ? __do_fault+0x54/0x510
[<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50
[<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110
[<ffffffff81048ac7>] ? pte_alloc_one+0x37/0x50
[<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0
[<ffffffff81044479>] ? __do_page_fault+0x139/0x480
[<ffffffff810d358d>] ? audit_filter_rules+0x2d/0xdd0
[<ffffffff81145fca>] ? do_mmap_pgoff+0x33a/0x380
[<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0
[<ffffffff81500755>] ? page_fault+0x25/0x30
Code: c7 01 74 49 48 83 e7 fe 8b 17 89 d0 48 3b 34 c5 a0 57 c0 81 77 3c 8d 0c 52 8d 4c 09 fa eb 09 66 0f 1f 44 00 00 83 e9 06 48 89 f0 <48> d3 e8 83 e0 3f 48 8d 44 c7 18 48 8b 38 48 85 ff 74 14 83 ea
Call Trace:
[<ffffffff81113ffe>] ? find_get_page+0x1e/0xa0
[<ffffffff8111550b>] ? filemap_fault+0x8b/0x500
[<ffffffff8113ed44>] ? __do_fault+0x54/0x510
[<ffffffff8113f2f7>] ? handle_pte_fault+0xf7/0xb50
[<ffffffff8115c30a>] ? alloc_pages_current+0xaa/0x110
[<ffffffff81048ac7>] ? pte_alloc_one+0x37/0x50
[<ffffffff8113ff34>] ? handle_mm_fault+0x1e4/0x2b0
[<ffffffff81044479>] ? __do_page_fault+0x139/0x480
[<ffffffff810d358d>] ? audit_filter_rules+0x2d/0xdd0
[<ffffffff81145fca>] ? do_mmap_pgoff+0x33a/0x380
[<ffffffff8150339e>] ? do_page_fault+0x3e/0xa0
[<ffffffff81500755>] ? page_fault+0x25/0x30

After talking to the SeaMicro tech, we'll re-image the node and see if the problem persists.
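Since the console dump repeats the same few soft lockups many times, a quick tally per CPU and task can make the pattern easier to see. The following is a rough sketch, not a tool anyone on this bug used: `summarize_soft_lockups` and the regular expression are illustrative, and they assume the `BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]` line format seen in the dump above.

```python
import re
from collections import Counter

# Matches reports such as:
#   BUG: soft lockup - CPU#6 stuck for 67s! [flush-8:0:483]
# The task name can itself contain colons ("flush-8:0"), so the PID is
# taken as the digits after the last colon inside the brackets.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(text):
    """Count soft-lockup reports per (cpu, comm, pid) in a log dump."""
    counts = Counter()
    for m in SOFT_LOCKUP.finditer(text):
        counts[(int(m["cpu"]), m["comm"], int(m["pid"]))] += 1
    return counts


if __name__ == "__main__":
    # The four syslogd broadcast lines from this comment.
    sample = (
        "kernel:BUG: soft lockup - CPU#6 stuck for 67s! [flush-8:0:483] "
        "kernel:BUG: soft lockup - CPU#1 stuck for 67s! [rhsmcertd-worke:3212] "
        "kernel:BUG: soft lockup - CPU#3 stuck for 67s! [sosreport:3171] "
        "kernel:BUG: soft lockup - CPU#6 stuck for 67s! [flush-8:0:483]"
    )
    for (cpu, comm, pid), n in summarize_soft_lockups(sample).most_common():
        print(f"CPU#{cpu} {comm} (pid {pid}): {n}")
```

The pattern deliberately works on flattened text (several reports on one line), because that is how the dump was pasted here.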
Assignee
Comment 8•12 years ago
Same issues. Still discussing with SeaMicro to see what to do next. They said, "We have not tested RHEL6.3 in our lab and qualified it," but I have replied that it's ridiculous to assume it's a RHEL issue when we have over 100 nodes running 6.3 with no problems. I'll try assigning a different disk to this node, to isolate the issue.
Assignee
Comment 10•12 years ago
Server crashed again with the new disk. Sending this info to SeaMicro, but it looks like the c-card is busted.
Assignee
Comment 11•12 years ago
We swapped the c-card. Keeping an eye on it now.
QA Contact: jdow → shyam
Assignee
Comment 12•12 years ago
Closing out. Server decommissioned per bug 805945.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard