Closed
Bug 779978
Opened 13 years ago
Closed 13 years ago
hg pull corrupted repository Mon Jul 30 - why
Categories
(Developer Services :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: u429623, Unassigned)
Details
(Whiteboard: [reit])
Attachments
(2 files)
One of the mercurial repositories used for conversion to git became corrupted, and needed to be re-cloned. This has been noted in the past with other repositories - this time we have logs!
The top likely causes include:
a) some client host event caused corruption during 'hg pull'
b) some server host event caused corrupted data to be sent during 'hg pull'
c) some client host event caused corruption during 'hg gexport'
d) bug in hggit, causing corruption of mercurial repo during 'hg gexport'
I don't have access to any of the logs needed to rule out the first three. (d) seems unlikely given the widespread usage of that tool. This bug is to gather the data needed to identify a possible event.
Details for this repo:
- client host: github-sync1.dmz.releng.scl3.mozilla.com
- repo location: /home/hwine/converted/integration-mozilla-inbound.bad
- pull URL: http://hg.mozilla.org/integration/mozilla-inbound
Details for this incident:
- last successful pull:                       2012-07-30T07:05:51-0700
- prior gexport (reported good, but suspect): 2012-07-30T07:06:00-0700
- last pull (reported good, but suspect):     2012-07-30T07:16:22-0700
- first bad gexport:                          2012-07-30T07:29:18-0700
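The cycle behind these timestamps is an 'hg pull' followed by an 'hg gexport'. As a rough sketch of that cycle (the wrapper script itself is an assumption; only the repo path and the two hg commands are taken from this bug):

#!/bin/bash
# Sketch of the hg-to-git sync step described in this bug. The wrapper is
# hypothetical; 'hg pull' and 'hg gexport' are the commands actually involved.
set -e
cd /home/hwine/converted/integration-mozilla-inbound
# Fetch new changesets from the canonical Mercurial repository.
hg pull http://hg.mozilla.org/integration/mozilla-inbound
# Export the new changesets to the embedded git repo (hggit extension).
hg gexport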
"Application" log will be attached
Correct the component
Assignee: nobody → server-ops-devservices
Component: Release Engineering: Developer Tools → Server Operations: Developer Services
QA Contact: hwine → shyam
Repository is considered broken based on reports from both git & hg:
0 [hwine@github-sync1 integration-mozilla-inbound]
$ hg gexport
exporting hg objects to git
abort: data/gfx/layers/ReadbackLayer.h.i@a55b1c98d5db: no match found!
255 [hwine@github-sync1 integration-mozilla-inbound]
$ PAGE=yes timeit hg --cwd $PWD verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
gfx/layers/ReadbackLayer.h@100870: a55b1c98d5db in manifests not found
gfx/layers/ReadbackProcessor.cpp@100870: 56e58b690f97 in manifests not found
gfx/layers/ReadbackProcessor.h@100870: 8b596f1527c5 in manifests not found
gfx/layers/ThebesLayerBuffer.cpp@100870: a62dbf7635f1 in manifests not found
gfx/layers/ThebesLayerBuffer.h@100870: c1032f32997b in manifests not found
102548 files, 101147 changesets, 515364 total revisions
5 integrity errors encountered!
(first damaged changeset appears to be 100870)
1 [hwine@github-sync1 integration-mozilla-inbound]
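As noted in the description, recovery from this state was a re-clone. A minimal sketch of that step, assuming the commands below (the '.bad' suffix mirrors the repo location given in comment #0):

#!/bin/bash
# Hypothetical recovery: set the damaged repo aside, re-clone, and verify.
set -e
cd /home/hwine/converted
mv integration-mozilla-inbound integration-mozilla-inbound.bad
hg clone http://hg.mozilla.org/integration/mozilla-inbound
hg --cwd integration-mozilla-inbound verify   # expect 0 integrity errors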
Comment 3•13 years ago
The mozilla-inbound data checks out today:
[root@hgssh1.dmz.scl3 mozilla-inbound]# date
Fri Aug 3 18:12:15 PDT 2012
[root@hgssh1.dmz.scl3 mozilla-inbound]# hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
102650 files, 101372 changesets, 516375 total revisions
These are the logs I've extracted from the servers for Monday, July 30th (with the various hung-process-killed chaff culled out; see the attachment if you'd like the unfiltered version).
[Mon Jul 30 08:45:14 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/connect.php, referer: http://bjfit.cn/home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_2819&touid=2819&pmid=0&daterange=2&pid=4451095&tid=920926
[Mon Jul 30 11:23:59 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/index.php, referer: http://gosec.cn/index.php/images/goods/20100717/images/goods/20100807/images/goods/20101009/article-35.html
[Mon Jul 30 11:41:38 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/index.php, referer: http://gosec.cn/index.php/images/goods/20100717/images/goods/20110606/images/goods/20110106/gallery-44.html
[Mon Jul 30 18:40:03 2012] [error] /usr/lib64/python2.6/site-packages/mercurial/archival.py:72: DeprecationWarning: use the name attribute
[Mon Jul 30 18:40:03 2012] [error] fname = getattr(self, 'name', None) or self.filename
[Mon Jul 30 18:40:26 2012] [error] Exception IOError: IOError('client connection closed',) in <bound method _Stream.__del__ of <tarfile._Stream instance at 0x7f336dac7440>> ignored
[Mon Jul 30 18:40:27 2012] [error] /usr/lib64/python2.6/site-packages/mercurial/archival.py:72: DeprecationWarning: use the name attribute
[Mon Jul 30 18:40:27 2012] [error] fname = getattr(self, 'name', None) or self.filename
[Mon Jul 30 18:40:38 2012] [error] Exception IOError: IOError('client connection closed',) in <bound method GzipFileWithTime.__del__ of <gzip mercurial.hgweb.request.wsgirequest object at 0x7f336bb14e10 0x7f336c578878>> ignored
[Mon Jul 30 08:38:27 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/admin
[Mon Jul 30 08:38:27 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/admin
[Mon Jul 30 18:40:26 2012] [error] /usr/lib64/python2.6/site-packages/mercurial/archival.py:72: DeprecationWarning: use the name attribute
[Mon Jul 30 18:40:26 2012] [error] fname = getattr(self, 'name', None) or self.filename
[Mon Jul 30 05:36:31 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/admin
[Mon Jul 30 06:25:54 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/connect.php, referer: http://www.bjfit.cn/home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_635&touid=635&pmid=0&daterange=2&pid=7478422&tid=1161258
[Mon Jul 30 06:26:03 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/connect.php, referer: http://www.bjfit.cn/home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_409532&touid=409532&pmid=0&daterange=2&pid=4920884&tid=944687
[Mon Jul 30 11:26:28 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/index.php, referer: http://gosec.cn/index.php/weln/me/statics/images/goods/20111204/themes/llyr/images/images/goods/20110910/images/goods/20110508/product-458.html
[Mon Jul 30 11:41:46 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/index.php, referer: http://gosec.cn/index.php/weln/me/statics/images/goods/20111218/gallery-27.html
[Mon Jul 30 11:50:36 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/index.php, referer: http://gosec.cn/index.php/weln/me/statics/themes/llyr/images/images/goods/20091208/images/goods/20100811/images/goods/20111207/images/goods/20100907/gallery-_ANY_-b,_ANY__t,_ANY__p,0-0---9.html
[Mon Jul 30 18:40:03 2012] [error] Exception IOError: IOError('client connection closed',) in <bound method _Stream.__del__ of <tarfile._Stream instance at 0x7f0fd8f0e1b8>> ignored
[Mon Jul 30 06:20:34 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/connect.php, referer: http://bjfit.cn/home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_591940&touid=591940&pmid=0&daterange=2&pid=7440638&tid=1159882
[Mon Jul 30 07:23:08 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/connect.php, referer: http://www.bjfit.cn/home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_7848&touid=7848&pmid=0&daterange=2&pid=4639236&tid=930412
[Mon Jul 30 11:40:16 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/index.php, referer: http://gosec.cn/
[Mon Jul 30 11:47:53 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/index.php, referer: http://gosec.cn/index.php/weln/me/statics/images/goods/20111218/images/goods/20100717/gallery-49.html
[Mon Jul 30 11:48:07 2012] [error] [client 10.22.74.208] File does not exist: /var/www/html/index.php, referer: http://gosec.cn/index.php/weln/me/statics/images/goods/20111204/themes/llyr/images/images/goods/20110910/images/goods/20091208/product-481.html
Comment 4•13 years ago
The above data means reason (b) in comment #0 is unlikely.
Do you have data for reasons (a) & (c)? (I don't have access to any logs on github-sync1.dmz.releng.scl3.mozilla.com).
Comment 6•13 years ago
The entries culled from my logs were client hangups, which could explain (a), though that would not be the server's fault. Hangups happen so often that this one would have been hidden among all the other traffic.
Logging into github-sync1.dmz.releng.scl3.mozilla.com I found this in the kernel logs:
INFO: task kjournald:447 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D 0000000000000000 0 447 2 0x00000080
ffff880079f93c50 0000000000000046 ffff880079f93c10 ffffffffa00041fc
ffff880079f93bc0 ffffffff81012b59 ffff880079f93c00 ffffffff8109b6a9
ffff880079d57ab8 ffff880079f93fd8 000000000000f4e8 ffff880079d57ab8
Call Trace:
[<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
[<ffffffff81012b59>] ? read_tsc+0x9/0x20
[<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814ed293>] io_schedule+0x73/0xc0
[<ffffffff811a9410>] sync_buffer+0x40/0x50
[<ffffffff814edc4f>] __wait_on_bit+0x5f/0x90
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814edcf8>] out_of_line_wait_on_bit+0x78/0x90
[<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
[<ffffffff811a93c6>] __wait_on_buffer+0x26/0x30
[<ffffffffa00b3fde>] journal_commit_transaction+0x9ee/0x1310 [jbd]
[<ffffffff8107bf8c>] ? lock_timer_base+0x3c/0x70
[<ffffffff8107ca1b>] ? try_to_del_timer_sync+0x7b/0xe0
[<ffffffffa00b9bb8>] kjournald+0xe8/0x250 [jbd]
[<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa00b9ad0>] ? kjournald+0x0/0x250 [jbd]
[<ffffffff81090726>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff81090690>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task master:1724 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
master D 0000000000000001 0 1724 1 0x00000084
ffff880037317968 0000000000000082 ffff880037317928 ffffffffa00041fc
ffff88000003cc60 0000000000000000 ffff880000000041 ffffffff8109b6a9
ffff880079d56638 ffff880037317fd8 000000000000f4e8 ffff880079d56638
Call Trace:
[<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
[<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814ed293>] io_schedule+0x73/0xc0
[<ffffffff811a9410>] sync_buffer+0x40/0x50
[<ffffffff814edafa>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814edbd8>] out_of_line_wait_on_bit_lock+0x78/0x90
[<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
[<ffffffff811a95b6>] __lock_buffer+0x36/0x40
[<ffffffffa00b32b3>] do_get_write_access+0x483/0x500 [jbd]
[<ffffffff811a8acc>] ? __getblk+0x2c/0x2e0
[<ffffffffa00b34c1>] journal_get_write_access+0x31/0x50 [jbd]
[<ffffffffa00efa8d>] __ext3_journal_get_write_access+0x2d/0x60 [ext3]
[<ffffffffa00d627b>] ext3_reserve_inode_write+0x7b/0xa0 [ext3]
[<ffffffffa00d62e8>] ext3_mark_inode_dirty+0x48/0xa0 [ext3]
[<ffffffffa00d64c1>] ext3_dirty_inode+0x61/0xa0 [ext3]
[<ffffffff8119fdfb>] __mark_inode_dirty+0x3b/0x160
[<ffffffff81190372>] file_update_time+0xf2/0x170
[<ffffffff811800b2>] pipe_write+0x2d2/0x650
[<ffffffff8117628a>] do_sync_write+0xfa/0x140
[<ffffffff81186cb2>] ? user_path_at+0x62/0xa0
[<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8117b664>] ? cp_new_stat+0xe4/0x100
[<ffffffff81012b59>] ? read_tsc+0x9/0x20
[<ffffffff8120c1e6>] ? security_file_permission+0x16/0x20
[<ffffffff81176588>] vfs_write+0xb8/0x1a0
[<ffffffff81176f91>] sys_write+0x51/0x90
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task puppet:19792 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
puppet D 0000000000000001 0 19792 19787 0x00000080
ffff88003728bb18 0000000000000082 0000000000000000 ffffffffa00041fc
ffff880000000041 ffffffff81042c24 ffff880000033b48 0000000000000001
ffff880037a670b8 ffff88003728bfd8 000000000000f4e8 ffff880037a670b8
Call Trace:
[<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
[<ffffffff81042c24>] ? __do_page_fault+0x1e4/0x480
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814ed293>] io_schedule+0x73/0xc0
[<ffffffff811a9410>] sync_buffer+0x40/0x50
[<ffffffff814edafa>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff811a842f>] ? __find_get_block_slow+0xaf/0x130
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814edbd8>] out_of_line_wait_on_bit_lock+0x78/0x90
[<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
[<ffffffff81090957>] ? bit_waitqueue+0x17/0xd0
[<ffffffff811a95b6>] __lock_buffer+0x36/0x40
[<ffffffffa00b25c3>] journal_invalidatepage+0x1b3/0x2b0 [jbd]
[<ffffffffa00d52ea>] ext3_invalidatepage+0x5a/0xa0 [ext3]
[<ffffffff811a8acc>] ? __getblk+0x2c/0x2e0
[<ffffffff81128245>] do_invalidatepage+0x25/0x30
[<ffffffff81128462>] truncate_inode_page+0xa2/0xc0
[<ffffffff81128760>] truncate_inode_pages_range+0x160/0x460
[<ffffffff810519c3>] ? __wake_up+0x53/0x70
[<ffffffffa00b1786>] ? journal_stop+0x1e6/0x2c0 [jbd]
[<ffffffff814edfee>] ? mutex_lock+0x1e/0x50
[<ffffffffa00d9560>] ? ext3_delete_inode+0x0/0x140 [ext3]
[<ffffffff81128a75>] truncate_inode_pages+0x15/0x20
[<ffffffffa00d957d>] ext3_delete_inode+0x1d/0x140 [ext3]
[<ffffffff811915ee>] generic_delete_inode+0xde/0x1d0
[<ffffffff81191745>] generic_drop_inode+0x65/0x80
[<ffffffff811905c2>] iput+0x62/0x70
[<ffffffff81186502>] do_unlinkat+0x112/0x1c0
[<ffffffff81141688>] ? do_munmap+0x308/0x3a0
[<ffffffff811865c6>] sys_unlink+0x16/0x20
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
sd 2:0:0:0: [sda] task abort on host 2, ffff880079c9f480
sd 2:0:0:0: [sda] task abort on host 2, ffff88003752fd80
INFO: task kjournald:447 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D 0000000000000000 0 447 2 0x00000080
ffff880079f93c50 0000000000000046 ffff880079f93c10 ffffffffa00041fc
ffff880079f93bc0 ffffffff81012b59 ffff880079f93c00 ffffffff8109b6a9
ffff880079d57ab8 ffff880079f93fd8 000000000000f4e8 ffff880079d57ab8
Call Trace:
[<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
[<ffffffff81012b59>] ? read_tsc+0x9/0x20
[<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff8109b6a9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814ed293>] io_schedule+0x73/0xc0
[<ffffffff811a9410>] sync_buffer+0x40/0x50
[<ffffffff814edc4f>] __wait_on_bit+0x5f/0x90
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814edcf8>] out_of_line_wait_on_bit+0x78/0x90
[<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
[<ffffffff811a93c6>] __wait_on_buffer+0x26/0x30
[<ffffffffa00b3beb>] journal_commit_transaction+0x5fb/0x1310 [jbd]
[<ffffffff8107bf8c>] ? lock_timer_base+0x3c/0x70
[<ffffffff8107ca1b>] ? try_to_del_timer_sync+0x7b/0xe0
[<ffffffffa00b9bb8>] kjournald+0xe8/0x250 [jbd]
[<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa00b9ad0>] ? kjournald+0x0/0x250 [jbd]
[<ffffffff81090726>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff81090690>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
INFO: task master:1724 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
master D 0000000000000001 0 1724 1 0x00000084
ffff880037317968 0000000000000082 0000000000000000 ffffffffa00041fc
ffff880037317928 ffffffff81053054 ffff880037317908 ffffffff8106208b
ffff880079d56638 ffff880037317fd8 000000000000f4e8 ffff880079d56638
Call Trace:
[<ffffffffa00041fc>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
[<ffffffff81053054>] ? check_preempt_wakeup+0x1a4/0x260
[<ffffffff8106208b>] ? enqueue_task_fair+0xfb/0x100
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814ed293>] io_schedule+0x73/0xc0
[<ffffffff811a9410>] sync_buffer+0x40/0x50
[<ffffffff814edafa>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff811a93d0>] ? sync_buffer+0x0/0x50
[<ffffffff814edbd8>] out_of_line_wait_on_bit_lock+0x78/0x90
[<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
[<ffffffff811a95b6>] __lock_buffer+0x36/0x40
[<ffffffffa00b32b3>] do_get_write_access+0x483/0x500 [jbd]
[<ffffffff811a8acc>] ? __getblk+0x2c/0x2e0
[<ffffffffa00b34c1>] journal_get_write_access+0x31/0x50 [jbd]
[<ffffffffa00efa8d>] __ext3_journal_get_write_access+0x2d/0x60 [ext3]
[<ffffffffa00d627b>] ext3_reserve_inode_write+0x7b/0xa0 [ext3]
[<ffffffffa00d62e8>] ext3_mark_inode_dirty+0x48/0xa0 [ext3]
[<ffffffffa00d64c1>] ext3_dirty_inode+0x61/0xa0 [ext3]
[<ffffffff8119fdfb>] __mark_inode_dirty+0x3b/0x160
[<ffffffff81190372>] file_update_time+0xf2/0x170
[<ffffffff811800b2>] pipe_write+0x2d2/0x650
[<ffffffff8117628a>] do_sync_write+0xfa/0x140
[<ffffffff81186cb2>] ? user_path_at+0x62/0xa0
[<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8117b664>] ? cp_new_stat+0xe4/0x100
[<ffffffff81012b59>] ? read_tsc+0x9/0x20
[<ffffffff8120c1e6>] ? security_file_permission+0x16/0x20
[<ffffffff81176588>] vfs_write+0xb8/0x1a0
[<ffffffff81176f91>] sys_write+0x51/0x90
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task nscd:2521 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nscd D 0000000000000000 0 2521 1 0x00000080
ffff880037acbe08 0000000000000086 0000000000000000 ffff8800252e3438
ffff880037a80e00 ffffffff8120d34f ffff880037acbd98 ffff88007a43f200
ffff88007aa0b078 ffff880037acbfd8 000000000000f4e8 ffff88007aa0b078
Call Trace:
[<ffffffff8120d34f>] ? security_inode_permission+0x1f/0x30
[<ffffffff814ef065>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff814ef1c3>] rwsem_down_write_failed+0x23/0x30
[<ffffffff81276eb3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff814ee6c2>] ? down_write+0x32/0x40
[<ffffffff81131ddc>] sys_mmap_pgoff+0x5c/0x2d0
[<ffffffff81010469>] sys_mmap+0x29/0x30
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task nscd:2523 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nscd D 0000000000000000 0 2523 1 0x00000080
ffff880037639988 0000000000000086 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
ffff880037233038 ffff880037639fd8 000000000000f4e8 ffff880037233038
Call Trace:
[<ffffffffa00b30f5>] do_get_write_access+0x2c5/0x500 [jbd]
[<ffffffff81090ad0>] ? wake_bit_function+0x0/0x50
[<ffffffffa00b34c1>] journal_get_write_access+0x31/0x50 [jbd]
[<ffffffffa00efa8d>] __ext3_journal_get_write_access+0x2d/0x60 [ext3]
[<ffffffffa00d627b>] ext3_reserve_inode_write+0x7b/0xa0 [ext3]
[<ffffffffa00d62e8>] ext3_mark_inode_dirty+0x48/0xa0 [ext3]
[<ffffffffa00d64c1>] ext3_dirty_inode+0x61/0xa0 [ext3]
[<ffffffff8119fdfb>] __mark_inode_dirty+0x3b/0x160
[<ffffffff81190372>] file_update_time+0xf2/0x170
[<ffffffff811a9128>] ? __set_page_dirty_buffers+0x88/0xc0
[<ffffffff8113ad07>] do_wp_page+0x3b7/0x8d0
[<ffffffff814925ea>] ? inet_recvmsg+0x5a/0x90
[<ffffffff8141a123>] ? sock_recvmsg+0x133/0x160
[<ffffffff8113b9fd>] handle_pte_fault+0x2cd/0xb50
[<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8113c464>] handle_mm_fault+0x1e4/0x2b0
[<ffffffff81042b79>] __do_page_fault+0x139/0x480
[<ffffffff8118d1cf>] ? __d_free+0x3f/0x60
[<ffffffff8118d248>] ? d_free+0x58/0x60
[<ffffffff81195740>] ? mntput_no_expire+0x30/0x110
[<ffffffff81177ee1>] ? __fput+0x1a1/0x210
[<ffffffff814f253e>] do_page_fault+0x3e/0xa0
[<ffffffff814ef8f5>] page_fault+0x25/0x30
Perhaps the disk dropped out from under it for a short while. Looping lerxst into this in case he has some comment (this is an ESX VM).
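One way to check whether these stalls overlap the corruption window from comment #0 (roughly 07:05-07:29 on Jul 30) is to pull the relevant kernel messages and compare timestamps; the log path below assumes a RHEL-style /var/log/messages:

# Hung-task and SCSI task-abort messages for Jul 30, for comparison
# against the pull/gexport timeline in comment #0.
grep -hE 'blocked for more than 120 seconds|task abort' /var/log/messages* | grep 'Jul 30'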
Comment 7•13 years ago
It's possible this was caused by an issue we had with Storage VMotions. Unfortunately, I don't have an exact date & time for when that happened, so I can't tell whether it overlapped with this incident or not.
Comment 8•13 years ago
Dan - I'm not familiar enough with VMotions to know whether they are a "normal event" or an exceptional one. Also, is that unique to vmdk's, or would something similar have happened to NFS mounts at the same time?
The difference on my end is how fast we work to get off the VM (we are slowly moving to real hardware, but have much larger data on NFS disk).
Thanks!
Comment 9•13 years ago
It was an extremely abnormal event, and one we now know how to avoid. Storage VMotions are also distinct from regular VMotions (the former changes the datastore a VM is running on but leaves it on the same host). The issue comes from running more than 2 Storage VMotions simultaneously; it's a bug in vSphere. Note that while regular VMotions can be triggered automatically by the system to balance load, we do not (and will never) enable automatic Storage VMotions.
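Since the trigger is concurrency rather than any single migration, one generic mitigation is to funnel migrations through a pair of lock slots so no more than two ever run at once. A sketch under stated assumptions ('migrate-datastore' is a hypothetical stand-in for whatever actually starts a Storage VMotion; flock is the real util-linux tool):

#!/bin/bash
# Hypothetical concurrency guard: two lock slots mean at most two storage
# migrations can run at the same time.
for slot in 1 2; do
  exec 9>"/var/lock/svmotion.$slot"
  if flock -n 9; then
    migrate-datastore "$1"   # lock on fd 9 is held until the script exits
    exit $?
  fi
  exec 9>&-                  # slot busy; close the fd and try the next one
done
# Both slots are busy: block on slot 1 until a running migration finishes.
exec 9>"/var/lock/svmotion.1"
flock 9
migrate-datastore "$1"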
Reporter
Comment 10•13 years ago
Dan - thanks for the explanation.
That answers all the questions I have - thanks Ben & Dan.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services