Closed
Bug 1029206
Opened 10 years ago
Closed 10 years ago
vertica4.metrics.scl3 is reporting as Down
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mpressman, Assigned: mpressman)
Details
[dbadmin@vertica4.metrics.scl3 ~]$ /opt/vertica/bin/admintools -t view_cluster
 DB      | Host          | State
---------+---------------+-------
 metrics | 192.168.100.7 | DOWN
 metrics | 192.168.100.8 | UP
 metrics | 192.168.100.9 | UP
Comment 1•10 years ago
After restarting the node on 192.168.100.7 (vertica4.metrics.scl3), the host now reports as up:
[dbadmin@vertica4.metrics.scl3 ~]$ /opt/vertica/bin/admintools -t view_cluster
 DB      | Host | State
---------+------+-------
 metrics | ALL  | UP
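For anyone who wants to check this state from a script rather than by eye, here is a minimal sketch that parses the view_cluster table and lists hosts that are not UP. The function names and the exact column spacing of the sample text are assumptions for illustration; only the DB | Host | State layout comes from the output quoted in this bug.

```python
def parse_view_cluster(output):
    """Return one dict per data row of `admintools -t view_cluster` output."""
    rows = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have exactly three cells; the prompt line and the
        # "-----+-----" separator contain no "|" and are skipped.
        if len(cells) != 3 or cells[0] == "DB":
            continue
        db, host, state = cells
        rows.append({"db": db, "host": host, "state": state})
    return rows


def hosts_not_up(output):
    """List the Host column of every row whose State is not UP."""
    return [row["host"] for row in parse_view_cluster(output) if row["state"] != "UP"]


# Sample text modelled on the two outputs in this bug (spacing is assumed).
BEFORE_RESTART = """\
 DB      | Host          | State
---------+---------------+-------
 metrics | 192.168.100.7 | DOWN
 metrics | 192.168.100.8 | UP
 metrics | 192.168.100.9 | UP
"""

AFTER_RESTART = """\
 DB      | Host | State
---------+------+-------
 metrics | ALL  | UP
"""

if __name__ == "__main__":
    print(hosts_not_up(BEFORE_RESTART))
    print(hosts_not_up(AFTER_RESTART))
```

Note that when every node is up the tool collapses the table to a single ALL row, so the sketch treats an empty result as "cluster healthy" rather than looking for individual hosts.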
Updated•10 years ago
Assignee: server-ops-database → mpressman
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 2•10 years ago
A couple of points from the log.

Shutdown occurred at:
2014-06-23 22:25:09.456 unknown:0x7f71b7a60700 [Init] <INFO> Shutdown complete. Exiting.

Startup occurred at 2014-06-23 22:40:
2014-06-23 22:40:52.823 INFO New log
2014-06-23 22:40:52.823 unknown:0x7fa4240dd700 [Init] <INFO> Log /data/metrics/v_metrics_node0001_catalog/vertica.log opened; #1
2014-06-23 22:40:52.823 unknown:0x7fa4240dd700 [Init] <INFO> Processing command line: /opt/vertica/bin/vertica -C metrics -D /data/metrics/v_metrics_node0001_catalog -h 192.168.100.7 -p 5433
2014-06-23 22:40:52.823 unknown:0x7fa4240dd700 [Init] <INFO> Starting up Vertica Analytic Database v6.1.1-0
2014-06-23 22:40:52.823 unknown:0x7fa4240dd700 [Init] <INFO> Project Codename: Bulldozer
2014-06-23 22:40:52.823 unknown:0x7fa4240dd700 [Init] <INFO> vertica(v6.1.1-0) built by release@build2.verticacorp.com from releases/VER_6_1_RELEASE_BUILD_1_0_20130214@109264 on 'Thu Feb 14 14:43:35 2013' $BuildId$
2014-06-23 22:40:52.823 unknown:0x7fa4240dd700 [Init] <INFO> 64-bit Optimized Build
2014-06-23 22:40:52.823 unknown:0x7fa4240dd700 [Init] <INFO> Compiler Version: 4.1.2 20080704 (Red Hat 4.1.2-52)
2014-06-23 22:40:52.825 unknown:0x7fa4240dd700 <LOG> @[initializing]: 00000/5081: Total swap memory used: 0
2014-06-23 22:40:52.825 unknown:0x7fa4240dd700 <LOG> @[initializing]: 00000/4435: Process size resident set: 24100864
2014-06-23 22:40:52.825 unknown:0x7fa4240dd700 <LOG> @[initializing]: 00000/5075: Total Memory free + cache: 59904757760
2014-06-23 22:40:52.836 unknown:0x7fa4240dd700 [Txn] <INFO> Looking for catalog at: /data/metrics/v_metrics_node0001_catalog/Catalog
2014-06-23 22:40:52.837 unknown:0x7fa4240dd700 [Catalog] <INFO> Loading Checkpoint
2014-06-23 22:40:55.899 unknown:0x7fa4240dd700 [Catalog] <INFO> Replaying 1 Txnlogs
2014-06-23 22:40:56.879 unknown:0x7fa4240dd700 [Txn] <INFO> Installing objects...
2014-06-23 22:40:57.708 unknown:0x7fa4240dd700 [Txn] <INFO> Catalog loaded from path: /data/metrics/v_metrics_node0001_catalog/Catalog [379726 objects, GLOBAL version 492036, LOCAL version 388536] (no checkpoint needed)
2014-06-23 22:40:57.711 unknown:0x7fa4240dd700 [Comms] <INFO> Changing my node name to: v_metrics_node0001
2014-06-23 22:40:57.711 unknown:0x7fa4240dd700 [Txn] <INFO> switchToLocalNode: v_metrics_node0001 with path /data/metrics/v_metrics_node0001_catalog/Catalog
2014-06-23 22:40:57.712 unknown:0x7fa4240dd700 [Txn] <INFO> Transaction sequence set, seq num=d6e633, nodeID=a
2014-06-23 22:40:57.712 unknown:0x7fa4240dd700 [Txn] <INFO> Catalog sequence set, seq num=315632f, nodeID=a
2014-06-23 22:40:57.712 unknown:0x7fa4240dd700 [Txn] <INFO> Found my node (v_metrics_node0001) in the catalog
2014-06-23 22:40:57.712 unknown:0x7fa4240dd700 [Txn] <INFO> Catalog info: version=0x78204, number of nodes=3, permanent #=3, K=1
2014-06-23 22:40:57.712 unknown:0x7fa4240dd700 [Txn] <INFO> Catalog info: current epoch=0x41532
2014-06-23 22:40:57.797 unknown:0x7fa4240dd700 [Init] <INFO> Catalog loaded
2014-06-23 22:40:57.797 unknown:0x7fa4240dd700 [Init] <INFO> Listening on port: 5433
2014-06-23 22:40:57.797 unknown:0x7fa4240dd700 [Init] <INFO> About to fork
2014-06-23 22:40:57.799 unknown:0x7fa4240dd700 [Init] <INFO> About to fork again
2014-06-23 22:40:57.802 unknown:0x7fa4240dd700 [Init] <INFO> Completed forking
2014-06-23 22:40:57.802 unknown:0x7fa4240dd700 [Init] <INFO> PID=1469
2014-06-23 22:40:57.802 unknown:0x7fa4240dd700 [Init] <INFO> Start reading DataCollector information
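To put a number on the outage, the gap between the shutdown line and the first startup line can be computed from the log timestamps. This is a small sketch, assuming every vertica.log entry begins with a 23-character "YYYY-MM-DD HH:MM:SS.mmm" timestamp as in the lines quoted above; the helper name log_time is made up for illustration.

```python
from datetime import datetime


def log_time(line):
    """Parse the leading timestamp of a vertica.log line (assumed 23 chars)."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


# Lines copied from the log excerpt in this comment.
SHUTDOWN_LINE = "2014-06-23 22:25:09.456 unknown:0x7f71b7a60700 [Init] <INFO> Shutdown complete. Exiting."
STARTUP_LINE = "2014-06-23 22:40:52.823 INFO New log"
LAST_QUOTED_LINE = "2014-06-23 22:40:57.802 unknown:0x7fa4240dd700 [Init] <INFO> Start reading DataCollector information"

# Time the node process was not running, per the log.
outage = log_time(STARTUP_LINE) - log_time(SHUTDOWN_LINE)
# Time from the new log being opened to the last line quoted here.
startup_so_far = log_time(LAST_QUOTED_LINE) - log_time(STARTUP_LINE)

if __name__ == "__main__":
    print("process down for:", outage)
    print("startup logged so far:", startup_so_far)
```

By this reading the process was down for a little under 16 minutes, and the quoted startup sequence covers roughly the first 5 seconds after the restart. The last quoted line is not necessarily the point at which the node rejoined the cluster.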
Comment 3•10 years ago
Adding a couple of items for posterity, since we've encountered some unexplained downtime for hosts in the vertica cluster. Although some of those cases seem to be due to bad memory, there have been other transient crashes whose causes have not been fully identified.

I noticed in dmesg a kernel message that has been reported to cause instability issues:
do_IRQ: 2.162 No irq handler for vector (irq -1)
do_IRQ: 3.211 No irq handler for vector (irq -1)
do_IRQ: 22.86 No irq handler for vector (irq -1)

/var/log/messages shows:
Jun 23 19:19:48 vertica4.metrics.scl3.mozilla.com kernel: do_IRQ: 3.211 No irq handler for vector (irq -1)

This occurred about three hours before the vertica "crash" (the vertica log shows shutdown completing at 22:25:09). A similar message appears after the restart:
Jun 23 23:45:48 vertica4.metrics.scl3.mozilla.com kernel: do_IRQ: 22.86 No irq handler for vector (irq -1)

If we do see another "crash", this may be worth investigating further. https://bugzilla.redhat.com/show_bug.cgi?id=225399 provides some workarounds and patches to alleviate the issue.
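If another "crash" does occur, a quick way to repeat this correlation is to pull the do_IRQ lines out of /var/log/messages and measure how far each one is from the shutdown time in vertica.log. The sketch below is only that, a sketch: the function name is hypothetical, the year is supplied by hand because syslog timestamps omit it, and reading "3.211" as CPU 3, vector 211 is my understanding of the kernel message format rather than something stated in this bug.

```python
import re
from datetime import datetime

# Matches syslog lines like the two quoted above.
DO_IRQ = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"do_IRQ: (?P<cpu>\d+)\.(?P<vector>\d+) No irq handler for vector"
)


def find_do_irq_events(lines, year):
    """Return (timestamp, cpu, vector) for each matching syslog line.

    `year` is an assumption passed in by the caller, since the syslog
    timestamp format shown in this bug does not include one.
    """
    events = []
    for line in lines:
        match = DO_IRQ.match(line)
        if match:
            when = datetime.strptime(f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S")
            events.append((when, int(match["cpu"]), int(match["vector"])))
    return events


# The two /var/log/messages lines quoted in this comment.
SYSLOG_LINES = [
    "Jun 23 19:19:48 vertica4.metrics.scl3.mozilla.com kernel: do_IRQ: 3.211 No irq handler for vector (irq -1)",
    "Jun 23 23:45:48 vertica4.metrics.scl3.mozilla.com kernel: do_IRQ: 22.86 No irq handler for vector (irq -1)",
]
# "Shutdown complete" time from vertica.log, truncated to whole seconds.
SHUTDOWN = datetime(2014, 6, 23, 22, 25, 9)

events = find_do_irq_events(SYSLOG_LINES, year=2014)

if __name__ == "__main__":
    for when, cpu, vector in events:
        relation = "before" if when < SHUTDOWN else "after"
        print(f"{when} cpu={cpu} vector={vector}: {abs(SHUTDOWN - when)} {relation} shutdown")
```

A gap of hours in either direction does not establish cause; at most it suggests whether the kernel message is worth chasing on the next occurrence.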
Updated•10 years ago
Product: mozilla.org → Data & BI Services Team