Closed
Bug 822415
Opened 12 years ago
Closed 12 years ago
Change the monitor for Socorro replication to use hot_standby_delay instead of replicate_row
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: selenamarie, Assigned: dumitru)
References
Details
We had a problem where the check returned a false positive. :dumitru is working on changing this.
Assignee
Updated•12 years ago
Assignee: server-ops → dgherman
Reporter
Comment 1•12 years ago
There is a bug in the Perl script: it doesn't handle the '--port2' and '--host2' parameters the way you'd expect from the documentation. Specify the hosts and ports as comma-separated lists instead, like so:
--port=5432,5432 --host=host1,host2
That should fix the error that we saw on Tuesday.
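For illustration, here is a minimal Python sketch that assembles a command line in the comma-separated form described above. The helper name build_check_args is hypothetical, the host names are the placeholders from the comment, and the action shown is just an example; only the flag names come from this bug.

```python
def build_check_args(hosts, ports, action, script="./check_postgres.pl"):
    """Build an argv list that passes every host and port as a
    comma-separated value to --host / --port, instead of relying on
    --host2 / --port2."""
    if len(hosts) != len(ports):
        raise ValueError("need exactly one port per host")
    return [
        script,
        "--host=" + ",".join(hosts),
        "--port=" + ",".join(str(p) for p in ports),
        "--action=" + action,
    ]


# Placeholder hosts from the comment; the action is an example value.
args = build_check_args(["host1", "host2"], [5432, 5432], "hot_standby_delay")
print(" ".join(args))
```

Returning a list rather than a single string means it can be handed to subprocess.run without shell quoting concerns.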
Assignee
Comment 2•12 years ago
[root@nagios1.private.phx1 mozilla]# ./check_postgres.pl --host=tp-socorro01-master01.phx1.mozilla.com,tp-socorro01-master02.phx1.mozilla.com --dbuser=nagiosdaemon --dbname=breakpad --action=hot_standby_delay --dbuser2=nagiosdaemon --dbname2=breakpad --warning=10 --critical=20
Password for user nagiosdaemon:
Password for user nagiosdaemon:
Password for user nagiosdaemon:
Password for user nagiosdaemon:
POSTGRES_HOT_STANDBY_DELAY OK: DB "breakpad" (host:tp-socorro01-master01.phx1.mozilla.com) -25328 | time=0.42s replay_delay=-25328;10;20 receive-delay=-25328;10;20
[root@nagios1.private.phx1 mozilla]# ./check_postgres.pl -V
check_postgres.pl version 2.19.0

So, remember when I first tried the hot_standby_delay check with the older version of the script? It returned the same big negative values.
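To make the numbers in that output easier to inspect, here is a rough Python sketch that pulls the performance data (the part after the "|") out of a plugin output line. The function name parse_perfdata is hypothetical, and the sketch assumes plain numeric warning and critical thresholds, as in the output above; it does not handle range-style thresholds.

```python
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_perfdata(line):
    """Return {label: {"value", "unit", "warn", "crit"}} for the
    label=value[unit];warn;crit items that follow the '|' separator."""
    _, _, perf = line.partition("|")
    metrics = {}
    for token in perf.split():
        label, _, data = token.partition("=")
        fields = data.split(";")
        match = NUMBER.match(fields[0])
        if match is None:
            continue  # skip anything that does not start with a number
        metrics[label] = {
            "value": float(match.group()),
            "unit": fields[0][match.end():],
            "warn": float(fields[1]) if len(fields) > 1 and fields[1] else None,
            "crit": float(fields[2]) if len(fields) > 2 and fields[2] else None,
        }
    return metrics


sample = ('POSTGRES_HOT_STANDBY_DELAY OK: DB "breakpad" '
          '(host:tp-socorro01-master01.phx1.mozilla.com) -25328 | '
          'time=0.42s replay_delay=-25328;10;20 receive-delay=-25328;10;20')
metrics = parse_perfdata(sample)
print(metrics["replay_delay"])
```

With the sample line, the replay delay value is negative and therefore below the warning threshold of 10, which matches the OK status the check reported.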
Reporter
Comment 3•12 years ago
Yeah, it's because the standby receives WAL between the time the check runs on the master (first) and the time it runs on the replica (second), so the replica's position can be ahead of the master position that was sampled earlier. It's not beautiful, but it does accurately represent the state of the system. They're looking at changing the logic for the script to ask the replica for its location first.
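The sampling-order effect can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the WAL rate, the true lag, the 250 ms gap between the two queries, and the function names. It is not how check_postgres.pl computes the value, only a model of why the order of the two queries changes the sign.

```python
RATE_BYTES_PER_MS = 100   # hypothetical WAL generation rate
TRUE_LAG_MS = 10          # hypothetical: the standby trails the master by 10 ms


def master_pos(t_ms):
    """WAL position on the master at time t_ms (toy linear model)."""
    return RATE_BYTES_PER_MS * t_ms


def standby_pos(t_ms):
    """WAL position on the standby, which trails the master by TRUE_LAG_MS."""
    return RATE_BYTES_PER_MS * max(t_ms - TRUE_LAG_MS, 0)


def measured_delay(t_master_ms, t_standby_ms):
    """Delay as a check would see it: master sample minus standby sample."""
    return master_pos(t_master_ms) - standby_pos(t_standby_ms)


true_lag_bytes = RATE_BYTES_PER_MS * TRUE_LAG_MS

# Master queried first, standby 250 ms later: the standby has moved on.
master_first = measured_delay(10_000, 10_250)

# Standby queried first, master 250 ms later: the result cannot go negative.
standby_first = measured_delay(10_250, 10_000)

print(true_lag_bytes, master_first, standby_first)
```

In this model, querying the master first yields a negative delay even though the standby is genuinely behind, while querying the standby first overestimates the lag but stays non-negative.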
Assignee
Comment 4•12 years ago
I see. Does this mean we can switch to this check? If so, we need to fine-tune it to alert us when the deltas are too high.
Reporter
Comment 5•12 years ago
Let's try setting the delta to warn at 16777216 (that's in bytes, 16 MB). It should never get that high if things are working.
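As a quick check of the arithmetic, here is a small Python sketch of how such thresholds would classify a delay. The warning value is the one suggested above; the critical value of 33554432 is the one reported in the final comment of this bug. The classify function is hypothetical, and whether the real script compares with ">" or ">=" is an assumption not verified here.

```python
WARNING_BYTES = 16 * 1024 * 1024    # 16777216, i.e. 16 MB
CRITICAL_BYTES = 32 * 1024 * 1024   # 33554432, i.e. 32 MB


def classify(delay_bytes):
    """Map a replication delay in bytes to a Nagios-style status.
    Assumes 'at or above the threshold' triggers the alert."""
    if delay_bytes >= CRITICAL_BYTES:
        return "CRITICAL"
    if delay_bytes >= WARNING_BYTES:
        return "WARNING"
    return "OK"


for delay in (-25328, 0, 16777216, 40_000_000):
    print(delay, classify(delay))
```

Under this model, the negative values seen earlier in the thread still classify as OK, since they sit below both thresholds.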
Reporter
Comment 6•12 years ago
ping!
Assignee
Comment 7•12 years ago
Completed:
[10:05] <nagios-phx1> | dumitru: tp-socorro01-master02.phx1.mozilla.com:PostgreSQL Hot Standby Delay is OK - POSTGRES_HOT_STANDBY_DELAY OK: (host:tp-socorro01-master01.phx1.mozilla.com => tp-socorro01-master02.phx1.mozilla.com) 0 Last Checked: 2013-01-04 10:02:41 PST
So I replaced the "replicate_row" check with "hot_standby_delay". Thresholds are: warning at 16777216 and critical at 33554432.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard