Closed
Bug 643545
Opened 13 years ago
Closed 12 years ago
graph socorro postgres server_status table with ganglia
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rhelmer, Assigned: ashish)
References
Details
Socorro has a little built-in status page: https://crash-stats.mozilla.com/status

This shows a small window of time (last hour) from this simple table:

"""
breakpad=> \d server_status
                                  Table "public.server_status"
         Column          |            Type             |                         Modifiers
-------------------------+-----------------------------+------------------------------------------------------------
 id                      | integer                     | not null default nextval('server_status_id_seq'::regclass)
 date_recently_completed | timestamp without time zone |
 date_oldest_job_queued  | timestamp without time zone |
 avg_process_sec         | real                        |
 avg_wait_sec            | real                        |
 waiting_job_count       | integer                     | not null
 processors_count        | integer                     | not null
 date_created            | timestamp without time zone | not null
Indexes:
    "server_status_pkey" PRIMARY KEY, btree (id)
    "idx_server_status_date" btree (date_created, id)
"""

This is fine for a small window of time, but ganglia would be a better tool for showing graphs over a longer period of time, like months or years. It'd be nice if this info was available right alongside all the other graphs we have in ganglia, too.

I think avg_process_sec, avg_wait_sec, waiting_job_count and processors_count would be handy to have in ganglia.
Updated•13 years ago
Assignee: server-ops → bkero
Comment 1•13 years ago
I should be able to integrate the stats you want from this into the pgstats.py that is collected anyways. What query will get me these stats?
Comment 2•13 years ago
Rob,

Waiting job count is already reported in ganglia, I think. The others would be new. At least, the overall count.

Ben,

For the others, it's really easy, you just do one metric at a time:

SELECT avg(avg_process_sec) from server_status;
SELECT avg(avg_wait_sec) from server_status;
SELECT count(*) FROM server_status;

I think we already have a generic ganglia probe set up which runs arbitrary queries which return a number.
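For illustration only, a generic "one query, one number" probe of the kind mentioned above could look roughly like this sketch. It follows the gmond Python metric-module convention (metric_init returning descriptors, a call_back that receives the metric name, metric_cleanup), but it is not the actual pgstats.py. The QUERIES map, the query_metric helper, the "socorro" group name and the in-memory SQLite stand-in for the breakpad Postgres database are all assumptions made so the sketch runs on its own.

```python
import sqlite3

# Hypothetical metric-name -> query map; the two queries come from this comment.
QUERIES = {
    "avg_process_sec": "SELECT avg(avg_process_sec) FROM server_status",
    "avg_wait_sec": "SELECT avg(avg_wait_sec) FROM server_status",
}

_conn = None  # shared connection, opened in metric_init


def _connect():
    # ASSUMPTION: in-memory SQLite with made-up rows stands in for Postgres.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE server_status ("
        "id INTEGER PRIMARY KEY, avg_process_sec REAL, avg_wait_sec REAL)"
    )
    conn.executemany(
        "INSERT INTO server_status (avg_process_sec, avg_wait_sec) VALUES (?, ?)",
        [(1.0, 4.0), (3.0, 6.0)],
    )
    return conn


def query_metric(name):
    """Callback: receives the metric name, returns a single number."""
    row = _conn.execute(QUERIES[name]).fetchone()
    # avg() over an empty table yields NULL; report 0.0 in that case.
    return float(row[0]) if row and row[0] is not None else 0.0


def metric_init(params):
    """Open the connection and describe one metric per configured query."""
    global _conn
    _conn = _connect()
    return [
        {
            "name": name,
            "call_back": query_metric,
            "time_max": 600,
            "value_type": "float",
            "units": "seconds",
            "slope": "both",
            "format": "%f",
            "description": name,
            "groups": "socorro",  # hypothetical group name
        }
        for name in QUERIES
    ]


def metric_cleanup():
    """Close the connection when the module is unloaded."""
    if _conn is not None:
        _conn.close()
```

In a real deployment the names returned by metric_init would also have to match the metric names listed in the gmond configuration.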
Updated•13 years ago
Assignee: bkero → mpressman
Comment 4•12 years ago
Ashish is ramping up on ganglia and bkero isn't managing it anymore, going to punt to him.
Assignee: bkero → ashish
Assignee
Comment 5•12 years ago
Shouldn't processors_count be =<10?

breakpad=> SELECT count(*) FROM server_status;
 count
-------
 41354
(1 row)
Status: NEW → ASSIGNED
Reporter
Comment 6•12 years ago
(In reply to Ashish Vijayaram [:ashish] from comment #5)
> Shouldn't processors_count be =<10?
>
> breakpad=> SELECT count(*) FROM server_status;
>  count
> -------
>  41354
> (1 row)

A row is inserted into server_status every 5 minutes, so you'd look at the "processors_count" column for the latest row to see how many processors were registered within the last 5 minutes.

Here is the table definition:

CREATE TABLE server_status (
    id integer NOT NULL,
    date_recently_completed timestamp with time zone,
    date_oldest_job_queued timestamp with time zone,
    avg_process_sec real,
    avg_wait_sec real,
    waiting_job_count integer NOT NULL,
    processors_count integer NOT NULL,
    date_created timestamp with time zone NOT NULL
);

It'd be nice to have long-term graphs of each column; you just need to pull the latest row every 5 mins.
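A minimal sketch of "pull the latest row", assuming the newest row is the one with the greatest date_created (ties broken by id, which matches the idx_server_status_date index in the description). The latest_status helper, the sample values and the in-memory SQLite database standing in for Postgres are all hypothetical and exist only to make the example runnable.

```python
import sqlite3

# Columns the reporter asked to graph.
COLUMNS = ("avg_process_sec", "avg_wait_sec", "waiting_job_count", "processors_count")


def latest_status(conn):
    """Return the most recently created server_status row as a dict, or None."""
    sql = (
        "SELECT {} FROM server_status "
        "ORDER BY date_created DESC, id DESC LIMIT 1"
    ).format(", ".join(COLUMNS))
    row = conn.execute(sql).fetchone()
    return dict(zip(COLUMNS, row)) if row else None


# ASSUMPTION: SQLite stand-in with made-up rows five minutes apart.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE server_status (
           id INTEGER PRIMARY KEY,
           avg_process_sec REAL,
           avg_wait_sec REAL,
           waiting_job_count INTEGER NOT NULL,
           processors_count INTEGER NOT NULL,
           date_created TEXT NOT NULL)"""
)
conn.executemany(
    "INSERT INTO server_status (avg_process_sec, avg_wait_sec, "
    "waiting_job_count, processors_count, date_created) VALUES (?, ?, ?, ?, ?)",
    [
        (1.5, 10.0, 3, 8, "2000-01-01 00:00:00"),
        (2.0, 12.0, 5, 10, "2000-01-01 00:05:00"),
    ],
)

latest = latest_status(conn)
print(latest)
```

Reading a single row this way avoids the confusion in comment 5, where count(*) returned the number of rows in the table rather than the number of processors.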
Assignee
Comment 7•12 years ago
Thanks! Three metrics have been added: avg_process_sec, avg_wait_sec and processors_count.
Assignee
Comment 8•12 years ago
Jun 13 07:00:12 tp-socorro01-master01 /usr/sbin/gmond[1698]: Unable to find the metric information for 'avg_process_sec'. Possible that the module has not been loaded.#012
Jun 13 07:00:12 tp-socorro01-master01 /usr/sbin/gmond[1698]: Unable to find the metric information for 'avg_wait_sec'. Possible that the module has not been loaded.#012
Jun 13 07:00:12 tp-socorro01-master01 /usr/sbin/gmond[1698]: Unable to find the metric information for 'processors_count'. Possible that the module has not been loaded.#012

Digging...
Assignee
Comment 9•12 years ago
I got the 3 new metrics added and working in [1]. The only metric not working there is jobs_in_queue and from the looks of it, needs the query fixed:

breakpad=> SELECT * from jobs_in_queue;
ERROR: attribute 8 has wrong type
DETAIL: Table has type timestamp with time zone, but query expects timestamp without time zone.

[1] http://sp-admin01.phx1.mozilla.com/ganglia/?r=hour&cs=&ce=&m=&c=Socorro+Postgres&h=tp-socorro01-master01.phx1.mozilla.com&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS
Reporter
Comment 10•12 years ago
(In reply to Ashish Vijayaram [:ashish] from comment #9)
> I got the 3 new metrics added and working in [1]. The only metric not
> working there is jobs_in_queue and from the looks of it, needs the query
> fixed:
>
> breakpad=> SELECT * from jobs_in_queue;
> ERROR: attribute 8 has wrong type
> DETAIL: Table has type timestamp with time zone, but query expects
> timestamp without time zone.

Hmm, are you sure the above error is for that query? All jobs_in_queue has is a single bigint column:

breakpad=> \d jobs_in_queue
View "public.jobs_in_queue"
 Column |  Type  | Modifiers
--------+--------+-----------
 count  | bigint |

> [1]
> http://sp-admin01.phx1.mozilla.com/ganglia/
> ?r=hour&cs=&ce=&m=&c=Socorro+Postgres&h=tp-socorro01-master01.phx1.mozilla.
> com&tab=m&vn=&mc=2&z=medium&metric_group=ALLGROUPS

You could cast a column to just TIMESTAMP (which will be without time zone), but if you're not using the column(s) in question it's better to just select what you are using (this also protects your code from being broken if someone adds or reorders columns):

SELECT avg_process_sec, avg_wait_sec, processors_count FROM server_status;
Comment 11•12 years ago
Ok, so there are two issues here:

1) jobs_in_queue is broken. I need to fix it.
2) I should write a view on top of server_status for ganglia to check.

I'll do both of those for the next release of socorro.
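As a rough idea of what item 2 could look like, the sketch below defines a view that exposes only the latest row and only the columns ganglia reads, so a probe never touches the timestamp columns. The view name ganglia_server_status is hypothetical, the sample rows are made up, and SQLite stands in for Postgres; the view that actually shipped with socorro may differ.

```python
import sqlite3

# ASSUMPTION: in-memory SQLite stands in for the breakpad Postgres database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE server_status (
        id INTEGER PRIMARY KEY,
        avg_process_sec REAL,
        avg_wait_sec REAL,
        waiting_job_count INTEGER NOT NULL,
        processors_count INTEGER NOT NULL,
        date_created TEXT NOT NULL
    );

    -- Hypothetical view: newest row only, explicit column list.
    CREATE VIEW ganglia_server_status AS
        SELECT avg_process_sec, avg_wait_sec, processors_count
        FROM server_status
        ORDER BY date_created DESC, id DESC
        LIMIT 1;
    """
)
conn.executemany(
    "INSERT INTO server_status (avg_process_sec, avg_wait_sec, "
    "waiting_job_count, processors_count, date_created) VALUES (?, ?, ?, ?, ?)",
    [
        (1.5, 10.0, 3, 8, "2000-01-01 00:00:00"),
        (2.0, 12.0, 5, 10, "2000-01-01 00:05:00"),
    ],
)

# The probe names its columns instead of using SELECT *.
row = conn.execute(
    "SELECT avg_process_sec, avg_wait_sec, processors_count "
    "FROM ganglia_server_status"
).fetchone()
print(row)
```

Keeping the column list explicit in both the view and the probe means later changes to server_status column types or order are less likely to break the graphs.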
Updated•12 years ago
Assignee: ashish → josh
Updated•12 years ago
Assignee: josh → ashish
Assignee
Updated•12 years ago
QA Contact: mrz → jdow
Updated•12 years ago
Whiteboard: [blocked on rhelmer/berkus]
Assignee
Comment 12•12 years ago
:selena provided updated SQL for jobs_in_queue and processors_count. All the ganglia graphs look good now.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [blocked on rhelmer/berkus]
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard