Bug 709338
Opened 13 years ago
Closed 13 years ago
monitoring the sync cluster
Categories
(Web Apps Graveyard :: AppsInTheCloud, defect, P1)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: tarek, Assigned: gozer)
References
Details
(Whiteboard: devPreviewNonBlocker)
Here's what we need to make sure we have for the developer preview:
- monitor the heartbeat page for appsync, every 10 s
- monitor the heartbeat page for the node.js server, every 10 s
- monitor the heartbeat page for every HBase node, every 10 s
In case of a timeout or a non-200 response, send a mail to:
- <Bill's cell> (sent by email/in the ldap index)
- appsync-has-no-bugs@googlegroups.com
If possible (not a priority), also notify on IRC in #openwebapps.
Nice to have:
- CPU/memory/fd monitoring with an alarm when resources run low
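The checks requested above could be sketched roughly as follows. This is only an illustration of "poll every 10 s, alert on timeout or non-200", not the monitoring that was actually deployed. The function names, the `notify` callback (which would wrap whatever sends the mail), and the 5-second timeout are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def check_heartbeat(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (ok, detail); ok is True only for an HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, "HTTP %d" % exc.code
    except OSError as exc:
        # Covers URLError, connection refused, DNS failures and socket timeouts.
        return False, "unreachable: %s" % exc
    if status != 200:
        return False, "HTTP %d" % status
    return True, "HTTP 200"


def monitor(urls, notify, interval=10.0, rounds=None,
            check=check_heartbeat, sleep=time.sleep):
    """Poll every URL each `interval` seconds; call notify() for each failure.

    `urls` maps a hypothetical check name to its heartbeat URL.
    `rounds=None` means run forever; a number limits the loop (handy for tests).
    """
    done = 0
    while rounds is None or done < rounds:
        for name, url in urls.items():
            ok, detail = check(url)
            if not ok:
                notify("%s heartbeat failed (%s): %s" % (name, url, detail))
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
```

A call such as `monitor({"appsync": "<heartbeat url>"}, notify=send_mail)` would then run the loop, where `send_mail` is a placeholder for the real notification hook.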
Comment 1•13 years ago
if there's a web page showing the status of these pieces, i'll put it up on my 2nd floor MV kiosk...
Assignee
Comment 2•13 years ago
(In reply to Tarek Ziadé (:tarek) from comment #0)
> Here's what we need to make sure we have for the developer preview:
Exact URLs would be nice ;-)
> - monitor the heartbeat page for appsync, every 10 s
http://appsync-stage1.vm1.labs.sjc1.mozilla.com/__heartbeat__
> - monitor the heartbeat page for the node.js server, every 10 s
http://sauropod-stage1.vm1.labs.sjc1.mozilla.com:8001/__heartbeat__ ??
> - monitor the heartbeat page for every HBase node, every 10 s
??
> in case of a timeout or non-200, send a mail to :
>
> - <Bill's cell> (sent by email/in the ldap index)
> - appsync-has-no-bugs@googlegroups.com
Easy.
> If possible (not a priority) on IRC on #openwebapps
Will need to check with whoever is already running Nagios IRC bots.
> nice to have:
> - CPU/memory/fd monitoring with an alarm on low resource
Will do.
Assignee: nobody → gozer
Status: NEW → ASSIGNED
Reporter
Comment 3•13 years ago
https://wiki.mozilla.org/Apps/ServerArchitecture lists the URLs for all the nodes.
The path is /__heartbeat__ for the two app servers; I don't know what it is for the HBase server. CCing Ryan.
Updated•13 years ago
Priority: -- → P1
Comment 4•13 years ago
FYI, the sauropod __heartbeat__ page pings HBase and errors out if it's not reachable.
Reporter
Comment 5•13 years ago
I am not certain we want this. If possible, the HBase nodes should have their own heartbeat and the node.js one should be standalone, so the monitoring can tell which box is really down.
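To illustrate the concern, here is a small sketch of how the two check results could be interpreted depending on whether the node.js heartbeat itself pings HBase. The function name, its arguments, and the returned labels are made up for this example; it is not taken from any existing monitoring setup.

```python
def diagnose(node_ok, hbase_ok, node_pings_hbase):
    """Return (down, unknown): boxes known to be down, and boxes whose state is unclear.

    node_pings_hbase=True models the coupled heartbeat described in comment 4;
    False models standalone heartbeats for each box.
    """
    down = []
    unknown = []
    if not hbase_ok:
        down.append("hbase")
    if not node_ok:
        if node_pings_hbase and not hbase_ok:
            # The node.js failure may only be a side effect of HBase being down.
            unknown.append("node.js")
        else:
            down.append("node.js")
    return down, unknown
```

With a coupled heartbeat, `diagnose(False, False, True)` can only report HBase as down and leaves the node.js box in doubt; with standalone heartbeats the same results point at both boxes.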
Comment 6•13 years ago
Fair enough. The HBase REST server has a "cluster status" page, which should be useful for this:
http://appsync-hbase-stage1.vm1.labs.sjc1.mozilla.com:8080/status/cluster
Assignee
Comment 7•13 years ago
Checks are now in place, and will send notifications to:
- <Bill's cell> (sent by email/in the ldap index)
- appsync-has-no-bugs@googlegroups.com
- Gozer's cell
Reporter
Comment 9•13 years ago
I tried shutting down gunicorn, then nginx, and we did not receive any mail.
Did you get the SMS?
Updated•13 years ago
Assignee
Updated•13 years ago
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 10•13 years ago
The old app sync codebase is no longer going to be supported. All resolved fixed bugs are being marked as invalid, as they no longer apply to the new apps in the cloud service.
Resolution: FIXED → INVALID
Updated•6 years ago
Product: Web Apps → Web Apps Graveyard