Closed Bug 709338 Opened 13 years ago Closed 13 years ago

monitoring the sync cluster

Categories

(Web Apps Graveyard :: AppsInTheCloud, defect, P1)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: tarek, Assigned: gozer)

References

Details

(Whiteboard: devPreviewNonBlocker)

Here's what we need to make sure we have for the developer preview:

- monitor the heartbeat page for appsync, every 10 s
- monitor the heartbeat page for the node.js server, every 10 s
- monitor the heartbeat page for every HBase node, every 10 s
 
In case of a timeout or a non-200 response, send a mail to:

- <Bill's cell> (sent by email/in the ldap index) 
- appsync-has-no-bugs@googlegroups.com

If possible (not a priority), also notify on IRC in #openwebapps.
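
For illustration, the polling described above could be sketched roughly as follows. This is only a minimal sketch, not the checks that were actually deployed. The names `heartbeat_ok`, `poll_once`, and `monitor`, the 5 s timeout, and the `notify` callback (standing in for the mail/IRC delivery) are all hypothetical.

```python
import time
import urllib.request


def heartbeat_ok(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if url answers HTTP 200 within timeout seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses,
        # so connection failures, timeouts and 4xx/5xx replies end up here.
        return False


def poll_once(urls, notify, check=heartbeat_ok):
    """Check every URL once and call notify(url) for each failure."""
    failed = [url for url in urls if not check(url)]
    for url in failed:
        notify(url)
    return failed


def monitor(urls, notify, interval=10.0):
    """Poll forever, sleeping `interval` seconds between rounds."""
    while True:
        poll_once(urls, notify)
        time.sleep(interval)
```

`opener` and `check` are injectable so the logic can be exercised without touching the network; `monitor` never returns and would normally run under a supervisor.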

nice to have:
- CPU/memory/fd monitoring with an alarm on low resource
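
As a rough idea of what the "nice to have" resource alarm could look like on the Linux boxes, here is a minimal sketch. It assumes the standard `/proc/meminfo` and `/proc/sys/fs/file-nr` formats; the function names and the thresholds (load per CPU above 2.0, less than 10% memory free, more than 90% of file handles in use) are illustrative assumptions, not values agreed in this bug.

```python
def parse_meminfo(text):
    """Return {field: value in kB} from the text of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def fd_usage(file_nr_text):
    """Return allocated/max file handles from /proc/sys/fs/file-nr text."""
    allocated, _unused, maximum = (int(x) for x in file_nr_text.split()[:3])
    return allocated / maximum


def low_resource_alarms(load1, cpus, meminfo, fd_ratio,
                        max_load_per_cpu=2.0,
                        min_free_mem_ratio=0.10,
                        max_fd_ratio=0.90):
    """Return the names of the resources that crossed their threshold."""
    alarms = []
    if load1 / cpus > max_load_per_cpu:
        alarms.append("cpu")
    total = meminfo.get("MemTotal", 0)
    # Prefer MemAvailable when the kernel reports it, else fall back to MemFree.
    free = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    if total and free / total < min_free_mem_ratio:
        alarms.append("memory")
    if fd_ratio > max_fd_ratio:
        alarms.append("fd")
    return alarms
```

On a live box the inputs would come from `os.getloadavg()[0]`, `os.cpu_count()`, and the contents of the two `/proc` files.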
If there's a web page showing the status of these pieces, I'll put it up on my 2nd floor MV kiosk...
(In reply to Tarek Ziadé (:tarek) from comment #0)
> Here's what we need to make sure we have for the developer preview,

Exact urls would be nice ;-)
 
> - monitor the heartbeat page for appsync, every 10 s

http://appsync-stage1.vm1.labs.sjc1.mozilla.com/__heartbeat__

> - monitor the heartbeat page for the node.js server, every 10 s

http://sauropod-stage1.vm1.labs.sjc1.mozilla.com:8001/__heartbeat__ ??

> - monitor the heartbeat page for every HBase node, every 10 s

??

> in case of a timeout or non-200, send a mail to :
> 
> - <Bill's cell> (sent by email/in the ldap index) 
> - appsync-has-no-bugs@googlegroups.com

Easy.

> If possible (not a priority) on IRC on #openwebapps

Will need to check with whoever is already running the nagios IRC bots.

> nice to have:
> - CPU/memory/fd monitoring with an alarm on low resource

Will do.
Assignee: nobody → gozer
Status: NEW → ASSIGNED
https://wiki.mozilla.org/Apps/ServerArchitecture will give you the URLs of all the nodes.

The URL is /__heartbeat__ for the two app servers; I don't know what it is for the HBase server. CCing Ryan.
Priority: -- → P1
FYI, the sauropod __heartbeat__ page pings hbase and errors out if it's not reachable
I am not certain we want this. If possible, the HBase nodes should have their own heartbeat and the node.js one should be standalone, so we can tell from the monitoring which box is really down.
Fair enough. The HBase REST server has a "cluster status" page which should be useful for this:

  http://appsync-hbase-stage1.vm1.labs.sjc1.mozilla.com:8080/status/cluster
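
To illustrate why a separate HBase check helps, here is a small sketch of how the two results could be combined to guess which box is down. It assumes, as noted earlier in this bug, that the sauropod heartbeat fails whenever HBase is unreachable; the `diagnose` function and its return strings are hypothetical.

```python
# URLs as quoted in this bug; the sauropod one was posted with a "??".
SAUROPOD_HEARTBEAT = "http://sauropod-stage1.vm1.labs.sjc1.mozilla.com:8001/__heartbeat__"
HBASE_STATUS = "http://appsync-hbase-stage1.vm1.labs.sjc1.mozilla.com:8080/status/cluster"


def diagnose(sauropod_ok, hbase_ok):
    """Map the two check results to a best guess about which box is down."""
    if sauropod_ok and hbase_ok:
        return "all up"
    if hbase_ok:
        # HBase answers on its own, so the failure is on the node.js side.
        return "sauropod down"
    if sauropod_ok:
        # Unexpected: sauropod's heartbeat is supposed to fail without HBase.
        return "hbase status page failing"
    # Sauropod fails whenever HBase is unreachable, so its own state is unknown.
    return "hbase down, sauropod unknown"
```

With only the sauropod heartbeat, the second and fourth cases would be indistinguishable, which is the concern raised above.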
Checks are now in place and will send notifications to:

- <Bill's cell> (sent by email/in the ldap index) 
- appsync-has-no-bugs@googlegroups.com
- Gozer's cell
awesome!
Whiteboard: devPreviewNonBlocker
I tried shutting down gunicorn, then nginx, and we did not receive any mail.

Did you get the SMS?
Blocks: 710342
No longer blocks: 700492
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
The old app sync codebase is no longer going to be supported. All resolved fixed bugs are being marked as invalid, as they no longer apply to the new apps in the cloud service.
Resolution: FIXED → INVALID
Product: Web Apps → Web Apps Graveyard