Closed Bug 709338 Opened 13 years ago Closed 13 years ago

monitoring the sync cluster

Tracking

(Not tracked)

Status:

RESOLVED INVALID

People

(Reporter: tarek, Assigned: gozer)

References

Details

(Whiteboard: devPreviewNonBlocker)

Tarek Ziadé (:tarek)

Reporter

Description

13 years ago

Here's what we need to make sure we have for the developer preview:

- monitor the heartbeat page for appsync, every 10 s
- monitor the heartbeat page for the node.js server, every 10 s
- monitor the heartbeat page for every HBase node, every 10 s
 
In case of a timeout or a non-200 response, send a mail to:

- <Bill's cell> (sent by email/in the ldap index) 
- appsync-has-no-bugs@googlegroups.com

If possible (not a priority) on IRC on #openwebapps

nice to have:
- CPU/memory/fd monitoring with an alarm on low resource
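
The heartbeat checks requested above (poll every 10 s, alert on a timeout or a non-200 reply) could be sketched roughly as follows. This is only an illustration, not the checks that were actually deployed: the names `check_heartbeat` and `poll`, the 5 s timeout, and the `alert` callback are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def check_heartbeat(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (ok, detail); ok is True only for an HTTP 200 reply."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx replies.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection failures and timeouts.
        return False, f"unreachable: {exc}"
    if status != 200:
        return False, f"HTTP {status}"
    return True, "HTTP 200"


def poll(urls, alert, interval=10.0, rounds=None,
         sleep=time.sleep, check=check_heartbeat):
    """Check every URL each `interval` seconds; call alert(url, detail) on failure.

    rounds=None polls forever; a number limits the loop (handy for testing).
    """
    done = 0
    while rounds is None or done < rounds:
        for url in urls:
            ok, detail = check(url)
            if not ok:
                alert(url, detail)
        done += 1
        sleep(interval)
```

For instance, `poll(["http://appsync-stage1.vm1.labs.sjc1.mozilla.com/__heartbeat__"], alert=print)` would print every failed check; sending the mail or SMS is left to whatever is passed as `alert`.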

Bill Walker [:bwalker] [@wfwalker]

Comment 1

13 years ago

If there's a web page showing the status of these pieces, I'll put it up on my 2nd-floor MV kiosk...

Philippe M. Chiasson (:gozer)

Assignee

Comment 2

13 years ago

(In reply to Tarek Ziadé (:tarek) from comment #0)
> Here's what we need to make sure we have for the developer preview:

Exact URLs would be nice ;-)
 
> - monitor the heartbeat page for appsync, every 10 s

http://appsync-stage1.vm1.labs.sjc1.mozilla.com/__heartbeat__

> - monitor the heartbeat page for the node.js server, every 10 s

http://sauropod-stage1.vm1.labs.sjc1.mozilla.com:8001/__heartbeat__ ??

> - monitor the heartbeat page for every HBase node, every 10 s

??

> In case of a timeout or a non-200 response, send a mail to:
> 
> - <Bill's cell> (sent by email/in the ldap index) 
> - appsync-has-no-bugs@googlegroups.com

Easy.

> If possible (not a priority) on IRC on #openwebapps

Will need to check with whoever is already running the Nagios IRC bots.

> nice to have:
> - CPU/memory/fd monitoring with an alarm on low resource

Will do.

Assignee: nobody → gozer

Status: NEW → ASSIGNED

Tarek Ziadé (:tarek)

Reporter

Comment 3

13 years ago

https://wiki.mozilla.org/Apps/ServerArchitecture will give you the URLs of all the nodes.

The heartbeat URL is /__heartbeat__ for the two app servers; I don't know what it is for the HBase server. CCing Ryan.

Bill Walker [:bwalker] [@wfwalker]

Updated

13 years ago

Priority: -- → P1

Bill Walker [:bwalker] [@wfwalker]

Updated

13 years ago

Blocks: 700492

Ryan Kelly [:rfkelly]

Comment 4

13 years ago

FYI, the sauropod __heartbeat__ page pings HBase and errors out if HBase is not reachable.

Tarek Ziadé (:tarek)

Reporter

Comment 5

13 years ago

I am not certain we want this: if possible, the HBase nodes should have their own heartbeat and the node.js one should be standalone, so the monitoring can tell which box is really down.
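
To illustrate the point about standalone heartbeats, here is a small sketch of how a monitor might narrow down the faulty box. It is an assumption-laden example only: the function `suspects`, the component names, and the `depends_on` mapping are invented for illustration.

```python
def suspects(results, depends_on):
    """Return the sorted components that may be at fault.

    results:    component name -> True/False heartbeat result
    depends_on: component name -> components its heartbeat also probes
                (empty or missing when the heartbeat is standalone)
    """
    found = set()
    for name, ok in results.items():
        if ok:
            continue
        found.add(name)
        # A chained heartbeat also fails when a dependency is down, so each
        # dependency stays a suspect unless its own heartbeat says it is fine.
        for dep in depends_on.get(name, ()):
            if results.get(dep) is not True:
                found.add(dep)
    return sorted(found)
```

With a chained heartbeat, `suspects({"sauropod": False}, {"sauropod": ["hbase"]})` leaves both boxes under suspicion, whereas with standalone heartbeats `suspects({"sauropod": True, "hbase": False}, {})` points at HBase alone.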

Ryan Kelly [:rfkelly]

Comment 6

13 years ago

Fair enough.  The HBase REST server has a "cluster status" page which should be useful for this:

  http://appsync-hbase-stage1.vm1.labs.sjc1.mozilla.com:8080/status/cluster
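
A check built on that page might parse the status document and flag missing or dead nodes. The sketch below assumes the page is requested as JSON and that the reply carries `LiveNodes` (objects with a `name` of the form `host:port`) and `DeadNodes` lists; those field names should be verified against the deployed HBase version. The function name and the host names in the usage note are hypothetical.

```python
import json


def cluster_problems(body, expected_hosts):
    """Return a list of problem strings found in a cluster status JSON body.

    Assumes `LiveNodes` is a list of objects with a "name" like "host:port"
    and `DeadNodes` is a list of node names; both are assumptions.
    """
    status = json.loads(body)
    live = set()
    for node in status.get("LiveNodes") or []:
        # Keep only the host part of "host:port".
        live.add(str(node.get("name", "")).split(":")[0])
    problems = [f"dead: {name}" for name in (status.get("DeadNodes") or [])]
    problems += [f"missing: {host}"
                 for host in sorted(set(expected_hosts) - live)]
    return problems
```

Given a made-up body such as `{"LiveNodes": [{"name": "hbase1:60020"}], "DeadNodes": ["hbase2:60020"]}` and expected hosts `["hbase1", "hbase2"]`, the function reports one dead node and one missing host; an empty list means every expected node is live.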

Philippe M. Chiasson (:gozer)

Assignee

Comment 7

13 years ago

Checks are now in place and will send notifications to:

- <Bill's cell> (sent by email/in the ldap index) 
- appsync-has-no-bugs@googlegroups.com
- Gozer's cell

Bill Walker [:bwalker] [@wfwalker]

Comment 8

13 years ago

awesome!

Whiteboard: devPreviewNonBlocker

Tarek Ziadé (:tarek)

Reporter

Comment 9

13 years ago

I tried shutting down gunicorn and then nginx, and we did not receive any mail.

Did you get the SMS?

dclarke@mozilla.com [:onecyrenus]

Updated

13 years ago

Blocks: 710342
No longer blocks: 700492

Philippe M. Chiasson (:gozer)

Assignee

Updated

13 years ago

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Jason Smith [:jsmith]

Comment 10

12 years ago

The old app sync codebase is no longer going to be supported. All resolved fixed bugs are being marked as invalid, as they no longer apply to the new apps in the cloud service.

Resolution: FIXED → INVALID

BMO Automation

Updated

6 years ago

Product: Web Apps → Web Apps Graveyard

Categories

(Web Apps Graveyard :: AppsInTheCloud, defect, P1)