Closed
Bug 1003170
Opened 11 years ago
Closed 11 years ago
Loop server - reporting up time
Categories
(Hello (Loop) :: Server, defect, P1)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: RT, Assigned: mostlygeek)
References
Details
(Whiteboard: [qa+])
User Story
As a product manager I want to know monthly the Loop server up time so that I know how the infrastructure performs.
No description provided.
Reporter | ||
Updated•11 years ago
User Story: (updated)
Comment 1•11 years ago
I believe the second item "As a product manager I want to know monthly the Loop server up time so that I know how the infrastructure performs." is something ops provide. Benson, can you confirm that?
The maximum number of concurrent calls I can do is something tokbox itself can provide. We will be loadtesting with them in the future.
Flags: needinfo?(bwong)
Reporter | ||
Comment 3•11 years ago
Availability (or uptime) refers to the percentage of time the Loop server is available, calculated as follows: ((TMW - TMU) * 100) / TMW
TMU: total minutes of non-availability per calendar month
TMW: total minutes per calendar month
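As a worked illustration of the formula above, here is a minimal Python sketch. The function names `minutes_in_month` and `uptime_percent` are hypothetical and not part of the Loop server or any ops tooling; how TMU is actually measured is left open.

```python
import calendar


def minutes_in_month(year: int, month: int) -> int:
    """TMW: total minutes in the given calendar month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24 * 60


def uptime_percent(tmw: int, tmu: int) -> float:
    """Availability as a percentage: ((TMW - TMU) * 100) / TMW."""
    if tmw <= 0:
        raise ValueError("TMW must be positive")
    if not 0 <= tmu <= tmw:
        raise ValueError("TMU must be between 0 and TMW")
    return (tmw - tmu) * 100 / tmw


# Example: a 30-day month with 432 minutes of non-availability.
tmw = minutes_in_month(2014, 4)
print(tmw, uptime_percent(tmw, 432))
```

For a 30-day month TMW is 43200 minutes, so 432 minutes of non-availability corresponds to 99% availability.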
Reporter | ||
Comment 4•11 years ago
The maximum number of concurrent calls is a daily report of the maximum number of users who connected concurrently to the Loop server.
User Story: (updated)
Reporter | ||
Updated•11 years ago
User Story: (updated)
Assignee | ||
Comment 5•11 years ago
(In reply to Romain Testard [:RT] from comment #3)
> Availability or Uptime refers to the percentage of the Loop server's
> availability calculated as follows: ((TMW - TMU) * 100) / TMW
>
> TMU: total minutes per calendar month of non availability
> TMW: total minutes per calendar month
What is considered "non-availability"?
For example, the loop-server is dependent on a third party and the loop-client service. If those go down it would likely affect availability from the user's perspective.
We can use pingdom to ping https://loop.services.mozilla.com and that will give us a high level report on the availability of that part of the service.
Flags: needinfo?(bwong)
Reporter | ||
Comment 6•11 years ago
The user story is for the Loop server only and, as I understand it, the service availability depends on the Loop server, the partner infrastructure and the FxA server (the latter applies only if the user is an account mode user).
Are you saying you would like to monitor the service overall including the partner infrastructure?
They are monitoring it separately but I agree it sounds sensible to be able to have a view of the service availability overall given each of the pieces separately can be service affecting.
Pingdom sounds like a great way to get things started for monitoring (high level view of availability at the IP level) and we could then work out whether we need more or not (I am not sure it will be enough to give us the full service availability picture - the servers may answer an HTTP request on a given socket although there may be some application level problem preventing users from placing calls).
Comment 7•11 years ago
Benson, I guess non-availability is when we start sending 503s for one reason or another. The server is configured to return a 503 status code when a dependent server isn't responding (redis, the tokbox servers, etc.).
Reporter | ||
Comment 8•11 years ago
Can you confirm how uptime will be reported? Is there a public Pingdom page?
It would be great to be able to inform users of the availability of the service through a public Pingdom page the way other services do it - http://heartbeat.skype.com/
Comment 9•11 years ago
Would we not also be hooking into http://status.mozilla.com/ - at least for the reporting of the status of our server?
Reporter | ||
Updated•11 years ago
User Story: (updated)
Summary: Loop server - reporting → Loop server - reporting up time
Reporter | ||
Comment 10•11 years ago
Looks like the ideal place indeed!
Assignee | ||
Comment 12•11 years ago
:alexis does the service return 503 everywhere or just for __heartbeat__?
We don't want this for everywhere as EC2 instances will be terminated and re-created if the load balancer's health checks fail.
I will look into getting more things into status.mozilla.com (or other options).
Flags: needinfo?(bwong)
Updated•11 years ago
Whiteboard: [qa?]
Comment 13•11 years ago
The service returns 503s when requests come in and the needed backend is not available.
__heartbeat__ returns a 503 or a 200, depending on whether the service is available.
Comment 14•11 years ago
:mostlygeek If you want to know whether the server is running, use the / endpoint. If you want to know whether the server can handle requests (meaning the backends are ready), use /__heartbeat__.
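To illustrate the difference between the two endpoints, here is a minimal Python sketch of an external probe. The helper names `fetch_status` and `classify` and the state labels are hypothetical; only the / and /__heartbeat__ endpoints and the 200/503 behaviour come from this thread.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still meaningful here.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return None


def classify(root_status: Optional[int], heartbeat_status: Optional[int]) -> str:
    """Combine the two probe results into a coarse state (labels are illustrative)."""
    if root_status is None:
        return "down"  # the server process is not answering at all
    if heartbeat_status == 200:
        return "available"  # server is running and its backends are ready
    return "backends-unavailable"  # server answers, but cannot handle requests


def probe(base_url: str) -> str:
    """Probe both endpoints of a server at base_url (no trailing slash)."""
    return classify(
        fetch_status(base_url + "/"),
        fetch_status(base_url + "/__heartbeat__"),
    )
```

A call such as `probe("https://loop.services.mozilla.com")` would perform both requests; the classification itself is a pure function and can be checked without network access.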
Comment 16•11 years ago
Benson, can you confirm you're tracking this? Or is there anything we can do to help move this forward? :-)
Flags: needinfo?(bwong)
Assignee | ||
Comment 17•11 years ago
We're still deciding on the best way to track it. To get started we might start with pingdom checking __heartbeat__.
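As a rough illustration of how periodic __heartbeat__ checks could feed the TMU figure from comment 3, here is a Python sketch. It assumes one check per fixed interval and treats any result other than 200 as non-available; this is an assumption for illustration, not a description of how Pingdom computes downtime, and `unavailable_minutes` is a hypothetical name.

```python
from typing import Iterable, Optional


def unavailable_minutes(
    statuses: Iterable[Optional[int]], interval_minutes: int = 1
) -> int:
    """Estimate TMU from a sequence of check results.

    Each entry is the HTTP status of one __heartbeat__ check, or None if the
    check got no response. Every non-200 result counts as one full interval
    of non-availability.
    """
    return sum(interval_minutes for status in statuses if status != 200)


# Example: four one-minute checks, two of which failed.
print(unavailable_minutes([200, 503, None, 200]))
```

The estimate is only as fine-grained as the check interval: an outage shorter than one interval may be missed or counted as a full interval.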
Flags: needinfo?(bwong)
Comment 18•11 years ago
Any update on this front? How are we tracking uptime on production atm?
Updated•11 years ago
Severity: normal → major
Priority: -- → P1
Comment 19•11 years ago
Benson, do we have any news on the best way to track this? We need to have uptime information on production :)
Severity: major → critical
Flags: needinfo?(bwong)
Assignee | ||
Comment 20•11 years ago
We'll use pingdom for now (nice URLs to come in the future):
- dashboard: http://stats.pingdom.com/20dar76w4hmv
- loop-server: http://stats.pingdom.com/20dar76w4hmv/1304565
- loop-client: http://stats.pingdom.com/20dar76w4hmv/1304575
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(bwong)
Resolution: --- → FIXED
Comment 21•11 years ago
OK, I verified the current links in pingdom - you can stay on main page or click-through to more details per service.
We need to advertise these internally/externally - ideas?
Also, we should really consider merging with, or using alongside, http://status.mozilla.com/
although I am not sure how many people inside/outside Mozilla even know about it ;-)
:mostlygeek thanks for setting up pingdom, thanks for volunteering to follow up:
"I will look into getting more things into status.mozilla.com (or other options)."
Status: RESOLVED → VERIFIED