Closed
Bug 1472271
Opened 7 years ago
Closed 7 years ago
Decision task problem because of mercurial
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: apop, Assigned: apop)
Details
There is an issue with the hg servers that is affecting try and the decision tasks. Please check the error:
[task 2018-06-29T17:13:30.584Z] ReadTimeout: HTTPSConnectionPool(host='hg.mozilla.org', port=443): Read timed out. (read timeout=5)
Can you please check the issue?
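For readers hitting the same `ReadTimeout`, a common client-side mitigation is to retry with exponential backoff. The following is only a minimal sketch under stated assumptions: nothing in this bug says the decision task does this, and `call_with_retry`, `fetch`, `attempts`, `base_delay` and `sleep` are hypothetical names chosen for illustration. It catches Python's built-in `TimeoutError`; a client using the `requests` library would catch `requests.exceptions.ReadTimeout` instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on TimeoutError with exponential backoff.

    Hypothetical helper for illustration; not taken from the decision task.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except TimeoutError:
            # Re-raise once the retry budget is spent.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... between attempts so that a
            # struggling server is not hit again immediately.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays. Backoff matters here because immediate retries against an overloaded server would add to the load.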
Updated•7 years ago
Severity: normal → blocker
Priority: -- → P1
Updated•7 years ago
Assignee: nobody → apop
Comment 1•7 years ago
Sounds like ActiveData is taking down hgmo. Probably related:
<nagios-scl3> Fri 17:09:01 UTC [5914] [Unknown] hgweb15.dmz.scl3.mozilla.com:httpd max clients is WARNING: Using 26 out of 26 Clients (http://m.mozilla.org/httpd+max+clients)
10:12:29
<gps> ekyle-no-power: activedata is hammering hgmo
10:13:11 Usul, sal, dhouse: we can mitigate the load by banning UAs with "ActiveData-ETL" in them
10:15:02
<nagios-scl3> Fri 17:15:01 UTC [5917] [Unknown] hgweb15.dmz.scl3.mozilla.com:httpd max clients is WARNING: Using 26 out of 26 Clients (http://m.mozilla.org/httpd+max+clients)
10:18:22 apop|away → apop|ciduty
10:18:53
<nagios-scl3> Fri 17:18:52 UTC [5921] [devservices] hgweb14.dmz.scl3.mozilla.com:Load is CRITICAL: CRITICAL - load average: 47.23, 47.45, 46.96 (http://m.mozilla.org/Load)
10:20:38
<sal> im not sure how to filter them
10:21:01
<nagios-scl3> Fri 17:21:00 UTC [5924] [Unknown] hgweb15.dmz.scl3.mozilla.com:httpd max clients is WARNING: Using 26 out of 26 Clients (http://m.mozilla.org/httpd+max+clients)
10:21:07
<gps> i made noise in #sysadmins. usually someone there can take care of things
10:21:10
<nagios-scl3> Fri 17:21:09 UTC [5927] [devservices] hgweb13.dmz.scl3.mozilla.com:httpd max clients is WARNING: Using 53 out of 53 Clients (http://m.mozilla.org/httpd+max+clients)
10:24:08 Fri 17:24:07 UTC [5932] [devservices] hgweb13.dmz.scl3.mozilla.com:Load is CRITICAL: CRITICAL - load average: 46.42, 47.28, 46.98 (http://m.mozilla.org/Load)
10:26:45
<jlund> Jordan Lund hitting timeouts on at least try: https://treeherder.mozilla.org/#/jobs?repo=try
10:26:51 [task 2018-06-29T17:13:30.584Z] ReadTimeout: HTTPSConnectionPool(host='hg.mozilla.org', port=443): Read timed out. (read timeout=5)
10:27:01
<nagios-scl3> Fri 17:27:00 UTC [5936] [Unknown] hgweb15.dmz.scl3.mozilla.com:httpd max clients is WARNING: Using 26 out of 26 Clients (http://m.mozilla.org/httpd+max+clients)
10:28:04 Fri 17:28:03 UTC [5939] [devservices] hgweb13.dmz.scl3.mozilla.com:httpd max clients is WARNING: Using 53 out of 53 Clients (http://m.mozilla.org/httpd+max+clients)
10:30:02 Fri 17:30:01 UTC [5942] [devservices] hgweb13.dmz.scl3.mozilla.com:httpd max clients is OK: Using 10 out of 53 Clients (http://m.mozilla.org/httpd+max+clients)
10:31:01 Fri 17:31:00 UTC [5945] [Unknown] hgweb15.dmz.scl3.mozilla.com:httpd max clients is OK: Using 5 out of 26 Clients (http://m.mozilla.org/httpd+max+clients)
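The mitigation suggested in the log is banning user agents that contain "ActiveData-ETL". The log does not say where or how the ban was actually applied, so the following is only a sketch of the idea as WSGI middleware. The names `block_user_agent` and `BLOCKED_UA_SUBSTRING` are hypothetical, and a production ban may well live in a load balancer or web server configuration instead.

```python
from typing import Callable, Iterable

# Substring quoted in the chat log; everything else here is an assumption.
BLOCKED_UA_SUBSTRING = "ActiveData-ETL"


def block_user_agent(app: Callable, blocked: str = BLOCKED_UA_SUBSTRING) -> Callable:
    """Wrap a WSGI app and reject requests whose User-Agent contains `blocked`."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header under this key.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if blocked in user_agent:
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Any other client is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Rejecting the request before it reaches the application means the expensive handler never runs for the banned client, which is the point of the mitigation.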
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Comment 3•7 years ago
I have reopened the ticket because we need to document whether the problem on try has been resolved.
We will keep tracking it.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment 4•7 years ago
The operational issue causing this problem has been resolved (in bug 1472251).
The service was effectively under a DoS attack due to extremely high concurrent load against API endpoints that require significant CPU to process. I believe the last time this specific event happened, we were able to ride it out. But the hg.mo service is at reduced capacity right now because we're in the middle of a datacenter migration and some of our high-capacity servers are not available to service requests. Plus we're in the middle of a work day. I think the last time ActiveData did this was when the sun was over the Pacific Ocean, which is a period of relative tranquility for the servers.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
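Since the comment above attributes the outage to extremely high concurrent load, one way a bulk client could avoid causing it is to cap how many requests it has in flight at once. This is a minimal sketch under stated assumptions, not a description of how ActiveData works or was changed: `map_bounded` and `max_in_flight` are hypothetical names, and `func` stands in for whatever call fetches a single resource.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_bounded(
    func: Callable[[T], R],
    items: Iterable[T],
    max_in_flight: int = 4,
) -> List[R]:
    """Apply func to every item with at most max_in_flight calls running at once.

    Hypothetical helper for illustration only.
    """
    # The pool size is the concurrency cap: no more than max_in_flight
    # worker threads exist, so no more than that many calls overlap.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map returns results in the same order as the input items.
        return list(pool.map(func, items))
```

A fixed cap trades client throughput for predictable server load; a suitable value would depend on the server's capacity, which this bug does not specify.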
Comment 5•7 years ago
We have received confirmation from the Sheriffs about the decision tasks. Everything looks fine now.