Closed
Bug 797859
Opened 12 years ago
Closed 11 years ago
jenkins times out connecting to ES an awful lot
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: willkg, Assigned: bburton)
Details
(Whiteboard: [triaged 20121004])
Since September 25th or so, most of the sumo-master build test runs with Jenkins have failed with TimeoutErrors when trying to connect to whatever ES cluster Jenkins uses.

https://ci.mozilla.org/job/sumo-master/

Most of the yellow dots on the left side are timeout errors. We've got the timeout set to 5 seconds. That should be plenty of time to establish a connection with ES, so we're thinking something is fishy with either Jenkins or ES.

Can someone take a look at the ES cluster that Jenkins is using and see if it's functioning and/or needs a restart?
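One way to narrow down "Jenkins or ES" is to time a bare connection attempt from the Jenkins host, outside the test suite. The following is a minimal sketch, not the sumo test code: the helper name `probe_connect` is hypothetical, and it only measures TCP connection setup, so it would not catch a slow response after the connection is established.

```python
import socket
import time


def probe_connect(host, port, timeout=5.0):
    """Attempt one TCP connection.

    Returns (ok, seconds_elapsed, error_name). The 5-second default
    mirrors the timeout mentioned in the report above.
    """
    start = time.monotonic()
    try:
        # create_connection raises an OSError subclass on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, None
    except OSError as exc:
        return False, time.monotonic() - start, type(exc).__name__
```

Calling `probe_connect("localhost", 9200)` a few times in a row on the Jenkins box would show whether connection setup itself is slow or whether the delay comes later in the request.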
Comment 1•12 years ago
IIRC :jason helped us troubleshoot some index problems on there once
Comment 2•12 years ago
-> major, this is disrupting our ability to deploy code.
Severity: normal → major
Assignee
Comment 3•12 years ago
I am investigating
Assignee: server-ops-webops → bburton
Severity: major → normal
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [triaged 20121004]
Reporter
Comment 4•12 years ago
Adding Mike and Rehan to the cc: list.
Assignee
Comment 5•12 years ago
Per my email to webdev@ and webqa@, I want to restart ES in the morning as a first step. As a long-term fix, we need to upgrade ES.
Assignee
Updated•12 years ago
Group: infra
Assignee
Updated•12 years ago
Whiteboard: [triaged 20121004] → [triaged 20121004][waiting][es restart]
Assignee
Comment 6•12 years ago
ES was restarted at 10AM and looks happy after the restart. Let me know how a new build goes.

[root@jenkins1.dmz.phx1 ~]# curl -v http://localhost:9200/_cluster/health?pretty=true
* About to connect() to localhost port 9200 (#0)
*   Trying ::1... connected
* Connected to localhost (::1) port 9200 (#0)
> GET /_cluster/health?pretty=true HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.13.1.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2
> Host: localhost:9200
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 271
<
{
  "cluster_name" : "jenkins",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 25,
  "active_shards" : 25,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
* Connection #0 to host localhost left intact
* Closing connection #0
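The "looks happy" check above could be automated by parsing the health response rather than reading it by eye. This is a rough sketch under the assumption that the response body has the fields shown in the curl output; the function name `summarize_health` and the choice of which fields to flag are illustrative, not an established check.

```python
import json


def summarize_health(body):
    """Return a list of problems found in a _cluster/health JSON body.

    An empty list means nothing looked wrong in the fields checked here.
    """
    health = json.loads(body)
    problems = []
    if health.get("status") != "green":
        problems.append(f"status is {health.get('status')}")
    if health.get("timed_out"):
        problems.append("health request timed out")
    if health.get("unassigned_shards", 0) > 0:
        problems.append(f"{health['unassigned_shards']} unassigned shards")
    return problems


# Body copied from the curl output in the comment above.
SAMPLE = """{
  "cluster_name" : "jenkins",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 25,
  "active_shards" : 25,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}"""
```

Run against `SAMPLE`, the function reports no problems, which matches the reading of the output above. A check like this could run as a pre-build step so that a sick cluster fails fast instead of surfacing as test timeouts.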
Assignee
Comment 7•11 years ago
An ElasticSearch 0.20.x service is now available. To use it, point your tests at the hostname 'jenkins-es20'.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
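For test suites that want to switch between the old and new services, the hostname can be read from the environment. This is only a sketch: the `ES_HOST` variable and the `es_hosts` helper are hypothetical, and port 9200 is an assumption carried over from the curl output against the old cluster, not something stated for 'jenkins-es20'.

```python
import os


def es_hosts(default_host="jenkins-es20", port=9200):
    """Build the host list for test settings.

    ES_HOST (hypothetical variable) overrides the default hostname;
    the port is assumed, not confirmed for the new service.
    """
    host = os.environ.get("ES_HOST", default_host)
    return [f"{host}:{port}"]
```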
Assignee
Updated•11 years ago
Whiteboard: [triaged 20121004][waiting][es restart] → [triaged 20121004]
Updated•11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard