Closed Bug 911095 Opened 12 years ago Closed 12 years ago

OrangeFactor not showing failures since 27-08-2013

Categories

(Tree Management Graveyard :: OrangeFactor, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

Details

Guessing the logparser/pulse needs a kick? To what machine would I need to request access to be able to do this myself?
Flags: needinfo?(jgriffin)
As mentioned in IRC, you want access to orangefactor1.dmz.phx1.mozilla.com. The logparser is running and consuming messages from pulse. In the error log, I see this repeated over and over:
2013-08-30 08:37:02,329 - BuildLogMonitor - ERROR - Max retries exceeded for url: /logs/builds/_count
Traceback (most recent call last):
  File "/home/webtools/apps/logparser/src/logparser/logparser/savelogs.py", line 60, in parse
    lp.parseFiles()
  File "/home/webtools/apps/logparser/src/logparser/logparser/logparser.py", line 106, in parseFiles
    self.postResultsToElasticSearch(testdata)
  File "/home/webtools/apps/logparser/src/logparser/logparser/logparser.py", line 196, in postResultsToElasticSearch
    self._post_testgroup_to_elasticsearch(data)
  File "/home/webtools/apps/logparser/src/logparser/logparser/logparser.py", line 183, in _post_testgroup_to_elasticsearch
    testgroup.submit()
  File "/home/webtools/apps/logparser/src/mozautolog/mozautolog/esautolog.py", line 80, in submit
    self._generate_testrun()
  File "/home/webtools/apps/logparser/src/mozautolog/mozautolog/esautolog.py", line 58, in _generate_testrun
    doc_type = [self.doc_type])
  File "/home/webtools/apps/logparser/src/mozautoeslib/mozautoeslib/eslib.py", line 186, in query
    doc_types=self.doc_type)
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/es.py", line 793, in count
    return self._query_call("_count", query, indexes, doc_types, **query_params)
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/es.py", line 246, in _query_call
    response = self._send_request('GET', path, body, querystring_args)
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/es.py", line 205, in _send_request
    response = self.connection.execute(request)
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/connection_http.py", line 167, in _client_call
    return getattr(conn.client, attr)(*args, **kwargs)
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/connection_http.py", line 59, in execute
    response = self.client.urlopen(Method._VALUES_TO_NAMES[request.method], uri, body=request.body, headers=request.headers)
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/home/webtools/apps/logparser/lib/python2.6/site-packages/pyes-0.15.0-py2.6.egg/pyes/urllib3/connectionpool.py", line 255, in urlopen
    raise MaxRetryError("Max retries exceeded for url: %s" % url)
MaxRetryError: Max retries exceeded for url: /logs/builds/_count
I can access the ES cluster from this machine. Restarting the logparser didn't help. I am curious about that URL; that's not a full URL, but I'm not sure if it's only printing the path for some reason. I don't know what would have changed to cause an error like this, though.
Also, since the logparser has been consuming messages even though it's not outputting anything to ES, we'll have to use the back-fill script to go over past logs after we fix this.
The logparser is trying to write to both the dev and production instances of ES, but the dev instance is unreachable:
[webtools@orangefactor1.dmz.phx1 bin]$ curl http://elasticsearch-zlb.webapp.scl3.mozilla.com:9200/
{
  "ok" : true,
  "status" : 200,
  "name" : "elasticsearch5_scl3",
  "version" : {
    "number" : "0.20.5",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search"
}
[webtools@orangefactor1.dmz.phx1 bin]$ curl http://elasticsearch-zlb.dev.vlan81.phx.mozilla.com:9200/
curl: (6) Couldn't resolve host 'elasticsearch-zlb.dev.vlan81.phx.mozilla.com'
[webtools@orangefactor1.dmz.phx1 bin]$
I'm going to take the dev server out of the list for now.
Flags: needinfo?(jgriffin)
logparser is running again; I need to patch it to make it more resilient to ES failures like this.
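For illustration only, here is a minimal sketch of what "more resilient" could look like: try each ES target independently so that one unreachable instance (dev, in this case) doesn't block writes to the others. This is not the actual logparser patch; `submit_to_all` and the `submit` callable are hypothetical names, and the real code submits through mozautolog/pyes.

```python
import logging

log = logging.getLogger("logparser.sketch")


def submit_to_all(targets, doc, submit):
    """Submit doc to every target; return [(target, exception)] for failures.

    `submit` is a hypothetical callable (target, doc) standing in for the
    real ES write. A failure on one target is logged and recorded, and the
    remaining targets are still attempted.
    """
    failures = []
    for target in targets:
        try:
            submit(target, doc)
        except Exception as exc:
            log.error("submit to %s failed: %s", target, exc)
            failures.append((target, exc))
    return failures
```

The caller can then decide what to do with the returned failures (retry later, alert, or drop), instead of the whole parse failing on the first bad host.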
Hmm, so the ES address of the dev server in the logparser config was wrong.
was: elasticsearch-zlb.dev.vlan81.phx.mozilla.com
should be: elasticsearch-zlb.dev.vlan81.phx1.mozilla.com
(i.e., phx1 not phx)
I corrected this, and it's happy again. But, since this was always wrong, I'm not actually sure it's the reason why OF isn't updating.
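A hostname typo like this one could be caught at startup rather than at write time. The following is a rough sketch, not something the logparser is known to do: it checks that each configured ES host resolves in DNS. `unresolvable_hosts` is a hypothetical helper, and the `resolve` parameter exists only so the check can be exercised without network access.

```python
import socket


def unresolvable_hosts(hosts, resolve=socket.getaddrinfo):
    """Return the subset of hosts that fail DNS resolution.

    By default this uses socket.getaddrinfo(host, None), which raises
    socket.gaierror when the name cannot be resolved.
    """
    bad = []
    for host in hosts:
        try:
            resolve(host, None)
        except socket.gaierror:
            bad.append(host)
    return bad
```

Run against the two hostnames above, a check like this would be expected to flag the phx variant, matching the curl "Couldn't resolve host" error. Note that it only tests name resolution, not whether ES is actually answering on port 9200.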
How do we go about backfilling the missing data? :-)
Flags: needinfo?(jgriffin)
We don't have a good way. We used to have a scraper that would scrape the FTP site and populate data based on logs it found, but that's completely bitrotted and would take some work to get running again. We'd also have to guard against writing duplicate data to ES, which the scraper is not currently smart enough to do. The scraper was removed in this changeset, for being completely obsolete: http://hg.mozilla.org/automation/logparser/rev/5556a69ce358
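On the duplicate-data concern: one common way to make a backfill safe to re-run is to derive the ES document ID from fields that uniquely identify a test run, so that indexing the same run twice overwrites the existing document instead of adding a second one. This is only a sketch under that assumption; `doc_id` and the field names in the usage note are hypothetical, and the removed scraper did not work this way.

```python
import hashlib
import json


def doc_id(doc, key_fields):
    """Build a deterministic ID from the identifying fields of a document.

    Only the values of key_fields contribute, so two documents describing
    the same run get the same ID even if other fields differ.
    """
    # sort_keys keeps the serialization stable if a key value is itself a dict.
    key = json.dumps([doc.get(f) for f in key_fields], sort_keys=True)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()
```

For example, `doc_id(run, ["buildid", "machine", "suite"])` would give the same ID whether the run arrived via pulse or via a backfill, provided those (assumed) fields really do identify a run uniquely.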
Flags: needinfo?(jgriffin)
That's unfortunate, but can't be helped. Thank you anyway!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Product: Testing → Tree Management
Product: Tree Management → Tree Management Graveyard