Closed Bug 1156329 Opened 9 years ago Closed 9 years ago

Public Bugzilla Elasticsearch Cluster is Stale (has ETL stopped?)

Categories

(bugzilla.mozilla.org :: Infrastructure, defect)

Production
x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: ekyle, Assigned: fubar)

References

(Blocks 1 open bug)

Details

The public cluster has stale data.  The ETL may have stopped!
Blocks: 1156194
hung process from the 12th. killed it and will keep an eye on the next few cron runs.
Assignee: nobody → klibby
still happy. looking back at the logs...

the 2015-04-13 03:00 run is where things first get weird. looks like it does a normal run, but at 03:01 we start getting 'Waiting on thread "etl"' notices, which continue until 04:27 when we get:

2015-04-13 04:27:19.344993 - WARNING: problem logging to es
        at File /data/www/Bugzilla-ETL/bzETL/util/env/log_usingElasticSearch.py, line 85, in time_delta_pusher
        at File /data/www/Bugzilla-ETL/bzETL/util/thread/threads.py, line 252, in _run

caused by
        ERROR: problem
        at File /data/www/Bugzilla-ETL/bzETL/util/env/elasticsearch.py, line 259, in extend
        at File /data/www/Bugzilla-ETL/bzETL/util/env/log_usingElasticSearch.py, line 82, in time_delta_pusher
        at File /data/www/Bugzilla-ETL/bzETL/util/thread/threads.py, line 252, in _run

caused by
        ERROR: Problem with call to http://elasticsearch4.bugs.scl3.mozilla.com:9200/debug/public_etl/_bulk
{"index":{"_id": "B72B955C2820C8475DDA94347E7D5D521D52A2AF"}}
{"timestamp": 1428924439000, "params":
        at File /data/www/Bugzilla-ETL/bzETL/util/env/elasticsearch.py, line 339, in post
        at File /data/www/Bugzilla-ETL/bzETL/util/env/elasticsearch.py, line 243, in extend
        at File /data/www/Bugzilla-ETL/bzETL/util/env/log_usingElasticSearch.py, line 82, in time_delta_pusher
        at File /data/www/Bugzilla-ETL/bzETL/util/thread/threads.py, line 252, in _run

caused by
        ERROR: HTTPConnectionPool(host='elasticsearch4.bugs.scl3.mozilla.com', port=9200): Max retries exceeded with url: /debug/public_etl/_bulk (Caused by <class '_socket.gaierror'>: [Errno -3] Temporary failure in name resolution)
        at File /usr/lib64/pypy-2.2.1/site-packages/requests/adapters.py, line 382, in send
        at File /usr/lib64/pypy-2.2.1/site-packages/requests/sessions.py, line 486, in send
        at File /usr/lib64/pypy-2.2.1/site-packages/requests/sessions.py, line 383, in request
        at File /usr/lib64/pypy-2.2.1/site-packages/requests/api.py, line 44, in request
        at File /usr/lib64/pypy-2.2.1/site-packages/requests/api.py, line 88, in post
        at File /data/www/Bugzilla-ETL/bzETL/util/env/elasticsearch.py, line 321, in post


given the time span, and the number of 'Temporary failure in name resolution' errors (5200+), I think that's a lie. There's nothing useful in the elasticsearch logs, but I've come to accept that as normal for ES.
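for anyone wanting to reproduce the count and time span of those errors from the ETL log, here is a minimal sketch. It assumes each log entry starts with a timestamp in the format seen in the excerpt above (e.g. "2015-04-13 04:27:19.344993 - WARNING: ..."); the function name and the sample lines are hypothetical and not part of Bugzilla-ETL.

```python
import re
from datetime import datetime

# Assumed entry prefix, based on the excerpt above:
# "2015-04-13 04:27:19.344993 - WARNING: problem logging to es"
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) - ")
NEEDLE = "Temporary failure in name resolution"


def summarize_dns_failures(lines):
    """Return (count, first_seen, last_seen) for lines containing NEEDLE.

    The error text appears on an indented "caused by" line, so each hit is
    attributed to the timestamp of the most recent entry header before it.
    """
    count = 0
    first_seen = last_seen = None
    current = None  # timestamp of the most recent entry header
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if NEEDLE in line:
            count += 1
            if current is not None:
                if first_seen is None:
                    first_seen = current
                last_seen = current
    return count, first_seen, last_seen


# Hypothetical sample in the same shape as the log excerpt.
sample = [
    "2015-04-13 04:27:19.344993 - WARNING: problem logging to es",
    "        ERROR: ... [Errno -3] Temporary failure in name resolution)",
    "2015-04-13 04:28:00.000000 - WARNING: problem logging to es",
    "        ERROR: ... [Errno -3] Temporary failure in name resolution)",
]
count, first_seen, last_seen = summarize_dns_failures(sample)
```

to run it against a real log, pass an open file object instead of the sample list; the function only iterates over lines, so it does not load the whole file into memory.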
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
No longer blocks: 1156194