Closed
Bug 1333408
Opened 7 years ago
Closed 7 years ago
[traceback] crontabber node: [Errno 28] No space left on device
Categories
(Socorro :: Infra, task)
Socorro
Infra
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: willkg, Assigned: miles)
Details
We're seeing errors on the crontabber node:

"""
IOError: [Errno 28] No space left on device
  File "crontabber/app.py", line 975, in _run_one
    for last_success in self._run_job(job_class, config, info):
  File "crontabber/base.py", line 189, in main
    function()
  File "crontabber/base.py", line 259, in _run_proxy
    return self.run(*args, **kwargs)
  File "socorro/cron/jobs/elasticsearch_cleanup.py", line 27, in run
    cleaner.delete_old_indices()
  File "socorro/external/es/index_cleaner.py", line 48, in delete_old_indices
    index_client = es_class.indices_client()
  File "socorro/external/es/connection_context.py", line 129, in indices_client
    elasticsearch.client.IndicesClient(self.connection())
  File "socorro/external/es/connection_context.py", line 117, in connection
    elasticsearch.connection.RequestsHttpConnection
  File "elasticsearch/client/__init__.py", line 110, in __init__
    self.transport = transport_class(_normalize_hosts(hosts), **kwargs)
  File "elasticsearch/client/__init__.py", line 38, in _normalize_hosts
    host
  File "logging/__init__.py", line 1171, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "logging/__init__.py", line 1278, in _log
    self.handle(record)
  File "logging/__init__.py", line 1288, in handle
    self.callHandlers(record)
  File "logging/__init__.py", line 1335, in callHandlers
    " \"%s\"\n" % self.name)
"""

Looks like the disk is full. This issue covers fixing the immediate problem (no space left) and looking into why it ran out of space. Does it need a logrotate thinger? Does it need spiritual guidance?
Reporter
Comment 1•7 years ago
Sentry issue: https://sentry.prod.mozaws.net/operations/socorro-prod/issues/378838/ Looks like it might have been happening for a long while? Either that or I don't know how to read the graphs and charts and all that technical stuff.
Comment 2•7 years ago
It started last Friday, so this has been going on for about 4 days. I'm confused by the fact that it happens only during the elasticsearch-cleanup job.
Comment 3•7 years ago
We have a crontabber.log file in /var/log/socorro/ that takes up 5.2G of the 8G disk. I suppose that's the problem, but I'm not sure how to best solve it. I don't want to just delete that file. Deferring to Miles.
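(For anyone retracing the diagnosis: commands along these lines would surface a runaway log file. The paths are illustrative, not a record of what was actually run.)

```shell
# Overall usage for the filesystem holding /var/log
df -h /var/log
# Largest entries under /var/log, biggest first
du -ah /var/log | sort -rh | head -n 10
```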
Assignee: nobody → miles
Flags: needinfo?(miles)
Comment 4•7 years ago
Note: this is on the production admin server.
Assignee
Comment 5•7 years ago
Gah! Lost the bugmail associated with this. I think deleting the current log, putting logrotate in place, and setting up Datadog alerts going forward is the way to handle this. A 5.2G log file makes no one happy.
Flags: needinfo?(miles)
Assignee
Comment 6•7 years ago
I deleted the 5.2G log yesterday and put logrotate in place. I have set up Datadog disk-space alerts for all hosts tagged Environment={stage,prod}. The caveat is that not all of these hosts (including the prod admin node) have the Datadog Agent installed with the correct configuration, so some are not reporting. I am remedying this and will make those changes using Ansible.
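(For reference, a minimal logrotate sketch for this kind of setup. The path matches the file named above, but the rotation schedule, retention, and size cap are assumptions, not the actual deployed config.)

```
/var/log/socorro/crontabber.log {
    daily           # rotate once a day
    rotate 7        # keep a week of old logs
    maxsize 500M    # rotate early if the file grows past 500M
    compress
    delaycompress
    missingok
    notifempty
    copytruncate    # truncate in place so crontabber keeps its open file handle
}
```

copytruncate avoids having to teach crontabber to reopen its log file on rotation, at the cost of possibly losing a few lines written during the truncate.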
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED