Closed Bug 1333408 Opened 6 years ago Closed 6 years ago

[traceback] crontabber node: [Errno 28] No space left on device


(Socorro :: Infra, task)




(Reporter: willkg, Assigned: miles)


We're seeing errors on the crontabber node:

IOError: [Errno 28] No space left on device
  File "crontabber/", line 975, in _run_one
    for last_success in self._run_job(job_class, config, info):
  File "crontabber/", line 189, in main
  File "crontabber/", line 259, in _run_proxy
    return*args, **kwargs)
  File "socorro/cron/jobs/", line 27, in run
  File "socorro/external/es/", line 48, in delete_old_indices
    index_client = es_class.indices_client()
  File "socorro/external/es/", line 129, in indices_client
  File "socorro/external/es/", line 117, in connection
  File "elasticsearch/client/", line 110, in __init__
    self.transport = transport_class(_normalize_hosts(hosts), **kwargs)
  File "elasticsearch/client/", line 38, in _normalize_hosts
  File "logging/", line 1171, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "logging/", line 1278, in _log
  File "logging/", line 1288, in handle
  File "logging/", line 1335, in callHandlers
    " \"%s\"\n" %

Looks like the disk is full.

This issue covers fixing the immediate problem (no space left) and looking into why it ran out of space. Does it need a logrotate thinger? Does it need spiritual guidance?
Sentry issue:
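Re the logrotate question: an alternative to logrotate is capping the file from inside the process with Python's stdlib `RotatingFileHandler`. A minimal sketch (the path is a stand-in for the real /var/log/socorro log, and the size/count numbers are illustrative, not anything deployed):

```python
import logging
import logging.handlers
import os
import tempfile

# Stand-in path for the real /var/log/socorro/crontabber.log.
log_path = os.path.join(tempfile.mkdtemp(), "crontabber.log")

# Cap each log file at 50 MB and keep 5 old generations (~300 MB worst
# case, comfortably under an 8G disk). These numbers are guesses.
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=50 * 1024 * 1024, backupCount=5
)
logger = logging.getLogger("crontabber-demo")
logger.addHandler(handler)
logger.warning("this line lands in a size-capped file")

print(os.path.exists(log_path))  # -> True
```

The trade-off versus logrotate: rotation happens inside the app, so there's no external cron dependency, but it only covers logs this process writes.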

Looks like it might have been happening for a long while? Either that or I don't know how to read the graphs and charts and all that technical stuff.
It started last Friday, so this has been going on for about 4 days. 

I'm confused by the fact that it happens only during the elasticsearch-cleanup job.
We have a crontabber.log file in /var/log/socorro/ that takes up 5.2G of the disk's 8G. That's clearly a problem, but I'm not sure how best to solve it. I don't want to just delete that file.
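For the record, confirming which file is eating the disk is two stock commands (paths per the comment above):

```shell
# Is the filesystem actually full?
df -h /var/log

# What's taking the space? (errors from unreadable dirs are suppressed)
du -ah /var/log 2>/dev/null | sort -rh | head -n 10
```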

Deferring to Miles.
Assignee: nobody → miles
Flags: needinfo?(miles)
Note: this is on the production admin server.
Gah! Lost the bugmail associated with this.
I think deleting the current log, putting logrotate in place, and setting up Datadog alerts going forward is the way to handle this. A 5.2G log file makes no one happy.
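A logrotate drop-in for this would be small. A sketch, with the path from the comment above; the size threshold and rotation count here are assumptions, not what was actually deployed:

```
# /etc/logrotate.d/socorro -- hypothetical drop-in; thresholds are guesses
/var/log/socorro/crontabber.log {
    size 100M          # rotate once the file passes 100M
    rotate 5           # keep five old generations
    compress
    delaycompress
    missingok
    notifempty
    copytruncate       # don't disturb the process holding the file open
}
```

`copytruncate` matters here: without it, crontabber would keep writing to the rotated file's still-open descriptor unless it reopens its log on rotation.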
Flags: needinfo?(miles)
I deleted the 5.2G log yesterday and put logrotate in place. I have also set up Datadog disk space alerts for all hosts tagged Environment={stage,prod}. The caveat is that not all of these hosts (including the prod admin node) have the Datadog Agent installed and correctly configured, so some are not reporting. I am remedying this and will be making these changes using Ansible.
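The Ansible side could be as small as pulling in Datadog's public Ansible role. A sketch only; the host-group pattern assumes EC2 tag-based dynamic inventory, and the vault variable name is made up (the actual playbook isn't in this bug):

```yaml
# Hypothetical playbook; assumes the datadog.datadog Galaxy role and
# EC2 dynamic inventory groups derived from the Environment tag.
- hosts: "tag_Environment_stage:tag_Environment_prod"
  roles:
    - role: datadog.datadog
  vars:
    datadog_api_key: "{{ vault_datadog_api_key }}"
    datadog_checks:
      disk:
        init_config:
        instances:
          - use_mount: true
```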
Closed: 6 years ago
Resolution: --- → FIXED