Closed Bug 1193420 Opened 9 years ago Closed 9 years ago

gunicorn/django exception logging for production's API isn't very helpful

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

References

Details

I was trying to work out the reason behind the HTTP 500 seen during these log parser tasks, when they try to post to our API: https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/4f4e53-bd32f686-4046-11e5-97ea-b82a72d22a14 Since there's nothing to go on, on the celery task side, I tried looking at the gunicorn logs to see if Django reported an exception. However /var/log/gunicorn/treeherder_error.log was empty, and the older /var/log/gunicorn/treeherder_error.log-20150810 only contained this line: [2015-08-09 20:26:55 +0000] [1446] [CRITICAL] WORKER TIMEOUT (pid:14313) The actual exceptions are surfacing in: /var/log/run_gunicorn-supervisor.log ...however since they are not interleaved with the HTTP requests, it's hard to figure out which request corresponds to this exception - without manually sifting through timestamps: [2015-08-11 09:43:23,885] ERROR [treeherder.webapp.api.exceptions:29] No dataset found for project [2015-08-11 09:43:23,885] ERROR [treeherder.webapp.api.exceptions:30] Traceback (most recent call last): File "/data/www/treeherder.mozilla.org/venv/lib/python2.7/site-packages/rest_framework/views.py", line 453, in dispatch response = handler(request, *args, **kwargs) File "/data/www/treeherder.mozilla.org/treeherder-service/treeherder/webapp/api/utils.py", line 109, in use_jobs_model return model_func(*args, jm=jm, **kwargs) File "/data/www/treeherder.mozilla.org/treeherder-service/treeherder/webapp/api/performance_data.py", line 87, in get_signature_properties data = jm.get_signature_properties(signatures) File "/data/www/treeherder.mozilla.org/treeherder-service/treeherder/model/derived/jobs.py", line 1849, in get_signature_properties replace=signatures_repl) File "/data/www/treeherder.mozilla.org/treeherder-service/treeherder/model/derived/jobs.py", line 147, in execute return utils.retry_execute(self.get_dhub(), logger, **kwargs) File "/data/www/treeherder.mozilla.org/treeherder-service/treeherder/model/derived/base.py", line 72, in get_dhub datasource = self.get_datasource() File "/data/www/treeherder.mozilla.org/treeherder-service/treeherder/model/derived/base.py", line 81, in get_datasource self.source = self._get_datasource() File "/data/www/treeherder.mozilla.org/treeherder-service/treeherder/model/derived/base.py", line 154, in _get_datasource raise DatasetNotFoundError(self.project) DatasetNotFoundError: No dataset found for project
Between this and New Relic not showing these exceptions (bug 1191080) we have close to nothing to go on for these :-(
See Also: → 1191080
Summary: gunicorn/django exception logging on production isn't very helpful → gunicorn/django exception logging for production's API isn't very helpful
Prod's supervisord config: [emorley@treeherder1.webapp.scl3 ~]$ cat /etc/supervisord.conf.d/run_gunicorn.conf [program:run_gunicorn] command=/data/www/treeherder.mozilla.org/treeherder-service/bin/run_gunicorn directory=/data/www/treeherder.mozilla.org user=treeherder numprocs=1 stdout_logfile=/var/log/run_gunicorn-supervisor.log stderr_logfile=/var/log/run_gunicorn-supervisor.log autostart=true autorestart=true startsecs=10 environment= ; Need to wait for currently executing tasks to finish at shutdown. ; Increase this if you have very long running tasks. stopwaitsecs = 60 priority = 999 The Django logging options we're using: https://github.com/mozilla/treeherder/blob/1c00ccfcc74275f3ff7366c6150614bb065a8dc2/treeherder/settings/base.py#L149-L174
Priority: -- → P3
See Also: 11910801201086
Wontfix since moving to Heroku.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.