Closed
Bug 1192021
Opened 9 years ago
Closed 9 years ago
Clean up logging for relengapi
Categories
(Infrastructure & Operations :: RelOps: General, task)
Infrastructure & Operations
RelOps: General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: Callek)
Details
Attachments
(2 files, 1 obsolete file)
* Make sure nothing's being missed * All Python logging from webheads and celery (bug 1185135) * Apache access logs * Structured logging actually transmitted to MozDef * http://mozdef.readthedocs.org/en/latest/usage.html#sending-logs-to-mozdef * need to figure out how this is currently routed (if at all)
Reporter | ||
Comment 1•9 years ago
|
||
access logs are included - https://papertrailapp.com/groups/1421834/events?q=api.pub.build.mozilla.org for example
Reporter | ||
Comment 2•9 years ago
|
||
I'm seeing a bunch of logging go to /var/lib/httpd/error_log -- traceback for a 500 ISE, and a log of "generating signed S3 GET URL for .." from archiver.
Reporter | ||
Comment 3•9 years ago
|
||
Ah, mod_wsgi adds its own handler to the root of the logging structure (output from logging_tree): <--"" Level NOTSET so inherits level NOTSET Handler Stream <mod_wsgi.Log object at 0x7f2616b91e70> Level INFO Formatter fmt='%(asctime)s [%(name)s] %(message)s' datefmt=None Handler <cloghandler.ConcurrentRotatingFileHandler object at 0x7f25f4867150> Level INFO Formatter fmt='%(asctime)s [%(name)s] %(message)s' datefmt=None Handler SysLog ('localhost', 514) facility=1 Level INFO Formatter fmt='%(asctime)s web1.stage.releng.webapp.scl3.mozilla.com relengapi: %(message)s' datefmt='%b %d %H:%M:%S' which is, I assume, what's putting those error messages in /var/log/httpd/error.log. They should *also* be in /var/log/relengapi/relengapi.log. Indeed, I can confirm that tracebacks for 500 errors and "generating signed S3 GET URL" are present there. However, the tracebacks themselves are missing in papertrail; perhaps because they're huge and on a single line.
Reporter | ||
Comment 4•9 years ago
|
||
Part of that was because bug 1185135 was only applied for celery, not WSGI. But even with that, the tracebacks in papertrail look like Sep 02 08:35:16 web1.stage.releng.webapp.scl3.mozilla.com relengapi: Exception on /tokenauth/tokens [POST]#012Traceback (most recent call last):#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request#012 rv = self.dispatch_request()#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request#012 return self.view_functions[rule.endpoint](**req.view_args)#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/newrelic-2.46.0.37/newrelic/hooks/framework_flask.py", line 30, in _nr_wrapper_handler_#012 return wrapped(*args, **kwargs)#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/relengapi/lib/api.py", line 107, in replacement#012 result = wrapped(*args, **kwargs)#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/relengapi/blueprints/tokenauth/__init__.py", line 282, in issue_token#012 token = token_issuers[typ](body, requested_permissions)#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/relengapi/blueprints/tokenauth/__init__.py", line 235, in issue_usr#012 session.commit()#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 150, in do#012 return getattr(self.registry(), name)(*args, **kwargs)#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 790, in commit#012 self.transaction.commit()#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit#012 self._prepare_impl()#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl#012 self.session.flush()#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush#012 self._flush(objects)#012 File "/data/www/relengapi/virtualenv/lib/python2.7/site- ooglay!
Reporter | ||
Comment 5•9 years ago
|
||
OK, straightening out those exceptions doesn't seem easy. http://stackoverflow.com/questions/32359144/log-each-line-of-an-exception-separately I added the following to relengapi.wsgi and celery: # monkey-patch Logger._log to log each exception line separately, so that they appear nicely in # the syslog (rather than concatenated in a single message) old_log = logging.Logger._log def new_log(self, level, msg, args, exc_info=None, extra=None): old_log(self, level, msg, args, exc_info=None, extra=extra) if exc_info: if not isinstance(exc_info, tuple): exc_info = sys.exc_info() exc_text = logging._defaultFormatter.formatException(exc_info) for line in exc_text.split("\n"): old_log(self, level, line, ()) logging.Logger._log = new_log and that seems to have worked.
Reporter | ||
Comment 6•9 years ago
|
||
OK, I *think* everything's going successfully to papertrail now!
Reporter | ||
Comment 7•9 years ago
|
||
Attachment #8656219 -
Flags: review?(bugspam.Callek)
Reporter | ||
Updated•9 years ago
|
Attachment #8656219 -
Flags: review?(bugspam.Callek) → review+
Reporter | ||
Comment 8•9 years ago
|
||
This is all committed, but not landed yet. That will happen along with the treestatus landing.
Reporter | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 9•9 years ago
|
||
EXCEPT this is causing celery to not start in staging ... Traceback (most recent call last): File "/data/www/relengapi/celery", line 14, in <module> old_log = logging.Logger._log NameError: name 'logging' is not defined And if I add import logging above the code block... Traceback (most recent call last): File "/data/www/relengapi/celery", line 83, in <module> load_entry_point('celery', 'console_scripts', 'celery')() File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/__main__.py", line 30, in main main() File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/celery.py", line 81, in main cmd.execute_from_commandline(argv) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/celery.py", line 769, in execute_from_commandline super(CeleryCommand, self).execute_from_commandline(argv))) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/base.py", line 307, in execute_from_commandline return self.handle_argv(self.prog_name, argv[1:]) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/celery.py", line 761, in handle_argv return self.execute(command, argv) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/celery.py", line 693, in execute ).run_from_argv(self.prog_name, argv[1:], command=argv[0]) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/worker.py", line 179, in run_from_argv return self(*args, **options) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/base.py", line 270, in __call__ ret = self.run(*args, **kwargs) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/worker.py", line 212, in run state_db=self.node_format(state_db, hostname), **kwargs File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/worker/__init__.py", line 100, in __init__ self.setup_instance(**self.prepare_args(**kwargs)) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/worker/__init__.py", line 141, in setup_instance self.blueprint.apply(self, **kwargs) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 209, in apply self._debug('Preparing bootsteps.') File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 272, in _debug return debug(_pre(self, msg), *args) File "/usr/lib/python2.7/logging/__init__.py", line 1128, in debug self._log(DEBUG, msg, args, **kwargs) File "/data/www/relengapi/celery", line 64, in new_log old_log(self, level, msg, args, exc_info=None, extra=extra) File "/data/www/relengapi/celery", line 18, in new_log old_log(self, level, msg, args, exc_info=None, extra=extra) TypeError: 'NoneType' object is not callable I'm not going to block relengapi's deploy on this, since its a deployment bug not a repo-bug (but I think we have other issues to address)...
Status: RESOLVED → REOPENED
Flags: needinfo?(dustin)
Resolution: FIXED → ---
Assignee | ||
Comment 10•9 years ago
|
||
Apparantly this does block prod since prod's and stagings celery script is identical... and simply commenting out that code doesn't work.
Assignee | ||
Comment 11•9 years ago
|
||
And... if I comment out the code block: Traceback (most recent call last): File "/data/www/relengapi/celery", line 81, in <module> load_entry_point('celery', 'console_scripts', 'celery')() File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/__main__.py", line 30, in main main() File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/celery.py", line 81, in main cmd.execute_from_commandline(argv) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/celery.py", line 769, in execute_from_commandline super(CeleryCommand, self).execute_from_commandline(argv))) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/base.py", line 307, in execute_from_commandline return self.handle_argv(self.prog_name, argv[1:]) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/celery.py", line 761, in handle_argv return self.execute(command, argv) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/celery.py", line 693, in execute ).run_from_argv(self.prog_name, argv[1:], command=argv[0]) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/worker.py", line 179, in run_from_argv return self(*args, **options) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/base.py", line 270, in __call__ ret = self.run(*args, **kwargs) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bin/worker.py", line 212, in run state_db=self.node_format(state_db, hostname), **kwargs File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/worker/__init__.py", line 100, in __init__ self.setup_instance(**self.prepare_args(**kwargs)) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/worker/__init__.py", line 141, in setup_instance self.blueprint.apply(self, **kwargs) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 209, in apply self._debug('Preparing bootsteps.') File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/celery/bootsteps.py", line 272, in _debug return debug(_pre(self, msg), *args) File "/usr/lib/python2.7/logging/__init__.py", line 1128, in debug self._log(DEBUG, msg, args, **kwargs) File "/data/www/relengapi/celery", line 62, in new_log old_log(self, level, msg, args, exc_info=None, extra=extra) File "/usr/lib/python2.7/logging/__init__.py", line 1249, in _log fn, lno, func = self.findCaller() File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/structlog/stdlib.py", line 32, in findCaller f, name = _find_first_app_frame_and_name(['logging']) File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/structlog/_frames.py", line 41, in _find_first_app_frame_and_name while any(name.startswith(i) for i in ignores): File "/data/www/relengapi/virtualenv/lib/python2.7/site-packages/structlog/_frames.py", line 41, in <genexpr> while any(name.startswith(i) for i in ignores): AttributeError: 'NoneType' object has no attribute 'startswith'
Assignee | ||
Comment 12•9 years ago
|
||
... which looks like https://github.com/hynek/structlog/commit/655551f0da3dac53849684d893af3aa71f7b308b which is "interesting" since pip list says its structlog 15.2.0
Assignee | ||
Comment 13•9 years ago
|
||
Looking into c#11 On closer look (with debugging) it looks like frame.__name__ exists, but is explicitly set to None. So in this case, we don't hit the fallback of "?", and is a structlog issue. I created https://github.com/hynek/structlog/pull/58 for that.
Reporter | ||
Comment 14•9 years ago
|
||
The underlying problem here is that Celery helpfully deletes all symbols from __main__ when it starts up. It's like celery *tries* to be difficult to use :( That's why logging wasn't defined, and why old_log becomes None. And that's why I have all of those imports in the `setup` function -- that provides a function-level scope which Celery can't mess with, no matter how hard it tries. I must have botched a merge or something and ended up monkey-patching Logger.log twice -- once at the module level and once in `setup`. Just deleting the module-level version (as you've done by commenting it out) seems to help. I think that frame issue is also related to Celery trying to "clean house". But it may also be due to the monkey-patching (findCaller is always going to find `new_log` rather than its caller)
Flags: needinfo?(dustin)
Assignee | ||
Comment 15•9 years ago
|
||
Since structlog won't be released for at least a week, here is a monkeypatch.
Attachment #8661229 -
Flags: review?(dustin)
Assignee | ||
Comment 16•9 years ago
|
||
Comment on attachment 8661229 [details] [diff] [review] Monkeypatch Structlog Review of attachment 8661229 [details] [diff] [review]: ----------------------------------------------------------------- Err this is still failing, even after trying to monkeypatch
Attachment #8661229 -
Flags: review?(dustin) → review-
Assignee | ||
Comment 17•9 years ago
|
||
This was tested on staging, and works.
Attachment #8661229 -
Attachment is obsolete: true
Attachment #8661243 -
Flags: review?(dustin)
Reporter | ||
Updated•9 years ago
|
Attachment #8661243 -
Flags: review?(dustin) → review+
Reporter | ||
Updated•9 years ago
|
Assignee: dustin → bugspam.Callek
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Justin Wood (:Callek) from comment #17) > Created attachment 8661243 [details] [diff] [review] > Monkeypatch Structlog take 2 > > This was tested on staging, and works. and this was subsequently deployed to prod. We're good now.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•