Closed
Bug 754067
Opened 13 years ago
Closed 12 years ago
automatically rotate aus4-dev/aus4-admin-dev application logs
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task, P3)
Infrastructure & Operations Graveyard
WebOps: Other
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: nmaul)
References
Details
(Whiteboard: [balrog][triaged 20120831][service interrupt])
I've been getting nothing but ISE 500 errors from aus4-admin-dev this week. The http logs showed that admin.wsgi was returning prematurely, but the application log was absolutely empty. I ended up running through the admin.wsgi code by hand in the python interpreter, and this happened:
[bhearsum@node267.seamicro.phx1 aus4-admin]$ python
Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:02)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from auslib.web.base import db, app as application
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "auslib/web/base.py", line 1, in <module>
from flask import Flask, request
File "/usr/lib/python2.6/site-packages/flask/__init__.py", line 17, in <module>
from werkzeug import abort, redirect
ImportError: No module named werkzeug
>>>
>>> import site
>>> site.addsitedir('vendor/lib/python')
>>> from auslib.web.base import db, app as application
Bus error (core dumped)
Interestingly, I can cat the file fine:
[bhearsum@node267.seamicro.phx1 aus4-admin]$ cat auslib/web/base.py
from flask import Flask, request
from auslib import version
from auslib.db import AUSDatabase
app = Flask(__name__)
db = AUSDatabase()
@app.errorhandler(500)
def isa(error):
log.error("Caught ISE 500 error.")
log.debug("Balrog version is: %s", version)
log.debug("Request path is: %s", request.path)
log.debug("Request environment is: %s", request.environ)
log.debug("Request headers are: %s", request.headers)
return error
# All of our View modules contain routing information that needs to be imported
# to be active.
from auslib.web.views import *
This seems like it might be an infra issue, but I want to look into a little bit more before throwing it over the fence.
Reporter | ||
Comment 1•13 years ago
|
||
I dug into this some more and found that the root module that's causing the bus error is ctypes. It doesn't look like that module was updated anytime recently, so I'm not sure what to do from here. IT, any ideas? You can reproduce with the following in the python shell:
'import ctypes'
If you add print statements to /usr/lib/python2.6/ctypes/__init__.py you can bisect a bit, but it's the final call to CFUNCTYPE (the last line of the file) that's causing the error.
This issue is completely breaking aus4-dev currently, because no new data can be submitted.
Assignee: bhearsum → server-ops
Severity: normal → critical
Component: Release Engineering: Automation (General) → Server Operations: Web Operations
QA Contact: catlee → cshields
Assignee | ||
Updated•13 years ago
|
Assignee: server-ops → nmaul
Assignee | ||
Comment 2•13 years ago
|
||
[root@node267.seamicro.phx1 ~]# python
Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:02)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> quit()
The problem may be that these systems are out of disk space. There is giant log file at /var/httpd/log/aus.log, on both systems. This isn't a standard Apache log file... I suspect maybe it's something the app itself creates. It looks like debugging info... starts with some tracebacks, then a lot of JSON data. Most lines start with 'DEBUG:auslib.web.views.releases:'
Any idea where this comes from, and what we can do with it?
Reporter | ||
Comment 3•13 years ago
|
||
Yeah, that's the application log. We can definitely shrink it some, but is there any way we can set-up rotation of it through syslogd or something? How is application log rotation/aging normally done?
Reporter | ||
Comment 4•13 years ago
|
||
I just deleted aus.log, because we don't care too much about the history at this point, to get things unwedged.
Severity: critical → major
Reporter | ||
Comment 5•13 years ago
|
||
According to Jake on IRC, we should be able to use logrotate to rotate/age these logs. We'll need to do this for both apps: one logs to /var/log/aus.log, the other to /var/log/httpd/aus4.mozilla.org/aus.log. I filed bug 754383 on greatly reducing the amount of JSON blob we print to these logs, which should help avoid this situation in the future.
Summary: aus4-admin-dev dumping its core when trying to run admin.wsgi → automatically rotate aus4-dev/aus4-admin-dev application logs
Reporter | ||
Comment 6•13 years ago
|
||
Also, had to create the log files by hand, and chown them to apache to give the app write permission again.
14:34 < solarce> bhearsum|buildduty: jakem we should probably use puppet to make that file and set
permissions, but i'll defer to jakem
Assignee | ||
Updated•13 years ago
|
Severity: major → normal
Updated•13 years ago
|
Assignee: nmaul → bburton
Updated•13 years ago
|
Whiteboard: [balrog] → [balrog] [review 7/19]
Comment 8•13 years ago
|
||
Got caught up in some other things, so I fixed the issue that was reported in bug 774659 with
[logging]
;Where to put the application log. No rotation is done on this file.
-logfile=/var/log/httpd/aus4.mozilla.org/aus.log
+logfile=/var/log/httpd/ausadmin.log
But I still need to finish the log rotate and will look at this first thing tomorrow
Updated•13 years ago
|
Whiteboard: [balrog] [review 7/19] → [balrog] [finish 7/18]
Reporter | ||
Comment 9•13 years ago
|
||
Any update here? The admin application is still returning ISE 500 errors.
Severity: normal → major
Updated•13 years ago
|
Assignee: bburton → jcrowe
Updated•13 years ago
|
Assignee: jcrowe → bburton
Updated•13 years ago
|
Assignee: bburton → server-ops-webops
Severity: major → normal
Priority: -- → P4
Whiteboard: [balrog] [finish 7/18] → [balrog][triaged 20120831][waiting][webops]
Assignee | ||
Updated•13 years ago
|
Priority: P4 → P3
Whiteboard: [balrog][triaged 20120831][waiting][webops] → [balrog][triaged 20120831][service interrupt]
Reporter | ||
Comment 10•12 years ago
|
||
This bug has caused the aus4-dev environment to be broken for almost two months now, is there an ETA on having it fixed?
Assignee | ||
Comment 11•12 years ago
|
||
I think see the problem here... it's not the same one as this bug was initially opened for.
As a security measure against log file injection, the /var/log/httpd directory is no longer traversable by system accounts (like "nobody" or "apache"). This doesn't bother Apache, which opens its logfiles *before* dropping root privileges. It does affect this app though, because it can only write with the permissions of the workers running it... which can't reach /var/log/httpd.
I'm committing a change to the config to move this log file to /var/log/ausadmin.log. Once that goes out I think this should start to work again.
Assignee | ||
Comment 12•12 years ago
|
||
I believe this is resolved now. Please let us know if you're still getting errors, and how we can see replicate them.
Assignee: server-ops-webops → nmaul
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 13•12 years ago
|
||
Yup, we're past that error now. The admin application appears to be working 100% fine. The read-only application is hitting a different error now, which I filed in bug 791663.
Thanks for getting this resolved!
Updated•11 years ago
|
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•6 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•