754067 - automatically rotate aus4-dev/aus4-admin-dev application logs

Reporter

Description

•

13 years ago

I've been getting nothing but ISE 500 errors from aus4-admin-dev this week. The http logs showed that admin.wsgi was returning prematurely, but the application log was absolutely empty. I ended up running through the admin.wsgi code by hand in the python interpreter, and this happened: [bhearsum@node267.seamicro.phx1 aus4-admin]$ python Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:02) [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from auslib.web.base import db, app as application Traceback (most recent call last): File "<stdin>", line 1, in <module> File "auslib/web/base.py", line 1, in <module> from flask import Flask, request File "/usr/lib/python2.6/site-packages/flask/__init__.py", line 17, in <module> from werkzeug import abort, redirect ImportError: No module named werkzeug >>> >>> import site >>> site.addsitedir('vendor/lib/python') >>> from auslib.web.base import db, app as application Bus error (core dumped) Interestingly, I can cat the file fine: [bhearsum@node267.seamicro.phx1 aus4-admin]$ cat auslib/web/base.py from flask import Flask, request from auslib import version from auslib.db import AUSDatabase app = Flask(__name__) db = AUSDatabase() @app.errorhandler(500) def isa(error): log.error("Caught ISE 500 error.") log.debug("Balrog version is: %s", version) log.debug("Request path is: %s", request.path) log.debug("Request environment is: %s", request.environ) log.debug("Request headers are: %s", request.headers) return error # All of our View modules contain routing information that needs to be imported # to be active. from auslib.web.views import * This seems like it might be an infra issue, but I want to look into a little bit more before throwing it over the fence.

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 1

•

13 years ago

I dug into this some more and found that the root module that's causing the bus error is ctypes. It doesn't look like that module was updated anytime recently, so I'm not sure what to do from here. IT, any ideas? You can reproduce with the following in the python shell: 'import ctypes' If you add print statements to /usr/lib/python2.6/ctypes/__init__.py you can bisect a bit, but it's the final call to CFUNCTYPE (the last line of the file) that's causing the error. This issue is completely breaking aus4-dev currently, because no new data can be submitted.

Assignee: bhearsum → server-ops

Severity: normal → critical

Component: Release Engineering: Automation (General) → Server Operations: Web Operations

QA Contact: catlee → cshields

Jake Maul [:jakem]

Assignee

Updated

•

13 years ago

Assignee: server-ops → nmaul

Jake Maul [:jakem]

Assignee

Comment 2

•

13 years ago

[root@node267.seamicro.phx1 ~]# python Python 2.6.6 (r266:84292, Sep 12 2011, 14:03:02) [GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import ctypes >>> quit() The problem may be that these systems are out of disk space. There is giant log file at /var/httpd/log/aus.log, on both systems. This isn't a standard Apache log file... I suspect maybe it's something the app itself creates. It looks like debugging info... starts with some tracebacks, then a lot of JSON data. Most lines start with 'DEBUG:auslib.web.views.releases:' Any idea where this comes from, and what we can do with it?

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 3

•

13 years ago

Yeah, that's the application log. We can definitely shrink it some, but is there any way we can set-up rotation of it through syslogd or something? How is application log rotation/aging normally done?

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 4

•

13 years ago

I just deleted aus.log, because we don't care too much about the history at this point, to get things unwedged.

Severity: critical → major

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 5

•

13 years ago

According to Jake on IRC, we should be able to use logrotate to rotate/age these logs. We'll need to do this for both apps: one logs to /var/log/aus.log, the other to /var/log/httpd/aus4.mozilla.org/aus.log. I filed bug 754383 on greatly reducing the amount of JSON blob we print to these logs, which should help avoid this situation in the future.

Summary: aus4-admin-dev dumping its core when trying to run admin.wsgi → automatically rotate aus4-dev/aus4-admin-dev application logs

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 6

•

13 years ago

Also, had to create the log files by hand, and chown them to apache to give the app write permission again. 14:34 < solarce> bhearsum|buildduty: jakem we should probably use puppet to make that file and set permissions, but i'll defer to jakem

Jake Maul [:jakem]

Assignee

Updated

•

13 years ago

Severity: major → normal

Brandon Burton [:solarce]

Updated

•

13 years ago

Assignee: nmaul → bburton

Brandon Burton [:solarce]

Updated

•

13 years ago

Whiteboard: [balrog] → [balrog] [review 7/19]

Brandon Burton [:solarce]

Comment 8

•

13 years ago

Got caught up in some other things, so I fixed the issue that was reported in bug 774659 with [logging] ;Where to put the application log. No rotation is done on this file. -logfile=/var/log/httpd/aus4.mozilla.org/aus.log +logfile=/var/log/httpd/ausadmin.log But I still need to finish the log rotate and will look at this first thing tomorrow

Brandon Burton [:solarce]

Updated

•

13 years ago

Whiteboard: [balrog] [review 7/19] → [balrog] [finish 7/18]

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 9

•

13 years ago

Any update here? The admin application is still returning ISE 500 errors.

Severity: normal → major

Phong Tran [:phong]

Updated

•

13 years ago

Assignee: bburton → jcrowe

Jason Crowe [:jd]

Updated

•

13 years ago

Assignee: jcrowe → bburton

Brandon Burton [:solarce]

Updated

•

13 years ago

Assignee: bburton → server-ops-webops

Severity: major → normal

Priority: -- → P4

Whiteboard: [balrog] [finish 7/18] → [balrog][triaged 20120831][waiting][webops]

Jake Maul [:jakem]

Assignee

Updated

•

13 years ago

Priority: P4 → P3

Whiteboard: [balrog][triaged 20120831][waiting][webops] → [balrog][triaged 20120831][service interrupt]

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 10

•

12 years ago

This bug has caused the aus4-dev environment to be broken for almost two months now, is there an ETA on having it fixed?

Jake Maul [:jakem]

Assignee

Comment 11

•

12 years ago

I think see the problem here... it's not the same one as this bug was initially opened for. As a security measure against log file injection, the /var/log/httpd directory is no longer traversable by system accounts (like "nobody" or "apache"). This doesn't bother Apache, which opens its logfiles *before* dropping root privileges. It does affect this app though, because it can only write with the permissions of the workers running it... which can't reach /var/log/httpd. I'm committing a change to the config to move this log file to /var/log/ausadmin.log. Once that goes out I think this should start to work again.

Jake Maul [:jakem]

Assignee

Comment 12

•

12 years ago

I believe this is resolved now. Please let us know if you're still getting errors, and how we can see replicate them.

Assignee: server-ops-webops → nmaul

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 13

•

12 years ago

Yup, we're past that error now. The admin application appears to be working 100% fine. The read-only application is hitting a different error now, which I filed in bug 791663. Thanks for getting this resolved!

Nobody; OK to take it and work on it

Updated

•

11 years ago

Component: Server Operations: Web Operations → WebOps: Other

Product: mozilla.org → Infrastructure & Operations

BMO Automation

Updated

•

6 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

Bugzilla

automatically rotate aus4-dev/aus4-admin-dev application logs

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task, P3)

Tracking

(Not tracked)

People

(Reporter: bhearsum, Assigned: nmaul)

References

Details

(Whiteboard: [balrog][triaged 20120831][service interrupt])

Crash Data

Security

(public)

User Story

Description

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Updated

Updated

Comment 8

Updated

Comment 9

Updated

Updated

Updated

Updated

Comment 10

Comment 11

Comment 12

Comment 13

Updated

Updated