Closed Bug 831350 Opened 13 years ago Closed 13 years ago

Log processing for zlb AMO, DMO, AUS broke at 2013-01-15 14:00 PST

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

x86
macOS
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dre, Unassigned)

Details

The logs stopped being processed properly during that hour. I haven't managed to determine the details of the error yet. cshields mentioned that there were some config changes on Zeus that happened at that time. I checked metrics-logger1, and the logs appear fine so far. They are there, everyone has read privs, etc. I noticed that the size of the files dropped sharply (10MB) right at hour 14. I've been grepping through them trying to see if there was a format change, but I haven't seen it yet.
Okay, I found the problem. A new filename under /stats/logs is causing the Java library we use for file access to throw an error. Caused by: org.apache.commons.vfs.FileSystemException: Invalid descendent file name "proxy-china-static:80.access_2013-01-15-22.gz". The colon in the name is what the library rejects. As a stop-gap, could we please get that file renamed so it does not include a colon?
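For anyone hitting something similar, a pre-flight check along these lines might catch such names before the ETL run does. This is only a minimal sketch, not what the ETL actually uses: the function names are hypothetical, and replacing the colon with an underscore is an assumed convention, not the rename that was applied on the Zeus nodes.

```python
import os


def names_with_colon(names):
    """Return the file names that contain a colon, preserving input order."""
    return [name for name in names if ":" in name]


def suggest_name(name):
    """Suggest a colon-free name (assumed convention: ':' becomes '_')."""
    return name.replace(":", "_")


def scan_directory(log_dir):
    """List entries directly under log_dir and return those containing a colon."""
    return names_with_colon(os.listdir(log_dir))


if __name__ == "__main__":
    # The file name reported in the exception message above.
    bad = "proxy-china-static:80.access_2013-01-15-22.gz"
    for name in names_with_colon([bad]):
        print(name, "->", suggest_name(name))
```

Running scan_directory("/stats/logs") on the logger host would list the offending entries without renaming anything, so the actual fix can still be made at the source.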
:jakem fixed the filename on the Zeus nodes and then deleted the problematic files from metrics-logger1 (no one was looking at those files anyway, so losing a few more hours of them is NBD). Our ETL is now chugging along merrily, picking up all the files it couldn't see before.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard