Closed Bug 1002626 Opened 10 years ago Closed 9 years ago

clean up Zeus logging

Categories

(Infrastructure & Operations :: IT-Managed Tools, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nmaul, Assigned: ericz)

References

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/161] )

1) Only send BI/DW the logs they need (right now we send them everything). :aphadke, I know I've asked for this before, but I can't find it... could you comment here with a list of the files or sites that you care about?
2) gzip all log files, not just ones with "access" in the name.
3) Only gzip files that are not in use ("fuser -s $filename" works). This helps with files that are rotated less often than hourly, which is often desirable on smaller sites.
4) Archive log files for longer than an hour? Would be helpful for troubleshooting. Bunker was supposed to do this, but no idea when we can start relying on it. Even just an NFS mount that we can store files on for a few days before deletion would help.
5) Fix any logs going to /usr/local/zeus/log, especially if they're unrotated.
6) Consider syslog for log data?
7) Improve the overall state so that logging is easier to "just do". Simpler pathing (symlink the default directory to where we want it?), uniform naming, a uniform logfile format, and clarity about which logs go to BI/DW and should not be changed, and which do not and can be.
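Points 2 and 3 above could be sketched roughly as follows. This is only an illustration, not the script that runs on the ZLBs: the function names and the injectable `in_use` check are assumptions made for the example. The only detail taken from the list is the use of `fuser -s`, which exits 0 when some process has the file open.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def is_in_use(path):
    """Return True if some process has the file open, according to `fuser -s`."""
    # `fuser -s` is silent and exits 0 when at least one process accesses the file.
    return subprocess.run(["fuser", "-s", str(path)]).returncode == 0


def gzip_idle_logs(log_dir, in_use=is_in_use):
    """Gzip every regular file in log_dir that is not already .gz and not open.

    Hypothetical helper: covers all log files, not only ones named "access".
    Returns the names of the .gz files that were created.
    """
    compressed = []
    # sorted() takes a snapshot, so the .gz files created below are not revisited.
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file() or path.suffix == ".gz":
            continue
        if in_use(path):
            continue  # still being written to; leave it for the next run
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()
        compressed.append(target.name)
    return compressed
```

Passing the in-use check as a parameter keeps the sketch testable on machines without `fuser`; in real use the default would shell out exactly as point 3 suggests.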
Flags: needinfo?(aphadke)
jakem:
1. Logs we collect:
my @wwwLocations = ("fhr.data.mozilla.com", "affiliates.mozilla.org", "releases.mozilla.com", "pfs.mozilla.org", "data.mozilla.com", "marketplace.mozilla.org", "addons.mozilla.org", "services.addons.mozilla.org", "static.addons.mozilla.org", "www.mozilla.com", "support.mozilla.com", "versioncheck.addons.mozilla.org", "download.mozilla.org", "snippets-stats.mozilla.org", "www.mozilla.org", "input.mozilla.org", "videos-origin.mozilla.org", "videos-cdn.mozilla.net", "ftp.mozilla.org", "download-stats.mozilla.org", "bugzilla.mozilla.org", "aus2.mozilla.org", "aus3.mozilla.org", "aus4.mozilla.org", "marketplace.firefox.com", "snippets.mozilla.com");
2. k wrt gzipping all log files, can you pastebin a few files that are not prefixed with "access"? Would the rsync script catch the non-access files?
3. k, works.
4. Sure, as long as the EOD rsync script pushes data to NFS, any log rotation interval is fine.
5. k, as long as they go to NFS.
6. Prefer to use the current rsync method, it works really well; we can move to syslog if necessary.
7. Yes, totally. Happy to sync via Vidyo if that can speed things up.
Flags: needinfo?(aphadke)
If you are scheduling a meeting can you include me? I'm behind the game on the ZLBs, what gets logged where and what we want to include in MozDef, remove from arcsight, etc.
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/125]
I have committed a simple "logrotate" job that works on logs in the default directory (as well as the stock zeus error and audit logs). It will catch the case where someone turns on logging but changes nothing else... which means it's a nice first step towards simplifying this setup. This actually solves a few points from my list in comment 0: It helps with #7, moots #5, moots #3, gives a clear path to #4. From here, I think we should identify logs that don't need to go to BI/DW, and reset them to the default location/name. We can symlink this log location to /var/log/zeus/standard or something, and then put the BI/DW logs into /var/log/zeus/metrics ... and have only those ones rsync'd to BI/DW. This means we can be more particular about how logs are delivered to BI/DW... meaning we can more easily standardize on filenames, gzipping, etc, without regard to the bulk of vhosts. It means we can focus on the important ones, and let the global logrotate catch everything else.
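The standard/metrics split proposed above might look something like this in code. It is a sketch only: the two directory names come from the comment (which itself says "or something"), while the `<vhost>.log` filename and both function names are assumptions made for illustration, not the real ZLB configuration.

```python
from pathlib import PurePosixPath

# Directory names as floated in the comment above; not a deployed layout.
STANDARD_DIR = PurePosixPath("/var/log/zeus/standard")
METRICS_DIR = PurePosixPath("/var/log/zeus/metrics")


def log_path(vhost, metrics_vhosts):
    """Pick the log location for a vhost (hypothetical <vhost>.log naming)."""
    base = METRICS_DIR if vhost in metrics_vhosts else STANDARD_DIR
    return base / f"{vhost}.log"


def rsync_sources(vhosts, metrics_vhosts):
    """Return only the paths that would be rsync'd to BI/DW."""
    return [str(log_path(v, metrics_vhosts)) for v in vhosts if v in metrics_vhosts]
```

With this shape, everything outside the metrics set falls through to the directory handled by the global logrotate job, and only the metrics directory needs stricter naming and gzip rules.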
I've completed moving almost everything relevant on the PHX1 external ZLB cluster to the new rotation scheme, so those logs will no longer be delivered to BI/DW. I paid special attention not to affect any logs mentioned in comment 1, so this should have no effect on any metrics processing. I do have an open question on one logfile, however: comment 1 mentions "www.mozilla.com". This is puzzling to me because this no longer has a site on it; it's purely a redirect to a page on www.mozilla.org. Its logs are also intermingled in the general logfile for the static cluster (which is clumsily named 'www.mozilla.com.access' and not 'static' or something). The vast majority of the logs in this file are for fxfeeds.mozilla.com... not www.mozilla.com. Out of 3.8M lines, a bit under 100k of them are for www.mozilla.com... all of which are 301 redirects. Can we do something with this logfile? Like... say... stop sending it to you so we can move and rename it? :)
Flags: needinfo?(aphadke)
jake - your wish is granted :-) Given we no longer serve www.mozilla.com, we can safely move and rename it. :-) -anurag
Flags: needinfo?(aphadke)
Thank you! That is done, and takes care of everything we send logs to you for in PHX1. Next up, SCL3 and HCI.
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/125] → [kanban:https://webops.kanbanize.com/ctrl_board/2/161]
Updated lists:

Logs that are used:
addons.mozilla.org
aus2.mozilla.org
aus3.mozilla.org
aus4.mozilla.org
blocklist.addons.mozilla.org
download.mozilla.org
download-stats.mozilla.org
fhr.data.mozilla.com
input.mozilla.org
marketplace.firefox.com
marketplace.mozilla.org
releases.mozilla.com
services.addons.mozilla.org
snippets-stats.mozilla.org
static.addons.mozilla.org
www.mozilla.org
versioncheck.addons.mozilla.org

Logs that have no uses and can stop being copied to metrics-logger1 (they have not been processed from the files into hive since 4/27 and nobody has complained; we also specifically asked folks who have access whether they used them, and everyone said no):
affiliates.mozilla.org
bugzilla.mozilla.org
data.mozilla.com (see https://etherpad.mozilla.org/weblogs, was used for telemetry)
ftp.mozilla.org
pfs.mozilla.org
snippets.mozilla.com
support.mozilla.com
videos-cdn.mozilla.net
videos-origin.mozilla.org
www.mozilla.com

See also bug 1161713 for empty logs to be removed...
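As a cross-check, the two lists above can be compared against the list from comment 1 with a simple set difference. The hostnames below are copied from this bug; the variable names are made up for the example.

```python
# Hosts listed in comment 1 as collected by the rsync script.
collected = {
    "fhr.data.mozilla.com", "affiliates.mozilla.org", "releases.mozilla.com",
    "pfs.mozilla.org", "data.mozilla.com", "marketplace.mozilla.org",
    "addons.mozilla.org", "services.addons.mozilla.org",
    "static.addons.mozilla.org", "www.mozilla.com", "support.mozilla.com",
    "versioncheck.addons.mozilla.org", "download.mozilla.org",
    "snippets-stats.mozilla.org", "www.mozilla.org", "input.mozilla.org",
    "videos-origin.mozilla.org", "videos-cdn.mozilla.net", "ftp.mozilla.org",
    "download-stats.mozilla.org", "bugzilla.mozilla.org", "aus2.mozilla.org",
    "aus3.mozilla.org", "aus4.mozilla.org", "marketplace.firefox.com",
    "snippets.mozilla.com",
}

# Hosts from the updated "logs that are used" list.
used = {
    "addons.mozilla.org", "aus2.mozilla.org", "aus3.mozilla.org",
    "aus4.mozilla.org", "blocklist.addons.mozilla.org", "download.mozilla.org",
    "download-stats.mozilla.org", "fhr.data.mozilla.com", "input.mozilla.org",
    "marketplace.firefox.com", "marketplace.mozilla.org",
    "releases.mozilla.com", "services.addons.mozilla.org",
    "snippets-stats.mozilla.org", "static.addons.mozilla.org",
    "www.mozilla.org", "versioncheck.addons.mozilla.org",
}

# Collected but no longer used: candidates to stop copying.
unused = sorted(collected - used)
# Used but absent from comment 1: worth confirming they are actually collected.
not_in_comment_1 = sorted(used - collected)

print("\n".join(unused))
print("not in comment 1:", not_in_comment_1)
```

The difference reproduces the ten "no uses" hosts listed above, and it also shows that blocklist.addons.mozilla.org is in the used list but was not in the comment 1 array.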
Just to document, our change here has been to move these files out of /var/log/zeus and use the default zeus filename/path, which is filled with variables but ends up being /usr/local/zeus/log/<vservername>.log. From there, a traditional "logrotate" job rotates them daily.

Completed:
support.mozilla.com (really .org, but the filename is .com) - moved
snippets.mozilla.com - moved
data.mozilla.com - moved

I think these are already done, can you verify?:
affiliates.mozilla.org - this is on generic, which should already not be sending data to metrics-logger1.
www.mozilla.com - this is on static, which should already not be sending data to metrics-logger1.
videos-cdn.mozilla.net - this is on static, which should already not be sending data to metrics-logger1.
videos-origin.mozilla.org - this is already set not to send to metrics-logger1.

Concerns:
bugzilla.mozilla.org - this is one of the ones that (IIRC) people cared about being on metrics-logger1. I think they were accessing them directly on that node via SSH. :fubar might know more.
ftp.mozilla.org - another one I fear people care about in the same way. :fubar again might be a good contact, or :mcote.
pfs.mozilla.org - I think this is on one of Cloud Services' PHX1 ZLBs... I don't see it on ours.
Flags: needinfo?(scabral)
@klibby: Do you know anything about the need for log data on bugzilla or ftp.m.o? ISTR there was a need at one point for these logs to go to metrics-logger1 so people could see them, but they're not going into hadoop these days, so... ? bugzilla.mozilla.org - this is one of the ones that (IIRC) people cared about being on metrics-logger1. I think they were accessing them directly on that node via SSH. :fubar might know more. ftp.mozilla.org - another one I fear people care about in the same way. :fubar again might be a good contact, or :mcote.
Flags: needinfo?(klibby)
BMO is (also) logging via syslog to syslog1.private, which is where the devs get access to it. I don't think we need it on metrics-logger1 (they never went into hadoop afaik, it was just the "normal" place to find something approaching centralized logs). I know nil about ftp.m.o. 302 :hwine or :nthomas, maybe?
Flags: needinfo?(nthomas)
Flags: needinfo?(klibby)
Flags: needinfo?(hwine)
/me defers to :nthomas
Flags: needinfo?(hwine)
I'm not aware of any dashboards that use the logs from ftp.m.o, so I don't mind from that point of view. But we may want to ask questions about ftp usage in the bye-netapp-hi-s3 project, so what is our log retention now, and what would it be if we stop sending logs?
Flags: needinfo?(nthomas)
Verified that these don't send to metrics-logger1:
affiliates.mozilla.org
www.mozilla.com
videos-cdn.mozilla.net
videos-origin.mozilla.org

Verified that these stopped sending logs to metrics-logger1 2 days ago (2015-05-06):
support.mozilla.com
snippets.mozilla.com
data.mozilla.com
Flags: needinfo?(scabral)
Jake, Since nobody's screamed yet about ftp.mozilla.org logs, can we stop copying them from Zeus to metrics-logger1 and see if anyone notices? (What's the retention on Zeus, just in case?)
Flags: needinfo?(nmaul)
FWIW the logs are from: ftp.mozilla.org/pub (/mnt/ftp_stage/archive.mozilla.org)
Hey Sheeri, So the retention on Zeus is going to be only for an hour. We (webops) are using these logs to collect data on ftp usage etc (for internal webops use) and using metrics-logger1 for storage. We will probably continue to do this until ftp moves away from SCL3 to S3 (which is slated for this quarter). Until then, we'll keep the copying going. Is that acceptable? Thanks!
Flags: needinfo?(nmaul)
QA Contact: nmaul → smani
Well, the point is, we believe nobody uses the logs. So we'd like to remove access to the logs first, before making them go away entirely. I'd rather not wait until we move the service, because I'd rather change as few things as possible in each iteration. If you point me in the proper direction, I can look at the script and comment out the line(s) of the configuration that copy the FTP logs. We'd also like to stop copying the pvtbuilds.mozilla.org logs. Nancy has pinged everyone who can log in to metrics-logger and asked if they use those logs, and nobody has stepped up.
Flags: needinfo?(smani)
Just to recap: if you're using FTP then keep it copying, but we'd like pvtbuilds.mozilla.org to stop being copied to metrics-logger.
Eric, Can you help here please? Thanks!
Assignee: nmaul → eziegenhorn
Flags: needinfo?(smani)
pvtbuilds.mozilla.org HTTPS logs (the only pvtbuilds logs going to /var/log/zeus) have been switched to the Zeus default location, where jakem's setup should rotate them and, as per comment 20, they will NOT be rsynced to metrics-logger1.
Verified pvtbuilds logs have stopped going to metrics-logger1 on 8/28. FTP logs we have decided not to change until PHX1 is evacuated.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED