Closed Bug 530318 Opened 11 years ago Closed 10 years ago

Put full logs on ftp.m.o, along with the build

Categories

(Release Engineering :: General, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: catlee)

References

Details

(Whiteboard: [ftp][buildbot][logs][oldbug])

Attachments

(3 files)

When a build job finishes, we upload the build to ftp.m.o, but not the log files of the build. This bug is not changing the builds, but proposing to change what we do with those log files.

Currently we leave log files on the build slave, where it is not accessible to everyone, and will be overwritten/deleted as needed by later jobs. We also email a copy of the log file to Tinderbox server, which creates a clickable link to the log file in an encoded location.

I propose that we post the log files in a directory under the generated build on ftp.m.o. For example:
ftp://ftp.mozilla.org/pub/firefox/tinderbox-builds/mozilla-1.9.1-win32/1258736589/logs
...
ftp://ftp.mozilla.org/pub/firefox/nightly/2009-10-21-03-mozilla-central/logs
...
ftp://ftp.mozilla.org/pub/firefox/releases/3.5.5/logs


Benefits include:
* it would allow others to easily see the full build logs
* it would simplify work of creating newer dashboard, as we could access build logs from easy-to-determine location on ftp.m.o, without needing to maintain Tinderbox Server and figure out where Tinderbox Server put the log files.

Also, we should including some timestamp/changeset info in the logfile name to avoid collisions on re-runs. 

Finally, it feels like this would be useful to do for unittest and talos jobs also, but I would like to start with builds first and see what other people think.
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
For some definition of "simple". More accurately, this seems to be self-contained hence good for a new person learning their way around.
Whiteboard: [simple]
Assignee: nobody → raliiev
What's our current retention time for builds? I think tinderbox keeps logs for ~60 days, I'd hate to severely lower that. (That being said, it'd be easy enough to copy the logs from the build dir to somewhere with a longer retention time.)
The dep builds in tinderbox-builds/ are kept for 24 hours, nightlies last for years (if not forever). 60 days is my memory for tinderbox.m.o too, so we'll definitely need some sort of smarts to keep disk usage under control, but not expunge too early.
Could this be done as a buildbot status plugin, like the one bhearsum wrote to push to tinderbox?

http://github.com/djmitche/buildbot/blob/master/buildbot/status/tinderbox.py
(In reply to comment #5)
> Could this be done as a buildbot status plugin, like the one bhearsum wrote to
> push to tinderbox?
> 
> http://github.com/djmitche/buildbot/blob/master/buildbot/status/tinderbox.py

I'd prefer not to, for a few reasons:

- The buildbot masters are already severely overloaded.  The less we can do on the masters, the better!  If we could get away with not sending the log to the master at all, that would be great.

- All the ssh keys and scripts for uploading to the correct place are already on the build slaves.
rhelmer, catlee, thanks for your comments.

I've investigated a bit this issue and here are some thoughts.

== Slave side implementation ==

Pros:
* Log files can be easily uploaded by "make upload" step, so I don't need to know destination server/directory.

Cons: 
* I couldn't find any easy way to get log files on slaves. Slaves send their output slave.bot.SlaveBuilder.sendUpdate() to the master not saving a local copy.


== Master side implementation (status plugin) ==

Pros:
* Masters collect logs, so it will be easy archiving them into a tarball.

Cons:
* Destination server and directory should be set or parsed from pust_upload.py output.

Comments are welcome.
I agree with catlee -- this should be done on the slave.
I'm going to guess this will mean sending a buildbot (or twistd?) patch upstream to enable local logging.
So, we could do this without any buildbot support by doing adding the equivalent of

">> output.log 2>&1"

in all of our build steps.
Hm, yeah.
But I think it should be 'command 2>&1 | tee -a output.log' so we don't lose streaming output and timeout during build steps?
(In reply to comment #10)
> Hm, yeah.
> But I think it should be 'command 2>&1 | tee -a output.log' so we don't lose
> streaming output and timeout during build steps?

Looks ugly a bit. :(

Some points.

* Redirecting stderr to stdout sounds not good.

* It won't be easy to get the same log output as we have for Tinderbox: there are some additional parts, such as log headers (environment, pwd) and log footers (return value,  duration). So we need to wrap every command with something, handle args, pass them to OS shell...
Don't all the spawned commands inherit their file descriptors from the parent?  If we redirect 1 and 2 to the file(s) of our choosing, child processes should just work too (which is why tee works at all with shell scripts and the like).
(In reply to comment #12)
> Don't all the spawned commands inherit their file descriptors from the parent? 
> If we redirect 1 and 2 to the file(s) of our choosing, child processes should
> just work too (which is why tee works at all with shell scripts and the like).

Not in buildbot...buildbot creates a fresh set of stdin/stdout/stderr file descriptors for each command.  So we'd either have to wrap every command we run with a logging wrapper, or modify buildbot.

Also, as Rail points out, there is out of band data that buildbot logs that we'd like to capture as well (although I wouldn't block on that).  And, as Aki points out, we need to preserve the output going to stdout as well so that buildbot doesn't think the command is stuck and kills it.
Whiteboard: [simple]
Assignee: rail → nobody
Whiteboard: [ftp]
Assignee: nobody → catlee
Hardware: x86 → All
Whiteboard: [ftp] → [ftp][buildbot][logs]
One closely-related feature request here is bug 413727 -- allowing people to see the logs as they're in-progress (as show in the buildbot UI) rather than waiting for the build to finish before anyone can see any of the log.  It might be worth thinking about that when designing the solution here.

However, some mechanism for publishing the logs other than tinderbox is probably the long pole for getting rid of tinderbox (and having TBPL be on top of other things), although there will almost certainly be minor tweaks to various other things needed as well once people attempt to get TBPL running on top of the new stuff.

So given the importance of having *something*, it may well be worth ignoring my first paragraph. :-)
(In reply to comment #14)
> One closely-related feature request here is bug 413727 -- allowing people to
> see the logs as they're in-progress (as show in the buildbot UI) rather than
> waiting for the build to finish before anyone can see any of the log.  It might
> be worth thinking about that when designing the solution here.

We could send log data through pulse.
Incremental updates should, IMO, be done thru pulse as each step finishes.
(In reply to comment #16)
> Incremental updates should, IMO, be done thru pulse as each step finishes.

I don't think that would cover dbaron's use case. I'm pretty sure he'd like in-progress compile/test logs. I don't think developers generally care about the setup/teardown we do.
Duplicate of this bug: 587013
Several pieces to this:

bin/log_uploader.py is a script that gets called by the status plugin to parse the build pickle and logs and upload to the given server using post_upload.py.  All the logic to construct the post_upload.py command has been refactored into a function in buildbotcustom.misc.  I didn't touch the release builders yet, since they're not running in 0.8.0 at this point.

status/log_handlers.py implement two new status plugins: ThreadedLogHandler which uses a thread pool to do log processing in a new thread after a build completes, and SubprocessLogHandler (a subclass of ThreadedLogHandler) that spawns another command to do the work.  In this case each thread of the thread pool would be waiting for one of the subprocesses to complete.

The rest of the diff is changing post_upload.py command string construction to use the postUploadCmdPrefix.
Attachment #465844 - Flags: feedback?(nrthomas)
Attachment #465844 - Flags: feedback?(bhearsum)
The test masters need to know where to upload logs, and what keys, etc. to use to get there.
Comment on attachment 465844 [details] [diff] [review]
Upload logs after build completes

I had a good look through this and I'm happy with what I see. I'm REALLY happy about the post upload command stuff finally getting factored out somewhere.
Attachment #465844 - Flags: feedback?(bhearsum) → feedback+
Is there going to be enough information in the data in the JSON in http://build.mozilla.org/builds/builds-4hr.js (and the analogous dated files) so that the JSON has enough information to build the log URL for each build in it?
(In reply to comment #22)
> Is there going to be enough information in the data in the JSON in
> http://build.mozilla.org/builds/builds-4hr.js (and the analogous dated files)
> so that the JSON has enough information to build the log URL for each build in
> it?

Not at first, but that's the next step.
Comment on attachment 465844 [details] [diff] [review]
Upload logs after build completes

Good enough for r+ then?

The same patch still applies with a bit of fuzz.
Attachment #465844 - Flags: feedback?(nrthomas) → review?(bhearsum)
Attachment #465846 - Flags: review?(bhearsum)
Comment on attachment 465844 [details] [diff] [review]
Upload logs after build completes

Yup, r+
Attachment #465844 - Flags: review?(bhearsum) → review+
Attachment #465846 - Flags: review?(bhearsum) → review+
Blocks: 590208
Comment on attachment 465844 [details] [diff] [review]
Upload logs after build completes

changeset:   902:2b679cc284d2
Attachment #465844 - Flags: checked-in+
Comment on attachment 465846 [details] [diff] [review]
Configs to go with

changeset:   2901:e37f3c9a15c7
Attachment #465846 - Flags: checked-in+
Depends on: 590939
Whiteboard: [ftp][buildbot][logs] → [ftp][buildbot][logs][oldbug]
Depends on: 591006
Depends on: 590940
Blocks: 511749
Depends on: 591168
Depends on: 591348
Attachment #471120 - Flags: review?(bhearsum)
Attachment #471120 - Flags: review?(bhearsum) → review+
Going to call this done.  Still some edge cases (i.e. mobile).
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.