Closed Bug 791406 Opened 12 years ago Closed 4 years ago

Mozharness scripts can lose output

Categories

(Release Engineering :: Applications: MozharnessCore, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: jgriffin, Unassigned)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2582] [mozharness])

It appears that when mozharness scripts hang and get killed by buildbot, we can lose output in the log, probably due to stdout buffering.
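A minimal local reproduction of the suspected mechanism, assuming the lost lines are sitting in Python's stdout buffer when the process dies. When stdout is a pipe rather than a terminal, Python block-buffers it. A process that is killed never flushes that buffer. In the sketch, `os._exit()` stands in for the kill, and the child script and function name are hypothetical:

```python
import os
import subprocess
import sys

# Hypothetical child script: one line is flushed explicitly, one is left in
# the stdout buffer, then the process dies without running stdio cleanup.
# os._exit() stands in for being killed by an outer (buildbot) timeout.
CHILD = (
    "import os\n"
    "print('flushed line', flush=True)\n"
    "print('buffered line')\n"
    "os._exit(1)\n"
)


def run_child_and_capture():
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # keep Python's default buffering
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,  # a pipe, not a TTY, so stdout is block-buffered
        env=env,
    )
    return result.stdout.decode()


if __name__ == "__main__":
    print(repr(run_child_and_capture()))
```

Only the explicitly flushed line makes it into the captured output. This matches the symptom of logs that end at arbitrary points.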

This appears to be happening with the mozharness Marionette scripts on win32 debug. The logs all end at random points, making it hard to see what's going on. See, e.g., https://tbpl.mozilla.org/php/getParsedLog.php?id=15230913&tree=Try&full=1

However, running the script locally (sans buildbot), you can see that the tests end but Firefox isn't getting shut down correctly by the Marionette test runner.

I wonder if http://hg.mozilla.org/build/mozharness/file/87c0d9079e81/mozharness/base/log.py#l50 should be changed to write all log statements to stderr, instead of some going to stdout, to avoid stdout buffering?
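The stderr idea can be checked with a similar throwaway script. This is a sketch of generic Python behaviour, not of mozharness's log.py. It assumes Python 3.9 or later, where non-interactive stderr is line-buffered, so each complete line is written out immediately. Python 2 of that era left stderr unbuffered, which has the same practical effect. Redirected stdout remains block-buffered in both. The child script and function name are hypothetical:

```python
import os
import subprocess
import sys

# Hypothetical child: one line to stdout, one to stderr, then an abrupt exit
# that skips flushing (standing in for a kill by buildbot).
CHILD = (
    "import os, sys\n"
    "print('to stdout')\n"
    "print('to stderr', file=sys.stderr)\n"
    "os._exit(1)\n"
)


def capture_streams():
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # observe default buffering only
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )
    return result.stdout.decode(), result.stderr.decode()
```

On Python 3.9+, the stderr line survives and the stdout line is lost.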
I've tried various solutions à la http://stackoverflow.com/questions/107705/python-output-buffering, though I've always come out of it feeling like maybe the buffering wasn't a problem to begin with. I think twistd also buffers output until it has accumulated a certain amount.

stderr is certainly a solution I haven't thought of, but I tend to dislike other scripts that write non-errors to stderr, so my preference would be a different solution.  If it's the best of a bad lot, we may need to go with it, though.
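One option that keeps non-errors on stdout is to make sure every log record is flushed as it is written. In CPython's standard library, `logging.StreamHandler` calls `flush()` after each record, so log lines routed through it to stdout survive an abrupt exit. Bare `print` calls do not. This is a generic sketch, assuming the script's log output can go through such a handler. It is not how mozharness's logger is actually configured, and the child script and function name are hypothetical:

```python
import os
import subprocess
import sys

# Hypothetical child: logs via a stdout StreamHandler, then prints without
# flushing, then exits abruptly (standing in for a kill by buildbot).
CHILD = (
    "import logging, os, sys\n"
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO,\n"
    "                    format='%(levelname)s - %(message)s')\n"
    "logging.info('tests finished')\n"
    "print('plain print, not flushed')\n"
    "os._exit(1)\n"
)


def capture_logged_output():
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # keep default stdout buffering
    result = subprocess.run(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, env=env
    )
    return result.stdout.decode()
```

The logged line is present in the captured output and the bare `print` is not. This suggests the losses come from output that bypasses a flushing handler.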
One solution is to make sure that the marionette/mozharness timeout is lower than the buildbot timeout.  If we notice the hung test and try to kill it and start outputting info about that, those buffers should clear.
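A sketch of the "inner timeout" idea. The script enforces its own limit, which is assumed to be set below buildbot's. When that limit is hit, the script can still log and flush before anything external kills it. `run_with_inner_timeout` and its parameters are hypothetical, not mozharness API. `subprocess.run` kills and reaps the child when `timeout` expires, then re-raises `TimeoutExpired`:

```python
import subprocess
import sys


def run_with_inner_timeout(cmd, inner_timeout):
    """Run cmd, killing it ourselves if it exceeds inner_timeout seconds.

    inner_timeout is assumed to be lower than the outer (buildbot) timeout,
    so this process is still alive to report the hang and flush its output.
    Returns the child's exit code, or None if it had to be killed.
    """
    try:
        return subprocess.run(cmd, timeout=inner_timeout).returncode
    except subprocess.TimeoutExpired:
        # The child has already been killed and waited for at this point.
        print("ERROR: %r exceeded %ss and was killed" % (cmd, inner_timeout),
              flush=True)
        return None


if __name__ == "__main__":
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_inner_timeout(hung, inner_timeout=1))
```

Emitting the error message also pushes out anything still buffered around it, which is the effect the comment above relies on.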
Priority: -- → P3
Whiteboard: [mozharness]
Catlee thinks this should solve the buffering:
http://hg.mozilla.org/build/mozharness/rev/3d9eaa953734#l3.12
This is noticeable in logs such as: https://tbpl.mozilla.org/php/getParsedLog.php?id=19055414&tree=Firefox&full=1

Where we're running "make upload" and getting no output from it at all, because buildbot times it out before it completes.
Seems like we can call ScriptFactory with an interpreter of [python, '-u']
http://hg.mozilla.org/build/buildbotcustom/file/cfd8764019cc/process/factory.py#l6493
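ScriptFactory is buildbotcustom code and isn't reproduced here. This sketch only shows what prefixing the Python command with `-u` changes for the child process. It also covers the equivalent `PYTHONUNBUFFERED` environment variable, which may be easier to set from a factory than an interpreter argument. The `python` variable mirrors the `[python, '-u']` list above and is assumed to be the interpreter path. The child script and `capture` helper are hypothetical:

```python
import os
import subprocess
import sys

# Hypothetical child: writes one line, then dies without flushing.
CHILD = "import os\nprint('output before hang')\nos._exit(1)\n"

python = sys.executable  # stands in for the interpreter path ScriptFactory uses


def capture(interpreter, extra_env=None):
    """Run CHILD with an interpreter prefix such as [python, '-u']."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # start from default buffering
    env.update(extra_env or {})
    result = subprocess.run(interpreter + ["-c", CHILD],
                            stdout=subprocess.PIPE, env=env)
    return result.stdout.decode()
```

Both `capture([python, "-u"])` and `capture([python], {"PYTHONUNBUFFERED": "1"})` keep the line, while `capture([python])` loses it.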
Product: mozilla.org → Release Engineering
Component: General Automation → Mozharness
Whiteboard: [mozharness] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2582] [mozharness]

I'm not sure I've heard much about this since we shifted to taskcluster. Closing for now.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE