Closed Bug 418009 Opened 17 years ago Closed 17 years ago

Need to do something about buffered output from Mochitest server

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(2 files)

Patch 17 years ago Jeff Walden [:Waldo] 12.49 KB, patch	rcampbell : review+	Details \| Diff \| Splinter Review
Errors that pointed at the guilty code 17 years ago Jeff Walden [:Waldo] 6.08 KB, text/plain		Details

Jeff Walden [:Waldo]

Assignee

Description

•

17 years ago

Currently the output's buffered and never read. If I flip the DEBUG variable in the server to true, the server stops working, I think because the buffer of text is never flushed. As long as we don't dump *too* much (and we don't seem to in normal circumstances, including in debug builds) this doesn't matter, but I'd like this in case I ever need to enable server debugging on tinderboxen. Also, it seems good to get this on the record in case we ever cross the safely-writable threshold that would cause the server to fail before running all the mochitests.

Jeff Walden [:Waldo]

Assignee

Comment 1

•

17 years ago

Oh, this is only with runtests.py, not with runtests.pl -- Perl just dumps the server output to stdout instead of requiring manual processing.

Jeff Walden [:Waldo]

Assignee

Comment 2

•

17 years ago

Attached patch Patch — Details — Splinter Review

It's sort of sad the machinations (threads!) I have to do here to make this work, but it's not *super-horrible*. I think this might resolve the problem that has been hitting the PGO unittest box, where Mochitests time out midway through execution for no obvious reason, but I make no promises.

Attachment #313829 - Flags: review?(sayrer)

Jeff Walden [:Waldo]

Assignee

Comment 4

•

17 years ago

Comment on attachment 313829 [details] [diff] [review] Patch Let's have a race, shall we? I strongly suspect this is causing the initial foulup making bm-pgo not be able to be put on the main tinderbox, and we're going to hit this going forward as runtests.py is used elsewhere.

Attachment #313829 - Flags: review?(rcampbell)

Rob Campbell [:rc] (:robcee)

Comment 5

•

17 years ago

Comment on attachment 313829 [details] [diff] [review] Patch looks fine though it'd be nice if we didn't have to have so many conditions for our problem child win32.

Attachment #313829 - Flags: review?(rcampbell) → review+

Robert Sayre

Comment 6

•

17 years ago

Doesn't MozillaBuild ship with bsmedberg's process stuff? <http://svn.smedbergs.us/python-processes/trunk/>

Benjamin Smedberg

Comment 7

•

17 years ago

No it doesn't, but it ships with python 2.5 so dropping killableprocess in somewhere would be easy.

Jeff Walden [:Waldo]

Assignee

Comment 8

•

17 years ago

Landed on trunk. So far the Mac and Linux boxes are okay, but the Windows ones aren't; I had dolske look at logs, and I checked in the obvious fix for the problem (p.stdout only exists if you passed stdout=subprocess.PIPE to subprocess.Popen, didn't happen before because we were using a different method to get the output of the process in the logs :-\ ). Unfortunately, the buildbots don't seem to have recovered from this mistake, so I can now claim the dubious distinction of having broken lots of Windows tinderboxen that use automation.py.in. :-\ (On the plus side, I didn't turn the tree red!) That's bug 427723 if you're interested. Keeping this open until I see that the Windows boxes are okay with the followup checkin...Python's awesome and all, but its cross-platform story in its libraries absolutely is not.

Jeff Walden [:Waldo]

Assignee

Comment 9

•

17 years ago

Attached file Errors that pointed at the guilty code — Details

See the attachment for the errors that happened; from there it was pretty clear p.stdout wasn't iterable, and returning to docs pointed out that the value is only not None if stdout is specified to the Popen constructor. The stderr thing is something I noticed while referring to the docs at the same time, ensuring that both are directed to the same file, which is on par with non-Windows (popen2.Popen4 does that as its claim to fame). Index: build/pgo/automation.py.in =================================================================== RCS file: /cvsroot/mozilla/build/pgo/automation.py.in,v retrieving revision 1.6 retrieving revision 1.7 diff -p -u -4 -r1.6 -r1.7 --- build/pgo/automation.py.in 8 Apr 2008 05:18:45 -0000 1.6 +++ build/pgo/automation.py.in 8 Apr 2008 08:44:26 -0000 1.7 @@ -159,9 +159,11 @@ class Process: if IS_WIN32: import subprocess cmd = [command] cmd.extend(args) - p = subprocess.Popen(cmd, env = env) + p = subprocess.Popen(cmd, env = env, + stdout = subprocess.PIPE, + stderr = subprocess.STDOUT) self._out = p.stdout else: import popen2 cmd = []

Jeff Walden [:Waldo]

Assignee

Updated

•

17 years ago

Attachment #313829 - Flags: review?(sayrer)

Jeff Walden [:Waldo]

Assignee

Comment 10

•

17 years ago

fx-win32-tbox was spewing an exception backtrace after the bug 418009 changes (but only after profileserver.py did its stuff). I examined the logging code a bit since the backtrace implicated it, and it seems this is what happens when an exception is thrown during the process of logging something. Since it was happening after everything ran correctly, I suspected a problem within runApp, since that's the last thing that runs in profileserver.py. The last line there logs the return of runApp, which is the return of Process.wait(). On double-checking the semantics of the two different poll() functions in Process._run, I noticed that one returns None if the process is running, but the other returns -1, so I think I just messed this up in the Windows case, and the secondary loop after the process exits is catching the problems now. If we're a touch racy, the loop exits when output stops but before the process fully dies, and then calling poll() might return None and not a number, resulting in the error when we try to print "ERROR FAIL exited with code %d" % None. I think this might potentially be badness if the app crashes, with the right timing, and the ERROR FAIL wouldn't get printed, but I'm not 100% sure. Anyway, this is the patch I committed to fix the spewage and potential ERROR FAIL-swallowing. Index: build/pgo/automation.py.in =================================================================== RCS file: /cvsroot/mozilla/build/pgo/automation.py.in,v retrieving revision 1.7 retrieving revision 1.8 diff -p -u -4 -r1.7 -r1.8 --- build/pgo/automation.py.in 8 Apr 2008 08:44:26 -0000 1.7 +++ build/pgo/automation.py.in 8 Apr 2008 12:48:46 -0000 1.8 @@ -184,11 +184,17 @@ class Process: def _run(self): "Continues execution of this process on a separate thread." p = self._process out = self._out + + if IS_WIN32: + running = lambda: p.poll() is None + else: + running = lambda: p.poll() == -1 + # read in lines until the process finishes, then read in any last remaining # buffered lines - while p.poll() == -1: + while running(): line = out.readline().rstrip() if len(line) > 0: log.info(line) for line in out: Still need the MozillaTest boxes to be kicked/cycle to verify correctness...

Jeff Walden [:Waldo]

Assignee

Comment 11

•

17 years ago

They're cycling correctly now, marking FIXED.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

(not currently active) Ted Mielczarek

Updated

•

16 years ago

Component: Testing → Mochitest

Product: Core → Testing

QA Contact: testing → mochitest

Version: Trunk → unspecified

Serge Gautherie (:sgautherie)

Updated

•

16 years ago

Depends on: 484231

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Need to do something about buffered output from Mochitest server

Categories

(Testing :: Mochitest, defect)

Tracking

(Not tracked)

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Comment 10

Comment 11

Updated

Updated

Attachment

General

Description

File Name

Content Type