Closed Bug 418009 Opened 16 years ago Closed 16 years ago

Need to do something about buffered output from Mochitest server

Categories

(Testing :: Mochitest, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(2 files)

Currently the output's buffered and never read.  If I flip the DEBUG variable in the server to true, the server stops working, I think because the buffer of text is never flushed.  As long as we don't dump *too* much (and we don't seem to in normal circumstances, including in debug builds) this doesn't matter, but I'd like this in case I ever need to enable server debugging on tinderboxen.  Also, it seems good to get this on the record in case we ever cross the safely-writable threshold that would cause the server to fail before running all the mochitests.
Oh, this is only with runtests.py, not with runtests.pl -- Perl just dumps the server output to stdout instead of requiring manual processing.
Attached patch PatchSplinter Review
It's sort of sad the machinations (threads!) I have to do here to make this work, but it's not *super-horrible*.

I think this might resolve the problem that has been hitting the PGO unittest box, where Mochitests time out midway through execution for no obvious reason, but I make no promises.
Attachment #313829 - Flags: review?(sayrer)
Comment on attachment 313829 [details] [diff] [review]
Patch

Let's have a race, shall we?  I strongly suspect this is causing the initial foulup making bm-pgo not be able to be put on the main tinderbox, and we're going to hit this going forward as runtests.py is used elsewhere.
Attachment #313829 - Flags: review?(rcampbell)
Comment on attachment 313829 [details] [diff] [review]
Patch

looks fine though it'd be nice if we didn't have to have so many conditions for our problem child win32.
Attachment #313829 - Flags: review?(rcampbell) → review+
Doesn't MozillaBuild ship with bsmedberg's process stuff?

<http://svn.smedbergs.us/python-processes/trunk/>
No it doesn't, but it ships with python 2.5 so dropping killableprocess in somewhere would be easy.
Landed on trunk.  So far the Mac and Linux boxes are okay, but the Windows ones aren't; I had dolske look at logs, and I checked in the obvious fix for the problem (p.stdout only exists if you passed stdout=subprocess.PIPE to subprocess.Popen, didn't happen before because we were using a different method to get the output of the process in the logs :-\ ).  Unfortunately, the buildbots don't seem to have recovered from this mistake, so I can now claim the dubious distinction of having broken lots of Windows tinderboxen that use automation.py.in.  :-\  (On the plus side, I didn't turn the tree red!)  That's bug 427723 if you're interested.

Keeping this open until I see that the Windows boxes are okay with the followup checkin...Python's awesome and all, but its cross-platform story in its libraries absolutely is not.
See the attachment for the errors that happened; from there it was pretty clear p.stdout wasn't iterable, and returning to docs pointed out that the value is only not None if stdout is specified to the Popen constructor.  The stderr thing is something I noticed while referring to the docs at the same time, ensuring that both are directed to the same file, which is on par with non-Windows (popen2.Popen4 does that as its claim to fame).


Index: build/pgo/automation.py.in
===================================================================
RCS file: /cvsroot/mozilla/build/pgo/automation.py.in,v
retrieving revision 1.6
retrieving revision 1.7
diff -p -u -4 -r1.6 -r1.7
--- build/pgo/automation.py.in  8 Apr 2008 05:18:45 -0000       1.6
+++ build/pgo/automation.py.in  8 Apr 2008 08:44:26 -0000       1.7
@@ -159,9 +159,11 @@ class Process:
     if IS_WIN32:
       import subprocess
       cmd = [command]
       cmd.extend(args)
-      p = subprocess.Popen(cmd, env = env)
+      p = subprocess.Popen(cmd, env = env,
+                           stdout = subprocess.PIPE,
+                           stderr = subprocess.STDOUT)
       self._out = p.stdout
     else:
       import popen2
       cmd = []
Attachment #313829 - Flags: review?(sayrer)
fx-win32-tbox was spewing an exception backtrace after the bug 418009 changes (but only after profileserver.py did its stuff).  I examined the logging code a bit since the backtrace implicated it, and it seems this is what happens when an
exception is thrown during the process of logging something.  Since it was happening after everything ran correctly, I suspected a problem within runApp, since that's the last thing that runs in profileserver.py.  The last line there logs the return of runApp, which is the return of Process.wait().  On double-checking the semantics of the two different poll() functions in Process._run, I noticed that one returns None if the process is running, but the other returns -1, so I think I just messed this up in the Windows case, and the secondary loop after the process exits is catching the problems now.  If we're a touch racy, the loop exits when output stops but before the process fully dies, and then calling poll() might return None and not a number, resulting in the error when we try to print "ERROR FAIL exited with code %d" % None.  I think this might potentially be badness if the app crashes, with the right timing, and the ERROR FAIL wouldn't get printed, but I'm not 100% sure.  Anyway, this is the patch I committed to fix the spewage and potential ERROR FAIL-swallowing.


Index: build/pgo/automation.py.in
===================================================================
RCS file: /cvsroot/mozilla/build/pgo/automation.py.in,v
retrieving revision 1.7
retrieving revision 1.8
diff -p -u -4 -r1.7 -r1.8
--- build/pgo/automation.py.in  8 Apr 2008 08:44:26 -0000       1.7
+++ build/pgo/automation.py.in  8 Apr 2008 12:48:46 -0000       1.8
@@ -184,11 +184,17 @@ class Process:
   def _run(self):
     "Continues execution of this process on a separate thread."
     p = self._process
     out = self._out
+
+    if IS_WIN32:
+      running = lambda: p.poll() is None
+    else:
+      running = lambda: p.poll() == -1
+
     # read in lines until the process finishes, then read in any last remaining
     # buffered lines
-    while p.poll() == -1:
+    while running():
       line = out.readline().rstrip()
       if len(line) > 0:
         log.info(line)
     for line in out:

Still need the MozillaTest boxes to be kicked/cycle to verify correctness...
They're cycling correctly now, marking FIXED.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Component: Testing → Mochitest
Product: Core → Testing
QA Contact: testing → mochitest
Version: Trunk → unspecified
Depends on: 484231
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: