Closed Bug 597064 Opened 9 years ago Closed 6 years ago

xpcshell harness should handle timeouts internally, fail tests (like mochitest or reftest harnesses)

Categories

(Testing :: XPCShell Harness, defect, major)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
mozilla23

People

(Reporter: jdm, Assigned: jdm)

References

Details

(Keywords: sheriffing-P1, Whiteboard: [mentor=jdm][good first bug][lang=python])

Attachments

(1 file, 13 obsolete files)

19.31 KB, patch
ted
: review+
Details | Diff | Splinter Review
<ted> because the xpcshell harness doesn't know about timeouts
<ted> we should fix that
<ted> it's buildbot forcibly killing the process
Summary: Timeout errors don't show any test output → xpcshell harness should handle timeouts internally, fail tests (like mochitest or reftest harnesses)
Assignee: nobody → josh
Depends on: 597471
This is harder than simply adding a timer to head.js, unfortunately.  IPC tests complicate things in that if a child is in an infinite loop the parent process will wait forever for it to finish.  There's no way around that besides adding a watchdog into the shell, which I'm investigating.  The significantly easier alternative is to go back to my original plan of using a timer from runxpcshelltests.py.
Attachment #476555 - Attachment is obsolete: true
Attachment #477731 - Attachment is obsolete: true
Now with 100% more qref
Attachment #477735 - Flags: review?(ted.mielczarek)
Comment on attachment 477735 [details] [diff] [review]
Add timeout logic to xpcshell test runner.

>diff --git a/testing/xpcshell/runxpcshelltests.py b/testing/xpcshell/runxpcshelltests.py
>--- a/testing/xpcshell/runxpcshelltests.py
>+++ b/testing/xpcshell/runxpcshelltests.py
>+  def testTimeout(self, test, proc):
>+    global timedOut

You probably don't need to use a global here, just use a member variable, self.timedOut. Shouldn't be any synchronization issues since you're only setting it here.

>+    print "TEST-UNEXPECTED-FAIL | %s | Test timed out" % test
>+    timedOut = True
>+    os.killpg(os.getpgid(proc.pid), signal.SIGINT)

killpg won't work on Windows. You'll need to do something else. Also, for consistency with other harnesses, kill it with SIGABRT on Linux. In fact, you might just want to refactor this utility function into automationutils.py and use it instead:
http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#697


>+        if not interactive:
>+          testTimer = Timer(120.0, lambda: self.testTimeout(test, proc))
>+          testTimer.start()

You probably want to check debuggerInfo as well here, and not bother with this when a debugger of any kind is attached.

>+
>         try:
>           print "TEST-INFO | %s | running test ..." % test
> 
>           proc = self.launchProcess(cmdH + cmdT + self.xpcsRunArgs,
>                       stdout=pStdout, stderr=pStderr, env=self.env, cwd=testdir)
> 
>           # Allow user to kill hung subprocess with SIGINT w/o killing this script
>           # - don't move this line above launchProcess, or child will inherit the SIG_IGN
>@@ -482,16 +495,18 @@ class XPCShellTests(object):
>           # |stderr == None| as |pStderr| was either |None| or redirected to |stdout|.
>           stdout, stderr = self.communicate(proc)
>           signal.signal(signal.SIGINT, signal.SIG_DFL)
> 
>           if interactive:
>             # Not sure what else to do here...
>             return True
> 
>+          testTimer.cancel()

If you wind up checking debuggerInfo you'll need to check it again here so you don't touch this uninitialized.

r-, needs some work, but a decent start.
Attachment #477735 - Flags: review?(ted.mielczarek) → review-
Duplicate of this bug: 369207
Severity: normal → major
OS: Linux → All
Hardware: x86 → All
Version: unspecified → Trunk
Duplicate of this bug: 606282
This is a great bug for a new contributor with python knowledge to take on. The initial patch I submitted shows where the relevant changes are necessary; if you fix it up according to Ted's review comments in comment 5, we could get this checked in and make a number of people really happy.
Whiteboard: [mentor=jdm]
https://developer.mozilla.org/en/Writing_xpcshell-based_unit_tests has information on how to run xpcshell tests. Just add an infinite loop to an existing test and run that to ensure your changes are doing the right thing.
Whiteboard: [mentor=jdm] → [mentor=jdm][goodfirstbug]
Whiteboard: [mentor=jdm][goodfirstbug] → [mentor=jdm][good first bug][lang=python]
I am willing to take on this bug.  I will get started tomorrow.
I see that this bug has a dependency, should I try to fix the dependency, or wait until its fixed?
No longer depends on: 597471
No, that dependency was not actually blocking this bug.
Assignee: josh → astropiloto
The reason I used killpg was to deal with subprocesses of xpcshell, such as occur when executing IPC tests (ie. any tests in unit_ipc test directories). I'm inclined to say that for the purposes of this bug getting resolved quickly, we should ignore that problem and save it for a followup.
Ok, will do.  Everything else should be included right?
Yes, all of Ted's earlier comments should be addressed. You can look at using the routines from automation.py to replace the use of killpg in this patch for the time being.
Yes, that was what I was planning to do.
While running the code I got this error:
TEST-INFO | c:\Users\dorothy\firefox-src\mozilla-central\obj-i686-pc-mingw32\_te
sts\xpcshell\netwerk\test\unit\test_resumable_channel.js | running test ...
Exception in thread Thread-131:
Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\threading.py", line 552, in __bootstrap_inne
r
    self.run()
  File "c:\mozilla-build\python\lib\threading.py", line 756, in run
    self.function(*self.args, **self.kwargs)
  File "c:/Users/dorothy/firefox-src/mozilla-central/testing/xpcshell/runxpcshel
ltests.py", line 672, in <lambda>
    testTimer = Timer(120.0, lambda: self.testTimeout(test, proc))
AttributeError: 'XPCShellTests' object has no attribute 'testTimeout'

I am unfamiliar with Timer and lambda, however this is the original unedited code, so what am I doing wrong?
That error suggests to me that you didn't apply this block from my original patch properly:
 
>+  def testTimeout(self, test, proc):
>+    global timedOut
>+    print "TEST-UNEXPECTED-FAIL | %s | Test timed out" % test
>+    timedOut = True
>+    os.killpg(os.getpgid(proc.pid), signal.SIGINT)
Specifically, it should be a method of the XPCShellTests class, at the same level of indentation as buildXpcsRunArgs.
That would do it. :)
(In reply to Josh Matthews [:jdm] from comment #15)
> Yes, all of Ted's earlier comments should be addressed. You can look at
> using the routines from automation.py to replace the use of killpg in this
> patch for the time being.

Should I copy this routine, or import it?  If I am going to import, how?
It looks to me like you can just use |from automation import Automation| and create an instance of Automation where you need it. Check how other test suites call killAndGetStack to make sure you use it correctly.
I tried that, and it gives me "ImportError: No module named automation".
Historically we haven't had the xpcshell harness rely on automation.py directly. Instead, I created automationutils.py to store common functionality between xpcshell and automation.py. If it's not too much work, the right solution here might be to move that functionality from automation.py to automationutils.py.
Would I copy the function over to automationutils.py, or move it?
Move it. automation.py imports automationutils.py, so it should be fairly straightforward to fix the existing code to call it from automationutils instead.
I am having trouble copying it over.  The build gives me an error "utilityPath is not defined.".  I assume this is from an import in automation.py.in, but I looked around on mxr.mozilla.org and I couldn't find where it originates.
>739   def killAndGetStack(self, proc, utilityPath, debuggerInfo):

Given that utilityPath is an argument to the method that you're supposedly moving, something doesn't add up.
I think I was unclear, let me try again.  When I call killAndGetStack from runxpcshelltests.py I get the error "utilityPath is not defined".  This makes sense because utilityPath has not been assigned in runxpcshelltests.py.  However, I am unsure what utilityPath should be assigned to, meaning what is the utilityPath?  (Sorry for asking so many "stupid" questions.  I have not participated in any large projects before, and this is kind of over my head. :) )
How are you calling killAndGetStack? What value are you passing for it? What value do existing callers of killAndGetStack pass to it?
Just for the record, I am moving dumpScreen http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#692 into automationutils.py because it is called by killAndGetStack.
I am having problems moving code from automation.py.in to automationutils.py.  This is blocking further work.
Attachment #618817 - Flags: feedback?(josh)
Comment on attachment 618817 [details] [diff] [review]
Patch for bug 597064 adding timeout for xpcshell test harness

automation.py.in functions should still live somewhere ;-<
Attachment #618817 - Flags: feedback?(josh) → review-
Sorry about that.  There was a problem with the diff file and I had to manually edit it.  I meant to keep the section where it gets added to automationutils.py.  Once again, sorry about that.
I have a couple other notes:
* killAndGetStack will only be called on non-mac and windows platforms, due to its indentation
* the bits where you check for NameErrors with certain variables is kind of weird. Could you explain why those are necessary?
* we probably need to check for testTimer existing before cancelling it

I will almost certainly have more comments once a fixed up version that includes the automationutils.py changes is included; please don't get discouraged by this! I'm excited that you're making progress here :)
Thanks for the feedback! I use the NameError stuff to check whether a variable exists.  I know its probably not the best way, but that was how I was taught how to do it.  And if I find a better way I could probably do that for testTimer as well.  I am unsure what you want me to do with the first comment.  Do you want me to get rid of the assignments for self.IS_MAC?
My point with the first comment is that right now, killAndGetStack only runs inside of one of the branches of the platform checks. You want to move it out of the linux check so that it runs regardless of the platform.

For the undefined variables, I suggest this: assign self.CRASHREPORTER in the XPCShellTests initialization and get rid of the bit before you create the timer. I'm not really sure what good the debuggerInfo one does, since right below it you check self.debuggerInfo. It looks like your code should work fine without it.

To do the testTimer check, just initialize testTimer to null before doing the checks to see if you should create it. That way you can just if |if testTimer:| later on when cancelling it.
After being sick for a long time, I am back and actively working on this bug.
I am moving this code here: http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#680   into the runxpcshelltest.py file.  However that line needs this entire class: http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#160  Should I move that too?
Attached file runxpcshelltests.py prototype (obsolete) —
still having trouble with OS recognition
Attachment #646994 - Flags: feedback?
You'll need to be more specific than that if you want useful feedback, Alan. Also, if you could attach a diff of your changes instead of the whole file, that would be easier to work with.
Ok, sorry, it was late when I uploaded this.  As I said before, OS recognition is not working.  When I run the tests, and one times out, it should close it in the proper manner dependent on the OS.  However, it simply runs the Mac close sequence (which is not under a if.. then.. statement), which causes errors on my Windows box.  Somehow, (and this is why I am confused), my Windows box is not being recognized as such.  Regarding a patch, I am not near my computer, but I will upload one ASAP.  Once again, sorry for not being clear!
Sorry for taking so long to give you some feedback here; you probably need to use self.mozinfo in testTimeout (and similarly in trySetupNode).
No problem! I have tons of free time this week so I should (hopefully) get this done.
Attachment #646994 - Flags: feedback?
Attached patch Almost complete (obsolete) — Splinter Review
This patch is NOT complete. For some reason, mozInfo doesn't have a OS property.
Attachment #646994 - Attachment is obsolete: true
Attachment #655334 - Flags: feedback?
Huh. I unfortunately have no clue why self.mozInfo.os isn't working. Try printing it out to see if you get an UNKOWN value, or if it really doesn't exist.
I tried printing it and got this: 
Exception in thread Thread-1:
Traceback (most recent call last):
  File "c:\mozilla-build\python\Lib\threading.py", line 552, in __bootstrap_inne
r
    self.run()
  File "c:\mozilla-build\python\Lib\threading.py", line 756, in run
    self.function(*self.args, **self.kwargs)
  File "c:/Users/Stan/firefox-src/mozilla-central/testing/xpcshell/runxpcshellte
sts.py", line 848, in <lambda>
    testTimer = Timer(5.0, lambda: self.testTimeout(test, proc))
  File "c:/Users/Stan/firefox-src/mozilla-central/testing/xpcshell/runxpcshellte
sts.py", line 609, in testTimeout
    print self.mozInfo.os
AttributeError: 'dict' object has no attribute 'os'

Which I guess means it doesn't exist at all.
I got it, I had to not use self.
Attached patch proposed patch for bug 597064 (obsolete) — Splinter Review
Attachment #618817 - Attachment is obsolete: true
Attachment #655334 - Attachment is obsolete: true
Attachment #655334 - Flags: feedback?
Comment on attachment 661427 [details] [diff] [review]
proposed patch for bug 597064

Review of attachment 661427 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for this, Alan! Assuming this works correctly, and continues to do so after the changes I propose, I think the next step should be to figure out how to avoid duplicating code from automation.py.in and move to sharing it instead. It's good that you have made these changes, since it's much clearer what actually needs to be shared now.

::: build/automation.py.in
@@ +17,5 @@
>  import sys
>  import threading
>  import tempfile
>  import sqlite3
> +import subprocess

Is this necessary?

@@ +157,5 @@
>             "IS_DEBUG_BUILD",
>             "DEFAULT_TIMEOUT",
>            ]
>  
> +  

This definition needs to stay here since the rest of the file uses it.

::: netwerk/test/unit/test_307_redirect.js
@@ +1,1 @@
> +while(true){}

Another important test to run is one that iloops after doing some things that cause output (eg. do_check_true). We want to make sure we see any existing output when the test times out.

::: testing/xpcshell/runxpcshelltests.py
@@ +30,5 @@
>    
>  """ Control-C handling """
>  gotSIGINT = False
>  def markGotSIGINT(signum, stackFrame):
> +  global gotSIGINT, timedOut 

This change doesn't look like it should be necessary.

@@ +35,4 @@
>    gotSIGINT = True
>  
>  class XPCShellTests(object):
> +  class Process(subprocess.Popen):

If this needs to be in this file, it should probably be in the top level (ie. not a member of XPCShellTests). However, if we can avoid duplicating it and just refer to the original, that would be much better.

@@ +476,5 @@
>      if self.nodeProc:
>        self.log.info('Node SPDY server shutting down ...')
>        # moz-spdy exits when its stdin reaches EOF, so force that to happen here
>        self.nodeProc.communicate()
> +  def killAndGetStack(self, proc, utilityPath, debuggerInfo):

Newline before this.

@@ +490,5 @@
> +        if os.path.exists(crashinject) and subprocess.Popen([crashinject, str(proc.pid)]).wait() == 0:
> +          return
> +      #TODO: kill the process such that it triggers Breakpad on OS X (bug 525296)
> +    self.log.info("Can't trigger Breakpad, just killing process")
> +    proc.kill()

Newline after this.

@@ +613,5 @@
> +      self.UNIXISH = True
> +      self.IS_WIN32 = False
> +      self.IS_Mac = False
> +    self.killAndGetStack(proc, "../../dist/bin", None)
> +  def dumpScreen(self, utilityPath):

Since we don't need to capture screenshots, just kill this whole method.

@@ +831,5 @@
>            args.insert(0, '-d')
> +	  try:
> +            debuggerInfo
> +	  except NameError:
> +            debuggerInfo = None

Why is this necessary?

@@ +835,5 @@
> +            debuggerInfo = None
> +      try:
> +          self.CRASHREPORTER
> +      except AttributeError:
> +          self.CRASHREPORTER = None

Likewise this. We should figure out what should actually be defining this value properly.

@@ +837,5 @@
> +          self.CRASHREPORTER
> +      except AttributeError:
> +          self.CRASHREPORTER = None
> +      if not interactive and not self.debuggerInfo:
> +	    

Kill this line.

@@ +838,5 @@
> +      except AttributeError:
> +          self.CRASHREPORTER = None
> +      if not interactive and not self.debuggerInfo:
> +	    
> +	      testTimer = Timer(120.0, lambda: self.testTimeout(test, proc))

Just pass the test filename instead of the whole test object here.

@@ +868,5 @@
>  
>          if interactive:
>            # Not sure what else to do here...
>            return True
> +        testTimer.cancel()

Wrap this in a |if testTimer| check.
(In reply to Josh Matthews [:jdm] from comment #52)
> Comment on attachment 661427 [details] [diff] [review]
> proposed patch for bug 597064
> 
> Review of attachment 661427 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for this, Alan! Assuming this works correctly, and continues to do so
> after the changes I propose, I think the next step should be to figure out
> how to avoid duplicating code from automation.py.in and move to sharing it
> instead. It's good that you have made these changes, since it's much clearer
> what actually needs to be shared now.
> 
> ::: build/automation.py.in
> @@ +17,5 @@
> >  import sys
> >  import threading
> >  import tempfile
> >  import sqlite3
> > +import subprocess
> 
> Is this necessary?
> 

It is necessary for the process class.
> @@ +157,5 @@
> >             "IS_DEBUG_BUILD",
> >             "DEFAULT_TIMEOUT",
> >            ]
> >  
> > +  
> 
> This definition needs to stay here since the rest of the file uses it.
> 
> ::: netwerk/test/unit/test_307_redirect.js
> @@ +1,1 @@
> > +while(true){}
> 
> Another important test to run is one that iloops after doing some things
> that cause output (eg. do_check_true). We want to make sure we see any
> existing output when the test times out.
> 
I will do that.
> ::: testing/xpcshell/runxpcshelltests.py
> @@ +30,5 @@
> >    
> >  """ Control-C handling """
> >  gotSIGINT = False
> >  def markGotSIGINT(signum, stackFrame):
> > +  global gotSIGINT, timedOut 
> 
> This change doesn't look like it should be necessary.
Will fix this, just something I forgot to get around to.
> 
> @@ +35,4 @@
> >    gotSIGINT = True
> >  
> >  class XPCShellTests(object):
> > +  class Process(subprocess.Popen):
> 
> If this needs to be in this file, it should probably be in the top level
> (ie. not a member of XPCShellTests). However, if we can avoid duplicating it
> and just refer to the original, that would be much better.
Ted told me that they are trying to get rid of automation.py.in http://logbot.glob.com.au/?c=mozilla%23introduction&s=26+Apr+2012&e=26+Apr+2012
> 
> @@ +476,5 @@
> >      if self.nodeProc:
> >        self.log.info('Node SPDY server shutting down ...')
> >        # moz-spdy exits when its stdin reaches EOF, so force that to happen here
> >        self.nodeProc.communicate()
> > +  def killAndGetStack(self, proc, utilityPath, debuggerInfo):
> 
> Newline before this.
Will do.
> 
> @@ +490,5 @@
> > +        if os.path.exists(crashinject) and subprocess.Popen([crashinject, str(proc.pid)]).wait() == 0:
> > +          return
> > +      #TODO: kill the process such that it triggers Breakpad on OS X (bug 525296)
> > +    self.log.info("Can't trigger Breakpad, just killing process")
> > +    proc.kill()
> 
> Newline after this.
Same.
> 
> @@ +613,5 @@
> > +      self.UNIXISH = True
> > +      self.IS_WIN32 = False
> > +      self.IS_Mac = False
> > +    self.killAndGetStack(proc, "../../dist/bin", None)
> > +  def dumpScreen(self, utilityPath):
> 
> Since we don't need to capture screenshots, just kill this whole method.
Yep.
> 
> @@ +831,5 @@
> >            args.insert(0, '-d')
> > +	  try:
> > +            debuggerInfo
> > +	  except NameError:
> > +            debuggerInfo = None
> 
> Why is this necessary?
see below
> 
> @@ +835,5 @@
> > +            debuggerInfo = None
> > +      try:
> > +          self.CRASHREPORTER
> > +      except AttributeError:
> > +          self.CRASHREPORTER = None
> 
> Likewise this. We should figure out what should actually be defining this
> value properly.
How would I go about doing that?
> 
> @@ +837,5 @@
> > +          self.CRASHREPORTER
> > +      except AttributeError:
> > +          self.CRASHREPORTER = None
> > +      if not interactive and not self.debuggerInfo:
> > +	    
> 
> Kill this line.
Will do.
> 
> @@ +838,5 @@
> > +      except AttributeError:
> > +          self.CRASHREPORTER = None
> > +      if not interactive and not self.debuggerInfo:
> > +	    
> > +	      testTimer = Timer(120.0, lambda: self.testTimeout(test, proc))
> 
> Just pass the test filename instead of the whole test object here.
Is that just test without the proc? or what would that be?
> 
> @@ +868,5 @@
> >  
> >          if interactive:
> >            # Not sure what else to do here...
> >            return True
> > +        testTimer.cancel()
> 
> Wrap this in a |if testTimer| check.
Will do.
This fixes most of the problems that JDM brought up.  Not fixed: global timedOut, moving Process class back (see previous comment).
Attachment #661427 - Attachment is obsolete: true
Comment on attachment 661483 [details] [diff] [review]
2nd proposed patch for bug 597064

Review of attachment 661483 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/xpcshell/runxpcshelltests.py
@@ +9,5 @@
>  from glob import glob
>  from optparse import OptionParser
>  from subprocess import Popen, PIPE, STDOUT
> +from tempfile import mkdtemp, gettempdir, mkstemp
> +import tempfile

I don't think these changes are required any longer.

@@ +20,4 @@
>  from automationutils import *
>  
> +
> +class Process(subprocess.Popen):

I don't think this is even necessary any more. Try removing it from here and reverting any changes you made to automation.py.in.

@@ +80,5 @@
>    
>  """ Control-C handling """
>  gotSIGINT = False
>  def markGotSIGINT(signum, stackFrame):
> +  global gotSIGINT, timedOut 

This is still present.

@@ +104,5 @@
> +      self.IS_MAC = False
> +    else:
> +      self.UNIXISH = True
> +      self.IS_WIN32 = False
> +      self.IS_MAC = False

Let's just scrap this whole block and use the mozinfo.os bits below.

@@ +494,3 @@
>  
> +    if self.CRASHREPORTER and not debuggerInfo:
> +      if self.UNIXISH:

mozinfo.os not in ['mac', 'win']

@@ +496,5 @@
> +      if self.UNIXISH:
> +        # ABRT will get picked up by Breakpad's signal handler
> +        os.kill(proc.pid, signal.SIGABRT)
> +        return
> +      elif self.IS_WIN32:

mozinfo.os == 'win'

@@ +613,5 @@
> +  def testTimeout(self, test, proc):
> +    global timedOut
> +    print "TEST-UNEXPECTED-FAIL | %s | Test timed out" % test
> +    timedOut = True
> +    self.haveDumpedScreen = False

Don't need this any longer.

@@ +614,5 @@
> +    global timedOut
> +    print "TEST-UNEXPECTED-FAIL | %s | Test timed out" % test
> +    timedOut = True
> +    self.haveDumpedScreen = False
> +    self.killAndGetStack(proc, "../../dist/bin", None)

Any chance we can call Automation().killAndGetStack instead and not have to define it in this file?

@@ +787,5 @@
>            args.insert(0, '-d')
> +	  try:
> +            debuggerInfo
> +	  except NameError:
> +            debuggerInfo = None

This does does not appear to be necessary to me. Is it?

@@ +791,5 @@
> +            debuggerInfo = None
> +      try:
> +          self.CRASHREPORTER
> +      except AttributeError:
> +          self.CRASHREPORTER = None

The CRASHREPORTER stuff comes from http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#49, which is unfortunate since it's all about preprocessing. If we can get rid of killAndGetStack from this file, we can avoid the issue entirely.

@@ +793,5 @@
> +          self.CRASHREPORTER
> +      except AttributeError:
> +          self.CRASHREPORTER = None
> +      if not interactive and not self.debuggerInfo:
> +	      testTimer = Timer(120.0, lambda: self.testTimeout(test, proc))

The rest of the places that print things show that you can just use the |name| variable instead of the |test| one.

@@ +794,5 @@
> +      except AttributeError:
> +          self.CRASHREPORTER = None
> +      if not interactive and not self.debuggerInfo:
> +	      testTimer = Timer(120.0, lambda: self.testTimeout(test, proc))
> +	      testTimer.start()

There's an extra level of indentation here.
And while I realize that ted said that there's a desire to stop using automation.py.in, that work does not seem to be progressing quickly. I think for the purposes of this patch we should focus on sharing as much as possible.
Bug 783727 just landed, so it might be pretty useful to use the psutil.Process type to launch the shell, and when killing it you can use proc.get_children(), kill all of those and then kill the original. This will address my original use of killpg which ted didn't like.
So I started using mq, and found that if I removed the patch, it builds fine, so it must be something I did. The problem is I have no idea whats causing the build to fail.  When I get home I will post the error again.
I would like to continue working on this bug, but work may be a bit slower than normal.  On the other hand, if someone wants this bug I am willing to let go of it bug in exchange for something a bit more trivial.
I don't see anyone else stepping up to do the work, so feel free to continue working at your own pace. Thanks for sticking with it!
Alright, I fixed everything except for using killAndGetStack from automation.py.in.  I don't understand how to access it from runxpcshelltests.py.  Some help would be appreciated. Thanks!
Did you try this suggestion?

>Any chance we can call Automation().killAndGetStack instead and not have to define 
>it in this file?
Yes, I get this:

TEST-INFO | c:\Users\Stan\firefox-src\mozilla-central\obj-i686-pc-mingw32\_tests
\xpcshell\netwerk\test\unit\test_304_responses.js | running test ...
TEST-UNEXPECTED-FAIL | c:\Users\Stan\firefox-src\mozilla-central\obj-i686-pc-min
gw32\_tests\xpcshell\netwerk\test\unit\test_304_responses.js | Test timed out
Exception in thread Thread-1:
Traceback (most recent call last):
  File "c:\mozilla-build\python\Lib\threading.py", line 552, in __bootstrap_inne
r
    self.run()
  File "c:\mozilla-build\python\Lib\threading.py", line 756, in run
    self.function(*self.args, **self.kwargs)
  File "c:/Users/Stan/firefox-src/mozilla-central/testing/xpcshell/runxpcshellte
sts.py", line 846, in <lambda>
    testTimer = Timer(1.0, lambda: self.testTimeout(name, proc))
  File "c:/Users/Stan/firefox-src/mozilla-central/testing/xpcshell/runxpcshellte
sts.py", line 578, in testTimeout
    Automation.killAndGetStack(proc, "../../dist/bin", None)
NameError: global name 'Automation' is not defined
Add |from automation import Automation| after the rest of the imports, and make sure you're using |Automation().killAndGetStack(...)| (not the () after Automation).
I get a |ImportError: No module named automation| error.
Hmm, I was able to make it work by modifying some makefiles to use include the correct path, but there were some complications that arose (specifically, automation.py ends up running code that makes all output be repeated). Ted, do you have any input on how best to access killAndGetStack? I tried moving it into automationutils.py and take an Automation parameter, but I still need to end up importing automation which causes the double logging to occur.
Attached patch 3rd proposed patch (obsolete) — Splinter Review
Attachment #661483 - Attachment is obsolete: true
Attachment #694017 - Flags: feedback?(josh)
This is nearly ready. Unfortunately it breaks mach right now; I think fixing bug 794506 should make it work, since we rely on a python file that's in the objdir.
Assignee: astropiloto → josh
Attachment #694017 - Attachment is obsolete: true
Attachment #694017 - Flags: feedback?(josh)
Attachment #477735 - Attachment is obsolete: true
qref
Attachment #694260 - Attachment is obsolete: true
Is bug 794506 still a dependency? If so, any idea on when it will be fixed?
Depends on: 841713
Keywords: sheriffing-P1
So now that bug 841713 is fixed, what do we have to do next?
I'm testing to see whether I was correct that running xpcshell tests under mach now works correctly.
May as well make sure that this solution is actually desired while waiting for my build to complete.
Attachment #730392 - Flags: review?(ted)
Attachment #694261 - Attachment is obsolete: true
Ah, it looks like we probably also want bug 794506. Then mach will have the right search paths to be able to find the preprocessed automation.py from $objdir/build.
Actually, it turns out that there is a way to fix the problem without waiting on proper virtualenv integration.
No longer depends on: 841713
Attachment #731193 - Flags: review?(ted)
Attachment #730392 - Attachment is obsolete: true
Attachment #730392 - Flags: review?(ted)
Josh, how goes things here? IIUC, this blocks getting useful diagnostics on the various timeout failures which blocks un-hiding xpcshell tests on WinXP (bug 856027).
Flags: needinfo?(josh)
Don't blame me; It's still waiting on Ted's review.
Flags: needinfo?(josh)
Feel free to transfer the needinfo in that case ;)
Flags: needinfo?(ted)
Depends on: 859065
So yeah, bug 859065 should fix the simple case where we just have an async test that fails to complete. This patch will still be necessary for more complex things, like IPC tests or things that are actually hanging the main thread.
Flags: needinfo?(ted)
Comment on attachment 731193 [details] [diff] [review]
Add timeout logic to xpcshell test runner.

Review of attachment 731193 [details] [diff] [review]:
-----------------------------------------------------------------

r- for a few things, but it's mostly ok.

::: build/automation.py.in
@@ +88,5 @@
> +  while _log.handlers:
> +    _log.removeHandler(_log.handlers[0])
> +  handler = logging.StreamHandler(log)
> +  _log.setLevel(logging.INFO)
> +  _log.addHandler(handler)

I guess ideally we'd push this up into each of the harnesses that are using automation.py. It seems wrong for this module to be setting the log handler. Can you file a followup for that?

@@ +1020,3 @@
>  
>    def killAndGetStack(self, proc, utilityPath, debuggerInfo):
>      """Kill the process, preferrably in a way that gets us a stack trace."""

Can you move this comment down to the new method, and mention here that this attempts to save a screenshot and then kill the process?

::: config/makefiles/xpcshell.mk
@@ +57,5 @@
>  
>  xpcshell-tests-remote: DM_TRANS?=adb
>  xpcshell-tests-remote:
>  	$(PYTHON) -u $(topsrcdir)/testing/xpcshell/remotexpcshelltests.py \
> +	  -I$(testxpcobjdir) \

This...won't actually work. It's not running pythonpath.py.

::: testing/xpcshell/Makefile.in
@@ +13,5 @@
>  include $(topsrcdir)/config/rules.mk
> +# We're installing to _tests/xpcshell, so this is the depth
> +# necessary for relative objdir paths.
> +TARGET_DEPTH = ../..
> +include $(topsrcdir)/build/automation-build.mk

I don't think you really need all this gunk. $objdir/build is already in sys.path, and that has a copy of automation.py that you can use. You're not launching an application with it, so you don't need the paths processed correctly. (TARGET_DEPTH just affects the path to the default binary etc.)

You would also need to adjust the xpcshell test packaging goop to stick $objdir/build/automation.py in the xpcshell dir in the test package, but I think that's cleaner than all this.

::: testing/xpcshell/runxpcshelltests.py
@@ +844,5 @@
>              completeCmd = cmdH + cmdT + args
>  
> +            if not interactive and not self.debuggerInfo:
> +                testTimer = Timer(120.0, lambda: self.testTimeout(name, proc))
> +                testTimer.start()

I'm not wild about this using threads, but I guess we can clean this up in bug 771578.

@@ +869,5 @@
>  
>                  if interactive:
>                      # Not sure what else to do here...
>                      return True
> +                if testTimer:

This is going to error in the "interactive or debugger" case because testTimer is undefined. You need to assign testTimer = None or something like that outside of the first conditional you added.
Attachment #731193 - Flags: review?(ted) → review-
(Sorry about the review delay, will do better for the next round.)
Rebased, with comments addressed. I've tested running individual tests under mach, as well as running check-one, check-interactive, and xpcshell-tests under make.
Attachment #746526 - Flags: review?(ted)
Attachment #731193 - Attachment is obsolete: true
Comment on attachment 746526 [details] [diff] [review]
Add timeout logic to xpcshell test runner.

Review of attachment 746526 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for fixing this up.

::: config/makefiles/xpcshell.mk
@@ +39,5 @@
>  # Execute all tests in the $(XPCSHELL_TESTS) directories.
>  # See also testsuite-targets.mk 'xpcshell-tests' target for global execution.
>  xpcshell-tests:
>  	$(PYTHON) -u $(topsrcdir)/config/pythonpath.py \
> +	  -I$(testxpcobjdir) \

I feel like we could probably get rid of pythonpath here, but we can punt that to a followup. (There's already a bug filed on getting rid of pythonpath.)

::: netwerk/test/unit/test_head.js
@@ +112,5 @@
>    response = channel.getResponseHeader("Proxy-Authenticate");
>    do_check_eq(response, "line 1\nline 2\nline 3");
> +  
> +  for (;;) {
> +  }

I suspect you want to remove this before landing. :)
Attachment #746526 - Flags: review?(ted) → review+
Blocks: 869640
Not actually mochitest bustage, but postflight unification of Mac universal builds.

However, the timeouts in https://tbpl.mozilla.org/php/getParsedLog.php?id=22708325&tree=Mozilla-Inbound (test_cookies_read.js doesn't seem to be in the habit of timing out) and the inexplicable failure in https://tbpl.mozilla.org/php/getParsedLog.php?id=22707181&tree=Mozilla-Inbound mean you might also want to push to try and trigger a few dozen Windows debug and opt runs.
https://hg.mozilla.org/mozilla-central/rev/4a1cbbb07ce2
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Depends on: 870745
Blocks: 869638
Blocks: 889317
You need to log in before you can comment on or make changes to this bug.