627160 - mozmill stopping methods are all kinds of broken and should be completely refactored

Assignee

Description

•

14 years ago

I have studied, for some time, MozMill.stop_runner and related methods and I still can't divine the intent. What I suspect is this is because there isn't much intent there and people have, being in the same position of ignorance that I find myself in, have added code to the method(s) seeking to correct a specific problem without figuring out what the method should do. Here is a brief rundown of my explorations of stop runner: def stop_runner(self, timeout=30, close_bridge=False, hard=False): # tries to stop the runner. this is called by the MozMill.stop() # method, with close_bridge=True and timeout of 10 with hard being # whether stop is fatal or not, and also by MozMillRestart.run_dir() # with the default values for each test: # https://github.com/mozautomation/mozmill/blob/hotfix-1.5.2/mozmill/mozmill/__init__.py#L618 sleep(1) try: mozmill = jsbridge.JSObject(self.bridge, mozmillModuleJs) mozmill.cleanQuit() # we try to quit mozmill cleanly over the JS bridge.... except (socket.error, JSBridgeDisconnectError): pass # ...if we get some sort of network error, we completely ignore it (why?) except: self.runner.cleanup() raise # ... otherwise we cleanup and raise the error if not close_bridge: starttime = datetime.utcnow() self.runner.wait(timeout=timeout) endtime = datetime.utcnow() if ( endtime - starttime ) > timedelta(seconds=timeout): try: self.runner.stop() except: pass self.runner.wait() # if we don't want to close the bridge, we try to wait for the runner # with the timeout given. We determine if the wait is successful # (this logic should be moved to the runner....wait should return a # True or False depending on if the wait was successful, at the very # least). If its not successful, we try to stop (==SIGKILL) it. # Regardless of whether or not that succeeds, we wait some more. else: # TODO: unify this logic with the above better # if we're not closing the bridge... if hard: self.runner.cleanup() return # if it is a hard stop, cleanup the profile and stop the runner # you're done # if its not a hard stop.... # XXX this call won't actually finish in the specified timeout time self.runner.wait(timeout=timeout) # mysterious comment aside, try to wait for the runner to exit. self.back_channel.close() self.bridge.close() # close the back channel and the bridge x = 0 while x < timeout: if self.endRunnerCalled: break sleep(1) x += 1 # see if the wait is successful. We do this in a completely different # way then we do above (why?) else: print "WARNING | endRunner was never called. There must have been a failure in the framework." self.runner.cleanup() sys.exit(1) # if we never get endRunnerCalled, we cleanup the runner and hard exit # MozMill ### # Some questions that come from consideration of this method...what is # the correct way to wait for the runner? Is the first method # correct, or the second? They should probably be done in the same # way, and the only code that should be changed via the close_bridge # flag is `self.back_channel.close()` and `self.bridge.close()`, # unless there is some subtlty that convolves wait() and how we're # doing this. This function should probably be documented wrt intent. # Why do we give a blind pass to JSBridgeDisconnect errors? stop_runner and related stop functions should be made human readable with definite intent. They should be documented in addition, but the underlying code should be made to make sense. It is only if we have code that we can understand that we can further edit and implement sane and robust stopping conditions.

Jeff Hammel

Assignee

Comment 1

•

14 years ago

It is probably a good idea to combine MozMill and MozMillRestart into one class with a unified API for 2.0. While, again, it is difficult to tell what is actually meant, my understanding of the workflow is something like this:: MozMill: - create a profile - start the runner - create the JSBridge network - run the tests - stop the JSBridge network - stop the runner and clean up the profile MozMillRestart: - for each directory, do the exact thing that MozMill does as a one-time thing At the end of both of these, we also do reporting. I'm ignoring the event dispatching as this is roughly handled the same. Am I wrong here? Looking at the code, its difficult to tell. There are *tons* of special cases where its hard to be certain if the intent is actually as I describe it:: - we clone the profile for MozMillRestart per directory. Of course, we don't do this with a Profile.clone() method....we just introspect everything and copy it. - MozMillRestart looks for and fires a kinda magical 'callbacks.py' if it finds it in the directory. Why is this special for restart tests? Do we use it at all? Shouldn't ordinary mozmill have this power? ABICT, there is no such file in mozmill-tests. I'm not sure why we include this functionality OOTB. IMHO, this should be an event handler, like logging and reporting. There is no reason, ABICT, to inbuild this into the MozMillRestart class. - we do a lot of introspecting to figure out what tests we want, including supporting several different ways of doing things. For the easy case (-t), we should do something very simple. For more complicated cases, we should start using manifests and their data, as well as the JSBridge refactor.

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

•

14 years ago

Blocks: 615662

Jeff Hammel

Assignee

Updated

•

14 years ago

Assignee: nobody → jhammel

No longer blocks: 615662

Jeff Hammel

Assignee

Updated

•

14 years ago

Depends on: 627794

Jeff Hammel

Assignee

Updated

•

14 years ago

Depends on: 584464

Jeff Hammel

Assignee

Updated

•

14 years ago

Depends on: 634141

Jeff Hammel

Assignee

Comment 2

•

14 years ago

This is basically a tracking bug these days. The mozmill + mozmill-restart cleanup (bug 584464), when landed, will fix several of these issues (as well as its blocking bugs and the Mozmill killed my Firefox (bug 627794) covers a few more. I think there are a few more stragglers in the 2.0 bugs that should be blockers to this, but we're looking not so shabby. Note that this may surface other edge cases we haven't seen before (which neither are nor are not regressions, strictly speaking). That said, it will make killing them *much* easier!

cmtalbert

Updated

•

14 years ago

Whiteboard: [mozmill-2.0?] → [mozmill-2.0+]

Jeff Hammel

Assignee

Comment 4

•

14 years ago

If bug 634141 is not resolved for 2.0, we should probably close this bug (since it is a tracking bug for 2.0). Saving process IDs would be a big plus, but it should be done robustly and I wouldn't want to block 2.0 release on it if it turns out to be a time sink.

Jeff Hammel

Assignee

Updated

•

14 years ago

Depends on: 639991, 640010

Jeff Hammel

Assignee

Comment 5

•

13 years ago

this is effectively done, all other work is on bug 640047

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → DUPLICATE

Nobody; OK to take it and work on it

Updated

•

8 years ago

Product: Testing → Testing Graveyard

Bugzilla

Quick Search

mozmill stopping methods are all kinds of broken and should be completely refactored

Categories

(Testing Graveyard :: Mozmill, defect)

Tracking

(Not tracked)

People

(Reporter: k0scist, Assigned: k0scist)

References

Details

(Whiteboard: [mozmill-2.0+])

Crash Data

Security

(public)

User Story

Description

Comment 1

Updated

Updated

Updated

Updated

Updated

Comment 2

Updated

Comment 4

Updated

Comment 5

Updated