842257 - Autophone - unittests - remove reftests, increase timeout, improve status updates and recovery

Assignee

Description

•

12 years ago

Attached patch patch 1 v1 — Details — Splinter Review

Per our discussions on Friday, It is not advisable to run reftests on such low resolution devices as our phones. In addition, after reviewing previous local test runs, I checked against the production android tbpl logs and found they use a much longer timeout for the tests. When running the tests locally I found that the unittests did not update the phone's status properly thus making it difficult to tell what was happening from the log or the autophone status. In particular I also noticed the phones would regularly get into an unrecoverable state. Rather than duplicate the logic of worker.py:recover_phone, it would be better to pass the PhoneWorkerSubProcess instance to the test's runjob method so it could use recover_phone. patch 1 1. change time_out in configs/unittest_defaults.ini.example from 300 to 2400. 2. remove reftests from the configs/unittest_settings.ini

Attachment #715074 - Flags: review?(mcote)

Bob Clary [:bc] (inactive)

Assignee

Comment 1

•

12 years ago

Attached patch patch 2 v1 — Details — Splinter Review

1. pass the PhoneWorkerSubProcess instance to the test's runjob method so it could use recover_phone. 2. use PhoneTest:set_status to update the worker's status to better reflect on the worker's state. 3. handle exceptions thrown from the test suite's test runner locally using recover_phone. 4. handle other exceptions thrown from runjob during the test setup in runjob().

Attachment #715076 - Flags: review?(mcote)

Bob Clary [:bc] (inactive)

Assignee

Comment 2

•

12 years ago

I tested these patches against my local haxxor phones and found it to be well behaved. I recommend we go with these before starting our "production" runs.

Bob Clary [:bc] (inactive)

Assignee

Comment 3

•

12 years ago

PS. apply patch 1 then patch 2. sorry for the spam.

Mark Côté [:mcote]

Updated

•

12 years ago

Attachment #715074 - Flags: review?(mcote) → review+

Mark Côté [:mcote]

Comment 4

•

12 years ago

Comment on attachment 715076 [details] [diff] [review] patch 2 v1 Review of attachment 715076 [details] [diff] [review]: ----------------------------------------------------------------- I'm not crazy about passing in the subprocess object to the test itself. I think we'll get a lot of bleed-over of functionality and that it will blur the lines between what they different classes are supposed to do. I don't like the blanket try/excepts either (which I missed when reviewing runtestsremote.py originally). I think it would be better to keep the recover_phone() and related calls in the PhoneWorkerSubProcess class (or a separate object owned by that class, but that's another problem). Instead of just calling self.phone_disconnected() in PhoneWorkerSubProcess.do_job(), we should try to recover the phone there. The other problem is that runtestsremote, unlike s1s2 or smoketest, is actually a test suite, and you want to try to continue running the tests even if we get a DMError in one of them, if we can recover the phone. I think we should change PhoneTest.runjob() to instead be a sort of iterator. PhoneWorkerSubProcess.do_job(), instead of calling runjob() once, would call something like run_next_test() repeatedly until, say, a StopIteration exception is raised (standard iterator end signal). If a DMError bubbles up to the SubProcess object, it tries to recover the phone and, if successful, calls run_next_test() as usual. One beneficial side effect here is that it makes it easier to retry a test. As the FIXME comment in do_job()'s DMError handler says, we abort the test even if it were possible to recover the phone and retry it. We should have a maximum number of attempts in this case, though, so that it doesn't try running the same test forever. How does that sound? It's kind of a big-ish change, but I'm worried that things will get more confused if we start passing big objects like SubProcess back and forth. ::: tests/runtestsremote.py @@ +35,5 @@ > self.logger.debug('runtestsremote.py runjob start') > + self.set_status(msg='runtestsremote.py runjob start') > + > + self.worker_subprocess = worker_subprocess > + self.worker_subprocess.check_sdcard() This is a good idea, and it's a fast operation, so let's move it to the loop in PhoneWorkerSubProcess.do_job(). @@ +96,5 @@ > + self.logger.error(error_message) > + self.set_status(msg=error_message) > + # give the phone a minute to recover > + time.sleep(60) > + self.worker_subprocess.recover_phone() Should probably check to see if the phone is disconnected (i.e. didn't recover), and abort the test if so.

Mark Côté [:mcote]

Comment 5

•

12 years ago

Oops ignore the second code comment above; I wrote that before thinking through the whole problem. The first comment still stands. Also, I can help out or do these changes myself, if you'd like.

Bob Clary [:bc] (inactive)

Assignee

Comment 6

•

12 years ago

(In reply to Mark Côté ( :mcote ) from comment #4) > Comment on attachment 715076 [details] [diff] [review] > patch 2 v1 > > Review of attachment 715076 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'm not crazy about passing in the subprocess object to the test itself. I > think we'll get a lot of bleed-over of functionality and that it will blur > the lines between what they different classes are supposed to do. I don't Yeah, this is partly just not wanting to duplicate the recover phone logic but also is due to (as you note below) the fact that the runtestsremote.py is actually a set of test suites which each consist of a set of chunks. Each of the test suite chunks would qualify as a 'job' in the normal Autophone sense. Pulling the subprocess's functionality into the runtest is a natural inclination in that case since really the runtestsremote has to act like the subprocess. But the real problem is the difference between what Autophone normally considers a job and what the runtestsremote actually does. > like the blanket try/excepts either (which I missed when reviewing > runtestsremote.py originally). I think it would be better to keep the > recover_phone() and related calls in the PhoneWorkerSubProcess class (or a > separate object owned by that class, but that's another problem). Instead > of just calling self.phone_disconnected() in PhoneWorkerSubProcess.do_job(), > we should try to recover the phone there. The other problem is that > runtestsremote, unlike s1s2 or smoketest, is actually a test suite, and you > want to try to continue running the tests even if we get a DMError in one of > them, if we can recover the phone. I think we should change > PhoneTest.runjob() to instead be a sort of iterator. > PhoneWorkerSubProcess.do_job(), instead of calling runjob() once, would call > something like run_next_test() repeatedly until, say, a StopIteration > exception is raised (standard iterator end signal). If a DMError bubbles up > to the SubProcess object, it tries to recover the phone and, if successful, > calls run_next_test() as usual. One beneficial side effect here is that it > makes it easier to retry a test. As the FIXME comment in do_job()'s DMError > handler says, we abort the test even if it were possible to recover the > phone and retry it. We should have a maximum number of attempts in this > case, though, so that it doesn't try running the same test forever. > > How does that sound? It's kind of a big-ish change, but I'm worried that > things will get more confused if we start passing big objects like > SubProcess back and forth. > I like this idea quite a lot. This makes the UnitTest 'unit of work' much more similar to a normal Autophone job. I think it might also lead to a way of suspending and resuming Autophone. I am worried though since we need to get the UnitTests up and reporting to brasstacks soon and I need to devote more time to the Android x86 testing asap. Without this patch, the UnitTests were prone to pushing phones over the edge and not recovering them so the remaining test suites could be run. Can we go with this patch temporarily with the caveat that I will rework this to follow the iterator idea as soon as I can make some Android x86 progress? > ::: tests/runtestsremote.py > @@ +35,5 @@ > > self.logger.debug('runtestsremote.py runjob start') > > + self.set_status(msg='runtestsremote.py runjob start') > > + > > + self.worker_subprocess = worker_subprocess > > + self.worker_subprocess.check_sdcard() > > This is a good idea, and it's a fast operation, so let's move it to the loop > in PhoneWorkerSubProcess.do_job(). > Cool.

Bob Clary [:bc] (inactive)

Assignee

Comment 7

•

12 years ago

Or I could just not do the subprocess passing and temporarily duplicate recover_phone in runtestsremote until we can move to the iterator idea. That would be quick and less disruptive to the other tests.

Mark Côté [:mcote]

Comment 8

•

12 years ago

Comment on attachment 715076 [details] [diff] [review] patch 2 v1 All right, as discussed, we can go with this in the short term. I will file a bug for the bigger change. Although in that case, I think my code comments above stand: in each iteration of the loop, you should probably check to see if the phone is disconnected, and abort if so. Otherwise, if I'm reading this right, you'll continue to try to execute tests even if the recovery was unsuccessful.

Attachment #715076 - Flags: review?(mcote) → review+

Mark Côté [:mcote]

Comment 9

•

12 years ago

Test-suite functionality split off into bug 842628.

Bob Clary [:bc] (inactive)

Assignee

Comment 10

•

12 years ago

https://github.com/mozilla/autophone/commit/651e4be3d8cad0edcc4bc57a7336e6b35fc58de7 https://github.com/mozilla/autophone/commit/0ec36fd66dfc9c37aacfaaa161c5a2285fd638c3

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Bob Clary [:bc] (inactive)

Assignee

Updated

•

12 years ago

Depends on: 848937

BMO Automation

Updated

•

4 years ago

Product: Testing → Testing Graveyard

patch 1 v1 12 years ago Bob Clary [:bc] (inactive) 950 bytes, patch	mcote : review+	Details \| Diff \| Splinter Review
patch 2 v1 12 years ago Bob Clary [:bc] (inactive) 8.36 KB, patch	mcote : review+	Details \| Diff \| Splinter Review

Bugzilla

Autophone - unittests - remove reftests, increase timeout, improve status updates and recovery

Categories

(Testing Graveyard :: Autophone, defect)

Tracking

(Not tracked)

People

(Reporter: bc, Assigned: bc)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Updated

Attachment

General

Description

File Name

Content Type