Closed Bug 797996 Opened 12 years ago Closed 12 years ago

Talos needs to be updated for changes in devicemanager

Categories

(Testing :: Talos, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: wlach)

References

Details

Attachments

(1 file, 1 obsolete file)

Talos also needs to be updated for the changes made in bug 795456 to throw exceptions in case of error.
Attached patch Adapt to changes in mozdevice (obsolete) — Splinter Review
mozdevice now throws exceptions, and we need to handle them. At the same
time, this patch cleans up things slightly so we print out the tbpl-friendly
"Remote Device Error" in the case of a fatal device exception (per edmorley's suggestion).
Attachment #668114 - Flags: review?(jmaher)
Comment on attachment 668114 [details] [diff] [review]
Adapt to changes in mozdevice

LGTM from a TBPL point of view :-)

(Just make sure to keep an eye on this the first time you send to Try, in case a typo or something causes a perma-"Remote Device Error:", in which case the Try job will RETRY indefinitely until cancelled via self-serve).
Comment on attachment 668114 [details] [diff] [review]
Adapt to changes in mozdevice

Review of attachment 668114 [details] [diff] [review]:
-----------------------------------------------------------------

I haven't touched this code in a while, so this is my initial impression.  If try server is green, I could be fine ignoring a lot of these items.

::: talos/ffprocess_remote.py
@@ +68,5 @@
> +        try:
> +            return self.testAgent.getProcessList()
> +        except mozdevice.DMError:
> +            print "Remote Device Error: Error getting list of processes on remote device"
> +            raise

this could be wrong as we are querying before the test is actually running.

@@ +138,5 @@
>                      result = result + ', '
>                  result = result + process_name + ': terminated by testAgent.killProcess'
> +            except mozdevice.DMError:
> +                print "Remote Device Error: Error while killing process '%s'" % process_name
> +                raise

This could fail out as we cleanup a lot as a safety check.

@@ +166,5 @@
> +        try:
> +            return self.testAgent.pullFile(handle)
> +        except mozdevice.DMError:
> +            print "Remote Device Error: Error pulling file %s from remote device" % handle
> +            raise

we expect an error here and return '' in that case.

::: talos/ttest.py
@@ +173,5 @@
> +            try:
> +                self._ffprocess.testAgent.getDirectory(profile_dir + '/minidumps/', minidumpdir)
> +            except mozdevice.DMError:
> +                print "Remote Device Error: Error getting crash minidumps from device"
> +                raise

hmm, this could be bad as there might not be any files there.
Attachment #668114 - Flags: review?(jmaher) → review+
Yup, some of these are real issues.

(In reply to Joel Maher (:jmaher) from comment #4)
> Comment on attachment 668114 [details] [diff] [review]
> Adapt to changes in mozdevice
> 
> Review of attachment 668114 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I haven't touched this code in a while, so this is my initial impression. 
> If try server is green, I could be fine ignoring a lot of these items.
> 
> ::: talos/ffprocess_remote.py
> @@ +68,5 @@
> > +        try:
> > +            return self.testAgent.getProcessList()
> > +        except mozdevice.DMError:
> > +            print "Remote Device Error: Error getting list of processes on remote device"
> > +            raise
> 
> this could be wrong as we are querying before the test is actually running.

I don't think that should cause any failures. We're just checking the list of processes here, which should always work so long as the device is online (and it should be, if we get this far).

> @@ +138,5 @@
> >                      result = result + ', '
> >                  result = result + process_name + ': terminated by testAgent.killProcess'
> > +            except mozdevice.DMError:
> > +                print "Remote Device Error: Error while killing process '%s'" % process_name
> > +                raise
> 
> This could fail out as we cleanup a lot as a safety check.

We silently ignore processes that don't exist with killProcess now.

> @@ +166,5 @@
> > +        try:
> > +            return self.testAgent.pullFile(handle)
> > +        except mozdevice.DMError:
> > +            print "Remote Device Error: Error pulling file %s from remote device" % handle
> > +            raise
> 
> we expect an error here and return '' in that case.

Ok, updated to check for file's existence and return '' if not there yet.

> ::: talos/ttest.py
> @@ +173,5 @@
> > +            try:
> > +                self._ffprocess.testAgent.getDirectory(profile_dir + '/minidumps/', minidumpdir)
> > +            except mozdevice.DMError:
> > +                print "Remote Device Error: Error getting crash minidumps from device"
> > +                raise
> 
> hmm, this could be bad as there might not be any files there.

Ok, I updated the patch to see if the directory exists first before trying to get it.
Attached patch Updated patchSplinter Review
This addresses some of the earlier feedback as well as fixing some errors that were discovered from running what we have against try.
Attachment #668114 - Attachment is obsolete: true
Attachment #668622 - Flags: review?(jmaher)
After retrying some jobs a few times, I finally saw green with the above patch. Doing one last try run to make sure that this one's good:

https://tbpl.mozilla.org/?tree=Try&rev=08c04f437e32
Comment on attachment 668622 [details] [diff] [review]
Updated patch

Review of attachment 668622 [details] [diff] [review]:
-----------------------------------------------------------------

nothing of major concern, just seeing this code makes me realize how fragile talos still is.

::: talos/ffprocess_remote.py
@@ +176,5 @@
> +        # reading
> +        if local_filename:
> +            f = open(local_filename, 'w')
> +            f.write(data)
> +            f.close()

this is much cleaner!

::: talos/ttest.py
@@ +172,5 @@
>              minidumpdir = tempfile.mkdtemp()
> +            try:
> +                remoteminidumpdir = profile_dir + '/minidumps/'
> +                if self._ffprocess.testAgent.dirExists(remoteminidumpdir):
> +                    self._ffprocess.testAgent.getDirectory(remoteminidumpdir, minidumpdir)

for some reason I thoguht we created the minidumpdir even if there was no minidump.  This could be problematic, but if it is green for all remote talos, I am fine with this.

@@ +224,5 @@
> +            self._ffprocess.testAgent.pushFile("robotium.config", os.path.join(deviceRoot, "robotium.config"))
> +            self._ffprocess.testAgent.pushFile(browser_config['fennecIDs'], os.path.join(deviceRoot, "fennec_ids.txt"))
> +        except mozdevice.DMError:
> +            print "Remote Device Error: Error copying files for robocop setup"
> +            raise

ironically we hit this error a lot, so I expect the Sheriff to have a new orange bug with this message in it:)
Attachment #668622 - Flags: review?(jmaher) → review+
What is the status here?  We are 4+ days without being able to use our original process for talos development/testing/deployment.  If we cannot get this resolved by mid day tomorrow, we will need to back out the mozdevice changes and start landing our other changes on talos.
(In reply to Joel Maher (:jmaher) from comment #9)
> What is the status here?  We are 4+ days without being able to use our
> original process for talos development/testing/deployment.  If we cannot get
> this resolved by mid day tomorrow, we will need to back out the mozdevice
> changes and start landing our other changes on talos.

We are close, but it looks like there are some minor issues with running this on desktop which you pointed out. Until those are fixed, I propose a partial fix to bug 765365 to keep us going.
Have an updated patch which I am running through try: https://tbpl.mozilla.org/?tree=Try&rev=c62611067253
Okay, there were some minor problems even with that, but I finally got a patch working 100% ok with try and pushed it:

http://hg.mozilla.org/build/talos/rev/936ef281552d

Try run:

https://tbpl.mozilla.org/?tree=Try&rev=d4425bce8b09

(there are some reds but that was due to a seperate datazilla issue which has since been fixed)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 801633
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: