Closed Bug 488298 Opened 16 years ago Closed 14 years ago

Talos should indicate more clearly when it forcibly kills the browser

Categories

(Testing :: Talos, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Assigned: anodelman)

References

Details

(Whiteboard: [talos])

Attachments

(1 file, 2 obsolete files)

There's some confusion in bug 486580 about some Talos runs where the browser appears to have crashed, but there's no stack trace. Specifically, we see output like this: ../Minefield.app/Contents/MacOS/run-mozilla.sh: line 399: 958 Terminated "$prog" ${1+"$@"} Failed tp: Stopped Thu, 02 Apr 2009 05:34:54 FAIL: Busted: tp FAIL: browser crash I *think* this means that Talos believed the browser was hung and forcibly killed it, but it's not very clear. It would be nice if Talos explicitly noted that it killed the browser in that situation. Replacing "FAIL: browser crash" with "FAIL: browser was hung, killed" or something like that would probably clear up some confusion here. If we do hit that situation a lot, we could then look at fixing bug 483968.
Assignee: nobody → anodelman
Assignee: anodelman → nobody
Component: Release Engineering: Talos → Release Engineering
Assignee: nobody → anodelman
Assignee: anodelman → nobody
Assignee: nobody → anodelman
Moving to future.
Assignee: anodelman → nobody
Component: Release Engineering → Release Engineering: Future
Mass move of bugs from Release Engineering:Future -> Release Engineering. See http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Whiteboard: [talos]
From comment#0, I *think* this is about changing how the talos suite detects a hung browser and reports killing it. afaict, this is not about how RelEng invokes the overall talos suite. ] Moving this bug to Testing:General for suite development, but please reassign this back if I misunderstood.
Component: Release Engineering → General
Product: mozilla.org → Testing
QA Contact: release → general
Version: other → unspecified
This is about how the Talos code itself works. Do those bugs no longer belong in the Releng component?
Moving to Talos component.
Component: General → Talos
This looks like a minor fix, assigning to myself.
Assignee: nobody → anodelman
Any chance of fixing this one soon, Alice?
This totally fell off my radar. Let me take another look.
Did initial testing on fedora only, works fine. Indicates which processes were killed and with what signal, then prints stack. This should clear up the confusion. Actual check-in/deployment will be blocked on a more complete test run.
Attachment #544618 - Flags: review?(jmaher)
Comment on attachment 544618 [details] [diff] [review] indicate when talos kills processes (take 1) Review of attachment 544618 [details] [diff] [review]: ----------------------------------------------------------------- overall this is good as it cleans up some stuff and simplifies the flow. I would like to clean up some of the code related to the remote process before r+. ::: bcontroller.py @@ +154,5 @@ > self.bwaiter = BrowserWaiter(self.command, self.log, self.timeout, self.mod, self.ffprocess) > noise = 0 > prev_size = 0 > while not self.bwaiter.hasTime(): > if noise > self.timeout: # check for frozen browser so if we remove ffprocess.cleanupProcesses from bcontroller, we only use ffprocess for the remote stuff. Can we clean up the code to pass in host to BrowserWaiter (instead of self.ffprocess) and handle the setup and managing of the remote device inside there? ::: ffprocess.py @@ -104,5 @@ > - > - # Timed out. > - self.TerminateAllProcesses(browser_wait, process_name) > - return (None, True) > - why is this removed, I assumed it was because we don't use it anymore? ::: ffprocess_remote.py @@ +132,5 @@ > try: > self.testAgent.killProcess(process_name) > + if result: > + result = result + ', ' > + result = result + process_name + ': terminated by testAgent.killProcess' we never set result here. killProcess returns None|String, so if we return a value it might be similar to this "Successfully killed 12906 org.mozilla.fennec".
Attachment #544618 - Flags: review?(jmaher) → review-
::: ffprocess.py @@ -104,5 @@ > - > - # Timed out. > - self.TerminateAllProcesses(browser_wait, process_name) > - return (None, True) > - why is this removed, I assumed it was because we don't use it anymore? This was removed as the cleanup is now handled in ttest.py - I wanted to better control who was killing the browser so that I could bubble up the correct message to the user. If the browser is terminated here, then I won't know in ttest.py if the browser crashed or froze. ::: ffprocess_remote.py @@ +132,5 @@ > try: > self.testAgent.killProcess(process_name) > + if result: > + result = result + ', ' > + result = result + process_name + ': terminated by testAgent.killProcess' we never set result here. killProcess returns None|String, so if we return a value it might be similar to this "Successfully killed 12906 org.mozilla.fennec". The way I set this up is to generate a simple generic "we killed something" - since there is no return value from killProcess. I just assume that killProcess was successful; and, even if it wasn't, having the message that we attempted to kill a process can also be useful. This code chunk also brings the behavior in line with the rest of the TerminateAllProcesses procedures.
this makes sense. What about cleaning up bcontroller.py now that we have only one dependency on ffprocess (with this patch)?
This patch includes cleanup of bcontroller. Only remaining reference to ffprocess is for the remote device case. Also removed a bunch of extraneous imports.
Attachment #544595 - Attachment is obsolete: true
Attachment #544618 - Attachment is obsolete: true
Attachment #545463 - Flags: review?(jmaher)
Comment on attachment 545463 [details] [diff] [review] [checked in]indicate when talos kills processes (take 2) Review of attachment 545463 [details] [diff] [review]: ----------------------------------------------------------------- looks good, is this tested in staging and on android?
Attachment #545463 - Flags: review?(jmaher) → review+
This is just waiting on availability of the ateam staging environment.
My staging run is going green across the board. Just waiting for confirmation that it is okay on mobile and then it is good to land.
jmaher says green on mobile. This is ready for check in and deployment.
Comment on attachment 545463 [details] [diff] [review] [checked in]indicate when talos kills processes (take 2) changeset: 241:8ae066ed37c8
Attachment #545463 - Attachment description: indicate when talos kills processes (take 2) → [checked in]indicate when talos kills processes (take 2)
Depends on: 672192
Thanks Alice! Can you also reply to bug 486580 comment 8 noting how forced kills can be detected?
This is now deployed. All done here.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: