Closed Bug 403102 Opened 17 years ago Closed 17 years ago

talos needs to recognize browser crash

Categories

(Release Engineering :: General, defect)

All
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: anodelman, Assigned: anodelman)

References

Details

Attachments

(5 files, 1 obsolete file)

Currently, if the browser crashes talos waits on a very long timeout to realize that it hasn't got test results and then exits.  It should recognize browser bustage sooner and report a useful message.

This might also end up meaning buildbot work to ensure that it does proper cleanup after browser failure.
With this patch, in 60 second intervals during the running of a test, we check to see if the crashreporter process is present and if the browser is still alive.  If the crashreporter exists or the browser is gone we stop the test and close talos cleanly.

I tested this on windows and linux and it worked great.  It would probably be good to give it a quick check on the mac to ensure that the naming is consistent cross-platform.  Still, way better than just waiting for things to time out.
Assignee: nobody → anodelman
Status: NEW → ASSIGNED
Attachment #288059 - Flags: review?(rcampbell)
Attachment #288059 - Flags: review?(rcampbell) → review+
Checking in ttest.py;
/cvsroot/mozilla/testing/performance/talos/ttest.py,v  <--  ttest.py
new revision: 1.7; previous revision: 1.6
done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Figured that this is a reasonable addition to this bug - since it would be useful to know when the browser crashes, ie, how long after the test starts.

This has the fringe benefit of giving us a better idea of how long each test takes to run, instead of using estimates.  I've also added the title as part of the output so that we can now see in the buildbot waterfall which slave is generating the report.
Attachment #288568 - Flags: review?(rcampbell)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 288568 [details] [diff] [review]
add logging of start/stop times for talos tests

I'd like to see the format string in a variable rather than copied around. Also, while I'm nitting, I'd only use the time formatting function for the barest of time conversions and do the string formatting outside of that function.
Attachment #288568 - Flags: review?(rcampbell) → review-
Addressed the nits in the previous patch - this is quite a bit cleaner than my earlier slapdash approach.

The review process works!
Attachment #288568 - Attachment is obsolete: true
Attachment #288768 - Flags: review?(rcampbell)
Since applying the original patch we haven't had any results reported from either mac trunk or branch - from what I can see it is always assuming that the browser has crashed once it has closed.  I think that it is hitting a bad time slice wherein the browser has completed the test and closed but we haven't collected the test results yet.  This patch attempts to fix that problem.

If this patch doesn't work we should consider backing out the original patch until I can figure out why it doesn't work on mac.
Attachment #288779 - Flags: review?(rcampbell)
Comment on attachment 288768 [details] [diff] [review]
add logging of start/stop times for talos tests / remove nits

lovely :)
Attachment #288768 - Flags: review?(rcampbell) → review+
Attachment #288779 - Flags: review?(rcampbell) → review+
Checking in run_tests.py;
/cvsroot/mozilla/testing/performance/talos/run_tests.py,v  <--  run_tests.py
new revision: 1.19; previous revision: 1.18
done
Checking in ttest.py;
/cvsroot/mozilla/testing/performance/talos/ttest.py,v  <--  ttest.py
new revision: 1.8; previous revision: 1.7
done
Checking in utils.py;
/cvsroot/mozilla/testing/performance/talos/utils.py,v  <--  utils.py
new revision: 1.4; previous revision: 1.3
done
Might as well keep around the previous bustage fix patch since it does fix a problem - just not the one that is causing the mac failures.  Turns out that there is a crashreporterd daemon that is confusing talos into thinking that firefox has crashed.  By restricting ps to only processes owned by a terminal we exclude this daemon and things run correctly.  My tests worked fine, but these were run on the command line by me.  My hope is that running through buildbot still counts as being part of a terminal and the fix will still work.
Attachment #288799 - Flags: review?(rcampbell)
Comment on attachment 288799 [details] [diff] [review]
bustage fix for mac

do you even need the -a switch? everything should be running under the buildbot user, I think.
Attachment #288799 - Flags: review?(rcampbell) → review+
Checking in ffprocess_mac.py;
/cvsroot/mozilla/testing/performance/talos/ffprocess_mac.py,v  <--  ffprocess_mac.py
new revision: 1.4; previous revision: 1.3
done
So, I don't like this fix.  Turns out that python has non-threadsafe popen behavior (http://www.gossamer-threads.com/lists/python/bugs/593800) and we are hitting this due to interaction between buildbot and talos (or so we think).  I don't know why this is only showing up on the mac machines, but it is so I'm only putting this in the mac code.

If anybody has any idea how to protect this code without having to mess with python module internals that would be preferable.
Attachment #288911 - Flags: review?(rcampbell)
Comment on attachment 288911 [details] [diff] [review]
possible fix for non-threadsafe popen behavior

we'll try it. At least we're waiting on each of the popen objects so they should be allowed to terminate. Not sure what'll happen if we try to kill one.
Attachment #288911 - Flags: review?(rcampbell) → review+
I'm going to mark this fixed since the mac boxes started behaving again after the hacky fix was applied.  We can reopen when another option to the hacky fix presents itself.
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
Mass move of Core:Testing bugs to mozilla.org:Release Engineering:Talos. Filter on RelEngTalosMassMove to ignore.
Component: Testing → Release Engineering: Talos
Product: Core → mozilla.org
QA Contact: testing → release
Version: unspecified → other
Component: Release Engineering: Talos → Release Engineering
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: