Closed
Bug 403102
Opened 16 years ago
Closed 16 years ago
talos needs to recognize browser crash
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: anodelman, Assigned: anodelman)
References
Details
Attachments
(5 files, 1 obsolete file)
1.25 KB,
patch
|
rcampbell
:
review+
|
Details | Diff | Splinter Review |
3.00 KB,
patch
|
rcampbell
:
review+
|
Details | Diff | Splinter Review |
2.17 KB,
patch
|
rcampbell
:
review+
|
Details | Diff | Splinter Review |
886 bytes,
patch
|
rcampbell
:
review+
|
Details | Diff | Splinter Review |
1.11 KB,
patch
|
rcampbell
:
review+
|
Details | Diff | Splinter Review |
Currently, if the browser crashes talos waits on a very long timeout to realize that it hasn't got test results and then exits. It should recognize browser bustage sooner and report a useful message. This might also end up meaning buildbot work to ensure that it does proper cleanup after browser failure.
Assignee | ||
Comment 1•16 years ago
|
||
With this patch, in 60 second intervals during the running of a test, we check to see if the crashreporter process is present and if the browser is still alive. If the crashreporter exists or the browser is gone we stop the test and close talos cleanly. I tested this on windows and linux and it worked great. It would probably be good to give it a quick check on the mac to ensure that the naming is consistent cross-platform. Still, way better than just waiting for things to time out.
Updated•16 years ago
|
Attachment #288059 -
Flags: review?(rcampbell) → review+
Assignee | ||
Comment 2•16 years ago
|
||
Checking in ttest.py; /cvsroot/mozilla/testing/performance/talos/ttest.py,v <-- ttest.py new revision: 1.7; previous revision: 1.6 done
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 3•16 years ago
|
||
Figured that this is a reasonable addition to this bug - since it would be useful to know when the browser crashes, ie, how long after the test starts. This has the fringe benefit of giving us a better idea of how long each test takes to run, instead of using estimates. I've also added the title as part of the output so that we can now see in the buildbot waterfall which slave is generating the report.
Attachment #288568 -
Flags: review?(rcampbell)
Assignee | ||
Updated•16 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 4•16 years ago
|
||
Comment on attachment 288568 [details] [diff] [review] add logging of start/stop times for talos tests I'd like to see the format string in a variable rather than copied around. Also, while I'm nitting, I'd only use the time formatting function for the barest of time conversions and do the string formatting outside of that function.
Attachment #288568 -
Flags: review?(rcampbell) → review-
Assignee | ||
Comment 5•16 years ago
|
||
Addressed the nits in the previous patch - this is quite a bit cleaner than my earlier slapdash approach. The review process works!
Attachment #288568 -
Attachment is obsolete: true
Attachment #288768 -
Flags: review?(rcampbell)
Assignee | ||
Comment 6•16 years ago
|
||
Since applying the original patch we haven't had any results reported from either mac trunk or branch - from what I can see it is always assuming that the browser has crashed once it has closed. I think that it is hitting a bad time slice wherein the browser has completed the test and closed but we haven't collected the test results yet. This patch attempts to fix that problem. If this patch doesn't work we should consider backing out the original patch until I can figure out why it doesn't work on mac.
Attachment #288779 -
Flags: review?(rcampbell)
Comment 7•16 years ago
|
||
Comment on attachment 288768 [details] [diff] [review] add logging of start/stop times for talos tests / remove nits lovely :)
Attachment #288768 -
Flags: review?(rcampbell) → review+
Updated•16 years ago
|
Attachment #288779 -
Flags: review?(rcampbell) → review+
Assignee | ||
Comment 8•16 years ago
|
||
Checking in run_tests.py; /cvsroot/mozilla/testing/performance/talos/run_tests.py,v <-- run_tests.py new revision: 1.19; previous revision: 1.18 done Checking in ttest.py; /cvsroot/mozilla/testing/performance/talos/ttest.py,v <-- ttest.py new revision: 1.8; previous revision: 1.7 done Checking in utils.py; /cvsroot/mozilla/testing/performance/talos/utils.py,v <-- utils.py new revision: 1.4; previous revision: 1.3 done
Assignee | ||
Comment 9•16 years ago
|
||
Might as well keep around the previous bustage fix patch since it does fix a problem - just not the one that is causing the mac failures. Turns out that there is a crashreporterd daemon that is confusing talos into thinking that firefox has crashed. By restricting ps to only processes owned by a terminal we exclude this daemon and things run correctly. My tests worked fine, but these were run on the command line by me. My hope is that running through buildbot still counts as being part of a terminal and the fix will still work.
Attachment #288799 -
Flags: review?(rcampbell)
Comment 10•16 years ago
|
||
Comment on attachment 288799 [details] [diff] [review] bustage fix for mac do you even need the -a switch? everything should be running under the buildbot user, I think.
Attachment #288799 -
Flags: review?(rcampbell) → review+
Assignee | ||
Comment 11•16 years ago
|
||
Checking in ffprocess_mac.py; /cvsroot/mozilla/testing/performance/talos/ffprocess_mac.py,v <-- ffprocess_mac.py new revision: 1.4; previous revision: 1.3 done
Assignee | ||
Comment 12•16 years ago
|
||
So, I don't like this fix. Turns out that python has non-threadsafe popen behavior (http://www.gossamer-threads.com/lists/python/bugs/593800) and we are hitting this due to interaction between buildbot and talos (or so we think). I don't know why this is only showing up on the mac machines, but it is so I'm only putting this in the mac code. If anybody has any idea how to protect this code without having to mess with python module internals that would be preferable.
Attachment #288911 -
Flags: review?(rcampbell)
Comment 13•16 years ago
|
||
Comment on attachment 288911 [details] [diff] [review] possible fix for non-threadsafe popen behavior we'll try it. At least we're waiting on each of the popen objects so they should be allowed to terminate. Not sure what'll happen if we try to kill one.
Attachment #288911 -
Flags: review?(rcampbell) → review+
Assignee | ||
Comment 14•16 years ago
|
||
I'm going to mark this fixed since the mac boxes started behaving again after the hacky fix was applied. We can reopen when another option to the hacky fix presents itself.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Comment 15•15 years ago
|
||
Mass move of Core:Testing bugs to mozilla.org:Release Engineering:Talos. Filter on RelEngTalosMassMove to ignore.
Component: Testing → Release Engineering: Talos
Product: Core → mozilla.org
QA Contact: testing → release
Version: unspecified → other
Updated•10 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•