There are many instances where a specific slave runs into hardware or OS configuration issues and fails more frequently than its peers. This is easy to detect in hindsight because the slave's history shows an abnormal number of failures. It would be nice to have another column in the slaves table that shows the maximum acceptable failure percentage. The machine type can be determined from the machine name: for 't-w864-ix-093', strip the trailing -[0-9]+ to get 't-w864-ix'. There are many families of machines, and by looking at the total number of jobs and the total number of failures (excluding retries), we could determine a failure percentage for a given platform.
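The name-stripping step described above could look roughly like the following. This is a minimal sketch, not code from ouija; the function name `platform_of` is an assumption made for illustration.

```python
import re

def platform_of(slave_name):
    """Return the machine family by dropping a trailing '-<digits>' suffix.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Anchor the pattern at the end so only the final numeric suffix is removed.
    return re.sub(r"-[0-9]+$", "", slave_name)

print(platform_of("t-w864-ix-093"))
```

Anchoring the pattern with `$` keeps digits elsewhere in the name (such as the `864` in `w864`) intact.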
But the slaves table contains data about slaves, not platforms. How should the acceptable failure rate be displayed in this case? I see only one solution: calculate it for a given platform and then show it for every machine of that type. Is that correct? Where should this column go? After "Passes" and before "Total"?
A few questions:
1. How do I calculate the failure rate for a platform? PFR = all failures (on all slaves of a given platform) / all runs (failures + retries + passes on all slaves of the given platform)? Or all failures / (all runs - retries)?
2. What is infra? This column is not present at the moment; should I add it too?
3. You didn't mention how to mitigate point 2 in your reply (abnormally high failure rates on several slaves that skew the overall statistics used for the PFR calculation).
1) PFR = (sum of all failures on all slaves for a given platform) / (all runs on all slaves for the given platform). I am not sure whether we should exclude retries. A retry usually indicates a failure and helps point out problems, but if the problem is with a build/test/harness, then we will retry a lot and rack up failures on a lot of machines. For now let's exclude them; bonus points for a toggle ;)
2) Infra means infrastructure-related failures, specifically things like DNS failures, power outages, etc. These are rare, but they sometimes include hardware failures on the specific machine under test. They should be denoted as a different failure type.
3) I don't have a solution for keeping high failure rates on a few slaves from propping up the overall statistics. For now we can live with it, although I am open to suggestions.
Thanks for making sure you understand this bug and do the right thing. Looking forward to your patch.
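As a rough illustration of the PFR rule in item 1, with retries excluded, here is a sketch under stated assumptions. The input shape (a dict of per-slave counts with `fail`, `pass` and `retry` keys), the function names, the second platform name, and all numbers are made up for the example; none of this is taken from the ouija code or database.

```python
import re
from collections import defaultdict

def platform_of(slave_name):
    # Hypothetical helper: drop the trailing '-<digits>' to get the platform.
    return re.sub(r"-[0-9]+$", "", slave_name)

def platform_failure_rates(slave_counts):
    """Aggregate per-slave counts into one failure rate per platform.

    Retries are deliberately left out of both numerator and denominator.
    """
    failures = defaultdict(int)
    runs = defaultdict(int)
    for slave, counts in slave_counts.items():
        platform = platform_of(slave)
        failures[platform] += counts["fail"]
        runs[platform] += counts["fail"] + counts["pass"]
    # Skip platforms with no counted runs to avoid dividing by zero.
    return {p: failures[p] / runs[p] for p in runs if runs[p] > 0}

# Made-up counts for illustration only.
counts = {
    "t-w864-ix-093": {"fail": 30, "pass": 70, "retry": 5},
    "t-w864-ix-094": {"fail": 10, "pass": 90, "retry": 0},
    "example-platform-001": {"fail": 5, "pass": 95, "retry": 2},
}
rates = platform_failure_rates(counts)
print(rates)
```

In this made-up data the first slave fails three times as often as its sibling, and the platform rate lands between the two. That is the skew raised in item 3: one bad slave raises the threshold for its whole family.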
1) Should that be another checkbox to toggle it? 2) How can I recognize such failures? I looked at the values stored in testtype, result, and buildtype in the database and found nothing resembling infra failures. 3) Perhaps I can dig into statistics, but it has been a long time since I studied it at university :)
OK, I can find all the colors here: http://188.8.131.52/data/results?platform=android4.0
test failure: orange
infra: red
retry: blue
passing: green
If making a checkbox to include/exclude retries is doable, I vote for that. Let me know if that helps at all.
OK, I added infra results. Now the failure rate is calculated as (num of fails * 100) / (num of fails + num of infra + num of passes). The server side is ready to calculate the failure rate including retries as (num of fails * 100) / (num of fails + num of retries + num of infra + num of passes). I can submit a patch for that right now. I need a bit more time to add a checkbox for 'including retries' in the failure rate calculation. Could we move sorting into a separate issue?
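The two formulas above differ only in whether retries are added to the denominator, so they can be expressed as one function with a flag. This is a sketch, not the code from the patch; the function name, parameter names, and the zero-run behaviour are assumptions.

```python
def failure_rate(fails, passes, infra, retries=0, include_retries=False):
    """Return the failure rate as a percentage.

    Hypothetical signature. With include_retries=False the denominator is
    fails + infra + passes; with True, retries are added as well.
    """
    total = fails + infra + passes
    if include_retries:
        total += retries
    if total == 0:
        # Assumption: report 0.0 for a slave or platform with no counted runs.
        return 0.0
    return fails * 100 / total

print(failure_rate(10, 85, 5))
print(failure_rate(10, 85, 5, retries=25, include_retries=True))
```

Because retries only enlarge the denominator, including them can never raise the reported rate. A checkbox in the UI would then just pass the flag through to the server.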
Let's do the sortable tables in a different bug. I have filed bug 919960 to track that work.
Created attachment 811673 0001-show-failure-rates-switch-between-failure-rates.patch
Attachment #811673 - Flags: review?(dminor)
Comment on attachment 811673 0001-show-failure-rates-switch-between-failure-rates.patch
Review of attachment 811673:
-----------------------------------------------------------------
The changes look good, but unfortunately the patch you attached does not apply cleanly to the latest ouija changes from GitHub and needs to be rebased. In case you haven't done this before, merging from the GitHub ouija master to your local master and then running 'git rebase master' from your local branch is probably the easiest way to do this. Once the patch is updated, I'll be happy to take another look at it. Thanks!
Attachment #811673 - Flags: review?(dminor) → review-
Created attachment 812245 resolved merge conflict
Thanks, Dan! I resolved the merge conflict.
Comment on attachment 812245 resolved merge conflict
Review of attachment 812245:
-----------------------------------------------------------------
Great work, thanks!
Attachment #812245 - Flags: review?(dminor) → review+
Committed here: https://github.com/dminor/ouija/commit/b0889c6390f92eb53f8b2b8aeb1f175e54885be7 and in production.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED