Expanding BuildAPI to provide status on job completion and test results

Status: RESOLVED INCOMPLETE
Product: Release Engineering
Component: General
Opened: 5 years ago
Last modified: 10 months ago

People: (Reporter: zeller, Assigned: zeller)

Firefox Tracking Flags: (Not tracked)

Attachments: (5 attachments, 6 obsolete attachments)

(Assignee)

Description

5 years ago
This bug serves as a place to discuss the expansion of BuildAPI to include the new endpoint at /self-serve/{branch}/rev/{revision}/is_done. There is also an optional stableDelay parameter to adjust the required gap between the last endtime of a build/test and now (i.e. at least 180 seconds must have passed since the last endtime for the job to be declared finished; this prevents false positives in the case of retriggers and the like).

This REST URL provides a JSON response including the variables stableDelay (defaults to 180), job_complete and job_passed. stableDelay is an int, and job_complete/job_passed are both booleans.
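
For example, a client might query the endpoint like this (a sketch only: the host and revision below are placeholders, and the response keys are the ones listed above):

import json
import urllib2

# Placeholder host and revision; the URL shape comes from the description.
url = ("https://buildapi.example.mozilla.org/self-serve/"
       "mozilla-central/rev/abcdef123456/is_done?stableDelay=180")
result = json.load(urllib2.urlopen(url))
print(result["stableDelay"])   # int, defaults to 180
print(result["job_complete"])  # boolean
print(result["job_passed"])    # boolean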
(Assignee)

Comment 1

5 years ago
Created attachment 784162 [details] [diff] [review]
bug900318.patch

Here is the patch diff of all the necessary changes. I cherry-picked this diff by removing the hunks that modified the .hgignore and setup.cfg files. Not that it's important, but I figured I would mention it in case you noticed the difference when looking at my personal hg buildapi repo.
Attachment #784162 - Flags: feedback?(rail)
Attachment #784162 - Flags: feedback?(bhearsum)
Comment on attachment 784162 [details] [diff] [review]
bug900318.patch

Review of attachment 784162 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm with some minor readability changes:

::: buildapi/controllers/selfserve.py
@@ +222,5 @@
> +
> +        if branch not in self._branches_cache:
> +            return self._failed("Branch %s not found" % branch, 404)
> +
> +        retval = g.buildapi_cache.get_builds_for_revision(branch, revision)

Can you use job_items instead of retval here?

::: buildapi/lib/helpers.py
@@ +182,5 @@
> +    for index, item in enumerate(job_items):
> +        if (item['endtime'] == None) or ((now - item['endtime']) < stableDelay):
> +            break # Job is not done
> +        if index == (length - 1):
> +            job_info['job_complete'] = True

I'd use something like

unfinished_jobs = [j for j in job_items if not j['endtime'] or now - j['endtime'] < stableDelay]
if not unfinished_jobs:
    job_info['job_complete'] = True
    job_info['job_passed'] = all([j for j in job_items if j['status'] == 0])
Attachment #784162 - Flags: feedback?(rail) → feedback+
>    job_info['job_passed'] = all([j for j in job_items if j['status'] == 0])

failed_jobs = [j for j in job_items if j['status'] != 0]
job_info['job_passed'] = len(failed_jobs) == 0
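
Putting those two suggestions together, a self-contained sketch of the helper under discussion (names mirror the patch; the sample data at the bottom is invented):

import time

def get_completeness(job_items, stableDelay=180):
    now = int(time.time())
    job_info = {'stableDelay': stableDelay,
                'job_complete': False,
                'job_passed': False}
    # A job is unfinished if it never ended or ended too recently.
    unfinished_jobs = [j for j in job_items
                       if j['endtime'] is None
                       or now - j['endtime'] < stableDelay]
    if not unfinished_jobs:
        job_info['job_complete'] = True
        failed_jobs = [j for j in job_items if j['status'] != 0]
        job_info['job_passed'] = len(failed_jobs) == 0
    return job_info

# Invented sample: two successful jobs that ended well past the delay.
jobs = [{'endtime': int(time.time()) - 600, 'status': 0},
        {'endtime': int(time.time()) - 400, 'status': 0}]
print(get_completeness(jobs))  # job_complete: True, job_passed: True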
(In reply to Rail Aliiev [:rail] from comment #3)
> >    job_info['job_passed'] = all([j for j in job_items if j['status'] == 0])
> 
> failed_jobs = [j for j in job_items if j['status'] != 0]
> job_info['job_passed'] = len(failed_jobs) == 0

Without reading all the patch (so admitting some naivety) if status = 5 is "RETRY", do we want to wait for retried jobs instead of marking them as failed and the overall push as done?
(Assignee)

Comment 5

5 years ago
(In reply to Justin Wood (:Callek) from comment #4)
> (In reply to Rail Aliiev [:rail] from comment #3)
> > >    job_info['job_passed'] = all([j for j in job_items if j['status'] == 0])
> > 
> > failed_jobs = [j for j in job_items if j['status'] != 0]
> > job_info['job_passed'] = len(failed_jobs) == 0
> 
> Without reading all the patch (so admitting some naivety) if status = 5 is
> "RETRY", do we want to wait for retried jobs instead of marking them as
> failed and the overall push as done?

We decided that retries were out of scope for this phase since a retry is typically indicative of manual intervention. Therefore, handling it would be somewhat unnecessary at this point.
(In reply to Justin Wood (:Callek) from comment #4)
> Without reading all the patch (so admitting some naivety) if status = 5 is
> "RETRY"

Is '5' an automated retry (as distinct from a manually initiated rebuild)?

(In reply to John Zeller [:zeller] from comment #5)
> We decided that retries were out of scope for this phase since a retry is
> typically indicative of manual intervention.

Correct for manually initiated retries. However, if this status is an automated retry, then I think we _do_ want to treat those as non-failures. The stable delay timeout should have provided time for the automated retry to be scheduled, so we'd only be in this part of the code if that retry had _also_ completed. We would want to consider that status.
Flags: needinfo?(bugspam.Callek)
(In reply to Hal Wine [:hwine] from comment #6)
> (In reply to Justin Wood (:Callek) from comment #4)
> > Without reading all the patch (so admitting some naivety) if status = 5 is
> > "RETRY"
> 
> Is '5' an automated retry (as distinct from a manually initiated rebuild)?

Correct. Pressing the rebuild button just schedules a new build; the buildbot status '5' is buildbot saying, "your configs say this failed, but in a way that is likely to recover if we build again, let's do so!"

To put this on a brief state-of-procedure for us:

* Person X pushes a job
* Job completes sendchanges for tests.
* Test Y fails cloning tools. We catch this failure as 'slave specific' and set the buildbot status code to 5 (RETRY)
* Buildbot sees that status and queues another test run of Y
* Test Y then gets handed to a new slave, which succeeds

and variations on the above. We can also have retries that occur further along in a job run than right-at-the-start, or even on the compile parts.

> (In reply to John Zeller [:zeller] from comment #5)
> > We decided that retries were out of scope for this phase since a retry is
> > typically indicative of manual intervention.
> 
> correct for manually initiated retries. However, if this status is an
> automated retry, then I think we _do_ want to treat those as non-failures.
> The stable delay timeout should have provided time for the automated retry
> to be scheduled, so we'd only be in this part of the code if that retry had
> _also_ completed. We would want to consider that status.

What I don't want to conflate with my question here is tradeoffs on caring about this RETRY state on the first pass, and my thoughts on what we *should* care about. I'm not invested in or deeply caring about this code at this time, nor have I read a single line of what you're doing here yet. So I'll leave tradeoff discussions to you guys.
Flags: needinfo?(bugspam.Callek)
(Assignee)

Comment 8

5 years ago
(In reply to Justin Wood (:Callek) from comment #7)
> Correct. Pressing the rebuild button just schedules a new build; the
> buildbot status '5' is buildbot saying, "your configs say this failed,
> but in a way that is likely to recover if we build again, let's do so!"
> [...]

So then when buildbot triggers a rebuild, does the previous build stay intact? What I mean is, once the rebuild finishes and succeeds, are there now 2 builds present on that job that are identical in revision and buildername, but one has status 0 and one has status 5?
(In reply to John Zeller [:zeller] from comment #8)
> So then when buildbot triggers a rebuild, does the previous build stay
> intact? What I mean is, once the rebuild finishes and succeeds, are there
> now 2 builds present on that job that are identical in revision and
> buildername, but one has status 0 and one has status 5?

One definitely has status 5 (it's the one that would have finished earlier than the next one started), and the other one will have status "whatever the result was". It's not unheard of to have multiple status 5's in a row as well (if it's a transient network issue we detect, for example).

Just to be clear, the second job's status is not guaranteed to be 0, but it _will_ be a different job, with an additional result.
(Assignee)

Comment 10

5 years ago
(In reply to Justin Wood (:Callek) from comment #9)
> (In reply to John Zeller [:zeller] from comment #8)
> > So then when buildbot triggers a rebuild, does the previous build stay
> > intact? What I mean is, once the rebuild finishes and succeeds, are there
> > now 2 builds present on that job that are identical in revision and
> > buildername, but one has status 0 and one has status 5?
> 
> One definitely has status 5 (it's the one that would have finished earlier
> than the next one started), and the other one will have status "whatever the
> result was". It's not unheard of to have multiple status 5's in a row as
> well (if it's a transient network issue we detect, for example).
> 
> Just to be clear, the second job's status is not guaranteed to be 0, but
> it _will_ be a different job, with an additional result.

Okay, thanks for the clarification. In that case, I think it makes sense to treat status 5 as a pass, given that if it is status 5, then there will be a retry, and that retry will determine the true result (0, 1, or 2). If it's another status 5, then the same thing will happen.

Thanks for this nugget of info! That will definitely make this better code.

Out of curiosity, what *are* all the status codes?
(In reply to John Zeller [:zeller] from comment #10)
> Out of curiosity, what *are* all the status codes?

While not where buildbot defines them directly, since you're working in buildapi I'll show you this commented-hunk:

http://mxr.mozilla.org/build/source/buildapi/buildapi/public/build-status.css
(In reply to Justin Wood (:Callek) from comment #11)
> (In reply to John Zeller [:zeller] from comment #10)
> > Out of curiosity, what *are* all the status codes?
> 
> While not where buildbot defines them directly, since you're working in
> buildapi I'll show you this commented-hunk:
> 
> http://mxr.mozilla.org/build/source/buildapi/buildapi/public/build-status.css

And right after I posted that comment I found the one I was looking for:

http://mxr.mozilla.org/build/source/buildbot/master/buildbot/status/builder.py?force=1#25
(Assignee)

Comment 13

5 years ago
(In reply to Justin Wood (:Callek) from comment #12)
> (In reply to Justin Wood (:Callek) from comment #11)
> > (In reply to John Zeller [:zeller] from comment #10)
> > > Out of curiosity, what *are* all the status codes?
> > 
> > While not where buildbot defines them directly, since you're working in
> > buildapi I'll show you this commented-hunk:
> > 
> > http://mxr.mozilla.org/build/source/buildapi/buildapi/public/build-status.css
> 
> And right after I posted that comment I found the one I was looking for:
> 
> http://mxr.mozilla.org/build/source/buildbot/master/buildbot/status/builder.py?force=1#25

aha! I will make sure to take these into account. It was a mistake on my part to simply overlook these other status values. Thanks Callek! :)
(Assignee)

Comment 14

5 years ago
I am verifying that I handle all status codes correctly. The possible codes are:
0 - SUCCESS - Green
1 - WARNINGS - Orange
2 - FAILURE - Red, or simply not finished
3 - SKIPPED
4 - EXCEPTION
5 - RETRY
6 - CANCELLED

Currently I am treating 0 and 5 as SUCCESS and 1, 2, 3, 5 and 6 as FAILURE.
Should I be handling 3, 4 or 6 any differently? It is unclear to me, but I have a hunch that perhaps 3 will need to be a SUCCESS status as well in this instance.
(Assignee)

Comment 15

5 years ago
Created attachment 785210 [details] [diff] [review]
bug900318.patch

Rail, I added your suggestions for readability and shortening the length of the patch.

Callek, I added handling of status = 5.
Other things I added:
- I am also now handling a KeyError exception for when a job is still pending. In that case the dictionary is missing certain keys, including 'endtime', so a KeyError can be handled as job_complete = False, since a job that is either running or finished will always have the 'endtime' key inserted into the dictionary.
- I also adjusted the formatting to follow the pep8 guideline for maximum line length, which is 79 characters.

Things I still need to know:
- Based on comment 13 I am curious if indeed status 3 can be handled as a SUCCESS and statuses 4 and 6 can be handled as a FAILURE.
Attachment #784162 - Attachment is obsolete: true
Attachment #784162 - Flags: feedback?(bhearsum)
Attachment #785210 - Flags: feedback?(rail)
Attachment #785210 - Flags: feedback?(bhearsum)
Flags: needinfo?
(Assignee)

Comment 16

5 years ago
> Things I still need to know:
> - Based on comment 13 I am curious if indeed status 3 can be handled as a
> SUCCESS and status' 4 and 6 can be handled as a FAILURE.

DERP. I meant based on comment 14 :)
Comment on attachment 785210 [details] [diff] [review]
bug900318.patch

Review of attachment 785210 [details] [diff] [review]:
-----------------------------------------------------------------

Some comments below.

::: buildapi/controllers/selfserve.py
@@ +212,5 @@
>  
>          return self._ok(retval)
>  
> +    def revision_is_done(self, branch, revision):
> +        """Return a json dictionary with information about whether the job is 

pep8 complains about the trailing space :)

::: buildapi/lib/helpers.py
@@ +175,5 @@
>  
> +def get_completeness(job_items, stableDelay):
> +    now = int(time.time())
> +    job_info = {'stableDelay': stableDelay, 
> +                'job_complete': False, 

trailing spaces

@@ +179,5 @@
> +                'job_complete': False, 
> +                'job_passed': False}
> +
> +    try:
> +        unfinished_jobs = [job for job in job_items if (job['endtime'] == None) 

pep8 suggests using "job['endtime'] is None" instead
The parentheses in "(job['endtime'] == None)" and on the line below are redundant

@@ +182,5 @@
> +    try:
> +        unfinished_jobs = [job for job in job_items if (job['endtime'] == None) 
> +                            or (now - job['endtime'] < stableDelay)]
> +    except KeyError:
> +        unfinished_jobs = ['endtime not found in some job']

This looks like a hack...
I'd use job.get('endtime') instead. The right part of "or" won't be evaluated if job.get('endtime') returns None.

@@ +187,5 @@
> +
> +    if not unfinished_jobs:    # If list is empty
> +        job_info['job_complete'] = True
> +        failed_jobs = [job for job in job_items if (job['status'] != 0) and 
> +                        (job['status'] != 5)]

redundant parentheses
You can use "job['status'] not in (0, 5)" instead
Attachment #785210 - Flags: feedback?(rail) → feedback+
(Assignee)

Comment 18

5 years ago
Created attachment 785244 [details] [diff] [review]
bug900318.patch

Removed the trailing whitespace and improved the syntax in the ways you mentioned. Really, really good feedback :)

One thing I am curious about though.

 +    unfinished_jobs = [job for job in job_items if (job.get('endtime') is None)
 +                        or (now - job.get('endtime') < stableDelay)]

The parentheses on these lines are certainly redundant, but I think they improve readability a bit, given that they break the statement up into quickly discernible segments. Is this the sort of situation where style varies from person to person, or is this generally not a good redundancy to have (maybe pep8 even says something about it... I have not seen anything, though)?
Attachment #785210 - Attachment is obsolete: true
Attachment #785210 - Flags: feedback?(bhearsum)
Attachment #785244 - Flags: feedback?(rail)
Comment on attachment 785244 [details] [diff] [review]
bug900318.patch

LGTM. Thanks!
Attachment #785244 - Flags: feedback?(rail) → feedback+
(Assignee)

Comment 20

5 years ago
Thanks Rail!

I am also going to take this time to reiterate my question from comment 14. Callek, what're your thoughts?
Flags: needinfo?(catlee)
Flags: needinfo?(bugspam.Callek)
Flags: needinfo?
(Assignee)

Comment 21

5 years ago
Okay, there was a typo in comment 14, so disregard it. I will reword the question here.

I am verifying that I handle all status codes correctly. The possible codes are:
0 - SUCCESS - Green
1 - WARNINGS - Orange
2 - FAILURE - Red, or simply not finished
3 - SKIPPED
4 - EXCEPTION
5 - RETRY
6 - CANCELLED

Currently I am treating...
0 and 5 as SUCCESS
1, 2, 3, 4 and 6 as FAILURE

I am not 100% confident in my handling of 3, 4 or 6.
I am thinking I should handle...
3 as SUCCESS
4 and 6 as FAILURE

Any objections or info I should know about the nature of these status codes?
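
Spelled out as code, the rule I am proposing looks like this (the constant ordering mirrors the buildbot list above; the function is illustrative, not the patch):

# Buildbot result codes, in the order listed above.
SUCCESS, WARNINGS, FAILURE, SKIPPED, EXCEPTION, RETRY, CANCELLED = range(7)

def push_passed(job_items):
    # SUCCESS passes outright; RETRY passes because the retried job
    # carries the real result. Everything else counts as a failure.
    return all(job['status'] in (SUCCESS, RETRY) for job in job_items)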
(Assignee)

Comment 22

5 years ago
Comment on attachment 785244 [details] [diff] [review]
bug900318.patch

Review of attachment 785244 [details] [diff] [review]:
-----------------------------------------------------------------

Okay I have tested this patch, and I think it is all ready to go!
Attachment #785244 - Flags: review?(rail)
(Assignee)

Comment 23

5 years ago
Created attachment 786081 [details] [diff] [review]
bug900318-unittests.patch

Here are the unit tests I wrote up for this patch. I think they are fairly complete, but please tell me if there is more that you think should be added.

Note: I did not place this test in the tests/ directory because of the __init__.py file located there. It has some issues that I believe stem from Pylons itself, so running 'nosetests test_revision_is_done.py' with the file in the tests/ directory produces an error. With the file placed here, nosetests runs without issue. Also, because of this Pylons issue, I did not write the test using the Pylons framework; instead I launched the server on localhost:5000, and the unit tests use urllib2 to make URL requests, which the test then validates.

Need Info: Also, the current patch is treating status codes 0 and 5 as SUCCESS and 1, 2, 3, 4 and 6 as FAILURE. If this is fine, then we are good. But if not, then please provide a comment on the need-info in comment 21.
Attachment #786081 - Flags: feedback?(rail)
(Assignee)

Comment 24

5 years ago
Comment on attachment 785244 [details] [diff] [review]
bug900318.patch

Review of attachment 785244 [details] [diff] [review]:
-----------------------------------------------------------------

Rerouting review from rail to nthomas :)
Attachment #785244 - Flags: review?(rail) → review?(nthomas)
(Assignee)

Comment 25

5 years ago
Comment on attachment 786081 [details] [diff] [review]
bug900318-unittests.patch

Review of attachment 786081 [details] [diff] [review]:
-----------------------------------------------------------------

Rerouting feedback from rail to nthomas :)
Attachment #786081 - Flags: feedback?(rail) → feedback?(nthomas)
Comment on attachment 785244 [details] [diff] [review]
bug900318.patch

Review of attachment 785244 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildapi/lib/helpers.py
@@ +184,5 @@
> +
> +    if not unfinished_jobs:    # If list is empty
> +        job_info['job_complete'] = True
> +        failed_jobs = [job for job in job_items if job['status'] not in (0,5)]
> +        job_info['job_passed'] = len(failed_jobs) == 0

I'm not sure what sort of decisions autoland needs to make based on the result of this code, but assuming it's going to land if job_complete and job_passed are both True, let's run through the job states:

0 - SUCCESS - ideally all jobs have this (yeah, right)
1 - WARNINGS - typically test failures; maybe ok, maybe not
2 - FAILURE - compile failures or tests which completely failed to start
3 - SKIPPED - not used
4 - EXCEPTION - some sort of infra failure, no result known
5 - RETRY - some sort of infra failure we redo the job for, result available later
6 - CANCELLED - by a user

I guess RETRY is 'not a failure, yet, possibly success later', so treating it as success is OK, but a comment would be helpful. FAILURE and EXCEPTION should definitely block autoland, as should CANCELLED so people can't game the system.

WARNINGS is hard given intermittent test failures, as we don't have a way to distinguish test failures caused by the patch from existing issues. The latter means it's pretty much impossible to get an all-green run on try. Is there a plan for this?

So what you have looks good, other than WARNINGS. I'll hold the r+/- pending a reply on that (might be a question for rail).
Comment on attachment 786081 [details] [diff] [review]
bug900318-unittests.patch

Review of attachment 786081 [details] [diff] [review]:
-----------------------------------------------------------------

Bravo for writing tests! I don't know what the problem is with Pylons, but catlee may have some insight when he gets back. It would be ideal if we could avoid manually running a server with a particular port config.

The other thing that would be good to test is get_completeness() for various jobs & scenarios. E.g. if we consider a single build which spawns a bunch of tests, then the following are all possible:
* pushed to repo but nothing scheduled yet (no job_items)
* compile job pending
* compile job running
* compile failed, no tests will be scheduled
* compile success, tests not scheduled yet (checking treeStableTimer)
* compile success, tests pending
* compile success, tests running
* compile success, some tests complete but others not
* compile and test jobs done (for various end states)
There's probably a bunch of overlap there; I included it all for completeness. You would probably need to create some sample data and mock out time.time() to do this.

::: buildapi/test_revision_is_done.py
@@ +17,5 @@
> +# 3) FAIL when valid branch, but invalid revision name
> +# 4) PASS when valid branch and valid revision name
> +# 5) FAIL when not an int is passed to stableDelay
> +# 6) PASS when valid stableDelay
> +# 7) 

Nit, you can remove these empty lines. Watch out for trailing whitespace in general.

@@ +36,5 @@
> +# 2) Random revision numbers
> +# 3) Random branch names
> +# 4) Any branch other than mozilla-central (no sense in testing for a list 
> +#        of branches that is going to change... if it works for 1, you can assume it'll work for all)
> +# 5) Negative number passed for stableDelay

It would be good to reject negative stableDelay in the other patch, and add a test here. You can do validator.Int(min=0), see http://www.formencode.org/en/latest/modules/validators.html#module-formencode.validators

In practice any stableDelay less than 60 is asking for trouble, as that's how often the schedulers can take to schedule tests in response to a build.
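
A standalone sketch of that validation (FormEncode's Int validator raises Invalid on out-of-range input; wiring this into the Pylons controller is not shown):

from formencode import Invalid, validators

def parse_stable_delay(raw_value):
    # Reject negatives and non-integers rather than passing them through.
    try:
        return validators.Int(min=0).to_python(raw_value)
    except Invalid:
        return None  # caller can turn this into a 400 response

print(parse_stable_delay("180"))  # 180
print(parse_stable_delay("-5"))   # None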
Attachment #786081 - Flags: feedback?(nthomas) → feedback+
Regarding statuses,

1) You shouldn't use numerical values, import the corresponding constants from buildapi.model.util

2) I think the current handling is OK - treat SUCCESS as success and skip RETRY. I agree with Nick that you should add some comments explaining why.
(Assignee)

Comment 29

5 years ago
(In reply to Nick Thomas [:nthomas] from comment #26)
> WARNINGS is hard given intermittent test failures, as we don't have a way
> to distinguish test failures caused by the patch from existing issues. The
> latter means it's pretty much impossible to get an all-green run on try.
> Is there a plan for this?
> [...]

We decided that special handling of WARNINGS is out of scope for this phase, given the difficulty of determining their true significance. Having said that, treating them as failures is also not such a bad thing, since SUCCESS is the more common scenario and is only becoming more common.

Other than that, I agree, it seems pretty good! :)
So r+, then? :D
(Assignee)

Comment 30

5 years ago
(In reply to Rail Aliiev [:rail] from comment #28)
> Regarding statuses,
> 
> 1) You shouldn't use numerical values, import the corresponding constants
> from buildapi.model.util

Why is using the numerical values a problem?
(In reply to John Zeller [:zeller] from comment #30)
> (In reply to Rail Aliiev [:rail] from comment #28)
> > Regarding statuses,
> > 
> > 1) You shouldn't use numerical values, import the corresponding constants
> > from buildapi.model.util
> 
> Why is using the numerical values a problem?

1) Numeric values are harder to search for
2) If the values ever change, you'd only need to change them in one file
3) Named constants are easier to read
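
Concretely (SUCCESS and RETRY are real names in buildapi.model.util, per the imports seen later in this bug):

from buildapi.model.util import SUCCESS, RETRY

def failed_jobs(job_items):
    # Versus the magic-number spelling "not in (0, 5)", the constants are
    # grep-able, defined in one place, and self-documenting.
    return [job for job in job_items
            if job['status'] not in (SUCCESS, RETRY)]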
(Assignee)

Comment 32

5 years ago
Created attachment 787803 [details] [diff] [review]
bug900318.patch

Okay, I added some comments and am now using the constants from buildapi.model.util instead of raw numerical values. This patch also resolves all outstanding issues, as we have decided that statuses 0 and 5 are successes and everything else is a failure.
Attachment #785244 - Attachment is obsolete: true
Attachment #785244 - Flags: review?(nthomas)
Attachment #787803 - Flags: review?(nthomas)
Flags: needinfo?(catlee)
Flags: needinfo?(bugspam.Callek)
(Assignee)

Comment 33

5 years ago
Created attachment 788276 [details] [diff] [review]
bug900318.patch

Oops, I found some stuff included in the previous file that should not have been. I corrected it, and now THIS is the file that is ready to go :)
Attachment #787803 - Attachment is obsolete: true
Attachment #787803 - Flags: review?(nthomas)
Attachment #788276 - Flags: review?(nthomas)
(Assignee)

Updated

5 years ago
Blocks: 903639
Comment on attachment 788276 [details] [diff] [review]
bug900318.patch

Review of attachment 788276 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with some changes on landing.

::: buildapi/lib/helpers.py
@@ +14,4 @@
>  
>  import buildapi.model.builds
>  from buildapi.lib import json
> +from buildapi.model.util import BUILDPOOL, TRYBUILDPOOL, TESTPOOL, results_to_str

I'm pretty sure Rail meant:
from buildapi.model.util import BUILDPOOL, TRYBUILDPOOL, TESTPOOL, SUCCESS, RETRY

@@ +183,5 @@
> +                        or (now - job.get('endtime') < stableDelay)]
> +
> +    if not unfinished_jobs:    # If list is empty
> +        job_info['job_complete'] = True
> +        # status SUCCESS and RETRY are both considered successes

Something like this would be good:
# RETRY isn't a failure because we're rerunning the job

@@ +184,5 @@
> +
> +    if not unfinished_jobs:    # If list is empty
> +        job_info['job_complete'] = True
> +        # status SUCCESS and RETRY are both considered successes
> +        failed_jobs = [job for job in job_items if results_to_str(job.get('status')) not in ('success','retry')]

And ...
failed_jobs = [job for job in job_items if job.get('status') not in (SUCCESS, RETRY)]

You might need to add in WARNINGS later, depending on how orange the tree typically is, but lets wait and see.
Attachment #788276 - Flags: review?(nthomas) → review+
(Assignee)

Comment 35

5 years ago
Created attachment 789151 [details] [diff] [review]
bug900318.patch

Made final changes, patch is ready for landing! :)
Attachment #788276 - Attachment is obsolete: true
Attachment #789151 - Flags: review?(nthomas)
Attachment #789151 - Flags: checked-in?
(Assignee)

Updated

5 years ago
Assignee: nobody → jozeller
Product: mozilla.org → Release Engineering
(Assignee)

Comment 37

5 years ago
Created attachment 791396 [details] [diff] [review]
bug900318-unittests.patch

Switched the unit tests to use the Python unittest module rather than nosetests. That way they can be stored in the tests/ directory properly.

I am also working on adding some of the things that nthomas mentioned in comment 27. However, I am requesting a check-in, given that these tests serve as documentation now; updates can be added later once I get the other tests worked out.
Attachment #786081 - Attachment is obsolete: true
Attachment #791396 - Flags: review?(nthomas)
Attachment #791396 - Flags: checked-in?
(Assignee)

Comment 38

5 years ago
Comment on attachment 791396 [details] [diff] [review]
bug900318-unittests.patch

Review of attachment 791396 [details] [diff] [review]:
-----------------------------------------------------------------

Adding catlee as it may be better answered by him :)
Attachment #791396 - Flags: review?(catlee)
(Assignee)

Comment 39

5 years ago
Created attachment 794984 [details] [diff] [review]
bug900318-2.patch

When rewriting the unit tests I found a bug in get_completeness: it didn't handle the TypeError raised when job_items was None, and it incorrectly qualified job_items = [] as job complete and job passed. This patch fixes those bugs.
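
A minimal sketch of the kind of guard described (the fixed patch itself isn't quoted here, so this is illustrative):

def get_completeness(job_items, stableDelay=180):
    # Guard: job_items of None (lookup failed) or [] (nothing scheduled yet)
    # previously fell through and was reported as complete and passed;
    # treat both as "not done yet" instead.
    if not job_items:
        return {'stableDelay': stableDelay,
                'job_complete': False,
                'job_passed': False}
    # ... the rest of the helper continues as before ...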
Attachment #794984 - Flags: review?(nthomas)
(Assignee)

Comment 40

5 years ago
Created attachment 794986 [details] [diff] [review]
bug900318-unittests.patch

I rewrote the entire unit test file for this bug. This patch now uses the Python unittest module to run 13 tests on the get_completeness function found in helpers.py. All 13 tests pass, with the following output:

(buildapi-test)localhost:tests jzeller$ python test_get_completeness.py 
.............
----------------------------------------------------------------------
Ran 13 tests in 0.008s

OK
Attachment #794986 - Flags: review?(catlee)
(Assignee)

Updated

5 years ago
Attachment #791396 - Flags: review?(nthomas)
Attachment #791396 - Flags: review?(catlee)
(Assignee)

Updated

5 years ago
Attachment #791396 - Flags: checked-in?
Attachment #794984 - Flags: review?(nthomas) → review+
(Assignee)

Updated

5 years ago
Attachment #794984 - Flags: checked-in?
Comment on attachment 794986 [details] [diff] [review]
bug900318-unittests.patch

Review of attachment 794986 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! One question about test timing, but otherwise good to land.

::: buildapi/tests/test_get_completeness.py
@@ +128,5 @@
> +            result = get_completeness(job_items, self.valid_stableDelay)
> +            if x < self.valid_stableDelay:
> +                assert result.get('job_complete') == False and result.get('job_passed') == False
> +            if x >= self.valid_stableDelay:
> +                assert result.get('job_complete') == True and result.get('job_passed') == True

How sensitive is this to timing issues when running the test? Does get_completeness depend on the results of time.time() in any way?  If so, this test may fail if run on a slow/overloaded machine. Have you considered mocking out time.time in this case?
Attachment #794986 - Flags: review?(catlee) → review+
(Assignee)

Comment 43

5 years ago
Created attachment 804771 [details] [diff] [review]
bug900318-unittests-2.patch

This patch changes a few things from the previous patch.

 - Brings everything up to pep8
 - Within the test function test_job_success_stableDelay
   - Iterates over boundary cases instead of range(1001) now
   - Mocked time.time so that the unittests and helpers.get_completeness see the exact same time.time(); this ensures that a timing delay cannot cause a spurious failure (see the sketch below)
   - Rounds time.time() down to make sure there are not inconsistent errors in stableDelay check
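
For reference, a minimal sketch of that mocking approach (patching 'buildapi.lib.helpers.time' assumes helpers.py does a plain "import time"):

import time
from mock import patch

from buildapi.lib.helpers import get_completeness

fixed_now = int(time.time())
# Pin the clock seen by helpers.py so the stableDelay comparison is exact.
with patch('buildapi.lib.helpers.time') as mock_time:
    mock_time.time.return_value = fixed_now
    jobs = [{'endtime': fixed_now - 200, 'status': 0}]  # invented sample
    result = get_completeness(jobs, 180)
    assert result['job_complete'] and result['job_passed']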
Attachment #804771 - Flags: review?(hwine)
Comment on attachment 804771 [details] [diff] [review]
bug900318-unittests-2.patch

Review of attachment 804771 [details] [diff] [review]:
-----------------------------------------------------------------

Just a couple of questions -- re-nominate me for review when addressed.

::: buildapi/tests/test_get_completeness.py
@@ +1,3 @@
>  from buildapi.lib.helpers import get_completeness
>  from buildapi.model.util import NO_RESULT, SUCCESS, WARNINGS, FAILURE, SKIPPED, EXCEPTION, RETRY
> +import time as t

I thought we got rid of time as t -- why is it coming back?

@@ +5,3 @@
>  import json
>  import unittest
> +from mock import patch, Mock

I don't see any new code using these new imports. Why are they here?

If they are here for cleaner import of the code under test, then that's worthy of a comment.
Attachment #804771 - Flags: review?(hwine)
Status: NEW → RESOLVED
Last Resolved: 11 months ago
Resolution: --- → INCOMPLETE
Component: Tools → General
Product: Release Engineering → Release Engineering