Enable users to cancel Try Server builds

RESOLVED FIXED

Status

Release Engineering
General
P3
normal
RESOLVED FIXED
11 years ago
5 years ago

People

(Reporter: jwatt, Assigned: catlee)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [tryserver][db])

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

11 years ago
If I upload a patch and I see almost immediately that it doesn't work on Mac, I'd like to be able to cancel the other builds so I can try again without waiting hours for the Windows build to complete. This could really shorten the cycle between various attempts to get a patch working on other platforms.

As well as a manual button to cancel your build, it would be nice to have an option to have all platforms cancel their builds automatically if even one of the platforms fails, since usually you're going to want to retry a new patch. Might then be good to have the automatically canceled builds turn a color other than red (so you can see which build was the actual failing build).
Assignee: server-ops → nobody
Component: Server Operations → Try Server
Product: mozilla.org → Webtools
QA Contact: justin → try-server

Updated

11 years ago
Priority: -- → P3

Updated

10 years ago
Duplicate of this bug: 414084
Mass change of target milestone.
Target Milestone: --- → Future
Component: Try Server → Release Engineering
Product: Webtools → mozilla.org
QA Contact: try-server → release
Target Milestone: Future → ---

Comment 3

9 years ago
I believe this is wanted for production-master as well.
Futuring until we have the resources for this.
Component: Release Engineering → Release Engineering: Future

Comment 4

9 years ago
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Depends on: 539588
Whiteboard: [tryserver]
Depends on: 520227
Priority: P3 → P4
Priority: P4 → P5
(Assignee)

Comment 5

8 years ago
It's possible to cancel *pending* builds using schedulerdb.  Canceling active builds still requires talking directly to the buildbot master.
Whiteboard: [tryserver] → [tryserver][db]

Comment 6

8 years ago
For cancelling pending builds, it would be nice to be able to
hg commit --close-branch
hg push -f try
Assignee: nobody → lsblakk
Whiteboard: [tryserver][db] → [tryserver][db][triagefollowup]
Whiteboard: [tryserver][db][triagefollowup] → [tryserver][db]
(Assignee)

Comment 7

8 years ago
Will do this as part of buildapi
Assignee: lsblakk → catlee
Depends on: 591351
(Assignee)

Updated

8 years ago
Priority: P5 → P3
(Assignee)

Comment 8

8 years ago
Created attachment 488118 [details] [diff] [review]
read-only bits enabled

Let's start with this.

The read-write stuff is almost there, pending actual testing in staging and read/write access to the database.
Attachment #488118 - Flags: review?(nrthomas)
(Assignee)

Updated

8 years ago
Depends on: 608052
Comment on attachment 488118 [details] [diff] [review]
read-only bits enabled

Lots of great work here cleaning and expanding on what we have now. I mostly have questions and a few suggestions, but a bit much to r+ with changes.

>diff --git a/buildapi/controllers/builds.py b/buildapi/controllers/builds.py
>+log = logging.getLogger(__name__)
>+access_log = logging.getLogger("buildapi.access")

What was the strategy for using log vs access_log in the functions in this file ? require_auth might be better using access_log, and the log() in branch() is the odd one out for the r/o functions.

>+    def require_auth(self):

We just need ldap auth set up in apache to make this work ? When we get to r/w that is.

>diff --git a/buildapi/lib/helpers.py b/buildapi/lib/helpers.py
> def convert_master(m):
>+    fqdn, port = addr_for_master(m)
>+    pretty_name = '%s:%i' % (fqdn, port)

I was using everything up to the first '.' before, which is useful for pending/running because it's shorter. Are you making that change intentionally ?

>diff --git a/buildapi/model/builds.py b/buildapi/model/builds.py
>+def requestFromRow(row):
>+def buildFromRow(row):
>+    request = requestFromRow(row)
>+    build = {
>+        'buildid': row.buildid,
>+        'requests': [request],

I had thought that including requests in a build blog was duplicating quite a lot of information, but perhaps it's just buildername and branch.

>+def getRevision(branch, revision, starttime=None, endtime=None, limit=None):
>+    # TODO: Look at changes table too
>+    q = getBuildsQuery(branch)

For unscheduled jobs ?

>+def reprioritizeRequest(who, brid, priority, engine):
>+    if r['claimed_at'] != 0:
>+        log.info("Request %i is already running, nothing to do", brid)
>+        return False
>+    if r['complete'] != 0:
>+        log.info("Request %i is complete, nothing to do", brid)
>+        return False

These are the messages end users would want to see. Can we pass them back in the response ?

>+    q = br.update().where(and_(br.c.id == brid, br.c.complete == 0))

Surprised there isn't a claimed_at != 0 condition here too. It looks like a guard against the job getting picked up since the first query.

>+    res = q.execute(bind=engine)
>+    return True

What happens if the execute fails, or stalls ? Is there a timeout ? Is there meant to be some handling of res here ?

>+def cancelRequest(who, brid, engine):
Similar comments as reprioritizeRequest.

>+def cancelBuild(who, bid, engine):

You could use the r/o connection here for smidgen more safety.

>diff --git a/buildapi/tests/functional/test_builds.py b/buildapi/tests/functional/test_builds.py
>+    def setUp(self):
>+        self.engine = sqlalchemy.create_engine("sqlite:///:memory:")
>+        sql = open(os.path.join(os.path.dirname(os.path.dirname(__file__)), "state.sql")).read().split(";")

state.sql is another file to be committed ?

>diff --git a/test.ini b/test.ini
>+branches = branch1, branch2
>+
>+auth_override =
>+
>+masters.aglon.name = aglon:/home/catlee/mozilla/buildapi/buildapi/tests/master
>+masters.aglon.fqdn = aglon.localdomain
>+masters.aglon.port = 5000

How will the prod values be handled for this ? Perhaps add a production.ini.
Attachment #488118 - Flags: review?(nrthomas) → review-
(Assignee)

Comment 10

8 years ago
(In reply to comment #9)
> Comment on attachment 488118 [details] [diff] [review]
> read-only bits enabled
> 
> Lots of great work here cleaning and expanding on what we have now. I mostly
> have questions and a few suggestions, but a bit much to r+ with changes.
> 
> >diff --git a/buildapi/controllers/builds.py b/buildapi/controllers/builds.py
> >+log = logging.getLogger(__name__)
> >+access_log = logging.getLogger("buildapi.access")
> 
> What was the strategy for using log vs access_log in the functions in this file
> ? require_auth might be better using access_log, and the log() in branch() is
> the odd one out for the r/o functions.

Ok, I'll fix it up so that r/o functions don't do explicit logging (it's available elsewhere if we need it), and have require_auth use access_log.

> 
> >+    def require_auth(self):
> 
> We just need ldap auth set up in apache to make this work ? When we get to r/w
> that is.

The proxy on build.m.o is set up to do LDAP auth and forwards that along to the app on cruncher.

> >diff --git a/buildapi/lib/helpers.py b/buildapi/lib/helpers.py
> > def convert_master(m):
> >+    fqdn, port = addr_for_master(m)
> >+    pretty_name = '%s:%i' % (fqdn, port)
> 
> I was using everything up to the first '.' before, which is useful for
> pending/running because it's shorter. Are you making that change intentionally
> ?

Nope, fixed it now.

> >diff --git a/buildapi/model/builds.py b/buildapi/model/builds.py
> >+def requestFromRow(row):
> >+def buildFromRow(row):
> >+    request = requestFromRow(row)
> >+    build = {
> >+        'buildid': row.buildid,
> >+        'requests': [request],
> 
> I had thought that including requests in a build blog was duplicating quite a
> lot of information, but perhaps it's just buildername and branch.

I wanted to make it easy to go back and forth between builds and requests.  Including the request information here seems like a good concession to make for convenience.

> >+def getRevision(branch, revision, starttime=None, endtime=None, limit=None):
> >+    # TODO: Look at changes table too
> >+    q = getBuildsQuery(branch)
> 
> For unscheduled jobs ?

And for changes that were merged with others into a single sourcestamp.  I've updated the comment.

> >+def reprioritizeRequest(who, brid, priority, engine):
> >+    if r['claimed_at'] != 0:
> >+        log.info("Request %i is already running, nothing to do", brid)
> >+        return False
> >+    if r['complete'] != 0:
> >+        log.info("Request %i is complete, nothing to do", brid)
> >+        return False
> 
> These are the messages end users would want to see. Can we pass them back in
> the response ?

This code is gone now with the message broker approach.

> 
> >+    q = br.update().where(and_(br.c.id == brid, br.c.complete == 0))
> 
> Surprised there isn't a claimed_at != 0 condition here too. It looks like a
> guard against the job getting picked up since the first query.

gone.

> 
> >+    res = q.execute(bind=engine)
> >+    return True
> 
> What happens if the execute fails, or stalls ? Is there a timeout ? Is there
> meant to be some handling of res here ?

gone.

> 
> >+def cancelRequest(who, brid, engine):
> Similar comments as reprioritizeRequest.

gone.

> 
> >+def cancelBuild(who, bid, engine):
> 
> You could use the r/o connection here for smidgen more safety.

gone.

> >diff --git a/buildapi/tests/functional/test_builds.py b/buildapi/tests/functional/test_builds.py
> >+    def setUp(self):
> >+        self.engine = sqlalchemy.create_engine("sqlite:///:memory:")
> >+        sql = open(os.path.join(os.path.dirname(os.path.dirname(__file__)), "state.sql")).read().split(";")
> 
> state.sql is another file to be committed ?

Yes, it's in my repo...forget if I stripped that out of the last patch for brevity or if I forgot to hg add it.

> 
> >diff --git a/test.ini b/test.ini
> >+branches = branch1, branch2
> >+
> >+auth_override =
> >+
> >+masters.aglon.name = aglon:/home/catlee/mozilla/buildapi/buildapi/tests/master
> >+masters.aglon.fqdn = aglon.localdomain
> >+masters.aglon.port = 5000
> 
> How will the prod values be handled for this ? Perhaps add a production.ini.

Yup!
(Assignee)

Comment 11

8 years ago
Created attachment 494194 [details] [diff] [review]
read-only bits + a bit of read-write

I think I've addressed your comments from the previous patch.

The read-write parts have been gutted and replaced by calls out to a message broker.

scripts/buildapi-agent.py is responsible for acting on the job requests to actually cancel builds, reprioritize, etc.

Some implementation notes are here:
http://hg.mozilla.org/users/catlee_mozilla.com/buildapi/file/fa37a00544bf/RESTAPI_NOTES.txt
Attachment #488118 - Attachment is obsolete: true
Attachment #494194 - Flags: feedback?(nrthomas)
(Assignee)

Updated

8 years ago
Attachment #494194 - Flags: feedback?(salbiz)
(Assignee)

Updated

8 years ago
Attachment #494194 - Flags: feedback?(bear)
(Assignee)

Comment 12

8 years ago
Created attachment 495564 [details] [diff] [review]
Initial self-serve implementation

It's still missing some bits, but I think it's good enough to start having people poke at.
Attachment #495564 - Flags: review?
Comment on attachment 494194 [details] [diff] [review]
read-only bits + a bit of read-write

Is it still useful to look at this patch given the later one attached and the changes we discussed last week ?
(Assignee)

Updated

8 years ago
Attachment #494194 - Flags: feedback?(salbiz)
Attachment #494194 - Flags: feedback?(nrthomas)
Attachment #494194 - Flags: feedback?(bear)
(Assignee)

Updated

8 years ago
Attachment #495564 - Flags: review?
(Assignee)

Comment 14

8 years ago
self-serve now allows you to do this.

https://build.mozilla.org/buildapi/self-serve/try
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.