Closed Bug 1049657 Opened 10 years ago Closed 7 years ago

monitoring for buildbot master step delay

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: Callek, Unassigned)

Justin Wood (:Callek)

Reporter

Description

10 years ago

Sometimes our buildbot steps take far longer than they should, due to load on the master.

The sources of that load vary widely. Usually the fix is just to add more masters or to split the slave pool in some other way.

We should monitor for this.

(e.g. a step that normally takes <1s on a slave can take 30s+ to complete and start the next step; this extra time adds up fast.)
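To make "adds up fast" concrete, here is a minimal sketch that measures the idle time between consecutive steps of one build. It assumes a hypothetical data model in which each step is a (name, start, end) tuple of Unix timestamps in seconds, sorted by start time. This is not buildbot's own schema, and the function names are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of a build step: (name, start, end) in Unix seconds.
Step = Tuple[str, float, float]


def inter_step_gaps(steps: List[Step]) -> List[float]:
    """Idle time between each step's end and the next step's start.

    Steps are assumed to be sorted by start time; overlapping steps count as 0.
    """
    return [max(0.0, nxt[1] - cur[2]) for cur, nxt in zip(steps, steps[1:])]


def total_overhead(steps: List[Step]) -> float:
    """Total idle time across the whole build, in seconds."""
    return sum(inter_step_gaps(steps))


if __name__ == "__main__":
    build = [
        ("checkout", 0.0, 5.0),
        ("get_basedir", 35.0, 36.0),  # <1s of work, but started 30s late
        ("compile", 66.0, 600.0),
    ]
    print(inter_step_gaps(build))  # [30.0, 30.0]
    print(total_overhead(build))   # 60.0
```

At 30s of lag per step, a hypothetical 20-step build would spend 600s (10 minutes) waiting on the master.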

Chris AtLee [:catlee]

Comment 1

10 years ago

I've been using this code to submit master lag times to Graphite. Somebody should be able to use it to generate a coarser metric for Nagios to use, e.g. if the 50th percentile rises above 10s, we should get an alert.

#!/usr/bin/env python
import logging

import sqlalchemy as sa

log = logging.getLogger(__name__)


def find_lag_since(conn, build_id):
    # get_basedir should be near-instant on the slave, so a long recorded
    # duration mostly reflects time spent waiting on the master.
    # Unfinished steps (endtime IS NULL) are skipped.
    q = sa.text("""
        SELECT builds.id AS build_id, masters.name AS master,
               steps.starttime, steps.endtime
        FROM masters, builds, steps
        WHERE
            builds.master_id = masters.id AND
            steps.build_id = builds.id AND
            steps.name = 'get_basedir' AND
            steps.endtime IS NOT NULL AND
            builds.id > :build_id
        """)
    return conn.execute(q, {"build_id": build_id})


def get_last_build_id(conn, d):
    # id of the first build that started on or after date d, or None
    q = sa.text("SELECT id FROM builds WHERE starttime >= :d "
                "ORDER BY starttime ASC LIMIT 1")
    row = conn.execute(q, {"d": d}).fetchone()
    return row[0] if row else None


def main():
    # config and build_times are local modules, not included here
    import config
    from build_times import GraphiteSubmitter, td2s, dt2ts

    logging.basicConfig(format="%(asctime)s - %(message)s", level=logging.DEBUG)
    db = sa.create_engine("mysql://foobar")
    g = GraphiteSubmitter("graphitehost", 2003, config.graphite_api_key)

    with db.connect() as conn:
        log.debug("getting last_build_id")
        last_build_id = get_last_build_id(conn, "2014-05-01")
        if last_build_id is None:
            log.info("no builds since 2014-05-01")
            return

        log.debug("getting lag")
        for row in find_lag_since(conn, last_build_id):
            # lag in seconds, reported at the step's start time
            d = td2s(row.endtime - row.starttime)
            t = dt2ts(row.starttime)
            g.submit("masterlag.%s" % row.master, d, t)


if __name__ == '__main__':
    main()
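Turning these samples into the coarser Nagios alert suggested above could look roughly like the sketch below. It assumes one master's recent lag values (in seconds) have already been collected, for example from the get_basedir durations this script computes. The function name `check_lag`, the master name, and the 5s warning threshold are hypothetical; only the 10s threshold on the 50th percentile comes from this comment. The sketch follows the standard Nagios plugin conventions: exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and a one-line status with optional performance data after `|`.

```python
import statistics

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_lag(master, samples, warn=5.0, crit=10.0):
    """Return (exit_code, status_line) for one master's recent lag samples.

    samples: lag values in seconds (hypothetical input, e.g. get_basedir
    durations from a recent window). Alerts on the median (50th percentile).
    """
    if not samples:
        return UNKNOWN, "UNKNOWN - no lag samples for %s" % master
    p50 = statistics.median(samples)
    if p50 > crit:
        code, label = CRITICAL, "CRITICAL"
    elif p50 > warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after '|' is Nagios performance data: label=value[UOM];warn;crit
    return code, "%s - %s p50 lag %.1fs|p50_lag=%.1fs;%g;%g" % (
        label, master, p50, p50, warn, crit)
```

A plugin wrapper would print the status line and then call `sys.exit(code)` so that Nagios picks up the state. Using the median rather than the mean keeps a single very slow step from triggering an alert on its own.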

Nobody; OK to take it and work on it

Assignee

Updated

7 years ago

Component: Tools → General

Chris AtLee [:catlee]

Updated

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED


Categories

(Release Engineering :: General, defect)


Security

(public)
