Closed Bug 994994 Opened 12 years ago Closed 12 years ago

buildbot db problems

Categories

(Data & BI Services Team :: DB: MySQL, task)

Hardware: x86
OS: All
Type: task
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: mpressman)

Thu 16:03:21 PDT [5990] buildbot2.db.scl3.mozilla.com:MySQL Replication is WARNING: Replication lag: 951 seconds (http://m.mozilla.org/MySQL+Replication)
Thu 16:13:08 PDT [4659] builddata.pub.build.mozilla.org:http file age - /buildjson/builds-running.js is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:10:59 ago - 8191 bytes in 0.006 second response time

Something wrong with buildbot1, which is acting as r/o slave? The machines hitting the r/w slave are not complaining.
The following query from user buildbot_mhommey was locking tables required for update:

    select builds.id as build_id, builders.name as builder_name,
        UNIX_TIMESTAMP(builds.starttime) as build_starttime,
        UNIX_TIMESTAMP(builds.endtime) as build_endtime,
        slaves.name as slave_name, steps.id as step_id, steps.name as step_name,
        steps.description as step_description, steps.order as step_order,
        UNIX_TIMESTAMP(steps.starttime) as step_starttime,
        UNIX_TIMESTAMP(steps.endtime) as step_endtime,
        status as step_status, changes.who
    from steps, builds, builders, slaves, source_changes, changes
    where steps.build_id = builds.id AND builds.builder_id = builders.id
        AND builds.slave_id = slaves.id and builds.result = 0
        and builds.source_id = source_changes.source_id
        and source_changes.change_id = changes.id
    order by builds.starttime asc, builds.id asc, steps.order asc

I killed the query and now the lag is recovering.
[17:38:19] <nagios-scl3> Thu 16:38:19 PDT [5102] buildbot2.db.scl3.mozilla.com:MySQL Replication is OK: Replication running. Lag time: 0 seconds
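For anyone triaging a similar lag spike, here is a rough sketch of picking out long-running queries for one user from a process list snapshot. It assumes rows shaped like MySQL's information_schema.PROCESSLIST (ID, USER, COMMAND, TIME, INFO). The thread ids, run times, and the 600-second threshold are made up for illustration, and using KILL QUERY is an assumption, not necessarily what was run here.

```python
def kill_statements(processlist, user, min_seconds):
    """Return KILL QUERY statements for long-running queries owned by `user`.

    `processlist` is an iterable of dicts keyed like the columns of
    information_schema.PROCESSLIST. Nothing is executed here; the caller
    reviews the statements before running them.
    """
    statements = []
    for row in processlist:
        if row["USER"] != user:
            continue  # leave other users and the replication threads alone
        if row["COMMAND"] != "Query":
            continue  # idle (Sleep) connections hold no running statement
        if row["TIME"] < min_seconds:
            continue  # too short-lived to be the likely culprit
        statements.append(f"KILL QUERY {int(row['ID'])};")
    return statements


# Hypothetical snapshot: ids and times are invented for this example.
snapshot = [
    {"ID": 101, "USER": "buildbot_mhommey", "COMMAND": "Query",
     "TIME": 1800, "INFO": "select builds.id as build_id, ..."},
    {"ID": 102, "USER": "buildbot_mhommey", "COMMAND": "Sleep",
     "TIME": 4000, "INFO": None},
    {"ID": 103, "USER": "system user", "COMMAND": "Connect",
     "TIME": 951, "INFO": "UPDATE ..."},
]

print(kill_statements(snapshot, "buildbot_mhommey", 600))
```

Generating the statements rather than executing them keeps a human in the loop, which seems safer on a production slave.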
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
See Also: → 990820
glandium is going to limit his queries to a subset of builders; the builds table already has an index on builder_id, which those filtered queries can use. There's also an open question of whether we should add a second slave db, so that we can run 'expensive' queries on large amounts of historical data without affecting other production uses of the r/o slave. Sheeri, could you have a look at our monitoring and see if that would be appropriate?
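A minimal sketch of what "limit to a subset of builders" could look like, using an in-memory SQLite database as a stand-in for MySQL. The table layout is cut down to the columns that matter. The builder names, the row data, the index name builds_builder_id, and the helper successful_builds are all hypothetical; only the table and column names come from the query in this bug.

```python
import sqlite3

# SQLite stands in for MySQL here; schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE builders (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE builds (
        id INTEGER PRIMARY KEY,
        builder_id INTEGER,
        starttime INTEGER,
        result INTEGER
    );
    CREATE INDEX builds_builder_id ON builds (builder_id);
    INSERT INTO builders VALUES (1, 'builder-a'), (2, 'builder-b'), (3, 'builder-c');
    INSERT INTO builds VALUES
        (10, 1, 1000, 0), (11, 2, 1010, 0), (12, 3, 1020, 0),
        (13, 1, 1030, 1), (14, 3, 1040, 0);
""")


def successful_builds(conn, builder_names):
    """Return successful builds (result = 0) for the named builders only."""
    builder_names = list(builder_names)
    if not builder_names:
        return []  # MySQL rejects an empty IN (), so skip the query entirely
    placeholders = ", ".join("?" for _ in builder_names)
    sql = f"""
        SELECT builds.id, builders.name, builds.starttime
        FROM builders
        JOIN builds ON builds.builder_id = builders.id
        WHERE builders.name IN ({placeholders})
          AND builds.result = 0
        ORDER BY builds.starttime ASC, builds.id ASC
    """
    return conn.execute(sql, builder_names).fetchall()


rows = successful_builds(conn, ["builder-a", "builder-c"])
print(rows)
# [(10, 'builder-a', 1000), (12, 'builder-c', 1020), (14, 'builder-c', 1040)]
```

Filtering on the builder first means the join into builds can go through the builder_id index instead of scanning every successful build. Whether MySQL's planner actually picks that index on the production data would need checking with EXPLAIN.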
Product: mozilla.org → Data & BI Services Team