Recent jobs page is not loading up

RESOLVED FIXED

Status

7 years ago
5 years ago

People

(Reporter: armenzg, Assigned: catlee)

Tracking

({buildapi})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [buildapi][reporting])

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

7 years ago
https://build.mozilla.org/buildapi/recent


##################################
Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /buildapi/recent.

Reason: Error reading from remote server
##################################

Comment 1

7 years ago
Loads forever on cruncher as well.
Whiteboard: [buildapi][reporting]
This was due to some slow MySQL queries.  Mpressman had a look at this today, and optimized several tables, to the point where the slow queries started using the appropriate indexes.  The indexes were being ignored because their cardinality indicated that a table scan would be faster.

Matthew's going to try to patch buildapi to generate a more efficient query, so I'm assigning this to him.
Assignee: nobody → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
It's probably also worth optimizing these on a regular basis from a crontab.  Let me know if you set that up, and I'll update the buildbot documentation on the wiki.
From IRC:
<justdave>	dustin: I lied.
# Analyze the buildbot_schedulers tables to keep the indexes performant
export HOME=/root
mysql buildbot_schedulers -e "analyze table `mysql buildbot_schedulers --skip-column-names -e 'show tables' | tr '\n' ',' | sed 's/,$//'`"
was looking in the wrong place
it's there
<justdave>	it's in cron.weekly, so it'll run around 4:20am every Sunday morning
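
For illustration only, here is a rough Python equivalent of what that shell pipeline assembles: one ANALYZE TABLE statement listing every table in the schema. The table names below are made up, and the function name is hypothetical. The identifier quoting is an addition that the cron one-liner does not do.

```python
def build_analyze_statement(table_names):
    """Return one ANALYZE TABLE statement covering every table name given."""
    # Drop blank entries, e.g. from a trailing newline in 'show tables' output
    cleaned = [name.strip() for name in table_names if name.strip()]
    if not cleaned:
        raise ValueError("no tables to analyze")
    # Backtick-quote each identifier, doubling any embedded backtick
    quoted = ", ".join("`" + name.replace("`", "``") + "`" for name in cleaned)
    return "ANALYZE TABLE " + quoted


# Hypothetical 'show tables' output: one table name per line
show_tables_output = "buildrequests\nbuildsets\nchanges\n"
statement = build_analyze_statement(show_tables_output.splitlines())
print(statement)
```

The resulting string could then be passed to `mysql -e` or to a database cursor.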

Getting back to the .../recent query, I bumped the Apache timeout in cruncher:/etc/httpd/conf/httpd.conf from 120 to 180 seconds and it loads for me now, albeit slowly. This report hits the status db (aka buildbot), but others like the endtoend report hit schedulerdb (buildbot_schedulers) and are also timing out. Perhaps we're just filling up the tables so it takes longer to get the data we want, or it could be that other load on the db servers is impacting the RelEng DBs (setting a dep on bug 674298 to try to find out).
Depends on: 674298
Let's see if we can get this moving.  The /recent page is still timing out for me, although particular-slave recent pages are not, e.g.,
  https://build.mozilla.org/buildapi/recent/talos-r3-w7-011

I'm not sure what to look at next, here - would the new redis host help?
Assignee: server-ops-releng → dustin
Catlee, what can we do here?
(Assignee)

Comment 7

7 years ago
I don't know. The query itself is pretty simple:

SELECT builds.id, builders.name AS buildname, builds.buildnumber,
       builds.starttime, builds.endtime, builds.result,
       slaves.name AS slavename, masters.name AS master
  FROM builds, builders, slaves, masters
 WHERE builds.slave_id = slaves.id
   AND builds.builder_id = builders.id
   AND builds.master_id = masters.id
   AND builds.result IS NOT NULL
 ORDER BY builds.id DESC
 LIMIT 20;

explain has this to say:

+----+-------------+----------+--------+--------------------------------------------------------------------+--------------------+---------+----------------------------+------+---------------------------------+
| id | select_type | table    | type   | possible_keys                                                      | key                | key_len | ref                        | rows | Extra                           |
+----+-------------+----------+--------+--------------------------------------------------------------------+--------------------+---------+----------------------------+------+---------------------------------+
|  1 | SIMPLE      | slaves   | ALL    | PRIMARY                                                            | NULL               | NULL    | NULL                       | 1346 | Using temporary; Using filesort | 
|  1 | SIMPLE      | builds   | ref    | master_id,ix_builds_slave_id,ix_builds_builder_id,ix_builds_result | ix_builds_slave_id | 4       | buildbot.slaves.id         | 1060 | Using where                     | 
|  1 | SIMPLE      | builders | eq_ref | PRIMARY                                                            | PRIMARY            | 4       | buildbot.builds.builder_id |    1 |                                 | 
|  1 | SIMPLE      | masters  | eq_ref | PRIMARY                                                            | PRIMARY            | 4       | buildbot.builds.master_id  |    1 |                                 | 
+----+-------------+----------+--------+--------------------------------------------------------------------+--------------------+---------+----------------------------+------+---------------------------------+

And yet the query takes a long time to run (66 seconds on my first try).
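
The EXPLAIN output shows the join starting from a full scan of slaves with "Using temporary; Using filesort", so the sort and LIMIT are applied only after the join. One possible rewrite, sketched below, picks the 20 newest finished builds first and joins the lookup tables afterwards. This is an illustration against a throwaway SQLite database with made-up rows, not necessarily what the attached patch does, and it has not been timed against MySQL. It returns the same rows only if every build references an existing slave, builder and master.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE masters  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE builders (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE slaves   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE builds (
        id INTEGER PRIMARY KEY, buildnumber INTEGER,
        builder_id INTEGER, slave_id INTEGER, master_id INTEGER,
        starttime INTEGER, endtime INTEGER, result INTEGER
    );
""")
conn.executemany("INSERT INTO masters VALUES (?, ?)",
                 [(i, "master%d" % i) for i in range(1, 3)])
conn.executemany("INSERT INTO builders VALUES (?, ?)",
                 [(i, "builder%d" % i) for i in range(1, 6)])
conn.executemany("INSERT INTO slaves VALUES (?, ?)",
                 [(i, "slave%d" % i) for i in range(1, 11)])
# 200 made-up builds; every 7th one is still running (result IS NULL)
conn.executemany(
    "INSERT INTO builds VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [(i, i, i % 5 + 1, i % 10 + 1, i % 2 + 1, 1000 + i, 1060 + i,
      None if i % 7 == 0 else 0) for i in range(1, 201)],
)

# The query from this bug, unchanged apart from whitespace
ORIGINAL = """
    SELECT builds.id, builders.name AS buildname, builds.buildnumber,
           builds.starttime, builds.endtime, builds.result,
           slaves.name AS slavename, masters.name AS master
    FROM builds, builders, slaves, masters
    WHERE builds.slave_id = slaves.id AND builds.builder_id = builders.id
      AND builds.master_id = masters.id AND builds.result IS NOT NULL
    ORDER BY builds.id DESC LIMIT 20
"""

# Sketch: limit to the newest finished builds first, then join
REWRITTEN = """
    SELECT b.id, builders.name AS buildname, b.buildnumber,
           b.starttime, b.endtime, b.result,
           slaves.name AS slavename, masters.name AS master
    FROM (SELECT * FROM builds WHERE result IS NOT NULL
          ORDER BY id DESC LIMIT 20) AS b
    JOIN builders ON b.builder_id = builders.id
    JOIN slaves   ON b.slave_id = slaves.id
    JOIN masters  ON b.master_id = masters.id
    ORDER BY b.id DESC
"""

original_rows = conn.execute(ORIGINAL).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()
print(original_rows == rewritten_rows)
```

Whether MySQL actually picks a better plan for the second form would need to be confirmed with EXPLAIN on the real buildbot database.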
I'm moving this back to release engineering for the moment, since it looks like this is a query-optimization problem, and rather low priority.  We have a new DBA starting in December, so this may be worth discussing with her at that point.
Assignee: dustin → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release

Updated

7 years ago
Assignee: nobody → catlee
Comment on attachment 575274 [details] [diff] [review]
faster waittime and queries

>diff --git a/buildapi/model/waittimes.py b/buildapi/model/waittimes.py

I compared the waittimes for 2011-11-13 through 2011-11-17 on your instance and the production one (buildpool), and there were some new long-wait builds that showed up, e.g.:
  http://cruncher.build.mozilla.org/buildapi/reports/waittimes?starttime=1321268400&endtime=1321354800
  http://cruncher.build.mozilla.org/~catlee/wsgi/reports/waittimes?starttime=1321268400&endtime=1321354800
Silly question, but does your repo have the changes from bug 674057 in it? Any idea what's going on here?

We could remove this wonky routing from buildapi/buildapi/config/routing.py too:
    map.connect('/recent/{slave}/{count}', controller='recent', action='index')
so that all queries take the same ?count=N syntax.
Otherwise that patch looks great.
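
To illustrate the ?count=N suggestion: with the /recent/{slave}/{count} route gone, the slave would still come from the path and the count only from the query string. The helper below is a hypothetical sketch using the standard library rather than buildapi's routing code. The default of 20 simply mirrors the LIMIT in the query quoted earlier, and the cap of 500 is invented.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_COUNT = 20  # assumption: mirrors LIMIT 20 in the query from this bug
MAX_COUNT = 500     # hypothetical upper bound to keep queries cheap


def recent_params(url):
    """Return (slave, count) for a /recent or /recent/<slave> URL."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    idx = segments.index("recent")  # raises ValueError for other paths
    slave = segments[idx + 1] if len(segments) > idx + 1 else None
    # The count comes only from the query string, never from the path
    raw = parse_qs(parts.query).get("count", [str(DEFAULT_COUNT)])[0]
    try:
        count = int(raw)
    except ValueError:
        count = DEFAULT_COUNT
    # Clamp to a sane range
    count = max(1, min(count, MAX_COUNT))
    return slave, count


print(recent_params(
    "https://build.mozilla.org/buildapi/recent/talos-r3-w7-011?count=50"))
```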
(Assignee)

Comment 12

7 years ago
Created attachment 575345 [details] [diff] [review]
faster waittime and queries

Changed max(builds.start_time) to min(builds.start_time) as a better indication of wait times. This also makes the results match the original.
Attachment #575274 - Attachment is obsolete: true
Attachment #575274 - Flags: review?(nrthomas)
Attachment #575345 - Flags: review?(nrthomas)
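
A small worked example of why min rather than max matters for wait times, assuming a request can end up with more than one build (for instance a retry). The function name, the timestamps and the retry scenario are all hypothetical. The real logic lives in buildapi/model/waittimes.py and may differ.

```python
def wait_time(submitted_at, build_start_times, use_min=True):
    """Seconds between submission and the chosen build start, or None."""
    if not build_start_times:
        return None  # the request never started a build
    pick = min if use_min else max
    return pick(build_start_times) - submitted_at


submitted_at = 1321268400            # arbitrary epoch timestamp
starts = [1321268700, 1321272000]    # first build after 300 s, retry after 3600 s

first_start_wait = wait_time(submitted_at, starts, use_min=True)
last_start_wait = wait_time(submitted_at, starts, use_min=False)
print(first_start_wait, last_start_wait)
```

With max, the retry makes the request look like it waited an hour. With min, it reports the five minutes it waited before a slave first picked it up.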
Comment on attachment 575345 [details] [diff] [review]
faster waittime and queries

Nice one.
Attachment #575345 - Flags: review?(nrthomas) → review+
(Assignee)

Updated

7 years ago
Attachment #575345 - Flags: checked-in+
(Assignee)

Updated

7 years ago
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED

Updated

6 years ago
Keywords: buildapi
Product: mozilla.org → Release Engineering