Closed Bug 674267 Opened 13 years ago Closed 13 years ago

Recent jobs page is not loading up

Categories

(Release Engineering :: General, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: catlee)

References

Details

(Keywords: buildapi, Whiteboard: [buildapi][reporting])

Attachments

(1 file, 1 obsolete file)

https://build.mozilla.org/buildapi/recent


##################################
Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /buildapi/recent.

Reason: Error reading from remote server
##################################
Loads forever on cruncher as well.
Whiteboard: [buildapi][reporting]
This was due to some slow MySQL queries.  Mpressman had a look at this today, and optimized several tables, to the point where the slow queries started using the appropriate indexes.  The indexes were being ignored because their cardinality indicated that a table scan would be faster.

Matthew's going to try to patch buildapi to generate a more efficient query, so I'm assigning this to him.
Assignee: nobody → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
It's probably also worth optimizing these on a regular basis from a crontab.  Let me know if you set that up, and I'll update the buildbot documentation on the wiki.
From IRC:
<justdave>	dustin: I lied.
was looking in the wrong place
it's there
# Analyze the buildbot_schedulers tables to keep the indexes performant
export HOME=/root
mysql buildbot_schedulers -e "analyze table `mysql buildbot_schedulers --skip-column-names -e 'show tables' | tr '\n' ',' | sed 's/,$//'`"
<justdave>	it's in cron.weekly, so it'll run around 4:20am every Sunday morning
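For readers less used to the nested backticks in the cron job, here is a minimal Python sketch of what that one-liner assembles: a single ANALYZE TABLE statement naming every table in the database. The function name and the placeholder table names are hypothetical; in the real job the list comes from `show tables`, and nothing here talks to a server.

```python
def build_analyze_statement(table_names):
    """Return one ANALYZE TABLE statement covering every given table."""
    if not table_names:
        raise ValueError("no tables to analyze")
    # Backtick-quote each identifier, doubling any embedded backtick,
    # which is how MySQL escapes identifiers.
    quoted = ["`" + name.replace("`", "``") + "`" for name in table_names]
    return "ANALYZE TABLE " + ", ".join(quoted)


# Placeholder names; the cron job would use the output of SHOW TABLES.
print(build_analyze_statement(["table_a", "table_b"]))
```

ANALYZE TABLE only refreshes the index statistics the optimizer reads, so it is a cheaper thing to schedule weekly than a full OPTIMIZE TABLE.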

Getting back to the .../recent query, I bumped the apache timeout in cruncher:/etc/httpd/conf/httpd.conf from 120 to 180 seconds and it loads for me now, albeit slowly. This report hits the status db (aka buildbot), but others like the endtoend report hit schedulerdb (buildbot_schedulers) and are also timing out. Perhaps we're just filling up the tables so it takes longer to get the data we want, or it could be that other load on the db servers is affecting the RelEng databases (setting a dep on bug 674298 to try to find out).
Depends on: 674298
Let's see if we can get this moving.  The /recent page is still timing out for me, although the recent pages for particular slaves are not, e.g.,
  https://build.mozilla.org/buildapi/recent/talos-r3-w7-011

I'm not sure what to look at next, here - would the new redis host help?
Assignee: server-ops-releng → dustin
Catlee, what can we do here?
I don't know. The query itself is pretty simple:

SELECT builds.id, builders.name AS buildname, builds.buildnumber,
       builds.starttime, builds.endtime, builds.result,
       slaves.name AS slavename, masters.name AS master
  FROM builds, builders, slaves, masters
 WHERE builds.slave_id = slaves.id
   AND builds.builder_id = builders.id
   AND builds.master_id = masters.id
   AND builds.result IS NOT NULL
 ORDER BY builds.id DESC
 LIMIT 20;

explain has this to say:

+----+-------------+----------+--------+--------------------------------------------------------------------+--------------------+---------+----------------------------+------+---------------------------------+
| id | select_type | table    | type   | possible_keys                                                      | key                | key_len | ref                        | rows | Extra                           |
+----+-------------+----------+--------+--------------------------------------------------------------------+--------------------+---------+----------------------------+------+---------------------------------+
|  1 | SIMPLE      | slaves   | ALL    | PRIMARY                                                            | NULL               | NULL    | NULL                       | 1346 | Using temporary; Using filesort | 
|  1 | SIMPLE      | builds   | ref    | master_id,ix_builds_slave_id,ix_builds_builder_id,ix_builds_result | ix_builds_slave_id | 4       | buildbot.slaves.id         | 1060 | Using where                     | 
|  1 | SIMPLE      | builders | eq_ref | PRIMARY                                                            | PRIMARY            | 4       | buildbot.builds.builder_id |    1 |                                 | 
|  1 | SIMPLE      | masters  | eq_ref | PRIMARY                                                            | PRIMARY            | 4       | buildbot.builds.master_id  |    1 |                                 | 
+----+-------------+----------+--------+--------------------------------------------------------------------+--------------------+---------+----------------------------+------+---------------------------------+

and yet the query takes a long time to run (66 seconds on my first try)
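The plan above starts from a full scan of slaves and sorts the joined rows ("Using temporary; Using filesort") before the LIMIT applies. One possible rewrite, sketched below, picks the 20 newest finished builds first and only then joins the lookup tables. This is not the patch attached to this bug. It assumes every build row has matching slaves, builders and masters rows (otherwise the two queries can differ), and it uses an in-memory SQLite database with invented sample data purely to check that both forms return the same rows; MySQL may plan it differently.

```python
import sqlite3

ORIGINAL = """
SELECT builds.id, builders.name AS buildname, builds.buildnumber,
       builds.starttime, builds.endtime, builds.result,
       slaves.name AS slavename, masters.name AS master
FROM builds, builders, slaves, masters
WHERE builds.slave_id = slaves.id
  AND builds.builder_id = builders.id
  AND builds.master_id = masters.id
  AND builds.result IS NOT NULL
ORDER BY builds.id DESC
LIMIT 20
"""

# Hypothetical rewrite: limit inside a derived table, then join.
REWRITTEN = """
SELECT recent.id, builders.name AS buildname, recent.buildnumber,
       recent.starttime, recent.endtime, recent.result,
       slaves.name AS slavename, masters.name AS master
FROM (SELECT * FROM builds
      WHERE result IS NOT NULL
      ORDER BY id DESC
      LIMIT 20) AS recent
JOIN builders ON recent.builder_id = builders.id
JOIN slaves ON recent.slave_id = slaves.id
JOIN masters ON recent.master_id = masters.id
ORDER BY recent.id DESC
"""


def make_demo_db():
    """Build a small in-memory stand-in for the status db (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE builders (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE slaves (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE masters (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE builds (
            id INTEGER PRIMARY KEY, buildnumber INTEGER,
            starttime INTEGER, endtime INTEGER, result INTEGER,
            slave_id INTEGER, builder_id INTEGER, master_id INTEGER);
    """)
    conn.executemany("INSERT INTO builders VALUES (?, ?)",
                     [(i, "builder-%d" % i) for i in range(1, 4)])
    conn.executemany("INSERT INTO slaves VALUES (?, ?)",
                     [(i, "slave-%d" % i) for i in range(1, 6)])
    conn.executemany("INSERT INTO masters VALUES (?, ?)",
                     [(i, "master-%d" % i) for i in range(1, 3)])
    builds = []
    for i in range(1, 101):
        # Every seventh build is still running (result IS NULL).
        result = None if i % 7 == 0 else i % 3
        builds.append((i, i, 1000 + i, 1060 + i, result,
                       1 + i % 5, 1 + i % 3, 1 + i % 2))
    conn.executemany("INSERT INTO builds VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                     builds)
    return conn


conn = make_demo_db()
original_rows = conn.execute(ORIGINAL).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()
print(original_rows == rewritten_rows, len(rewritten_rows))
```

Whether this form is actually faster on the production server would need to be confirmed with EXPLAIN and a timed run there.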
I'm moving this back to release engineering for the moment, since it looks like this is a query-optimization problem, and rather low priority.  We have a new DBA starting in December, so this may be worth discussing with her at that point.
Assignee: dustin → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Assignee: nobody → catlee
Comment on attachment 575274 [details] [diff] [review]
faster waittime and queries

>diff --git a/buildapi/model/waittimes.py b/buildapi/model/waittimes.py

I compared the waittimes for 2011-11-13 through 2011-11-17 on your instance and the production one (buildpool), and some new long-wait builds showed up, e.g.:
  http://cruncher.build.mozilla.org/buildapi/reports/waittimes?starttime=1321268400&endtime=1321354800
  http://cruncher.build.mozilla.org/~catlee/wsgi/reports/waittimes?starttime=1321268400&endtime=1321354800
Silly question, but does your repo have the changes from bug 674057 in it? Any idea what's going on here?

We could remove this wonky routing from buildapi/buildapi/config/routing.py too:
    map.connect('/recent/{slave}/{count}', controller='recent', action='index')
so that all queries take the same ?count=N syntax.
Otherwise that patch looks great.
Changed max(builds.start_time) to min(builds.start_time) as a better indication of wait times. This also makes the results match the original.
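A small sketch of why min() is the better aggregate, assuming (this is not stated in the bug) that one request can end up with more than one build, for example after a retry. The function name and the timestamps are hypothetical.

```python
def wait_time(submitted_at, build_start_times):
    """Seconds from submission until the first build started."""
    if not build_start_times:
        raise ValueError("request has no started builds")
    # min() measures the wait until the first build; max() would
    # measure until the last one and inflate the figure after a retry.
    return min(build_start_times) - submitted_at


submitted = 1321268400
starts = [1321268700, 1321272000]  # first attempt, then a hypothetical retry
print(wait_time(submitted, starts))   # 300
print(max(starts) - submitted)        # 3600
```

With max(), the retried request above would show up as a one-hour wait, which would be consistent with the new long-wait builds seen in the review.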
Attachment #575274 - Attachment is obsolete: true
Attachment #575274 - Flags: review?(nrthomas)
Attachment #575345 - Flags: review?(nrthomas)
Comment on attachment 575345 [details] [diff] [review]
faster waittime and queries

Nice one.
Attachment #575345 - Flags: review?(nrthomas) → review+
Attachment #575345 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Keywords: buildapi
Product: mozilla.org → Release Engineering