Closed Bug 656902 Opened 13 years ago Closed 13 years ago

Import run data from Buildbot JSON and store it in a database on the TBPL server

Categories

(Tree Management Graveyard :: TBPL, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mstange, Assigned: mstange)

Attachments

(4 files)

Currently we request run data in the client directly from Tinderbox in time-based chunks. That's bad for several reasons:
 - It needs Tinderbox.
 - You can't predict how far in the future you have to look for results for a
   certain push, since a push might still get results days later.
 - It wastes bandwidth when you're only interested in the results for a single
   push.

I'm going to fix this with a database (MongoDB) on the TBPL server that stores run information so that you can request it by revision. The script that imports the data into the DB is written in Python and needs to be run as a cron job every few (two?) minutes. It's based on dbaron's filter-buildbot-json.py script at http://hg.mozilla.org/users/dbaron_mozilla.com/buildbot-json-tools/file/tip/filter-buildbot-json.py
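
To make the by-revision idea concrete, here is a minimal sketch of the lookup it enables. A plain dict stands in for the MongoDB collection, and the record fields (`revision`, `builder`, `result`) are assumptions made up for illustration; they are not the actual Buildbot JSON schema.

```python
from collections import defaultdict

# Hypothetical run records; the real Buildbot JSON is larger and shaped differently.
example_runs = [
    {"revision": "abc123", "builder": "linux-opt", "result": "success"},
    {"revision": "def456", "builder": "linux-opt", "result": "testfailed"},
    {"revision": "abc123", "builder": "macosx-debug", "result": "success"},
]


def index_by_revision(runs):
    """Group run records by the revision they were built from."""
    index = defaultdict(list)
    for run in runs:
        index[run["revision"]].append(run)
    return dict(index)


runs_by_revision = index_by_revision(example_runs)
# All runs for one push come back together, no matter when they finished.
print(runs_by_revision["abc123"])
```

With this layout a client asks for one revision and gets only that push's runs, instead of downloading a time window and filtering it.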
Attachment #532202 - Flags: review?(catlee)
Attachment #532203 - Flags: review?(arpad.borsos)
Comment on attachment 532203 [details] [diff] [review]
add php script that exposes run info from the DB

Review of attachment 532203 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, it is so easy ;-)
Attachment #532203 - Flags: review?(arpad.borsos) → review+
Blocks: 656919
Blocks: 630538
Blocks: 658536
Depends on: 661365
This should make http://tbpl.swatinem.de/?usebuildbot=1 faster since I think you don't have this index there yet.
Alternatively, we could add a line to the setup instructions so that the server admin just adds the index once from the mongo shell. Would you prefer that?
Attachment #538216 - Flags: review?(arpad.borsos)
Comment on attachment 538216 [details] [diff] [review]
ensure getRevisionBuilds.php takes advantage of an index

Review of attachment 538216 [details] [diff] [review]:
-----------------------------------------------------------------

Since there is no such thing as CREATE TABLE in MongoDB, we should do all this in PHP.
Attachment #538216 - Flags: review?(arpad.borsos) → review+
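
For comparison, the same "make sure the index exists" step could be sketched on the Python side. This is only an illustration, not what the patch does (the patch does it in PHP). The field name `revision` is an assumption based on the script's name. The helper accepts any object with a `create_index` method, such as a PyMongo collection.

```python
def ensure_revision_index(runs_collection, field="revision"):
    """Ask MongoDB to index `field` on the given collection.

    Creating an index that already exists with the same definition is a
    no-op on the server, so this is safe to call on every run.
    """
    return runs_collection.create_index(field)


# Usage against a real server (assumes a recent PyMongo and a local mongod;
# the "tbpl" database and "runs" collection names come from the import script):
#
#   from pymongo import MongoClient
#   runs = MongoClient()["tbpl"]["runs"]
#   ensure_revision_index(runs)
```

Calling this once at startup avoids a separate manual step in the setup instructions.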
Comment on attachment 532202 [details] [diff] [review]
add import-buildbot-data.py

Hey Markus, sorry it took so long to get to this...

Overall this looks great, I just have a few comments, mostly about how to make this a bit more "pythonic"...

>diff --git a/dataimport/import-buildbot-data.py b/dataimport/import-buildbot-data.py
>new file mode 100644
>--- /dev/null
>+++ b/dataimport/import-buildbot-data.py
>@@ -0,0 +1,197 @@
>+#!/usr/bin/python
>+
>+# This script imports run data from build.mozilla.org into the local MongoDB;
>+# specifically, into the "runs" table of the "tbpl" database.
>+# The data saved in the database is made accessible to TBPL clients via the
>+# php/get*.php scripts.
>+# TBPL clients don't request run data from build.mozilla.org directly for
>+# performance reasons. The JSON files are very large and include data from
>+# all branches, and most of that data isn't of interest to TBPL.
>+
>+import json

If performance is a concern, you may want to look at the simplejson module, which is interchangeable with Python 2.6's json module. We often use a pattern like this:

try:
    import simplejson as json
except ImportError:
    import json

>+import urllib2
>+import os
>+import datetime
>+import gzip
>+import StringIO
>+import time
>+import re
>+import optparse
>+from pymongo import Connection
>+from string import Template
>+
>+log_path_try = Template("http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/$pusher-$rev/$branch_platform/$builder-build$buildnumber.txt.gz")
>+log_path_other = Template("http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/$branch_platform/$buildid/$builder-build$buildnumber.txt.gz")

I had no idea string.Template existed! Neat! Those could also be written as log_path_try = "http://..../%(pusher)s-%(rev)s/...", and then you'd use log_path_try % data below.
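
A small side-by-side sketch of the two spellings, to show they build the same URL. The URL pattern is the try-builds one from the patch; the values in `data` are made-up placeholders, and `log_path_try_percent` is a name introduced here only for the comparison.

```python
from string import Template

# string.Template version, as in the patch.
log_path_try = Template(
    "http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/"
    "$pusher-$rev/$branch_platform/$builder-build$buildnumber.txt.gz"
)

# Equivalent %-formatting version (hypothetical name, for comparison only).
log_path_try_percent = (
    "http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/"
    "%(pusher)s-%(rev)s/%(branch_platform)s/%(builder)s-build%(buildnumber)s.txt.gz"
)

# Placeholder values; real ones would come from the Buildbot JSON.
data = {
    "pusher": "user@example.com",
    "rev": "0123456789ab",
    "branch_platform": "try-linux",
    "builder": "try-linux",
    "buildnumber": 42,
}

url_from_template = log_path_try.substitute(data)
url_from_percent = log_path_try_percent % data
print(url_from_template)
```

Both forms accept the same dict. Template.substitute raises KeyError when a placeholder is missing, and so does the %-formatting form.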

All of the stuff below should be in another function called main() or something. Then you can add a section at the end of the script that looks like this:

if __name__ == "__main__":
    main()

Doing this makes it easier to import objects from this file later.
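
A minimal sketch of that layout, assuming nothing about the real script beyond what is quoted above. The body of main() here is a stand-in that only reports its arguments.

```python
import sys


def main(argv=None):
    """Entry point: everything that used to run at module level goes here."""
    args = sys.argv[1:] if argv is None else argv
    # Placeholder for the real work: parse options, fetch the JSON,
    # and write the runs to the database.
    print("would import with arguments: %r" % (args,))
    return 0


# Runs only when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    main()
```

With this in place, another module can import the file to reuse its helpers without triggering an import run as a side effect.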

>+os.environ["TZ"] = "America/Los_Angeles"
>+time.tzset()

You might want to look at the pytz module if you're dealing with timezones. (http://pytz.sourceforge.net/)

Can you get these things fixed up? Then we'll be good to go!
Attachment #532202 - Flags: review?(catlee) → review-
Attachment #538507 - Flags: review?(catlee)
Attachment #538507 - Flags: review?(catlee) → review+
Depends on: 669000
Depends on: 682914
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard