Closed Bug 682059 Opened 13 years ago Closed 13 years ago

port tbpl to use mysql

Categories

(Tree Management Graveyard :: TBPL, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Swatinem, Assigned: rhelmer)

References

Details

Attachments

(3 files, 14 obsolete files)

11.67 KB, patch
Swatinem
: review+
Details | Diff | Splinter Review
10.06 KB, patch
Swatinem
: review+
Details | Diff | Splinter Review
11.88 KB, patch
mstange
: review+
Details | Diff | Splinter Review
1. agree on a schema 2. port the data-import python job 3. port the php parts
Rationale: <laura> we don't have mongo, and it would take time to get it <laura> what we do have access to on there is mysql, ES, memcache, and redis [...] <laura> I'd vote mysql because that's easiest then Why mysql rather than elastic search: <laura> there will be more latency though, since [the ES server] is in SJC <laura> and the new infra for tbpl will be in PHX (datacenter in phoenix)
Attached file proposed schema (obsolete) —
Quickly translated the mongo schema into mysql. We only have a simple key on builders.name in mongo, we don’t override mongos primary _id there, but we do treat it as unique key internally, so I just made it a primary. rhelmer: please take a look at the python if it fits the schema or not. From a first look, the first builder history that is inserted by the python script does not contain reason, who and ip. And we are using int unix timestamps there instead of sql timestamp.
Attachment #555821 - Flags: feedback?(rhelmer)
Attachment #555821 - Flags: feedback?(mstange)
(In reply to Arpad Borsos (Swatinem) from comment #2) > rhelmer: please take a look at the python if it fits the schema or not. It does, from what I can tell. All fields that the python touches are present.
Comment on attachment 555821 [details] proposed schema Looks correct and complete.
Attachment #555821 - Flags: feedback?(mstange) → feedback+
Attachment #555821 - Attachment mime type: text/x-sql → text/plain
Comment on attachment 555821 [details] proposed schema Looks reasonable to me, going to load it up and play around with it right away. Thanks!
Attachment #555821 - Flags: feedback?(rhelmer) → feedback+
Attached patch quick PoC for mongo->mysql (obsolete) — Splinter Review
Here's a quick proof-of-concept conversion of this code from mongoDB to MySQL. I've made minimal changes to the existing code. Needs tests and configuration (mysql auth info is hardcoded). Arpad, I have some questions on the schema: * it looks like run.get_info()['log'] is sometimes not present. Should the schema allow runs.log to be NULL? It currently does not. * builders table has a "hidden" column. Where does this come from? I can't find anything like show/hide/hidden/etc. in the source JSON. * the old code inserted this to the mongo builders table: "history": [{ "date": int(time.time()), "action": "insert" }] Is the purpose of this just to know what time the data was fetched and inserted into the database? I thought perhaps the builders_history was a replacement for this but it has a bunch of other columns like "ip", "who", "reason", etc. * what is the purpose of the builders_history table (also see above) * what is the purpose of the runs_notes table? Thanks!
Attachment #555873 - Flags: feedback?(peterbe)
(In reply to Robert Helmer [:rhelmer] from comment #6) > * it looks like run.get_info()['log'] is sometimes not present. Should the > schema allow runs.log to be NULL? It currently does not. Good point, it should. > * builders table has a "hidden" column. Where does this come from? I can't > find anything like show/hide/hidden/etc. in the source JSON. php/updateBuilders.php sets hidden to true or false. The hidden state is one of the few things that are exclusively stored in the TBPL database and not imported from anywhere else. > * the old code inserted this to the mongo builders table: > "history": [{ > "date": int(time.time()), > "action": "insert" > }] > Is the purpose of this just to know what time the data was fetched and > inserted > into the database? Right. > I thought perhaps the builders_history was a replacement > for this It is. > but it has a bunch of other columns like "ip", "who", "reason", > etc. Those are empty for the "insert" history entry. Entries that use these columns are added by php/updateBuilders.php. Builder history is exposed by php/getBuilderHistory.php, though there's no UI for it yet. Example: http://tbpl.allizom.org/php/getBuilderHistory.php?name=mozilla-central_tegra_android-debug_test-crashtest (Notice that the "insert" entry doesn't have who/reason fields) The UI for updateBuilders.php is behind "Tree Info" -> "Open tree admin panel" on http://tbpl.allizom.org/?usebuildbot=1 . > * what is the purpose of the runs_notes table? This stores the build stars. That's the other kind of data that's only stored on the TBPL server and not imported from somewhere else. It's written by php/submitBuildStar.php and exposed by php/getRevisionBuilds.php.
BTW threw together a quick vagrant+puppet setup for tbpl: https://github.com/rhelmer/tbpl/tree/master/vagrant This will install and maintain tbpl and all dependencies into a local virtualbox VM. It's not complete yet, but good enough to test this patch.
Thanks Markus, that clarifies things. (In reply to Markus Stange from comment #7) > (In reply to Robert Helmer [:rhelmer] from comment #6) > > but it has a bunch of other columns like "ip", "who", "reason", > > etc. > > Those are empty for the "insert" history entry. Entries that use these > columns are added by php/updateBuilders.php. OK so it sounds like the current patch just needs to be updated to do an insert here then, and leave the rest alone for the frontend? I'll make that change, not sure if I will have a chance to get to tests and config for this but peterbe sounded eager to get on this in the morning so that's probably ok :)
Attached patch use mysql instead of mongodb (obsolete) — Splinter Review
This patch rolls up the code changes along with the schema (with my proposed changes to allow NULL for runs.log) along with a README update. I made the MySQL settings configurable via commandline switches; since this is run as a cron job I don't think that's ideal for the password though (the password will show up in the process list). We should probably also support an environment variable. Note that I am only committing the transaction at the very end of the process so this means that if anything fails during a run of this script, then everything is rolled back. I think playing it safe like this is probably fine, provided we're actually monitoring the output of this job. Leaving bug unassigned, peterbe feel free to either take it and finish it out or throw it back at me. Thanks!
Attachment #555821 - Attachment is obsolete: true
Attachment #555873 - Attachment is obsolete: true
Attachment #555873 - Flags: feedback?(peterbe)
Attachment #555896 - Flags: review?(peterbe)
Attachment #555896 - Flags: feedback?
Attachment #555896 - Flags: feedback?(mstange)
Attachment #555896 - Flags: feedback?(arpad.borsos)
Attachment #555896 - Flags: feedback?
(In reply to Robert Helmer [:rhelmer] from comment #10) > Created attachment 555896 [details] [diff] [review] > use mysql instead of mongodb > > This patch rolls up the code changes along with the schema (with my proposed > changes to allow NULL for runs.log) along with a README update. Oh, and also the builders_history schema now allows NULL for most columns, since "insert" doesn't use them (per comment 7).
Comment on attachment 555896 [details] [diff] [review] use mysql instead of mongodb Review of attachment 555896 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. ::: dataimport/import-buildbot-data.py @@ +201,5 @@ > + if key in run_info: > + params.append(run_info[key]) > + else: > + params.append(None) > + cursor.execute("INSERT INTO runs VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)", params) I actually prefer the SET var = val syntax. It makes things a lot easier for the reader and allows schema changes that change the ordering. @@ +210,5 @@ > + cursor = db.cursor() > + cursor.execute("SELECT count(*) FROM builders WHERE name = %s", name) > + count = cursor.fetchone()[0] > + if count == 0: > + params = (name, branch, buildername, 0) Lets just add a default for hidden in the schema.
Attachment #555896 - Flags: feedback?(arpad.borsos) → feedback+
Comment on attachment 555896 [details] [diff] [review] use mysql instead of mongodb >diff --git a/dataimport/import-buildbot-data.py b/dataimport/import-buildbot-data.py >+ if key in run_info: >+ params.append(run_info[key]) >+ else: >+ params.append(None) This can also be written as params.append(run_info.get(key)) IIRC. >- if existing["buildername"] is None and buildername is not None: >- db.builders.update({"name": name}, {"$set": {"buildername": buildername}}) This part wasn't translated into MySQL. Looks good otherwise! I have no opinion on where to store authentication details. Aren't we completely screwed anyway if somebody can get access to the process list?
Attachment #555896 - Flags: feedback?(mstange) → feedback+
Comment on attachment 555896 [details] [diff] [review] use mysql instead of mongodb Review of attachment 555896 [details] [diff] [review]: ----------------------------------------------------------------- A couple of nits but in principle OK. I'd like some light on the use of INT instead of DATETIME for some columns. Also, it would be much better with another form of INSERT statements so you can see what columns are being used and don't depend on the order they were defined in the schema. ::: dataimport/import-buildbot-data.py @@ +206,1 @@ > return True I agree with Arpad. Then we don't need to bother with the None params too. @@ +212,5 @@ > + count = cursor.fetchone()[0] > + if count == 0: > + params = (name, branch, buildername, 0) > + cursor.execute("INSERT INTO builders VALUES (%s, %s, %s, %s)", params) > + Again, named columns format for INSERT INTO makes for much more readable code. Also makes it possible to mess with the order of the columns. @@ +213,5 @@ > + if count == 0: > + params = (name, branch, buildername, 0) > + cursor.execute("INSERT INTO builders VALUES (%s, %s, %s, %s)", params) > + > + history_params = (name, int(time.time()), "insert", None, None, None) If the timezone doesn't matter, can't we use some sort of default on the column instead? This `int(time.time())` isn't pretty. ::: dataimport/schema.sql @@ +29,5 @@ > +CREATE TABLE IF NOT EXISTS `builders` ( > + `name` varchar(128) COLLATE utf8_unicode_ci NOT NULL, > + `branch` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > + `buildername` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > + `hidden` tinyint(1) NOT NULL, Why is this an int? Why not a boolean? A boolean is explicit and obvious but when it's an int I have to wonder if there's some sort of index like "hidden=3 means was hidden now not hidden". @@ +35,5 @@ > + KEY `hidden` (`hidden`) > +) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; > + > +-- -------------------------------------------------------- > + CREATE INDEX builders_name_idx ON builders (name); Mayhaps? @@ +61,5 @@ > + `id` int(10) unsigned NOT NULL, > + `buildername` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > + `slave` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > + `revision` varchar(40) COLLATE utf8_unicode_ci NOT NULL, > + `starttime` int(10) unsigned NOT NULL, Why are they ints? A datetime column uses the same amount of bytes as this and has possibly many other benefits such as making it impossible to insert junk (real world) date times. @@ +65,5 @@ > + `starttime` int(10) unsigned NOT NULL, > + `endtime` int(10) unsigned NOT NULL, > + `result` varchar(20) COLLATE utf8_unicode_ci NOT NULL, > + `branch` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > + `log` varchar(256) COLLATE utf8_unicode_ci, If this is log data, why isn't it TEXT? Isn't there a risk of running out of 256 characters? @@ +71,5 @@ > + KEY `revision` (`revision`,`branch`(255)) > +) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; > + > +-- -------------------------------------------------------- > + If 'branch' is going to be a very commonly queried attribute, should we perhaps already add: CREATE INDEX runs_user_idx ON runs (branch); @@ +80,5 @@ > +CREATE TABLE IF NOT EXISTS `runs_notes` ( > + `runid` int(10) unsigned NOT NULL, > + `who` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > + `note` text COLLATE utf8_unicode_ci NOT NULL, > + `timestamp` int(10) unsigned NOT NULL, Again, why isn't this a datetime?
Attachment #555896 - Flags: review?(peterbe) → review+
(In reply to Peter Bengtsson [:peterbe] from comment #14) > I'd like some light on the use of INT instead of DATETIME for some columns. > Also, it would be much better with another form of INSERT statements so you > can see what columns are being used and don't depend on the order they were > defined in the schema. We were using ints in mongo too. I like using ints in mysql simply because the apis dont cast to the language internal date object, which is int in php, so we would have to SELECT unix_timestamp() anyways because otherwise its just a pain. Other opinions? > > + `hidden` tinyint(1) NOT NULL, > > Why is this an int? Why not a boolean? A boolean is explicit and obvious but > when it's an int I have to wonder if there's some sort of index like > "hidden=3 means was hidden now not hidden". http://dev.mysql.com/doc/refman/5.0/en/numeric-type-overview.html: BOOL, BOOLEAN These types are synonyms for TINYINT(1). So I really created that as boolean, but the pma export made a tinyint(1) out of it. > > + `log` varchar(256) COLLATE utf8_unicode_ci, > > If this is log data, why isn't it TEXT? Isn't there a risk of running out of > 256 characters? This is a url to the log stored on ftp, so not the log contents themselves. So yes, I would like some comments on the date types aswell. For the php-side, using plain ints is the easiest.
(In reply to Markus Stange from comment #13) > Comment on attachment 555896 [details] [diff] [review] > use mysql instead of mongodb > > >diff --git a/dataimport/import-buildbot-data.py b/dataimport/import-buildbot-data.py > > >+ if key in run_info: > >+ params.append(run_info[key]) > >+ else: > >+ params.append(None) > > This can also be written as params.append(run_info.get(key)) IIRC. The reason it's guarded like this is that the "log" key may not exist. This made me concerned that other keys may not exist so I just took a generic approach. We could use defaultdict and it'd look nicer, I was just in a hurry :) > >- if existing["buildername"] is None and buildername is not None: > >- db.builders.update({"name": name}, {"$set": {"buildername": buildername}}) > > This part wasn't translated into MySQL. Can you explain it? So if there is already a row for this name/branch/buildername ("existing" in the old code), why would we need to update the buildername on that row? Thanks for catching that, I should have called this out specifically that I removed it since I wasn't clear on what it was for, my fault. > Looks good otherwise! > > I have no opinion on where to store authentication details. Aren't we > completely screwed anyway if somebody can get access to the process list? Just bad form IMHO. It's conceivable that someone could have an account on the machine that we would not want to have logging into the db r/w.
(In reply to Arpad Borsos (Swatinem) from comment #12) > Comment on attachment 555896 [details] [diff] [review] > use mysql instead of mongodb > > Review of attachment 555896 [details] [diff] [review]: > ----------------------------------------------------------------- > > Looks good. > > ::: dataimport/import-buildbot-data.py > @@ +201,5 @@ > > + if key in run_info: > > + params.append(run_info[key]) > > + else: > > + params.append(None) > > + cursor.execute("INSERT INTO runs VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)", params) > > I actually prefer the SET var = val syntax. It makes things a lot easier for > the reader and allows schema changes that change the ordering. I do in general too, I'll change it. I assume you are ok with this instead of SET though right?: INSERT INTO table (col1, col2) VALUES (val1, val2); > @@ +210,5 @@ > > + cursor = db.cursor() > > + cursor.execute("SELECT count(*) FROM builders WHERE name = %s", name) > > + count = cursor.fetchone()[0] > > + if count == 0: > > + params = (name, branch, buildername, 0) > > Lets just add a default for hidden in the schema. Ok, thanks!
(In reply to Robert Helmer [:rhelmer] from comment #17) > (In reply to Arpad Borsos (Swatinem) from comment #12) > > Comment on attachment 555896 [details] [diff] [review] > > use mysql instead of mongodb > > > > Review of attachment 555896 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > Looks good. > > > > ::: dataimport/import-buildbot-data.py > > @@ +201,5 @@ > > > + if key in run_info: > > > + params.append(run_info[key]) > > > + else: > > > + params.append(None) > > > + cursor.execute("INSERT INTO runs VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)", params) > > > > I actually prefer the SET var = val syntax. It makes things a lot easier for > > the reader and allows schema changes that change the ordering. > > > I do in general too, I'll change it. > > I assume you are ok with this instead of SET though right?: > INSERT INTO table (col1, col2) > VALUES (val1, val2); > Definitely!
Attached patch use mysql instead of mongodb (obsolete) — Splinter Review
* explicit column names in INSERTs * builders.hidden now boolean DEFAULT FALSE * re-add update for buildername if name already exists (missed from mongo version) * added indexes for builders.name and runs.branch * honor MYSQL_PASSWORD env var (command-line switch has precedence) I've left alone the questions peterbe had on the schema about date versus int, I'll defer this to Arpad. I also would prefer to use a real datetime type rather than ints.
Assignee: nobody → rhelmer
Attachment #555896 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #556044 - Flags: review?(peterbe)
Attachment #556044 - Flags: feedback?(mstange)
Attachment #556044 - Flags: feedback?(arpad.borsos)
Attached patch use mysql instead of mongodb (obsolete) — Splinter Review
Diff against the tip of hg rather than my previous patch.
Attachment #556044 - Attachment is obsolete: true
Attachment #556044 - Flags: review?(peterbe)
Attachment #556044 - Flags: feedback?(mstange)
Attachment #556044 - Flags: feedback?(arpad.borsos)
Attachment #556045 - Flags: review?(peterbe)
Attachment #556045 - Flags: feedback?(mstange)
Attachment #556045 - Flags: feedback?(arpad.borsos)
Attached patch use mysql instead of mongodb (obsolete) — Splinter Review
* return True if update builders path elif branch is taken
Attachment #556045 - Attachment is obsolete: true
Attachment #556045 - Flags: review?(peterbe)
Attachment #556045 - Flags: feedback?(mstange)
Attachment #556045 - Flags: feedback?(arpad.borsos)
Attachment #556049 - Flags: review?(peterbe)
Attachment #556049 - Flags: feedback?(mstange)
Attachment #556049 - Flags: feedback?(arpad.borsos)
dumb question, builders.name is declared as a primary key, thus it has a unique index, why do you need an additional index on it? If it works like I think, you are adding useless overhead there.
Comment on attachment 556049 [details] [diff] [review] use mysql instead of mongodb >diff --git a/dataimport/import-buildbot-data.py b/dataimport/import-buildbot-data.py >index 8fcee43..a35deee 100644 >--- a/dataimport/import-buildbot-data.py >+++ b/dataimport/import-buildbot-data.py > def add_builder_to_db(builder, db): > (name, branch, buildername) = builder >- existing = db.builders.find_one({"name": name}, {"buildername": 1}) >- if not existing: >- db.builders.insert({ >- "name": name, >- "branch": branch, >- "buildername": buildername, >- "history": [{ >- "date": int(time.time()), >- "action": "insert" >- }] >- }) >+ cursor = db.cursor() >+ cursor.execute("SELECT count(*) FROM builders WHERE name = %s", name) >+ count = cursor.fetchone()[0] >+ if count == 0: >+ params = (name, branch, buildername) >+ cursor.execute("""INSERT INTO builders (name, branch, buildername) >+ VALUES (%s, %s, %s)""", params) I have been testing this, and discovered that buildername can sometimes be None (maybe this is why there is the buildername update that mstange pointed out?) I think we probably want to allow NULL for this in the schema.
Comment on attachment 556049 [details] [diff] [review] use mysql instead of mongodb Review of attachment 556049 [details] [diff] [review]: ----------------------------------------------------------------- If you really want to use datetime, go ahead, and make it please default to NOW(). I will just SELECT unix_timestamp() from the php side, we are using int timestamps in js as well, the casting to date happens clientside. Sorry for the delay, I´ll only get to the php tomorrow, I´m not on my devel machine right now. ::: dataimport/schema.sql @@ +34,5 @@ > + PRIMARY KEY (`name`), > + KEY `hidden` (`hidden`) > +) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; > + > +CREATE INDEX builders_name_idx ON builders (name); As Marco mentioned, this is already a primary key. @@ +72,5 @@ > + PRIMARY KEY (`id`), > + KEY `revision` (`revision`,`branch`(255)) > +) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; > + > +CREATE INDEX runs_user_idx ON runs (branch); This as well is already part of the compound key revision, branch. I´m not really sure what the difference is there...
Attachment #556049 - Flags: feedback?(arpad.borsos) → feedback+
(In reply to Arpad Borsos (Swatinem) from comment #24) > @@ +72,5 @@ > > + PRIMARY KEY (`id`), > > + KEY `revision` (`revision`,`branch`(255)) > > +) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; > > + > > +CREATE INDEX runs_user_idx ON runs (branch); > > This as well is already part of the compound key revision, branch. I´m not > really sure what the difference is there... If your query filters on revision AND on branch, the additional index is useless. Otherwise if you have queries hitting only branch, but not revision, the additional index is useful. It really depends on the queries you are going to run on it.
> I have been testing this, and discovered that buildername can sometimes be > None (maybe this is why there is the buildername update that mstange pointed > out?) Exactly. Here's the comment I typed a few hours ago but didn't submit: (In reply to Robert Helmer [:rhelmer] from comment #16) > > >- if existing["buildername"] is None and buildername is not None: > > >- db.builders.update({"name": name}, {"$set": {"buildername": buildername}}) > > > > This part wasn't translated into MySQL. > > > Can you explain it? So if there is already a row for this > name/branch/buildername ("existing" > in the old code), why would we need to update the buildername on that row? A builder has a "name" and a "buildername" (sorry for the utterly confusing naming). "name" is used for the internal identification of a builder, and "buildername" is what appears in the UI. Example: name: mozilla-central-macosx64-xulrunner buildername: OS X 10.6.2 mozilla-central xulrunner We only have a value for "buildername" if this JSON document contained a run from that builder. If there's no such run, we insert the builder with an empty buildername and hope that at some point in the future we encounter a run on that builder so that we can fill in the missing buildername.
Attached patch use mysql instead of mongodb (obsolete) — Splinter Review
Same as last time plus: * remove indexes (let's look at the queries the frontend uses first) * use datetime and timestamp instead of int for dates
Attachment #556049 - Attachment is obsolete: true
Attachment #556049 - Flags: review?(peterbe)
Attachment #556049 - Flags: feedback?(mstange)
Attachment #556119 - Flags: review?(peterbe)
Attachment #556119 - Flags: feedback?(mstange)
Attachment #556119 - Flags: feedback?(arpad.borsos)
Comment on attachment 556119 [details] [diff] [review] use mysql instead of mongodb Review of attachment 556119 [details] [diff] [review]: ----------------------------------------------------------------- Regarding the timestamps, you are mixing both timestamp and datetime types. From reading the docs, the difference is that timestamp is limited to 1970-2038 white datetime isnt. I dont have any preference to any one as long as its consistent. ::: dataimport/import-buildbot-data.py @@ +204,5 @@ > + params.append(run_info[key]) > + else: > + params.append(None) > + cursor.execute("""INSERT INTO runs (id, buildername, slave, revision, starttime, endtime, result, branch, log) > + VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)""", params) Oh, this just occurred to me. If we have overwrite set, we probably want a ON DUPLICATE KEY UPDATE here.
(In reply to Arpad Borsos (Swatinem) from comment #28) > Comment on attachment 556119 [details] [diff] [review] > use mysql instead of mongodb > > Review of attachment 556119 [details] [diff] [review]: > ----------------------------------------------------------------- > > Regarding the timestamps, you are mixing both timestamp and datetime types. > From reading the docs, the difference is that timestamp is limited to > 1970-2038 white datetime isnt. I dont have any preference to any one as long > as its consistent. I'd prefer to just use datetime for both, the only reason for using timestamp is that MySQL supports having a default of CURRENT_TIMESTAMP (it does not support this for datetime AFAICT). I'd be curious what peterbe suggests, since he brought this up :) > ::: dataimport/import-buildbot-data.py > @@ +204,5 @@ > > + params.append(run_info[key]) > > + else: > > + params.append(None) > > + cursor.execute("""INSERT INTO runs (id, buildername, slave, revision, starttime, endtime, result, branch, log) > > + VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)""", params) > > Oh, this just occurred to me. If we have overwrite set, we probably want a > ON DUPLICATE KEY UPDATE here. Good catch, I noticed that there was a "--force" mode, but had not tested. We do indeed get: _mysql_exceptions.IntegrityError: (1062, "Duplicate entry '6142369' for key 'PRIMARY'") I've changed this locally to: cursor.execute("""INSERT INTO runs (id, buildername, slave, revision, starttime, endtime, result, branch, log) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s) ON DUPLICATE KEY UPDATE id=%s, buildername=%s, slave=%s, revision=%s, starttime=%s, endtime=%s, result=%s, branch=%s, log=%s""", params * 2) Suggestions on something less ugly welcome :)
Attached patch preparation: unify tbpl config (obsolete) — Splinter Review
Just moves all the config into one file.
Attachment #556232 - Flags: review?(mstange)
Comment on attachment 556232 [details] [diff] [review] preparation: unify tbpl config The ?> at the end of config.php is unnecessary.
Attachment #556232 - Flags: review?(mstange) → review+
Attachment #556119 - Flags: feedback?(mstange) → feedback+
(In reply to Robert Helmer [:rhelmer] from comment #16) > (In reply to Markus Stange from comment #13) > > Comment on attachment 555896 [details] [diff] [review] > > use mysql instead of mongodb > > > > >diff --git a/dataimport/import-buildbot-data.py b/dataimport/import-buildbot-data.py > > > > >+ if key in run_info: > > >+ params.append(run_info[key]) > > >+ else: > > >+ params.append(None) > > > > This can also be written as params.append(run_info.get(key)) IIRC. > > The reason it's guarded like this is that the "log" key may not exist. This > made me concerned that other keys may not exist so I just took a generic > approach. > > We could use defaultdict and it'd look nicer, I was just in a hurry :) I don't think we need a defaultdict. dict.get(i) returns dict[i] if the key i exists, and None otherwise.
Attached patch port php part to mysql (obsolete) — Splinter Review
So this is the php part, I hope I haven’t missed anything. Runs on http://tbpl.swatinem.de/?usebuildbot=1. The default view works, as well as the hidden builder admin and I have tested commenting. Needs more thorough testing though. philor? A big WTF for me is that none of the changes I do (like hiding) are reflected in phpmyadmin?!? rhelmer: I made changes to the schema. would you please move those to your patch, move the schema into the root directory and mention in the readme that one needs to import it? I hope that minimizes the merge conflicts there a bit.
Attachment #556249 - Flags: review?(mstange)
Attached patch port php part to mysql (obsolete) — Splinter Review
Sorry, my bad, I haven’t actually qpushed my server :-) Now with two additional fixes
Attachment #556249 - Attachment is obsolete: true
Attachment #556249 - Flags: review?(mstange)
Attachment #556250 - Flags: review?(mstange)
Just a small observation: Importing by date told me it updated 5000+ builders, while I only have like 2600 in my db. So probably get_builders() should group by name and "UPDATE" in python instead of going through a COUNT and UPDATE for the duplicated entries. Feel free to leave this for later, its not urgent.
Comment on attachment 556250 [details] [diff] [review] port php part to mysql >-echo json_encode(iterator_to_array($result->sort(array('name'=>1)), false)) . "\n"; >+ ORDER BY buildername ASC;"); This is a behavior change, but I agree that it makes more sense this way. I haven't found any mistakes. Overall this is a lot more manageable than I expected!
Attachment #556250 - Flags: review?(mstange) → review+
(In reply to Markus Stange from comment #36) > >-echo json_encode(iterator_to_array($result->sort(array('name'=>1)), false)) . "\n"; > > >+ ORDER BY buildername ASC;"); > > This is a behavior change, but I agree that it makes more sense this way. I somehow expect that the ordering in mongo hasn’t worked at all. The machines in the hidden builders admin don’t seem to follow any particular order.
Well, looking at the data-name attribute of the LIs in the list, the order seems to be correct to me. But again, ordering by buildername makes more sense.
It all looks good to me. The order doesn't matter much, because neither set of names has that much of a real plan to its creation, so anyone who's used to hiding builds is used to cmd+f being the only way to find them. Doing any better would require getting releng to agree on a plan for future ones, and then writing a parser for that plus the legacy ones, which is pretty far down the priority list considering what an enormous improvement having a filter already is.
(In reply to Markus Stange from comment #32) > (In reply to Robert Helmer [:rhelmer] from comment #16) > > (In reply to Markus Stange from comment #13) > > > Comment on attachment 555896 [details] [diff] [review] > > > use mysql instead of mongodb > > > > > > >diff --git a/dataimport/import-buildbot-data.py b/dataimport/import-buildbot-data.py > > > > > > >+ if key in run_info: > > > >+ params.append(run_info[key]) > > > >+ else: > > > >+ params.append(None) > > > > > > This can also be written as params.append(run_info.get(key)) IIRC. > > > > The reason it's guarded like this is that the "log" key may not exist. This > > made me concerned that other keys may not exist so I just took a generic > > approach. > > > > We could use defaultdict and it'd look nicer, I was just in a hurry :) > > I don't think we need a defaultdict. dict.get(i) returns dict[i] if the key > i exists, and None otherwise. Oops, right you are. I'll make that change, thanks!
(In reply to Arpad Borsos (Swatinem) from comment #35) > Just a small observation: > Importing by date told me it updated 5000+ builders, while I only have like > 2600 in my db. So probably get_builders() should group by name and "UPDATE" > in python instead of going through a COUNT and UPDATE for the duplicated > entries. > > Feel free to leave this for later, its not urgent. I was just about to comment on this; I think it is easy enough to fix. Was this any different with mongo? The count comes from: count = sum([add_builder_to_db(builder, db) for builder in get_builders(j)]) And it looks like this case would've always returned true before... I think the simplest fix would be to modify the update to do "... AND buildername != buildername" then only return True if the number of rows modified > 1. I think we still want the SQL count() since that's how we're deciding to insert, update, or do nothing (although I suppose we could just replace all this with an "INSERT ... ON DUPLICATE KEY" and return True/False if the insert/update modified any rows). I'm not sure what you had in mind with 'group by name and "UPDATE" in python', can you clarify for me? Thanks!
(In reply to Robert Helmer [:rhelmer] from comment #41) > (In reply to Arpad Borsos (Swatinem) from comment #35) > > Just a small observation: > > Importing by date told me it updated 5000+ builders, while I only have like > > 2600 in my db. So probably get_builders() should group by name and "UPDATE" > > in python instead of going through a COUNT and UPDATE for the duplicated > > entries. > > > > Feel free to leave this for later, its not urgent. > > I was just about to comment on this; I think it is easy enough to fix. > > Was this any different with mongo? Yes, because this condition is gone: "if existing["buildername"] is None". But apart from the builder update count it won't make a difference in practice, I think.
(In reply to Robert Helmer [:rhelmer] from comment #41) > And it looks like this case would've always returned true before... I think > the simplest fix would be to modify the update to do "... AND buildername != > buildername" then only return True if the number of rows modified > 1. Actually that's not even necessary, we'll still get the correct update count back with a patch as small as: - return True + if cursor.rowcount == 0: + return False + else: + return True
(In reply to Markus Stange from comment #42) > (In reply to Robert Helmer [:rhelmer] from comment #41) > > (In reply to Arpad Borsos (Swatinem) from comment #35) > > > Just a small observation: > > > Importing by date told me it updated 5000+ builders, while I only have like > > > 2600 in my db. So probably get_builders() should group by name and "UPDATE" > > > in python instead of going through a COUNT and UPDATE for the duplicated > > > entries. > > > > > > Feel free to leave this for later, its not urgent. > > > > I was just about to comment on this; I think it is easy enough to fix. > > > > Was this any different with mongo? > > Yes, because this condition is gone: "if existing["buildername"] is None". > But apart from the builder update count it won't make a difference in > practice, I think. Ah right ok, I think instead of having to do another count() in mysql we can just look at the number of rows updated (per comment 43) to get the correct count back again.
Comment on attachment 556232 [details] [diff] [review] preparation: unify tbpl config Oh, you probably want to make a change to php/.htaccess, too.
Attached patch use mysql instead of mongodb (obsolete) — Splinter Review
* use timestamp instead of datetime (comment 28) * support --force (INSERT ... ON DUPLICATE KEY UPDATE) * return correct builder update count
Attachment #556119 - Attachment is obsolete: true
Attachment #556119 - Flags: review?(peterbe)
Attachment #556119 - Flags: feedback?(arpad.borsos)
Attachment #556280 - Flags: review?(peterbe)
Attachment #556280 - Flags: feedback?(mstange)
Attachment #556280 - Flags: feedback?(arpad.borsos)
Attached patch use mysql instead of mongodb (obsolete) — Splinter Review
same as comment 46 but remove extraneous vagrant patches
Attachment #556280 - Attachment is obsolete: true
Attachment #556280 - Flags: review?(peterbe)
Attachment #556280 - Flags: feedback?(mstange)
Attachment #556280 - Flags: feedback?(arpad.borsos)
Attachment #556282 - Flags: review?(peterbe)
Attachment #556282 - Flags: feedback?(mstange)
Attachment #556282 - Flags: feedback?(arpad.borsos)
Attachment #556282 - Flags: feedback?(arpad.borsos) → feedback+
Attached patch preparation: unify tbpl config (obsolete) — Splinter Review
refreshed for the .htaccess changes
Attachment #556232 - Attachment is obsolete: true
Attachment #556282 - Flags: feedback?(mstange) → feedback+
Comment on attachment 556282 [details] [diff] [review] use mysql instead of mongodb Review of attachment 556282 [details] [diff] [review]: ----------------------------------------------------------------- Giving this an r- because I don't think we have the primary keys right. I could be wrong, perhaps they're really not needed and perhaps I'm just being paranoid in which case I'm happy to be proven otherwise. ::: dataimport/schema.sql @@ +30,5 @@ > + `name` varchar(128) COLLATE utf8_unicode_ci NOT NULL, > + `branch` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > + `buildername` varchar(256) COLLATE utf8_unicode_ci, > + `hidden` boolean NOT NULL DEFAULT FALSE, > + PRIMARY KEY (`name`), Maybe because I haven't been involved in every comment on this bug but I think I saw that it was mentioned that 'name' is sometimes NULL. Also, can it every repeat? Why don't we use an auto-incrementing integer here instead? Having a VARCHAR as primary key is only safe if we know for certain that the value is never repeated and always something. Performancewise I think an INT or VARCHAR primary key is about the same. @@ +47,5 @@ > + `action` varchar(20) COLLATE utf8_unicode_ci NOT NULL, > + `who` varchar(256) COLLATE utf8_unicode_ci, > + `reason` varchar(256) COLLATE utf8_unicode_ci, > + `ip` varchar(20) COLLATE utf8_unicode_ci, > + PRIMARY KEY (`builder`,`date`) Same here, why not use an INT instead. The date is certainly going to be a repeated value. Sort of sorts feasible that the same builder makes to actions (aka. history) on the same date. @@ +57,5 @@ > +-- Table structure for table `runs` > +-- > + > +CREATE TABLE IF NOT EXISTS `runs` ( > + `id` int(10) unsigned NOT NULL, Why doesn't this have a 'AUTO_INCREMENT' ? @@ +77,5 @@ > +-- Table structure for table `runs_notes` > +-- > + > +CREATE TABLE IF NOT EXISTS `runs_notes` ( > + `runid` int(10) unsigned NOT NULL, This table doesn't even have a primary key. I think we should prepend "`id` integer auto_increment not null primary key"
Attachment #556282 - Flags: review?(peterbe) → review-
(In reply to Peter Bengtsson [:peterbe] from comment #49) > @@ +30,5 @@ > > + `name` varchar(128) COLLATE utf8_unicode_ci NOT NULL, > > + `branch` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > > + `buildername` varchar(256) COLLATE utf8_unicode_ci, > > + `hidden` boolean NOT NULL DEFAULT FALSE, > > + PRIMARY KEY (`name`), > > Maybe because I haven't been involved in every comment on this bug but I > think I saw that it was mentioned that 'name' is sometimes NULL. Also, can > it every repeat? buildername may be NULL, name not. Its how builders are uniquely identified. > @@ +47,5 @@ > > + `action` varchar(20) COLLATE utf8_unicode_ci NOT NULL, > > + `who` varchar(256) COLLATE utf8_unicode_ci, > > + `reason` varchar(256) COLLATE utf8_unicode_ci, > > + `ip` varchar(20) COLLATE utf8_unicode_ci, > > + PRIMARY KEY (`builder`,`date`) > > Same here, why not use an INT instead. > The date is certainly going to be a repeated value. Sort of sorts feasible > that the same builder makes to actions (aka. history) on the same date. We could make this int, no problem. Key on builder only then. > > +CREATE TABLE IF NOT EXISTS `runs` ( > > + `id` int(10) unsigned NOT NULL, > > Why doesn't this have a 'AUTO_INCREMENT' ? Because we assign the ids ourselves, they are included in the json files we parse. > > +CREATE TABLE IF NOT EXISTS `runs_notes` ( > > + `runid` int(10) unsigned NOT NULL, > > This table doesn't even have a primary key. > I think we should prepend > > "`id` integer auto_increment not null primary key" We could, sure. Playing around with mongo a bit, I´m not sure "hasMany" relationships need a primary key (in mongo they are just an array inside the object), although thats what we get taught when working with sql.
(In reply to Arpad Borsos (Swatinem) from comment #50) > (In reply to Peter Bengtsson [:peterbe] from comment #49) > > @@ +30,5 @@ > > > + `name` varchar(128) COLLATE utf8_unicode_ci NOT NULL, > > > + `branch` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > > > + `buildername` varchar(256) COLLATE utf8_unicode_ci, > > > + `hidden` boolean NOT NULL DEFAULT FALSE, > > > + PRIMARY KEY (`name`), > > > > Maybe because I haven't been involved in every comment on this bug but I > > think I saw that it was mentioned that 'name' is sometimes NULL. Also, can > > it every repeat? > > buildername may be NULL, name not. Its how builders are uniquely identified. To answer the other question, I think the idea is that if it does repeat then we catch that and do an UPDATE instead (the idea being that this can initially be NULL or otherwise incorrect, and we will later correct it). If we can make the schema and/or code clearer here to reflect this intent I am all for it, any suggestions ? :) > > @@ +47,5 @@ > > > + `action` varchar(20) COLLATE utf8_unicode_ci NOT NULL, > > > + `who` varchar(256) COLLATE utf8_unicode_ci, > > > + `reason` varchar(256) COLLATE utf8_unicode_ci, > > > + `ip` varchar(20) COLLATE utf8_unicode_ci, > > > + PRIMARY KEY (`builder`,`date`) > > > > Same here, why not use an INT instead. > > The date is certainly going to be a repeated value. Sort of sorts feasible > > that the same builder makes to actions (aka. history) on the same date. > > We could make this int, no problem. Key on builder only then. So is the suggestion here to make just builder the primary key, or to add an id(int) column? > > > +CREATE TABLE IF NOT EXISTS `runs` ( > > > + `id` int(10) unsigned NOT NULL, > > > > Why doesn't this have a 'AUTO_INCREMENT' ? > > Because we assign the ids ourselves, they are included in the json files we > parse. > > > > +CREATE TABLE IF NOT EXISTS `runs_notes` ( > > > + `runid` int(10) unsigned NOT NULL, > > > > This table doesn't even have a primary key. > > I think we should prepend > > > > "`id` integer auto_increment not null primary key" > > We could, sure. Playing around with mongo a bit, I´m not sure "hasMany" > relationships need a primary key (in mongo they are just an array inside the > object), although thats what we get taught when working with sql. I know a lot of people consider this good form in relational databases to always have an internally unique id in the table as the primary key, I don't think it'd hurt in any case.
(In reply to Robert Helmer [:rhelmer] from comment #51) > I know a lot of people consider this good form in relational databases to > always have an internally unique id in the table as the primary key, I don't > think it'd hurt in any case. (the exception being join aka association tables, imho)
(In reply to Robert Helmer [:rhelmer] from comment #51) > > > @@ +47,5 @@ > > > > + `action` varchar(20) COLLATE utf8_unicode_ci NOT NULL, > > > > + `who` varchar(256) COLLATE utf8_unicode_ci, > > > > + `reason` varchar(256) COLLATE utf8_unicode_ci, > > > > + `ip` varchar(20) COLLATE utf8_unicode_ci, > > > > + PRIMARY KEY (`builder`,`date`) > > > > > > Same here, why not use an INT instead. > > > The date is certainly going to be a repeated value. Sort of sorts feasible > > > that the same builder makes to actions (aka. history) on the same date. > > > > We could make this int, no problem. Key on builder only then. > > > So is the suggestion here to make just builder the primary key, or to add an > id(int) column? Add an int. And a non-unique key on builder. Since we have more than one history entry per builder. > I know a lot of people consider this good form in relational databases to > always have an internally unique id in the table as the primary key, I don't > think it'd hurt in any case. Sure. I just like to question everything I know when I learn something new ;-)
Attached patch use mysql instead of mongodb (obsolete) — Splinter Review
Not sure if I addressed everyone's concerns here; I'll let you two figure it out if not :) Here's what I've done: * refactored schema to be more readable (IMHO; let me know if you see any semantic change, AFAICT from the docs "KEY" means "INDEX" in this context, there is no reason not to define the FK and indexes in the CREATE TABLE that I can see) * add an "id integer auto_increment not null primary key" on all 3 tables * foreign keys now reference "id" only (note I added support for this in the insert) * remove "builder" on builders_history, can get this with a "join builders on builder_id = builders.id" * rename runs.run_id to runs.buildbot_id for clarity (I threw this in because I found it confusing to have runs_notes.run_id reference runs.id, and have runs.run_id hanging about) I took a quick look through the PHP patch and I guess this might require changes depending on what "id" is used for (I notice that is in a few selects). I don't see runs.run_id used anywhere though. Removing builders_history.builder would impact php/getBuilderHistory.php, php/updateBuilders.php (I can revert that if it's too much of a pain, just seemed like a reasonable normalization).
Attachment #556282 - Attachment is obsolete: true
Attachment #556510 - Flags: review?(peterbe)
Attachment #556510 - Flags: review?(arpad.borsos)
(In reply to Robert Helmer [:rhelmer] from comment #54) > * add an "id integer auto_increment not null primary key" on all 3 tables s/all 3/all 4/
BTW thanks everyone for putting up with so much churn on this bug, I am hopeful that we'll come up with something that everyone is satisfied with.
Comment on attachment 556510 [details] [diff] [review] use mysql instead of mongodb Review of attachment 556510 [details] [diff] [review]: ----------------------------------------------------------------- The schema starts to look much more plain and right now. Great work!
Attachment #556510 - Flags: review?(peterbe) → review+
Comment on attachment 556510 [details] [diff] [review] use mysql instead of mongodb Review of attachment 556510 [details] [diff] [review]: ----------------------------------------------------------------- This needs a few updates, right. Will probably get to it later today ::: dataimport/schema.sql @@ +31,5 @@ > + `name` varchar(128) COLLATE utf8_unicode_ci NOT NULL, > + `branch` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > + `buildername` varchar(256) COLLATE utf8_unicode_ci, > + `hidden` boolean NOT NULL DEFAULT FALSE, > + INDEX builders_name_idx (name), Probably want a unique for this. @@ +50,5 @@ > + `who` varchar(256) COLLATE utf8_unicode_ci, > + `reason` varchar(256) COLLATE utf8_unicode_ci, > + `ip` varchar(20) COLLATE utf8_unicode_ci, > + INDEX builders_id_date_idx (builder_id), > + INDEX builders_history_date_idx (date), builders_id_date_idx has the wrong name. Also im not sure we need a date index at all. I just added it back then because I thought it would make a good compound primary. @@ +73,5 @@ > + `result` varchar(20) COLLATE utf8_unicode_ci NOT NULL, > + `branch` varchar(256) COLLATE utf8_unicode_ci NOT NULL, > + `log` varchar(256) COLLATE utf8_unicode_ci, > + INDEX runs_buildbot_id_idx (buildbot_id), > + INDEX runs_revision_buildername_idx (revision,buildername) This needs to be branch.
Attachment #556510 - Flags: review?(arpad.borsos) → review-
Good catches, thanks!
Attachment #556250 - Attachment is obsolete: true
Attachment #556562 - Flags: review?(arpad.borsos)
Comment on attachment 556562 [details] [diff] [review] use mysql instead of mongodb Review of attachment 556562 [details] [diff] [review]: ----------------------------------------------------------------- Good. I´m off now, hopefully i will still find some time today to update my patches, but probably have to defer that to tomorrow.
Attachment #556562 - Flags: review?(arpad.borsos) → review+
Refreshed. I moved schema.sql to the root dir and removed some useless comments therein.
Attachment #556284 - Attachment is obsolete: true
Attachment #556776 - Flags: review+
I had to make runs_buildbot_id_idx UNIQUE, otherwise the ON DUPLICATE KEY UPDATE will never work. Man, agreeing on a schema really is annoyingly difficult ;-) Otherwise just a few changes for the new schema, nothing fancy. Do you want to take another look at it Markus?
Attachment #556510 - Attachment is obsolete: true
Attachment #556785 - Flags: review?(mstange)
Comment on attachment 556785 [details] [diff] [review] port php part to mysql Looks good as far as I can tell.
Attachment #556785 - Flags: review?(mstange) → review+
Lets get this landed then. rhelmer: will you take it from here? My patches already include hg headers.
(In reply to Arpad Borsos (Swatinem) from comment #64) > Lets get this landed then. > rhelmer: will you take it from here? My patches already include hg headers. Pushed to: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/ Revs: 22f844d0e37a 1f260d60531a fe89735ec762 I'll test this in the vagrant env and see if it's all working ok, lmk if you notice anything amiss. Thanks everyone!
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: