Closed
Bug 806121
Opened 12 years ago
Closed 12 years ago
Update TBPL schema
Categories
(Data & BI Services Team :: DB: MySQL, task)
Data & BI Services Team
DB: MySQL
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: emorley, Assigned: emorley)
References
Details
Attachments
(6 files, 1 obsolete file)
952 bytes,
patch
|
scabral
:
review+
|
Details | Diff | Splinter Review |
1.01 KB,
patch
|
scabral
:
review+
|
Details | Diff | Splinter Review |
4.61 KB,
patch
|
scabral
:
review+
Swatinem
:
feedback+
|
Details | Diff | Splinter Review |
2.90 KB,
patch
|
scabral
:
review+
Swatinem
:
feedback+
|
Details | Diff | Splinter Review |
1.09 KB,
patch
|
scabral
:
review+
|
Details | Diff | Splinter Review |
1.07 KB,
patch
|
scabral
:
review+
|
Details | Diff | Splinter Review |
In bug 799578 it was discovered that TBPL had a different schema for each of {checked-into-tree,production,dev} (!!). Bug 799578 made the checked-in schema match production so as to make it clearer where we were at - and now in this bug I'd like to clean up a number of issues that were found, as well as get dev matching production. (patches will be attached shortly)
Also, is there a specific place that I should document the process for TBPL schema changes (ie: please don't change production schema without CCing myself and/or ensuring bugs are filed to resolved) to ensure that we don't end up having to do bug 799578 again at a later date?
Assignee | ||
Comment 1•12 years ago
|
||
* Change needs applying to production only (dev doesn't have this additional index)
* Once the change is made to production, I will check this into the TBPL tree.
Attachment #675875 -
Flags: review?(scabral)
Assignee | ||
Updated•12 years ago
|
Attachment #675875 -
Attachment description: Remove duplicate index on builders.buildername → Part 1: Remove duplicate index on builders.buildername
Assignee | ||
Comment 2•12 years ago
|
||
* Change needs applying to production only (dev doesn't have this additional index)
* Once the change is made to production, I will check this into the TBPL tree.
Attachment #675880 -
Flags: review?(scabral)
Assignee | ||
Comment 3•12 years ago
|
||
* Will need applying to dev & prod, once Swatinem confirms this change is ok (see bug 799578's production tbpl schema attachment to see current AUTO_INCREMENT row counts for each table; numbers are well under).
* Once the change is made to production, I will check this into the TBPL tree.
Attachment #675881 -
Flags: review?(scabral)
Attachment #675881 -
Flags: feedback?(arpad.borsos)
Assignee | ||
Comment 4•12 years ago
|
||
* Will need applying to dev & prod, once Swatinem confirms this change is ok.
* Once the change is made to production, I will check this into the TBPL tree.
Attachment #675885 -
Flags: review?(scabral)
Attachment #675885 -
Flags: feedback?(arpad.borsos)
Assignee | ||
Comment 5•12 years ago
|
||
* This will be used by bug 803009 (and once we've switched over to using it, we can then remove the index on content, but I'll leave that for another bug).
* Will need applying to dev & prod.
* Once the change is made to production, I will check this into the TBPL tree.
Attachment #675886 -
Flags: review?(scabral)
Attachment #675886 -
Flags: feedback?(arpad.borsos)
Assignee | ||
Updated•12 years ago
|
Assignee | ||
Comment 6•12 years ago
|
||
* Needs applying to dev & prod
Attachment #675887 -
Flags: review?(scabral)
Updated•12 years ago
|
Attachment #675881 -
Flags: feedback?(arpad.borsos) → feedback+
Comment 7•12 years ago
|
||
Comment on attachment 675885 [details] [diff] [review]
Part 4: Use more appropriate default values
Since when did we have DEFAULT NULLs in there?
Attachment #675885 -
Flags: feedback?(arpad.borsos) → feedback+
Assignee | ||
Comment 8•12 years ago
|
||
(In reply to Arpad Borsos (Swatinem) from comment #7)
> Comment on attachment 675885 [details] [diff] [review]
> Part 4: Use more appropriate default values
>
> Since when did we have DEFAULT NULLs in there?
Since bug 799578, which made the checked-in schema reflect production (which had implicit DEFAULT NULLs, but I changed the in-tree schema to list them, so as to remind me to follow up).
Updated•12 years ago
|
Attachment #675886 -
Flags: feedback?(arpad.borsos) → feedback+
Comment 9•12 years ago
|
||
Comment on attachment 675887 [details] [diff] [review]
Part 6: Correct builders.builder_id index name
Review of attachment 675887 [details] [diff] [review]:
-----------------------------------------------------------------
This is merely a change in the name of an index. In MySQL, you can't rename an index, so it has to be dropped and re-created, and we should test whether this is an online operation or not.
Attachment #675887 -
Flags: review?(scabral) → review+
Comment 10•12 years ago
|
||
Comment on attachment 675886 [details] [diff] [review]
Part 5: Add an 'in_progress' boolean to runs_logs
Review of attachment 675886 [details] [diff] [review]:
-----------------------------------------------------------------
This is OK; My only question is, are you sure you want to default to "in_progress" as "false"?
Also, note that MySQL will do REALLY well with searches that look for the 'needle in a haystack' - so if most of the values are "false", searches on "true" will be very fast. But searches on "false" won't even bother to use the index, it's faster to do a full table scan than to look at more than about 20% of the data.....So if the data will be 50% true, 50% false, an index is useless.
Comment 11•12 years ago
|
||
Comment on attachment 675875 [details] [diff] [review]
Part 1: Remove duplicate index on builders.buildername
Review of attachment 675875 [details] [diff] [review]:
-----------------------------------------------------------------
yay for removing duplicate indexes, especially ones that are on a UTF-8 varchar(255)!
Attachment #675875 -
Flags: review?(scabral) → review+
Comment 12•12 years ago
|
||
Comment on attachment 675880 [details] [diff] [review]
Part 2: Only index runs_notes.run_id not (runs_notes.run_id, runs_notes.timestamp)
Review of attachment 675880 [details] [diff] [review]:
-----------------------------------------------------------------
Hrm, run_id needs to be leftmost in the index for the foreign key constraint. I can't see why timestamp is particularly needed, but I don't know the query profile of tbpl_mozilla_org. If there are queries that involve the run_id and the timestamp, keeping this index would be useful - for example, if there's a WHERE filter on run_id and then a sort by timestamp. I feel like we shouldn't remove the index unless we know for a fact that the query(ies) for which it was created are gone. I'd rather change dev to match prod in this case.
Comment 13•12 years ago
|
||
Comment on attachment 675881 [details] [diff] [review]
Part 3: Use more appropriate int types
Review of attachment 675881 [details] [diff] [review]:
-----------------------------------------------------------------
I approve wholeheartedly of the int -> int unsigned, and I assume you're aware of how much data you generate so that MEDIUMINT is appropriate. For reference, MEDIUMINT UNSIGNED has a range from 0-16,777,215.
Attachment #675881 -
Flags: review?(scabral) → review+
Comment 14•12 years ago
|
||
Comment on attachment 675885 [details] [diff] [review]
Part 4: Use more appropriate default values
Review of attachment 675885 [details] [diff] [review]:
-----------------------------------------------------------------
buildername is changing from DEFAULT NULL to NOT NULL with no default. Just want to make sure that's what you want to do (as opposed to something like NOT NULL DEFAULT '' - though my feeling is that NOT NULL with no default is appropriate here).
Attachment #675885 -
Flags: review?(scabral) → review+
Comment 15•12 years ago
|
||
(er, for comment 14, the same goes for all the other changes....)
Also, why change this:
`starttime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
to the default being "zero time"? Is it so you can easily see what hasn't started yet?
Assignee | ||
Comment 16•12 years ago
|
||
Comment on attachment 675880 [details] [diff] [review]
Part 2: Only index runs_notes.run_id not (runs_notes.run_id, runs_notes.timestamp)
Oops, typo in attachment name.
Attachment #675880 -
Attachment description: Part 2: Only index runs.run_id not (runs.run_id, runs.timestamp) → Part 2: Only index runs_notes.run_id not (runs_notes.run_id, runs_notes.timestamp)
Comment 17•12 years ago
|
||
Comment on attachment 675880 [details] [diff] [review]
Part 2: Only index runs_notes.run_id not (runs_notes.run_id, runs_notes.timestamp)
Review of attachment 675880 [details] [diff] [review]:
-----------------------------------------------------------------
Hrm, run_id needs to be leftmost in the index for the foreign key constraint. I can't see why timestamp is particularly needed, but I don't know the query profile of tbpl_mozilla_org. If there are queries that involve the run_id and the timestamp, keeping this index would be useful - for example, if there's a WHERE filter on run_id and then a sort by timestamp. I feel like we shouldn't remove the index unless we know for a fact that the query(ies) for which it was created are gone. I'd rather change dev to match prod in this case.
Assignee | ||
Comment 18•12 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #9)
> This is merely a change in the name of an index. In MySQL, you can't rename
> an index, so it has to be dropped and re-created, and we should test whether
> this is an online operation or not.
Ah, I didn't know it had to be dropped. This table is one of the smaller ones, so hopefully wouldn't be too bad (AUTO_INCREMENT=15209, and there has been some purging, so guessing number of rows fair bit lower than this).
(In reply to Sheeri Cabral [:sheeri] from comment #10)
> Comment on attachment 675886 [details] [diff] [review]
> Part 5: Add an 'in_progress' boolean to runs_logs
>
> This is OK; My only question is, are you sure you want to default to
> "in_progress" as "false"?
>
> Also, note that MySQL will do REALLY well with searches that look for the
> 'needle in a haystack' - so if most of the values are "false", searches on
> "true" will be very fast. But searches on "false" won't even bother to use
> the index, it's faster to do a full table scan than to look at more than
> about 20% of the data.....So if the data will be 50% true, 50% false, an
> index is useless.
Yes sorry 'true' would be more appropriate, since the logs will always be in_progress when first added, and then marked false shortly after.
Unless something goes wrong, we should only have double digits as in_progress=true, so the index should be pretty effective (AUTO_INCREMENT=2733054 for this table, though again, lots of purging).
(In reply to Sheeri Cabral [:sheeri] from comment #11)
> Comment on attachment 675875 [details] [diff] [review]
> Part 1: Remove duplicate index on builders.buildername
>
> Review of attachment 675875 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> yay for removing duplicate indexes, especially ones that are on a UTF-8
> varchar(255)!
I'm pretty sure we don't need UTF-8 for anything other than the logs (runs_logs.content) and maybe the name of people starring (runs_notes.who), although the latter isn't exactly necessary anyway. Would switching benefit us much? I also suspect that many of the varchars we use are much larger than they need be - is there an easy way to find out the current max lengths used for each field?
(In reply to Sheeri Cabral [:sheeri] from comment #12)
> Comment on attachment 675880 [details] [diff] [review]
> Part 2: Only index runs.run_id not (runs.run_id, runs.timestamp)
>
> Review of attachment 675880 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Hrm, run_id needs to be leftmost in the index for the foreign key
> constraint. I can't see why timestamp is particularly needed, but I don't
> know the query profile of tbpl_mozilla_org. If there are queries that
> involve the run_id and the timestamp, keeping this index would be useful -
> for example, if there's a WHERE filter on run_id and then a sort by
> timestamp. I feel like we shouldn't remove the index unless we know for a
> fact that the query(ies) for which it was created are gone. I'd rather
> change dev to match prod in this case.
Sorry I meant to add a comment explaining this one.
The only query we run on this is:
{
$notes = $db->prepare("
SELECT who, note,
unix_timestamp(timestamp) AS timestamp
FROM runs_notes
WHERE run_id = :runid
ORDER BY timestamp ASC;");
$notes->execute(array(":runid" => $run['id']));
$run["notes"] = $notes->fetchAll(PDO::FETCH_ASSOC);
unset($run["id"]); // don’t need the sql internal id
$result[] = $run;
}
Most of the time there will either be 0 or 1 results returned (95% the former). Perhaps 1% of the time there will be 2 (or rarely 3) results (the notes correspond to human added text to explain a failure; the only times we have duplicates is where multiple people have simultaneously added a note before their client updated). I'm presuming the index on timestamp wouldn't save much given how few results are returned?
(In reply to Sheeri Cabral [:sheeri] from comment #13)
> Comment on attachment 675881 [details] [diff] [review]
> Part 3: Use more appropriate int types
>
> Review of attachment 675881 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I approve wholeheartedly of the int -> int unsigned, and I assume you're
> aware of how much data you generate so that MEDIUMINT is appropriate. For
> reference, MEDIUMINT UNSIGNED has a range from 0-16,777,215.
Yeah, builders and builders_history have AUTO_INCREMENT=15209 and AUTO_INCREMENT=21129 respectively (for 2 years worth of use).
(In reply to Sheeri Cabral [:sheeri] from comment #14)
> Comment on attachment 675885 [details] [diff] [review]
> Part 4: Use more appropriate default values
>
> Review of attachment 675885 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> buildername is changing from DEFAULT NULL to NOT NULL with no default. Just
> want to make sure that's what you want to do (as opposed to something like
> NOT NULL DEFAULT '' - though my feeling is that NOT NULL with no default is
> appropriate here).
They are all mandatory values, so should all be specified. Is that an acceptable use-case of "NOT NULL with no default"? (excuse my mysql ignorance)
(In reply to Sheeri Cabral [:sheeri] from comment #15)
> (er, for comment 14, the same goes for all the other changes....)
>
> Also, why change this:
>
> `starttime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
> CURRENT_TIMESTAMP
>
> to the default being "zero time"? Is it so you can easily see what hasn't
> started yet?
The current time will always be incorrect for start time & the value should always be specified anyway. In the off-chance it is not, I'd rather an obvious failure rather than psuedo sorted correctly values. Is that ok?
Thank you for the feedback! :-)
Comment 19•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #18)
> (In reply to Sheeri Cabral [:sheeri] from comment #9)
> > This is merely a change in the name of an index. In MySQL, you can't rename
> > an index, so it has to be dropped and re-created, and we should test whether
> > this is an online operation or not.
>
> Ah, I didn't know it had to be dropped. This table is one of the smaller
> ones, so hopefully wouldn't be too bad (AUTO_INCREMENT=15209, and there has
> been some purging, so guessing number of rows fair bit lower than this).
>
*nod* that should be small enough to not have a problem, yes.
> (In reply to Sheeri Cabral [:sheeri] from comment #10)
> > Comment on attachment 675886 [details] [diff] [review]
> > Part 5: Add an 'in_progress' boolean to runs_logs
> >
[snip]
>
> Yes sorry 'true' would be more appropriate, since the logs will always be
> in_progress when first added, and then marked false shortly after.
>
> Unless something goes wrong, we should only have double digits as
> in_progress=true, so the index should be pretty effective
> (AUTO_INCREMENT=2733054 for this table, though again, lots of purging).
OK, great. I'll r+ an attachment with a default of true.
> (In reply to Sheeri Cabral [:sheeri] from comment #11)
> > Comment on attachment 675875 [details] [diff] [review]
> > Part 1: Remove duplicate index on builders.buildername
> >
> > Review of attachment 675875 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > yay for removing duplicate indexes, especially ones that are on a UTF-8
> > varchar(255)!
>
> I'm pretty sure we don't need UTF-8 for anything other than the logs
> (runs_logs.content) and maybe the name of people starring (runs_notes.who),
> although the latter isn't exactly necessary anyway. Would switching benefit
> us much? I also suspect that many of the varchars we use are much larger
> than they need be - is there an easy way to find out the current max lengths
> used for each field?
Sure, you can do SELECT (MAX(LENGTH(field1)) as size1, (MAX(LENGTH(field2)) as size2 FROM.....
.....or you can do SELECT field1,field2 FROM.... PROCEDURE ANALYSE() and get min/max/avg lengths. (or you can have me do them if you'd like)
> (In reply to Sheeri Cabral [:sheeri] from comment #12)
> > Comment on attachment 675880 [details] [diff] [review]
> > Part 2: Only index runs.run_id not (runs.run_id, runs.timestamp)
> Most of the time there will either be 0 or 1 results returned (95% the
> former). Perhaps 1% of the time there will be 2 (or rarely 3) results (the
> notes correspond to human added text to explain a failure; the only times we
> have duplicates is where multiple people have simultaneously added a note
> before their client updated). I'm presuming the index on timestamp wouldn't
> save much given how few results are returned?
Indeed, the index won't save much, and it will cause more overhead on the writes. I agree with nixing it.
> (In reply to Sheeri Cabral [:sheeri] from comment #13)
> > Comment on attachment 675881 [details] [diff] [review]
> Yeah, builders and builders_history have AUTO_INCREMENT=15209 and
> AUTO_INCREMENT=21129 respectively (for 2 years worth of use).
OK.
> (In reply to Sheeri Cabral [:sheeri] from comment #14)
> > Comment on attachment 675885 [details] [diff] [review]
> > Part 4: Use more appropriate default values
> >
> > Review of attachment 675885 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > buildername is changing from DEFAULT NULL to NOT NULL with no default. Just
> > want to make sure that's what you want to do (as opposed to something like
> > NOT NULL DEFAULT '' - though my feeling is that NOT NULL with no default is
> > appropriate here).
>
> They are all mandatory values, so should all be specified. Is that an
> acceptable use-case of "NOT NULL with no default"? (excuse my mysql
> ignorance)
>
Indeed, this is exactly the right use-case of "NOT NULL with no default". (as for ignorance, you should see my code! we all have our strengths and weakneses)
> (In reply to Sheeri Cabral [:sheeri] from comment #15)
> > (er, for comment 14, the same goes for all the other changes....)
> >
> > Also, why change this:
> >
> > `starttime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
> > CURRENT_TIMESTAMP
> >
> > to the default being "zero time"? Is it so you can easily see what hasn't
> > started yet?
>
> The current time will always be incorrect for start time & the value should
> always be specified anyway. In the off-chance it is not, I'd rather an
> obvious failure rather than psuedo sorted correctly values. Is that ok?
*nod* that's what I thought, and that's acceptable. r+
Assignee | ||
Comment 20•12 years ago
|
||
With default = true
(After the schema change, we'll need to mass change the existing rows to false as part of a staggered landing, but I'll save that to bug 803009)
Attachment #675886 -
Attachment is obsolete: true
Attachment #675886 -
Flags: review?(scabral)
Attachment #677155 -
Flags: review?(scabral)
Assignee | ||
Comment 21•12 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #19)
> > I'm pretty sure we don't need UTF-8 for anything other than the logs
> > (runs_logs.content) and maybe the name of people starring (runs_notes.who),
> > although the latter isn't exactly necessary anyway. Would switching benefit
> > us much? I also suspect that many of the varchars we use are much larger
> > than they need be - is there an easy way to find out the current max lengths
> > used for each field?
>
> Sure, you can do SELECT (MAX(LENGTH(field1)) as size1, (MAX(LENGTH(field2))
> as size2 FROM.....
>
> .....or you can do SELECT field1,field2 FROM.... PROCEDURE ANALYSE() and get
> min/max/avg lengths. (or you can have me do them if you'd like)
I don't suppose you'd mind doing this?
I don't have access to the DB & when I test locally it's only feasible to import the last few days worth of data, so it doesn't really reflect production.
Comment 22•12 years ago
|
||
Comment on attachment 677155 [details] [diff] [review]
Part 5: Add an 'in_progress' boolean to runs_logs v2
Review of attachment 677155 [details] [diff] [review]:
-----------------------------------------------------------------
tested the syntax too, just in case, but it's working fine!
Attachment #677155 -
Flags: review?(scabral) → review+
Comment 23•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #21)
> (In reply to Sheeri Cabral [:sheeri] from comment #19)
> > > I'm pretty sure we don't need UTF-8 for anything other than the logs
> > > (runs_logs.content) and maybe the name of people starring (runs_notes.who),
> > > although the latter isn't exactly necessary anyway. Would switching benefit
> > > us much? I also suspect that many of the varchars we use are much larger
> > > than they need be - is there an easy way to find out the current max lengths
> > > used for each field?
> >
> > Sure, you can do SELECT (MAX(LENGTH(field1)) as size1, (MAX(LENGTH(field2))
> > as size2 FROM.....
> >
> > .....or you can do SELECT field1,field2 FROM.... PROCEDURE ANALYSE() and get
> > min/max/avg lengths. (or you can have me do them if you'd like)
>
> I don't suppose you'd mind doing this?
>
> I don't have access to the DB & when I test locally it's only feasible to
> import the last few days worth of data, so it doesn't really reflect
> production.
I can do this, just not today as I'm in Eastern time and leaving shortly.
Comment 24•12 years ago
|
||
First thing's first - a bit of information about varchar... from http://dev.mysql.com/doc/refman/5.1/en/char.html:
"In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes. "
This means that if you have a 20-character value in a varchar(200) field, it only stores 21 bytes - 20 bytes plus one byte for the size.
There *is* something to worry about though, whenever the field is used in a temporary intermediate table (like for a complex query with sorting/group by or other processing) then MySQL allocates the full amount in memory for each field (see http://dba.stackexchange.com/questions/424/performance-implications-of-mysql-varchar-sizes/2029#2029).
So basically, depending on usage it might not be necessary to reduce the size, but it's not a bad idea either....
0) So, I can't speak to needing utf8, but I'm pretty sure you don't need it, since all this stuff is internally generated and I can't see how it might need anything outside latin-1 (there's no user data like "name" that might need it). But you know the data better....
From here, every table is going to be an item in this list:
####################################################################################
1) bugscache.filename is currently a VARCHAR(255), and json is a TEXT field (utf8). Summary - keep json as TEXT, change filename to TINYTEXT:
mysql> select filename,json from bugscache procedure analyse(0,0)\G
*************************** 1. row ***************************
Field_name: tbpl_mozilla_org.bugscache.filename
Min_value: (runtestlist.py)
Max_value: TestRunner.js)
Min_length: 7
Max_length: 58
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 23.1702
Std: NULL
Optimal_fieldtype: TINYTEXT NOT NULL
*************************** 2. row ***************************
Field_name: tbpl_mozilla_org.bugscache.json
Min_value: []
Max_value: [{"status":"RESOLVED","summary":"TEST-UNEXPECTED-FAIL | test-install-xpi.js | timed out waiting for modal dialog","resolution":"FIXED","id":718812}]
Min_length: 2
Max_length: 7216
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 459.7128
Std: NULL
Optimal_fieldtype: TEXT NOT NULL
2 rows in set (0.00 sec)
####################################################################################
2) in the "builders" table you have:
name varchar(128)
branch varchar(256)
buildername varchar(256)
Given the max lengths, I'd say name could be varchar(80), branch could be varchar(45), and buildername could be varchar(100):
mysql> select name,branch,buildername from builders procedure analyse (0,0)\G
*************************** 1. row ***************************
Field_name: tbpl_mozilla_org.builders.name
Min_value: 3
Max_value: ux-win64-nightly
Min_length: 1
Max_length: 66
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 37.6592
Std: NULL
Optimal_fieldtype: VARCHAR(66) NOT NULL
*************************** 2. row ***************************
Field_name: tbpl_mozilla_org.builders.branch
Min_value: addon-sdk
Max_value: ux
Min_length: 2
Max_length: 23
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 11.2852
Std: NULL
Optimal_fieldtype: VARCHAR(23) NOT NULL
*************************** 3. row ***************************
Field_name: tbpl_mozilla_org.builders.buildername
Min_value: Android alder build
Max_value: WINNT 6.1 x86-64 ux nightly
Min_length: 12
Max_length: 88
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 50.7686
Std: NULL
Optimal_fieldtype: VARCHAR(88) NOT NULL
3 rows in set (0.01 sec)
####################################################################################
3) in the builders_history table:
action varchar(20)
who varchar(256)
reason varchar(256)
ip varchar(20)
At first glance, I'd take the IP field and convert it to an INT UNSIGNED, and use the INET_NTOA() and INET_ATON() functions to convert back and forth. If you're worried about IPv6 addresses or something and don't want to convert it, then certainly IP doesn't need to be utf8, it can be latin1.
you might change action to varchar(10), since the max length is 6, but I see no harm in leaving it as varchar(20). I'd recommend changing "who" to varchar(45), it certainly doesn't need to be varchar(256). reason can stay varchar(256), and again, probably no need for utf8 on these.
mysql> select action, who, reason, ip from builders_history procedure analyse(0,0)\G
*************************** 1. row ***************************
Field_name: tbpl_mozilla_org.builders_history.action
Min_value: hide
Max_value: unhide
Min_length: 4
Max_length: 6
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 5.7174
Std: NULL
Optimal_fieldtype: CHAR(6) NOT NULL
*************************** 2. row ***************************
Field_name: tbpl_mozilla_org.builders_history.who
Min_value: aki
Max_value: Standard8
Min_length: 3
Max_length: 24
Empties_or_zeros: 0
Nulls: 8180
Avg_value_or_avg_length: 7.0214
Std: NULL
Optimal_fieldtype: VARCHAR(24)
*************************** 3. row ***************************
Field_name: tbpl_mozilla_org.builders_history.reason
Min_value: .
Max_value: Your green day was actually quite a while ago
Min_length: 1
Max_length: 199
Empties_or_zeros: 0
Nulls: 8180
Avg_value_or_avg_length: 40.8070
Std: NULL
Optimal_fieldtype: TINYTEXT
*************************** 4. row ***************************
Field_name: tbpl_mozilla_org.builders_history.ip
Min_value: 10.8.33.240
Max_value: 10.8.81.217
Min_length: 11
Max_length: 11
Empties_or_zeros: 0
Nulls: 8180
Avg_value_or_avg_length: 11.0000
Std: NULL
Optimal_fieldtype: CHAR(11)
4 rows in set (0.10 sec)
####################################################################################
4) In the runs table:
buildername varchar(256)
slave varchar(256)
revision varchar(40)
result varchar(20)
branch varchar(256)
log varchar(256)
buildername: change to varchar(100)
slave: change to varchar(45)
revision: change to varchar(20)
result: change to ENUM('busted','exception','retry','success','testfailed') NOT NULL - adding a choice to an ENUM is an online operation in MySQL 5.1 and up, so there's no worry if you need more than these 5 choices.
branch: change to varchar(45)
log: keep at varchar(256) (just in case)
mysql> select buildername, slave, revision, result, branch, log from runs procedure analyse (0,0) \G
*************************** 1. row ***************************
Field_name: tbpl_mozilla_org.runs.buildername
Min_value: Android alder build
Max_value: WINNT 6.1 x86-64 ux nightly
Min_length: 14
Max_length: 88
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 51.0229
Std: NULL
Optimal_fieldtype: VARCHAR(88) NOT NULL
*************************** 2. row ***************************
Field_name: tbpl_mozilla_org.runs.slave
Min_value: bld-centos5-32-vmw-001
Max_value: w64-ix-slave99
Min_length: 9
Max_length: 24
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 15.5687
Std: NULL
Optimal_fieldtype: VARCHAR(24) NOT NULL
*************************** 3. row ***************************
Field_name: tbpl_mozilla_org.runs.revision
Min_value: 0008531f1429
Max_value: THUNDERBIRD_
Min_length: 7
Max_length: 12
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 12.0000
Std: NULL
Optimal_fieldtype: CHAR(12) NOT NULL
*************************** 4. row ***************************
Field_name: tbpl_mozilla_org.runs.result
Min_value: busted
Max_value: testfailed
Min_length: 5
Max_length: 10
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 7.2239
Std: NULL
Optimal_fieldtype: VARCHAR(10) NOT NULL
*************************** 5. row ***************************
Field_name: tbpl_mozilla_org.runs.branch
Min_value: addon-sdk
Max_value: ux
Min_length: 2
Max_length: 23
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 10.8465
Std: NULL
Optimal_fieldtype: VARCHAR(23) NOT NULL
*************************** 6. row ***************************
Field_name: tbpl_mozilla_org.runs.log
Min_value: http://ftp.mozilla.org/pub/mozilla.org/b2g/nightly/2012/09/2012-09-11-14-03-51-mozilla-central/mozilla-central-ics_armv7a_gecko-nightly-bm12-build1-build15.txt.gz
Max_value: http://stage.mozilla.org/pub/mozilla.org/xulrunner/nightly/17.0b4-candidates/build1/logs/release-mozilla-beta-xulrunner_win32_build-bm12-build1-build2.txt.gz
Min_length: 118
Max_length: 218
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 172.9198
Std: NULL
Optimal_fieldtype: VARCHAR(218) NOT NULL
6 rows in set (7.65 sec)
####################################################################################
5) runs_logs has type varchar(64) and content longblob, but blobs don't have charsets:
Looks like you can change type to a varchar(20)...trying to add "content" here made the query too long and timeout, so I think it's OK to leave it as a longblob.
*************************** 1. row ***************************
Field_name: tbpl_mozilla_org.runs_logs.type
Min_value: annotatedsummary
Max_value: tinderbox_print
Min_length: 3
Max_length: 16
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 10.8334
Std: NULL
Optimal_fieldtype: VARCHAR(16) NOT NULL
1 row in set (3.65 sec)
####################################################################################
6) runs_notes has who varchar(256), note text and ip varchar(20).
I have the same advice about IP addresses as I did before - although it looks like there are only 3 IPs that things are coming from, so it could be an ENUM, too. Looks like "runs_notes" should stay varchar(256) or change to TINYTEXT, and note is fine remaining as TEXT.
mysql> select who,note,ip from runs_notes procedure analyse(100,100)\G
*************************** 1. row ***************************
Field_name: tbpl_mozilla_org.runs_notes.who
Min_value: :emk
Max_value: zackw@panix.com
Min_length: 2
Max_length: 37
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 7.7481
Std: NULL
Optimal_fieldtype: TINYTEXT NOT NULL
*************************** 2. row ***************************
Field_name: tbpl_mozilla_org.runs_notes.note
Min_value:
Creating Python environment
Traceback (most recent call last):
File "/builds/slave/m-cen-lnx64/build/python/virtualenv/virtualenv.py", line 2471, in <module>
[...]
OSError: [Errno 17] File exists
[...]
Exception: Error creating virtualenv.
*** Fix above errors and then restart with "make -f client.mk build"
make[2]: Leaving directory `/builds/slave/m-cen-lnx64/build'
make[2]: *** [configure] Error 1
Max_value: you need a newer parent
Min_length: 1
Max_length: 420
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 11.0405
Std: NULL
Optimal_fieldtype: TEXT NOT NULL
*************************** 3. row ***************************
Field_name: tbpl_mozilla_org.runs_notes.ip
Min_value: 10.8.81.215
Max_value: 10.8.81.217
Min_length: 11
Max_length: 11
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 11.0000
Std: NULL
Optimal_fieldtype: ENUM('10.8.81.215','10.8.81.216','10.8.81.217') NOT NULL
3 rows in set (0.13 sec)
####################################################################################
####################################################################################
Updated•12 years ago
|
Assignee: server-ops-database → scabral
Comment 25•12 years ago
|
||
Comment on attachment 675880 [details] [diff] [review]
Part 2: Only index runs_notes.run_id not (runs_notes.run_id, runs_notes.timestamp)
Review of attachment 675880 [details] [diff] [review]:
-----------------------------------------------------------------
Now that it's explained, I'm OK with this.
Attachment #675880 -
Flags: review?(scabral) → review+
Comment 26•12 years ago
|
||
What more is there for this bug?
Assignee | ||
Comment 27•12 years ago
|
||
Assigning to me for now so I remember to come back to this :-)
Assignee: scabral → emorley
Assignee | ||
Comment 28•12 years ago
|
||
At this point in time I don't think it's worth fixing continuing with this bug, since the replacement for TBPL is now in the design phase.
The largest table by far is runs_logs, and most of that will be in the content longblob. So whilst changing the length of some of the other fields / removing unnecessary indexes might gain us a bit; compared to the monthly purge it's not going to win us much.
Thank you for all your help with this Sheeri :-)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Comment 29•12 years ago
|
||
Please do let us know when tbpl is to be decommissioned. And if you want DB architectural help, please do include us in those meetings.
Assignee | ||
Comment 30•12 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #29)
> Please do let us know when tbpl is to be decommissioned. And if you want DB
> architectural help, please do include us in those meetings.
I imagine TBPL will still be required at least until the end of the year (as we gradually transition).
Jeads is working on schema design for the replacement, and has plans for making sure everyone is in the loop (we've only had rough paper designs as of last week).
Comment 31•12 years ago
|
||
OK, great to know. We'll continue with the purge as normal.
Updated•10 years ago
|
Product: mozilla.org → Data & BI Services Team
You need to log in
before you can comment on or make changes to this bug.
Description
•