Closed Bug 1328659 (bmo-emoji-utf8mb4-option) Opened 7 years ago Closed 5 years ago

Add support for utf8=utf8mb4 (switches to dynamic/compressed row format, and changes charset to utf8mb4)

Categories

(bugzilla.mozilla.org :: Infrastructure, defect, P3)

Production
defect


RESOLVED FIXED

People

(Reporter: dylan, Assigned: bobm)

Details

User Story

- BMO will need to be hard-hatted (webheads, cronjobs and daemons off)
- set innodb_file_format=Barracuda
- set innodb_file_per_table=ON (this should already be the case)
- set innodb_large_prefix=1
- for all tables T, alter table $T ROW_FORMAT=COMPRESSED (or DYNAMIC, at the DBA's discretion.)
- for all columns that are utf8, convert to utf8mb4
- we un-hard-hat.
- bug 1253535 will be committed
- we follow a normal schema-only code push

From then on, everyone can use poop emoji.

Attachments

(1 file, 2 obsolete files)

We need to do the following operations to all the mysql servers:

- set innodb_file_format=Barracuda
- set innodb_file_per_table=ON (this should already be the case)
- set innodb_large_prefix=1
- for all tables T, alter table $T ROW_FORMAT=COMPRESSED (or DYNAMIC, at the DBA's discretion.)

More detailed planning will be deposited in the user story field.

I suspect it is safest to plan this for a TCW?
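
For reference, the steps above boil down to SQL along these lines. This is only a sketch (MySQL 5.6 assumed; the table name is an example, and the real run would iterate over every table):

-- note SET GLOBAL does not survive a restart, so my.cnf should be updated as well
SET GLOBAL innodb_file_format = 'Barracuda';
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_large_prefix = ON;
-- then, per table (example name only):
ALTER TABLE bugs ROW_FORMAT=COMPRESSED;  -- or ROW_FORMAT=DYNAMIC, at the DBA's discretion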
I need two things from you on this bug:

1) Do the steps outlined in comment #0 seem correct and complete? 
The goal is to support full utf8mb4 in all fields, and support large prefixes so we don't run into the index limit problems for varchar(255).

2) Assuming we find a TCW to use, what does your availability look like in Q1?
Flags: needinfo?(mpressman)
Any estimate on how long BMO will be unavailable? What impact might this have on replication to AWS?

Next TCW looks to be around Feb 11th or 18th, FWIW.
(In reply to Kendall Libby [:fubar] from comment #2)
> Any estimate on how long BMO will be unavailable? 
About as long as a DB dump and reload, so the worst case is multiple hours.

> What impact might this
> have on replication to AWS?

I'm not sure, we'll want to find out.
Ping. Still waiting to hear from :mpressman about this.
So we've soft-announced bug 1253535 on dev-platform, but I'm still waiting on a reply to comment #1.

Natalie: Can we get some motion on this bug?
Flags: needinfo?(nhaywood)
Sorry for the delay here. Since this is preparing for, rather than enacting, the utf8mb4 conversion, what I see here is fine. I'm available for either the 11th or the 18th, whichever is the next TCW.
Flags: needinfo?(mpressman)
Just to check, when are you then planning to convert to utf8mb4?
Flags: needinfo?(nhaywood)
As we'd be committing bug 1253535 at the same time, the utf8mb4 conversion would happen then.
Thinking about that, I'm concerned the push script may time out if checksetup.pl takes hours, so we can convert the charset beforehand.

I've updated the user story with the complete set of steps, and changed the summary of this bug to better reflect what we want.

Can you give the user-story a once over and let me know if I'm missing a step or if there are any other DB concerns?
Thanks!
User Story: (updated)
Flags: needinfo?(mpressman)
Summary: Prepare MySQL database for supporting utf8mb4 conversion → Prepare MySQL database for support emoji by performing utf8mb4 conversion
Summary: Prepare MySQL database for support emoji by performing utf8mb4 conversion → Prepare MySQL database for emoji by performing utf8mb4 conversion
User story looks good. Just one question about auditing the size of the columns and indexes that will hold the new utf8mb4 data. A utf8 to utf8mb4 conversion can hold fewer characters per column, which can cause problems, since it uses up to 4 bytes per character instead of 3. I did a quick check of the schema and see there are tinytext columns. These can hold 85 characters in utf8, but only 63 characters in utf8mb4. I didn't fully check whether any of these are used in indexes, which matters for the index size of text/varchar columns, but I see you are increasing the maximum index prefix length from 767 to 3072 bytes with innodb_large_prefix=1.
Flags: needinfo?(mpressman)
Flags: needinfo?(dylan)
(In reply to Matt Pressman [:mpressman] from comment #9)
> User story looks good. Just one question about auditing the size of the
> columns and indexes that will hold the new utf8mb4 data. A utf8 to utf8mb4
> conversion can hold fewer characters per column, which can cause problems,
> since it uses up to 4 bytes per character instead of 3. I did a quick check
> of the schema and see there are tinytext columns. These can hold 85
> characters in utf8, but only 63 characters in utf8mb4. I didn't fully check
> whether any of these are used in indexes, which matters for the index size
> of text/varchar columns, but I see you are increasing the maximum index
> prefix length from 767 to 3072 bytes with innodb_large_prefix=1.

Just to be clear, it is my understanding that with large prefixes, tinytext can be longer than 63 characters.
If this is not the case -- if any fields will be truncated by this change, I need to know ASAP.
Flags: needinfo?(dylan) → needinfo?(mpressman)
Using innodb_large_prefix increases the index key prefix length, not the column size itself. A tinytext column can store 255 bytes, which means the existing utf8 encoding can store at most 85 three-byte characters. When converting to utf8mb4, characters can take four bytes each, limiting a tinytext column to as few as 63 characters. So any tinytext columns whose data exceeds 63 characters will need their data types changed to prevent the data from being truncated.
Flags: needinfo?(mpressman)
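
A quick audit along those lines could look like the following. Sketch only: the schema name and the per-table follow-up query are assumptions, not the actual Bugzilla schema layout:

-- list every tinytext column (schema name 'bugs' is assumed; adjust as needed)
SELECT table_name, column_name
  FROM information_schema.columns
 WHERE table_schema = 'bugs'
   AND data_type = 'tinytext';
-- then, for each hit, check whether any stored value exceeds 63 characters
-- (hypothetical table/column names):
SELECT COUNT(*) FROM some_table WHERE CHAR_LENGTH(some_column) > 63;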
Priority: P1 → P3
Assignee: nobody → dylan
Summary: Prepare MySQL database for emoji by performing utf8mb4 conversion → Prepare MySQL database for emoji by performing row format conversion
Attached file PR with script (obsolete) —
The first step is running this script, which changes the row format of all tables.

It requires three mysql variables to have specific values and will not run otherwise.
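
Concretely, the pre-flight check amounts to verifying these three variables; a sketch of the guard (not the script itself):

SHOW GLOBAL VARIABLES LIKE 'innodb_file_format';    -- must be Barracuda
SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table'; -- must be ON
SHOW GLOBAL VARIABLES LIKE 'innodb_large_prefix';   -- must be ON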
Depends on: 1422427
This is set to use DYNAMIC, but maybe we want COMPRESSED? 
Are there downsides to COMPRESSED?
Flags: needinfo?(mpressman)
Changing the needinfo to :cvalaas
Flags: needinfo?(mpressman) → needinfo?(cvalaas)
innodb_file_per_table is already ON, so: good.
innodb_file_format is currently Antelope, but can be changed dynamically.
innodb_large_prefix is OFF, but can also be changed dynamically.

The problem I see is with the ALTER TABLE. That will create a copy of the table (while the ALTER is running) and we already know we don't have disk space for that (with the attach_data table). 
Same thing applies when you convert the columns in the next step (https://bugzilla.mozilla.org/show_bug.cgi?id=1253535).

Also - I'm not sure what the implications are disk-size-wise post-conversion.

We *could* only convert some tables? Like, all of them except attach_data? But I don't know if having attach_data stuck in the old format is desirable.
Flags: needinfo?(cvalaas)
Would we need to move to a larger disk here?
Flags: needinfo?(cvalaas)
Yes. My understanding is that we've needed to move to a larger disk for about a year, but it was decided to leave it as-is due to the impending move to AWS. Unfortunately the move to AWS keeps getting postponed.
Flags: needinfo?(cvalaas)
How big is bugs_fulltext? It can be deleted and re-created during a hardhat / outage.

Bug 1309706 was blocked before due to a squid proxy problem that I believe has since been resolved;
perhaps we can just finally move those to S3?

:dhouse, can you revive or get me new aws credentials for the bucket bugzilla-prod-attachments-[something that might be secret]?

Short of that, I could temporarily move some percentage of attachments out (ones that have a very old access time),
perform the migration, and load them back in. I wouldn't need to load them back in during an outage window, I reckon.

:cvalaas, how much weight (mebibytes) do I need to lose to fly?
Flags: needinfo?(dhouse)
Flags: needinfo?(cvalaas)
bugs_fulltext is 13G plus another 20G or so for the fulltext indexes.

We've got about 50G free at the moment. So we need at least another 75G free to alter the attach_data table (assuming the table stays the same size, also assuming the change of row format doesn't affect size-on-disk).
Switching to COMPRESSED format may also save on-disk space, but I have no idea how much - or what effect it might have on performance.

There's a lot of variables here - some testing would probably answer some of the unknowns. We could alter some tables on bugzilla[12].stage and see what happens (although the tables are a lot smaller; actually, attach_data is only 48M in stage :sadface_emoji:)

I think you'd have to remove an awful lot of old attachments to free up enough space. I think the best way forward given our constraints is moving the attachments to S3 (Bug 1309706)
Flags: needinfo?(cvalaas)
The attachments on bugzilla-dev are stored in S3; that's why it is so small.
(In reply to chris valaas [:cvalaas] from comment #19)
> bugs_fulltext is 13G plus another 20G or so for the fulltext indexes.
> 
> We've got about 50G free at the moment. So we need at least another 75G free
> to alter the attach_data table (assuming the table stays the same size, also
> assuming the change of row format doesn't affect size-on-disk).
> Switching to COMPRESSED format may also save on-disk space, but I have no
> idea how much - or what effect it might have on performance.
> 
https://www.facebook.com/notes/mysql-at-facebook/innodb-compression-for-read-only-workloads/10150190328375933/

We're going to temporarily archive a huge number of attachments, and restore them after conversion.
Let's see what this looks like on bugzilla-dev. Can you record how much space everything is using there, so we can see what difference the row format makes?
Flags: needinfo?(cvalaas)
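
One way to record those numbers is straight from information_schema; a sketch (the schema name is assumed, and these are InnoDB estimates rather than exact on-disk sizes):

SELECT table_name,
       row_format,
       ROUND((data_length + index_length) / 1024 / 1024) AS approx_mb
  FROM information_schema.tables
 WHERE table_schema = 'bugzilla_dev_allizom_org'
 ORDER BY approx_mb DESC;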
Attached file individual bugzilla stage db sizes (obsolete) —
Flags: needinfo?(cvalaas)
Overall sizes on bugzilla[12].stage:

$ sudo du -sh bugzilla_*
35G	bugzilla_allizom_org
20G	bugzilla_dev_allizom_org
After doing some more reading (https://dev.mysql.com/doc/refman/5.6/en/innodb-performance-compression-oltp.html) I would recommend making most of the tables (esp. the ones that get written to frequently) DYNAMIC while tables that are less likely to get frequent writes could be COMPRESSED.
(In reply to chris valaas [:cvalaas] from comment #25)
> After doing some more reading
> (https://dev.mysql.com/doc/refman/5.6/en/innodb-performance-compression-oltp.
> html) I would recommend making most of the tables (esp. the ones that get
> written to frequently) DYNAMIC while tables that are less likely to get
> frequent writes could be COMPRESSED.

The link says on an SSD using compressed also lowers IOPs?
And IIRC we were on SSDs?
(In reply to chris valaas [:cvalaas] from comment #24)
> Overall sizes on bugzilla[12].stage:
> 
> $ sudo du -sh bugzilla_*
> 35G	bugzilla_allizom_org
> 20G	bugzilla_dev_allizom_org

bugzilla_dev_allizom_org: all tables are now DYNAMIC.
I doubt this changed the size any, but better to measure first.
After we have numbers I'm game to convert all or some to COMPRESSED.
I've got the numbers logged. There was a slight space savings (probably due to purging deleted rows). 

I'd say you could go ahead and convert them all to COMPRESSED if you wanted. 
Maybe with the exception of the most write-heavy tables? I'm thinking of "email_setting", for example, where the data is very small (and probably not too compressible), but many many rows - and IIRC it gets a lot of writes.
I'll try a few specific tables first. Altering the code to do this will be a bit more work.
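
Roughly, that trial would look like the following. The table choices and KEY_BLOCK_SIZE are illustrative only, not a final list:

-- compress a couple of large, read-mostly tables
ALTER TABLE bugs_fulltext ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
ALTER TABLE longdescs ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
-- leave the small, write-heavy ones dynamic
ALTER TABLE email_setting ROW_FORMAT=DYNAMIC;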
(In reply to Dylan Hardison [:dylan] (he/him) from comment #18)
>
> :dhouse, can you revive or get me new aws credentials for the bucket
> bugzilla-prod-attachments-[something that might be secret]?

I'm pretty sure that's the bucket we set up for Nubis; bugzilla-dev is using a bucket in the devservices AWS account (moz-bugzilladev-attach). Which do you want to use?
Flags: needinfo?(dhouse) → needinfo?(dylan)
devservices please.
Flags: needinfo?(dylan)
Bucket created.

:dhouse, can you go to the aws console, generate keys for the new user, and add those to the credentials file on bugzillaadm? ping me if you don't remember where those bits are, since we walked through it fairly quickly.
Flags: needinfo?(dhouse)
(In reply to Kendall Libby [:fubar] from comment #32)
> Bucket created.
> 
> :dhouse, can you go to the aws console, generate keys for the new user, and
> add those to the credentials file on bugzillaadm? ping me if you don't
> remember where those bits are, since we walked through it fairly quickly.

I started looking at this again. I'm able to get into the console for IAM for the new _prod user to generate the keys, but I'm having trouble reaching bugzillaadm/10.22.75.57 today from vpn/jumphost. I'll try again tomorrow.
(In reply to Dave House [:dhouse] from comment #33)
> (In reply to Kendall Libby [:fubar] from comment #32)
> > Bucket created.
> > 
> > :dhouse, can you go to the aws console, generate keys for the new user, and
> > add those to the credentials file on bugzillaadm? ping me if you don't
> > remember where those bits are, since we walked through it fairly quickly.
> 
> I started looking at this again. I'm able to get into the console for IAM
> for the new _prod user to generate the keys, but I'm having trouble reaching
> bugzillaadm/10.22.75.57 today from vpn/jumphost. I'll try again tomorrow.

I sorted out my access.

:fubar, it looks like you added the key on bugzillaadm:/root/.aws/credentials (I confirmed in the aws console that it is the correct key id and secret created 2017-12-21). Is there anything else that needs done for that to be active?
Flags: needinfo?(dhouse) → needinfo?(klibby)
Dylan, do you want to upload the file or shall I? The user+credentials are set up.
Flags: needinfo?(klibby) → needinfo?(dylan)
Question no longer relevant
Flags: needinfo?(dylan)
Assignee: dylan → bobm
So there is a script attached to this, but ultimately you could just set ROW_FORMAT=DYNAMIC
on everything.

For longdescs and bugs_fulltext and bugs, it might be best to use ROW_FORMAT=compressed.

Bugzilla itself won't care about the row format.
Phase one of the trial conversion using an identically powered RDS instance and the script took 243 minutes (just over 4 hours) to complete.  The production snapshot took approximately 15 minutes.  Modifying the parameter group and rebooting the instance beforehand took ~5 minutes.

I've modified the script to "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci" for all the tables.  It ran into some foreign key constraint issues, and I've skipped those tables for now, since this is a ballpark timing exercise.  In practice we should use a script with some peer reviewed SQL in it to do these modifications.  Do we need to modify any column sizes to accommodate the new character set?
All told the character set conversion ran 111 minutes (let's say 2 hours).  The following tables needed to be run with FOREIGN_KEY_CHECKS off: bug_cf_blocking_192, bug_cf_locale, cf_blocking_192, cf_locale, namedqueries, profile_setting, setting, setting_value, token_data, tokens.  The namedqueries table has a duplicate entry for the namedqueries_userid_idx index, so I didn't convert that table since I'm not sure what to do about that.

Presently timing REPAIR and OPTIMIZE on all the tables in case we want to do that.
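
If the namedqueries duplicate comes from rows that only differ in ways the new case-insensitive collation ignores, something like this could locate them. Sketch only; it assumes the unique index covers (userid, name), which should be checked against the real schema:

-- rows whose names collapse to the same value under the new collation
SELECT userid,
       CONVERT(name USING utf8mb4) COLLATE utf8mb4_unicode_ci AS folded_name,
       COUNT(*) AS n
  FROM namedqueries
 GROUP BY userid, folded_name
HAVING n > 1;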
(In reply to Bob Micheletto [:bobm] from comment #39)

> Presently timing REPAIR and OPTIMIZE on all the tables in case we want to do
> that.

The REPAIR and OPTIMIZE statements ran for: 251m3.831s (or just over 4 hours)
:dylan, please verify that running the equivalent of this for every table in bugs is an acceptable way to make the conversion:

SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE __tablename__ CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Also, this is the utility the FxA team used to do their utf8mb4 conversion: https://github.com/github/gh-ost
It might be interesting to try out a replica based migration in stage, and see how it turns out.
Flags: needinfo?(dylan)
Depends on: bmo-emoji-test-run
(In reply to Bob Micheletto [:bobm] from comment #41)
> :dylan, please verify that running the equivalent of this for every table in
> bugs is an acceptable way to make the conversion:
> 
> SET FOREIGN_KEY_CHECKS=0;
> ALTER TABLE __tablename__ CONVERT TO CHARACTER SET utf8mb4 COLLATE
> utf8mb4_unicode_ci;
> 
> Also, this is the utility the FxA team used to do their utf8mb4 conversion:
> https://github.com/github/gh-ost
> It might be interesting to try out a replica based migration in stage, and
> see how it turns out.

The only thing I'm not 100% on is the utf8mb4 collation. I remember Sheeri saying something was "wrong" about the default collation method, but my google searching didn't turn anything up.
Flags: needinfo?(dylan)
(In reply to Dylan Hardison [:dylan] (he/him) from comment #42)
> (In reply to Bob Micheletto [:bobm] from comment #41)

> The only thing I'm not 100% on is the utf8mb4 collation. I remember
> Sheeri saying something was "wrong" about the default collation method, but
> my google searching didn't turn anything up.

I was working from https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html
:cvalaas can you advise us on the correct way to update this?
Flags: needinfo?(cvalaas)
I'm afraid I'm no more clued in than y'all on this. I would just read the same docs you're looking at. (Although I would look at the docs for 5.6: https://dev.mysql.com/doc/refman/5.6/en/charset-unicode-conversion.html - assuming you're running 5.6.x in AWS?)
Flags: needinfo?(cvalaas)
I tracked down the piece of memory I had about the collation:

https://stackoverflow.com/questions/43644218/why-is-table-charset-set-to-utf8mb4-and-collation-to-utf8mb4-unicode-520-ci#43692337

Basically we should use the newest utf8mb4_unicode_* collation available.
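
Putting that together with comment #41, the per-table statement would then look roughly like this. Sketch only; utf8mb4_unicode_520_ci is assumed to be the newest Unicode collation available on our MySQL 5.6 servers and should be confirmed there:

SET FOREIGN_KEY_CHECKS = 0;
ALTER TABLE __tablename__ CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
SET FOREIGN_KEY_CHECKS = 1;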
Attached file PR
Attachment #8933813 - Attachment is obsolete: true
Attachment #8934293 - Attachment is obsolete: true
Summary: Prepare MySQL database for emoji by performing row format conversion → Add support for utf8=utf8mb4 (switches to dynamic/compressed row format, and changes charset to utf8mb4)
No longer depends on: bmo-emoji-test-run
Depends on: 1473143
Alias: bmo-emoji-utf8mb4-option
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED