Closed Bug 1328659 (bmo-emoji-utf8mb4-option) Opened 7 years ago Closed 5 years ago

Add support for utf8=utf8mb4 (switches to dynamic/compressed row format, and changes charset to utf8mb4)

Categories

(bugzilla.mozilla.org :: Infrastructure, defect, P3)

Production
defect


RESOLVED FIXED

People

(Reporter: dylan, Assigned: bobm)

Details

User Story

- BMO will need to be hard-hatted (webheads, cronjobs and daemons off)
- set innodb_file_format=Barracuda
- set innodb_file_per_table=ON (this should already be the case)
- set innodb_large_prefix=1
- for all tables T, alter table $T ROW_FORMAT=COMPRESSED (or DYNAMIC, at the DBA's discretion.)
- for all columns that are utf8, convert to utf8mb4
- we un-hard-hat.
- bug 1253535 will be committed
- we follow a normal schema-only code push

From then on, everyone can use poop emoji.

Attachments

(1 file, 2 obsolete files)

We need to do the following operations to all the mysql servers:

- set innodb_file_format=Barracuda
- set innodb_file_per_table=ON (this should already be the case)
- set innodb_large_prefix=1
- for all tables T, alter table $T ROW_FORMAT=COMPRESSED (or DYNAMIC, at the DBA's discretion.)

More detailed planning will be deposited in the user story field.

I suspect it is safest to plan this for a TCW?
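
For reference, the steps above boil down to SQL along these lines. This is only a sketch (MySQL 5.6 assumed; the table name is an example, and the real run would iterate over every table):

-- note SET GLOBAL does not survive a restart, so my.cnf should be updated as well
SET GLOBAL innodb_file_format = 'Barracuda';
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_large_prefix = ON;
-- then, per table (example name only):
ALTER TABLE bugs ROW_FORMAT=COMPRESSED;  -- or ROW_FORMAT=DYNAMIC, at the DBA's discretion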
I need two things from you on this bug:

1) Do the steps outlined in comment #0 seem correct and complete? 
The goal is to support full utf8mb4 in all fields, and support large prefixes so we don't run into the index limit problems for varchar(255).

2) Assuming we find a TCW to use, what does your availability look like in Q1?
Flags: needinfo?(mpressman)
Any estimate on how long BMO will be unavailable? What impact might this have on replication to AWS?

Next TCW looks to be around Feb 11th or 18th, FWIW.
(In reply to Kendall Libby [:fubar] from comment #2)
> Any estimate on how long BMO will be unavailable? 
About as long as a DB dump and reload, so the worst case is multiple hours.

> What impact might this
> have on replication to AWS?

I'm not sure, we'll want to find out.
Ping. Still waiting to hear from :mpressman about this.
So we've soft-announced bug 1253535 on dev-platform, but I'm still waiting on a reply to comment #1.

Natalie: Can we get some motion on this bug?
Flags: needinfo?(nhaywood)
Sorry for the delay here. Since this is preparing for, rather than enacting, the utf8mb4 conversion, what I see here is fine. I'm available for either the 11th or the 18th, whichever is the next TCW.
Flags: needinfo?(mpressman)
Just to check, when are you then planning to convert to utf8mb4?
Flags: needinfo?(nhaywood)
As we'd be committing bug 1253535 at the same time, the utf8mb4 conversion would happen then.
Thinking about that, I'm concerned the push script may time out if checksetup.pl takes hours, so we can convert the charset beforehand.

I've updated the user story with the complete set of steps, and changed the summary of this bug to better reflect what we want.

Can you give the user-story a once over and let me know if I'm missing a step or if there are any other DB concerns?
Thanks!
User Story: (updated)
Flags: needinfo?(mpressman)
Summary: Prepare MySQL database for supporting utf8mb4 conversion → Prepare MySQL database for support emoji by performing utf8mb4 conversion
Summary: Prepare MySQL database for support emoji by performing utf8mb4 conversion → Prepare MySQL database for emoji by performing utf8mb4 conversion
User story looks good. Just one question about auditing the size of the columns and indexes that will hold the new utf8mb4 data. A utf8 to utf8mb4 conversion can hold fewer characters per column, which can cause problems, since it uses up to 4 bytes per character instead of 3. I did a quick check of the schema and see there are tinytext columns. These can hold 85 characters in utf8, but only 63 characters in utf8mb4. I didn't fully check whether any of these are used in indexes, which matters for the index size of text/varchar columns, but I see you are increasing the maximum index prefix length from 767 to 3072 bytes with innodb_large_prefix=1.
Flags: needinfo?(mpressman)
Flags: needinfo?(dylan)
(In reply to Matt Pressman [:mpressman] from comment #9)
> User story looks good. Just one question about auditing the size of the
> columns and indexes that will hold the new utf8mb4 data. A utf8 to utf8mb4
> conversion can hold fewer characters per column, which can cause problems,
> since it uses up to 4 bytes per character instead of 3. I did a quick check
> of the schema and see there are tinytext columns. These can hold 85
> characters in utf8, but only 63 characters in utf8mb4. I didn't fully check
> whether any of these are used in indexes, which matters for the index size
> of text/varchar columns, but I see you are increasing the maximum index
> prefix length from 767 to 3072 bytes with innodb_large_prefix=1.

Just to be clear, it is my understanding that with large prefixes, tinytext can be longer than 63 characters.
If this is not the case -- if any fields will be truncated by this change, I need to know ASAP.
Flags: needinfo?(dylan) → needinfo?(mpressman)
Using innodb_large_prefix increases the index key prefix length, not the column size itself. A tinytext column can store 255 bytes, which means the existing utf8 encoding can store at most 85 three-byte characters. When converting to utf8mb4, characters can take four bytes each, limiting a tinytext column to as few as 63 characters. So any tinytext columns whose data exceeds 63 characters will need their data types changed to prevent the data from being truncated.
Flags: needinfo?(mpressman)
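
A quick audit along those lines could look like the following. Sketch only: the schema name and the per-table follow-up query are assumptions, not the actual Bugzilla schema layout:

-- list every tinytext column (schema name 'bugs' is assumed; adjust as needed)
SELECT table_name, column_name
  FROM information_schema.columns
 WHERE table_schema = 'bugs'
   AND data_type = 'tinytext';
-- then, for each hit, check whether any stored value exceeds 63 characters
-- (hypothetical table/column names):
SELECT COUNT(*) FROM some_table WHERE CHAR_LENGTH(some_column) > 63;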
Priority: P1 → P3
Assignee: nobody → dylan
Summary: Prepare MySQL database for emoji by performing utf8mb4 conversion → Prepare MySQL database for emoji by performing row format conversion
Attached file PR with script (obsolete) —
The first step is running this script, which changes the row format of all tables.

It requires three mysql variables to have specific values and will not run otherwise.
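
Concretely, the pre-flight check amounts to verifying these three variables; a sketch of the guard (not the script itself):

SHOW GLOBAL VARIABLES LIKE 'innodb_file_format';    -- must be Barracuda
SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table'; -- must be ON
SHOW GLOBAL VARIABLES LIKE 'innodb_large_prefix';   -- must be ON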
Depends on: 1422427
This is set to use DYNAMIC, but maybe we want COMPRESSED? 
Are there downsides to COMPRESSED?
Flags: needinfo?(mpressman)
Changing the needinfo to :cvalaas
Flags: needinfo?(mpressman) → needinfo?(cvalaas)
innodb_file_per_table is already ON, so: good.
innodb_file_format is currently Antelope, but can be changed dynamically.
innodb_large_prefix is OFF, but can also be changed dynamically.

The problem I see is with the ALTER TABLE. That will create a copy of the table (while the ALTER is running) and we already know we don't have disk space for that (with the attach_data table). 
Same thing applies when you convert the columns in the next step (https://bugzilla.mozilla.org/show_bug.cgi?id=1253535).

Also - I'm not sure what the implications are disk-size-wise post-conversion.

We *could* only convert some tables? Like, all of them except attach_data? But I don't know if having attach_data stuck in the old format is desirable.
Flags: needinfo?(cvalaas)
Would we need to move to a larger disk here?
Flags: needinfo?(cvalaas)
Yes. My understanding is that we've needed to move to a larger disk for about a year, but it was decided to leave it as-is due to the impending move to AWS. Unfortunately the move to AWS keeps getting postponed.
Flags: needinfo?(cvalaas)
How big is bugs_fulltext? It can be deleted and re-created during a hardhat / outage.

Bug 1309706 was blocked before due to a squid proxy problem that I believe has since been resolved;
perhaps we can just finally move those to S3?

:dhouse, can you revive or get me new aws credentials for the bucket bugzilla-prod-attachments-[something that might be secret]?

Short of that, I could temporarily move some percentage of attachments out (ones that have a very old access time),
perform the migration, and load them back in. I wouldn't need to load them back in during an outage window, I reckon.

:cvalaas, how much weight (mebibytes) do I need to lose to fly?
Flags: needinfo?(dhouse)
Flags: needinfo?(cvalaas)
bugs_fulltext is 13G plus another 20G or so for the fulltext indexes.

We've got about 50G free at the moment. So we need at least another 75G free to alter the attach_data table (assuming the table stays the same size, also assuming the change of row format doesn't affect size-on-disk).
Switching to COMPRESSED format may also save on-disk space, but I have no idea how much - or what effect it might have on performance.

There's a lot of variables here - some testing would probably answer some of the unknowns. We could alter some tables on bugzilla[12].stage and see what happens (although the tables are a lot smaller; actually, attach_data is only 48M in stage :sadface_emoji:)

I think you'd have to remove an awful lot of old attachments to free up enough space. I think the best way forward given our constraints is moving the attachments to S3 (Bug 1309706)
Flags: needinfo?(cvalaas)
The attachments on bugzilla-dev are stored in S3; that's why it is so small.
(In reply to chris valaas [:cvalaas] from comment #19)
> bugs_fulltext is 13G plus another 20G or so for the fulltext indexes.
> 
> We've got about 50G free at the moment. So we need at least another 75G free
> to alter the attach_data table (assuming the table stays the same size, also
> assuming the change of row format doesn't affect size-on-disk).
> Switching to COMPRESSED format may also save on-disk space, but I have no
> idea how much - or what effect it might have on performance.
> 
https://www.facebook.com/notes/mysql-at-facebook/innodb-compression-for-read-only-workloads/10150190328375933/

We're going to temporarily archive a huge number of attachments, and restore them after conversion.
Let's see what this looks like on bugzilla-dev. Can you record how much space everything is using there, so we can see what difference the row format makes?
Flags: needinfo?(cvalaas)
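
One way to record those numbers is straight from information_schema; a sketch (the schema name is assumed, and these are InnoDB estimates rather than exact on-disk sizes):

SELECT table_name,
       row_format,
       ROUND((data_length + index_length) / 1024 / 1024) AS approx_mb
  FROM information_schema.tables
 WHERE table_schema = 'bugzilla_dev_allizom_org'
 ORDER BY approx_mb DESC;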
Attached file individual bugzilla stage db sizes (obsolete) —
Flags: needinfo?(cvalaas)
Overall sizes on bugzilla[12].stage:

$ sudo du -sh bugzilla_*
35G	bugzilla_allizom_org
20G	bugzilla_dev_allizom_org
After doing some more reading (https://dev.mysql.com/doc/refman/5.6/en/innodb-performance-compression-oltp.html) I would recommend making most of the tables (esp. the ones that get written to frequently) DYNAMIC while tables that are less likely to get frequent writes could be COMPRESSED.
(In reply to chris valaas [:cvalaas] from comment #25)
> After doing some more reading
> (https://dev.mysql.com/doc/refman/5.6/en/innodb-performance-compression-oltp.
> html) I would recommend making most of the tables (esp. the ones that get
> written to frequently) DYNAMIC while tables that are less likely to get
> frequent writes could be COMPRESSED.

The link says on an SSD using compressed also lowers IOPs?
And IIRC we were on SSDs?
(In reply to chris valaas [:cvalaas] from comment #24)
> Overall sizes on bugzilla[12].stage:
> 
> $ sudo du -sh bugzilla_*
> 35G	bugzilla_allizom_org
> 20G	bugzilla_dev_allizom_org

bugzilla_dev_allizom_org: all tables are now DYNAMIC.
I doubt this changed the size any, but better to measure first.
After we have numbers I'm game to convert all or some to COMPRESSED.
I've got the numbers logged. There was a slight space savings (probably due to purging deleted rows). 

I'd say you could go ahead and convert them all to COMPRESSED if you wanted. 
Maybe with the exception of the most write-heavy tables? I'm thinking of "email_setting", for example, where the data is very small (and probably not too compressible), but many many rows - and IIRC it gets a lot of writes.
I'll try a few specific tables first. Altering the code to do this will be a bit more work.
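
Roughly, that trial would look like the following. The table choices and KEY_BLOCK_SIZE are illustrative only, not a final list:

-- compress a couple of large, read-mostly tables
ALTER TABLE bugs_fulltext ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
ALTER TABLE longdescs ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
-- leave the small, write-heavy ones dynamic
ALTER TABLE email_setting ROW_FORMAT=DYNAMIC;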
(In reply to Dylan Hardison [:dylan] (he/him) from comment #18)
>
> :dhouse, can you revive or get me new aws credentials for the bucket
> bugzilla-prod-attachments-[something that might be secret]?

I'm pretty sure that's the bucket we set up for Nubis; bugzilla-dev is using a bucket in the devservices AWS account (moz-bugzilladev-attach). Which do you want to use?
Flags: needinfo?(dhouse) → needinfo?(dylan)
devservices please.
Flags: needinfo?(dylan)
Bucket created.

:dhouse, can you go to the aws console, generate keys for the new user, and add those to the credentials file on bugzillaadm? ping me if you don't remember where those bits are, since we walked through it fairly quickly.
Flags: needinfo?(dhouse)
(In reply to Kendall Libby [:fubar] from comment #32)
> Bucket created.
> 
> :dhouse, can you go to the aws console, generate keys for the new user, and
> add those to the credentials file on bugzillaadm? ping me if you don't
> remember where those bits are, since we walked through it fairly quickly.

I started looking at this again. I'm able to get into the console for IAM for the new _prod user to generate the keys, but I'm having trouble reaching bugzillaadm/10.22.75.57 today from vpn/jumphost. I'll try again tomorrow.
(In reply to Dave House [:dhouse] from comment #33)
> (In reply to Kendall Libby [:fubar] from comment #32)
> > Bucket created.
> > 
> > :dhouse, can you go to the aws console, generate keys for the new user, and
> > add those to the credentials file on bugzillaadm? ping me if you don't
> > remember where those bits are, since we walked through it fairly quickly.
> 
> I started looking at this again. I'm able to get into the console for IAM
> for the new _prod user to generate the keys, but I'm having trouble reaching
> bugzillaadm/10.22.75.57 today from vpn/jumphost. I'll try again tomorrow.

I sorted out my access.

:fubar, it looks like you added the key on bugzillaadm:/root/.aws/credentials (I confirmed in the aws console that it is the correct key id and secret created 2017-12-21). Is there anything else that needs done for that to be active?
Flags: needinfo?(dhouse) → needinfo?(klibby)
Dylan, do you want to upload the file or shall I? The user+credentials are set up.
Flags: needinfo?(klibby) → needinfo?(dylan)
Question no longer relevant
Flags: needinfo?(dylan)
Assignee: dylan → bobm
So there is a script attached to this, but ultimately you could just set ROW_FORMAT=DYNAMIC
on everything.

For longdescs and bugs_fulltext and bugs, it might be best to use ROW_FORMAT=compressed.

Bugzilla itself won't care about the row format.
Phase one of the trial conversion using an identically powered RDS instance and the script took 243 minutes (just over 4 hours) to complete.  The production snapshot took approximately 15 minutes.  Modifying the parameter group and rebooting the instance beforehand took ~5 minutes.

I've modified the script to "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci" for all the tables.  It ran into some foreign key constraint issues, and I've skipped those tables for now, since this is a ballpark timing exercise.  In practice we should use a script with some peer reviewed SQL in it to do these modifications.  Do we need to modify any column sizes to accommodate the new character set?
All told the character set conversion ran 111 minutes (let's say 2 hours).  The following tables needed to be run with FOREIGN_KEY_CHECKS off: bug_cf_blocking_192, bug_cf_locale, cf_blocking_192, cf_locale, namedqueries, profile_setting, setting, setting_value, token_data, tokens.  The namedqueries table has a duplicate entry for the namedqueries_userid_idx index, so I didn't convert that table since I'm not sure what to do about that.

Presently timing REPAIR and OPTIMIZE on all the tables in case we want to do that.
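
If the namedqueries duplicate comes from rows that only differ in ways the new case-insensitive collation ignores, something like this could locate them. Sketch only; it assumes the unique index covers (userid, name), which should be checked against the real schema:

-- rows whose names collapse to the same value under the new collation
SELECT userid,
       CONVERT(name USING utf8mb4) COLLATE utf8mb4_unicode_ci AS folded_name,
       COUNT(*) AS n
  FROM namedqueries
 GROUP BY userid, folded_name
HAVING n > 1;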
(In reply to Bob Micheletto [:bobm] from comment #39)

> Presently timing REPAIR and OPTIMIZE on all the tables in case we want to do
> that.

The REPAIR and OPTIMIZE statements ran for: 251m3.831s (or just over 4 hours)
:dylan, please verify that running the equivalent of this for every table in bugs is an acceptable way to make the conversion:

SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE __tablename__ CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Also, this is the utility the FxA team used to do their utf8mb4 conversion: https://github.com/github/gh-ost
It might be interesting to try out a replica based migration in stage, and see how it turns out.
Flags: needinfo?(dylan)
Depends on: bmo-emoji-test-run
(In reply to Bob Micheletto [:bobm] from comment #41)
> :dylan, please verify that running the equivalent of this for every table in
> bugs is an acceptable way to make the conversion:
> 
> SET FOREIGN_KEY_CHECKS=0;
> ALTER TABLE __tablename__ CONVERT TO CHARACTER SET utf8mb4 COLLATE
> utf8mb4_unicode_ci;
> 
> Also, this is the utility the FxA team used to do their utf8mb4 conversion:
> https://github.com/github/gh-ost
> It might be interesting to try out a replica based migration in stage, and
> see how it turns out.

The only thing I'm not 100% on is the utf8mb4 collation. I remember Sheeri saying something was "wrong" about the default collation method, but my google searching didn't turn anything up.
Flags: needinfo?(dylan)
(In reply to Dylan Hardison [:dylan] (he/him) from comment #42)
> (In reply to Bob Micheletto [:bobm] from comment #41)

> The only thing I'm not 100% on is the utf8mb4 collation. I remember
> Sheeri saying something was "wrong" about the default collation method, but
> my google searching didn't turn anything up.

I was working from https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html
:cvalaas can you advise us on the correct way to update this?
Flags: needinfo?(cvalaas)
I'm afraid I'm no more clued in than y'all on this. I would just read the same docs you're looking at. (Although I would look at the docs for 5.6: https://dev.mysql.com/doc/refman/5.6/en/charset-unicode-conversion.html - assuming you're running 5.6.x in AWS?)
Flags: needinfo?(cvalaas)
I tracked down the piece of memory I had about the collation:

https://stackoverflow.com/questions/43644218/why-is-table-charset-set-to-utf8mb4-and-collation-to-utf8mb4-unicode-520-ci#43692337

Basically we should use the newest utf8mb4_unicode_* collation available.
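
Putting that together with comment #41, the per-table statement would then look roughly like this. Sketch only; utf8mb4_unicode_520_ci is assumed to be the newest Unicode collation available on our MySQL 5.6 servers and should be confirmed there:

SET FOREIGN_KEY_CHECKS = 0;
ALTER TABLE __tablename__ CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
SET FOREIGN_KEY_CHECKS = 1;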
Attached file PR
Attachment #8933813 - Attachment is obsolete: true
Attachment #8934293 - Attachment is obsolete: true
Summary: Prepare MySQL database for emoji by performing row format conversion → Add support for utf8=utf8mb4 (switches to dynamic/compressed row format, and changes charset to utf8mb4)
No longer depends on: bmo-emoji-test-run
Depends on: 1473143
Alias: bmo-emoji-utf8mb4-option
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED