Janet tried to revert the page to its state before my edits; the history shows success, but the generated page is still truncated.
Severity: normal → major
The original page has been restored. I just did a hard refresh (Ctrl + Shift + R). Not sure why that's necessary, but it should make a good workaround for now.
Emoji require 4-byte unicode characters, which are not supported by the utf8 character set used by our tables in MySQL. There is a utf8mb4 character set: http://stackoverflow.com/questions/7814293/how-to-insert-utf-8-mb4-characteremoji-in-ios5-in-mysql#8109948 We're planning on moving to PostgreSQL, and will ensure that 4-byte unicode is supported in the new database.
Depends on: 1159930
Summary: MDN truncates pages with Unicode supplementary characters → MDN truncates pages with Unicode supplementary characters (codepoint above U+FFFF)
I did manage to revert the Text_formatting page and make a few corrections to it.
This bug isn't assigned to anyone. I'd like to help. The importance should be changed from "major" to "critical", since it causes loss of data and prevents documenting modern technology. This bug currently "depends on 1159930". I think that turns out to be a mistake. Switching from MySQL to PostgreSQL is evidently difficult and doesn't appear to have critical importance or to have anyone assigned to it. There doesn't appear to be any expectation for bug 1159930 to be resolved within any time frame. Bug 1159930 may be a worthwhile long-term goal if anyone is sufficiently motivated, but it's counterproductive to treat an easy-to-solve critical bug as dependent on a hard-to-solve non-critical enhancement. This bug can be solved with a few MySQL commands, like "ALTER TABLE mytable charset=utf8mb4, MODIFY COLUMN textfield1 VARCHAR(255) CHARACTER SET utf8mb4..." as described in the stackoverflow link referred to above. Figuring out the right MySQL commands should take a few hours at most. Of course, the webmaster will need to make backups, upgrade to a newer version of MySQL if a version before 5.5 is currently in use, and be in charge of actually running the commands while minimizing disruption of service. Could anyone please add a comment here indicating which MySQL version is currently in use, and references to the table structures? Also, this same bug exists on both developer.mozilla.org and bugzilla.mozilla.org. Are there two separate databases for the two subdomains? Do they run on different servers? Is there a mechanism in place for duplicating the websites for test purposes?
I'm sorry you were affected by this bug, and thank you for advocating for it. I believe major is the correct severity, due to the frequency this limitation is encountered. In any case, Bugzilla severity is one of several factors in deciding what work to do, and changing it will not change the priority. If you'd like to work on this, a good place to start is the Kuma development environment. It uses MySQL 5.6 and has the same table structure as production: https://kuma.readthedocs.io/en/latest/ Yes, bugzilla.mozilla.org (BMO) has a different database and development process. If there's an existing bug on that issue, you can reach the BMO team by following and commenting on that bug. If not, you can file it and refer to this bug.
Thank you, John. I've installed Docker and Kuma. The Kuma installation has problems running, similar to those reported at https://github.com/puckel/docker-airflow/issues/18 ("Can't connect to MySQL server on 'mysql' (111)"); I'll try to solve that. Anyway, the source of bug 857438 is evident in kuma/mdn_sample_db.sql. It has 138 occurrences of 'utf8' and zero occurrences of 'utf8mb4'. The solution should be to replace all occurrences in kuma/mdn_sample_db.sql of 'utf8' with 'utf8mb4', with the exception of one line that begins "INSERT INTO ..." and includes URLs like "https://github.com/search?utf8=...".
BTW, the corresponding BMO bug is: https://bugzilla.mozilla.org/show_bug.cgi?id=868867
This appears to be a good guide for updating to utfmb4: https://mathiasbynens.be/notes/mysql-utf8mb4 It mentions a potential problem: > The InnoDB storage engine has a maximum index length of 767 bytes, so for utf8 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have utf8 columns with indexes longer than 191 characters, you will need to index a smaller number of characters when using utf8mb4. (Because of this, I had to change some indexed VARCHAR(255) columns to VARCHAR(191).) We have a unique index on email fields, defined as VARCHAR(255), and reducing that to VARCHAR(191) would mean that some valid email addresses would not be allowed: https://stackoverflow.com/questions/386294/what-is-the-maximum-length-of-a-valid-email-address https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address https://tools.ietf.org/html/rfc6531 It's not clear to me if supplementary unicode characters are allowed in email addresses, and may require a closer deeper reading of RFC 6531. To keep the unique index on emails, we may need to keep the utf8 encoding, or wait for InnoDB to support 1026-key indexes. I'm not sure if there is an open bug for that. Or, switch to postgres.
bug 1253535 is specific to bugzilla.mozilla.org, and comment 46  has some of the requirements to get it working (MySQL 5.7, DYNAMIC/COMPRESSED row formats, innodb_large_prefix setting).  https://bugzilla.mozilla.org/show_bug.cgi?id=1253535#c46
Yes, innodb_large_prefix should solve the VARCHAR(255) problem, and DYNAMIC/COMPRESSED row formats may be beneficial for efficiency. The complete solution should be to make those two changes along with changing utf8 to utf8mb4.
You need to log in before you can comment on or make changes to this bug.