What feature should be changed? Please provide the URL of the feature if possible.
==================================================================================
Tag names use a custom collation, utf8_distinct_ci, rather than the utf8_general_ci collation used by other fields and directly supported by MySQL. utf8_distinct_ci was introduced in a MariaDB blog post, and was applied in June 2015.

https://mariadb.com/resources/blog/adding-case-insensitive-distinct-unicode-collation

What problems would this solve?
===============================
The custom collation requires customizing the MySQL server, which is not possible with AWS-managed MySQL servers (RDS). If we can switch to a built-in collation, we gain AWS's monitoring infrastructure, automated updates, fast backups, replication, and so on, and do not have to manage the database servers ourselves.

Who would use this?
===================
MDN users, MDN editors, SREs, and MDN developers

What would users see?
=====================
MDN users will see document tags with accented letters that do not match the spelling in their language, such as "Reference" vs the French "Référence".

MDN editors may try to add tags that differ only in accented characters, and will get an alternate spelling.

SREs will use standard AWS tools for provisioning, monitoring, and maintaining databases.

MDN developers will have simpler MySQL installs in development environments for development and testing.

What would users do? What would happen as a result?
===================================================
Some MDN users and editors will be upset or angry at incorrectly spelled tags.

SREs will spend much less time installing, monitoring, maintaining, and upgrading database servers.

MDN developers can focus on a true translated tags feature (bug 671721), and upgrading to Postgres (bug 1159930) may be easier.

Is there anything else we should know?
======================================
I've written much more on this issue in a Google doc:

https://docs.google.com/document/d/1xGeFQuRZa_aJ_obpgKHgGXT-8NQg61lFyD98eKedyz0/edit#

There's a related spreadsheet with all the tags as of today, and some suggested name changes to make them fit in utf8_general_ci:

https://docs.google.com/spreadsheets/d/1QjuT5vj1-yLcVa-XNt9JFHNKR3lKryOd162PkY5SLD4/edit
Assignee: nobody → jwhitlock
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/f510674c247a64792748a3510f54ee0951acce72
bug 1391084: Disable duplicate tag test

This test was already marked xfail because it fails with an IntegrityError when run with --no-migrations. After the collation is changed from utf8_distinct_ci to utf8_general_ci, it will always fail, because the unique index will disallow the duplicate tag.

https://github.com/mozilla/kuma/commit/bf945397d5e1472feb05645facf1be2245e2517f
bug 1391084: Migrate tags to utf8_general_ci

Split into a data and schema migration that should be run together:

1. Change tags that will collide in utf8_general_ci, by keeping the name of the oldest tag and adding (2), (3), etc. to the later tag names.
2. Update the schema to use utf8_general_ci for tag names.

Changes around 100 tags, and runs in 8 seconds locally, so no site downtime expected.

https://github.com/mozilla/kuma/commit/b63e30855010ea4a984c99663aaf8e00bee70474
Merge pull request #4376 from jwhitlock/utf8_general_ci_1391084

bug 1391084: Switch tag names from utf8_distinct_ci to utf8_general_ci
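The first migration step above (keep the oldest tag's name, suffix the later ones) can be sketched in plain Python. This is an illustration only, not the code from the commit: `general_ci_key` and `disambiguate` are hypothetical names, tag age is assumed to follow the numeric id, and the key function only approximates utf8_general_ci by ignoring case and accents. MySQL's real collation has further rules that this sketch does not reproduce.

```python
import unicodedata
from collections import defaultdict


def general_ci_key(name):
    """Approximate a utf8_general_ci comparison key (assumption: it is
    enough to ignore case and combining accents for this illustration)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def disambiguate(tags):
    """Plan renames for tags that would collide under the approximate key.

    tags: list of (id, name) pairs; a lower id is assumed to be an older tag.
    Returns {id: new_name} for every tag that needs a new name.
    """
    groups = defaultdict(list)
    for tag_id, name in sorted(tags):
        groups[general_ci_key(name)].append((tag_id, name))

    taken = set(groups)  # keys already in use, so new names never collide
    renames = {}
    for members in groups.values():
        suffix = 2
        # The first (oldest) member keeps its name; later ones get a suffix.
        for tag_id, name in members[1:]:
            candidate = f"{name} ({suffix})"
            while general_ci_key(candidate) in taken:
                suffix += 1
                candidate = f"{name} ({suffix})"
            taken.add(general_ci_key(candidate))
            renames[tag_id] = candidate
            suffix += 1
    return renames


if __name__ == "__main__":
    sample = [(1, "Reference"), (2, "Référence"), (3, "reference"), (4, "CSS")]
    print(disambiguate(sample))
```

With the sample data, tag 1 keeps "Reference" while tags 2 and 3 become "Référence (2)" and "reference (3)". Computing the renames before touching the schema matters because the unique index on tag names would reject the collation change while collisions remain.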
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/adfe31e4c7b2d0267c1e2209c9331e4ec7e03b90
bug 1391084: Upgrade sqlparse and related reqs

Move sqlparse from constraints to default, because it is needed for RunSQL data migrations. Also update it and some other requirements:

* sqlparse 0.1.19 → 0.2.3 - Cleanup, refactoring, bug fixes
* django-debug-toolbar 1.4 → 1.8 - sqlparse 0.2 compat, Django 1.11 compatibility, manual setup required (with code changes)
* hashin 0.9.0 → 0.11.2 - Update how latest version is determined

https://github.com/mozilla/kuma/commit/fc03aad8490991553c149633c2f0280693b6b9ec
Merge pull request #4383 from jwhitlock/upgrade-deps-1391084

bug 1391084: Upgrade sqlparse and related reqs
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/b0f3b3495442a505cdd0921a2fa9a25448f23dcb
bug 1391084: Update sample database resources

* Add the Interactive Editor content experiment
* Add waffle flags wiki_samples, redesign_beta, redesign_live, line_length, and sample_frame
* Remove waffle flag iperceptions
* Add waffle switches foundation_callout and helpful-survey-2

The regenerated sample database reflects removing the custom collation, disambiguating tags, and the homepage with fewer links to Mozilla-specific documentation.

https://github.com/mozilla/kuma/commit/f6664cc58154b082e6657c980eabb60f027de44c
Merge pull request #4422 from jwhitlock/sample-db-1391084

bug 1391084: Update sample database resources
The run-time dependency on utf8_distinct_ci has been removed. More work is needed to remove this from testing environments and the code. This is tracked in bug 1401253.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED