Closed Bug 1391084 Opened 7 years ago Closed 7 years ago

Change tag names to use utf8_general_ci collation


( Graveyard :: General, enhancement)






(Reporter: jwhitlock, Assigned: jwhitlock)



(Keywords: in-triage, Whiteboard: [specification][type:change])

What feature should be changed? Please provide the URL of the feature if possible.
Tag names use a custom collation, utf8_distinct_ci, rather than the utf8_general_ci collation that is used by other fields and directly supported by MySQL. utf8_distinct_ci was introduced in a MariaDB blog post [1], and was applied in June 2015.


What problems would this solve?
The custom collation requires customizing MySQL server, which is not possible with AWS-managed MySQL servers (RDS).

If we can switch to a built-in collation, we gain AWS's monitoring infrastructure, automated updates, fast backups, replication, and so on, and do not have to manage the database servers ourselves.

Who would use this?
MDN users, MDN editors, SREs, and MDN developers

What would users see?
MDN users will see document tags whose accented letters do not match the spelling in their language, such as "Reference" instead of the French "Référence".

MDN editors may try to add a tag that differs from an existing tag only in accented characters, and will get the existing tag's spelling instead.
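
To illustrate why such tags are treated as the same name, here is a minimal Python sketch. It only approximates utf8_general_ci by stripping combining accents and ignoring case; MySQL's real weight tables differ in some details, and the function names general_ci_key and collide are made up for this example.

```python
import unicodedata


def general_ci_key(name):
    """Rough stand-in for a utf8_general_ci comparison key.

    Assumption: accent- and case-insensitivity is approximated by Unicode
    decomposition; this is not MySQL's actual weight table.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks (accents), keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()


def collide(a, b):
    """True if two tag names would compare equal under the approximation."""
    return general_ci_key(a) == general_ci_key(b)


if __name__ == "__main__":
    print(collide("Reference", "Référence"))
    print(collide("Reference", "Referenz"))
```

Under this approximation "Reference" and "Référence" share one key, so a unique index on the tag name can hold only one of them.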

SREs will use standard AWS tools for provisioning, monitoring, and maintaining databases.

MDN developers will have simpler MySQL installs in development environments for development and testing.

What would users do? What would happen as a result?
Some MDN users and editors will be upset or angry at incorrectly spelled tags.

SREs will spend much less time installing, monitoring, maintaining, and upgrading database servers.

MDN developers can focus on a true translated tags feature (bug 671721), and upgrading to Postgres (bug 1159930) may be easier. 

Is there anything else we should know?
I've written much more on this issue in a Google doc:

There's a related spreadsheet with all the tags as of today, and some suggested name changes to make them fit in utf8_general_ci:
Assignee: nobody → jwhitlock
Blocks: 110799
Blocks: 1110799
No longer blocks: 110799
See Also: → 1159930
Commits pushed to master at
bug 1391084: Disable duplicate tag test

This test was already marked xfail because it fails with an
IntegrityError when run with --no-migrations. After the collation is
changed from utf8_distinct_ci to utf8_general_ci, it will always fail,
because the unique index will disallow the duplicate tag.
bug 1391084: Migrate tags to utf8_general_ci

Split into a data and schema migration that should be run together:

1. Change tags that will collide in utf8_general_ci, by keeping the name of
the oldest tag and adding (2), (3), etc. to the later tag names.

2. Update the schema to use utf8_general_ci for tag names.

Changes around 100 tags, and runs in 8 seconds locally, so no site
downtime expected.
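
The renaming step described in this commit message could be sketched roughly as follows. This is not the code from the pull request: fold() only approximates utf8_general_ci, tags are assumed to be (id, name) pairs with a lower id meaning an older tag, and the extra check that a suffixed name is itself unused is an assumption added for safety.

```python
import unicodedata
from collections import defaultdict


def fold(name):
    """Approximate utf8_general_ci: strip accents, ignore case (assumption)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).upper()


def disambiguate(tags):
    """Return {tag_id: new_name} for tags that must be renamed.

    tags: iterable of (tag_id, name); a lower tag_id is assumed to be older.
    The oldest tag in each colliding group keeps its name; later tags get
    " (2)", " (3)", ... appended to their own names.
    """
    groups = defaultdict(list)
    for tag_id, name in sorted(tags):
        groups[fold(name)].append((tag_id, name))

    taken = set(groups)  # folded names already in use
    renames = {}
    for members in groups.values():
        suffix = 2
        for tag_id, name in members[1:]:
            candidate = f"{name} ({suffix})"
            # Skip suffixes that would collide with another existing name.
            while fold(candidate) in taken:
                suffix += 1
                candidate = f"{name} ({suffix})"
            taken.add(fold(candidate))
            renames[tag_id] = candidate
            suffix += 1
    return renames


if __name__ == "__main__":
    sample = [(1, "Reference"), (2, "Référence"), (3, "référence"), (4, "API")]
    for tag_id, new_name in disambiguate(sample).items():
        print(tag_id, new_name)
```

Renaming the data first means the schema change in step 2 cannot fail on the unique index, which is why the two migrations are meant to run together.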
Merge pull request #4376 from jwhitlock/utf8_general_ci_1391084

bug 1391084: Switch tag names from utf8_distinct_ci to utf8_general_ci
Commits pushed to master at
bug 1391084: Upgrade sqlparse and related reqs

Move sqlparse from constraints to default, because it is needed for
RunSQL data migrations. Also update it and some other requirements:

* sqlparse 0.1.19 → 0.2.3 - Cleanup, refactoring, bug fixes
* django-debug-toolbar 1.4 → 1.8 - sqlparse 0.2 compat, Django 1.11
  compatibility, manual setup required (with code changes)
* hashin 0.9.0 → 0.11.2 - Update how latest version is determined
Merge pull request #4383 from jwhitlock/upgrade-deps-1391084

bug 1391084: Upgrade sqlparse and related reqs
Commits pushed to master at
bug 1391084: Update sample database resources

* Add the Interactive Editor content experiment
* Add waffle flags wiki_samples, redesign_beta, redesign_live,
  line_length, and sample_frame
* Remove waffle flag iperceptions
* Add waffle switches foundation_callout and helpful-survey-2

The regenerated sample database reflects removing the custom collation,
disambiguating tags, and the homepage with fewer links to
Mozilla-specific documentation.
Merge pull request #4422 from jwhitlock/sample-db-1391084

bug 1391084: Update sample database resources
See Also: → 1401253
The run-time dependency on utf8_distinct_ci has been removed. More work is needed to remove this from testing environments and the code. This is tracked in bug 1401253.
Closed: 7 years ago
Resolution: --- → FIXED
Product: → Graveyard