Open
Bug 1343630
Opened 8 years ago
Updated 5 years ago
Implement longer term solution to one-off MySQL utf8->utf8mb4 commit table conversion in bug 1115608
Categories
(Tree Management :: Treeherder, enhancement, P5)
Tracking
(Not tracked)
NEW
People
(Reporter: emorley, Unassigned)
In bug 1115608, production pushlog ingestion was failing because an emoji (a non-BMP, aka astral, Unicode character) had been used in a commit message.
In the bug I:
* Manually ran SQL against prototype/stage/prod to convert the `commit` table from utf8 to utf8mb4.
* Landed a PR to update the DATABASES dict in settings.py to make Django set the client charset as utf8mb4 during the connection handshake itself.
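For reference, a minimal sketch of what the settings.py change looks like. The database name below is a placeholder and not the real Treeherder value; the `charset` entry in `OPTIONS` is the part that matters, since Django passes it through to the MySQL client library when the connection is opened.

```python
# Sketch only: NAME is a hypothetical placeholder, not Treeherder's real setting.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "example_db",  # hypothetical
        "OPTIONS": {
            # Ask the client library to negotiate utf8mb4 at connect time,
            # so 4-byte characters such as emoji survive the round trip.
            "charset": "utf8mb4",
        },
    }
}
```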
This is fine as a short term fix to unblock prod, however:
1) The SQL was run manually and was not part of a migration, so the prod/... schema now differs from Vagrant (i.e. partly regressing bug 1303763)
2) We need to make a longer-term decision whether to:
- use utf8mb4 across all tables
- use utf8mb4 across some tables (kind of annoying, since this is only possible via RunSQL migrations and not via a native Django feature)
- not use utf8mb4 at all, and just strip astral characters in the commit table after all (like we do for the failure_line table currently)
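To illustrate the third option, here is a rough sketch of stripping astral characters before storing text. This is not the existing failure_line workaround; `strip_astral` and `ASTRAL_RE` are hypothetical names used only for this example.

```python
import re

# Matches any code point outside the Basic Multilingual Plane (U+10000 and up).
# These are the characters that need 4 bytes in UTF-8 and that MySQL's
# 3-byte "utf8" charset cannot store.
ASTRAL_RE = re.compile("[\U00010000-\U0010FFFF]")


def strip_astral(text, replacement=""):
    """Return text with all non-BMP characters replaced (removed by default)."""
    return ASTRAL_RE.sub(replacement, text)


# Example: the emoji is dropped, BMP characters (including accents) are kept.
cleaned = strip_astral("Fix caf\u00e9 bug \U0001F389 now")
```

The obvious downside is that this silently loses data from the commit message, which is part of why switching charset is attractive.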
Complications for switching are:
* Time to run the conversion on large tables
* Field/index length limits (though in the cases that appeared to be problematic I think we want to change schema anyway, eg job_details)
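On the index length point, the arithmetic is roughly as follows, assuming the classic 767-byte InnoDB index key prefix limit (which applies to older row formats; whether it applies here depends on the MySQL version and row format in use). The helper name is made up for the example.

```python
def max_indexable_chars(key_limit_bytes, max_bytes_per_char):
    """Longest fully indexable string column for a given key size limit."""
    return key_limit_bytes // max_bytes_per_char


# Assumption: 767-byte prefix limit (older InnoDB row formats).
LIMIT = 767

utf8_chars = max_indexable_chars(LIMIT, 3)     # MySQL "utf8" uses up to 3 bytes/char
utf8mb4_chars = max_indexable_chars(LIMIT, 4)  # utf8mb4 uses up to 4 bytes/char
```

So under that assumption an indexed varchar(255) fits with utf8 but not with utf8mb4, where the limit drops to 191 characters.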
I'm leaning towards switching all tables, and then we can also remove the failure_line workaround added by bug 1275425.
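As a sketch of what "switching all tables" could look like, the helper below builds one `ALTER TABLE ... CONVERT TO CHARACTER SET` statement per table. The function name and the table list are illustrative only; the resulting strings could be run by hand or passed to Django's `migrations.RunSQL` so the change is recorded in a migration rather than applied manually. It does not attempt the convert-via-binary speedup.

```python
def build_conversion_sql(table_names, charset="utf8mb4", collation=None):
    """Build ALTER TABLE statements that convert each table to `charset`."""
    statements = []
    for name in table_names:
        # Table names are interpolated into SQL, so reject anything that
        # could break out of the backtick quoting.
        if "`" in name:
            raise ValueError("invalid table name: %r" % name)
        sql = "ALTER TABLE `%s` CONVERT TO CHARACTER SET %s" % (name, charset)
        if collation:
            sql += " COLLATE %s" % collation
        statements.append(sql + ";")
    return statements


# Illustrative table list, not the full Treeherder schema.
statements = build_conversion_sql(["commit", "failure_line"])
```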
Comment 1•8 years ago
We should totally switch all tables; non-UTF8 "UTF8" is just broken. Note nox's suggestion from https://bugzilla.mozilla.org/show_bug.cgi?id=1115608#c8 about first converting to binary to speed up the conversion process.
Updated•8 years ago (Reporter)
Assignee: nobody → emorley
Comment 2•6 years ago (Reporter)
I agree switching all tables makes sense.
For anyone working on this in the future, this will require:
1) Adjusting the mysql.cnf file in the Treeherder repository used by Vagrant, so that the dev DB that gets created uses utf8mb4 instead of utf8
2) Updating the Terraform config for RDS with the equivalent change from (1): https://github.com/mozilla-platform-ops/devservices-aws/blob/master/treeherder/rds.tf
3) Actually updating the existing dev/stage/prod DB schemas (since the change in (2) will only apply to new tables) using the techniques described here:
Assignee: emorley → nobody
Summary: Implement longer term solution to one-off utf8->utf8mb4 commit table conversion in bug 1115608 → Implement longer term solution to one-off MySQL utf8->utf8mb4 commit table conversion in bug 1115608
Comment 3•5 years ago
We will see if this becomes a higher priority when we explore a different backend.
Priority: P3 → P5