Closed
Bug 1115608
Opened 10 years ago
Closed 8 years ago
Pushlog ingestion fails if commit messages contain astral unicode characters ("OperationalError: (1366, "Incorrect string value:...")
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P1)
Tree Management
Treeherder: Data Ingestion
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: emorley)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
https://tbpl.mozilla.org/?tree=Try&rev=fe405d217531 shows my pile-of-poo try push, but 40 minutes after I pushed, https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=fe405d217531 continues to say that it's still Waiting for a push with revision fe405d217531
Reporter | ||
Comment 1•10 years ago
|
||
Mmm, if someone pushes something more likely to actually be used to mozilla-inbound, like "Bug 1234567 - Fix display of
Severity: normal → blocker
Priority: -- → P1
Reporter | ||
Comment 2•10 years ago
|
||
Heh, pretty awesome that bugzilla ate the rest of that comment.
I suspect, having just pushed a couple of things with try syntax which contained no non-ASCII, that what I actually meant to file was "try pushes that don't produce any builds claim in treeherder to be perpetually 'Waiting for a push with revision bb847e6159c2' instead of showing the lack of any builds produced." Which is still fairly severe and should block killing tbpl, but much less severe than having a way to hide pushes on any tree.
Severity: blocker → normal
Priority: P1 → --
Assignee | ||
Comment 3•10 years ago
|
||
Whichever it is, I think we should still take a look sooner rather than later, so making this a P1 again.
Priority: -- → P1
Updated•10 years ago
|
Assignee: nobody → cdawson
Status: NEW → ASSIGNED
Assignee | ||
Updated•10 years ago
|
Priority: P1 → P2
Comment 4•10 years ago
|
||
Un-assigning from myself. I'm not going to get to this soon.
Assignee: cdawson → nobody
Assignee | ||
Updated•10 years ago
|
Priority: P2 → P3
Assignee | ||
Updated•10 years ago
|
Status: ASSIGNED → NEW
Assignee | ||
Comment 5•9 years ago
|
||
Guess we'll need to take the approach used in bug 1275425 here too. (Strip astral unicode characters; will move that to a common file.)
Assignee: nobody → emorley
Depends on: 1275425
Comment 6•9 years ago
|
||
Well we could just fix it properly :) I don't know if there will be breakage though; some of the stuff about index lengths did scare me a little.
Assignee | ||
Updated•8 years ago
|
Assignee: emorley → nobody
Assignee | ||
Comment 7•8 years ago
|
||
Production is currently failing to ingest autoland and mozilla-central pushes due to:
django.db.utils:OperationalError: (1366, "Incorrect string value: '\\xF0\\x9F\\x92\\xA3 (...' for column 'comments' at row 1")
The issue started on autoland, but the problematic commit has now been merged to mozilla-central. I've asked :tomcat to not merge mozilla-central anywhere else for now.
I've tracked the problematic commit back to:
https://hg.mozilla.org/integration/autoland/rev/ad8230a681a8
...which comes from servo vendoring:
https://github.com/servo/servo/pull/15769
The first line of the commit message includes an astral unicode character.
Options for us:
1) Strip out astral unicode during ingestion
2) Convert the table to utf8mb4 (viable, since it's a small table)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → emorley
Priority: P3 → P1
Summary: Treeherder failed to ingest a try push with U+1F4A9 in the commit message → Pushlog ingestion fails if commit messages contain astral unicode characters ("OperationalError: (1366, "Incorrect string value:...")
Comment 8•8 years ago
|
||
AFAIR from my days working at a MySQL-powered dating website, given that the (MySQL) utf8 charset is a subset of utf8mb4, converting the charset to binary and only then to utf8mb4 should be fast and not spend pointless time doing noop conversions.
Assignee | ||
Comment 9•8 years ago
|
||
Converting to utf8mb4 seemed preferable, since:
* Didn't appear to require code changes to quicker to fix prod
* This table is small so quick to run / easy to switch back if we decide stripping characters is preferred
* It looks like it's the direction we might take in the future for all tables
So against stage I ran:
`ALTER TABLE treeherder.commit CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`
...which completed in 54s.
However exceptions continued on stage, which I've tracked down to Django hardcoding the connection charset to utf8:
https://github.com/django/django/blob/1.10.5/django/db/backends/mysql/base.py#L239
We can either:
1) Override this by modifying settings.py's DATABASES['default']['OPTIONS'] (though not sure if this will have any implications for other tables)
2) Use the stripping approach for now and decide what our overall utf8 strategy should be later
Comment 10•8 years ago
|
||
Let's make "later" not be 2 years again.
Comment 11•8 years ago
|
||
Can't be worse than Django itself though, I guess.
https://code.djangoproject.com/ticket/18392
Comment 12•8 years ago
|
||
Assignee | ||
Comment 13•8 years ago
|
||
Comment on attachment 8842396 [details] [review]
[treeherder] mozilla:django-utf8mb4-connection > mozilla:master
This worked fine on stage (which has the converted commit table).
I'll pre-emptively cherrypick to prod prior to review.
Attachment #8842396 -
Flags: review?(james)
Assignee | ||
Comment 14•8 years ago
|
||
I've converted the `commit` table on prod/prototype to utf8mb4 (using the SQL in comment 9), and a prod push is in progress (2a778372 plus this PR and an unrelated exception fix [6b0b8585] cherrypicked).
Comment 15•8 years ago
|
||
Comment on attachment 8842396 [details] [review]
[treeherder] mozilla:django-utf8mb4-connection > mozilla:master
r+'ing this so :emorley can land.
Attachment #8842396 -
Flags: review?(james) → review+
Comment 16•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/d37cece20b8eb1a37c82a64b0e90261e81af3613
Bug 1115608 - Set Django DB connection charset to utf8mb4
This allows non-BMP unicode values to be sent by the client to MySQL
server, to allow insertion into tables that have been converted to
utf8mb4 (currently only the `commit` table).
Assignee | ||
Comment 17•8 years ago
|
||
No issues since this morning.
Plus side of the conversion rather than stripping the characters: Treeherder now supports emoji \o/
eg: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=378fa5c177f1c4055803e2f445cc8ea5d89d2eea
I've filed bug 1343630 for handling the longer-term utf8mb4 discussion.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•