Closed
Bug 1184513
Opened 10 years ago
Closed 5 years ago
MDN page with a non-BMP character is not saved correctly
Categories
(developer.mozilla.org Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: fredw, Unassigned)
References
Details
(Keywords: in-triage, Whiteboard: [specification][type:bug])
What did you do?
================
1. Edit a page on MDN.
2. Move to the source tab and insert "<p>blah blah 𝓐 blah blah</p>"
3. Go back to the wysiwyg tab, you should see "blah blah
Reporter
Comment 1•10 years ago
Oh no, the end of my comment has been cut too. I thought this bug was fixed in Bugzilla :-( :-(
Reporter
Comment 2•10 years ago
Reposting without using U+1D4D0...
-----------------------------------------
What did you do?
1. Edit a page on MDN.
2. Move to the source tab and insert "<p>blah blah 𝓐 blah blah</p>"
3. Go back to the wysiwyg tab, you should see "blah blah *** blah blah" where *** is U+1D4D0 MATHEMATICAL BOLD SCRIPT CAPITAL A
4. Save the page.
What happened?
The page is cut before the U+1D4D0 character.
What should have happened?
The page should render the same as in wysiwyg view.
Is there anything else we should know?
U+1D4D0 lies outside the BMP, so it is encoded as a surrogate pair in UTF-16 (and as 4 bytes in UTF-8). Maybe that's not correctly handled by Kuma.
Inserting the math alphanum symbols (https://en.wikipedia.org/wiki/Mathematical_alphanumeric_symbols for more characters) is needed for the MathML a11y test page (https://developer.mozilla.org/en-US/docs/Mozilla/MathML_Project/a11y).
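To make the encoding point concrete, here is a minimal Python sketch (not part of Kuma; the helper name `utf16_code_units` is made up for illustration) that shows how U+1D4D0 is represented in UTF-16 and UTF-8:

```python
# U+1D4D0 MATHEMATICAL BOLD SCRIPT CAPITAL A, a code point above the BMP.
ch = "\U0001D4D0"


def utf16_code_units(text):
    """Return the UTF-16 code units of `text` as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


units = utf16_code_units(ch)
utf8_bytes = ch.encode("utf-8")

print([hex(u) for u in units])  # one high surrogate followed by one low surrogate
print(len(utf8_bytes))          # number of bytes the character needs in UTF-8
```

Any storage layer that assumes at most 3 bytes per UTF-8 character, or one UTF-16 code unit per character, can mishandle this character.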
Reporter
Comment 3•10 years ago
(In reply to Frédéric Wang (:fredw) from comment #1)
> oh no, the end of my comment has been cut too, I thought this bug was fixed
> in Bugzilla :-( :-(
OK, this is bug 405011 but bmo is still running 4.2.x
Comment 4•10 years ago
Thanks for the bug report, :fredw. Can you include/add a URL for an example/test page that breaks?
Reporter
Comment 5•10 years ago
I think
https://developer.mozilla.org/en-US/docs/Mozilla/MathML_Project/Test_Surrogate_Pairs
is what you want, and is essentially comment 0
Flags: needinfo?(fred.wang)
Comment 6•10 years ago
Updated•9 years ago
Assignee: nobody → jezdez
Status: NEW → ASSIGNED
Comment 7•9 years ago
So, long story short: MySQL's utf8 character set doesn't support the full UTF-8 encoding scheme. It stores at most 3 bytes per character, so it can't hold 4-byte Unicode characters like the mathematical symbols. That also includes emoji, btw.
The way forward is to switch to a separate character set called utf8mb4 (with its matching collations) for those fields that we deem to require those characters (revision content, document rendered HTML, etc.).
This should not result in data loss but is still freaking me out. There are other options, like replacing those characters with HTML encoded values, but I'm not so stoked about that since we want to store the exact state of data as submitted by the user.
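For reference, the "HTML encoded values" alternative mentioned above could look roughly like the following Python sketch. This is not Kuma code; both function names are hypothetical, and the sketch assumes that only code points above U+FFFF (the ones needing 4 bytes in UTF-8) cause trouble:

```python
def first_non_bmp_index(text):
    """Return the index of the first code point above U+FFFF, or -1 if none."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return i
    return -1


def escape_non_bmp(text):
    """Replace each code point above U+FFFF with an HTML numeric character reference."""
    return "".join(
        f"&#x{ord(ch):X};" if ord(ch) > 0xFFFF else ch
        for ch in text
    )


sample = "<p>blah blah \U0001D4D0 blah blah</p>"
print(first_non_bmp_index(sample))  # position where the saved page would be cut
print(escape_non_bmp(sample))       # same markup with the symbol as &#x...;
```

The drawback is the one already stated: the stored source no longer matches what the user submitted.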
This blog post has a good, detailed explanation if you're willing to bathe in the glory of database configuration issues: https://mathiasbynens.be/notes/mysql-utf8mb4
I'm wary of tasks like that since they involve touching lots of data, and I'm just not comfortable with doing that on a regular basis, as is happening now. Instead, I just want to go back to using Postgres, which doesn't have that problem at all.
To be clear for the project managers, this is a hard problem and may require a hard hat deploy to change the collations.
Comment 8•9 years ago
Thanks for the info. I added a comment and "See also" on bug 1159930 pointing back to this. I agree this would freak me out a bit too. At some point of installing multiple custom collations onto MySQL, we are spinning our wheels trying to re-create Postgres features in MySQL. :/
Comment 9•9 years ago
There is a PR now for this: https://github.com/mozilla/kuma/pull/3403
Sadly the tests don't pass (https://travis-ci.org/mozilla/kuma/builds/75097709) with an error that was also mentioned in the blog post "Specified key was too long; max key length is 767 bytes".
The reason behind that can be found in this paragraph of the blog post:
The same goes for index keys. The InnoDB storage engine has a maximum
index length of 767 bytes, so for utf8 or utf8mb4 columns, you can index
a maximum of 255 or 191 characters, respectively. If you currently have
utf8 columns with indexes longer than 191 characters, you will need to
index a smaller number of characters when using utf8mb4. (Because of
this, I had to change some indexed VARCHAR(255) columns to VARCHAR(191).)
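The 255 and 191 figures in the quoted paragraph follow from dividing the 767-byte limit by the maximum bytes per character. A small Python sketch of that arithmetic (the constant and function names are made up for illustration):

```python
# Limit taken from the error message: "max key length is 767 bytes".
INNODB_MAX_KEY_BYTES = 767


def max_indexable_chars(bytes_per_char, max_key_bytes=INNODB_MAX_KEY_BYTES):
    """Largest number of characters whose worst-case size fits in the key limit."""
    return max_key_bytes // bytes_per_char


print(max_indexable_chars(3))  # utf8: up to 3 bytes per character
print(max_indexable_chars(4))  # utf8mb4: up to 4 bytes per character

# Worst-case size of an indexed VARCHAR(255) column under utf8mb4.
print(255 * 4 > INNODB_MAX_KEY_BYTES)
```

This is why an indexed VARCHAR(255) column that fits under utf8 exceeds the limit once the column is converted to utf8mb4.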
Since the database migration that triggers that error comes from a third-party app, django-celery, we can't easily change the length of the columns that lead to this error without essentially overwriting django-celery's code and/or forking the app.
Long story short, this has become a lot harder and IMO is not worth the hassle. We should concentrate on moving to Postgres instead of patching up MySQL.
Comment 10•9 years ago
:lonnen - we are blocking this bug on moving to Postgres and/or changing db backends, which could be part of a potential AWS migration.
Depends on: 1159930
Flags: needinfo?(chris.lonnen)
Comment 11•9 years ago
It's certainly something we can do. For stability and simplicity in the move I recommend doing it in two steps -- move to AWS, then change datastores.
Flags: needinfo?(chris.lonnen)
Updated•9 years ago
Assignee: jezdez → nobody
Status: ASSIGNED → NEW
Reporter
Comment 12•9 years ago
Hi Florian. I was trying to add Maghreb / Machrek Arabic Style this morning and got bitten by this bug again. Can you please revert to revision "Jun 19, 2016, 8:45:50 AM"? (It seems that I don't have permission to do that.)
https://developer.mozilla.org/en-US/docs/Mozilla/MathML_Project/MathML_Torture_Test$history
Flags: needinfo?(fscholz)
Comment 13•9 years ago
Hi Fred, I'm sorry this bug is still an issue. I have reverted the document now.
Flags: needinfo?(fscholz)
Comment 14•5 years ago
MDN Web Docs' bug reporting has now moved to GitHub. From now on, please file content bugs at https://github.com/mdn/sprints/issues/ and platform bugs at https://github.com/mdn/kuma/issues/.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Updated•5 years ago
Product: developer.mozilla.org → developer.mozilla.org Graveyard