Closed
Bug 527140
Opened 15 years ago
Closed 14 years ago
Special characters are converted into HTML entities in the translator forms
Categories
(support.mozilla.org :: Localization, task)
support.mozilla.org
Localization
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
Future
People
(Reporter: thomas.lendo, Assigned: jsocol)
References
()
Details
Attachments
(1 file, 6 obsolete files)
1.25 KB,
patch
|
paulc
:
review-
|
Details | Diff | Splinter Review |
If you re-translate an already translated string that contains special characters like Umlauts then SUMO changes the special characters to HTML entities. E.g. "ö" turns into "ö".
I tested this issue only with German Umlauts but maybe this is a more general problem.
Steps to reproduce:
1) Translate a string with the online translator. The translation should contain a German Umlaut. (e.g. "2) Get Personal Help" -> "2) Holen Sie sich persönliche Hilfe" at https://support.mozilla.com/de/kb/)
2) Submit the translation. All's well.
3) Reopen the same string with the online translator. You will see that Umlaut is shown as HTML entity. If you save this then the entity will be shown on the page instead of the correct character.
Updated•15 years ago
|
Target Milestone: --- → 1.6
Reporter | ||
Comment 1•15 years ago
|
||
Please fix this bug asap. I have to correct broken strings of the SUMO interface with ü's and ä's every day.
Comment 3•15 years ago
|
||
This should be okay security-wise, I think.
Assignee: nobody → paulc
Attachment #418302 -
Flags: review?(james)
Assignee | ||
Comment 4•15 years ago
|
||
Comment on attachment 418302 [details] [diff] [review]
escape only single quotes
The 'quotes' option does not HTML encode single quotes, it adds a slash to them.
I'm wondering if, rather than only escaping single quotes, setting the fourth argument (double-encode?) of htmlentities() and htmlspecialchars() to false is safe. I didn't see any examples where it wasn't but I'm not 100% on this. The only way I think it could be a problem is if there was actually an entity in the translation. Does that occur?
Attachment #418302 -
Flags: review?(james) → review-
Assignee | ||
Comment 5•15 years ago
|
||
By "an entity in the translation" I mean, for example, typing the following into the interactive translator:
"This & That"
rather than typing
"This & That"
Assignee | ||
Comment 6•15 years ago
|
||
Here's my example implementation of this with double-encode set to false.
My concern isn't really the translations, but other places where this filter is used. The code is so schizophrenic with it's escaping that I'm wary of making a change like this.
Maybe this could be implemented as a new mode for the {$var|escape} modifier?
Comment 7•15 years ago
|
||
(In reply to comment #6)
> Maybe this could be implemented as a new mode for the {$var|escape} modifier?
Yeah I was thinking that too. It doesn't seem like the risk of modifying something used site-wide is worth the gain for this instance alone.
Comment 8•15 years ago
|
||
Attachment #418302 -
Attachment is obsolete: true
Attachment #419952 -
Flags: review?(james)
Assignee | ||
Updated•15 years ago
|
Attachment #418448 -
Attachment is obsolete: true
Assignee | ||
Comment 9•15 years ago
|
||
Comment on attachment 419952 [details] [diff] [review]
adds new escape modifier which avoids double-encoding
WFM, thx.
Attachment #419952 -
Flags: review?(james) → review+
Comment 10•15 years ago
|
||
r59233
To test this:
1. Your user language (in "My account") must be set to something other than english (e.g. German)
2. Toggle interface translation on.
3. Translate a string with special characters (e.g. for German, the sidebar menu on the homepage -- see comment 0)
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Summary: Special characters are converted into HTML entities → Special characters are converted into HTML entities in the interactive translator
Assignee | ||
Comment 11•15 years ago
|
||
I translated a string into fake French, including an ampersand, then opened the translator again. The special characters are typically just UTF-8 characters.
Thomas, would you be interested in helping us test this out on http://support-stage.mozilla.org/de/kb/?
Reporter | ||
Comment 13•15 years ago
|
||
Hi Stephen. I've translated something at the start page of the staging site and in the language table. It seems to work fine.
But if you search for a string in the language table, the string in the search field will be false encoded after search:
1) Go to https://support-stage.mozilla.org/tiki-edit_languages.php?locale=en-US
2) Select "de" (German) as language to edit
3) Click "Edit language table"
3) Search for "Menüpfad" in the search field under "Edit translations:"
4) The search result list shows right encoding of the string, but the string in the search field is false encoded - see my attached screenshot
Reporter | ||
Comment 14•15 years ago
|
||
Assignee | ||
Comment 15•15 years ago
|
||
Thomas, if you replace the ü with a regular, unicode "ü" and save it, does the problem go away on subsequent edits?
Reporter | ||
Comment 16•15 years ago
|
||
James, tthe translation itself is properly encoded. The problem exists only in the search text field above the translation table.
If I click "translate", the translation itself will be saved and displayed properly. But the search text field entry will turn from "Menüpfad" to "Menüpfad".
Assignee | ||
Comment 17•15 years ago
|
||
Paul, would you mind patching up the tiki-edit_languages.php template to fix this? Ideally if I put in Unicode characters, it should just output Unicode.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Special characters are converted into HTML entities in the interactive translator → Special characters are converted into HTML entities in the translator forms
Comment 18•15 years ago
|
||
This should do it.
James: should we flag this as tiki_bug for upstreaming the smarty filter?
Attachment #421319 -
Flags: review?(james)
Assignee | ||
Comment 19•15 years ago
|
||
Looking at this a little more closely, I think our approach was flawed in the first place.
For some reason, the filter defaults to ISO-8859-1 (webroot/lib/smarty/libs/plugins/modifier.escape.php:22). Rather than adding a new type to the filter, I think we should consider changing the filter to default to UTF-8.
I've also noticed that webroot/tiki-interactive_trans.php uses htmlentities() on new translations, instead of htmlspecialchars(). So we can type ü into the interactive translator, but it gets saved as 'ü'. The template escaping is then (mostly) working as intended (except for the character set thing.
The search field double-encoding is also a case of htmlentities() where htmlspecialchars() should probably be used. (tiki-edit_languages.php:199)
Laura, would that switch from htmlentities() to htmlspecialchars() cause any problems you can think of? It seems to me we should be encoding only htmlspecialchars(), and only on the way out (ie, not saving encoded entities in the database). Does that seem right?
If so, I can file a new patch to back out the first fix, and make the correct changes here.
Assignee | ||
Comment 20•15 years ago
|
||
Comment on attachment 421319 [details] [diff] [review]
fix search field double-encoding
Per comment 19.
Attachment #421319 -
Flags: review?(james) → review-
Comment 21•15 years ago
|
||
It should definitely be encoded on the way out, and I can't see any reason not to use htmlspecialchars(), probably with doubleencode turned off.
Comment 22•15 years ago
|
||
This reverts the previous fix. Then we can apply James' initial patch, which fixes both problems (I will un-obsolete it).
Attachment #421510 -
Flags: review?(james)
Updated•15 years ago
|
Attachment #418448 -
Attachment is obsolete: false
Assignee | ||
Updated•15 years ago
|
Attachment #421510 -
Flags: review?(james) → review+
Comment 23•15 years ago
|
||
r59810 (reverts)
Assignee | ||
Comment 24•15 years ago
|
||
This patch is not a 100% solution--that's very difficult considering the circumstances--some HTML in interactive translator blocks, already-escaped strings in the database, etc--but this will help.
This just replaces htmlentities in three places with htmlspecialchars. Things like <>& that were already being encoded will still be encoded, but things like ü will not be.
When editing a translation via the interactive translator, the string will be encoded: that's how it's stored in the database. However, when you change the entities back to characters--which you have to do now, anyway--they will stay as characters in the database this time.
Not perfect, but it works for now. We can revisit this problem when we look at the l10n workflow this year.
Assignee: paulc → james
Attachment #418448 -
Attachment is obsolete: true
Attachment #419952 -
Attachment is obsolete: true
Attachment #420892 -
Attachment is obsolete: true
Attachment #421319 -
Attachment is obsolete: true
Attachment #421510 -
Attachment is obsolete: true
Attachment #421976 -
Flags: review?(paulc)
Comment 25•15 years ago
|
||
Comment on attachment 421976 [details] [diff] [review]
replace htmlentities with htmlspecialchars
The first part of this bug worked for me with this patch. I.e. using the interface translator, strings now display on edit just as they do as they are saved. This is great.
However, the second part didn't work. Searching for "Menüpfad" in tiki-edit_languages.php does not return any results -- because the translation is stored already-escaped in the database. However, even when I tried searching for "Menüpfad", it was double escaped into "Men&uuml;pfad" and, again, no results were returned.
To fix this, maybe disable double-encoding.
Bottom line however, I don't like the fact that translators have to search for the escaped version of the string with this patch. We could fix this by running a script to re-encode all of the translated strings (~125K rows in tiki_language). What do you think, James?
Attachment #421976 -
Flags: review?(paulc) → review-
Updated•15 years ago
|
Target Milestone: 1.5.1 → Future
Assignee | ||
Updated•14 years ago
|
Status: REOPENED → RESOLVED
Closed: 15 years ago → 14 years ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•