Double encoding in Mozilla Community Directory

RESOLVED FIXED

Status

--
major
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: tomer, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

6 years ago
Created attachment 630109 [details] [diff] [review]
patch

For some reason, I see broken character encoding on mozilla.org/community/directory.html. I suspect this was introduced by revision 105293, because the encoding is correct before that revision (search for Bulgaria, for example).

Currently the file encoding is broken, probably because the text in the file itself has been double-encoded, while the page template is still delivered correctly.

Most (if not all) of these issues could be fixed with the following command:
iconv -f utf-8 -t iso-8859-1 directory.html -c > directory.html.1

Patch attached.
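
To illustrate what "double encoding" means here, the following is a minimal Python sketch, not the attached patch. It assumes the damage arose from UTF-8 bytes being misread as ISO-8859-1 and then saved as UTF-8 again; the function names are hypothetical.

```python
def double_encode(text: str) -> bytes:
    """Simulate the assumed damage: UTF-8 bytes misread as ISO-8859-1,
    then written out as UTF-8 a second time."""
    return text.encode("utf-8").decode("iso-8859-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> bytes:
    """Remove one layer: read as UTF-8, write as ISO-8859-1.
    The resulting bytes are the original UTF-8 bytes."""
    return data.decode("utf-8").encode("iso-8859-1")


original = "България"            # "Bulgaria" in Bulgarian
damaged = double_encode(original)
repaired = undo_double_encoding(damaged)

print(damaged.decode("utf-8"))   # mojibake made of Latin-1 characters
print(repaired.decode("utf-8"))  # the original Cyrillic text again
```

Under that assumption, the decode-as-UTF-8, encode-as-ISO-8859-1 step corresponds to `iconv -f utf-8 -t iso-8859-1`. If the file was damaged some other way, this step would not repair it.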
Attachment #630109 - Flags: review?(dboswell)
(Reporter)

Comment 1

6 years ago
The Bugzilla diff view seems to misinterpret the file encoding. The raw view of the file seems to handle the encoding better. I can't understand most of the non-English text in the file, but I see Arabic letters in the Arabic link, Cyrillic letters in the Bulgarian link, etc.

Comment 2

6 years ago
Comment on attachment 630109 [details] [diff] [review]
patch

Looks good to me.  For next steps, would it make sense for Chris to re-export using this command line fix and I'll repost?
Attachment #630109 - Flags: review?(dboswell) → review+
(Reporter)

Comment 3

6 years ago
(In reply to David Boswell from comment #2)
> Looks good to me.  For next steps, would it make sense for Chris to
> re-export using this command line fix and I'll repost?
I suggest checking the Mediawiki exporting script for errors. Is it available online?
Blocks: 731445

Comment 4

6 years ago
Let me give this a shot now. The raw view of the file on the command line looks right.

Comment 5

6 years ago
:tomer: where did you run the iconv command referenced in comment 1? I ran it from our unix dev server and the -c argument omitted the invalid characters. If I run it without the -c, I get "iconv: illegal input sequence at position 643" as soon as it gets to some Japanese characters.
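
For readers unfamiliar with `-c`, here is a rough Python analogue of the two behaviours described above. It is a sketch only: `to_latin1` is a hypothetical helper, and Python's `errors="ignore"` is assumed to be a fair stand-in for iconv's `-c`, which discards characters that cannot be converted.

```python
def to_latin1(text: str, drop_invalid: bool = False) -> bytes:
    """Encode text as ISO-8859-1.

    drop_invalid=False is like iconv without -c: fail on the first
    character that has no ISO-8859-1 representation.
    drop_invalid=True is like iconv -c: silently discard such characters.
    """
    errors = "ignore" if drop_invalid else "strict"
    return text.encode("iso-8859-1", errors=errors)


sample = "Tokyo 東京"

# Lossy: the Japanese characters are dropped without any warning.
print(to_latin1(sample, drop_invalid=True))

# Strict: raises UnicodeEncodeError at the first Japanese character.
try:
    to_latin1(sample)
except UnicodeEncodeError as exc:
    print("conversion failed at position", exc.start)
```

The lossy mode hides the problem, so an error in strict mode is usually the more useful signal that the input is not in the encoding the command expects.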

Comment 6

6 years ago
I got it resolved. The iconv command above to convert the character encoding was reversed. I got it to work with my script like:

iconv -t utf-8 -f iso-8859-1 directory.html -c > directory.html.1

I'm running it now.
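
As a hedged illustration of why the direction of `-f` and `-t` matters, the sketch below mirrors the two flags in Python. `convert` is a hypothetical helper, not part of the script mentioned above. Because every byte is valid ISO-8859-1, the `-f iso-8859-1 -t utf-8` direction never fails, so it is only correct when the input bytes really are ISO-8859-1.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Rough analogue of `iconv -f src -t dst` (strict, no -c)."""
    return data.decode(src).encode(dst)


latin1_bytes = b"caf\xe9"  # "café" stored as ISO-8859-1

# Correct use: ISO-8859-1 input becomes valid UTF-8.
utf8_bytes = convert(latin1_bytes, "iso-8859-1", "utf-8")
print(utf8_bytes)          # b'caf\xc3\xa9'

# Misuse: input that is already UTF-8 gains a second layer of encoding.
twice = convert(utf8_bytes, "iso-8859-1", "utf-8")
print(twice)               # b'caf\xc3\x83\xc2\xa9'
```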
(Reporter)

Comment 7

6 years ago
(In reply to Chris More [:cmore] from comment #6)
> I got it resolved. The iconv command above to convert the character encoding
> was reversed. I got it to work with my script like:

oops. Sorry for that.

Comment 8

6 years ago
I have the conversion baked into the script that creates the directory page and I'm doing a test run of it to make sure everything is fine. I will post a link to the script after I have it up on github.

Comment 9

6 years ago
Created attachment 633161 [details]
Fresh directory output (no conversion)

Here is an updated file and I do not believe it was a problem with my script. I think when David copy/pasted my HTML it must have messed up the character encoding and it did not convert back to UTF-8 properly. This file can be put on SVN as it is the complete directory.html with the php code.

David: What program did you use to copy/paste the output of my script to the file with the php code?
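
One way to check a file for this kind of damage before it is committed is the common heuristic sketched below. It is an assumption-laden check, not something used in this bug: `looks_double_encoded` is a hypothetical helper, and it only detects the UTF-8 read as ISO-8859-1 pattern.

```python
def looks_double_encoded(data: bytes) -> bool:
    """Return True if removing one ISO-8859-1 layer yields different,
    still-valid UTF-8 text, which suggests the bytes were encoded twice."""
    try:
        text = data.decode("utf-8")
        repaired = text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Not UTF-8 at all, or the text cannot be a misread byte stream.
        return False
    return repaired != text


clean = "България".encode("utf-8")
damaged = clean.decode("iso-8859-1").encode("utf-8")

print(looks_double_encoded(clean))    # False
print(looks_double_encoded(damaged))  # True
```

Plain ASCII and correctly encoded accented text both come back as False, so the check can be run over a whole file without flagging healthy content.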
Attachment #630109 - Attachment is obsolete: true

Comment 10

6 years ago
(In reply to Chris More [:cmore] from comment #9)
> David: What program did you use to copy/paste the output of my script to the
> file with the php code?

I viewed the attachment in Firefox on Mac and copied it from there and pasted it into a terminal window running pico.  I've done other l10n edits in pico before, fwiw.

Comment 11

6 years ago
That could have been the problem there. The directory.html didn't have the utf-8 character encoding meta tag in it because that is baked into the PHP template on the community page. When you copy/paste from Firefox, the character encoding could have been garbled because Firefox didn't auto-detect it correctly.

Are you editing files locally on your mac with pico and then checking them in with SVN?
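
To make the missing meta tag point concrete, here is a small sketch that reports whether an HTML fragment declares its own charset. `declared_charset` is a hypothetical helper and the regular expression is a simplification, not a full HTML parser.

```python
import re
from typing import Optional

# Matches both <meta charset="utf-8"> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> form.
_META_CHARSET = re.compile(
    r"""<meta[^>]+charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE
)


def declared_charset(html: str) -> Optional[str]:
    """Return the charset named in a <meta> tag, or None if there is none."""
    match = _META_CHARSET.search(html)
    return match.group(1).lower() if match else None


fragment = '<ul><li><a href="/bg/">България</a></li></ul>'
full_page = '<head><meta charset="UTF-8"></head>' + fragment

print(declared_charset(fragment))   # None: a viewer has to guess the encoding
print(declared_charset(full_page))  # utf-8
```

A fragment that returns None relies on the surrounding template or the HTTP headers to state its encoding, so a browser that views it standalone has to guess.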

Comment 12

6 years ago
(In reply to Chris More [:cmore] from comment #11)
> Are you editing files locally on your mac with pico and then checking them
> in with SVN?

Yes.

Comment 13

6 years ago
David: While we are figuring out the copy/paste issue, can you take this attachment and upload it to SVN to replace the current directory? https://bugzilla.mozilla.org/attachment.cgi?id=633161

Comment 14

6 years ago
OK, checked in.  The formatting issue with various languages seems to be fixed now.

http://viewvc.svn.mozilla.org/vc?view=revision&revision=106453

Comment 15

6 years ago
Much better! When will this get pushed to prod?

Comment 16

6 years ago
Looks like it is live and looks good!

http://www.mozilla.org/community/directory.html
(Assignee)

Updated

6 years ago
Component: www.mozilla.org → General
Product: Websites → www.mozilla.org
The page looks good to me.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED