Closed
Bug 272812
Opened 20 years ago
Closed 20 years ago
RSS item Subject has mis-transcoded international characters
Categories
(MailNews Core :: Feed Reader, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: leech_joe, Assigned: mscott)
References
Details
Attachments
(3 files)
6.32 KB, patch | mscott: review+
2.25 KB, text/plain
1.50 KB, patch | mscott: review+
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Thunderbird 1.0 RC1 and earlier
For example [1]. It works fine with Firefox, but with Thunderbird I get Ã?3
instead of Ö3, for example.
[1] http://rss.orf.at/oesterreich.xml
Reproducible: Always
Steps to Reproduce:
1. subscribe to [1]
2. look at the subject of the messages
Actual Results:
The umlauts are not correctly displayed
Expected Results:
Display them correctly :-).
Comment 1 • 20 years ago (Reporter)
Seems to work sometimes and sometimes not. Here is one that doesn't work:
http://rss.orf.at/oe3.xml
Subject: Ä?3 Freundeskreis
linked to
http://oe3.orf.at/oe3.orf?read=detail&id=224269&channel=4
Comment 2 • 20 years ago (Assignee)
The ones that don't work list the wrong charset, so we guess incorrectly.
Typically these feeds list a specific charset even though the feed is UTF-8, so
we use the charset they list. (I've seen the opposite too, where the feed says
it is UTF-8 but the strings are in some other charset.) I'm not sure what we can
do for these...
Comment 3 • 20 years ago (Reporter)
OK, so the problem is caused by the feed and not by Thunderbird.
Hmm, Firefox guesses correctly. So why not simply use the same algorithm when
Thunderbird has to guess?
By the way, it seems Bugzilla doesn't like umlauts either (see my first posting) :-).
Comment 4 • 20 years ago (Assignee)
Here's why I think your first example feed is invalid:
It says the charset is: iso-8859-15
but it looks like the characters in the actual feed are UTF-8.
So we end up converting the characters from iso-8859-15 to unicode and they look
incorrect....
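The mismatch described above can be reproduced in a few lines. This is a minimal sketch, not Thunderbird code: it assumes a runtime with the standard `TextEncoder` and `TextDecoder` globals, and the helper name `decodeOneBytePerChar` is hypothetical. The helper uses ISO-8859-1 semantics (one byte, one character), which agrees with ISO-8859-15 for the two bytes that make up a UTF-8 encoded Ö.

```javascript
// UTF-8 bytes for "Ö3", as the feed actually sends them: 0xC3 0x96 0x33.
const utf8Bytes = new TextEncoder().encode("\u00D63");

// Hypothetical helper: decode each byte as one character (ISO-8859-1 semantics).
function decodeOneBytePerChar(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Decoding with the declared single-byte charset turns one character into two.
const wrong = decodeOneBytePerChar(utf8Bytes);

// Decoding with the charset the bytes are really in gives the intended text.
const right = new TextDecoder("utf-8").decode(utf8Bytes);

console.log(JSON.stringify(wrong));
console.log(right);
```

The second character of `wrong` is a C1 control character, which would be consistent with the `Ã?3` seen in the report.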
Comment 5 • 20 years ago (Reporter)
OK, I contacted customer service to get the bug corrected. Thanks anyway.
Comment 6 • 20 years ago (Reporter)
Hmm, they say that the feed is valid. I checked this myself with [1] and [2] and
no problems occurred. So this seems to be a bug in Thunderbird (or in the
validator implementation, but I do not believe that).
Kind regards, Joe
[1] http://www.w3.org/RDF/Validator/
[2] http://www.feedvalidator.org/
PS: Works fine with konqueror and centericq.
Comment 7 • 20 years ago
The characters are fine if you print the feed to the console just before it hits
the RDF parser in parseAsRSS1(), and a print statement in nsXMLHttpRequest showed
that the charset is being correctly detected as ISO-8859-15.
http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsIRDFXMLParser.idl&branch=&root=/cvsroot&subdir=mozilla/rdf/base/idl&command=DIFF_FRAMESET&rev1=1.3&rev2=1.4
It seems the method was changed to take an nsAUTF8String as input... shouldn't it
be a double-byte string to talk to JavaScript? The change says it went in to
correct this exact problem, but the only place it's used in script is in Feed.js,
according to lxr.
Comment 8 • 20 years ago
*** Bug 285391 has been marked as a duplicate of this bug. ***
Comment 9 • 20 years ago
Note that the feed shown at the duplicate is served as UTF-8, but still has
problems with the Subject line:
http://japan-in-nutshell.blogspot.com/atom.xml
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → Windows 2000
Summary: RSS subject index doesn't display umlauts correct → RSS item Subject has mis-transliterated international characters
Comment 10 • 20 years ago
Duh, that's not "transliteration" we're talking about.
Summary: RSS item Subject has mis-transliterated international characters → RSS item Subject has mis-transcoded international characters
Comment 11 • 20 years ago
Comment 12 • 20 years ago
request.responseText returns UTF-8 by default. This is causing problems when
RSS1 documents are not returned in a compatible charset, perhaps because of the
way we override the MIME type in the request (perhaps this also catches the
charset parameter?).
In any case, the XML parser seems to catch this (perhaps by sniffing the XML
declaration) and the DOM is always correct. As a short term solution, I decided
to serialize the responseXML DOM and feed that to the RDF parser, instead of
the responseText. I suppose there's a performance penalty in there, but it was
imperceptible to me, and it's the only solution I could think of that would
keep the changes minimal during release mode.
On the plus side, the patch fixes every encoding bug I could find.
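The workaround described above might look roughly like the following. This is a sketch under stated assumptions, not the actual patch: the function name `textForRdfParser` and the fallback to `responseText` are hypothetical, and the serializer is passed in as a parameter so the snippet does not depend on a browser environment. In real script the serializer would be an `XMLSerializer` instance and the request an `XMLHttpRequest`.

```javascript
// Hypothetical helper: prefer the parsed DOM over the raw response text.
function textForRdfParser(request, serializer) {
  if (request.responseXML) {
    // Re-serialize the DOM that the XML parser already built.
    return serializer.serializeToString(request.responseXML);
  }
  // Assumed fallback when no DOM is available.
  return request.responseText;
}

// Stand-ins for XMLHttpRequest and XMLSerializer, for illustration only.
const fakeSerializer = { serializeToString: (doc) => doc.markup };
const fakeRequest = {
  responseXML: { markup: "<title>\u00D63</title>" },
  responseText: "<title>raw response text</title>",
};

console.log(textForRdfParser(fakeRequest, fakeSerializer));
```

Passing the serializer in also makes it easy to share a single instance across calls rather than creating one per feed.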
Comment 13 • 20 years ago
Comment 14 • 20 years ago
(In reply to comment #9)
> Note that the feed shown at the duplicate is served as UTF-8, but still has
> problems with the Subject line:
> http://japan-in-nutshell.blogspot.com/atom.xml
This blog appears to be in Hungarian, and seems to work with the patch (I missed
this one in my testing).
Updated • 20 years ago
Attachment #185984 - Flags: review?(mscott)
Comment 15 • 20 years ago (Assignee)
Comment on attachment 185984
character encoding fixes
Thanks a lot, Robert.
Attachment #185984 - Flags: review?(mscott) → review+
Updated • 20 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird1.1
Comment 16 • 20 years ago
I'm testing 2005-06-15-05-trunk.
Some feeds imported from an OPML file are not loaded.
e.g. http://weblogs.mozillazine.org/hyatt/blogger_rss.xml
JavaScript Console says:
Error: XMLSerializer is not defined
Source File: chrome://messenger-newsblog/content/feed-parser.js Line: 67
The same feeds load correctly when I add them myself. Hmm...
Comment 17 • 20 years ago
(In reply to comment #16)
This problem has been happening since attachment 185984 was checked in.
No problem with 2005-06-12-05-trunk.
Comment 18 • 20 years ago
I see the bug here too, now that I've added Hyatt's feed. It has to do with
declaring that XMLSerializer. The problem seems to go away if the one declared
at the top of the file is used.
Updated • 20 years ago
Attachment #186405 - Flags: review?(mscott)
Comment 19 • 20 years ago
There's a little glitch with OPML-imported feeds.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated • 20 years ago (Assignee)
Attachment #186405 - Flags: review?(mscott) → review+
Updated • 20 years ago (Assignee)
Attachment #186405 - Attachment description: reuse the serializer instance at the top of the file → [patch checked in] reuse the serializer instance at the top of the file
Comment 20 • 20 years ago
(In reply to comment #17)
> (In reply to comment #16)
> This problem has happened since attachment 185984 checked in.
> No problem with 2005-06-12-05-trunk.
Kohei, could you verify this is fixed for you with the 2005-06-16 trunk? I think
it's patched up.
Comment 21 • 20 years ago
I just tested 2005-06-16-06-trunk.
Looks good. No error. Thanks!
Status: REOPENED → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → FIXED
Comment 22 • 20 years ago
*** Bug 276350 has been marked as a duplicate of this bug. ***
Comment 23 • 20 years ago
*** Bug 293279 has been marked as a duplicate of this bug. ***
Comment 24 • 19 years ago
Note that the analysis here was wrong. The encoding of responseText is always UTF-16, and any time there is an actual responseXML DOM around, the responseText is correct. What you guys _actually_ ran into was bug 230275 -- the RDF parser is broken. I suggest backing out this hackaround once that issue is fixed...
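The point about responseText can be illustrated briefly. JavaScript strings are sequences of UTF-16 code units, so once a response has been decoded, the string itself no longer carries a byte-level charset. A minimal sketch, assuming a runtime with the standard `TextEncoder` global; the variable names are illustrative only:

```javascript
const subject = "\u00D63"; // "Ö3" as an already-decoded JavaScript string

// One UTF-16 code unit per character here, whatever charset the feed used.
const codeUnits = Array.from(subject, (ch) => ch.charCodeAt(0));

// A byte encoding such as UTF-8 only comes into play when encoding back to bytes.
const utf8Length = new TextEncoder().encode(subject).length;

console.log(codeUnits, subject.length, utf8Length);
```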
Depends on: 230275
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core
Target Milestone: Thunderbird1.1 → ---