problem with character encoding of loaded xml document

RESOLVED WORKSFORME

Status


Core
Internationalization
RESOLVED WORKSFORME
13 years ago
13 years ago

People

(Reporter: surkov, Assigned: smontagu)

Tracking

Trunk
x86
Windows 2000
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments)

(Reporter)

Description

13 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; ru-RU; rv:1.7.8) Gecko/20050511 Firefox/1.0.4 (ax)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; ru-RU; rv:1.7.8) Gecko/20050511 Firefox/1.0.4 (ax)

I load an xml file and parse it (for instance, with XMLRequestor and DOMParser). The
xml file was saved in 'utf-8' encoding. After I load and parse the file, the
character encoding of (for instance) attribute values is still 'utf-8'. If I want
to work with the xml document, I must convert it to unicode myself. Even if I
specify the 'encoding' attribute of the <?xml?> processing instruction, I get the
same behaviour.

I guess if attribute 'encoding' is not specified then mozilla should convert it
from 'utf-8' to unicode, and if attribute 'encoding' is present then mozilla
should convert it from that encoding to unicode. When mozilla loads an xml file
itself (for instance, when I open a xul file or load an xml file into a frame),
it converts the file to unicode itself. I expect the same behaviour when I load
and parse xml files.

Reproducible: Always

Comment 1

13 years ago
But UTF-8 is just one way to encode Unicode characters (besides UTF-16, UTF-32,
UCS-2 and UCS-4). So I don't quite understand what you mean by "I guess if attribute
'encoding' is not specified then mozilla should convert it
from 'utf-8' to unicode" (roughly speaking, UTF-8 is an encoding and Unicode is a
character set).
(Reporter)

Comment 2

13 years ago
I don't know exactly what encoding mozilla uses internally when it loads a file,
but I think it is not utf-8. I mean that when I load a file, mozilla should
convert it to its internal encoding. I have an xml document saved in utf-8 with
Russian characters. When I load and parse the xml document, I have to convert
attributes and text nodes to unicode with
nsIScriptableUnicodeConverter.ConvertToUnicode(string, "utf-8"). I think I should
not have to convert them myself.
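
For illustration only: if the parsed strings hold one code point per original UTF-8 byte (an assumption, not something confirmed at this point in the report), the manual conversion described above amounts to turning each character back into a byte and decoding the bytes as UTF-8. A minimal Python sketch of that idea follows; `repair_zero_extended` is a hypothetical helper name, and this is not the Mozilla API.

```python
def repair_zero_extended(text: str, encoding: str = "utf-8") -> str:
    """Undo a byte-per-character mis-decoding (hypothetical helper)."""
    # Assumes every character is in U+0000..U+00FF, i.e. stands for one raw byte.
    raw = text.encode("latin-1")
    # Decode the recovered bytes with the encoding the file was really saved in.
    return raw.decode(encoding)


garbled = "\u00d0\u0090"  # the two UTF-8 bytes D0 90, stored as two characters
print(repair_zero_extended(garbled) == "\u0410")  # compares with Cyrillic capital A
```

The helper raises an error if the string contains characters above U+00FF, which would indicate that the text was not mis-decoded in this particular way.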

Comment 3

13 years ago
Do you mean the value of a text node you get from XMLRequestor and DOMParser is
"U+00D0 U+0090" (a zero-extended sequence of the UTF-8 representation of U+0410)
when it should be "U+0410", because what you have in that node is Cyrillic
Capital Letter A (U+0410)? If your XML file is in Windows-1251 and a text node
has U+0410 (0xC0 in Windows-1251), do you get U+00C0 instead of "U+0410"? If
that's the case, this is clearly a bug (probably already reported; I may have
reported it or seen it before).

Will you please put up a simple test case somewhere or attach it to this bug and
tell us what you expect and what you actually get?
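
The symptom asked about here can be reproduced outside the browser. The Python sketch below is an illustration under the assumption that "zero-extended" means each encoded byte becomes its own code point; `zero_extend` and `code_points` are hypothetical helper names, not part of any Mozilla code.

```python
def zero_extend(text: str, encoding: str) -> str:
    """Encode text, then treat every resulting byte as its own code point."""
    return text.encode(encoding).decode("latin-1")


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


cyrillic_a = "\u0410"  # Cyrillic Capital Letter A
print(code_points(zero_extend(cyrillic_a, "utf-8")))   # U+00D0 U+0090
print(code_points(zero_extend(cyrillic_a, "cp1251")))  # U+00C0
```

A correctly decoded text node would contain the single code point U+0410 in both cases.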

(Reporter)

Comment 4

13 years ago
Exactly as you say. I'll attach a testcase.
(Reporter)

Comment 5

13 years ago
Created attachment 193258 [details]
xml file in utf-8
(Reporter)

Comment 6

13 years ago
Created attachment 193259 [details]
xml file in windows-1251
(Reporter)

Comment 7

13 years ago
Created attachment 193260 [details]
testcase
(Reporter)

Comment 8

13 years ago
Can't reproduce the problem. Testcase is invalid.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → WORKSFORME