Closed Bug 305075 Opened 19 years ago Closed 19 years ago

problem with character encoding of loaded xml document

Status:

RESOLVED WORKSFORME

People

(Reporter: surkov, Assigned: smontagu)

Attachments

(3 files)

xml file in utf-8 (40 bytes, text/xml), attached 19 years ago by alexander :surkov (:asurkov)
xml file in windows-1251 (39 bytes, text/xml), attached 19 years ago by alexander :surkov (:asurkov)
testcase (1.99 KB, application/vnd.mozilla.xul+xml), attached 19 years ago by alexander :surkov (:asurkov)

alexander :surkov (:asurkov)

Reporter

Description

19 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; ru-RU; rv:1.7.8) Gecko/20050511 Firefox/1.0.4 (ax)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; ru-RU; rv:1.7.8) Gecko/20050511 Firefox/1.0.4 (ax)

I load an xml file and parse it (for instance, with XMLRequestor and DOMParser).
The xml file was saved in 'utf-8' encoding. After I load and parse the file, the
character encoding of (for instance) attribute values is still 'utf-8'. If I
want to work with the xml document, I must convert it to unicode myself. Even if
I specify the 'encoding' attribute of the <?xml?> processing instruction, I get
the same behaviour.

I guess if attribute 'encoding' is not specified then mozilla should convert it
from 'utf-8' to unicode, and if attribute 'encoding' is present then mozilla
should convert it from the specified encoding to unicode. When mozilla loads an
xml file by itself (for instance, when I open a xul file or load an xml file
into a frame), it converts the file to unicode by itself. I expect the same
behaviour when I load and parse xml files.

Reproducible: Always

Frank Wein [:mcsmurf]

Comment 1

19 years ago

But UTF-8 is just one way to encode Unicode characters (besides UTF-16, UTF-32,
UCS-2 and UCS-4). So I don't quite understand what you mean by "I guess if
attribute 'encoding' is not specified then mozilla should convert it
from 'utf-8' to unicode" (UTF-8 is an encoding, Unicode a character set, roughly
speaking).
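
Editorial sketch of the distinction drawn in this comment: one Unicode code point, U+0410, is encoded to UTF-8 bytes and decoded back. It assumes a modern JavaScript runtime that provides `TextEncoder` and `TextDecoder`. These APIs were not available in the Firefox 1.0.4 build from the report, so this illustrates the concept only and is not the reporter's code.

```javascript
// Unicode assigns the code point; UTF-8 is one byte-level encoding of it.
const encoder = new TextEncoder();           // TextEncoder always produces UTF-8
const decoder = new TextDecoder("utf-8");

const cyrillicA = "\u0410";                  // CYRILLIC CAPITAL LETTER A
const bytes = encoder.encode(cyrillicA);     // Uint8Array holding the UTF-8 bytes

// Format each byte as two upper-case hex digits for display.
const hex = Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0"));
console.log(hex.join(" "));                  // D0 90

// Decoding the same bytes as UTF-8 gives back the original character.
const roundTrip = decoder.decode(bytes);
console.log(roundTrip === cyrillicA);        // true
```

The string before encoding and after decoding is the same Unicode text; only the byte representation in between is specific to UTF-8.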

alexander :surkov (:asurkov)

Reporter

Comment 2

19 years ago

I don't know exactly what encoding mozilla uses internally when it loads a file,
but I think it is not utf-8. I mean that when I load a file, mozilla should
convert it to its internal encoding. I have an xml document saved in utf-8 with
Russian characters. When I load and parse the xml document, I have to convert
attributes and text nodes to unicode with
nsIScriptableUnicodeConverter.ConvertToUnicode(string, "utf-8"). I think I
shouldn't have to convert it myself.

Jungshik Shin

Comment 3

19 years ago

Do you mean that the value of a text node you get from XMLRequestor and
DOMParser is "U+00D0 U+0090" (a zero-extended sequence of the UTF-8
representation of U+0410) when it should be "U+0410", because what you have in
that node is Cyrillic Capital Letter A (U+0410)? If your XML file is in
Windows-1251 and a text node has U+0410 (0xC0 in Windows-1251), do you get
U+00C0 instead of "U+0410"? If that's the case, this is clearly a bug (probably
already reported; I may have reported it or seen it before).
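
Editorial sketch of the symptom described in this comment, assuming a modern JavaScript runtime with `TextEncoder` and `TextDecoder`. The helpers `zeroExtend`, `codePoints` and `decodeWin1251Letter` are hypothetical names introduced here for illustration. The Windows-1251 helper covers only the bytes 0xC0..0xFF, which that code page maps onto U+0410..U+044F.

```javascript
// Zero-extension: treat every byte as if it were already a code point.
function zeroExtend(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Hypothetical display helper: "U+XXXX U+XXXX ..." for each code point.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// Minimal Windows-1251 decoder for the Cyrillic letter range only.
function decodeWin1251Letter(byte) {
  if (byte < 0xc0 || byte > 0xff) {
    throw new RangeError("this sketch handles only 0xC0..0xFF");
  }
  return String.fromCharCode(0x0410 + (byte - 0xc0));
}

const utf8Bytes = new TextEncoder().encode("\u0410");   // bytes D0 90
const win1251Bytes = Uint8Array.of(0xc0);               // the same letter in Windows-1251

// The suspected wrong result: bytes zero-extended instead of decoded.
console.log(codePoints(zeroExtend(utf8Bytes)));         // U+00D0 U+0090
console.log(codePoints(zeroExtend(win1251Bytes)));      // U+00C0

// The expected result: bytes decoded according to the document encoding.
console.log(codePoints(new TextDecoder("utf-8").decode(utf8Bytes)));  // U+0410
console.log(codePoints(decodeWin1251Letter(win1251Bytes[0])));        // U+0410
```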

Will you please put up a simple test case somewhere or attach it to this bug and
tell us what you expect and what you actually get ?

alexander :surkov (:asurkov)

Reporter

Comment 4

19 years ago

Exactly as you say. I'll attach a testcase.
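
Editorial sketch of the manual conversion mentioned in Comment 2, given the symptom confirmed here. It assumes that the parsed value arrives as a "byte string" with one character per UTF-8 byte, and that a modern runtime with `TextDecoder` is available. `byteStringToUnicode` is a hypothetical helper that approximates the effect of `ConvertToUnicode(string, "utf-8")`; it is not the Mozilla implementation.

```javascript
// Re-interpret a byte string (every char code <= 0xFF) as UTF-8 text.
function byteStringToUnicode(byteString) {
  const bytes = Uint8Array.from(byteString, (ch) => {
    const code = ch.charCodeAt(0);
    if (code > 0xff) {
      // A code unit above 0xFF means the input was already decoded text.
      throw new RangeError("input is not a byte string");
    }
    return code;
  });
  return new TextDecoder("utf-8").decode(bytes);
}

// "U+00D0 U+0090" is the zero-extended UTF-8 form of U+0410.
const repaired = byteStringToUnicode("\u00D0\u0090");
console.log(repaired === "\u0410");          // true
```

The range check is a deliberate choice in this sketch: applying the conversion to text that was already decoded correctly would corrupt it, so the helper refuses such input instead of guessing.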

alexander :surkov (:asurkov)

Reporter

Comment 5

19 years ago

Attached file: xml file in utf-8

alexander :surkov (:asurkov)

Reporter

Comment 6

19 years ago

Attached file: xml file in windows-1251

alexander :surkov (:asurkov)

Reporter

Comment 7

19 years ago

Attached file: testcase

alexander :surkov (:asurkov)

Reporter

Comment 8

19 years ago

Can't reproduce the problem. Testcase is invalid.

Status: UNCONFIRMED → RESOLVED

Closed: 19 years ago

Resolution: --- → WORKSFORME


Categories

(Core :: Internationalization, defect)
