Open Bug 977923 Opened 11 years ago

Extended ASCII characters cause importxml.pl to fail on MULTIPLE bugs, but not on ONE

Categories

(Bugzilla :: Bug Import/Export & Moving, defect)

4.4.2
defect
Not set
normal

UNCONFIRMED

People

(Reporter: jwiseheart, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0 (Beta/Release)
Build ID: 20140212131424

Steps to reproduce:

I attempted to import multiple bugs with importxml.pl, following the DTD. Here is an example XML file, stripped down and with names changed for anonymity:

<bugzilla version="4.4.2" urlbase="http://bugzilla/" maintainer="admin@ourcompany.com" exporter="bugzilla@ourcompany.com">
  <bug>
    <bug_id>723</bug_id>
    ...other bug parameters here...
    <long_desc>
      <thetext><![CDATA[ This imported text has no special characters. ]]></thetext>
    </long_desc>
  </bug>
  <bug>
    <bug_id>724</bug_id>
    ...other bug parameters here...
    <long_desc>
      <thetext><![CDATA[ This imported text has extended ASCII characters like Bërt's name and ±5°F ]]></thetext>
    </long_desc>
  </bug>
  <bug>
    <bug_id>725</bug_id>
    ...other bug parameters here...
    <long_desc>
      <thetext><![CDATA[ This text we import has no special characters either. ]]></thetext>
    </long_desc>
  </bug>
</bugzilla>

Actual results:

In this example, bug_id 723 imports, but extended ASCII characters (byte values above 127) such as ë, ±, and ° cause bug_id 724 to fail with an error similar to the following:

"not well-formed (invalid token) at line 3, column 286, byte 410 at /usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/XML/Parser.pm line 187 at ./importxml.pl line 1267"

Expected results:

To get a better look at what was happening, I copied JUST THE ONE BUG that was causing the error into a new XML file, like so:

<bugzilla version="4.4.2" urlbase="http://bugzilla/" maintainer="admin@ourcompany.com" exporter="bugzilla@ourcompany.com">
  <bug>
    <bug_id>724</bug_id>
    ...other bug parameters here...
    <long_desc>
      <thetext><![CDATA[ This imported text has extended ASCII characters like Bërt's name and ±5°F ]]></thetext>
    </long_desc>
  </bug>
</bugzilla>

When I run importxml.pl on one bug at a time like this, it imports fine, extended ASCII characters and all!
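The error under Actual results is raised by XML::Parser, which is a Perl interface to the expat C library. The sketch below reproduces that class of error with Python's xml.etree.ElementTree, which is also expat-based, rather than with Perl. It assumes (the report does not confirm this) that the failing text was stored as single-byte ISO-8859-1 while the parser was reading the file as UTF-8, which is what expat does when a document has no encoding declaration or declares UTF-8. The wrapper document and the helper names build_doc and try_parse are hypothetical.

```python
import xml.etree.ElementTree as ET

# Text from bug 724 in the report; \u00eb = ë, \u00b1 = ±, \u00b0 = °.
TEXT = "This imported text has extended ASCII characters like B\u00ebrt's name and \u00b15\u00b0F"


def build_doc(payload: bytes, declaration: bytes = b"") -> bytes:
    """Wrap raw payload bytes in a minimal, hypothetical bug export."""
    return (declaration
            + b"<bugzilla><bug><bug_id>724</bug_id><long_desc><thetext><![CDATA["
            + payload
            + b"]]></thetext></long_desc></bug></bugzilla>")


def try_parse(doc: bytes) -> str:
    """Return 'ok' if expat accepts the bytes, else expat's error message."""
    try:
        ET.fromstring(doc)
        return "ok"
    except ET.ParseError as err:
        return str(err)


# Same characters, three different byte-level representations.
latin1_doc = build_doc(TEXT.encode("latin-1"))  # no declaration, so read as UTF-8
utf8_doc = build_doc(TEXT.encode("utf-8"))
declared_doc = build_doc(TEXT.encode("latin-1"),
                         b'<?xml version="1.0" encoding="ISO-8859-1"?>')

print(try_parse(latin1_doc))    # a "not well-formed (invalid token)" message
print(try_parse(utf8_doc))
print(try_parse(declared_doc))
```

Under this assumption the Latin-1 byte 0xEB (ë) is not valid UTF-8 at that position, so expat stops with the same "not well-formed (invalid token)" message, while the UTF-8 bytes or an explicit ISO-8859-1 declaration both parse. The report does not say which encoding the real export uses, so this shows only one way the reported error can arise.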
Why does this work with a single bug containing extended ASCII characters, but fail on an XML file with multiple bugs? I have 4,000 bugs to import, and a few hundred have this problem...
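With 4,000 bugs and a few hundred affected, it may help to list the failing bugs before running importxml.pl. The following is a minimal Python sketch, assuming the export is meant to be UTF-8 and that each bug is wrapped in a plain `<bug>...</bug>` element as in the examples above. find_non_utf8_bugs is a hypothetical helper, not part of Bugzilla, and the sample bytes are made up to mirror bugs 723 to 725.

```python
from __future__ import annotations

import re

# Assumes <bug> elements carry no attributes, as in the report's examples.
BUG_RE = re.compile(rb"<bug>.*?</bug>", re.DOTALL)
ID_RE = re.compile(rb"<bug_id>\s*(\d+)\s*</bug_id>")


def find_non_utf8_bugs(data: bytes) -> list[tuple[str, int, bytes]]:
    """Return (bug_id, byte offset in the file, offending bytes) for each
    <bug> element whose raw bytes are not valid UTF-8."""
    problems = []
    for match in BUG_RE.finditer(data):
        chunk = match.group(0)
        try:
            chunk.decode("utf-8")
        except UnicodeDecodeError as err:
            id_match = ID_RE.search(chunk)
            bug_id = id_match.group(1).decode("ascii") if id_match else "?"
            # err.start/err.end index into chunk; shift to a file-wide offset.
            problems.append((bug_id, match.start() + err.start,
                             chunk[err.start:err.end]))
    return problems


# Hypothetical sample: bug 724 holds Latin-1 bytes for ë, ± and °.
sample = (b"<bugzilla>"
          b"<bug><bug_id>723</bug_id><thetext>plain</thetext></bug>"
          b"<bug><bug_id>724</bug_id><thetext>B\xebrt \xb15\xb0F</thetext></bug>"
          b"<bug><bug_id>725</bug_id><thetext>plain</thetext></bug>"
          b"</bugzilla>")

for bug_id, offset, bad in find_non_utf8_bugs(sample):
    print(bug_id, offset, bad)
```

For a real export, pass the bytes from `open(path, "rb").read()` instead of `sample`. The check only flags byte sequences that are invalid UTF-8; it does not explain why the single-bug file imported cleanly.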