Bug 415796: All webservice methods fail if a string has multibyte characters in it
Status: RESOLVED FIXED, Bugzilla 3.2 (opened 18 years ago, closed 17 years ago)
Categories: Bugzilla :: WebService, defect, P1
People: Reporter: LpSolit, Assigned: mkanat
Attachments (2 files, 1 obsolete file):
2.14 KB, patch (LpSolit: review+)
5.60 KB, text/plain
I just tested the existing Bug.get and the new User.get webservice methods in 3.1.3+ (but I suspect the same problem exists in 3.0.3+), and they both fail if a string has non-ASCII characters in it (such as "Frédéric"). Although the returned XML file starts with <?xml version="1.0" encoding="UTF-8"?>, such strings are mangled, generating the following error:
not well-formed (invalid token) at line 1, column 203, byte 203 at /usr/lib/perl5/vendor_perl/5.8.8/i386-linux/XML/Parser.pm line 187
So for some reason, strings we return are not valid UTF-8 strings. Did we forget to turn on a parameter somewhere?
Requesting blocking, as non-US installations use a lot of non-ASCII characters in bug summaries, product names, real names, etc., which makes the webservice unusable.
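The parser error quoted in this description is what an expat-based XML parser reports when a document declares UTF-8 but contains bytes that are not valid UTF-8. A minimal sketch of one way this can arise, written in Python as an analogy rather than in Bugzilla's Perl; the helper name `parses_ok` and the sample payloads are assumptions made for illustration and are not taken from the webservice code:

```python
import xml.etree.ElementTree as ET

# The declaration promises UTF-8, as in the response described above.
DECL = b'<?xml version="1.0" encoding="UTF-8"?>'


def parses_ok(payload: bytes) -> bool:
    """Return True if an expat-based parser accepts the payload as well-formed."""
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False


name = "Frédéric"

# Body really encoded as UTF-8: matches the declaration.
good = DECL + b"<name>" + name.encode("utf-8") + b"</name>"

# Body encoded as Latin-1: the byte 0xE9 is not a valid UTF-8 sequence here.
bad = DECL + b"<name>" + name.encode("latin-1") + b"</name>"

print(parses_ok(good))
print(parses_ok(bad))
```

This only shows that a mismatch between the declared and actual encoding is enough to trigger a "not well-formed" error on the client; it does not claim to reproduce the exact bytes Bugzilla sent.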
Flags: blocking3.2?
Flags: blocking3.0.4?
Comment 1 • 18 years ago (Assignee)
This is definitely a Bugzilla 3.2 blocker. I suspect the same problem does not exist in 3.0, since we changed the utf8 bit in 3.2. We would have to actually check 3.0 before making this a blocker or targeting it at 3.0.
Flags: blocking3.2? → blocking3.2+
Target Milestone: Bugzilla 3.0 → Bugzilla 3.2
Comment 2 • 18 years ago (Reporter)
OK, I just tested it on 3.0.3 and I cannot reproduce the problem there. So definitely a 3.2-problem only. Good.
Flags: blocking3.0.4?
Version: 3.0.3 → 3.1.3
Comment 3 • 17 years ago (Reporter)
Marc, any idea how to fix this problem? That's the last blocker we want to fix before releasing 3.1.4.
Comment 4 • 17 years ago
I looked into this, but I can't seem to get a grip on this. I don't think I can help much here.
Comment 5 • 17 years ago (Assignee)
Okay, I suppose I'll have to take it then, unless himorin has any ideas.
Assignee: webservice → mkanat
Comment 6 • 17 years ago
I'll look into this bug, starting now.
Comment 7 • 17 years ago
With the CVS tip as of Feb 7, I get an XML-RPC error message like this:
<?xml version="1.0" encoding="UTF-8"?>
<methodResponse><fault><value><struct><member><name>faultString</name>
<value><string>Wide character in subroutine entry at /usr/lib/perl5/site_perl/5.8.5/XMLRPC/Lite.pm line 167.</string></value>
</member><member><name>faultCode</name>
<value><string>Client</string></value></member></struct></value>
</fault></methodResponse>
I'm not sure that this is the same problem that LpSolit describes.
But it's not related to UTF-8 strings.
For the Bug.get method, this warning occurred with the bug object:
> $item{'internals'} = $bug;
$bug comes from new Bugzilla::Bug($bug_id).
Comment 8 • 17 years ago
Ah, I forgot to comment on the UTF-8 strings.
When I commented out the Bug.get internals, I got a normal response from the XML-RPC interface.
Comment 9 • 17 years ago
As discussed on IRC, this is not a bug related to utf8=1.
So my comment #7 is not related to this bug.
Comment 10 • 17 years ago (Assignee)
I've started looking into this. The problem is that internally SOAP is serializing data into Base64 and then deserializing it into byte strings, which turns off the utf8 bit incorrectly. Fixing this behavior is proving to be somewhat difficult.
Flags: blocking3.2+ → blocking3.2-
Updated • 17 years ago (Assignee)
Flags: blocking3.2- → blocking3.2+
Comment 11 • 17 years ago
(In reply to comment #10)
> I've started looking into this. The problem is that internally SOAP is
> serializing data into Base64 and then deserializing it into byte strings, which
> turns off the utf8 bit incorrectly. Fixing this behavior is proving to be
> somewhat difficult.
mkanat, the problem you described in comment #10 seems to be the same as the one in my comment #7, but not the one in LpSolit's original report.
I think you should try with utf8=0.
Updated • 17 years ago (Reporter)
Priority: -- → P1
Comment 12 • 17 years ago (Assignee)
Okay, this fixes the XMLRPC data type serializer (which is the problem, see the bug referenced in the code), but seems to cause some other problem that I haven't figured out yet.
Comment 13 • 17 years ago (Assignee)
This fixes it.
The problem was that some data ("internals" in our case) wasn't being passed through SOAP::Data->type, so it was getting auto-typed. This was turning our utf-8 strings into base64. Then, when they were deserialized on the other end (back into strings) they didn't have the utf-8 bit set on them. This was supposed to be fixed in SOAP::Lite 0.71, but it doesn't seem to be.
So, I had to subclass XMLRPC::Serializer to fix it. HOWEVER, there's another bug in SOAP::Lite that causes array datatypes to break if you subclass XMLRPC::Serializer, so I had to work around that:
http://rt.cpan.org/Ticket/Display.html?id=34515
Finally, this code now throws a warning, but seems to work correctly. I've looked at the raw XML and it's correct. The warning seems to be a bug in SOAP::Lite:
http://rt.cpan.org/Ticket/Display.html?id=34515
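The base64 round trip described in this comment can be illustrated with a small sketch. It is written in Python as an analogy, not in Perl, and it does not use SOAP::Lite or any Bugzilla code; all variable names and the sample string are assumptions for illustration. Python has no utf8 flag, so the closest equivalent is the difference between raw bytes and decoded text:

```python
import base64

original = "Frédéric ☺"

# Serializing text as base64 first requires turning it into bytes.
wire = base64.b64encode(original.encode("utf-8"))

# Decoding base64 yields raw bytes; nothing records that they were UTF-8 text.
raw = base64.b64decode(wire)

# Treating each byte as one character (roughly comparable to a Perl string
# that lacks the utf8 flag) produces mojibake instead of the original text.
as_single_bytes = raw.decode("latin-1")

# Decoding explicitly as UTF-8 restores the original text.
restored = raw.decode("utf-8")

print(isinstance(raw, bytes))
print(as_single_bytes == original)
print(restored == original)
```

The point of the sketch is only that the receiving side has to know the decoded bytes are UTF-8; the actual fix in the patch subclasses XMLRPC::Serializer, as the comment explains.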
Attachment #311935 - Attachment is obsolete: true
Attachment #312412 - Flags: review?(LpSolit)
Comment 14 • 17 years ago (Assignee)
That first RT link should be:
http://rt.cpan.org/Ticket/Display.html?id=34514
Status: NEW → ASSIGNED
Comment 15 • 17 years ago (Reporter)
In debug mode, I get (on my local installation):
SOAP::Parser::decode: ()
Can't call method "paramsall" on an undefined value at /usr/bin/XMLRPCsh.pl line 23, <> line 1.
Comment 16 • 17 years ago (Assignee)
(In reply to comment #15)
> In debug mode, I get (on my local installation):
Hmm, but it should have printed out the XML before that. Could you attach the XML that it's getting that throws that error? (Also, what version of SOAP::Lite?)
Comment 17 • 17 years ago (Reporter)
Using SOAP::Lite 0.69, XML::Parser 2.34.
Comment 18 • 17 years ago (Assignee)
Comment on attachment 312447 (debug log)
This attachment seems to contain characters that aren't valid UTF-8.
Do you have the utf8 flag set on?
Any other idea as to how invalid UTF-8 could have gotten here? (Is the database correctly recoded?)
Comment 19 • 17 years ago (Reporter)
(In reply to comment #18)
> Do you have the utf8 flag set on?
Of course I have.
> Any other idea as to how invalid UTF-8 could have gotten here? (Is the database
> correctly recoded?)
AFAIK, the data is correctly encoded.
Comment 20 • 17 years ago (Reporter)
(In reply to comment #19)
> AFAIK, the data is correctly encoded.
If it wasn't correctly encoded, then é, à, etc... would be badly displayed in pages, as they use the UTF-8 encoding.
Comment 21 • 17 years ago (Assignee)
(In reply to comment #20)
> If it wasn't correctly encoded, then é, à, etc... would be badly displayed in
> pages, as they use the UTF-8 encoding.
As they are. Open up the attachment in Firefox, set Firefox to UTF-8, and look at the XML. It shows up as misencoded for me.
Comment 22 • 17 years ago (Reporter)
(In reply to comment #21)
> As they are. Open up the attachment in Firefox, set Firefox to UTF-8, and
> look at the XML. It shows up as misencoded for me.
That's what I got when using the webservice. Viewing and saving the bug as XML displays all characters correctly with UTF-8. So the problem occurs somewhere between the server and the client, not in my DB itself.
Comment 23 • 17 years ago (Reporter)
Comment on attachment 312412 (v1)
This fixes the problem with UTF-8 characters, but still doesn't fix the problem with upper-ASCII characters, such as é, à, ü. I will clone the bug to fix it separately and mark it as a blocker. r=LpSolit
Attachment #312412 - Flags: review?(LpSolit) → review+
Updated • 17 years ago (Reporter)
Flags: approval+
Updated • 17 years ago (Assignee)
Summary: All webservice methods fail if a string has non-ASCII characters in it → All webservice methods fail if a string has multibyte characters in it
Comment 24 • 17 years ago (Reporter)
(In reply to comment #23)
> with upper-ASCII characters, such as é, à, ü. I will clone the bug
I filed bug 426899. Please check in this patch, Max. :)
Comment 25 • 17 years ago (Assignee)
Yeah, I was going to check this in the other day, but I had to wait for justdave to do his CVS changes, and I haven't had the opportunity since then.
Checking in xmlrpc.cgi;
/cvsroot/mozilla/webtools/bugzilla/xmlrpc.cgi,v <-- xmlrpc.cgi
new revision: 1.5; previous revision: 1.4
done
Checking in Bugzilla/WebService.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/WebService.pm,v <-- WebService.pm
new revision: 1.7; previous revision: 1.6
done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED