Closed Bug 415796 Opened 18 years ago Closed 17 years ago

All webservice methods fail if a string has multibyte characters in it

Categories

(Bugzilla :: WebService, defect, P1)

3.1.3

Tracking

RESOLVED FIXED
Bugzilla 3.2

People

(Reporter: LpSolit, Assigned: mkanat)

Attachments

(2 files, 1 obsolete file)

I just tested the existing Bug.get and the new User.get webservice methods in 3.1.3+ (but I suspect the same problem exists in 3.0.3+), and they both fail if a string has non-ASCII characters in it (such as "Frédéric"). Although the returned XML starts with <?xml version="1.0" encoding="UTF-8"?>, such strings are mangled, generating the following error:

not well-formed (invalid token) at line 1, column 203, byte 203 at /usr/lib/perl5/vendor_perl/5.8.8/i386-linux/XML/Parser.pm line 187

So for some reason, the strings we return are not valid UTF-8. Did we forget to turn on a parameter somewhere? Requesting blocking, as non-US installations use a lot of non-ASCII characters in bug summaries, product names, real names, etc., making the webservice unusable.
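The parser error above can be reproduced outside Bugzilla. The sketch below is illustrative Python, not code from Bugzilla or from any patch on this bug. It uses Python's expat binding (Perl's XML::Parser is built on the same expat library) and assumes, purely for illustration, that the mangled string reached the client as Latin-1 bytes while the declaration still said UTF-8. The function name expat_error and the sample documents are hypothetical.

```python
import xml.parsers.expat


def expat_error(xml_bytes):
    """Return expat's error message for xml_bytes, or None if it parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


HEADER = b'<?xml version="1.0" encoding="UTF-8"?>'

# Body really encoded as UTF-8, matching the declaration.
good_doc = HEADER + "<name>Frédéric</name>".encode("utf-8")

# Assumption for illustration: body encoded as Latin-1 while the
# declaration still claims UTF-8.
bad_doc = HEADER + "<name>Frédéric</name>".encode("latin-1")

print(expat_error(good_doc))
print(expat_error(bad_doc))
```

The first document parses cleanly; the second is rejected because the byte sequence is not valid UTF-8, which is the class of failure reported here.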
Flags: blocking3.2?
Flags: blocking3.0.4?
This is definitely a Bugzilla 3.2 blocker. I suspect the same problem does not exist in 3.0, since we changed the utf8 bit in 3.2. We would have to actually check 3.0 before making this a blocker or targeting it at 3.0.
Flags: blocking3.2? → blocking3.2+
Target Milestone: Bugzilla 3.0 → Bugzilla 3.2
OK, I just tested it on 3.0.3 and I cannot reproduce the problem there. So definitely a 3.2-problem only. Good.
Flags: blocking3.0.4?
Version: 3.0.3 → 3.1.3
Marc, any idea how to fix this problem? That's the last blocker we want to fix before releasing 3.1.4.
I looked into this, but I can't seem to get a grip on this. I don't think I can help much here.
Okay, I suppose I'll have to take it then, unless himorin has any ideas.
Assignee: webservice → mkanat
I'll look into this bug (starting now).
With the CVS tip as of Feb 7, I get an XML-RPC error message like:
<?xml version="1.0" encoding="UTF-8"?> <methodResponse><fault><value><struct><member><name>faultString</name> <value><string>Wide character in subroutine entry at /usr/lib/perl5/site_perl/5.8.5/XMLRPC/Lite.pm line 167.</string></value> </member><member><name>faultCode</name> <value><string>Client</string></value></member></struct></value> </fault></methodResponse>
I'm not sure this is the same problem that LpSolit describes. But it's not related to the utf8 string. For the Bug.get method, this warning occurred with the bug object:
> $item{'internals'} = $bug;
where $bug comes from new Bugzilla::Bug($bug_id).
Ah, I forgot to comment on the utf8 string. When I commented out the Bug.get internals, I got a normal response from the XML-RPC interface.
As discussed on IRC, this is not a bug related to utf8=1. So my comment #7 is not related to this bug.
I've started looking into this. The problem is that internally SOAP is serializing data into Base64 and then deserializing it into byte strings, which turns off the utf8 bit incorrectly. Fixing this behavior is proving to be somewhat difficult.
Flags: blocking3.2+ → blocking3.2-
Flags: blocking3.2- → blocking3.2+
(In reply to comment #10)
> I've started looking into this. The problem is that internally SOAP is
> serializing data into Base64 and then deserializing it into byte strings, which
> turns off the utf8 bit incorrectly. Fixing this behavior is proving to be
> somewhat difficult.
mkanat, the problem you've described in comment #10 seems to be the same as my comment #7, but not LpSolit's original one. I think you should try with utf8=0.
Priority: -- → P1
Attached patch WIP (obsolete)
Okay, this fixes the XMLRPC data type serializer (which is the problem, see the bug referenced in the code), but seems to cause some other problem that I haven't figured out yet.
Attached patch v1
This fixes it. The problem was that some data ("internals" in our case) wasn't being passed through SOAP::Data->type, so it was getting auto-typed. This was turning our utf-8 strings into base64. Then, when they were deserialized on the other end (back into strings) they didn't have the utf-8 bit set on them.

This was supposed to be fixed in SOAP::Lite 0.71, but it doesn't seem to be. So, I had to subclass XMLRPC::Serializer to fix it. HOWEVER, there's another bug in SOAP::Lite that causes array datatypes to break if you subclass XMLRPC::Serializer, so I had to work around that: http://rt.cpan.org/Ticket/Display.html?id=34515

Finally, this code now throws a warning, but seems to work correctly. I've looked at the raw XML and it's correct. The warning seems to be a bug in SOAP::Lite: http://rt.cpan.org/Ticket/Display.html?id=34515
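The base64 round trip described above can be sketched in a few lines. This is an illustrative Python analogy, not the Perl code from the patch: Python has no utf8 bit, so decoding the bytes as Latin-1 stands in (as an assumption) for a receiver that treats the deserialized data as a plain byte string. The helper name roundtrip_via_base64 is hypothetical.

```python
import base64


def roundtrip_via_base64(text):
    """Encode text as UTF-8, ship it as base64, and return the decoded bytes."""
    wire = base64.b64encode(text.encode("utf-8"))
    return base64.b64decode(wire)  # bytes only; the character encoding is not recorded


raw = roundtrip_via_base64("Frédéric")

# Treating the bytes as one character per byte mangles every multibyte character.
naive = raw.decode("latin-1")

# Explicitly decoding as UTF-8 restores the original string.
fixed = raw.decode("utf-8")

print(repr(naive))
print(repr(fixed))
```

The point of the sketch is that base64 carries bytes, not characters, so the receiving side has to know the bytes are UTF-8 and decode them as such.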
Attachment #311935 - Attachment is obsolete: true
Attachment #312412 - Flags: review?(LpSolit)
Status: NEW → ASSIGNED
In debug mode, I get (on my local installation): SOAP::Parser::decode: () Can't call method "paramsall" on an undefined value at /usr/bin/XMLRPCsh.pl line 23, <> line 1.
(In reply to comment #15)
> In debug mode, I get (on my local installation):
Hmm, but it should have printed out the XML before that. Could you attach the XML that it's getting that throws that error? (Also, what version of SOAP::Lite?)
Attached file debug log
Using SOAP::Lite 0.69, XML::Parser 2.34.
Comment on attachment 312447 (debug log)
This attachment seems to contain characters that aren't valid UTF-8. Do you have the utf8 flag set on? Any other idea as to how invalid UTF-8 could have gotten here? (Is the database correctly recoded?)
(In reply to comment #18)
> Do you have the utf8 flag set on?
Of course I have.
> Any other idea as to how invalid UTF-8 could have gotten here? (Is the database
> correctly recoded?)
AFAIK, the data is correctly encoded.
(In reply to comment #19)
> AFAIK, the data is correctly encoded.
If it wasn't correctly encoded, then é, à, etc... would be badly displayed in pages, as they use the UTF-8 encoding.
(In reply to comment #20)
> If it wasn't correctly encoded, then é, à, etc... would be badly displayed in
> pages, as they use the UTF-8 encoding.
As they are. Open up the attachment in Firefox, set Firefox to UTF-8, and look at the XML. It shows up as misencoded for me.
(In reply to comment #21)
> As they are. Open up the attachment in Firefox, set Firefox to UTF-8, and
> look at the XML. It shows up as misencoded for me.
That's what I got when using the webservice. Viewing and saving the bug as XML displays all characters correctly with UTF-8. So the problem occurs somewhere between the server and the client, not in my DB itself.
Comment on attachment 312412 (v1)
This fixes the problem with UTF-8 characters, but still doesn't fix the problem with upper-ASCII characters, such as é, à, ü. I will clone the bug to fix it separately and mark it as a blocker. r=LpSolit
Attachment #312412 - Flags: review?(LpSolit) → review+
Flags: approval+
Blocks: 426899
Summary: All webservice methods fail if a string has non-ASCII characters in it → All webservice methods fail if a string has multibyte characters in it
(In reply to comment #23)
> with upper-ASCII characters, such as é, à, ü. I will clone the bug
I filed bug 426899. Please check in this patch, Max. :)
Yeah, I was going to check this in the other day but I had to wait for justdave to do his CVS changes, and then I haven't had the opportunity since then.

Checking in xmlrpc.cgi;
/cvsroot/mozilla/webtools/bugzilla/xmlrpc.cgi,v  <--  xmlrpc.cgi
new revision: 1.5; previous revision: 1.4
done
Checking in Bugzilla/WebService.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/WebService.pm,v  <--  WebService.pm
new revision: 1.7; previous revision: 1.6
done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED