Closed Bug 1285703 Opened 8 years ago Closed 8 years ago

Unify the type of incoming messages for |def filterwrite(messages)| in v-c-t/hgext/reviewboard/client.py

Categories

(MozReview Graveyard :: Integration: Mercurial, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kikuo, Assigned: kikuo)

Details

Attachments

(2 files)

Exception 'UnicodeDecodeError' is raised when UNICODE_STRING.startswith(STRING) if STRING contains values not in range(128). Should find out where the unicode string comes from and kill it.

See https://github.com/mozilla/version-control-tools/pull/8
The key/value in res from [1] are all in unicode, e.g.
=======
{u'display': [],
 u'parentrrid': u'42169',
 u'rburl': u'https://reviewboard.mozilla.org',
 u'reviewid': u'bz://1259355/kikuo',
 u'reviewrequests': {u'42169': {u'public': False, u'status': u'pending'},
                     u'42171': {u'node': u'c5e04987f16ad3d5ccd0dd7cd9d0dce21f3061be',
                                u'public': False,
                                u'reviewers': [u'daoshengmu'],
                                u'status': u'pending'}}}
=======

This resulted in generating unicode 'reviewurl'/'parentreviewurl' and may cause the exception mentioned above.

[1] https://hg.mozilla.org/hgcustom/version-control-tools/file/89bb283aec74/hgext/reviewboard/client.py#l576
Hi gps,

I've narrowed down the cause and provide a patch.
I don't think we can change the generated response, but I can re-encode the value when assigning to other objects.
It would be appreciated if you could give me some advice. Thanks
Attachment #8769415 - Flags: feedback?(gps)
Comment on attachment 8769415 [details] [diff] [review]
Encode some fields of resposnse in utf-8 to avoid UnicodeDecodeError while filtering messages on platforms where specific system language is applied.

Review of attachment 8769415 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch. This looks like the right approach. And it looks like there are more instances of "res" in this code that also need updated.

The only real question is which encoding to use. I think things coming from the server should be UTF-8. So that should be fine.
Attachment #8769415 - Flags: feedback?(gps) → feedback+
(In reply to Kilik Kuo [:kikuo] from comment #4)
> Created attachment 8770657 [details]
> Bug 1285703 - Re-encode the response from server to utf-8 byte-stream.
> 
> Review commit: https://reviewboard.mozilla.org/r/64072/diff/#index_header
> See other reviews: https://reviewboard.mozilla.org/r/64072/

I've tested this fake data, IMHO I think that's complicated enough.
====
res = {
u'display': [],
u'parentrrid': u'42169',
u'rburl': u'https://reviewboard.mozilla.org',
u'reviewid': [u'bz://1259355/kikuo', [u'kgb', {2: u'fbi'}, u'taiwan'], {u'oo' : [u'some']}],
u'reviewrequests': {u'42169': {u'public': False, u'status': u'pending'},
                    u'42171': {u'node': u'c5e04987f16ad3d5ccd0dd7cd9d0dce21f3061be',
                               u'private': False,
                               u'reviewers': [u'daoshengmu',u'GDSF', {u"a":u"bb"}],
                               u'status': u'pending'}},
u'xxx': {u'42169': {u'public': False, u'status': [{1: u'h', u'2': True}, u'executing']},
                    u'42171': [{u'node': u'c5e04987f16ad3d5ccd0dd7cd9d0dce21f3061be'}],
                    u'public': [False, [{u'innerK': u'innerV'}]],
                    u'reviewers': [u'daoshengmu', {3:u"bb", u'4':[True, u'None']}],
                    u'status': [u'pending', [u'GGG', u'DDD']],},
u'lru': [u'hello', u'I', {u'like': u'to', u'test': [u'something']}]}
===
Attachment #8770657 - Flags: review?(gps)
Comment on attachment 8770657 [details]
Bug 1285703 - Re-encode the response from server to utf-8 byte-stream.

https://reviewboard.mozilla.org/r/64072/#review62106

::: hgext/reviewboard/client.py:580
(Diff revision 1)
> +    def reencoderesponseinplace(response):
> +        """
> +        tuple is not handled by now.
> +        """
> +        def dicthandler(dicRes):
> +            for k, v in dicRes.items():
> +                nk = k.encode('utf-8') if isinstance(k, unicode) else k
> +                dicRes[nk] = dicRes.pop(k)
> +
> +                if isinstance(v, unicode):
> +                    dicRes[nk] = v.encode('utf-8')
> +                else:
> +                    reencoderesponseinplace(v)
> +
> +        def listhandler(lstRes):
> +            for i in lstRes:
> +                if isinstance(i, unicode):
> +                    lstRes[lstRes.index(i)] = i.encode('utf-8')
> +                else:
> +                    reencoderesponseinplace(i)
> +
> +        if isinstance(response, dict):
> +            dicthandler(response)
> +        elif isinstance(response, list):
> +            listhandler(response)

Please extract this to a module-level function, ideally as part of `hgext/reviewboard/hgrb/util.py`.

::: hgext/reviewboard/client.py:584
(Diff revision 1)
> +
> +    def reencoderesponseinplace(response):
> +        """
> +        tuple is not handled by now.
> +        """
> +        def dicthandler(dicRes):

Nit: don't use camelCase in this file.

We tend to use Mercurial's style in this file which is (sadly) lowercase (no camels and no underscores). I find that difficult to read, especially if you don't know English well. But that's what Mercurial does :/

::: hgext/reviewboard/client.py:597
(Diff revision 1)
> +                    reencoderesponseinplace(v)
> +
> +        def listhandler(lstRes):
> +            for i in lstRes:
> +                if isinstance(i, unicode):
> +                    lstRes[lstRes.index(i)] = i.encode('utf-8')

This is incorrect for the case where the list contains duplicate elements.

You'll probably want to use the `enumerate()` function to iterate over the index and list value. Or, `for i in range(len(lstRes))`.
Comment on attachment 8770657 [details]
Bug 1285703 - Re-encode the response from server to utf-8 byte-stream.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64072/diff/1-2/
Attachment #8770657 - Flags: review?(gps)
https://reviewboard.mozilla.org/r/64072/#review62106

Hi gps, thanks for the review, I've addressed these issues.

I also tried to run mozreview tests locally in a virtual env(for python2.7.9+), but can't accomplish it because a warning messageg "Docker is not available" when I ran "./create-test-environment" (but I installed it!).
Do we have try server for mozreivew so that I could run these test remotely ?
https://reviewboard.mozilla.org/r/64072/#review62106

It is a giant pain to run the Docker tests. We have CI for them, but not a "try" infrastructure, sadly. I'll run the tests before I grant review.
Attachment #8770657 - Flags: review?(gps) → review+
Comment on attachment 8770657 [details]
Bug 1285703 - Re-encode the response from server to utf-8 byte-stream.

https://reviewboard.mozilla.org/r/64072/#review63216

The tests pass. I'll check this in for you.
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/61904ce1473a
Re-encode the response from server to utf-8 byte-stream. r=gps
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
(In reply to Gregory Szorc [:gps] from comment #10)
> Comment on attachment 8770657 [details]
> Bug 1285703 - Re-encode the response from server to utf-8 byte-stream.
> 
> https://reviewboard.mozilla.org/r/64072/#review63216
> 
> The tests pass. I'll check this in for you.

Thanks a lot : )
Assignee: nobody → kikuo
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: