Non-ASCII characters don't export well in comments

RESOLVED FIXED

Status

RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: jdm, Assigned: sfink)

Tracking

Details

Attachments

(2 attachments, 3 obsolete attachments)

(Reporter)

Description

6 years ago
From bug 687724 comment 15:

 Nicholas Nethercote [:njn] 2012-06-25 23:19:00 PDT

> │ 
> ├──ÃÃ
> ‚¢Ã‚”€5,514,760 B (06.21%) --
> top(http://www.mozilla.org/en-US/, id=6)

Wow, |hg bzexport| doesn't handle non-ASCII chars well.  That's meant to look like this:

│  ├───5,514,760 B (06.21%) -- top(http://www.mozilla.org/en-US/, id=6)
│  │   ├──4,963,504 B (05.59%) -- active/window(http://www.mozilla.org/en-US/)
│  │   │  ├──3,809,744 B (04.29%) ++ js/compartment(http://www.mozilla.org/en-US/)
│  │   │  ├────642,400 B (00.72%) ++ layout
│  │   │  ├────299,072 B (00.34%) ++ dom
│  │   │  └────212,288 B (00.24%) ── style-sheets
We should see if the data we get back from ui.edit is sane:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/bzexport/file/a8ebbe6e3ddc/bzexport.py#l710

If so, this might be an encoding issue when passing the data through BzAPI.
(Reporter)

Comment 2

6 years ago
The output of ui.edit appears to be sane, based on the contents of last_bzexport.txt.
(Reporter)

Comment 3

6 years ago
Gerv, this looks like it might be a bzapi problem. I don't see any instructions in the docs about encoding of comment data (the text field itself is described as holding 'plain text').
OK. This works locally, but not on BMO :-|

https://api-dev.bugzilla.mozilla.org/1.0/bug/768342/comment (this bug) shows gibberish and then more gibberish.

The /comment call is backed by XML-RPC. Doing a direct XML-RPC call to BMO results in the right box-drawing Unicode characters:


"Wow, |hg bzexport| doesn't handle non-ASCII chars well.  That's meant to look like this:

\x{2502}  \x{251c}\x{2500}\x{2500}\x{2500}5,514,760 B (06.21%) -- top(http://www.mozilla.org/en-US/, id=6)
..."

So it looks like a problem on the BzAPI server.

Gerv
No, I lie: it does work on BMO with JSON:

https://api-dev.bugzilla.mozilla.org/1.0/bug/768342/comment?content-type=application/json

Seems like there's possibly a problem with the YAML HTML serializer, but I assume you are not parsing the output of that? :-)

The default encoding for JSON is UTF-8, which is what that data is.

Can the developer of bzexport please provide a BzAPI URL on api-dev.bugzilla.mozilla.org which is not returning the data in the correct form they expect? Once I have such a URL, I can debug the problem.

Gerv
(Reporter)

Comment 6

6 years ago
I'm not sure precisely what you're asking for; the output from viewing comments is always consistent (ie. bzapi shows us the same thing we see when viewing bugzilla directly). It's the comment addition stage that gets messed up - I am printing out the comment text right before we make the API call to create a new attachment, and I see the correct ASCII characters. I look at the resulting comment in bugzilla, the characters do not show up the same way they did in my terminal.

I can replicate this by making an mq entry, adding

[bzexport]
username = ...
password = ...
bugzilla = https://landfill.bugzilla.org/bzapi_sandbox/
api_server = https://api-dev.bugzilla.mozilla.org/test/latest/

to my .hgrc, running |hg bzexport -e|, and filling in a valid landfill bug and a comment that consists of some data copy+pasted from about:memory. We're POSTing to bug/NNN/attachment, and setting the comment data to be the desired comment.
Oh, I see. I got the impression that the bug was about fetching comments via BzAPI, not posting them. Let me look again.

Gerv
│  ├───5,514,760 B (06.21%) -- top(http://www.mozilla.org/en-US/, id=6)
│  │   ├──4,963,504 B (05.59%) -- active/window(http://www.mozilla.org/en-US/)
│  │   │  ├──3,809,744 B (04.29%) ++
js/compartment(http://www.mozilla.org/en-US/)
│  │   │  ├────642,400 B (00.72%) ++ layout
│  │   │  ├────299,072 B (00.34%) ++ dom
│  │   │  └────212,288 B (00.24%) ── style-sheets

Gerv
Comment 8 was posted to this bug using BzAPI, as follows:

Method: POST
URL: https://api-dev.bugzilla.mozilla.org/1.0/bug/768342/comment?username=gerv@mozilla.org&password=XXXXXXX&content-type=application/json
Headers:
Content-Type: application/json
Accept: application/json
Body:
{
  "text": "│  ├───5,514,760 B (06.21%) -- top(http://www.mozilla.org/en-US/, id=6)\n│  │   ├──4,963,504 B (05.59%) -- active/window(http://www.mozilla.org/en-US/)\n│  │   │  ├──3,809,744 B (04.29%) ++\njs/compartment(http://www.mozilla.org/en-US/)\n│  │   │  ├────642,400 B (00.72%) ++ layout\n│  │   │  ├────299,072 B (00.34%) ++ dom\n│  │   │  └────212,288 B (00.24%) ── style-sheets\n\nGerv",
  "is_private": false
}

So as far as I can see, it works. If bzexport is doing something different, you'll need to tell me what :-)

Gerv
(Assignee)

Comment 10

6 years ago
Created attachment 661871 [details] [diff] [review]
Allow unicode bug comments

I seem to be getting slightly different behavior from what has been described. When I inject unicode characters, I get encoding failure exceptions rather than mangled comments.

This patch allows me to post to landfill successfully. See https://landfill.bugzilla.org/bzapi_sandbox//show_bug.cgi?id=11223 for example.

Note that I knew nothing whatsoever of Python's Unicode support going into this, and the whole situation still seems totally crazy -- or at least, unfortunate -- to me. I don't know if this needs to be tested with various Python and mercurial versions, or what. From what I see, mercurial's internals aren't terribly unicode-friendly. (Or rather, mercurial together with python's default encoding of 'ascii' aren't unicode-friendly.)
Attachment #661871 - Flags: review?(ted.mielczarek)
(Assignee)

Updated

6 years ago
Assignee: nobody → sphink
I think there's a reason that unicode->string and string->bytes in Python 3.
I suspect you can probably do this without monkeypatching. Instead of passing the data directly to ui.edit, pass the bytes you'd get from unicode.decodeing it into utf-8.
(Assignee)

Comment 13

6 years ago
Created attachment 661884 [details] [diff] [review]
Allow unicode bug comments
Attachment #661884 - Flags: review?(ted.mielczarek)
(Assignee)

Comment 14

6 years ago
Created attachment 661885 [details] [diff] [review]
Allow unicode bug comments

Hm. I thought I had tried that.

Well, not decodeing. The data is already decoded into a Python unicode string, and decoding it again will fail. (As will attempting to write the unicode string into a file with the default encoding of ascii.)

encodeing, on the other hand, seems to work. I don't know what I was trying to do. Or maybe I was getting confused by the later failures or something.
Attachment #661885 - Flags: review?(ted.mielczarek)
(Assignee)

Updated

6 years ago
Attachment #661871 - Attachment is obsolete: true
Attachment #661871 - Flags: review?(ted.mielczarek)
(Assignee)

Updated

6 years ago
Attachment #661884 - Attachment is obsolete: true
Attachment #661884 - Flags: review?(ted.mielczarek)
(Assignee)

Comment 15

6 years ago
Created attachment 661887 [details] [diff] [review]
Allow unicode bug comments

Sorry for the bugspam. Thinking about it, it bothered me that I was changing from a decoded unicode string to an encoded string via the ui.edit call.
Attachment #661887 - Flags: review?(ted.mielczarek)
(Assignee)

Updated

6 years ago
Attachment #661885 - Attachment is obsolete: true
Attachment #661885 - Flags: review?(ted.mielczarek)
Comment on attachment 661887 [details] [diff] [review]
Allow unicode bug comments

Review of attachment 661887 [details] [diff] [review]:
-----------------------------------------------------------------

Better!
Attachment #661887 - Flags: review?(ted.mielczarek) → review+
(Assignee)

Updated

6 years ago
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
I updated to 8783ef37edb0 and just tried to bzexport a patch with some Unicode in the comments (just a test string "├──").  As soon as I exited Vim (I was using -e) it barfed:

** unknown exception encountered, please report by visiting
**  http://mercurial.selenic.com/wiki/BugTracker
** Python 2.7.3 (default, Aug  1 2012, 05:14:39) [GCC 4.6.3]
** Mercurial Distributed SCM (version 2.0.2)
** Extensions loaded: mq, rebase, convert, hgk, extdiff, progress, record, bzexport, color
Traceback (most recent call last):
  File "/usr/bin/hg", line 38, in <module>
    mercurial.dispatch.run()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 27, in run
    sys.exit(dispatch(request(sys.argv[1:])))
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 64, in dispatch
    return _runcatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 87, in _runcatch
    return _dispatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 684, in _dispatch
    cmdpats, cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 466, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 184, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/hgext/color.py", line 373, in colorcmd
    return orig(ui_, opts, cmd, cmdfunc)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 738, in _runcommand
    return checkargs()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 692, in checkargs
    return cmdfunc()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 681, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 348, in __call__
    util.checksignature(self.fn)(ui, *args, **opts)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 139, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/hgext/mq.py", line 3229, in mqcommand
    return orig(ui, repo, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check
    return func(*args, **kwargs)
  File "/home/njn/moz/bzexport/__init__.py", line 771, in bzexport
    values = edit_form(ui, repo, values, 'existing_bug_template')
  File "/home/njn/moz/bzexport/__init__.py", line 283, in edit_form
    saved = savefile(repo, "last_bzexport.txt", new)
  File "/home/njn/moz/bzexport/__init__.py", line 197, in savefile
    fp.write(text)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1243-1245: ordinal not in range(128)
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 64, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python2.7/dist-packages/mercurial/demandimport.py", line 95, in _demandimport
    return _import(name, globals, locals, fromlist, level)
  File "/usr/lib/python2.7/dist-packages/apport/__init__.py", line 1, in <module>
    from apport.report import Report
  File "/usr/lib/python2.7/dist-packages/mercurial/demandimport.py", line 114, in _demandimport
    mod = _origimport(name, globals, locals)
  File "/usr/lib/python2.7/dist-packages/apport/report.py", line 155, in <module>
    class Report(problem_report.ProblemReport):
  File "/usr/lib/python2.7/dist-packages/mercurial/demandimport.py", line 86, in __getattribute__
    self._load()
  File "/usr/lib/python2.7/dist-packages/mercurial/demandimport.py", line 58, in _load
    mod = _origimport(head, globals, locals)
  File "/usr/lib/python2.7/dist-packages/problem_report.py", line 93, in <module>
    class ProblemReport(UserDict):
TypeError: Error when calling the metaclass bases
    module.__init__() takes at most 2 arguments (3 given)

Original exception was:
Traceback (most recent call last):
  File "/usr/bin/hg", line 38, in <module>
    mercurial.dispatch.run()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 27, in run
    sys.exit(dispatch(request(sys.argv[1:])))
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 64, in dispatch
    return _runcatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 87, in _runcatch
    return _dispatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 684, in _dispatch
    cmdpats, cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 466, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 184, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/hgext/color.py", line 373, in colorcmd
    return orig(ui_, opts, cmd, cmdfunc)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 738, in _runcommand
    return checkargs()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 692, in checkargs
    return cmdfunc()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 681, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 348, in __call__
    util.checksignature(self.fn)(ui, *args, **opts)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 139, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/hgext/mq.py", line 3229, in mqcommand
    return orig(ui, repo, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check
    return func(*args, **kwargs)
  File "/home/njn/moz/bzexport/__init__.py", line 771, in bzexport
    values = edit_form(ui, repo, values, 'existing_bug_template')
  File "/home/njn/moz/bzexport/__init__.py", line 283, in edit_form
    saved = savefile(repo, "last_bzexport.txt", new)
  File "/home/njn/moz/bzexport/__init__.py", line 197, in savefile
    fp.write(text)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1243-1245: ordinal not in range(128)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 20

6 years ago
njn: hg --version?

I suppose I ought to test the range of versions I have installed.
Mercurial Distributed SCM (version 2.0.2)
(Assignee)

Comment 22

6 years ago
Created attachment 663547 [details] [diff] [review]
Followup to fix writing of the savefile

That failure has nothing to do with the hg version, and everything to do with the last-minute fix I added that couldn't possibly break anything. The fix was correct, but it exposed another unicode-handling bug. (And it fails 100% of the time if you have unicode in your comments, so you know I didn't actually test my final final version.)
Attachment #663547 - Flags: review?(ted.mielczarek)
Attachment #663547 - Flags: review?(ted.mielczarek) → review+
(Assignee)

Updated

6 years ago
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago6 years ago
Resolution: --- → FIXED
(Assignee)

Comment 24

6 years ago
See also bug 822756
Product: Other Applications → Developer Services
You need to log in before you can comment on or make changes to this bug.