Closed
Bug 768342
Opened 12 years ago
Closed 12 years ago
Non-ASCII characters don't export well in comments
Categories
(Developer Services :: Mercurial: bzexport, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jdm, Assigned: sfink)
Details
Attachments
(2 files, 3 obsolete files)
4.09 KB,
patch
|
ted
:
review+
|
Details | Diff | Splinter Review |
1.50 KB,
patch
|
ted
:
review+
|
Details | Diff | Splinter Review |
From bug 687724 comment 15: Nicholas Nethercote [:njn] 2012-06-25 23:19:00 PDT > ÃÂâÃÂÃÂÃÂà> ÃÂâÃÂÃÂÃÂÃÂÃÂâÃÂÃÂÃÂÃÂÃÂâÃÂÃÂÃÂÃÂÃÂà > ¢ÃÂÃÂÃÂÃÂ5,514,760 B (06.21%) -- > top(http://www.mozilla.org/en-US/, id=6) Wow, |hg bzexport| doesn't handle non-ASCII chars well. That's meant to look like this: │ ├───5,514,760 B (06.21%) -- top(http://www.mozilla.org/en-US/, id=6) │ │ ├──4,963,504 B (05.59%) -- active/window(http://www.mozilla.org/en-US/) │ │ │ ├──3,809,744 B (04.29%) ++ js/compartment(http://www.mozilla.org/en-US/) │ │ │ ├────642,400 B (00.72%) ++ layout │ │ │ ├────299,072 B (00.34%) ++ dom │ │ │ └────212,288 B (00.24%) ── style-sheets
Comment 1•12 years ago
|
||
We should see if the data we get back from ui.edit is sane: http://hg.mozilla.org/users/tmielczarek_mozilla.com/bzexport/file/a8ebbe6e3ddc/bzexport.py#l710 If so, this might be an encoding issue when passing the data through BzAPI.
Reporter | ||
Comment 2•12 years ago
|
||
The output of ui.edit appears to be sane, based on the contents of last_bzexport.txt.
Reporter | ||
Comment 3•12 years ago
|
||
Gerv, this looks like it might be a bzapi problem. I don't see any instructions in the docs about encoding of comment data (the text field itself is described as holding 'plain text').
Comment 4•12 years ago
|
||
OK. This works locally, but not on BMO :-| https://api-dev.bugzilla.mozilla.org/1.0/bug/768342/comment (this bug) shows gibberish and then more gibberish. The /comment call is backed by XML-RPC. Doing a direct XML-RPC call to BMO results in the right box-drawing Unicode characters: "Wow, |hg bzexport| doesn't handle non-ASCII chars well. That's meant to look like this: \x{2502} \x{251c}\x{2500}\x{2500}\x{2500}5,514,760 B (06.21%) -- top(http://www.mozilla.org/en-US/, id=6) ..." So it looks like a problem on the BzAPI server. Gerv
Comment 5•12 years ago
|
||
No, I lie: it does work on BMO with JSON: https://api-dev.bugzilla.mozilla.org/1.0/bug/768342/comment?content-type=application/json Seems like there's possibly a problem with the YAML HTML serializer, but I assume you are not parsing the output of that? :-) The default encoding for JSON is UTF-8, which is what that data is. Can the developer of bzexport please provide a BzAPI URL on api-dev.bugzilla.mozilla.org which is not returning the data in the correct form they expect? Once I have such a URL, I can debug the problem. Gerv
Reporter | ||
Comment 6•12 years ago
|
||
I'm not sure precisely what you're asking for; the output from viewing comments is always consistent (ie. bzapi shows us the same thing we see when viewing bugzilla directly). It's the comment addition stage that gets messed up - I am printing out the comment text right before we make the API call to create a new attachment, and I see the correct ASCII characters. I look at the resulting comment in bugzilla, the characters do not show up the same way they did in my terminal. I can replicate this by making an mq entry, adding [bzexport] username = ... password = ... bugzilla = https://landfill.bugzilla.org/bzapi_sandbox/ api_server = https://api-dev.bugzilla.mozilla.org/test/latest/ to my .hgrc, running |hg bzexport -e|, and filling in a valid landfill bug and a comment that consists of some data copy+pasted from about:memory. We're POSTing to bug/NNN/attachment, and setting the comment data to be the desired comment.
Comment 7•12 years ago
|
||
Oh, I see. I got the impression that the bug was about fetching comments via BzAPI, not posting them. Let me look again. Gerv
Comment 8•12 years ago
|
||
│ ├───5,514,760 B (06.21%) -- top(http://www.mozilla.org/en-US/, id=6) │ │ ├──4,963,504 B (05.59%) -- active/window(http://www.mozilla.org/en-US/) │ │ │ ├──3,809,744 B (04.29%) ++ js/compartment(http://www.mozilla.org/en-US/) │ │ │ ├────642,400 B (00.72%) ++ layout │ │ │ ├────299,072 B (00.34%) ++ dom │ │ │ └────212,288 B (00.24%) ── style-sheets Gerv
Comment 9•12 years ago
|
||
Comment 8 was posted to this bug using BzAPI, as follows: Method: POST URL: https://api-dev.bugzilla.mozilla.org/1.0/bug/768342/comment?username=gerv@mozilla.org&password=XXXXXXX&content-type=application/json Headers: Content-Type: application/json Accept: application/json Body: { "text": "│ ├───5,514,760 B (06.21%) -- top(http://www.mozilla.org/en-US/, id=6)\n│ │ ├──4,963,504 B (05.59%) -- active/window(http://www.mozilla.org/en-US/)\n│ │ │ ├──3,809,744 B (04.29%) ++\njs/compartment(http://www.mozilla.org/en-US/)\n│ │ │ ├────642,400 B (00.72%) ++ layout\n│ │ │ ├────299,072 B (00.34%) ++ dom\n│ │ │ └────212,288 B (00.24%) ── style-sheets\n\nGerv", "is_private": false } So as far as I can see, it works. If bzexport is doing something different, you'll need to tell me what :-) Gerv
Assignee | ||
Comment 10•12 years ago
|
||
I seem to be getting slightly different behavior from what has been described. When I inject unicode characters, I get encoding failure exceptions rather than mangled comments. This patch allows me to post to landfill successfully. See https://landfill.bugzilla.org/bzapi_sandbox//show_bug.cgi?id=11223 for example. Note that I knew nothing whatsoever of Python's Unicode support going into this, and the whole situation still seems totally crazy -- or at least, unfortunate -- to me. I don't know if this needs to be tested with various Python and mercurial versions, or what. From what I see, mercurial's internals aren't terribly unicode-friendly. (Or rather, mercurial together with python's default encoding of 'ascii' aren't unicode-friendly.)
Attachment #661871 -
Flags: review?(ted.mielczarek)
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → sphink
Comment 11•12 years ago
|
||
I think there's a reason that unicode->string and string->bytes in Python 3.
Comment 12•12 years ago
|
||
I suspect you can probably do this without monkeypatching. Instead of passing the data directly to ui.edit, pass the bytes you'd get from unicode.decodeing it into utf-8.
Assignee | ||
Comment 13•12 years ago
|
||
Attachment #661884 -
Flags: review?(ted.mielczarek)
Assignee | ||
Comment 14•12 years ago
|
||
Hm. I thought I had tried that. Well, not decodeing. The data is already decoded into a Python unicode string, and decoding it again will fail. (As will attempting to write the unicode string into a file with the default encoding of ascii.) encodeing, on the other hand, seems to work. I don't know what I was trying to do. Or maybe I was getting confused by the later failures or something.
Attachment #661885 -
Flags: review?(ted.mielczarek)
Assignee | ||
Updated•12 years ago
|
Attachment #661871 -
Attachment is obsolete: true
Attachment #661871 -
Flags: review?(ted.mielczarek)
Assignee | ||
Updated•12 years ago
|
Attachment #661884 -
Attachment is obsolete: true
Attachment #661884 -
Flags: review?(ted.mielczarek)
Assignee | ||
Comment 15•12 years ago
|
||
Sorry for the bugspam. Thinking about it, it bothered me that I was changing from a decoded unicode string to an encoded string via the ui.edit call.
Attachment #661887 -
Flags: review?(ted.mielczarek)
Assignee | ||
Updated•12 years ago
|
Attachment #661885 -
Attachment is obsolete: true
Attachment #661885 -
Flags: review?(ted.mielczarek)
Comment 16•12 years ago
|
||
Comment on attachment 661887 [details] [diff] [review] Allow unicode bug comments Review of attachment 661887 [details] [diff] [review]: ----------------------------------------------------------------- Better!
Attachment #661887 -
Flags: review?(ted.mielczarek) → review+
Assignee | ||
Comment 17•12 years ago
|
||
http://bitbucket.org/sfink/bzexport/rev/8783ef37edb0
Assignee | ||
Updated•12 years ago
|
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 18•12 years ago
|
||
Oops. Pushed to wrong repo. http://hg.mozilla.org/users/tmielczarek_mozilla.com/bzexport/rev/8783ef37edb0
Comment 19•12 years ago
|
||
I updated to 8783ef37edb0 and just tried to bzexport a patch with some Unicode in the comments (just a test string "├──"). As soon as I exited Vim (I was using -e) it barfed: ** unknown exception encountered, please report by visiting ** http://mercurial.selenic.com/wiki/BugTracker ** Python 2.7.3 (default, Aug 1 2012, 05:14:39) [GCC 4.6.3] ** Mercurial Distributed SCM (version 2.0.2) ** Extensions loaded: mq, rebase, convert, hgk, extdiff, progress, record, bzexport, color Traceback (most recent call last): File "/usr/bin/hg", line 38, in <module> mercurial.dispatch.run() File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 27, in run sys.exit(dispatch(request(sys.argv[1:]))) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 64, in dispatch return _runcatch(req) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 87, in _runcatch return _dispatch(req) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 684, in _dispatch cmdpats, cmdoptions) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 466, in runcommand ret = _runcommand(ui, options, cmd, d) File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 184, in wrap return wrapper(origfn, *args, **kwargs) File "/usr/lib/python2.7/dist-packages/hgext/color.py", line 373, in colorcmd return orig(ui_, opts, cmd, cmdfunc) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 738, in _runcommand return checkargs() File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 692, in checkargs return cmdfunc() File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 681, in <lambda> d = lambda: util.checksignature(func)(ui, *args, **cmdoptions) File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check return func(*args, **kwargs) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 348, in __call__ util.checksignature(self.fn)(ui, *args, **opts) File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check return func(*args, **kwargs) File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 139, in wrap util.checksignature(origfn), *args, **kwargs) File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check return func(*args, **kwargs) File "/usr/lib/python2.7/dist-packages/hgext/mq.py", line 3229, in mqcommand return orig(ui, repo, *args, **kwargs) File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check return func(*args, **kwargs) File "/home/njn/moz/bzexport/__init__.py", line 771, in bzexport values = edit_form(ui, repo, values, 'existing_bug_template') File "/home/njn/moz/bzexport/__init__.py", line 283, in edit_form saved = savefile(repo, "last_bzexport.txt", new) File "/home/njn/moz/bzexport/__init__.py", line 197, in savefile fp.write(text) UnicodeEncodeError: 'ascii' codec can't encode characters in position 1243-1245: ordinal not in range(128) Error in sys.excepthook: Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 64, in apport_excepthook from apport.fileutils import likely_packaged, get_recent_crashes File "/usr/lib/python2.7/dist-packages/mercurial/demandimport.py", line 95, in _demandimport return _import(name, globals, locals, fromlist, level) File "/usr/lib/python2.7/dist-packages/apport/__init__.py", line 1, in <module> from apport.report import Report File "/usr/lib/python2.7/dist-packages/mercurial/demandimport.py", line 114, in _demandimport mod = _origimport(name, globals, locals) File "/usr/lib/python2.7/dist-packages/apport/report.py", line 155, in <module> class Report(problem_report.ProblemReport): File "/usr/lib/python2.7/dist-packages/mercurial/demandimport.py", line 86, in __getattribute__ self._load() File "/usr/lib/python2.7/dist-packages/mercurial/demandimport.py", line 58, in _load mod = _origimport(head, globals, locals) File "/usr/lib/python2.7/dist-packages/problem_report.py", line 93, in <module> class ProblemReport(UserDict): TypeError: Error when calling the metaclass bases module.__init__() takes at most 2 arguments (3 given) Original exception was: Traceback (most recent call last): File "/usr/bin/hg", line 38, in <module> mercurial.dispatch.run() File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 27, in run sys.exit(dispatch(request(sys.argv[1:]))) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 64, in dispatch return _runcatch(req) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 87, in _runcatch return _dispatch(req) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 684, in _dispatch cmdpats, cmdoptions) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 466, in runcommand ret = _runcommand(ui, options, cmd, d) File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 184, in wrap return wrapper(origfn, *args, **kwargs) File "/usr/lib/python2.7/dist-packages/hgext/color.py", line 373, in colorcmd return orig(ui_, opts, cmd, cmdfunc) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 738, in _runcommand return checkargs() File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 692, in checkargs return cmdfunc() File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 681, in <lambda> d = lambda: util.checksignature(func)(ui, *args, **cmdoptions) File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check return func(*args, **kwargs) File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 348, in __call__ util.checksignature(self.fn)(ui, *args, **opts) File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check return func(*args, **kwargs) File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 139, in wrap util.checksignature(origfn), *args, **kwargs) File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check return func(*args, **kwargs) File "/usr/lib/python2.7/dist-packages/hgext/mq.py", line 3229, in mqcommand return orig(ui, repo, *args, **kwargs) File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 458, in check return func(*args, **kwargs) File "/home/njn/moz/bzexport/__init__.py", line 771, in bzexport values = edit_form(ui, repo, values, 'existing_bug_template') File "/home/njn/moz/bzexport/__init__.py", line 283, in edit_form saved = savefile(repo, "last_bzexport.txt", new) File "/home/njn/moz/bzexport/__init__.py", line 197, in savefile fp.write(text) UnicodeEncodeError: 'ascii' codec can't encode characters in position 1243-1245: ordinal not in range(128)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 20•12 years ago
|
||
njn: hg --version? I suppose I ought to test the range of versions I have installed.
Comment 21•12 years ago
|
||
Mercurial Distributed SCM (version 2.0.2)
Assignee | ||
Comment 22•12 years ago
|
||
That failure has nothing to do with the hg version, and everything to do with the last-minute fix I added that couldn't possibly break anything. The fix was correct, but it exposed another unicode-handling bug. (And it fails 100% of the time if you have unicode in your comments, so you know I didn't actually test my final final version.)
Attachment #663547 -
Flags: review?(ted.mielczarek)
Updated•12 years ago
|
Attachment #663547 -
Flags: review?(ted.mielczarek) → review+
Assignee | ||
Comment 23•12 years ago
|
||
http://hg.mozilla.org/users/tmielczarek_mozilla.com/bzexport/rev/99e4f75682d1
Assignee | ||
Updated•12 years ago
|
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 24•12 years ago
|
||
See also bug 822756
Updated•10 years ago
|
Product: Other Applications → Developer Services
You need to log in
before you can comment on or make changes to this bug.
Description
•