Closed
Bug 533890
Opened 15 years ago
Closed 15 years ago
Need to figure out encoding/decoding of patches in qimportbz
Categories
(Developer Services :: Mercurial: qimportbz, defect)
Developer Services
Mercurial: qimportbz
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: robarnold, Assigned: sgautherie)
References
Details
(Keywords: crash)
Attachments
(4 files, 2 obsolete files)
1.86 KB,
patch
|
robarnold
:
review+
|
Details | Diff | Splinter Review |
3.77 KB,
patch
|
robarnold
:
review+
|
Details | Diff | Splinter Review |
795 bytes,
patch
|
Details | Diff | Splinter Review | |
2.27 KB,
patch
|
robarnold
:
review+
|
Details | Diff | Splinter Review |
I thought we had it down but crowder gave me this trace:
$ hg qimport bz:502933
Fetching...done
Parsing...** unknown exception encountered, details follow
** report bug details to http://www.selenic.com/mercurial/bts
** or mercurial@selenic.com
** Mercurial Distributed SCM (version 1.2.1)
** Extensions loaded: extdiff, graphlog, mq, win32text, qimportbz
Traceback (most recent call last):
File "hg", line 27, in <module>
File "mercurial\dispatch.pyc", line 16, in run
File "mercurial\dispatch.pyc", line 25, in dispatch
File "mercurial\dispatch.pyc", line 41, in _runcatch
File "mercurial\dispatch.pyc", line 372, in _dispatch
File "mercurial\dispatch.pyc", line 247, in runcommand
File "mercurial\dispatch.pyc", line 417, in _runcommand
File "mercurial\dispatch.pyc", line 377, in checkargs
File "mercurial\dispatch.pyc", line 371, in <lambda>
File "mercurial\util.pyc", line 718, in check
File "mercurial\extensions.pyc", line 100, in wrap
File "mercurial\util.pyc", line 718, in check
File "c:\Documents and Settings\crowder\qimportbz\__init__.py", line 120, in qimporthook
orig(ui, repo, *files, **opts)
File "mercurial\util.pyc", line 718, in check
File "hgext\mq.pyc", line 1686, in qimport
File "hgext\mq.pyc", line 1600, in qimport
File "mercurial\url.pyc", line 314, in open
File "urllib2.pyc", line 387, in open
File "c:\Documents and Settings\crowder\qimportbz\bzhandler.py", line 63, in bz_response
bug = bz.Bug(self.ui, data)
File "c:\Documents and Settings\crowder\qimportbz\bz.py", line 176, in __init__
self.attachments = [Attachment.parse(self, a) for a in xml.findall("bug/attachment")]
File "c:\Documents and Settings\crowder\qimportbz\bz.py", line 34, in parse
return ctor(bug, node)
File "c:\Documents and Settings\crowder\qimportbz\bz.py", line 72, in __init__
self.data = fp.read().decode('utf-8')
File "encodings\utf_8.pyc", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1063149-1063150: invalid data
Assignee | ||
Comment 1•15 years ago
|
||
Another example, with a smaller bug, (on Windows 2000):
{
$ hg qimport bz:531576
"UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2391-2393: invalid data"
}
Assignee | ||
Comment 2•15 years ago
|
||
(In reply to comment #1)
> "UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2391-2393:
Ftr, testing with 'ascii' reports:
"UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 2391: ordinal not in range(128)"
To be explicit,
*that's the 'ã' character in "Dão".
*and that patch is an "ansi" version of the previously attached "utf-8" patch (which decodes correctly).
Assignee | ||
Comment 3•15 years ago
|
||
Can't do much more, as there is no way to know what the actual file character set would be :-<
Assignee | ||
Updated•15 years ago
|
Reporter | ||
Comment 4•15 years ago
|
||
Comment on attachment 420704 [details] [diff] [review]
(Av1) Discard patch file on UnicodeDecodeError, Improve console outputs.
>- self.data = fp.read().decode('utf-8')
>+ try:
>+ self.data = fp.read().decode('utf-8')
>+ except UnicodeDecodeError:
>+ bug.settings.ui.warn("Patch id=%s desc=\"%s\" file was discarded:\n" % (self.id, self.desc))
>+ # Print the exception without its traceback.
>+ import sys
>+ sys.excepthook(sys.exc_info()[0], sys.exc_info()[1], None)
>+ # Can't do better than discard data:
>+ # trying |.decode('utf-8', 'replace')| as a fallback would be too risky
>+ # if user imports the result then forgets to fix it.
>+ self.data = ''
I don't like the import sys in the middle here. Also, can we try decoding in other formats? What is the testcase here and what is its encoding? A latin-1/ascii decoding seems like a common mis-encoding for a patch (we could convert it to utf-8 for them with a warning).
The rest of the changes in the patch are unrelated to this bug (and I'm not sure I like them).
Attachment #420704 -
Flags: review?(tellrob) → review-
Assignee | ||
Comment 5•15 years ago
|
||
Av1, with comment 4 suggestion(s).
(In reply to comment #4)
> I don't like the import sys in the middle here. Also, can we try decoding in
Moved to the start of the method.
> other formats? What is the testcase here and what is its encoding? A
> latin-1/ascii decoding seems like a common mis-encoding for a patch (we could
> convert it to utf-8 for them with a warning).
Testcase is comment 1 and 2.
BugZilla doesn't know what the exact encoding is: that's the issue!
(ascii is a subpart of utf-8 and causes (nor solves) no problem.)
I don't know of a sure way to select the correct ISO-8859-* variant...
Anyway, this patch is meant to stop aborting and let proceed with the other patches.
As a follow-up patch, you could add a charset override feature, for example.
> The rest of the changes in the patch are unrelated to this bug (and I'm not
> sure I like them).
I filed bug 540164...
Attachment #420704 -
Attachment is obsolete: true
Attachment #422005 -
Flags: review?(tellrob)
Reporter | ||
Comment 6•15 years ago
|
||
Comment on attachment 422005 [details] [diff] [review]
(Av2) Discard patch file on UnicodeDecodeError
>From: Serge Gautherie <sgautherie.bz@free.fr>
>
>Bug 533890 - Need to figure out encoding/decoding of patches in qimportbz;
>(Av2) Discard patch file on UnicodeDecodeError.
>
>diff --git a/bz.py b/bz.py
>--- a/bz.py
>+++ b/bz.py
>@@ -77,25 +77,35 @@ class Flag(object):
>
> # Compare by flag name
> def __cmp__(self, other):
> flagorder = ['r', 'sr', 'ui-r', 'a', 'c']
> return cmp(flagorder.index(self.abbrev()), flagorder.index(other.abbrev()))
>
> class Patch(Attachment):
> def __init__(self, bug, node):
>+ import sys
I actually meant that this should be at the top level of the file.
> Attachment.__init__(self, bug, node)
> self.flags = list(sorted(Flag(bug, n) for n in node.findall('flag')))
> rawtext = base64.b64decode(node.find('data').text)
> data = StringIO.StringIO(rawtext)
> filename, message, user, date, branch, nodeid, p1, p2 = patch.extract(bug.settings.ui, data)
> # for some reason, patch.extract writes a temporary file with the diff hunks
> if filename:
> fp = file(filename)
>- self.data = fp.read().decode('utf-8')
>+ try:
>+ self.data = fp.read().decode('utf-8')
>+ except UnicodeDecodeError:
>+ bug.settings.ui.warn("Patch id=%s desc=\"%s\" file was discarded:\n" % (self.id, self.desc))
>+ # Print the exception without its traceback.
>+ sys.excepthook(sys.exc_info()[0], sys.exc_info()[1], None)
>+ # Can't do better than discard data:
>+ # trying |.decode('utf-8', 'replace')| as a fallback would be too risky
>+ # if user imports the result then forgets to fix it.
>+ self.data = ''
How do you feel about prompting the user for the text? We could provide the url for them and throw up an editor via HG's API.
Assignee | ||
Comment 7•15 years ago
|
||
Av2, with comment 6 suggestion(s).
(In reply to comment #6)
> I actually meant that this should be at the top level of the file.
Done, with some reordering too.
> How do you feel about prompting the user for the text? We could provide the url
The qbz user could only guess at what the charset the patch author used is :-|
> for them and throw up an editor via HG's API.
From my experience (but ymmv), the case (does exist but) is rare and of low value:
if one is really interested in the problematic patch, one can either download+fix it "manually" or just ask the author to attach a fixed version (= fix it at the source).
Attachment #422005 -
Attachment is obsolete: true
Attachment #427248 -
Flags: review?(tellrob)
Attachment #422005 -
Flags: review?(tellrob)
Reporter | ||
Comment 8•15 years ago
|
||
(In reply to comment #7)
> Created an attachment (id=427248) [details]
> (Av2a) Discard patch file on UnicodeDecodeError
>
> Av2, with comment 6 suggestion(s).
>
> (In reply to comment #6)
>
> > I actually meant that this should be at the top level of the file.
>
> Done, with some reordering too.
>
> > How do you feel about prompting the user for the text? We could provide the url
>
> The qbz user could only guess at what the charset the patch author used is :-|
Well I figure that Firefox will decode it correctly and they can just copy & paste (probably some unicode).
> > for them and throw up an editor via HG's API.
>
> From my experience (but ymmv), the case (does exist but) is rare and of low
> value:
> if one is really interested in the problematic patch, one can either
> download+fix it "manually" or just ask the author to attach a fixed version (=
> fix it at the source).
I think it at least makes the download&fix case easier.
Assignee | ||
Comment 9•15 years ago
|
||
(In reply to comment #8)
> Well I figure that Firefox will decode it correctly and they can just copy &
> paste (probably some unicode).
I assume the browser downloads the file "as is": I don't think it can guess/fix the (missing) charset either.
But anyway.
> I think it at least makes the download&fix case easier.
I understood you: you asked for my feeling, I gave you my opinion.
Then, I already replied in comment 5 that my patch cares about the exception fix only/first and leaves enabling an enhanced behavior to someone who cares about it (as you do and as I don't atm).
Reporter | ||
Updated•15 years ago
|
Attachment #427248 -
Flags: review?(tellrob) → review+
Assignee | ||
Comment 10•15 years ago
|
||
Comment on attachment 427248 [details] [diff] [review]
(Av2a) Discard patch file on UnicodeDecodeError
[Checkin: Comment 10]
http://hg.mozilla.org/users/robarnold_cmu.edu/qimportbz/rev/0fa47807ab5e
Attachment #427248 -
Attachment description: (Av2a) Discard patch file on UnicodeDecodeError → (Av2a) Discard patch file on UnicodeDecodeError
[Checkin: Comment 10]
Assignee | ||
Updated•15 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 11•15 years ago
|
||
Case is: hg qimport -p bz:213932
NB: I assume 'message' should need the same fix too ... but I prefer to wait for an actual (test)case :-|
Attachment #436790 -
Flags: review?(tellrob)
Reporter | ||
Updated•15 years ago
|
Attachment #436790 -
Flags: review?(tellrob) → review+
Assignee | ||
Comment 12•15 years ago
|
||
Comment on attachment 436790 [details] [diff] [review]
(Bv1) Discard user data on UnicodeDecodeError, Some rewrite and documentation too
[Checkin: Comment 12]
http://hg.mozilla.org/users/robarnold_cmu.edu/qimportbz/rev/518ac050d9fa
Attachment #436790 -
Attachment description: (Bv1) Discard user data on UnicodeDecodeError, Some rewrite and documentation too → (Bv1) Discard user data on UnicodeDecodeError, Some rewrite and documentation too
[Checkin: Comment 12]
Assignee | ||
Comment 13•15 years ago
|
||
Assignee | ||
Comment 14•15 years ago
|
||
Testcase: hg qimport -p bz:533890
Attachment #440976 -
Flags: review?(tellrob)
Assignee | ||
Updated•15 years ago
|
Severity: major → critical
Reporter | ||
Updated•15 years ago
|
Attachment #440976 -
Flags: review?(tellrob) → review+
Assignee | ||
Comment 15•15 years ago
|
||
Comment on attachment 440976 [details] [diff] [review]
(Cv1) Discard message data on UnicodeDecodeError
[Checkin: Comment 15+16]
http://hg.mozilla.org/users/robarnold_cmu.edu/qimportbz/rev/3a0e20e87a1c
Attachment #440976 -
Attachment description: (Cv1) Discard message data on UnicodeDecodeError → (Cv1) Discard message data on UnicodeDecodeError
[Checkin: Comment 15]
Assignee | ||
Comment 16•15 years ago
|
||
Comment on attachment 440976 [details] [diff] [review]
(Cv1) Discard message data on UnicodeDecodeError
[Checkin: Comment 15+16]
+
http://hg.mozilla.org/users/robarnold_cmu.edu/qimportbz/rev/6bb40df6000e
(Dv1) Fix patch Cv1 (empty message) regression.
Attachment #440976 -
Attachment description: (Cv1) Discard message data on UnicodeDecodeError
[Checkin: Comment 15] → (Cv1) Discard message data on UnicodeDecodeError
[Checkin: Comment 15+16]
Updated•10 years ago
|
Product: Other Applications → Developer Services
You need to log in
before you can comment on or make changes to this bug.
Description
•