Closed Bug 533890 Opened 15 years ago Closed 15 years ago

Need to figure out encoding/decoding of patches in qimportbz

Categories

(Developer Services :: Mercurial: qimportbz, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: robarnold, Assigned: sgautherie)

References

Details

(Keywords: crash)

Attachments

(4 files, 2 obsolete files)

I thought we had it down but crowder gave me this trace: $ hg qimport bz:502933 Fetching...done Parsing...** unknown exception encountered, details follow ** report bug details to http://www.selenic.com/mercurial/bts ** or mercurial@selenic.com ** Mercurial Distributed SCM (version 1.2.1) ** Extensions loaded: extdiff, graphlog, mq, win32text, qimportbz Traceback (most recent call last): File "hg", line 27, in <module> File "mercurial\dispatch.pyc", line 16, in run File "mercurial\dispatch.pyc", line 25, in dispatch File "mercurial\dispatch.pyc", line 41, in _runcatch File "mercurial\dispatch.pyc", line 372, in _dispatch File "mercurial\dispatch.pyc", line 247, in runcommand File "mercurial\dispatch.pyc", line 417, in _runcommand File "mercurial\dispatch.pyc", line 377, in checkargs File "mercurial\dispatch.pyc", line 371, in <lambda> File "mercurial\util.pyc", line 718, in check File "mercurial\extensions.pyc", line 100, in wrap File "mercurial\util.pyc", line 718, in check File "c:\Documents and Settings\crowder\qimportbz\__init__.py", line 120, in qimporthook orig(ui, repo, *files, **opts) File "mercurial\util.pyc", line 718, in check File "hgext\mq.pyc", line 1686, in qimport File "hgext\mq.pyc", line 1600, in qimport File "mercurial\url.pyc", line 314, in open File "urllib2.pyc", line 387, in open File "c:\Documents and Settings\crowder\qimportbz\bzhandler.py", line 63, in bz_response bug = bz.Bug(self.ui, data) File "c:\Documents and Settings\crowder\qimportbz\bz.py", line 176, in __init__ self.attachments = [Attachment.parse(self, a) for a in xml.findall("bug/attachment")] File "c:\Documents and Settings\crowder\qimportbz\bz.py", line 34, in parse return ctor(bug, node) File "c:\Documents and Settings\crowder\qimportbz\bz.py", line 72, in __init__ self.data = fp.read().decode('utf-8') File "encodings\utf_8.pyc", line 16, in decode UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1063149-1063150: invalid data
Another example, with a smaller bug, (on Windows 2000): { $ hg qimport bz:531576 "UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2391-2393: invalid data" }
(In reply to comment #1) > "UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2391-2393: Ftr, testing with 'ascii' reports: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 2391: ordinal not in range(128)" To be explicit, *that's the 'ã' character in "Dão". *and that patch is an "ansi" version of the previously attached "utf-8" patch (which decodes correctly).
Can't do much more, as there is no way to know what the actual file character set would be :-<
Assignee: nobody → sgautherie.bz
Status: NEW → ASSIGNED
Attachment #420704 - Flags: review?(tellrob)
Blocks: 502933, 531576
Severity: normal → major
Comment on attachment 420704 [details] [diff] [review] (Av1) Discard patch file on UnicodeDecodeError, Improve console outputs. >- self.data = fp.read().decode('utf-8') >+ try: >+ self.data = fp.read().decode('utf-8') >+ except UnicodeDecodeError: >+ bug.settings.ui.warn("Patch id=%s desc=\"%s\" file was discarded:\n" % (self.id, self.desc)) >+ # Print the exception without its traceback. >+ import sys >+ sys.excepthook(sys.exc_info()[0], sys.exc_info()[1], None) >+ # Can't do better than discard data: >+ # trying |.decode('utf-8', 'replace')| as a fallback would be too risky >+ # if user imports the result then forgets to fix it. >+ self.data = '' I don't like the import sys in the middle here. Also, can we try decoding in other formats? What is the testcase here and what is its encoding? A latin-1/ascii decoding seems like a common mis-encoding for a patch (we could convert it to utf-8 for them with a warning). The rest of the changes in the patch are unrelated to this bug (and I'm not sure I like them).
Attachment #420704 - Flags: review?(tellrob) → review-
Av1, with comment 4 suggestion(s). (In reply to comment #4) > I don't like the import sys in the middle here. Also, can we try decoding in Moved to the start of the method. > other formats? What is the testcase here and what is its encoding? A > latin-1/ascii decoding seems like a common mis-encoding for a patch (we could > convert it to utf-8 for them with a warning). Testcase is comment 1 and 2. BugZilla doesn't know what the exact encoding is: that's the issue! (ascii is a subpart of utf-8 and causes (nor solves) no problem.) I don't know of a sure way to select the correct ISO-8859-* variant... Anyway, this patch is meant to stop aborting and let proceed with the other patches. As a follow-up patch, you could add a charset override feature, for example. > The rest of the changes in the patch are unrelated to this bug (and I'm not > sure I like them). I filed bug 540164...
Attachment #420704 - Attachment is obsolete: true
Attachment #422005 - Flags: review?(tellrob)
Comment on attachment 422005 [details] [diff] [review] (Av2) Discard patch file on UnicodeDecodeError >From: Serge Gautherie <sgautherie.bz@free.fr> > >Bug 533890 - Need to figure out encoding/decoding of patches in qimportbz; >(Av2) Discard patch file on UnicodeDecodeError. > >diff --git a/bz.py b/bz.py >--- a/bz.py >+++ b/bz.py >@@ -77,25 +77,35 @@ class Flag(object): > > # Compare by flag name > def __cmp__(self, other): > flagorder = ['r', 'sr', 'ui-r', 'a', 'c'] > return cmp(flagorder.index(self.abbrev()), flagorder.index(other.abbrev())) > > class Patch(Attachment): > def __init__(self, bug, node): >+ import sys I actually meant that this should be at the top level of the file. > Attachment.__init__(self, bug, node) > self.flags = list(sorted(Flag(bug, n) for n in node.findall('flag'))) > rawtext = base64.b64decode(node.find('data').text) > data = StringIO.StringIO(rawtext) > filename, message, user, date, branch, nodeid, p1, p2 = patch.extract(bug.settings.ui, data) > # for some reason, patch.extract writes a temporary file with the diff hunks > if filename: > fp = file(filename) >- self.data = fp.read().decode('utf-8') >+ try: >+ self.data = fp.read().decode('utf-8') >+ except UnicodeDecodeError: >+ bug.settings.ui.warn("Patch id=%s desc=\"%s\" file was discarded:\n" % (self.id, self.desc)) >+ # Print the exception without its traceback. >+ sys.excepthook(sys.exc_info()[0], sys.exc_info()[1], None) >+ # Can't do better than discard data: >+ # trying |.decode('utf-8', 'replace')| as a fallback would be too risky >+ # if user imports the result then forgets to fix it. >+ self.data = '' How do you feel about prompting the user for the text? We could provide the url for them and throw up an editor via HG's API.
Av2, with comment 6 suggestion(s). (In reply to comment #6) > I actually meant that this should be at the top level of the file. Done, with some reordering too. > How do you feel about prompting the user for the text? We could provide the url The qbz user could only guess at what the charset the patch author used is :-| > for them and throw up an editor via HG's API. From my experience (but ymmv), the case (does exist but) is rare and of low value: if one is really interested in the problematic patch, one can either download+fix it "manually" or just ask the author to attach a fixed version (= fix it at the source).
Attachment #422005 - Attachment is obsolete: true
Attachment #427248 - Flags: review?(tellrob)
Attachment #422005 - Flags: review?(tellrob)
(In reply to comment #7) > Created an attachment (id=427248) [details] > (Av2a) Discard patch file on UnicodeDecodeError > > Av2, with comment 6 suggestion(s). > > (In reply to comment #6) > > > I actually meant that this should be at the top level of the file. > > Done, with some reordering too. > > > How do you feel about prompting the user for the text? We could provide the url > > The qbz user could only guess at what the charset the patch author used is :-| Well I figure that Firefox will decode it correctly and they can just copy & paste (probably some unicode). > > for them and throw up an editor via HG's API. > > From my experience (but ymmv), the case (does exist but) is rare and of low > value: > if one is really interested in the problematic patch, one can either > download+fix it "manually" or just ask the author to attach a fixed version (= > fix it at the source). I think it at least makes the download&fix case easier.
(In reply to comment #8) > Well I figure that Firefox will decode it correctly and they can just copy & > paste (probably some unicode). I assume the browser downloads the file "as is": I don't think it can guess/fix the (missing) charset either. But anyway. > I think it at least makes the download&fix case easier. I understood you: you asked for my feeling, I gave you my opinion. Then, I already replied in comment 5 that my patch cares about the exception fix only/first and leaves enabling an enhanced behavior to someone who cares about it (as you do and as I don't atm).
Attachment #427248 - Flags: review?(tellrob) → review+
Attachment #427248 - Attachment description: (Av2a) Discard patch file on UnicodeDecodeError → (Av2a) Discard patch file on UnicodeDecodeError [Checkin: Comment 10]
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Case is: hg qimport -p bz:213932 NB: I assume 'message' should need the same fix too ... but I prefer to wait for an actual (test)case :-|
Attachment #436790 - Flags: review?(tellrob)
Blocks: 213932
Attachment #436790 - Flags: review?(tellrob) → review+
Comment on attachment 436790 [details] [diff] [review] (Bv1) Discard user data on UnicodeDecodeError, Some rewrite and documentation too [Checkin: Comment 12] http://hg.mozilla.org/users/robarnold_cmu.edu/qimportbz/rev/518ac050d9fa
Attachment #436790 - Attachment description: (Bv1) Discard user data on UnicodeDecodeError, Some rewrite and documentation too → (Bv1) Discard user data on UnicodeDecodeError, Some rewrite and documentation too [Checkin: Comment 12]
Testcase: hg qimport -p bz:533890
Attachment #440976 - Flags: review?(tellrob)
Keywords: crash
Severity: major → critical
Blocks: 535674
Attachment #440976 - Flags: review?(tellrob) → review+
Attachment #440976 - Attachment description: (Cv1) Discard message data on UnicodeDecodeError → (Cv1) Discard message data on UnicodeDecodeError [Checkin: Comment 15]
Comment on attachment 440976 [details] [diff] [review] (Cv1) Discard message data on UnicodeDecodeError [Checkin: Comment 15+16] + http://hg.mozilla.org/users/robarnold_cmu.edu/qimportbz/rev/6bb40df6000e (Dv1) Fix patch Cv1 (empty message) regression.
Attachment #440976 - Attachment description: (Cv1) Discard message data on UnicodeDecodeError [Checkin: Comment 15] → (Cv1) Discard message data on UnicodeDecodeError [Checkin: Comment 15+16]
Product: Other Applications → Developer Services
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: