Closed Bug 1607517 Opened 5 years ago Closed 5 years ago

moz-phab patch fails on windows due to unicode characters in the name

Categories: Conduit :: moz-phab, defect, P1
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: Gijs, Assigned: zalun
Keywords: conduit-triaged
Attachments: 1 file

+++ This bug was initially created as a clone of Bug #1567252 +++

Like bug 1567252, but now we're failing when decoding:

Patching revision: D58726
Checked out .
transaction abort!
rollback completed
abort: decoding near 'lio Cobos []lvarez <e': 'utf8' codec can't decode byte 0xc1 in position 13: invalid start byte!
CommandError: command 'hg.exe' failed to complete successfully

(there's a weird horizontal bar + vertical line above it symbol instead of the [] (I'm retyping from a different machine))

Priority: P2 → --
Summary: moz-phab patch fails on windows with UnicodeDecodeError due to unicode characters in the name → moz-phab patch fails on windows with due to unicode characters in the name
Summary: moz-phab patch fails on windows with due to unicode characters in the name → moz-phab patch fails on windows due to unicode characters in the name

Looks like something somewhere encoded Emilio's name as cp1250/1252/iso-8859-1 and then decoding as utf-8 broke.
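The failure mode described in this comment can be reproduced in isolation. The following is a minimal sketch, not moz-phab or Mercurial code: it encodes the author string with cp1252 and then decodes it as UTF-8. The helper name `utf8_decode_failure` is hypothetical, and the cp437 lookup at the end only assumes that the console was using an OEM code page, which the thread does not confirm.

```python
def utf8_decode_failure(text, encoding):
    """Encode text with a legacy code page, then report how UTF-8 decoding fails.

    Returns (offending byte, position, reason), or None if decoding succeeds.
    Hypothetical helper for illustration only.
    """
    raw = text.encode(encoding)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return raw[exc.start], exc.start, exc.reason
    return None


author = "Emilio Cobos Álvarez <emilio@crisal.io>"

# cp1252 stores "Á" as the single byte 0xC1, which can never start a UTF-8 sequence.
print(utf8_decode_failure(author, "cp1252"))

# Assumption: a console on an OEM code page such as cp437 would render
# that same byte as a box-drawing character.
print(bytes([0xC1]).decode("cp437"))
```

The byte value and position match the abort message quoted in the report, which supports the guess that a single-byte Windows encoding is involved somewhere between moz-phab and hg.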

I've got issues with the Win10 Fusion, so I can't replicate.
I'm just leaving a note that I tried on OSX and it is working fine.

(In reply to Piotr Zalewa [:zalun] from comment #2)
> I've got issues with the Win10 Fusion, so I can't replicate.
> I'm just leaving a note that I tried on OSX and it is working fine.

Is there some way I can gather more details for you?

Flags: needinfo?(gijskruitbosch+bugs)

Oops, needinfo the right person for comment #3...

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(pzalewa)

@glob will check it out - I'm kind of in the dark for now. Fixing the W10 install

Flags: needinfo?(pzalewa)
$ moz-phab patch D58726 --trace
Patching revision: D58726
Checked out 33ccfb45bb3f
Bookmark set to D58726_1
abort: decoding near 'lio Cobos ┴lvarez <e': 'utf8' codec can't decode byte 0xc1 in position 13: invalid start byte!
Traceback (most recent call last):
  File "c:\mozilla-build\python3\lib\site-packages\mozphab\mozphab.py", line 259, in check_call
    subprocess.check_call(command, **kwargs)
  File "c:\mozilla-build\python3\lib\subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['hg.exe', '--config', 'extensions.rebase=', '--pager', 'never', '--config', 'rebase.experimental.inmemory=true', 'import', 'C:\\Users\\byron\\AppData\\Local\\Temp\\tmpqr_vyt2q', '--quiet', '-l', 'C:\\Users\\byron\\AppData\\Local\\Temp\\tmpc99byl1w', '-u', 'Emilio Cobos Álvarez <emilio@crisal.io>', '-d', '2020-01-08T05:00:42']' returned non-zero exit status 255.

The problem is the -u Emilio Cobos Álvarez <emilio@crisal.io> argument to hg import; it works if that argument isn't present.

Keywords: conduit-triaged
Priority: -- → P1

Please note that &lt; is a "<" in a terminal. That's a bug in BMO.

Some additional findings.

Running the command with a hardcoded author from a Python 3 script works fine:
subprocess.check_call(['hg', "commit", "-m", "alvarez", "-u", 'Emilio Cobos Álvarez <emilio@example.com>'])

The same command with the string from the API failed with the encoding error.
I tried to compare the strings and they seem to be the same.
But when the author is hardcoded the exception is not raised (!)

#2212
print(author == 'Emilio Cobos Álvarez <emilio@example.com>')  # True
print(type(author))  # <class 'str'>
print(type('Emilio Cobos Álvarez <emilio@example.com>'))  # <class 'str'>
author = 'Emilio Cobos Álvarez <emilio@example.com>'

On Windows the local encoding is cp1252:

byron@win10vm ~
$ python3
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getdefaultlocale()
('en_AU', 'cp1252')
>>>

Mercurial's documentation at https://www.mercurial-scm.org/wiki/EncodingStrategy#Local_strings says:

All user input in the form of command line arguments, configuration files, etc. are assumed to be in the local encoding.

Adding debugging code to HG reveals that the argument is being provided as cp1252 encoded:
Emilio Cobos \xc1lvarez <emilio@crisal.io>
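To make the debugging output above concrete, here is a small sketch comparing the bytes of the author string under the local encoding and under UTF-8. It is illustrative only: `encoded_argument` is a hypothetical helper, and the script does not reproduce how Windows or Python actually hands the argument to hg.exe.

```python
import locale


def encoded_argument(text, encoding):
    """Return the bytes a program would see if text were passed in this encoding.

    Hypothetical helper for illustration only.
    """
    return text.encode(encoding)


author = "Emilio Cobos Álvarez <emilio@crisal.io>"

# The preferred encoding depends on the machine; the VM in this bug reports cp1252.
print(locale.getpreferredencoding(False))

# cp1252 yields the single byte \xc1, matching the debugging output above.
print(encoded_argument(author, "cp1252"))

# UTF-8 yields the two-byte sequence \xc3\x81 for the same character.
print(encoded_argument(author, "utf-8"))
```

The cp1252 form is what the debugging output shows, so the argument arrives in the local encoding as the Mercurial documentation describes, and the UTF-8 decode that aborts must happen at a later step.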

It looks to me like this is a Mercurial bug.
zalun: can you report this on their bug tracker please?

I've played around a bit but I can't see any easy way to work around this with command line args/encoding/etc.

One workaround I can think of is to assemble an HG changeset patch file from the body, author, date, diff, etc:

# HG changeset patch
# User byron jones <glob@mozilla.com>
# Date 1581435354 -28800
This is the first line of the commit description.

And this is the last line.
diff --git a/README.txt b/README.txt
--- a/README.txt
+++ b/README.txt
@@ -1,8 +1,10 @@
+Cheese
+
 An explanation of the Mozilla Source Code Directory Structure and links to
 project pages with documentation can be found at:

     https://firefox-source-docs.mozilla.org/contributing/directory_structure.html

 For information on how to build Mozilla from the source code and create the patch see:

     https://firefox-source-docs.mozilla.org/contributing/how_to_contribute_firefox.html

I was able to import this successfully even though the Node ID and Parent fields were not provided.
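The workaround above could be sketched roughly as follows. This is not the code that landed in moz-phab: `build_hg_patch` and its parameters are hypothetical names, and the layout simply mirrors the example patch in this comment. Which byte encoding hg expects when the text is written to a file is not established in this thread, so the sketch returns a string and leaves that choice to the caller.

```python
def build_hg_patch(author, timestamp, tz_offset, message, diff):
    """Assemble an HG changeset patch so the author travels in the file header.

    Hypothetical helper: the header layout follows the example above,
    with the Node ID and Parent fields omitted.
    """
    header = [
        "# HG changeset patch",
        f"# User {author}",
        f"# Date {timestamp} {tz_offset}",
    ]
    # The commit description follows the header, then the diff itself.
    return "\n".join(header) + "\n" + message.rstrip("\n") + "\n" + diff


patch_text = build_hg_patch(
    author="Emilio Cobos Álvarez <emilio@crisal.io>",
    timestamp=1581435354,
    tz_offset=-28800,
    message="This is the first line of the commit description.\n\nAnd this is the last line.",
    diff="diff --git a/README.txt b/README.txt\n--- a/README.txt\n+++ b/README.txt\n",
)
print(patch_text)
```

With the author embedded in the patch header, the `-u` argument to `hg import` is no longer needed, which is the argument identified earlier as triggering the failure.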

Patching failed on Windows because of an encoding error when a user string
containing non-ASCII characters was passed as a command argument.

Assignee: nobody → pzalewa
Status: NEW → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED