Closed
Bug 795394
Opened 13 years ago
Closed 13 years ago
"mach warnings-list | less" produces this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 100: ordinal not in range(128)
Categories
(Firefox Build System :: Mach Core, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla18
People
(Reporter: dholbert, Assigned: gps)
References
Details
Attachments
(1 file)
2.40 KB,
patch
|
k0scist
:
review+
|
Details | Diff | Splinter Review |
As noted in bug 794713 comment 20: if I apply that bug's patch and run...
> ./mach warnings-list | less
...then I get the following (and only the following) displayed in my "less" output:
{
No handlers could be found for logger "pymake.data"
Traceback (most recent call last):
File "./mach", line 48, in <module>
mach.run(sys.argv[1:])
File "/scratch/work/builds/mozilla-central/mozilla-central.12-03-19.10-25/mozilla/python/mach/mach/main.py", line 133, in run
obj-x86_64-unknown-linux-gnu/dist/include/mozilla/Assertions.h:263:54 [-Wvariadic-macros] anonymous variadic macros were introduced in C99
obj-x86_64-unknown-linux-gnu/dist/include/mozilla/Assertions.h:267:33 [-Wvariadic-macros] anonymous variadic macros were introduced in C99
obj-x86_64-unknown-linux-gnu/dist/include/mozilla/Assertions.h:275:22 [-Wvariadic-macros] anonymous variadic macros were introduced in C99
fn(**stripped)
File "/scratch/work/builds/mozilla-central/mozilla-central.12-03-19.10-25/mozilla/python/mach/mach/warnings.py", line 61, in list
warning['column'], warning['flag'], warning['message']))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 100: ordinal not in range(128)
}
I suspect this is because unicode characters are now making it past the build process, but we can't handle them after that point if we're piped to e.g. "less".
If I don't pipe it through "less", I don't have any problems. (list of build warnings prints as expected)
Assignee | ||
Comment 1•13 years ago
|
||
I can reproduce this.
The warnings.json file (objdir/.mozbuild/warnings.json) has standard Unicode escaping:
"files": {
"/home/gps/src/mozilla-central/xpcom/tests/TestCOMArray.cpp": {
"hash": "00ad4df58de719ecca19b66d2cfd82a37b39b0dc",
"warnings": [
{
"column": 7,
"line": 239,
"flag": "-Wuninitialized",
"message": "\u2018fourthObject\u2019 may be used uninitialized in this function",
"filename": "/home/gps/src/mozilla-central/xpcom/tests/TestCOMArray.cpp"
},
i.e. the file itself is ASCII however the JSON serializer converts to/from UTF-8 at serialization time.
When we've parsed the file back into Python data structures, we see unicode instances. e.g.
{u'column': 7, u'filename': u'/home/gps/src/mozilla-central/xpcom/tests/TestCOMArray.cpp', u'flag': u'-Wuninitialized', u'line': 239, u'message': u'\u2018fourthObject\u2019 may be used uninitialized in this function'}
Just to be sure:
print w['message'][0]
‘
If we print warnings-list directly to the terminal (the default), we see our Unicode smart quotes and all is well.
However, as soon as we pipe into less, we blow up.
I suspect sys.stdout.encoding is different/not-UTF-8 when piping to less.
Assignee | ||
Comment 2•13 years ago
|
||
When printing to the terminal sys.stdout.encoding is UTF-8. When piping to less it is None. Yay default encoding of ASCII! Not.
Assignee | ||
Comment 3•13 years ago
|
||
Searching the internets for a recommended solution to this problem (I don't want to have to .encode('utf-8') every time I print) reveals a minefield. CC'ing the Python brain trust.
Assignee | ||
Comment 4•13 years ago
|
||
This is how I think it should be done. The rest of the Internet seems to be on crack with its setdefaultencoding and PYTHONIOENCODING nonsense. This seems much saner (and simpler) to me. And, it appears to work!
If I could change anything about this patch it would be to have a context manager for transparent encoding overriding on file objects instead of the try..finally. I don't think I saw one in codecs. Maybe one is lurking elsewhere in the stdlib. (I'm too lazy to write my own, meh).
Comment 5•13 years ago
|
||
Comment on attachment 666011 [details] [diff] [review]
Transparently wrap standard streams with UTF-8 encoding, v1
Review of attachment 666011 [details] [diff] [review]:
-----------------------------------------------------------------
This will now blow up if you try to write an non-ASCII |str|:
>>> codecs.getwriter("utf-8")(StringIO.StringIO()).write("\x98")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/codecs.py", line 351, in write
data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x98 in position 0: ordinal not in range(128)
The bug 794713 patch basically forces output from process to be unicode, so I suppose you're assuming this won't be a problem.
Comment 6•13 years ago
|
||
Comment on attachment 666011 [details] [diff] [review]
Transparently wrap standard streams with UTF-8 encoding, v1
TBH, I'm wary of this change. If we have non-UTF-8, will this break? (as much as I'd love if the whole world was UTF-8, that doesn't seem likely). I would like to see tests with this.
Assignee | ||
Comment 7•13 years ago
|
||
So, the theory is that everything inside of mach/mozbuild is unicode (Python 2.7) or str (Python 3). Either way, data is stored as Unicode.
I concede that this patch currently blows up if we try to write out str or bytes. But, mach should not currently write out non-textual data, so I don't think we have to cross this bridge today.
Anyway, if we are converting from Unicode to bytes, we need to choose an encoding. And, UTF-8 seems like a reasonable choice (if none has been defined). The only additional thing we should consider is whether to throw error='ignore' in there. I can make a case for that. But, at this juncture I'd like to leave it out because I'd like to know where we aren't dealing with Unicode internally.
Comment 8•13 years ago
|
||
Even if all mach/mozbuild is unicode, we still have to worry about non-unicode from m-c and the system. I'll give this an r+, but still, am wary. I suppose it is no worse than what we do now
Updated•13 years ago
|
Attachment #666011 -
Flags: review?(jhammel) → review+
Assignee | ||
Comment 9•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Updated•7 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•