Unicode decode error trying to run mochitest-remote for b2g-emulator M10

RESOLVED FIXED

Status

defect
RESOLVED FIXED
5 years ago
5 months ago

People

(Reporter: jesup, Assigned: ahal)

Tracking

32 Branch
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 4 obsolete attachments)

./mach mochitest-remote --total-chunks 15 --this-chunk 10 
After a while, I get the following error (repeatable) and the framework stops logging.  The tests seem to continue on.

1326 INFO TEST-PASS | /tests/dom/smil/test/test_smilAccessKey.xhtml | Unexpected start time for animation with begin: accessKey(a)
1327 INFO TEST-PASS | /tests/dom/smil/test/test_smilAccessKey.xhtml | No interval created for 'accessKey(a)' with input [charCode: 97; keyCode: 66]
1328 INFO TEST-PASS | /tests/dom/smil/test/test_smilAccessKey.xhtml | Expected animations to be paused
1329 INFO TEST-PASS | /tests/dom/smil/test/test_smilAccessKey.xhtml | Unexpected exception: InvalidStateError
1330 INFO TEST-PASS | /tests/dom/smil/test/test_smilAccessKey.xhtml | Unexpected exception code: 11
1331 INFO TEST-PASS | /tests/dom/smil/test/test_smilAccessKey.xhtml | Interval unexpectedly created for 'accessKey(a)' with input [charCode: 98; keyCode: 97]
1332 INFO TEST-PASS | /tests/dom/smil/test/test_smilAccessKey.xhtml | Expected animations to be paused
[Child 299] WARNING: jschar out of char range; high bits of data lost: 0x20ac: file ../../../../js/xpconnect/src/XPCConvert.cpp, line 354
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 764, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/jesup/src/mozilla/B2G-emulator/../inbound/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 774, in _processOutput
    self.processOutputLine(line.rstrip())
  File "/home/jesup/src/mozilla/B2G-emulator/../inbound/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 719, in processOutputLine
    handler(line)
  File "/home/jesup/src/mozilla/B2G-emulator/../inbound/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 916, in __call__
    self.stream.write(line + '\n')
  File "/usr/lib64/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 128: ordinal not in range(128)
Posted patch mozprocess_decode (obsolete) — Splinter Review
This should fix your immediate problem. Though I think this bug raises some questions around how we handle unicode in general throughout mozprocess. I want to spend some time testing it out more and make sure there aren't other hidden bugs like this waiting to be exposed.
This appears to work for me (got a crash on shutdown, but not due to this).
Thanks!
Comment on attachment 8401521 [details] [diff] [review]
mozprocess_decode

You want this to be "line.decode('utf-8', 'replace')"

Otherwise, if invalid UTF-8 is seen, the work will blow up again.

We dealt with this problem extensively in mach land. See https://hg.mozilla.org/mozilla-central/file/6c924a018540/python/mach/mach/mixin/process.py#l80

What I'm trying to hold the line on is that any textual data inside Python be converted to unicode (not str) at the process border. If we fail to do that, we're setting ourselves up for all kinds of str/bytes/unicode implicit conversion fail come Python 3.

You can keep things as str now. But I highly recommend you to rewrite mozprocess to be encoding aware. In this patch, you should look at self.stream.encoding and perform the encoding explicitly in __call__. If Python 2 is left to do it, it will likely do the wrong thing. Python 3 will refuse outright.
Posted patch mozprocess unicode (obsolete) — Splinter Review
I think this is what we want. Not all file-like objects have an 'encoding' property (like StringIO which some of our tests use), so I had to add a check for that. In this case I think it is reasonable to leave it up to the user to make sure the object's StreamWriter handles unrecognized input properly.

Gps, mind doing the review since you've dealt with similar issues before? If not feel free to redirect to :wlach.
Assignee: nobody → ahalberstadt
Attachment #8401521 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8401905 - Flags: review?(gps)
Comment on attachment 8401905 [details] [diff] [review]
mozprocess unicode

Review of attachment 8401905 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/mozbase/mozprocess/mozprocess/processhandler.py
@@ +771,5 @@
>              (lines, self.didTimeout) = self.readWithTimeout(logsource, lineReadTimeout)
>              while lines != "":
>                  for line in lines.splitlines():
> +                    if isinstance(line, bytes):
> +                        line = line.decode(sys.stdout.encoding or 'utf-8', 'replace')

This actually scares me a bit. This is a backwards-incompatible API change and may cause other things (like mach) to start barfing. I feel this is already a house of cards and things are ready to topple.

I would perform the encoding/decoding in StreamOutput and leave this be, if possible. Later on, we can figure out a way to make mozprocess consistently emit str or unicode.

Sound like a plan?

@@ +915,5 @@
>          self.stream = stream
>  
>      def __call__(self, line):
> +        if hasattr(self.stream, 'encoding'):
> +            line = line.encode(self.stream.encoding or 'utf-8', 'replace')

Presumably you are expecting a unicode here. I may assert that.

Keep in mind the common operations for str/bytes <-> unicode are:

str.decode() -> unicode
unicode.encode() -> str

str.encode() is only really useful for the non-unicode encodings. e.g. b'foo'.encode('rot13'). See https://docs.python.org/2/library/codecs.html#standard-encodings.

I guess technically the output stream could have one of those non-unicode encodings. It's up to you. It's definitely safest to avoid the type check, especially if you won't be coercing the lines to unicode as they come off the process.

::: testing/mozbase/mozprocess/tests/procunicode.py
@@ +1,2 @@
> +# -*- coding: utf-8 -*-
> +print '字åäö'

This isn't Python 3 compatible. You'll want to do:

print(u'字åäö'.encode('utf-8'))

::: testing/mozbase/mozprocess/tests/test_mozprocess_output.py
@@ +70,5 @@
>  
> +    def test_handle_unicode(self):
> +        """
> +        Handlers don't trip up on non-ascii output
> +        """

w00t - I'm very happily to see unicode tests here!
Attachment #8401905 - Flags: review?(gps) → feedback+
So is the patch I obsoleted basically what you are suggesting we go with for now?

https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=991866&attachment=8401521
Flags: needinfo?(gps)
Yes, I think the initial patch was pretty good. It just needed an "ignore" or "replace" so invalid utf-8 byte sequences wouldn't raise an exception.
Flags: needinfo?(gps)
Duplicate of this bug: 1009490
Posted patch encoding-issue.diff (obsolete) — Splinter Review
Andrew, the same patch as in bug 1009490 (just updated the bug number).
I am doing an encoding with iso because .decode("utf-8") is failing with "UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 29: invalid start byte"
Attachment #8428713 - Flags: review?(ahalberstadt)
Comment on attachment 8428713 [details] [diff] [review]
encoding-issue.diff

Review of attachment 8428713 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think we want to decode to ISO-8859-1 in mozprocess here. It may fix your specific case, but it will trip up on Asian or even eastern European characters. I'll update my old obsoleted patch here and add "replace" to the call to decode which should replace the character in question with a '?' if it can't be decoded to utf-8
Attachment #8428713 - Flags: review?(ahalberstadt) → review-
Sorry, this slipped off my radar for a while there. Continuing the old discussion with :gps..

If I just do "line.decode('utf-8', 'replace')" like we talked about, the test case I created fails. Also the fact that :sylvestre is running into problems with ISO-8859-1 encoded strings makes me think maybe we should coerce everything to unicode. As per your suggestion though, I'll move it into the StreamOutput handler for now.

:sylvestre, can you verify this fixes your problem?
Attachment #8401905 - Attachment is obsolete: true
Attachment #8428874 - Flags: review?(gps)
Flags: needinfo?(sledru)
Same issue with your patch:
----
  Putting entry 175...    TestUniChar::TestUniChar() 175
OK...
Testing Get:
Exception in thread Thread-12:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/data/mozilla-central-coverage/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 774, in _processOutput
    self.processOutputLine(line.rstrip())
  File "/data/mozilla-central-coverage/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 719, in processOutputLine
    handler(line)
  File "/data/mozilla-central-coverage/testing/mozbase/mozprocess/mozprocess/processhandler.py", line 921, in __call__
    self.stream.write(line + '\n')
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 29: ordinal not in range(128)
----
Flags: needinfo?(sledru)
Comment on attachment 8428874 [details] [diff] [review]
mozprocess unicode

:sylvestre, do you happen to know what character it is tripping up on? It would be good to add a test for it once we figure it out what the problem is. I'm not quite sure how to fix this in the general case.. maybe we should just take your initial try: except patch after all..
Attachment #8428874 - Flags: review?(gps)
Flags: needinfo?(sledru)
It fails on:
  Getting entry nbsp...Found  
  Getting entry iexcl...Found ¡
  Getting entry cent...Found ¢
  Getting entry pound...Found £
  Getting entry curren...Found ¤
  Getting entry yen...Found ¥

So, basic stuff.

(which I can see with my patch).
Flags: needinfo?(sledru)
Andrew, are you ok with my patch? thanks
Flags: needinfo?(ahalberstadt)
No, but here's a variation on your patch that should do the same thing. This patch actually fails the test I created for the last patch, but I think we should go ahead and land this as it should fix the problem for most people who run into this. We can re-open if there are still problems with other random encodings in the future.
Attachment #8428713 - Attachment is obsolete: true
Attachment #8433453 - Flags: review?(gps)
Flags: needinfo?(ahalberstadt)
Sorry, I don't think the last patch would ever re-raise the exception in that else clause. Here's an updated patch that does the right thing.
Attachment #8433453 - Attachment is obsolete: true
Attachment #8433453 - Flags: review?(gps)
Attachment #8433472 - Flags: review?(gps)
Attachment #8433472 - Flags: review?(gps) → review+
Andrew, I tried on try with my attachment from bug 1009490: 
https://tbpl.mozilla.org/?tree=Try&rev=d4d7c66b4188

It seems green.
Ok I guess that won't make anything worse off than it already is, so if you want to get it landed in the meantime I'm ok with that (though it isn't a fix to this bug). I'd want to at least include a 'TODO' comment with a reference to this bug.

Would you mind re-uploading that patch with the comment included and flag me for review?
Thanks!
Attachment #8450143 - Flags: review?(ahalberstadt)
Attachment #8450143 - Attachment is patch: true
Comment on attachment 8450143 [details] [diff] [review]
workaround on the utf-8 display failing

Review of attachment 8450143 [details] [diff] [review]:
-----------------------------------------------------------------

Lgtm, thanks!
Attachment #8450143 - Flags: review?(ahalberstadt) → review+
The above patch is a workaround that may improve things for people in western Europe, but will still fail in other cases. Leaving open to fix this properly (eventually).
Whiteboard: [leave-open]
Blocks: 1050756
Assignee: ahalberstadt → nobody
Status: ASSIGNED → NEW
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Keywords: leave-open
Whiteboard: [leave-open]

The leave-open keyword is there and there is no activity for 6 months.
:gbrown, maybe it's time to close this bug?

Flags: needinfo?(gbrown)

:whimboo -- Did you want to follow-up here?

Flags: needinfo?(gbrown) → needinfo?(hskupin)

The workaround fix landed in comment 28 and comment 29. Since then no more work was done / necessary. So I would say lets close this bug as fixed, and if necessary file a new bug for a real fix. For now the Unicode work stalled anyway.

Assignee: nobody → ahal
Status: REOPENED → RESOLVED
Closed: 3 years ago5 months ago
Flags: needinfo?(hskupin)
Keywords: leave-open
Resolution: --- → FIXED
Version: unspecified → 32 Branch
You need to log in before you can comment on or make changes to this bug.