Open Bug 1492805 Opened 6 years ago Updated 2 years ago

Some logs fail to parse due to "UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 467: unexpected end of data"

Categories

(Tree Management :: Treeherder, defect, P3)

defect

Tracking

(Not tracked)

People

(Reporter: dluca, Unassigned)

References

Details

Both have this failure in the log:

[task 2018-09-20T05:48:04.764Z] 05:48:04  WARNING -  TEST-UNEXPECTED-FAIL | dom/base/test/unit/test_xmlserializer.js | run_test/< - [run_test/< : 36] "<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?>\\r<foo/>\\r<!-- é -->" == "<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?>\\r<!DOCTYPE whatever PUBLIC \\"-//MOZ//WHATEVER//EN\\" \\"http://mozilla.org/ns/foo\\">\\r<foo xmlns=\\"htp://mozilla.org/ns\\">\\r    <baz/><!-- a comment --> <bar> &lt;robots&gt; &amp; &lt;aliens&gt;\\r<mozilla> a a a a a éèàùûî</mozilla>\\r        <firefox>Lorem ip<!-- aaa -->sum dolor sit amet, consectetuer adipiscing elit. Nam eu sapien. Sed viverra lacus. Donec quis ipsum. Nunc cursus aliquet lectus. Nunc vitae eros. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos hymenaeos. Nam tellus massa, fringilla aliquam, fermentum sit amet, posuere ac, est. Duis tristique egestas ligula. Mauris quis felis. Fusce a ipsum non lacus posuere aliquet. Sed fermentum posuere nulla. Donec tempor. Donec sollicitudin tortor lacinia libero ullamcorper laoreet. Cras quis nisi at odio consectetuer molestie.</firefox>\\r        <?xml-foo \\"hey\\" ?>\\r</bar>\\r              <!-- a comment \\r         on several lines-->\\r <?xml-foo \\"another pi on two lines\\"  \\r              example=\\"hello\\"?>\\r</foo>"
Looking at New Relic (I believe some of you have access; let me know if any more people want adding?) I see:
https://rpm.newrelic.com/accounts/677903/applications/14179757/filterable_errors?tw%5Bend%5D=1537443927&tw%5Bstart%5D=1537400727#/show/5b2d8a00-bcb7-11e8-bfca-0242ac110008_5138_10270/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart

ie: in the last 12 hours, 3 instances of:

Traceback:
...
File "/app/treeherder/workers/task.py", line 43, in inner
File "/app/treeherder/log_parser/tasks.py", line 54, in parse_logs
File "/app/treeherder/log_parser/tasks.py", line 97, in parse_unstructured_log
File "/app/treeherder/log_parser/utils.py", line 38, in post_log_artifacts
File "/app/treeherder/log_parser/utils.py", line 27, in extract_text_log_artifacts
File "/app/.heroku/python/lib/python2.7/site-packages/simplejson/__init__.py", line 382, in dumps
File "/app/.heroku/python/lib/python2.7/site-packages/simplejson/encoder.py", line 296, in encode
File "/app/.heroku/python/lib/python2.7/site-packages/simplejson/encoder.py", line 378, in iterencode
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 467: unexpected end of data

Example log:
https://queue.taskcluster.net/v1/task/Q1rJboiTS4Sf_cAacBKF9w/runs/0/artifacts/public/logs/live_backing.log

It looks like that log line that was extracted isn't valid UTF-8.

I'm pretty swamped at the moment, so given this is pretty specific (ie these particular failures only and it's possible to look at the raw log), I won't really have a chance to look into this right now.

However if anyone wanted to have a go at fixing it, the line in question is here:
https://github.com/mozilla/treeherder/blob/b621781f2606000bac3693eb324e8e565728682c/treeherder/log_parser/utils.py#L27

And `artifact` comes from:
https://github.com/mozilla/treeherder/blob/b621781f2606000bac3693eb324e8e565728682c/treeherder/log_parser/artifactbuilders.py#L61-L64
https://github.com/mozilla/treeherder/blob/b621781f2606000bac3693eb324e8e565728682c/treeherder/log_parser/parsers.py#L37-L39
https://github.com/mozilla/treeherder/blob/b621781f2606000bac3693eb324e8e565728682c/treeherder/log_parser/parsers.py#L85-L88

Perhaps this line truncation isn't helping?
https://github.com/mozilla/treeherder/blob/b621781f2606000bac3693eb324e8e565728682c/treeherder/log_parser/artifactbuilders.py#L49
Priority: -- → P2
Summary: Log are not being parsed → Some logs fail to parse due to "UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 467: unexpected end of data"
Another example:
https://queue.taskcluster.net/v1/task/SBmZyNJ9QJC ADFLDfytWwQ/runs/0/artifacts/public/logs/live_backing.log

UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 467: unexpected end of data

I'm not seeing any errors like this in New Relic.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID
Assignee: nobody → sclements
Status: RESOLVED → REOPENED
Priority: P3 → P2
Resolution: INVALID → ---
Status: REOPENED → ASSIGNED

I am also hitting this locally when running some (locally bad) xpcshell tests:

 5:30.90 INFO Following exceptions were raised:
 5:30.90 ERROR Traceback (most recent call last):
  File "/home/simon/work/mozilla-unified/testing/xpcshell/runxpcshelltests.py", line 224, in run
    self.run_test()
  File "/home/simon/work/mozilla-unified/testing/xpcshell/runxpcshelltests.py", line 855, in run_test
    self.log_full_output()
  File "/home/simon/work/mozilla-unified/testing/xpcshell/runxpcshelltests.py", line 649, in log_full_output
    self.log_line(line)
  File "/home/simon/work/mozilla-unified/testing/xpcshell/runxpcshelltests.py", line 630, in log_line
    line = self.fix_text_output(line).rstrip('\r\n')
  File "/home/simon/work/mozilla-unified/testing/xpcshell/runxpcshelltests.py", line 621, in fix_text_output
    line = cleanup_encoding(line)
  File "/home/simon/work/mozilla-unified/testing/xpcshell/runxpcshelltests.py", line 109, in cleanup_encoding
    return six.ensure_str(s)
  File "/home/simon/work/mozilla-unified/third_party/python/six/six.py", line 899, in ensure_str
    s = s.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 266: invalid continuation byte

Sorry, we're a little short staffed at the moment so I'm not sure when I'll be able to get to this.

Priority: P2 → P3
Assignee: sclements → nobody
Status: ASSIGNED → NEW
Component: Treeherder: Log Parsing & Classification → TreeHerder
You need to log in before you can comment on or make changes to this bug.