Closed Bug 1166958 Opened 9 years ago Closed 9 years ago

Log parser errors with some taskcluster jobs

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1165335

People

(Reporter: wlach, Unassigned)

Details

Getting some errors parsing taskcluster jobs:

[2015-05-20 15:13:04,117: ERROR/Worker-3] Failed to download/parse log for try c57b6e95-de44-4ca6-b4cf-c175a7cb1a96/0 (https://queue.taskcluster.net/v1/task/xXtuld5ETKa0z8F1p8salg/runs/0/artifacts/public/logs/live_backing.log): 'ascii' codec can't encode character u'\u2026' in position 23: ordinal not in range(128)

It's not clear to me what's happening. When I manually try to see what the result for parsing the log is locally, I get this:

(venv)vagrant@local:~/treeherder$ ./manage.py test_parse_log https://queue.taskcluster.net/v1/task/xXtuld5ETKa0z8F1p8salg/runs/0/artifacts/public/logs/live_backing.log
text_log_summary, {"header": {}, "step_data": {"all_errors": [{"line": "21:51:41     INFO -    AssertionError: \"Saving\u2026\" === \"Bubble\"", "linenumber": 5569}], "steps": [{"errors": [], "name": "Checkout mozharness", "started": "2015-05-20 21:25:25.611597", "started_linenumber": 14, "finished_linenumber": 36, "finished": "2015-05-20 21:25:28.890186", "error_count": 0, "duration": 3, "order": 0, "result": "success"}, {"errors": [{"line": "21:51:41     INFO -    AssertionError: \"Saving\u2026\" === \"Bubble\"", "linenumber": 5569}], "name": "Running tests", "started": "2015-05-20 21:25:28.896866", "started_linenumber": 38, "finished_linenumber": 5647, "finished": "2015-05-20 21:57:47.335691", "error_count": 1, "duration": 1938, "order": 1, "result": "success"}], "errors_truncated": false}, "logurl": "https://queue.taskcluster.net/v1/task/xXtuld5ETKa0z8F1p8salg/runs/0/artifacts/public/logs/live_backing.log"}
Job Info, {"job_details": [{"value": "gaia-integration-tests: 51/0/0\r", "content_type": "raw_html"}], "logurl": "https://queue.taskcluster.net/v1/task/xXtuld5ETKa0z8F1p8salg/runs/0/artifacts/public/logs/live_backing.log"}
(venv)vagrant@local:~/treeherder$ 

This looks pretty ok? Is taskcluster notifying us about a log file before it's actually ready? Needinfo'ing :jonasfj and :lightsofapollo on the off chance they might know what's going on?
Flags: needinfo?(jopsen)
Flags: needinfo?(jlal)
> This looks pretty ok? Is taskcluster notifying us about a log file before it's actually ready? 
This is possible... But don't you think your error message would say something other than:
> [2015-05-20 15:13:04,117: ERROR/Worker-3] Failed to download/parse log for try c57b6e95-de44-4ca6-b4cf-c175a7cb1a96/0
> (https://queue.taskcluster.net/v1/task/xXtuld5ETKa0z8F1p8salg/runs/0/artifacts/public/logs/live_backing.log):
> 'ascii' codec can't encode character u'\u2026' in position 23: ordinal not in range(128)

From looking at the call sites for createLogReferences() in:
https://github.com/taskcluster/mozilla-taskcluster/blob/61be8b9653f547f9cc8aaae1216369f5810e1406/src/treeherder/job_handler.js#L107-L117
It seems to me that logs are reported after task resolution where they are guaranteed to be present,
well, there can be corner cases where they are not present. But we have a bug for changing upload flow
to ensure artifacts are present.

Either way, "'ascii' codec can't encode character u'\u2026' in position 23: ordinal not in range(128)"
Does not spell HTTP GET error to me. Note, S3 would return 403 or 404 for missing objects (probably 403).
Flags: needinfo?(jopsen)
Flags: needinfo?(jlal)
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #1)

> Either way, "'ascii' codec can't encode character u'\u2026' in position 23:
> ordinal not in range(128)"
> Does not spell HTTP GET error to me. Note, S3 would return 403 or 404 for
> missing objects (probably 403).

Yes, I agree this doesn't sound like a 404. But it does seem to suggest that perhaps the content changed between the time we got the error (when it triggered the above exception) and I ran the log parsing test (when everything seems fine). Any idea what might explain that?
> Any idea what might explain that?
Nope, the backing log is uploaded to S3, it can't be overwritten, so this is not affected by live logging
which is done under the artifact name live.log.

So our logs do contain terminal escape codes, that could be making it "fun".
But an error like: "'ascii' codec can't encode character u'\u2026'" smells like python UTF-8 issue.
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #3)
> > Any idea what might explain that?
> Nope, the backing log is uploaded to S3, it can't be overwritten, so this is
> not affected by live logging
> which is done under the artifact name live.log.
> 
> So our logs do contain terminal escape codes, that could be making it "fun".
> But an error like: "'ascii' codec can't encode character u'\u2026'" smells
> like python UTF-8 issue.

Indeed it does, but it's so weird that this doesn't happen when we do a parse on the command line later... more investigation required, apparently.
Giving this bug a Pri, so it doesn't show up in the triage list.
OS: Unspecified → All
Priority: -- → P2
Hardware: Unspecified → All
Pretty sure this was bug 1165335, which is now fixed.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.