Perma [tier2] UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 17243: character maps to <undefined>
Categories
(Firefox Build System :: General, defect, P5)
Tracking
(firefox102 fixed)
Tracking | Status | |
---|---|---|
firefox102 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: gsvelto)
Details
(Keywords: intermittent-failure)
Attachments
(3 files)
Filed by: apavel [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=376043395&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Gpg5o8PVQ5G4owyfpWX1KQ/runs/0/artifacts/public/logs/live_backing.log
[task 2022-04-28T00:03:37.928Z] DEBUG:chardet.charsetprober:windows-1255 Hebrew confidence = 0.0
[task 2022-04-28T00:03:37.928Z] DEBUG:chardet.charsetprober:windows-1255 Hebrew confidence = 0.0
[task 2022-04-28T00:03:37.930Z] Traceback (most recent call last):
[task 2022-04-28T00:03:37.930Z] File "/builds/worker/symsrv-fetch.py", line 528, in <module>
[task 2022-04-28T00:03:37.930Z] main()
[task 2022-04-28T00:03:37.930Z] File "/builds/worker/symsrv-fetch.py", line 489, in main
[task 2022-04-28T00:03:37.930Z] args.missing_symbols
[task 2022-04-28T00:03:37.930Z] File "/builds/worker/symsrv-fetch.py", line 448, in get_base_data
[task 2022-04-28T00:03:37.930Z] return asyncio.run(helper(url))
[task 2022-04-28T00:03:37.930Z] File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
[task 2022-04-28T00:03:37.930Z] return loop.run_until_complete(main)
[task 2022-04-28T00:03:37.930Z] File "/usr/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
[task 2022-04-28T00:03:37.930Z] return future.result()
[task 2022-04-28T00:03:37.930Z] File "/builds/worker/symsrv-fetch.py", line 445, in helper
[task 2022-04-28T00:03:37.930Z] get_skiplist(),
[task 2022-04-28T00:03:37.930Z] File "/builds/worker/symsrv-fetch.py", line 152, in fetch_missing_symbols
[task 2022-04-28T00:03:37.930Z] data = await resp.text()
[task 2022-04-28T00:03:37.930Z] File "/usr/local/lib/python3.7/dist-packages/aiohttp/client_reqrep.py", line 1014, in text
[task 2022-04-28T00:03:37.930Z] return self._body.decode(encoding, errors=errors) # type: ignore
[task 2022-04-28T00:03:37.930Z] File "/usr/lib/python3.7/encodings/cp1254.py", line 15, in decode
[task 2022-04-28T00:03:37.930Z] return codecs.charmap_decode(input,errors,decoding_table)
[task 2022-04-28T00:03:37.930Z] UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 17243: character maps to <undefined>
[taskcluster 2022-04-28 00:03:38.297Z] === Task Finished ===
[taskcluster 2022-04-28 00:03:38.422Z] Unsuccessful task run with exit code: 1 completed in 27.78 seconds
Comment 1•2 years ago
Started on this merge: https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=139c89a60b7261a619deb3cb40a997cc6a295ec0
Jonathan, is this a regression of https://hg.mozilla.org/mozilla-central/rev/b149e68aac0735d254f350a4f291f8c160e64c30?
Comment 2•2 years ago
It's a task that gets data from Microsoft servers. Chances are some new data there is not supported by the underlying script, and no change in the tree is responsible for the failure itself.
Assignee | ||
Comment 3•2 years ago
This should be a trivial fix. The resp.text() call is not given an encoding, and there must not be one in the response headers, so it's probing the encoding and failing. I can't reproduce the issue locally, which means we might be hitting a bug in the version of Python we have in automation. Either way, I'm 99.99% sure that particular bit is UTF-8, so we can just pass that to the resp.text() call and skip the probing.
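A minimal sketch of the proposed change, assuming the response object exposes a `text(encoding=...)` coroutine as aiohttp's client response does. `StubResponse` and `read_symbol_list` are hypothetical names used only so the example runs without a network call; they are not taken from symsrv-fetch.py.

```python
import asyncio


class StubResponse:
    """Hypothetical stand-in for an aiohttp client response (illustration only)."""

    def __init__(self, body: bytes):
        self._body = body

    async def text(self, encoding=None, errors="strict"):
        if encoding is None:
            # A real client would fall back to charset detection here.
            # The stub simulates a wrong guess to mirror the failure in the log.
            encoding = "cp1254"
        return self._body.decode(encoding, errors=errors)


async def read_symbol_list(resp) -> str:
    # Name the encoding explicitly so no detection is attempted.
    return await resp.text(encoding="utf-8")


# "Í" (U+00CD) encodes to the bytes C3 8D in UTF-8.
body = "Í.pdb".encode("utf-8")
decoded = asyncio.run(read_symbol_list(StubResponse(body)))
```

Passing the encoding at the call site keeps the behaviour independent of whichever detection library and version happens to be installed in the task image.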
Comment 4•2 years ago
Looking at the log lines just before the failure, one odd thing I noticed is:
[task 2022-04-28T00:03:37.927Z] DEBUG:chardet.charsetprober:utf-8 not active
which I guess explains why the prober ends up settling for ISO-8859-9 Turkish as the best match; the "correct" encoding wasn't ever considered. So decoding then dies on the byte 0x8d, which is indeed undefined in that codepage (whereas it was presumably supposed to be a UTF-8 continuation byte).
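The failure mode can be reproduced in isolation with the standard library. This is only an illustrative sketch: the sample character is an assumption chosen because its UTF-8 encoding contains 0x8d, not data from the actual symbol list, and `try_decode` is a hypothetical helper. It uses Python's `cp1254` codec, the one named in the traceback.

```python
# U+00CD ("Í") encodes to the two bytes C3 8D in UTF-8.
payload = "Í".encode("utf-8")


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or the exception if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# cp1254 has no mapping for 0x8d, so this yields a UnicodeDecodeError.
cp1254_result = try_decode(payload, "cp1254")

# Decoding the same bytes as UTF-8 succeeds.
utf8_result = try_decode(payload, "utf-8")
```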
Assignee | ||
Comment 5•2 years ago
Assignee | ||
Comment 6•2 years ago
This also adjusts the scripts used to create the Docker image and to
run the task.
Depends on D145059
Assignee | ||
Comment 7•2 years ago
Depends on D145060
Assignee | ||
Comment 8•2 years ago
This was supposed to be a one-line patch but as I tested it on try I noticed that other stuff was broken/bitrotted and needed a good overhaul.
Comment hidden (Intermittent Failures Robot) |
Comment 10•2 years ago
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6b2300d31cd7
Always decode the missing symbol list as UTF-8 r=glandium
https://hg.mozilla.org/integration/autoland/rev/0ad672d9d6d7
Update the Window system symbols scraper's requirements r=glandium
https://hg.mozilla.org/integration/autoland/rev/052854773b2a
Remove all rejected words from the Windows system symbols scraper r=glandium
Comment 11•2 years ago
https://hg.mozilla.org/mozilla-central/rev/6b2300d31cd7
https://hg.mozilla.org/mozilla-central/rev/0ad672d9d6d7
https://hg.mozilla.org/mozilla-central/rev/052854773b2a