Closed Bug 1766753 Opened 2 years ago Closed 2 years ago

Perma [tier2] UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 17243: character maps to <undefined>

Categories

(Firefox Build System :: General, defect, P5)

Tracking

(firefox102 fixed)

RESOLVED FIXED
102 Branch

People

(Reporter: intermittent-bug-filer, Assigned: gsvelto)

Details

(Keywords: intermittent-failure)

Attachments

(3 files)

Filed by: apavel [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=376043395&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Gpg5o8PVQ5G4owyfpWX1KQ/runs/0/artifacts/public/logs/live_backing.log


[task 2022-04-28T00:03:37.928Z] DEBUG:chardet.charsetprober:windows-1255 Hebrew confidence = 0.0
[task 2022-04-28T00:03:37.928Z] DEBUG:chardet.charsetprober:windows-1255 Hebrew confidence = 0.0
[task 2022-04-28T00:03:37.930Z] Traceback (most recent call last):
[task 2022-04-28T00:03:37.930Z]   File "/builds/worker/symsrv-fetch.py", line 528, in <module>
[task 2022-04-28T00:03:37.930Z]     main()
[task 2022-04-28T00:03:37.930Z]   File "/builds/worker/symsrv-fetch.py", line 489, in main
[task 2022-04-28T00:03:37.930Z]     args.missing_symbols
[task 2022-04-28T00:03:37.930Z]   File "/builds/worker/symsrv-fetch.py", line 448, in get_base_data
[task 2022-04-28T00:03:37.930Z]     return asyncio.run(helper(url))
[task 2022-04-28T00:03:37.930Z]   File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
[task 2022-04-28T00:03:37.930Z]     return loop.run_until_complete(main)
[task 2022-04-28T00:03:37.930Z]   File "/usr/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
[task 2022-04-28T00:03:37.930Z]     return future.result()
[task 2022-04-28T00:03:37.930Z]   File "/builds/worker/symsrv-fetch.py", line 445, in helper
[task 2022-04-28T00:03:37.930Z]     get_skiplist(),
[task 2022-04-28T00:03:37.930Z]   File "/builds/worker/symsrv-fetch.py", line 152, in fetch_missing_symbols
[task 2022-04-28T00:03:37.930Z]     data = await resp.text()
[task 2022-04-28T00:03:37.930Z]   File "/usr/local/lib/python3.7/dist-packages/aiohttp/client_reqrep.py", line 1014, in text
[task 2022-04-28T00:03:37.930Z]     return self._body.decode(encoding, errors=errors)  # type: ignore
[task 2022-04-28T00:03:37.930Z]   File "/usr/lib/python3.7/encodings/cp1254.py", line 15, in decode
[task 2022-04-28T00:03:37.930Z]     return codecs.charmap_decode(input,errors,decoding_table)
[task 2022-04-28T00:03:37.930Z] UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 17243: character maps to <undefined>
[taskcluster 2022-04-28 00:03:38.297Z] === Task Finished ===
[taskcluster 2022-04-28 00:03:38.422Z] Unsuccessful task run with exit code: 1 completed in 27.78 seconds

It's a task that gets data from Microsoft servers. Chances are some new data there is not supported by the underlying script, and no change in the tree is responsible for the failure itself.

Flags: needinfo?(jfkthame) → needinfo?(gsvelto)

This should be a trivial fix. The resp.text() call is not given an encoding, and there must not be one in the response headers either, so it's probing the encoding and failing. I can't reproduce the issue locally, which means we might be hitting a bug in the version of Python we have in automation. Either way, I'm 99.99% sure that particular bit is UTF-8, so we can just pass that to the resp.text() call and skip the probing.
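A minimal sketch of the idea, not the actual patch: aiohttp's ClientResponse.text() accepts an encoding argument, and passing it skips charset detection. The FakeResponse class and the read_missing_symbols name below are hypothetical, introduced only so the example runs without network access; the stub falls back to cp1254 to imitate the bad guess seen in the log.

```python
import asyncio


class FakeResponse:
    """Hypothetical stand-in for aiohttp.ClientResponse (no network needed)."""

    def __init__(self, body: bytes):
        self._body = body

    async def text(self, encoding=None, errors="strict"):
        # Real aiohttp tries to detect the charset when encoding is None.
        # This stub imitates the wrong guess from the log (cp1254).
        return self._body.decode(encoding or "cp1254", errors=errors)


async def read_missing_symbols(resp) -> str:
    # Pass the encoding explicitly so that no charset probing takes place.
    return await resp.text(encoding="utf-8")


# "\u010d" encodes to b"\xc4\x8d" in UTF-8, so the body contains byte 0x8d.
body = "xul.pdb,\u010d\n".encode("utf-8")
print(asyncio.run(read_missing_symbols(FakeResponse(body))))
```

With a real aiohttp response the call has the same shape: `await resp.text(encoding="utf-8")`.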

Flags: needinfo?(gsvelto)
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED

Looking at the log lines just before the failure, one odd thing I noticed is:

[task 2022-04-28T00:03:37.927Z] DEBUG:chardet.charsetprober:utf-8 not active

which I guess explains why the prober ends up settling for ISO-8859-9 Turkish as the best match: the "correct" encoding was never considered. Decoding then dies on the byte 0x8d, which is indeed undefined in the cp1254 codepage that the traceback shows was used for decoding (whereas it was presumably part of a multi-byte UTF-8 sequence; 0x8d can only be a continuation byte in UTF-8, never a lead byte).
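The failure mode can be reproduced with the standard library alone. This is a small sketch under the assumption that the offending data looked something like a two-byte UTF-8 character ending in 0x8d; the actual bytes served by the remote server are not known from this bug, and the try_decode helper is hypothetical.

```python
def try_decode(data: bytes, encoding: str):
    """Return (True, text) on success or (False, exception) on failure."""
    try:
        return True, data.decode(encoding)
    except UnicodeDecodeError as exc:
        return False, exc


# U+010D encodes to the bytes C4 8D in UTF-8 (illustrative sample only).
sample = "\u010d".encode("utf-8")

ok_utf8, text = try_decode(sample, "utf-8")
ok_cp1254, error = try_decode(sample, "cp1254")

print(ok_utf8, repr(text))
print(ok_cp1254, error)
```

Decoding as cp1254 fails at the 0x8d byte because that slot is unassigned in the codepage, which matches the 'charmap' error in the traceback.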

This also adjusts the scripts used to create the Docker image and to run the task.

Depends on D145059

This was supposed to be a one-line patch but as I tested it on try I noticed that other stuff was broken/bitrotted and needed a good overhaul.

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6b2300d31cd7
Always decode the missing symbol list as UTF-8 r=glandium
https://hg.mozilla.org/integration/autoland/rev/0ad672d9d6d7
Update the Window system symbols scraper's requirements r=glandium
https://hg.mozilla.org/integration/autoland/rev/052854773b2a
Remove all rejected words from the Windows system symbols scraper r=glandium
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch