Closed Bug 1769013 Opened 3 years ago Closed 3 years ago

Permanent "failure" lines for telemetry tests (but task exits successfully) json.decoder.JSONDecodeError: Invalid control character at: line 1 column 7541 (char 7540) - DO NOT USE FOR CLASSIFICATION

Categories

(Toolkit :: Telemetry, defect, P5)

defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox-esr91 --- unaffected
firefox100 --- unaffected
firefox101 --- unaffected
firefox102 --- affected

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Filed by: archaeopteryx [at] coole-files.de
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=377787142&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/A9mvwwGTRDC5drFEKSrpqw/runs/0/artifacts/public/logs/live_backing.log


There are several instances of this logged for the telemetry tests since pingsender2 got reenabled. The test suite still passes without errors reported.

[task 2022-05-12T08:48:04.302Z] 08:48:04     INFO -  1652345284302	Toolkit.Telemetry	TRACE	TelemetryStorage::_popAndPerformQueuedOperation - Performing queued operation.
[task 2022-05-12T08:48:04.334Z] 08:48:04     INFO -  *** You are running in background task mode. ***
[task 2022-05-12T08:48:04.334Z] 08:48:04     INFO -  *** You are running in headless mode.
[task 2022-05-12T08:48:04.363Z] 08:48:04     INFO -  [2022-05-12T08:48:04Z ERROR xulstore::persist] removeDocument error: unavailable
[task 2022-05-12T08:48:04.439Z] 08:48:04     INFO -  console.log: "pingsender: sending /builds/worker/workspace/build/tmprol0w9wb.mozrunner/saved-telemetry-pings/d81d32ff-fc20-4476-8005-90fca393aef7 to http://localhost:8000/pings/submit/telemetry/d81d32ff-fc20-4476-8005-90fca393aef7/event/Firefox/102.0a1/default/20220510225537?v=4"
[task 2022-05-12T08:48:04.450Z] 08:48:04     INFO -  console.log: "pingsender: sending /builds/worker/workspace/build/tmprol0w9wb.mozrunner/saved-telemetry-pings/121ead59-546a-4fbc-9308-e7b0ac3cde67 to http://localhost:8000/pings/submit/telemetry/121ead59-546a-4fbc-9308-e7b0ac3cde67/main/Firefox/102.0a1/default/20220510225537?v=4"
[task 2022-05-12T08:48:04.529Z] 08:48:04     INFO -  HTTPException in handler: 500
[task 2022-05-12T08:48:04.536Z] 08:48:04     INFO -  Traceback (most recent call last):
[task 2022-05-12T08:48:04.537Z] 08:48:04     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/wptserve/handlers.py", line 359, in __call__
[task 2022-05-12T08:48:04.538Z] 08:48:04     INFO -      rv = self.func(request, response)
[task 2022-05-12T08:48:04.538Z] 08:48:04     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/telemetry_harness/ping_server.py", line 36, in pings_handler
[task 2022-05-12T08:48:04.538Z] 08:48:04     INFO -      ping_data = json.loads(request_data)
[task 2022-05-12T08:48:04.539Z] 08:48:04     INFO -    File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
[task 2022-05-12T08:48:04.539Z] 08:48:04     INFO -      return _default_decoder.decode(s)
[task 2022-05-12T08:48:04.539Z] 08:48:04     INFO -    File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
[task 2022-05-12T08:48:04.540Z] 08:48:04     INFO -      obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[task 2022-05-12T08:48:04.540Z] 08:48:04     INFO -    File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
[task 2022-05-12T08:48:04.540Z] 08:48:04     INFO -      obj, end = self.scan_once(s, idx)
[task 2022-05-12T08:48:04.540Z] 08:48:04     INFO -  json.decoder.JSONDecodeError: Invalid control character at: line 1 column 7541 (char 7540)
[task 2022-05-12T08:48:04.541Z] 08:48:04     INFO -  During handling of the above exception, another exception occurred:
[task 2022-05-12T08:48:04.542Z] 08:48:04     INFO -  Traceback (most recent call last):
[task 2022-05-12T08:48:04.542Z] 08:48:04     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/wptserve/server.py", line 287, in finish_handling
[task 2022-05-12T08:48:04.542Z] 08:48:04     INFO -      handler(request, response)
[task 2022-05-12T08:48:04.542Z] 08:48:04     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/wptserve/handlers.py", line 364, in __call__
[task 2022-05-12T08:48:04.543Z] 08:48:04     INFO -      raise HTTPException(500, message=msg)
[task 2022-05-12T08:48:04.543Z] 08:48:04     INFO -  wptserve.utils.HTTPException: 500

chutten: this strongly suggests that pingsender2 is sending manifestly different bytes than are sent by pingsender1. I do see that the background task slurps the ping as UTF-8 here where-as the standalone binary slurps bytes and appends them to a C++ string here. My guess is that this doesn't matter to the actual Telemetry pipeline, but I do see from the spec that yes, JSON is UTF-8, but no, it MUST NOT have unescaped control characters in it.

I will try to reproduce this locally; perhaps we can make pingsender2 behave just like pingsender1 and address this issue. But perhaps you have seen issues like this, or can comment on the ingestion parts? Are there other considerations at play here that I'm not aware of? Is this a weakness in the existing test harness: should this fail outright?

Flags: needinfo?(nalexander) → needinfo?(chutten)

The only other time I've seen this was bug 1736524 which is already in the dep tree and was fixed by switching back to pingsender1. (so may not be particularly helpful)

As for ingestion, I can't speak to specifics there. Per the docs the edge probably won't reject it, but subsequent steps that decode the payload may decide to dump it into the error stream. For more details about what could go wrong and where, I'd ask on Slack#data-help or maybe bring in :relud or :whd.

Flags: needinfo?(chutten)
See Also: → 1736524
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.