Closed Bug 1605180 Opened 4 years ago Closed 4 years ago

Perma HTTPError: 504 Server Error: Gateway Time-out for url: http://activedata.allizom.org/query

Categories

(Testing Graveyard :: ActiveData, defect, P5)

Version 3
defect

Tracking

(firefox73 fixed)

RESOLVED FIXED
mozilla73
Tracking Status
firefox73 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Filed by: ccoroiu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=281873708&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TXHgxBEyQe-Vgc5TT3P0fQ/runs/0/artifacts/public/logs/live_backing.log


[task 2019-12-19T10:09:54.692Z] AttributeError: 'LogRecord' object has no attribute 'params'
[task 2019-12-19T10:09:54.693Z] Logged from file connectionpool.py, line 207
[task 2019-12-19T10:10:52.162Z] Traceback (most recent call last):
[task 2019-12-19T10:10:52.164Z] File "/builds/worker/checkouts/gecko/testing/testinfo.py", line 74, in run
[task 2019-12-19T10:10:52.175Z] self.response = self.ti.activedata_query(self.query)
[task 2019-12-19T10:10:52.176Z] File "/builds/worker/checkouts/gecko/testing/testinfo.py", line 53, in activedata_query
[task 2019-12-19T10:10:52.176Z] response.raise_for_status()
[task 2019-12-19T10:10:52.176Z] File "/builds/worker/checkouts/gecko/third_party/python/requests/requests/models.py", line 840, in raise_for_status
[task 2019-12-19T10:10:52.374Z] raise HTTPError(http_error_msg, response=self)
[task 2019-12-19T10:10:52.434Z] HTTPError: 504 Server Error: Gateway Time-out for url: http://activedata.allizom.org/query
[task 2019-12-19T10:10:52.450Z] Traceback (most recent call last):
[task 2019-12-19T10:10:52.526Z] File "/usr/lib/python2.7/logging/init.py", line 859, in emit
[task 2019-12-19T10:10:52.582Z] msg = self.format(record)
[task 2019-12-19T10:10:52.602Z] File "/usr/lib/python2.7/logging/init.py", line 732, in format
[task 2019-12-19T10:10:52.654Z] return fmt.format(record)
[task 2019-12-19T10:10:52.709Z] File "/builds/worker/checkouts/gecko/python/mach/mach/logging.py", line 71, in format
[task 2019-12-19T10:10:52.709Z] f = record.msg.format(**record.params)
[task 2019-12-19T10:10:52.710Z] AttributeError: 'LogRecord' object has no attribute 'params'
[task 2019-12-19T10:10:52.710Z] Logged from file connectionpool.py, line 207
[task 2019-12-19T10:11:03.438Z] Traceback (most recent call last):
[task 2019-12-19T10:11:03.438Z] File "/usr/lib/python2.7/logging/init.py", line 859, in emit
[task 2019-12-19T10:11:03.439Z] msg = self.format(record)
[task 2019-12-19T10:11:03.439Z] File "/usr/lib/python2.7/logging/init.py", line 732, in format
[task 2019-12-19T10:11:03.443Z] return fmt.format(record)
[task 2019-12-19T10:11:03.443Z] File "/builds/worker/checkouts/gecko/python/mach/mach/logging.py", line 71, in format
[task 2019-12-19T10:11:03.443Z] f = record.msg.format(**record.params)
[task 2019-12-19T10:11:03.443Z] AttributeError: 'LogRecord' object has no attribute 'params'
[task 2019-12-19T10:11:03.443Z] Logged from file connectionpool.py, line 207
[task 2019-12-19T10:11:22.975Z] Traceback (most recent call last):
[task 2019-12-19T10:11:22.976Z] File "/usr/lib/python2.7/logging/init.py", line 859, in emit
[task 2019-12-19T10:11:22.983Z] msg = self.format(record)
[task 2019-12-19T10:11:22.983Z] File "/usr/lib/python2.7/logging/init.py", line 732, in format
[task 2019-12-19T10:11:22.983Z] return fmt.format(record)
[task 2019-12-19T10:11:22.983Z] File "/builds/worker/checkouts/gecko/python/mach/mach/logging.py", line 71, in format
[task 2019-12-19T10:11:22.983Z] f = record.msg.format(**record.params)

Summary: Intermittent HTTPError: 504 Server Error: Gateway Time-out for url: http://activedata.allizom.org/query → Perma HTTPError: 504 Server Error: Gateway Time-out for url: http://activedata.allizom.org/query
Assignee: klahnakoski → gbrown

Occasional 504s are anticipated by test-info: It logs the HTTPError and retries; once retries are exhausted it still doesn't fail the job, just warns that the ActiveData data may not be complete.

In the failures reported against this bug, the task failed because it ran out of time -- it hit the task 5400 second (90 minute) timeout. But treeherder's log parser noted the HTTPErrors, and there are so many treeherder-recognized errors, it never got around to even seeing the task timeout.

Why did these requests take so long (test-info-all was completing in about 30 minutes total in earlier pushes)? Is the task timeout reasonable? How can logging be improved?

I understand better now. There are a few HTTPErrors reported, but nothing serious. Most of the time is spent retrying the reftest queries. Until bug 1596567 is fixed, we expect reftest queries to have trouble matching test names; still, the queries had been matching some records -- today they are not. Possibly the matching data simply aged out? I won't worry about that difference much. I think the right approach is to not retry when a query succeeds but returns no records -- that should save lots of time.

When an ActiveData query returns HTTP 200 but no data, treat that as a permanent condition:
Do not retry. I expect that will be the case for some test-info queries for reftests
until reftest test names are normalized.

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fbe6cfae6057
In test-info, avoid ActiveData retries when no matching data; r=jmaher
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: