Open Bug 1742129 Opened 4 years ago Updated 4 years ago

treeherder prototype task ingestion backlogged

Categories

(Cloud Services :: Operations: Releng, defect)


Tracking

(Not tracked)

People

(Reporter: aryx, Unassigned)

Details

Prototype's task ingestion is lagging: tasks show up with a delay of 40+ minutes (the Taskcluster page shows the real-time status).

There was an issue with parsing logs or storing the parsed results. Might this have slowed the process down?

The treeherder-store-pulse-data task is the one at max load: https://console.cloud.google.com/kubernetes/deployment/us-west1/treeherder-nonprod-v1/prototype-treeherder/treeherder-store-pulse-data/overview?project=moz-fx-treeherde-nonprod-34ec

Summary: prototype task ingestion backlogged → treeherder prototype task ingestion backlogged

Chris, can we try one of the proposed solutions you mentioned in the Treeherder meeting two weeks ago?

Flags: needinfo?(cvalaas)

[For prototype:] I just scaled the treeherder-log-parser deployment to two pods. That will last until the next prototype deploy (although I can make it permanent if things look good).
I didn't rescale treeherder-store-pulse-data, because it's not currently maxed out (sorry I took so long to get to this) and I didn't want to change too many things at once. But I can scale it to two pods if we want.
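
For reference, a manual rescale like the one described above could be reproduced roughly as follows. This is only a sketch, not the command that was actually run. It assumes `kubectl` access to the nonprod cluster and that the namespace is `prototype-treeherder` (inferred from the console URL earlier in this bug). The helper names `build_scale_command` and `scale` are hypothetical.

```python
import subprocess
from typing import List


def build_scale_command(deployment: str, replicas: int, namespace: str) -> List[str]:
    """Return the argv list for `kubectl scale` without running it."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [
        "kubectl",
        "scale",
        f"deployment/{deployment}",
        f"--replicas={replicas}",
        f"--namespace={namespace}",  # assumed namespace, see note above
    ]


def scale(deployment: str, replicas: int, namespace: str, dry_run: bool = True) -> List[str]:
    """Build the scale command and run it only when dry_run is False."""
    cmd = build_scale_command(deployment, replicas, namespace)
    if not dry_run:
        # Raises CalledProcessError if kubectl exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: print the command that would scale the log parser to two pods.
    print(" ".join(scale("treeherder-log-parser", 2, "prototype-treeherder")))
```

The same call with `"treeherder-store-pulse-data"` would cover the second deployment if we decide to scale it too. As noted above, a manual change like this only lasts until the next prototype deploy.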

Flags: needinfo?(cvalaas)