Closed Bug 1589202 Opened 5 years ago Closed 5 years ago

treeherder currently misses jobs (at least for a frequently failing machine)

Categories

(Tree Management :: Treeherder: Data Ingestion, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: aryx, Unassigned)

Details

Summary: treeherder currently misses jobs (at least for a frequent failure) → treeherder currently misses jobs (at least for a frequently failing machine)

Treeherder knows about the jobs, but because it regards them as still running, they show the start of the Unix epoch as their timestamp.
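To illustrate why a job that never received an end time would show the start of the Unix epoch, here is a minimal sketch. It assumes an unset end time is stored as `0` seconds, which is an assumption for illustration and not confirmed from Treeherder's code; `display_time` and `looks_unfinished` are hypothetical helper names.

```python
from datetime import datetime, timezone


def display_time(timestamp_seconds: int) -> str:
    """Format a Unix timestamp (in seconds) as a UTC date-time string."""
    moment = datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def looks_unfinished(end_timestamp_seconds: int) -> bool:
    """Hypothetical check: treat an end time of 0 as 'never recorded'."""
    return end_timestamp_seconds == 0


# Assumption: a job still regarded as running has its end time stored as 0.
unfinished_end_time = 0
print(display_time(unfinished_end_time))      # 1970-01-01 00:00:00
print(looks_unfinished(unfinished_end_time))  # True
```

Under that assumption, any job whose final status never arrived would render with a 1970 timestamp, which matches the symptom described here.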

I mentioned this to :Aryx in IRC but recording it here too.

All these failures happened on physical Android devices running in Bitbar. Each of these devices is controlled by a device host that acts as the intermediary between Taskcluster and the device, similar to a foopy for anyone old enough to remember those. The device host introduces another layer that could be intercepting status updates or failing to propagate them.

If the devices are failing tests in rapid succession, it's possible the device host can't keep up.

Sounds like the failed task result isn't getting through to Treeherder. Either it wasn't sent to us, or we somehow dropped it during ingestion.

Armenzg: Is this something that would fall into your domain to investigate?
Aryx: Do you have a more recent example of this happening? The original links are now pointing to the TC instance that was shut down Nov 9th.

Flags: needinfo?(armenzg)
Flags: needinfo?(aryx.bugmail)

Couldn't find a recent occurrence (looked at those Android jobs from comment 0).

Flags: needinfo?(aryx.bugmail)

This is harder to look at since all the links are broken (post-migration).

If this happens again, the ingestion can be tested locally like this:

# One tab
docker-compose up
# Another tab
docker-compose run backend bash
# Then, inside the backend container shell
./manage.py ingest_push_and_tasks task --task-id <id>

Please reopen this bug if you have post-migration examples.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(armenzg)
Resolution: --- → INCOMPLETE

FYI, I also fixed bug 1595902 not long ago, which I believe could be related to this.
