Taskcluster not providing blobber-uploaded files to treeherder

RESOLVED FIXED

Status

P3
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: jgraham, Assigned: garndt)

Tracking

Details

(Whiteboard: [feature])

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
Test jobs upload files to blobber that are then used either by developers or other tools. For an example of these files, see the "Job Detail" panel for [1]. Of particular importance is the errorsummary file, which contains a machine-readable list of test failures and other errors, and is the basis for the autoclassification feature.

Taskcluster jobs are currently not providing treeherder with the information required to locate these uploaded files, and as a result features including autoclassification do not work with taskcluster jobs.

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedJob=3870022
(Reporter)

Comment 2

2 years ago
And the input to that code is generated by the TinderboxPrintParser: https://github.com/mozilla/treeherder/blob/master/treeherder/log_parser/parsers.py#L278
(Reporter)

Comment 3

2 years ago
wlach pointed out that the TinderboxPrintParser stuff isn't actually used as input to that function, but is responsible for the links in the treeherder UI.
(Assignee)

Comment 4

2 years ago
So looking at the sample job, and others I found on treeherder, there is a set of artifact information that are uploaded for jobs.  Is this only for some of the artifacts produced by a job or for any artifact?

How are these artifacts submitted to Treeherder? Are they just artifacts in the job details piece of the "Job Info" message with a content_type of link? http://treeherder.readthedocs.io/submitting_data.html#job-artifacts-format


If so, when posting the completion of a job to treeherder, we could get a list of all artifacts for a task run and post them within the job message.
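A minimal sketch of that idea, assuming the job details entries take the link shape described in the submitting_data docs (title, value, content_type, url). The function name, the base URL parameter, and the task/run/artifact URL layout are assumptions for illustration only, not the taskcluster-treeherder implementation.

```python
from urllib.parse import quote


def artifact_links(base_url, task_id, run_id, artifact_names):
    """Build one link-type job detail entry per artifact name.

    base_url and the URL layout below are hypothetical; the real
    service may construct artifact URLs differently.
    """
    details = []
    for name in artifact_names:
        # quote() leaves "/" unescaped, so artifact paths stay readable.
        url = f"{base_url}/task/{task_id}/runs/{run_id}/artifacts/{quote(name)}"
        details.append({
            "title": "artifact uploaded",
            "value": name,
            "content_type": "link",
            "url": url,
        })
    return details
```

The resulting list would then be attached to the job message sent when the task run completes.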
(In reply to Greg Arndt [:garndt] from comment #4)
> So looking at the sample job, and others I found on treeherder, there is a
> set of artifact information that are uploaded for jobs.  Is this only for
> some of the artifacts produced by a job or for any artifact?

They aren't actually related to artifacts at all, they're part of the log_references property of jobs that you submit. See this link that jgraham posted in comment 2.

(to simplify things and reduce confusion, I think we should just kill the TinderboxPrints that print out that information, and put the extra log references in the job details panel... I'll file a bug for this later)
(Reporter)

Comment 6

2 years ago
Note that it's not just extra log references, it's everything in the blobber_files property. At the moment treeherder does the needed magic to convert a _errorsummary file to work like a log_url. I don't particularly mind if taskcluster instead wants to put the errorsummary file in with the other logs, as long as we don't end up double-counting each file. And we also want to ensure that we get all the other non-log blobber files since they are used by developers.
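To illustrate the "no double-counting" requirement, here is a rough sketch that puts each blobber file in exactly one bucket: errorsummary files to be treated like logs, and everything else kept for developers. It assumes blobber_files is a mapping of file name to URL and that errorsummary files can be recognised by a "_errorsummary" suffix before the extension; both are assumptions, and the function name is hypothetical.

```python
def split_blobber_files(blobber_files):
    """Split blobber files into (log_like, others) with no overlap.

    blobber_files is assumed to map file name -> URL.
    """
    log_like, others = {}, {}
    for name, url in blobber_files.items():
        # Drop the final extension, if any, before checking the suffix.
        stem = name.rsplit(".", 1)[0]
        target = log_like if stem.endswith("_errorsummary") else others
        target[name] = url
    return log_like, others
```

Because every file lands in only one of the two dictionaries, a later step that ingests both cannot count the same file twice.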
Triaging as "feature" to complete in Q2. I realize that this is for parity with buildbot. For our project, it's feature work in that someone (ahem, garndt) is going to have to dig into the parsing and create something new to resolve it.
Priority: P1 → P3
Whiteboard: [feature]
Greg - is this something that we can look at after the tc-treeherder service deployment? Could someone other than you have a look?
Flags: needinfo?(garndt)
(Assignee)

Comment 9

2 years ago
I made some changes to taskcluster-treeherder staging to show uploaded artifacts.  Is this what we're after here?

Click the job details panel for this job:

https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&selectedJob=28662952

Comment 10

2 years ago
It looks good to me.
Passing NI to filer:
http://people.mozilla.org/~armenzg/sattap/9e28eb50.png
Flags: needinfo?(garndt) → needinfo?(james)
(Assignee)

Comment 11

2 years ago
Right now the artifact name/path displayed is limited to 50 characters but there will be a patch soon for treeherder that will increase this to 125.  I know we have some artifact names+path that are longer than 50 characters.

Taskcluster artifact names are usually a path such as "public/test_info/resource-usage.json".  Should these display that way, or do we just want "resource-usage.json" to appear?

If having just the filename is ideal, there might be collisions in treeherder, as uniqueness constraints are going to be put on the displayed text. If two artifacts have the same name and we only display the base file name rather than the full path, only the latest URL will be stored, even if one artifact is path/to/artifact/artifact.json and the other is different/path/artifact.json.
(Reporter)

Comment 12

2 years ago
It's not clear to me if this is the complete change, because I don't know if this puts the data through the same ingestion pipeline as buildbot artifacts. In particular if we don't go through the code at [1] then further changes are needed to make TC work with autoclassification, albeit possibly not changes that the TC team need to make.

[1] https://github.com/mozilla/treeherder/blob/b040bb40455e2baac403813c882d143711262b49/treeherder/etl/buildapi.py#L169
Flags: needinfo?(james) → needinfo?(cdawson)
James-- It won't go through that code.  The paths start being the same at ``store_jobs``.  So, ``job_loader`` would need your modification for the special handling of blobber files.  Or perhaps create an abstracted function they can both share.
Flags: needinfo?(cdawson)
(Assignee)

Comment 14

2 years ago
I have updated the PR to use the filename as the link text rather than the whole artifact path.  There is a limit on the TH side of 125 characters for the linkText.

Artifacts will now appear as:
https://treeherder.allizom.org/#/jobs?repo=try&selectedJob=24293578

For artifacts that have the same file name (but live at different artifact paths), they will display with an incrementing number.  This is because if treeherder encounters a link that has the same label ("artifact uploaded") and same link text (such as the filename), then the most recent URL recorded will win, resulting in all of the links having the same URL.  The incrementing number helps mitigate that.
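The numbering scheme can be sketched roughly as follows. This is not the code from the PR: the function name and the " (n)" suffix format are assumptions, and only the 125-character limit and the use of the base file name come from this bug.

```python
import posixpath


def unique_link_texts(artifact_paths, max_len=125):
    """Map each artifact path to a unique link text of at most max_len chars.

    The " (n)" suffix format is hypothetical; the real service may
    number duplicates differently.
    """
    used = set()
    texts = {}
    for path in artifact_paths:
        base = posixpath.basename(path)
        text = base[:max_len]
        n = 0
        # Keep bumping the counter until the text has not been used yet.
        while text in used:
            n += 1
            suffix = f" ({n})"
            text = base[:max_len - len(suffix)] + suffix
        used.add(text)
        texts[path] = text
    return texts
```

With the two example paths from comment 11, the first would keep "artifact.json" and the second would get a numbered variant, so each link text maps to its own URL.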
(Assignee)

Comment 15

2 years ago
Created attachment 8770377
taskcluster-treeherder PR 29
Attachment #8770377 - Flags: review?(cdawson)

Updated

2 years ago
Attachment #8770377 - Flags: review?(cdawson) → review+
(Assignee)

Comment 16

2 years ago
This has been merged into prod and deployed.
Assignee: nobody → garndt
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
(Reporter)

Updated

2 years ago
Blocks: 1294149