1603249 - Do not store "artifact uploaded" into JobDetail table

Reporter

Description

5 years ago

This is one of the pieces of data ingestion that makes the Treeherder database so write-intensive. Every single task that we ingest has multiple artifacts uploaded, and this adds up quite quickly.

This is where the code is first added into the pipeline:
https://github.com/mozilla/treeherder/blob/610fc5082615cebb9c15d19c838560b77cff732d/treeherder/etl/taskcluster_pulse/handler.py#L346

We can change the API to fetch the artifacts for the task from Taskcluster and return them as part of the API response. In a sense, we want the change to look like a no-op for consumers of the API. I'm aware that mozscreenshots queries this API.

This is a sample TC API call:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/aNQOqM0HQwC9eK612X7nQQ/runs/0/artifacts

Sample entry:

    {
        "job_id": 280744315,
        "job_guid": "14be8c02-4387-402f-a59c-ae17f3d4d1ee/0",
        "title": "artifact uploaded",
        "value": "live_backing.log",
        "url": "https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/FL6MAkOHQC-lnK4X89TR7g/runs/0/artifacts/public/logs/live_backing.log"
    }
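
To illustrate the "no-op for consumers" idea, here is a minimal Python sketch of how a Taskcluster artifact listing could be mapped onto entries shaped like the sample above. It is not the Treeherder implementation. The function name `artifacts_to_job_details` is hypothetical, and the sketch assumes the Taskcluster response is a JSON object with an `artifacts` list whose items carry a `name` field such as `public/logs/live_backing.log`.

```python
from posixpath import basename
from urllib.parse import quote


def artifacts_to_job_details(root_url, task_id, run_id, job_id, job_guid, tc_response):
    """Map a Taskcluster artifact listing onto JobDetail-style entries.

    Assumption: tc_response looks like {"artifacts": [{"name": "..."}, ...]}.
    """
    base = f"{root_url.rstrip('/')}/api/queue/v1/task/{task_id}/runs/{run_id}/artifacts"
    details = []
    for artifact in tc_response.get("artifacts", []):
        name = artifact["name"]  # e.g. "public/logs/live_backing.log"
        details.append(
            {
                "job_id": job_id,
                "job_guid": job_guid,
                "title": "artifact uploaded",
                # The stored "value" is only the file name, not the full path.
                "value": basename(name),
                # quote() keeps "/" intact and escapes anything unsafe in a URL.
                "url": f"{base}/{quote(name)}",
            }
        )
    return details
```

Because the entries are built on demand, nothing needs to be written to the JobDetail table for uploaded artifacts.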

Armen [:armenzg]

Reporter

Updated

5 years ago

Depends on: 1603261

Armen [:armenzg]

Reporter

Comment 1

5 years ago

~65% of the last 10M rows is TC artifacts: https://sql.telemetry.mozilla.org/queries/67114#169999

Kyle Lahnakoski [:ekyle]

Comment 2

5 years ago

The writes can be decreased with a transaction in the log parser. Armen, could you point me to the code? I will file a bug.

Flags: needinfo?(armenzg)

Armen [:armenzg]

Reporter

Comment 3

5 years ago

Hi Kyle, I'm not entirely sure what you mean about the log parser.

I mentioned this code in the first comment. Is that what you're looking for?

Flags: needinfo?(armenzg)

Cameron Dawson [:camd]

Updated

5 years ago

Priority: -- → P3

Sarah Clements [:sclements]

Assignee

Updated

5 years ago

Assignee: nobody → sclements

Status: NEW → ASSIGNED

Priority: P3 → P1

Sarah Clements [:sclements]

Assignee

Comment 5

5 years ago

Edited

Per Armen's initial comment, mozscreenshots does indeed seem to be the primary external consumer of the JobDetail endpoint: https://papertrailapp.com/systems/treeherder-prod/events?q=%22Get%20%2Fapi%2Fjobdetail%22%20-%22https%3A%2F%2Ftreeherder.mozilla.org%2F%22

Sarah Clements [:sclements]

Assignee

Comment 6

5 years ago

Edited

I did some more research into this, and we have everything we need in the UI to fetch the artifacts directly from Taskcluster at the point they're needed, rather than calling JobDetails, which would call the Taskcluster API and return the results. I spoke with Matt Noorenberghe and he's willing to refactor mozscreenshots to query Taskcluster directly rather than JobDetails. The Jobs API returns the retry_id (runId) and task_id, and we will have access to the root_url via the repository.
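
As a rough illustration of the point above, the three values mentioned (task_id, retry_id and the repository's root_url) are enough to build the artifact listing URL shown in the bug description. This is a Python sketch only; the actual UI code is JavaScript, the function name is hypothetical, and the dictionary key `tc_root_url` is an assumed name for wherever the repository exposes its root URL.

```python
def artifacts_list_url(job, repository):
    """Build the Taskcluster artifact listing URL for one run of a task.

    Assumptions: `job` carries "task_id" and "retry_id" (the Taskcluster runId),
    and `repository` carries the root URL under the hypothetical key "tc_root_url".
    """
    root_url = repository["tc_root_url"].rstrip("/")
    return (
        f"{root_url}/api/queue/v1/task/{job['task_id']}"
        f"/runs/{job['retry_id']}/artifacts"
    )


# Example values taken from the sample TC API URL in the description.
job = {"task_id": "aNQOqM0HQwC9eK612X7nQQ", "retry_id": 0}
repository = {"tc_root_url": "https://firefox-ci-tc.services.mozilla.com"}
url = artifacts_list_url(job, repository)
```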

So my thinking is to do this in stages. I can switch the UI to retrieve artifacts directly, then prep the backend code for when we stop ingesting artifacts and remove the JobDetails endpoint (and meanwhile notify other users of its deprecation in x weeks). I would then merge those backend changes once Matt has made his changes in early to mid April. Does anyone have an objection to this idea?

Cameron Dawson [:camd]

Comment 7

5 years ago

This sounds great! :) Though we still want to process the logs for our own uses. So some of the "artifact uploaded" entries are still needed. Or at least we need to parse them one way or another.

Specifically, Push Health needs the "*_errorsummary.log" artifact because it's parsed to create FailureLine objects, which Push Health relies upon.
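
One way to picture this constraint: even if artifact links are no longer stored wholesale, ingestion could still pick out the artifacts it needs by name. The Python sketch below filters artifact names against the "*_errorsummary.log" pattern. The function name and the example paths are hypothetical and not taken from Treeherder.

```python
from fnmatch import fnmatchcase


def select_errorsummary_artifacts(artifact_names):
    """Return only the artifact names whose file name ends in _errorsummary.log."""
    selected = []
    for name in artifact_names:
        file_name = name.rsplit("/", 1)[-1]  # drop any directory prefix
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
        if fnmatchcase(file_name, "*_errorsummary.log"):
            selected.append(name)
    return selected


# Hypothetical artifact names for illustration.
names = [
    "public/logs/live_backing.log",
    "public/test_info/mochitest_errorsummary.log",
]
needed = select_errorsummary_artifacts(names)
```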

But I like Sarah's plan. Modify UI first. Then wean us off ingesting the links for the artifacts. We might even consider switching some of them to be saved to a different or a few different tables, as the case may be. But I'll leave the minutiae up to you. :)

Armen [:armenzg]

Reporter

Comment 8

5 years ago

Thanks for reaching out to Matt!

This is a fine plan.

Sarah Clements [:sclements]

Assignee

Updated

5 years ago

Depends on: 1625033

Sarah Clements [:sclements]

Assignee

Updated

5 years ago

Comment 9

5 years ago

Edited

After chatting with Tom.Prince the other day, I've come into new information about what we store in the JobDetail table. In a nutshell, in addition to storing uploaded artifacts in the table we're also parsing log lines with the name "TinderboxPrint" and storing them in the table. Much of that content seems to be of questionable value and could be found by someone looking at the logs. I also found an old bug filed by Ed with a useful, if somewhat outdated, analysis.

So before I can proceed with the removal of the jobdetails endpoint completely we need to figure out if we should keep supporting that log parsing/storing of data. I'll reopen bug 1342296 as a meta (it was closed as invalid) and add an update since some of the information is no longer applicable.

I think the approach to take is to continue with the idea of deprecating the /jobdetail/ endpoint for uploaded artifact retrieval, both in our UI and for mozscreenshots.

For the job details panel in TH, we'll still need to retrieve the job details that are not uploaded artifacts (from TinderboxPrint and anything else) until we decide what, if anything, we should store in JobDetail. But this will at least cut out a large chunk of the writes to the table in the meantime.

Sarah Clements [:sclements]

Assignee

Updated

5 years ago

Blocks: 1342296

Comment 10

5 years ago

> So before I can proceed with the removal of the jobdetails endpoint completely

I was not asking to remove the jobdetails endpoint or to stop storing anything related to TinderboxPrints.
My original request was not to store artifacts in the DB since it has a high write impact and we can instead list in the UI the artifacts from TC APIs or put a link to the TC UI that shows artifacts for a task.

I think your approach in the last two paragraphs makes sense.

Thanks Sarah!

Sarah Clements [:sclements]

Assignee

Comment 11

5 years ago

Edited

(In reply to Armen [:armenzg] from comment #10)

> I was not asking to remove the jobdetails endpoint or to stop storing anything related to TinderboxPrints.
> My original request was not to store artifacts in the DB since it has a high write impact and we can instead list in the UI the artifacts from TC APIs or put a link to the TC UI that shows artifacts for a task.

Yes, I know. But instead of only focusing on one aspect - the storage of uploaded artifacts - I think it's worth evaluating whether or not we should be storing the TinderboxPrint data in the table and whether we need to have the /jobdetails/ endpoint at all (I was a little premature in thinking we could do this right away, but now that I've done more research I still think it's worth considering). But like I said, that'll be a next step.

Sarah Clements [:sclements]

Assignee

Updated

5 years ago

Depends on: 1605426

Sarah Clements [:sclements]

Assignee

Updated

5 years ago

No longer depends on: 1625033

GitHub Bugzilla PR Linker

Comment 12

5 years ago

Attached file: Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/6242

Cameron Dawson [:camd]

Comment 13

5 years ago

(In reply to Sarah Clements [:sclements] from comment #11)

> (In reply to Armen [:armenzg] from comment #10)
>
> > I was not asking to remove the jobdetails endpoint or to stop storing anything related to TinderboxPrints.
> > My original request was not to store artifacts in the DB since it has a high write impact and we can instead list in the UI the artifacts from TC APIs or put a link to the TC UI that shows artifacts for a task.
>
> Yes, I know. But instead of only focusing on one aspect - the storage of uploaded artifacts - I think it's worth evaluating whether or not we should be storing the TinderboxPrint data in the table and whether we need to have the /jobdetails/ endpoint at all (I was a little premature in thinking we could do this right away, but now that I've done more research I still think it's worth considering). But like I said, that'll be a next step.

Yeah, I agree with you here, Sarah. With regard to those TinderboxPrint lines, I wonder how we can determine whether they provide any value to folks. It's possible nobody cares about them. I remember some conversations we had with Ed way back when (as you mentioned). We could add a little "deprecated" badge in the UI next to the print lines and see if someone files a bug asking to keep them. Or remove them and see if anybody screams. :D

Joel Maher ( :jmaher ) (UTC -8)

Comment 14

5 years ago

Can you hide them behind a + button and record metrics whenever the + button is expanded to show the data? I would say 45 days of data collection would be enough to make an informed decision.

Sarah Clements [:sclements]

Assignee

Comment 15

5 years ago

Edited

(In reply to Cameron Dawson [:camd] from comment #13)

> Yeah, I agree with you here, Sarah. With regard to those TinderboxPrint lines, I wonder how we can determine whether they provide any value to folks. It's possible nobody cares about them. I remember some conversations we had with Ed way back when (as you mentioned). We could add a little "deprecated" badge in the UI next to the print lines and see if someone files a bug asking to keep them. Or remove them and see if anybody screams. :D

I was planning to send an email on Monday to the dev-platform mailing list about the plan to stop ingesting uploaded artifacts at (roughly) the end of the month. So I could also mention that we are thinking of not processing those TinderboxPrint lines at a later date and solicit feedback on that. They'd still be in the logs for people to look at, but we wouldn't be parsing them to add to the JobDetail table. Maybe we'd get some feedback that way.

I know that Tom.Prince finds value in the "Built by... " URLs parsed from TinderboxPrint, and he suggested he could create a structured artifact - see bug 1625033. (I'm actually realizing I'm not clear on what that means, but could it then be retrieved from the Taskcluster public artifacts API instead?)

> Can you hide them behind a + button and record metrics whenever the + button is expanded to show the data? I would say 45 days of data collection would be enough to make an informed decision.

I don't think how they're displayed is the issue. The question is whether we're ingesting them when no one really looks at that data in the job details or log viewer panels.

Joel Maher ( :jmaher ) (UTC -8)

Comment 16

5 years ago

(In reply to Sarah Clements [:sclements] from comment #15)

> > Can you hide them behind a + button and record metrics whenever the + button is expanded to show the data? I would say 45 days of data collection would be enough to make an informed decision.
>
> I don't think how they're displayed is the issue. The question is whether we're ingesting them when no one really looks at that data in the job details or log viewer panels.

My point wasn't to change our display; my point was to track when people intentionally view the data. If the data is displayed by default, we have no way to track the usage. If the data were behind an API that users had to click to access, then we could easily track usage of the API.

Sarah Clements [:sclements]

Assignee

Comment 17

5 years ago

Ah, I see. That sounds like a good idea and would be a trivial change to make to the UI. But I think we only have logs going back 3 days in Papertrail (anything older is archived) so I'd probably have to create a script to process the logs if we wanted to track usage over several weeks. And then determine a threshold for how many queries per day determines whether it's useful enough to keep around.
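
A script like the one described could be quite small. The following Python sketch counts matching requests per day and lists the days that meet a threshold. It assumes, purely for illustration, that each archived log line begins with an ISO-8601 timestamp (so the first ten characters are the date) and that requests can be recognized by the substring "/api/jobdetail". The real Papertrail archive format may differ, and both function names are hypothetical.

```python
from collections import Counter


def daily_jobdetail_requests(lines, needle="/api/jobdetail"):
    """Count log lines per day that mention the JobDetail endpoint.

    Assumption: every line starts with a timestamp like 2020-04-01T12:00:00Z.
    """
    counts = Counter()
    for line in lines:
        if needle in line:
            day = line[:10]  # "YYYY-MM-DD" prefix of the timestamp
            counts[day] += 1
    return counts


def days_at_or_above(counts, threshold):
    """Return the days, sorted, whose request count meets the threshold."""
    return sorted(day for day, count in counts.items() if count >= threshold)


# Made-up log lines in the assumed format.
sample_lines = [
    "2020-04-01T12:00:00Z app GET /api/jobdetail/?job_id=1",
    "2020-04-01T13:00:00Z app GET /api/jobs/",
    "2020-04-01T14:00:00Z app GET /api/jobdetail/?job_id=2",
    "2020-04-02T09:00:00Z app GET /api/jobdetail/?job_id=3",
]
counts = daily_jobdetail_requests(sample_lines)
busy_days = days_at_or_above(counts, threshold=2)
```

The threshold itself remains a judgment call, as noted above; the sketch only makes the counting step concrete.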

Joel Maher ( :jmaher ) (UTC -8)

Comment 18

5 years ago

That starts to get complicated. Another approach is to determine which jobs have "tinderbox print" statements and ask the owners. Depending on which jobs have the info and what is in it, it could be a small set of people to confirm with.

Sarah Clements [:sclements]

Assignee

Comment 19

5 years ago

On second thought, New Relic might be able to provide some insight since it tracks throughput/requests per minute for a specific API over a period of time. Right now though, mozscreenshots is the primary external consumer so once it stops using that endpoint, it'll be easier to measure.

> That starts to get complicated. Another approach is to determine which jobs have "tinderbox print" statements and ask the owners. Depending on which jobs have the info and what is in it, it could be a small set of people to confirm with.

Yes, and I have yet to look into that. I can, however, see on bug 1342296 that Ed had filed two bugs a while back about the TinderboxPrint lines. So perhaps it's time to follow up with the people involved to see if they are still using them.

Sarah Clements [:sclements]

Assignee

Comment 20

5 years ago

The first PR for the UI changes has been merged: https://github.com/mozilla/treeherder/commit/77bf3ab9a2d159f46b2d1c922e27471d1602aea8

Ionuț Goldan [:igoldan]

Updated

5 years ago

Comment 21

5 years ago

Attached file: Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/6349

Sarah Clements [:sclements]

Assignee

Comment 22

5 years ago

Merged: https://github.com/mozilla/treeherder/commit/d598ad45447ab03c324d88b259417c435a634e83

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

4 years ago

Regressions: 1670064
