Closed Bug 1169264 Opened 9 years ago Closed 9 years ago

Stop storing the raw builds-{pending,4hr} entry as an artifact for buildbot jobs

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

At the moment for buildbot jobs, we store the raw builds-{pending,running,4hr}.js content as an artifact for each job (minus a few properties that are deleted).

Whilst these artifacts are theoretically useful for debugging, we rarely use them - and for builds-4hr we have the daily archives at https://secure.pub.build.mozilla.org/builddata/buildjson/ anyway, so we don't need to store them.

By storing them, we:
* Increase the number of rows stored in the job_artifacts table (up to three rows per job - one for each of pending, running, 4hr) - reducing perf and increasing storage requirements.
* Increase the time/DB impact of processing builds-*.js since we have quite a few more artifacts to insert.
* Presumably also increase the peak memory usage of the buildapi workers, which doesn't help with bug 1165283.

Let's just stop storing them. For builds-4hr issues later we can refer to the daily archives (in fact these are preferred, since the copies we store have certain properties changed/deleted) and for builds-{pending,running} we can just add logging or investigate locally.

This may also help with bug 1165984 - if there are fewer rows in the job_artifact table, maybe the deletes will be less timeout prone.
For mozilla-inbound, these artifacts account for 7.4 million out of 20 million total rows.

> SELECT `name`, COUNT(*) FROM mozilla_inbound_jobs_1.job_artifact GROUP BY `name`

+-------------------+----------+
| name              | COUNT(*) |
+-------------------+----------+
| Bug suggestions   |  2930093 |
| Job Info          |  2996164 |
| Structured Log    |  1046298 |
| buildapi          |  3998498 |
| buildapi_complete |  2606298 |
| buildapi_pending  |  2110130 |
| buildapi_running  |  2729045 |
| json_log_summary  |    19532 |
| privatebuild      |    37274 |
| text_log_summary  |  1866597 |
+-------------------+----------+
10 rows
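
As a quick sanity check of the "7.4 million out of 20 million" figure, the counts from the query output can be summed directly. This is only a sketch: the numbers are copied from the table above, and the variable names are made up for illustration.

```python
# Row counts copied from the query output above (mozilla-inbound).
counts = {
    "Bug suggestions": 2930093,
    "Job Info": 2996164,
    "Structured Log": 1046298,
    "buildapi": 3998498,
    "buildapi_complete": 2606298,
    "buildapi_pending": 2110130,
    "buildapi_running": 2729045,
    "json_log_summary": 19532,
    "privatebuild": 37274,
    "text_log_summary": 1866597,
}

# The three artifact names that hold the raw builds-*.js entries.
raw_names = ("buildapi_complete", "buildapi_pending", "buildapi_running")

total = sum(counts.values())
raw_total = sum(counts[name] for name in raw_names)
share = raw_total / total

print(f"{raw_total:,} of {total:,} rows ({share:.1%})")
# prints: 7,445,473 of 20,339,929 rows (36.6%)
```

So the raw builds-* artifacts make up a little over a third of the rows in that table.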
copied from my PR comment:

I have to admit, the presence of those artifacts is exactly how I tracked down bug 1164545. Without it, I wouldn't have known that they changed which revision they belonged to. Maybe we can talk about how to minimize this, or end of life them sooner?  I find these artifacts really useful for debugging.
Hey Ed--  My concern here is for the ability to debug when ingestion goes wrong at the various stages.  This may be short lived when we move to task cluster bridge (if I understand that correctly).  But could we add logging of this info with the job_id or job_guid so we can track down ingestion errors?

Like you said elsewhere, I guess we can go into the daily archives.  Maybe this PR could add a doc section describing where to find that info?

If we can store the info to debug problems in a more sane way than these artifacts, I'm all for it!!  :)
Comment on attachment 8612321 [details] [review]
Stop storing the buildapi json artifact for pending/completed jobs

Going with an r- asking for:
* docs on how to get info from daily archives
* logging of the pending/running info
Attachment #8612321 - Flags: review?(cdawson) → review-
Typical timing with bug 1164545, just to prove the "they aren't useful very often" claim wrong hehe :-)

For builds-4hr, the archives are in the same location as builds-4hr:
https://secure.pub.build.mozilla.org/builddata/buildjson/

...so as you say, I think we can stop storing them for completed jobs.

For pending, a typical blob looks like:

    {
        "submitted_at": 1433257645,
        "id": 71239699,
        "buildername": "Rev4 MacOSX Snow Leopard 10.6 mozilla-aurora debug test mochitest-5"
    },

Now for pending, we already store the id & buildername as "request_id" and "buildername" in the buildapi artifact. And if either of these were to change between the pending job and the running job, we'd end up with a different guid, and so two jobs inserted. As such, we could just retrieve the id/buildername from the duplicated (and still pending) job to see its original values from builds-pending.
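
To illustrate the guid argument above: if the guid is derived from the request id and buildername, a change in either necessarily produces a different guid and therefore a second job. The sketch below is an assumption made for illustration only; `job_guid` is a hypothetical helper, and hashing the two values with SHA-1 is not claimed to be the exact scheme Treeherder uses.

```python
import hashlib


def job_guid(request_id, buildername):
    """Hypothetical guid: a SHA-1 over the request id and buildername."""
    key = f"{request_id}:{buildername}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# Values taken from the pending blob shown above.
pending = job_guid(
    71239699,
    "Rev4 MacOSX Snow Leopard 10.6 mozilla-aurora debug test mochitest-5",
)

# Same request id, but a different (made-up) buildername.
changed = job_guid(71239699, "some other buildername")

# Same inputs always give the same guid; a changed input gives a new one,
# so the original pending job would remain as a separate row.
same_again = job_guid(
    71239699,
    "Rev4 MacOSX Snow Leopard 10.6 mozilla-aurora debug test mochitest-5",
)
print(pending == same_again, pending == changed)
```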

As for the submitted_at, it's not really used for anything much (vs some of the other properties), so I'm ok with not snapshotting its state directly.

For running, a typical blob is:

    {
        "submitted_at": 1433256673,
        "buildername": "Android armv7 API 9 fx-team build",
        "start_time": 1433257481,
        "number": 125,
        "claimed_by_name": "buildbot-master77.bb.releng.use1.mozilla.com:/builds/buildbot/build1/master",
        "request_ids": [71238489],
        "last_heartbeat": 1433257710,
        "id": 72514416,
        "revision": "a02a945edeb4d38863e59f9cb6b3634a65e92dae"
    },

We can ignore the properties that we don't ingest ("number", "claimed_by_name", "last_heartbeat", "id" [ie build_id]); we currently store them in the artifact unnecessarily. Also buildername can be omitted IMO, for the same reason as pending: if it wasn't the same as the value in the buildapi artifact, we'd have a different job and thus different guid anyway.

This leaves us for a running job, with:

    {
        "submitted_at": 1433256673,
        "start_time": 1433257481,
        "request_ids": [71238489],
        "revision": "a02a945edeb4d38863e59f9cb6b3634a65e92dae"
    },

As such how about this:
1) We stop storing artifacts for builds-4hr
2) We stop storing artifacts for builds-pending
3) For builds-running we trim the artifact down to the four properties shown above?
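
A minimal sketch of step 3, assuming the trimming is done with a simple whitelist of keys. The names `RUNNING_WHITELIST` and `trim_running_blob` are hypothetical and are not taken from the Treeherder source; the sample blob is the running entry shown above.

```python
# The four properties proposed above for the trimmed builds-running artifact.
RUNNING_WHITELIST = ("submitted_at", "start_time", "request_ids", "revision")


def trim_running_blob(blob):
    """Return a copy of a builds-running entry with only whitelisted keys."""
    return {key: blob[key] for key in RUNNING_WHITELIST if key in blob}


running = {
    "submitted_at": 1433256673,
    "buildername": "Android armv7 API 9 fx-team build",
    "start_time": 1433257481,
    "number": 125,
    "claimed_by_name": "buildbot-master77.bb.releng.use1.mozilla.com:/builds/buildbot/build1/master",
    "request_ids": [71238489],
    "last_heartbeat": 1433257710,
    "id": 72514416,
    "revision": "a02a945edeb4d38863e59f9cb6b3634a65e92dae",
}

trimmed = trim_running_blob(running)
print(sorted(trimmed))
```

A whitelist (rather than deleting known-unwanted keys) means any property later added to builds-running.js is dropped by default instead of silently growing the artifact.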

Whilst I like the idea of logging - is it perhaps going to be too verbose? And also still pretty unusable?

Alternatively, how practical would it be to forbid/log if and only if we try to update a job and we end up changing an already-set value? (eg changing the revision) Or would this be too much of a perf hit on the API?
I think there might be a bit of a perf hit checking the values.  Maybe there's a clever way to do that.  Some values can be updated just fine, like the result, of course.  But some could be forbidden.  It would be nice to have that since it would alert us to this issue much faster than "something weird is happening... sometimes..." like bug 1164545.  :) 
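
A rough sketch of what such a check could look like, assuming jobs are available as plain dicts at update time. Everything here is hypothetical (`IMMUTABLE_FIELDS`, `find_conflicts`, `apply_update`, and the choice of which fields count as immutable); it says nothing about the real perf cost, which depends on how the existing row is fetched.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical: fields that should never change once set.
# Fields like "result" are expected to change and are not listed.
IMMUTABLE_FIELDS = ("revision", "buildername")


def find_conflicts(existing, update, immutable=IMMUTABLE_FIELDS):
    """Return {field: (old, new)} for immutable fields the update would change."""
    conflicts = {}
    for field in immutable:
        old = existing.get(field)
        new = update.get(field)
        if old is not None and new is not None and old != new:
            conflicts[field] = (old, new)
    return conflicts


def apply_update(existing, update, job_guid):
    """Merge an update into a job, logging and skipping forbidden changes."""
    conflicts = find_conflicts(existing, update)
    for field, (old, new) in conflicts.items():
        logger.warning(
            "job %s: refusing to change %s from %r to %r",
            job_guid, field, old, new,
        )
    merged = dict(existing)
    merged.update({k: v for k, v in update.items() if k not in conflicts})
    return merged


existing_job = {"revision": "aaa", "result": None}
incoming = {"revision": "bbb", "result": "success"}
merged_job = apply_update(existing_job, incoming, "example-guid")
```

Here the revision stays at its original value and a warning is logged, while the result is updated as normal.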

However, even if the running artifact was just smaller, like you propose, that might help a bit.  If we wanted fewer rows, we could have a buildbot artifact that we update, rather than adding new ones.  Read it in, add more values to the struct, write it back out.  But that *sounds* slow.  Not sure if it is, compared to everything else.

The slim-it-down proposal you make above sounds like the quickest win.
Yeah let's go with the 80% fix for now, and we can see about moving to something else later.
Summary: Stop storing the raw builds-{pending,running,4hr} entry as an artifact for buildbot jobs → Stop storing the raw builds-{pending,4hr} entry as an artifact for buildbot jobs
Comment on attachment 8612321 [details] [review]
Stop storing the buildapi json artifact for pending/completed jobs

PR updated :-)
Attachment #8612321 - Attachment description: Stop storing the raw buildapi json artifact for each job → Stop storing the buildapi json artifact for pending/completed jobs
Attachment #8612321 - Flags: review- → review?(cdawson)
Blocks: 1165984
I've just tested deleting just the complete/pending artifacts on stage's mozilla-inbound job artifacts table (with an optimize before/after) - and it reduced the job artifacts table size on disk by 27%, and that doesn't include the running artefact being trimmed by this PR either. That plus there being 20% fewer rows should make the artifacts table slightly less of a pain wrt cycle-data in bug 1165984.
Comment on attachment 8612321 [details] [review]
Stop storing the buildapi json artifact for pending/completed jobs

Looks great!
Attachment #8612321 - Flags: review?(cdawson) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/3a2869508132f28a123855db7433976d82678985
Bug 1169264 - Stop storing the raw builds-4hr json artifact for each job

We do not need to store the raw builds-4hr json blob for each job, since
a daily archive of completed jobs is kept at:
https://secure.pub.build.mozilla.org/builddata/buildjson/

This will reduce the number of queries required to store each job and
mean there are fewer rows in the jobs artifacts table, improving perf
and reducing disk usage.

https://github.com/mozilla/treeherder/commit/233df1b323828b5923506da195fa021391656027
Bug 1169264 - Store less of the raw pending/running job json blob

Stops storing the raw json from builds-pending.js completely, and halves
the size of the blob stored from builds-running.js (by switching to a
whitelist of important properties).
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 1161618
Blocks: 1176130