Store log information in treeherder database, instead of generic per-project artifacts

RESOLVED FIXED

Status

Tree Management
Treeherder
RESOLVED FIXED
a year ago
9 months ago

People

(Reporter: wlach, Assigned: wlach)

Tracking

(Blocks: 2 bugs)

Details

Attachments

(3 attachments, 1 obsolete attachment)

Comment hidden (empty)
Summary: Store log information in treeherder database, instead of generic per-project → Store log information in treeherder database, instead of generic per-project artifacts
We know what the structure of job logs are, we should just store them in the master database using the Django ORM, much as we do for performance artifacts, and retrieve them as needed using a proper API. This will make them easier to deal with, on both the client and server side. Proposed models:

```
class JobLogStep(models.Model):

    repository = models.ForeignKey(Repository)
    job_id = models.PositiveIntegerField(db_index=True)

    SUCCESS = 0
    SKIPPED = 1
    TEST_FAILED = 2

    RESULTS = ((SUCCESS, 'success'),
               (SKIPPED, 'skipped'),
               (TEST_FAILED, 'testfailed'))

    name = models.CharField(max_length=200)
    started = models.DateTimeField()
    started_linenumber = models.PositiveIntegerField()
    finished_linenumber = models.PositiveIntegerField()
    finished = models.DateTimeField()
    error_count = models.PositiveIntegerField()
    result = models.IntegerField(choices=RESULTS)

class JobLogStepError(models.Model):

    job_log_step = models.ForeignKey(JobLogStep)
    line = models.CharField(max_length=1024)
    line_number = models.PositiveIntegerField()
```

I can think of two ways of implementing this:

1. Implement the new model, migrate existing stuff in one shot (downside: log artifacts will be temporarily inaccessible while migration completes)
2. Implement the new model, but only ingest the new data, don't change anything else (and keep on creating artifacts as before). Add a migrated column for existing artifacts (to track what has been migrated, new artifacts are migrated by default), then start the migration. When everything done, switch over to using new data and delete the artifacts.

I guess it depends if we're willing to accept a limited period where some artifacts are inaccessible. Regardless, this should be relatively easy to prototype and implement.
Blocks: 1178641
Are steps a thing we still want to keep (at all, or at least in its current form)?
The taskcluster jobs only populate some of them properly (hence the dummy steps I had to implement in bug 1188444).
(In reply to Ed Morley (Away 24th-28th March) [:emorley] from comment #2)
> Are steps a thing we still want to keep (at all, or at least in its current
> form)?
> The taskcluster jobs only populate some of them properly (hence the dummy
> steps I had to implement in bug 1188444).

I agree, it's debatable how useful steps are. They're only used by the log viewer, and I suspect the normal use there is to be able to jump to a particular error line. So we just could just consider killing them.
Actually, now that I look at it, we already have a FailureLine model in treeherder which seems similar to my proposal above to JobLogStepError, but they don't seem to be associated with a particular job. James, would it maybe make sense to just extend your model to handle this behaviour?
Flags: needinfo?(james)
FailureLine is only structured lines at the moment. I guess it's possible to imagine a solution where we dump all log lines in there and somehow differentiate the structured and unstructured lines (this is not the way that the autoclassify stuff works right now, but it might be a reasonable refactor).

Things are associated with a job via job_guid. Using (repository, job_id) sounds appealing, but I think that composite primary keys and joining on composite keys is very poorly supported in django.
Flags: needinfo?(james)
Comment hidden (typo)
Comment hidden (typo)
I am working on this now. It's both the next logical step for artifact migration and will be helpful for putting log steps in the database for analysis.
Assignee: nobody → wlachance
Blocks: 1100414

Comment 9

11 months ago
Created attachment 8771557 [details] [review]
[treeherder] wlach:1258861 > mozilla:master

Updated

11 months ago
Blocks: 1278680
Comment on attachment 8771557 [details] [review]
[treeherder] wlach:1258861 > mozilla:master

Ok, I believe this is pretty much ready to go except:

(1) We need a migration script for migrating old logs
(2) We need to split this up into seperate commits so we can stage a migration in the usual manner (migrate old data into new tables, then switch the ingestion)

So this is actually more limited a refactoring than I'd like, but I kept it to more or less "just" changing the storage mechanism to the text log summary in the interests of:

* Not changing too much at once
* Getting datasource out of the tree sooner than later

In particular, there are still some intermediary data structures (TextLogSummary and TextLogSummaryLine) which should probably be merged with one of the new models I'm adding (TextLogError) at some future point. But that may be best done in the context of work on the autoclassification model, which I don't want to get into right now because of the above reasons. ^^^
Attachment #8771557 - Flags: feedback?(james)
Attachment #8771557 - Flags: feedback?(emorley)
Attachment #8771557 - Flags: feedback?(cdawson)

Comment 11

10 months ago
I haven't reviewed the code yet, but before I forget:

But wrt steps, it might be good to keep bug 1245760 in mind.   

tl;dr: there was a job that failed, but we got no failure lines (infra error).  The step itself had a failure code, but we didn't display it as a failure in the log viewer because none of the lines in the step matched a failure regex.  

So while I agree that getting rid of the step parser is probably a good call at this point, we should look for step lines that indicate an error of some kind.  Probably that's just adding a regex to catch it in the parser.
(In reply to Cameron Dawson [:camd] from comment #11)
> So while I agree that getting rid of the step parser is probably a good call
> at this point, we should look for step lines that indicate an error of some
> kind.  Probably that's just adding a regex to catch it in the parser.

FWIW this patch is pretty limited in scope. We keep exactly the same data structure / elements that we had before, they're just now stored in the db with a schema instead of standalone job artifacts.

After discussing this with a lot of people, I think we do in fact want to keep the step parser in some form (and keep storing steps), so we have this data available for later analysis.

Comment 13

10 months ago
Comment on attachment 8771557 [details] [review]
[treeherder] wlach:1258861 > mozilla:master

Since this is just a feedback request, I didn't go through EVERY file.  But this looks good to me.  I went through most of them.  I like the consolidation of the creation of the bug suggestions.
Attachment #8771557 - Flags: feedback?(cdawson) → feedback+

Updated

10 months ago
Attachment #8771557 - Flags: feedback?(emorley) → feedback+

Comment 14

10 months ago
Created attachment 8786471 [details] [review]
[treeherder] wlach:1258861-pre > mozilla:master
Based on previous experience, I think it makes sense to land this in two stages:

1. DB schema changes + migration file
2. Ingestion, webapp, mvc changes

Attached a pull request for just (1)
Comment on attachment 8771557 [details] [review]
[treeherder] wlach:1258861 > mozilla:master

Ok, I'm pretty confident this is ready to go (using the staged deployment strategy I mentioned earlier). James, could you take a look at this when you have a moment? Ideally within the next couple of days (I was hoping to actually move production over to this next week, so I can move on to killing bug suggestion artifacts)

I've described some details of the changes (from the point of view of an external consumer) in the treeherder newsgroup:

https://groups.google.com/forum/#!topic/mozilla.tools.treeherder/iYSee3u2SZg
Attachment #8771557 - Flags: feedback?(james) → review?(james)
Comment on attachment 8771557 [details] [review]
[treeherder] wlach:1258861 > mozilla:master

Looks good so far, have made some comments. Thanks for doing this!
Attachment #8771557 - Flags: review?(james)

Comment 18

10 months ago
We're now at the point where we're pretty much ready to switch stage to Heroku, however I think it will be best to land this prior to doing so (particularly prior to the start of the SCl3 prod -> heroku prod DB replication in bug 1283170).

Does landing this in the next 48 hours sound like something that is viable? :-)
(In reply to Ed Morley [:emorley] from comment #18)
> We're now at the point where we're pretty much ready to switch stage to
> Heroku, however I think it will be best to land this prior to doing so
> (particularly prior to the start of the SCl3 prod -> heroku prod DB
> replication in bug 1283170).
> 
> Does landing this in the next 48 hours sound like something that is viable?
> :-)

For stage? I think it should be. I still need to address some of :jgraham's feedback (see the PR) but there don't seem to be any show-stoppers.

I'd probably want at least a couple days of testing on stage before actually deploying this to production.

Comment 20

10 months ago
Just seen this new cycle-data exception on stage:

django.db.utils:IntegrityError: (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`treeherder_stage`.`text_log_error`, CONSTRAINT `text_log_error_job_id_4b21fee19bc31759_fk_job_id` FOREIGN KEY (`job_id`) REFERENCES `job` (`id`))') 

https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/c6b5ca13-73ff-11e6-a90b-b82a72d22a14_0_7128
(In reply to Ed Morley [:emorley] from comment #20)
> Just seen this new cycle-data exception on stage:
> 
> django.db.utils:IntegrityError: (1451, 'Cannot delete or update a parent
> row: a foreign key constraint fails (`treeherder_stage`.`text_log_error`,
> CONSTRAINT `text_log_error_job_id_4b21fee19bc31759_fk_job_id` FOREIGN KEY
> (`job_id`) REFERENCES `job` (`id`))') 
> 
> https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/
> c6b5ca13-73ff-11e6-a90b-b82a72d22a14_0_7128

Hmm, I think the problem here is that the job log error model has two foreign key references (the job log step and the job itself). We could either tweak the on delete behaviour or remove the job foreign key. I'll probably do the latter, since it shouldn't be necessary (the reference is in the job log step as well).
(In reply to William Lachance (:wlach) from comment #21)
> (In reply to Ed Morley [:emorley] from comment #20)
> > Just seen this new cycle-data exception on stage:
> > 
> > django.db.utils:IntegrityError: (1451, 'Cannot delete or update a parent
> > row: a foreign key constraint fails (`treeherder_stage`.`text_log_error`,
> > CONSTRAINT `text_log_error_job_id_4b21fee19bc31759_fk_job_id` FOREIGN KEY
> > (`job_id`) REFERENCES `job` (`id`))') 
> > 
> > https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors/
> > c6b5ca13-73ff-11e6-a90b-b82a72d22a14_0_7128
> 
> Hmm, I think the problem here is that the job log error model has two
> foreign key references (the job log step and the job itself). We could
> either tweak the on delete behaviour or remove the job foreign key. I'll
> probably do the latter, since it shouldn't be necessary (the reference is in
> the job log step as well).

Latest patch removes the foreign key, let's see how that works. :P
Comment on attachment 8771557 [details] [review]
[treeherder] wlach:1258861 > mozilla:master

I think this is ready for another review.
Attachment #8771557 - Flags: review?(james)
(Assignee)

Updated

10 months ago
Blocks: 1301477

Comment 24

10 months ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/e5a19c182f46ce0317cdeb3156d8c043a650007e
Bug 1258861 - Preparatory work for storing text log steps / errors in main db (#1822)

* Model updates, new db tables, migration script for text log db
Comment on attachment 8771557 [details] [review]
[treeherder] wlach:1258861 > mozilla:master

\o/

Let's hope it works now :)
Attachment #8771557 - Flags: review?(james) → review+

Comment 26

9 months ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/f0105d47f5cd72cbb8742025567c2ca17463f73e
Bug 1258861 - Move text log steps and errors into main database (#1696)

This changes ingestion, the API endpoints, and the frontend to match
the new structure. For now we continue to store text_log_summary artifacts,
though they don't do anything anymore.
This is deployed to production, but leaving open for now. I'll mark as fixed when I've removed the migration script and transitional code.
Depends on: 1302224
Created attachment 8791334 [details] [review]
Stop storing text log artifacts + remove migration script
Attachment #8791334 - Flags: review?(emorley)

Updated

9 months ago
Attachment #8791334 - Flags: review?(emorley) → review+

Comment 29

9 months ago
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/9ffb8c49b5e7cf3fb81e6f7053883a99f5005154
Bug 1258861 - Stop creating text log summary artifacts

We only kept creating them so we could revert, but things are looking
happy on stage and production so I don't think we'll need to do that.

https://github.com/mozilla/treeherder/commit/a1f115e2dd11cea7d70839122b67faa69c5770df
Bug 1258861 - Remove script for migrating text log artifacts
Going to call this done. Any further cleanup can be done in followups.

Updated

9 months ago
Blocks: 1303069
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.