Closed
Bug 1162682
Opened 10 years ago
Closed 10 years ago
Catch exceptions whilst handling objectstore jobs to prevent losing all jobs in that batch
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
References
Details
Attachments
(1 file)
Description
One of the causes of bug 1125476 is hitting exceptions whilst processing the contents of json_blob in the objectstore (as opposed to the worker being killed for infra reasons).
We already handle exceptions during deserialisation of json_blob, but we do not do so for the _load_ref_and_job_data_structs() call afterwards, which is, if anything, riskier.
As a result, if an exception occurs we fail to store any of the other jobs in that batch, leaving up to 100 jobs stuck in the 'loading' state indefinitely, even if the remaining 99 would have been handled successfully.
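As a rough illustration of the intended handling (a sketch only; apart from _load_ref_and_job_data_structs() and mark_object_error(), which are mentioned in this bug, the loop structure and names below are assumptions rather than Treeherder's actual code):

    # Hypothetical sketch: names other than _load_ref_and_job_data_structs()
    # and mark_object_error() are illustrative.
    import json
    import logging

    logger = logging.getLogger(__name__)


    def process_objectstore_batch(jobs_model, object_rows):
        """Store each row of the batch, isolating failures to the row they occur in."""
        for object_id, json_blob in object_rows:
            try:
                data = json.loads(json_blob)
            except ValueError as e:
                # Deserialisation errors were already handled per-row like this.
                jobs_model.mark_object_error(object_id, str(e))
                continue
            try:
                # Previously unguarded: one exception here aborted the rest of
                # the batch, leaving up to 100 rows stuck in 'loading'.
                jobs_model._load_ref_and_job_data_structs(data)
            except Exception as e:
                jobs_model.mark_object_error(object_id, str(e))
                logger.exception("Failed to load objectstore row %s", object_id)

With per-row try/except, one bad json_blob (or a failure while loading its structs) marks only that row as errored, and the rest of the batch is still inserted.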
Assignee
Comment 1 • 10 years ago
Attachment #8603054 - Flags: review?(cdawson)
Updated • 10 years ago
Attachment #8603054 - Flags: review?(cdawson) → review+
Comment 2 • 10 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/8c27f6656b9692549a00b988a208e785ab703229
Bug 1162682 - Catch exceptions whilst processing objectstore jobs
Now if an exception occurs during _load_ref_and_job_data_structs(), we
mark that job in the objectstore as errored and continue inserting the
other jobs. Previously the exception would have meant all of the other
jobs were not inserted, causing up to 100 rows in the objectstore to be
stuck in the 'loading' processed_state indefinitely.
The exception string passed to mark_object_error() isn't ideal, but it's
the same as the handling above, so will do for now until we remove the
objectstore.
In addition, this change means that we lose visibility in New Relic for
these exceptions - and someone has to manually check the objectstore for
jobs with error = "Y". However, in the short term this seems preferable
to dropping 100 jobs every time we get an exception, particularly since
this is already the case for deserialisation exceptions. In a followup
bug we could always try using the New Relic Python agent's
record_exception() to maintain reporting without having to re-raise the
exception ourselves.
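For that followup idea, a hedged sketch of what calling the New Relic Python agent's record_exception() from the except block could look like (assumes the newrelic agent is initialised and a transaction is active; the surrounding function and model names are illustrative, not Treeherder's actual code):

    import newrelic.agent


    def load_row(jobs_model, object_id, data):
        try:
            jobs_model._load_ref_and_job_data_structs(data)
        except Exception as e:
            # With no arguments, record_exception() reports the exception
            # currently being handled to New Relic, so visibility is kept
            # without re-raising and aborting the rest of the batch.
            newrelic.agent.record_exception()
            jobs_model.mark_object_error(object_id, str(e))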
Assignee
Updated • 10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED