Bug 1152896 (treeherder-nr-exceptions)

[Meta] Drive the New Relic exception rate down

Status

Status: NEW
Assignee: Unassigned
Product: Tree Management
Component: Treeherder: Infrastructure
Priority: P3
Severity: normal
Opened: 3 years ago
Last updated: 14 days ago

People

(Reporter: emorley, Unassigned)

Tracking

(Depends on: 9 bugs)

Details

(Reporter)

Description

3 years ago
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors
(Reporter)

Updated

3 years ago
(Reporter)

Comment 1

3 years ago
In the last 7 days:

parse-log
celery.exceptions:Retry
14,486 occurrences (max 10 retries per log)
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3531807536/similar_errors?original_error_id=3531807536
-> Mainly HTTPError(), plus one SSLError('The read operation timed out',)
...we should see if we can get these to display more intelligently: they are the individual retries (so we don't want to suppress them), but they are currently all lumped together rather than split by reason.
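
As a rough sketch of what "split by reason" could look like: build the grouping key from the exception that triggered the retry, not from the Retry wrapper. This assumes the original exception is reachable via an `exc` attribute on the retry exception (my understanding of celery.exceptions.Retry, but worth verifying). `FakeRetry` and `retry_reason` are hypothetical names used only for illustration, and reporting the key to New Relic is deliberately left out.

```python
from collections import Counter


class FakeRetry(Exception):
    """Hypothetical stand-in for celery.exceptions.Retry (assumed to carry .exc)."""

    def __init__(self, message=None, exc=None):
        super().__init__(message)
        self.exc = exc


def retry_reason(error):
    """Return a grouping key that names the underlying cause of a retry."""
    cause = getattr(error, "exc", None)
    if cause is None:
        # No wrapped exception available: fall back to the wrapper's own type.
        return type(error).__name__
    return "{}:{}".format(type(cause).__module__, type(cause).__name__)


# Illustrative data only, not taken from the New Relic traces.
errors = [
    FakeRetry(exc=TimeoutError("The read operation timed out")),
    FakeRetry(exc=MemoryError()),
    FakeRetry(exc=TimeoutError("The handshake operation timed out")),
]
counts = Counter(retry_reason(e) for e in errors)
```

With a key like this, the retries would show up as separate rows per underlying exception type instead of as one large celery.exceptions:Retry bucket.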

parse-log
urllib2:HTTPError
1,496 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3531874326/similar_errors?original_error_id=3531874326
-> all: HTTP Error 404: Not Found
...I've landed an improvement to the logging in bug 1152769, which will give us the info we need to track these down.

fetch-buildapi-running
ssl:SSLError
66 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3531187577/similar_errors?original_error_id=3531187577
-> Combination of "The read operation timed out" and "_ssl.c:495: The handshake operation timed out"

submit-bug-comment
celery.exceptions:Retry
50 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3519054823/similar_errors?original_error_id=3519054823
-> all: 401 Client Error: Authorization Required
...presumably bug 1142258

fetch-buildapi-pending
ssl:SSLError
49 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3531904278/similar_errors?original_error_id=3531904278
...same as the fetch-buildapi-running one above, but for pending

fetch-buildapi-build4h
ssl:SSLError
42 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3531769620/similar_errors?original_error_id=3531769620
...ditto but for build4h

parse-json-log
celery.exceptions:Retry
20 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3528554684/similar_errors?original_error_id=3528554684
-> all: MemoryError() (apart from one timeout similar to the above)
...bug 1152742.

process-objects
exceptions:TypeError
17 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3531829249/similar_errors?original_error_id=3531829249
-> 'NoneType' object has no attribute '__getitem__'
...needs a bug filed.
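
For context, this message is what Python 2 raises when code indexes into None (Python 3 words it as "'NoneType' object is not subscriptable"). A minimal sketch of how it arises and of one defensive lookup follows. The `job` payload, its field names and the `lookup` helper are hypothetical, since the actual failing code path in process-objects has not been identified here.

```python
def lookup(mapping, key, default=None):
    """Index into mapping, tolerating a missing (None) container."""
    if mapping is None:
        return default
    return mapping.get(key, default)


# Hypothetical payload in which an expected sub-object is missing.
job = {"log_references": None}

try:
    job["log_references"]["url"]  # indexing None raises TypeError
except TypeError as e:
    message = str(e)

# The guarded version returns the default instead of raising.
safe = lookup(job["log_references"], "url", default="")
```

Whether a silent default is right depends on the caller; it may be better to log and skip the object so the bad input stays visible.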

fetch-buildapi-build4h
treeherder.etl.mixins:CollectionNotLoadedException
9 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3528253581/similar_errors?original_error_id=3528253581
-> eg "[try] Error posting data to objectstore: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"><title>Service Unavailable</title><style type="text/css">body, p, h1 { font-family: Verdana, Arial, Helvetica, sans-serif;}h2 { font-family: Arial, Helvetica, sans-serif; color: #b10b29;}</style></head><body><h2>Service Unavailable</h2><p>The service is temporarily unavailable. Please try again later.</p></body></html>"
...needs a bug filed.

parse-log
exceptions:UnicodeDecodeError
6 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3527093001/similar_errors?original_error_id=3527093001
-> "'utf8' codec can't decode byte 0xe0 in position 156: invalid continuation byte"
...bug 1091759
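
To illustrate the failure mode: 0xe0 starts a three-byte UTF-8 sequence, so the decoder rejects it when the next byte is not a continuation byte. The sketch below reproduces that in Python 3 and shows one possible mitigation, decoding with errors="replace". The sample bytes are made up, and this is not necessarily the approach taken in bug 1091759.

```python
# Hypothetical log line containing a byte that is not valid UTF-8.
raw = b"caf\xe0 latte"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    reason = e.reason      # why the decoder gave up
    position = e.start     # offset of the offending byte

# Substitute U+FFFD for undecodable bytes instead of raising.
text = raw.decode("utf-8", errors="replace")
```

Replacement loses the original byte, so it suits log lines shown to humans better than data that needs to round-trip.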

submit-bug-comment
requests.exceptions:HTTPError
4 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3519108386/similar_errors?original_error_id=3519108386
...this is the counterpart to the submit-bug-comment Retry entry above.

fetch-buildapi-running
urllib2:URLError
2 occurrences
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors/3529398646/similar_errors?original_error_id=3529398646
-> "<urlopen error timed out>" and "<urlopen error _ssl.c:495: The handshake operation timed out>"

(plus a few other single-occurrence exceptions that are likely just related to infra blips)
(Reporter)

Updated

3 years ago
Depends on: 1154248
(Reporter)

Updated

3 years ago
Depends on: 1154249
(Reporter)

Updated

3 years ago
No longer depends on: 1154249
(Reporter)

Updated

3 years ago
Depends on: 1155647
(Reporter)

Updated

3 years ago
Depends on: 1155702
(Reporter)

Updated

3 years ago
Depends on: 1159934
(Reporter)

Updated

3 years ago
No longer depends on: 1154248
(Reporter)

Updated

3 years ago
Depends on: 1165335
(Reporter)

Updated

2 years ago
Depends on: 1205049
(Reporter)

Updated

2 years ago
Depends on: 1220418, 1220427, 1213939
(Reporter)

Updated

2 years ago
Depends on: 1224931
(Reporter)

Updated

2 years ago
Depends on: 1268676
(Reporter)

Comment 2

2 years ago
The New Relic exception rate on production is currently higher than normal. The KeyError exceptions should now be fixed on master, but there are still lots of others. Please could everyone take a look and see if anything obvious stands out? (I see exceptions relating to auto-classification, perfherder, and possibly some fallout from the refdata datasource changes.)

See:
https://rpm.newrelic.com/accounts/677903/applications/4180461/traced_errors
(It defaults to the last 30 minutes; it's best to switch to the 24 hour or 3 day view to catch the periodic tasks.)

Many thanks! :-)
Depends on: 1265188
The main autoclassification intermittent is also fixed on master.

(FWIW, I think the filterable_errors view, i.e. the Error Analytics panel, is a big improvement.)
(Reporter)

Updated

2 years ago
Depends on: 1272532
(Reporter)

Updated

2 years ago
Depends on: 1277499
(Reporter)

Updated

2 years ago
Depends on: 1233164
(Reporter)

Updated

2 years ago
Depends on: 1277506
(Reporter)

Updated

2 years ago
Depends on: 1277575
(Reporter)

Updated

a year ago
Depends on: 1281808
(Reporter)

Updated

a year ago
Depends on: 1281809
(Reporter)

Updated

a year ago
Depends on: 1281810
(Reporter)

Updated

a year ago
Depends on: 1283413
(Reporter)

Updated

a year ago
Depends on: 1283505
(Reporter)

Updated

a year ago
Depends on: 1283856
(Reporter)

Updated

a year ago
Depends on: 1283859
(Reporter)

Updated

a year ago
Depends on: 1283146
(Reporter)

Updated

a year ago
Depends on: 1284360
(Reporter)

Updated

a year ago
Depends on: 1284418
(Reporter)

Updated

a year ago
Depends on: 1284429
(Reporter)

Updated

a year ago
Depends on: 1284432
(Reporter)

Updated

a year ago
Depends on: 1287111
(Reporter)

Updated

a year ago
Depends on: 1287113
(Reporter)

Updated

a year ago
Depends on: 1287930
(Reporter)

Updated

a year ago
Depends on: 1288202
(Reporter)

Updated

a year ago
Depends on: 1289354
(Reporter)

Updated

a year ago
Depends on: 1289404
(Reporter)

Updated

a year ago
Depends on: 1295536
(Reporter)

Updated

a year ago
Depends on: 1300789
(Reporter)

Updated

a year ago
Depends on: 1301698
(Reporter)

Updated

a year ago
Depends on: 1301700
(Reporter)

Updated

a year ago
Depends on: 1301702
(Reporter)

Updated

a year ago
Depends on: 1306580
(Reporter)

Updated

a year ago
Depends on: 1308122
(Reporter)

Updated

a year ago
Depends on: 1308123
(Reporter)

Updated

a year ago
Depends on: 1308166
(Reporter)

Updated

a year ago
Alias: treeherder-nr-exceptions
(Reporter)

Updated

a year ago
Depends on: 1310053
(Reporter)

Updated

a year ago
Depends on: 1311974
(Reporter)

Updated

a year ago
Depends on: 1311976
(Reporter)

Updated

a year ago
Depends on: 1311977
(Reporter)

Updated

a year ago
Depends on: 1311980
(Reporter)

Updated

a year ago
Depends on: 1311982
(Reporter)

Updated

6 months ago
Depends on: 1368982, 1368984, 1368985
(Reporter)

Updated

6 months ago
Depends on: 1368988
(Reporter)

Updated

6 months ago
Depends on: 1368989
(Reporter)

Updated

6 months ago
Depends on: 1368991
(Reporter)

Updated

4 months ago
Depends on: 1380450
(Reporter)

Updated

14 days ago
Depends on: 1416001