Figure out why some celery tasks hitting the soft time limit aren't appearing in New Relic

RESOLVED FIXED

Status

P1
normal
RESOLVED FIXED
2 years ago
2 months ago

People

(Reporter: emorley, Assigned: emorley)

Tracking

(Blocks: 1 bug)

Details

(Assignee)

Description

2 years ago
As part of bug 1215587 we were seeing lots of soft task timeouts in Papertrail. eg:

Soft time limit (900s) exceeded for log-parser[06347a16-dec7-4f58-aded-20c9569aee48] 

These should be showing up in New Relic as exceptions, but weren't. (Unlike hard time limit infractions, soft time limits should give the task enough time to finish up).
(Assignee)

Updated

a year ago
Assignee: emorley → nobody
Status: ASSIGNED → NEW
(Assignee)

Updated

a year ago
See Also: → bug 1387536
(Assignee)

Updated

6 months ago
Depends on: 1483267
(Assignee)

Updated

6 months ago
Depends on: 1483289
(Assignee)

Updated

6 months ago
Depends on: 1483406
(Assignee)

Comment 1

6 months ago
I've spent pretty much all of today going through any SoftTimeLimitExceeded exceptions that appear in the papertrail logs, after the latest spell of bug commenter timeouts.

It turns out we are catching some of the SoftTimeLimitExceeded exceptions - since there are some on New Relic for the log parser. However there are definitely cases where they aren't being reported - some of which I was able to figure out and file specific dep bugs.

This just leaves the original bug commenter case, for which I'm not entirely sure still why it's not showing up (I fully replicated the timeout on the 1200+ weekly bug commenter task locally, but that caught it).

"""
...
Aug 12 08:28:53 treeherder-prod app/worker_default.1: [2018-08-12 07:28:53,213] ERROR [treeherder.intermittents_commenter.commenter:261] error posting comment to bugzilla for bug 1481385 due to 401 Client Error: Authorization Required for url: https://bugzilla.mozilla.org/rest/bug/1481385 
Aug 12 08:28:53 treeherder-prod app/worker_default.1: [2018-08-12 07:28:53,213: ERROR/Worker-3] error posting comment to bugzilla for bug 1481385 due to 401 Client Error: Authorization Required for url: https://bugzilla.mozilla.org/rest/bug/1481385 
Aug 12 08:30:00 treeherder-prod app/worker_default.1: [2018-08-12 07:30:00,081: WARNING/MainProcess] Soft time limit (1800s) exceeded for intermittents-commenter[9d0320f9-6343-4a1f-bd17-8e7a9a56be1c] 
Aug 12 08:30:00 treeherder-prod app/worker_default.1: [2018-08-12 07:30:00,205: ERROR/MainProcess] Task intermittents-commenter[9d0320f9-6343-4a1f-bd17-8e7a9a56be1c] raised unexpected: SoftTimeLimitExceeded() 
Aug 12 08:30:00 treeherder-prod app/worker_default.1: Traceback (most recent call last): 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:   File "/app/.heroku/python/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:     R = retval = fun(*args, **kwargs) 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:   File "/app/.heroku/python/lib/python2.7/site-packages/newrelic/hooks/application_celery.py", line 85, in wrapper 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:     return wrapped(*args, **kwargs) 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:   File "/app/.heroku/python/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__ 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:     return self.run(*args, **kwargs) 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:   File "/app/treeherder/intermittents_commenter/tasks.py", line 10, in run_commenter 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:     process.run() 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:   File "/app/treeherder/intermittents_commenter/commenter.py", line 44, in run 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:     self.print_or_submit_changes(all_bug_changes) 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:   File "/app/treeherder/intermittents_commenter/commenter.py", line 179, in print_or_submit_changes 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:     time.sleep(1) 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:   File "/app/.heroku/python/lib/python2.7/site-packages/billiard/pool.py", line 235, in soft_timeout_sighandler 
Aug 12 08:30:00 treeherder-prod app/worker_default.1:     raise SoftTimeLimitExceeded() 
Aug 12 08:30:00 treeherder-prod app/worker_default.1: SoftTimeLimitExceeded: SoftTimeLimitExceeded() 
Aug 12 08:30:22 treeherder-prod app/worker_default.1: [2018-08-12 07:30:22,029: WARNING/Worker-3] /app/.heroku/python/lib/python2.7/site-packages/newrelic/hooks/database_dbapi2.py:25: Warning: (1003L, u"/* select#1 */ select `treeherder`.`repository`.`name` AS `name`,`treeherder`.`machine_platform`.`platform` AS `platform`,`treeherder`.`bug_job_map`.`bug_id` AS `bug_id` from `treeherder`.`repository` `U0` join `treeherder`.`bug_job_map` join `treeherder`.`job` join `treeherder`.`push` join `treeherder`.`repository` join `treeherder`.`machine_platform` where ((`treeherder`.`job`.`id` = `treeherder`.`bug_job_map`.`job_id`) and (`treeherder`.`push`.`id` = `treeherder`.`job`.`push_id`) and (`treeherder`.`machine_platform`.`id` = `treeherder`.`job`.`machine_platform_id`) and (`treeherder`.`U0`.`id` = `treeherder`.`job`.`repository_id`) and (`treeherder`.`repository`.`id` = `treeherder`.`job`.`repository_id`) and (`treeherder`.`U0`.`active_status` = 'active') and (`treeherder`.`push`.`time` between '2018-08-05 00:00:00' and '2018-08-11 23:59:59.000999'))") 
Aug 12 08:30:22 treeherder-prod app/worker_default.1:   *args, **kwargs) 
"""
Assignee: nobody → emorley
(Assignee)

Comment 2

2 months ago
Fixed by the deps of this bug.
Status: NEW → RESOLVED
Last Resolved: 2 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.