Closed Bug 1372581 Opened 7 years ago Closed 7 years ago

KeyError: 'payload' during store-pulse-resultsets

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: camd)

Details

Attachments

(2 files)

I was trying to grab a sample Pulse message to fix bug 1355812 (I was doing an "Add new jobs" request) and I noticed that the Pulse listener was not getting anything.
I looked in Treeherder's Papertrail and noticed the following.

Searching for "Error transforming resultset" in the Papertrail logs show hundreds of these messages.

This is important to fix as it affects day-to-day tasks for sheriffs.
It is also important to figure out why NewRelic isn't alerting about this.



Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: [2017-06-13 13:36:40,439] ERROR [treeherder.etl.resultset_loader:47] Error transforming resultset 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: Traceback (most recent call last): 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1:   File "/app/treeherder/etl/resultset_loader.py", line 29, in process 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1:     message_body["payload"]) 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: KeyError: 'payload' 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: [2017-06-13 13:36:40,439: ERROR/Worker-1] Error transforming resultset 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: Traceback (most recent call last): 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1:   File "/app/treeherder/etl/resultset_loader.py", line 29, in process 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1:     message_body["payload"]) 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: KeyError: 'payload' 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: [2017-06-13 13:36:40,479] ERROR [treeherder.etl.resultset_loader:47] Error transforming resultset 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: Traceback (most recent call last): 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1:   File "/app/treeherder/etl/resultset_loader.py", line 29, in process 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1:     message_body["payload"]) 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: KeyError: 'payload' 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: [2017-06-13 13:36:40,479: ERROR/Worker-2] Error transforming resultset 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: Traceback (most recent call last): 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1:   File "/app/treeherder/etl/resultset_loader.py", line 29, in process 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1:     message_body["payload"]) 
Jun 13 09:36:40 treeherder-prod app/worker_store_pulse_resultsets.1: KeyError: 'payload' 
Jun 13 09:36:43 treeherder-prod app/worker_store_pulse_jobs.3: [2017-06-13 13:36:43,482] WARNING [treeherder.etl.job_loader:97] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:43 treeherder-prod app/worker_store_pulse_jobs.3: [2017-06-13 13:36:43,482: WARNING/Worker-2] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:47 treeherder-prod app/worker_store_pulse_jobs.1: [2017-06-13 13:36:46,950] WARNING [treeherder.etl.job_loader:97] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:47 treeherder-prod app/worker_store_pulse_jobs.1: [2017-06-13 13:36:46,950: WARNING/Worker-2] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:47 treeherder-prod app/worker_store_pulse_jobs.3: [2017-06-13 13:36:47,313] WARNING [treeherder.etl.job_loader:97] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:47 treeherder-prod app/worker_store_pulse_jobs.3: [2017-06-13 13:36:47,313: WARNING/Worker-3] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:48 treeherder-prod app/worker_store_pulse_jobs.1: [2017-06-13 13:36:48,566] WARNING [treeherder.etl.job_loader:97] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:48 treeherder-prod app/worker_store_pulse_jobs.1: [2017-06-13 13:36:48,566: WARNING/Worker-3] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:51 treeherder-prod app/worker_store_pulse_jobs.4: [2017-06-13 13:36:51,587] WARNING [treeherder.etl.job_loader:97] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:51 treeherder-prod app/worker_store_pulse_jobs.4: [2017-06-13 13:36:51,587: WARNING/Worker-2] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:52 treeherder-prod app/worker_store_pulse_jobs.4: [2017-06-13 13:36:52,209] WARNING [treeherder.etl.job_loader:97] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120 
Jun 13 09:36:52 treeherder-prod app/worker_store_pulse_jobs.4: [2017-06-13 13:36:52,209: WARNING/Worker-3] Pulse job had revision_hash instead of revision: mozilla-central:06e5fb0f474b056f9ca6c1100dc673db4db60120
Thank you for filing - this is definitely something we want to fix, however it's not related to "add new jobs" - this is pulse push ingestion.

Cameron, could you take a look at the KeyError, and I'll look at why the New Relic Python agent isn't catching this?
Component: Treeherder → Treeherder: Data Ingestion
Flags: needinfo?(cdawson)
Priority: -- → P1
Summary: Requesting Treeherder actions is throwing some errors → KeyError: 'payload' during store-pulse-resultsets
Blocks: 1355812
As emorley points out this is a red herring to my bug.
No longer blocks: 1355812
Cameron, do you know when you might be able to look at this?
These are exceptions during data ingestion on production that are presumably causing data loss.
This exception is happening when we get a push for which we don't have an entry in the Repository table.  The exception happens when trying to get info to put into the new relic insights ``record_custom_event``.

I'll create a PR to avoid the exception and still post SOMETHING to insights even if it can't find the data it's looking for.  Hopefully this will help us track down any incorrect event skipping.

Actually, I think this isn't causing any actual data loss.  But this will help us determine that.
Flags: needinfo?(cdawson)
Comment on attachment 8884881 [details] [review]
[treeherder] mozilla:key-error-pulse-push-ingest > mozilla:master

Hey Ed--  This should eliminate this exception and help me track down if there's  legitimate push events that are getting dropped.
Attachment #8884881 - Flags: review?(emorley)
Need-info'ing myself to remind me to check insights when this is deployed to see what kind of reporting we get from these messages.
Flags: needinfo?(cdawson)
Comment on attachment 8884881 [details] [review]
[treeherder] mozilla:key-error-pulse-push-ingest > mozilla:master

Left some comments :-)
Attachment #8884881 - Flags: review?(emorley) → review-
Assignee: nobody → cdawson
Comment on attachment 8884881 [details] [review]
[treeherder] mozilla:key-error-pulse-push-ingest > mozilla:master

Hey there, thanks for the feedback.  Helped prompt me to look a little deeper and find what I think are all the problems now.  :)
Flags: needinfo?(cdawson)
Attachment #8884881 - Flags: review- → review?(emorley)
Comment on attachment 8884881 [details] [review]
[treeherder] mozilla:key-error-pulse-push-ingest > mozilla:master

Left a comment :-)
Attachment #8884881 - Flags: review?(emorley)
Comment on attachment 8884881 [details] [review]
[treeherder] mozilla:key-error-pulse-push-ingest > mozilla:master

Hey Ed--  OK, I made that change we talked about.  :)
Attachment #8884881 - Flags: review?(emorley)
Comment on attachment 8884881 [details] [review]
[treeherder] mozilla:key-error-pulse-push-ingest > mozilla:master

Many thanks :-)
Attachment #8884881 - Flags: review?(emorley) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/3a16745fdf35d25a97b8146094c3fff479b88ea0
Bug 1372581 - Fix exception in error reporting for missing Repository (#2624)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
On New Relic I'm now seeing exceptions during store-pulse-resultsets of form:

File "/app/treeherder/etl/tasks/pulse_tasks.py", line 30, in store_pulse_resultsets
File "/app/treeherder/etl/resultset_loader.py", line 45, in process
File "/app/treeherder/etl/resultset.py", line 70, in store_result_set_data
File "/app/treeherder/etl/resultset.py", line 13, in store_push
exceptions:AttributeError: 'NoneType' object has no attribute 'get'

eg:
https://rpm.newrelic.com/accounts/677903/applications/14179733/filterable_errors#/show/976912c0-67bb-11e7-931c-0242ac110012_0_4022/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart&_k=48kcqw

Holding off on prod deploy for now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Flags: needinfo?(cdawson)
Interesting.  I'm still seeing errors such as these occasionally.  Looks like they all come from the "Servo" repo that I see so far.

In papertrail I'm seeing just prior to the attribute error:
 
Jul 17 15:26:26 treeherder-stage app/worker_store_pulse_resultsets.1: [2017-07-17 22:26:26,268: ERROR/Worker-3] Error fetching commits 
Jul 17 15:26:26 treeherder-stage app/worker_store_pulse_resultsets.1: Traceback (most recent call last): 
Jul 17 15:26:26 treeherder-stage app/worker_store_pulse_resultsets.1:   File "/app/treeherder/etl/resultset_loader.py", line 105, in fetch_resultset 
Jul 17 15:26:26 treeherder-stage app/worker_store_pulse_resultsets.1:     commit["commit"]["author"]["email"]), 
Jul 17 15:26:26 treeherder-stage app/worker_store_pulse_resultsets.1: UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 13: ordinal not in range(128) 

What happens is that I'm catching the exception in ``resultset_loader`` and logging it to new relic and the logs, but not handling it.  So I return a null resultset and then we try to get the "revision" out of the resultset and can't.  I shouldn't be catching that exception there.  But also need to perhaps sanitize my data a bit.

Still working...
So, it ends up this is actually because we had a commit with the user named "Fausto Núñez Alberro".  When I tried to use the string .format() function with that, it bails because I'm trying to put non-ascii characters into a string.  So if I change that field to unicode, it works.  I updated the unit test for this.

In reality, this is not related to the original bug here, just a coincidence of timing and code location.  :)  But still needed fixing.
Flags: needinfo?(cdawson)
Comment on attachment 8887270 [details] [review]
[treeherder] mozilla:attribute-error > mozilla:master

Here's another bundle of joy for you, Ed.
Attachment #8887270 - Flags: review?(emorley)
Ah good spot! I'll take a look tomorrow with fresher eyes :-)
Comment on attachment 8887270 [details] [review]
[treeherder] mozilla:attribute-error > mozilla:master

Many thanks! A couple of review comments but doesn't need re-review after :-)
Attachment #8887270 - Flags: review?(emorley) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/0c4387f1570d611ca229597873690952df4701f4
Bug 1372581 - Better new relic reporting for resultset_loader

Add fields of ``url`` and ``repo`` in case exceptions happen.
This was catching an exception that could happen in
``fetch_resultset`` for github pushes, but wasn't raising it, so
we would get exceptions later due to a null object returned
from that function.  This allows the exception to raise and
not try to do any more processing on that push.

https://github.com/mozilla/treeherder/commit/9183e56d27fb03c0c0469f948da8f50626d1a995
Bug 1372581 - Handle latin characters in commit author names

We use python .format() to concatenate the commit author’s name and
email address.  When we hit an author name with non-ASCII
characters, it would bomb out trying to format it into a String.
So this makes sure to put those fields into a Unicode field to preserve
them.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: