Closed Bug 1416901 Opened 7 years ago Closed 6 years ago

[traceback] upload_crash_report_json_schema crontabber job failing in -stage-new

Categories

(Socorro :: Infra, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

References

Details

https://sentry.prod.mozaws.net/operations/socorro-new-stage/issues/684197/

TypeError: argument of type 'NoneType' is not iterable
  File "crontabber/app.py", line 1053, in _run_one
    for last_success in self._run_job(job_class, config, info):
  File "crontabber/base.py", line 189, in main
    function()
  File "crontabber/base.py", line 259, in _run_proxy
    return self.run(*args, **kwargs)
  File "socorro/cron/jobs/upload_crash_report_json_schema.py", line 50, in run
    connection = connection_context._connect()
  File "socorro/external/boto/connection_context.py", line 201, in _connect
    **self._get_credentials()
  File "boto/__init__.py", line 140, in connect_s3
    return S3Connection(aws_access_key_id, aws_secret_access_key, **kwargs)
  File "boto/s3/connection.py", line 190, in __init__
    validate_certs=validate_certs, profile_name=profile_name)
  File "boto/connection.py", line 572, in __init__
    host, config, self.provider, self._required_auth_capability())
  File "boto/auth.py", line 930, in _wrapper
    if '.cn-' in self.host:


I think there are a couple of possibilities:

1. the crontabber container is missing some configuration or some configuration in the docker/config/ files is wrong

2. the TelemetryS3CrashStorage needs some code fixes to work with the auth scheme we're using in -stage-new

This bug covers fixing this.
As an aside, I didn't realize crontabber used the TelemetryS3 bucket. I'm not sure any of that works right in the local development environment. Should look into that as well.
Making this a P1. This blocks the new infrastructure work.
Priority: -- → P1
Bug #1410167 fixes some TelemetryS3CrashStorage configuration. After that lands, I'll work on this.
Grabbing this to work on.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
I checked Sentry and it looks like we haven't seen one of these since November 21st.

However, this Sentry report first showed up around then, is still occurring, and looks like the same problem:

https://sentry.prod.mozaws.net/operations/socorro-new-stage/issues/707192/

TypeError: argument of type 'NoneType' is not iterable
  File "crontabber/app.py", line 1053, in _run_one
    for last_success in self._run_job(job_class, config, info):
  File "crontabber/base.py", line 189, in main
    function()
  File "crontabber/base.py", line 259, in _run_proxy
    return self.run(*args, **kwargs)
  File "socorro/cron/jobs/upload_crash_report_json_schema.py", line 50, in run
    connection = connection_context._connect()
  File "socorro/external/boto/connection_context.py", line 201, in _connect
    **self._get_credentials()
  File "boto/__init__.py", line 141, in connect_s3
    return S3Connection(aws_access_key_id, aws_secret_access_key, **kwargs)
  File "boto/s3/connection.py", line 194, in __init__
    validate_certs=validate_certs, profile_name=profile_name)
  File "boto/connection.py", line 569, in __init__
    host, config, self.provider, self._required_auth_capability())
  File "boto/auth.py", line 1070, in _wrapper
    if test in self.host:

This cron job with the long name is using an S3ConnectionContext:

https://github.com/mozilla-services/socorro/blob/f75753c9de436378a737481d99fcca791ba0ce57/socorro/external/boto/connection_context.py#L300

I'm pretty sure the connection context classes were fixed so they no longer require an access_id and secret_access_id, so that's not the issue here. The issue is that the boto code is dying on "host", which suggests host is None. That puzzles me because we never set the host.
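For reference, here is a minimal sketch of why a None host produces this exact kind of TypeError. It only mirrors the shape of the `if '.cn-' in self.host:` check from the traceback; it is not boto's actual code, and the names `looks_like_china_region` and `require_host` are made up for illustration.

```python
def looks_like_china_region(host):
    # Same shape as the failing line in the traceback: a substring
    # membership test against whatever value host holds.
    return '.cn-' in host


# With a real hostname the check works as expected.
normal_result = looks_like_china_region('s3.amazonaws.com')

# With host=None, the `in` operator raises TypeError because None
# does not support membership tests.
error_message = None
try:
    looks_like_china_region(None)
except TypeError as exc:
    error_message = str(exc)


def require_host(host):
    # Hypothetical guard (an assumption, not something Socorro or boto
    # provides): fail early with a clearer message when no host was
    # resolved, instead of hitting the TypeError deep inside the library.
    if host is None:
        raise ValueError('S3 host is None; check the connection context configuration')
    return host
```

A guard like `require_host` would not fix the underlying configuration problem, but it would make the failure point at the configuration rather than at boto's auth code.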

Is the crontabber node set up with what it needs to talk to S3?

Tagging Miles to verify that.
Flags: needinfo?(miles)
It's possible it's failing because of the wrong connection_context being set. I did a PR to remove that from the env vars:

https://github.com/mozilla-services/cloudops-deployment/pull/1426

Regardless of whether that PR fixes this issue, we should do it because that's definitely the wrong connection context class to use.
Miles landed the PR and now it should be all set.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Flags: needinfo?(miles)