Bug 950358 (Closed): Opened 11 years ago, Closed 10 years ago

[useless traceback] AttributeError: 'Options' object has no attribute 'object_name'

Categories

(support.mozilla.org :: Code Quality, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

References

Details

(Whiteboard: u=dev c=codequality p= s=2013.backlog)

Recently, we started getting these and now we get lots and lots and lots.


Traceback (most recent call last):

  File "/data/www/support.mozilla.org/kitsune/vendor/src/raven-python/raven/utils/serializer/manager.py", line 73, in transform
    return self.transform(repr(value), **kwargs)

  File "/data/www/support.mozilla.org/kitsune/vendor/src/django/django/db/models/options.py", line 198, in __repr__
    return '<Options for %s>' % self.object_name

AttributeError: 'Options' object has no attribute 'object_name'


Request repr() unavailable.
Two things:

1. I saw my first one on 11/15/2013 at 10:31pm Eastern Time. I got 5 at that time and didn't get another one until ....

2. The first one of the recent batch was on 12/12/2013 at 7:41pm Eastern Time. Since then, I've gotten like 1700 or so. Seems like it might be related to something we pushed out then. We should be able to track that down pretty quickly given that we record all deployments.


Additional thought:

Clearly the error here is Raven trying to repr() something so it can include that something in the report for whatever error is actually happening. I.e., this is an error triggered by trying to show the context for a different error. I don't know what happens when Raven hits this error. Does it send this error email and give up? Does it send this error email and then report the original error to Sentry? If it's the latter, then we should see a plethora of errors in Sentry, and we can fix the original error and ignore this one.
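
To make that concrete, here's a toy sketch of the failure mode I think we're hitting. This is not raven's or Django's actual code, just the shape of it:

    class HalfInitialized(object):
        # Stand-in for Django's Options: __repr__ assumes an attribute that
        # may not exist yet, just like '<Options for %s>' % self.object_name.
        def __repr__(self):
            return '<Options for %s>' % self.object_name  # AttributeError

    def naive_transform(value):
        # Fall back to repr() with no guard, like the traceback above.
        return repr(value)

    try:
        raise ValueError('the original error we actually care about')
    except ValueError:
        # Pretend we're building the error report and serializing local context.
        context = {'opts': HalfInitialized()}
        for name, value in context.items():
            try:
                naive_transform(value)
            except AttributeError as exc:
                # This is what we get mailed about; the ValueError is hidden.
                print('repr() blew up while reporting the real error: %s' % exc)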


One last additional thought:

This kind of stuff from Raven drives me crazy. I'm a huge +1 on finding out if this is fixed in a recent version of Raven and if not, forking it and fixing it in our fork. It shouldn't send us a gazillion emails because something throws an error when it's repr()d. That's dumb and not helpful. The whole point of using Sentry was not to get floods of email.
Putting it in this sprint and making it a P1 on the off chance that it's indicative of some bigger problem which has had 1700 occurrences in the last 2 days.
Priority: -- → P1
Whiteboard: u=dev c=codequality p= s=2013.25
Grabbing this to look into.
Assignee: nobody → willkg
There's also this completely useless traceback:

Traceback (most recent call last):

  File "/data/www/support.mozilla.org/kitsune/vendor/src/raven-python/raven/utils/serializer/manager.py", line 73, in transform
    return self.transform(repr(value), **kwargs)

  File "/data/www/support.mozilla.org/kitsune/vendor/src/django/django/db/models/related.py", line 50, in __repr__
    return "<RelatedObject: %s related to %s>" % (self.name, self.field.name)

AttributeError: 'RelatedObject' object has no attribute 'name'


Request repr() unavailable.


I'm betting they're related. It even has "related" in the class name. QED.
New Relic shows a deployment on 12/12/2013 at 16:42. NR doesn't tell us what fucking tz that is, but that's the only nearby deployment, so if this is related to something we changed in code, then that's the candidate.

The things that went out with that are:

3b8e84412efae785ab870939fce456e7d274ad24 Fix escalate cron test.
6659c562e47323bc66f79372e50e3861172457be [Bug 928899] Add login tokens to answer emails.
2a0bbc8a119c9e8594c63f5d1b47f02af1f772a2 [bug 946498] Add a low end (7 days) to the range for question escalation.

Two things:

1. HOLY CRAP I LOVE JAMES SOOOOOO MUCH!
2. SRSLY! JAMES IS THE BEST!
I checked Sentry and don't see any events that correlate with the emails I'm getting. I suspect my worst fear is true: Raven is throwing an error when trying to tell us about an error and then just dies.

I'm going to assume that's the case and go from there.


3b8e84412efae785ab870939fce456e7d274ad24 Fix escalate cron test.

This literally just changes some numbers in TestEscalateCron and shouldn't affect production.


6659c562e47323bc66f79372e50e3861172457be [Bug 928899] Add login tokens to answer emails.

I looked for new code and changed code that might kick off when modules are loaded and didn't see anything obvious. Sometimes middleware is "interesting", so one possibility is to wrap the TokenLoginMiddleware.process_request in some exception handling code. I know we have a problem with middlewares reading request.GET and Django kicking up an IOError (we should fix that one, too, but that's a separate bug.)

Are we guaranteed to always have a request.user in TokenLoginMiddleware.process_request? What ensures that guarantee?
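
For reference, the kind of exception handling I'm thinking about is something like this. It's a hypothetical sketch, not the actual kitsune middleware, and the 'auth' query parameter name is a guess:

    import logging

    log = logging.getLogger(__name__)

    class DefensiveTokenLoginMiddleware(object):
        # Hypothetical shape only; the real TokenLoginMiddleware may differ.
        def process_request(self, request):
            try:
                # Don't assume request.user exists at this point.
                user = getattr(request, 'user', None)
                token = request.GET.get('auth')  # parameter name is a guess
                if user is None or not token:
                    return
                # ... token validation / login would happen here ...
            except Exception:
                # A broken login token shouldn't take down the request, and it
                # definitely shouldn't turn into a flood of error emails.
                log.exception('token login middleware failed; ignoring')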

Probably unrelated: TokenLoginBackend.authenticate doesn't catch any errors. If we get malformed auth tokens, I'm pretty sure that'll cause problems there. For example, base64.b64decode can throw a TypeError.
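
Something along these lines would cover the malformed-token case. Again, a hypothetical sketch, not the actual TokenLoginBackend code:

    import base64

    class DefensiveTokenLoginBackend(object):
        def authenticate(self, auth=None):
            if not auth:
                return None
            try:
                raw = base64.b64decode(auth)
            except (TypeError, ValueError):
                # b64decode raises TypeError on Python 2 and binascii.Error
                # (a ValueError subclass) on Python 3 for malformed input.
                # A garbage token should mean "not authenticated", not a 500.
                return None
            # ... look up the user from `raw` here ...
            return None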


2a0bbc8a119c9e8594c63f5d1b47f02af1f772a2 [bug 946498] Add a low end (7 days) to the range for question escalation.

This one re-adds the escalate questions cron job which runs at 59 minutes past the hour. I don't see a correlation between the error emails I'm getting and the minutes past the hour. Also, they're not periodic. I don't think they're related to a cron job.

This changeset just re-enables the cron job with some minor changes. The changeset that adds the cron job is 1c9c5aa.


1c9c5aa [bug 946498] Automatically escalate questions.

This just adds some cron job code. I don't see anything that should have a non-periodic symptom if it was erroring out.


I'm going to grab this one and tweak the auth token login code and see where that gets us.
Status: NEW → ASSIGNED
I did some minor cleanup for the token auth code. In PR: https://github.com/mozilla/kitsune/pull/1775
I pushed those changes to prod and promptly got a bunch of emails. It's possible those emails were from bouncing things. We'll have to see. I'm betting that my changes didn't fix it, though.

Next steps include:

1. updating raven-python -- maybe that'll fix something
2. forking raven-python and instrumenting that code to see if we can get a better error message
3. updating django (bug #889467)
4. rewriting in node.js -- that seems to work for everyone else
We're running raven-python 3.1.16 and the current version is 3.6.1. There are a lot of changes between the two versions, and a handful of them touch that transform method or are nearby. I can't reproduce the issues we're seeing in production in my local environment, so it's hard to know for sure whether this will make things better, make things worse, or have no effect.

However, I think there's a good chance this will make things better.

In a PR: https://github.com/mozilla/kitsune/pull/1777
Landed the raven-python update in master in https://github.com/mozilla/kitsune/commit/2c3fecb

Pushed that to production.

I'm still seeing these error messages. The traceback is slightly different now:

Traceback (most recent call last):

  File "/data/www/support.mozilla.org/kitsune/vendor/src/raven-python/raven/utils/serializer/manager.py", line 76, in transform
    return repr(value)

  File "/data/www/support.mozilla.org/kitsune/vendor/src/django/django/db/models/options.py", line 198, in __repr__
    return '<Options for %s>' % self.object_name

AttributeError: 'Options' object has no attribute 'object_name'


Request repr() unavailable.


I think the next step is to fork raven-python and instrument the code so we can get a better error email.

But that's a project for next year.
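
For the record, the instrumentation I have in mind is roughly this: wrap the repr() fallback that shows up in the tracebacks so the email tells us which class failed to repr. This is just the shape of the change, not a drop-in patch for raven/utils/serializer/manager.py:

    import logging

    log = logging.getLogger('raven.serializer')

    def transform_fallback(value):
        # Corresponds to the `return repr(value)` line in the tracebacks above.
        try:
            return repr(value)
        except Exception as exc:
            # Tell us *what* failed to repr, not just that something did.
            log.warning('repr() failed for %s.%s instance: %s: %s',
                        type(value).__module__, type(value).__name__,
                        type(exc).__name__, exc)
            return '<unreprable %s object>' % type(value).__name__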
Whiteboard: u=dev c=codequality p= s=2013.25 → u=dev c=codequality p= s=2014.1
Saw this one today, and it's probably related-ish, if only in the sense that it's probably Raven obscuring the real error:

Traceback (most recent call last):

  File "/data/www/support.mozilla.org/kitsune/vendor/src/raven-python/raven/utils/serializer/manager.py", line 76, in transform
    return repr(value)

  File "/data/www/support.mozilla.org/kitsune/vendor/src/celery/celery/datastructures.py", line 326, in __repr__
    return repr(dict(self.iteritems()))

  File "/data/www/support.mozilla.org/kitsune/vendor/src/celery/celery/datastructures.py", line 341, in _iterate_items
    return ((key, self[key]) for key in self)

  File "/data/www/support.mozilla.org/kitsune/vendor/src/celery/celery/datastructures.py", line 329, in __iter__
    return self.iterkeys()

  File "/data/www/support.mozilla.org/kitsune/vendor/src/celery/celery/datastructures.py", line 337, in _iterate_keys
    return uniq(self._iter(lambda d: d.iterkeys()))

  File "/data/www/support.mozilla.org/kitsune/vendor/src/celery/celery/datastructures.py", line 334, in _iter
    return chain(*[op(d) for d in reversed(self._order)])

  File "/data/www/support.mozilla.org/kitsune/vendor/src/celery/celery/datastructures.py", line 337, in <lambda>
    return uniq(self._iter(lambda d: d.iterkeys()))

AttributeError: 'DictAttribute' object has no attribute 'iterkeys'


Request repr() unavailable.
I forked raven-python to do some work on it and read through the code involved. It suggests that when repr() fails, we'll get an error email, but that the raven-python code actually does recover and sends the original error to Sentry. 

Earlier, I said I didn't see evidence of a spike of errors in Sentry corresponding with the spikes of error emails, and that therefore raven-python was probably dying.

However, I think that theory is incorrect because:

1. Sure seems like the code is right.

2. It's likely that raven-python kicks up a bunch of error emails from repr failures in order to send one report to Sentry. Ergo, there won't necessarily be a spike of errors in Sentry corresponding with a spike in error emails. Further, we're talking about the Options object and it's very likely that there's more than one in a given context.


So... I don't think it's helpful to instrument raven-python right now and the risks are pretty high since we can only test this effectively in production.

Instead, I think it's worth pruning the errors in Sentry, and at some point maybe the problems will go away.
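
To illustrate why I think one Sentry event can map to several repr-failure emails, here's a toy version of what serializing a frame's locals looks like (not raven's actual code):

    class BrokenRepr(object):
        def __repr__(self):
            raise AttributeError('no object_name yet')

    def serialize_context(context):
        out = {}
        failures = 0
        for name, value in context.items():
            try:
                out[name] = repr(value)
            except Exception:
                failures += 1          # each of these would be one error email
                out[name] = '<unserializable>'
        return out, failures

    frame_locals = {'opts': BrokenRepr(), 'related': BrokenRepr(), 'answer': 42}
    serialized, emails = serialize_context(frame_locals)
    print(serialized)
    print('one report, %d repr failures' % emails)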
Made this depend on upgrading to django 1.5. Pushing to backlog.
Whiteboard: u=dev c=codequality p= s=2014.1 → u=dev c=codequality p= s=2013.backlog
Upgrading django to 1.5 didn't help; we're still seeing the error emails as well as the copy/deepcopy issues.
Haven't seen any of these in ages. I suspect they stopped around the time we switched from threads to processes. Closing it as FIXED.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED