Closed Bug 1277726 Opened 8 years ago Closed 8 years ago

New Relic Python agent runtime instrumentation error ("Exiting the transaction but the node stack is not empty")

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P2)

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: emorley, Assigned: emorley)

This only appears to occur on Heroku (both the prototype and the new stage instance), e.g.:
https://rpm.newrelic.com/accounts/677903/applications/14179733/traced_errors/d85d96-f6bfcfde-291f-11e6-b947-b82a72d22a14
(A "SystemExit: 1" during /treeherder.webapp.api.jobs:JobsViewSet.create)

Papertrail has more info:

```
...
  File "/app/.heroku/python/lib/python2.7/site-packages/newrelic-2.64.0.48/newrelic/api/web_transaction.py", line 1342, in _nr_wsgi_application_wrapper_
    result = wrapped(*args, **kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/newrelic-2.64.0.48/newrelic/api/time_trace.py", line 103, in __exit__
    parent = transaction._pop_current(self)
  File "/app/.heroku/python/lib/python2.7/site-packages/newrelic-2.64.0.48/newrelic/api/transaction.py", line 1163, in _pop_current
    assert last == node
AssertionError
...
2016-06-02 17:07:04,601 (265/Dummy-1) newrelic.core.agent INFO - New Relic Python Agent Shutdown
2016-06-02 17:07:04,921 (265/Dummy-1) newrelic.api.transaction ERROR - Runtime instrumentation error. Exiting the transaction but the node stack is not empty. Node stack is [<newrelic.api.transaction.Sentinel object at 0x7f34a934e950>, <FunctionTrace {'label': None, 'params': None, 'rollup': None, 'group': 'Python/WSGI', 'name': 'Application', 'terminal': False}>]. Report this issue to New Relic support.
  File "/app/.heroku/python/lib/python2.7/site-packages/newrelic-2.64.0.48/newrelic/api/transaction.py", line 200, in __del__
    self.__exit__(None, None, None)
```

More complete log output here (there are a few variations):
https://emorley.pastebin.mozilla.org/8874582

-> This is a bug in the New Relic Python agent; I have filed:
https://support.newrelic.com/tickets/196417/edit
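For readers unfamiliar with the two messages in the log, here is a toy model of the invariant they refer to. It is only a sketch: `ToyTransaction`, `run_out_of_order` and the string node names are hypothetical and are not the New Relic agent's code. It shows how closing trace nodes out of order can trip an `assert last == node` style check and leave a node above the sentinel. It is not a diagnosis of what actually happened here.

```python
class ToyTransaction:
    """Hypothetical stand-in for a transaction tracking open trace nodes."""

    def __init__(self):
        # Bottom marker that is never popped (plays the role of a sentinel).
        self.node_stack = ["sentinel"]

    def push(self, node):
        self.node_stack.append(node)

    def pop(self, node):
        last = self.node_stack.pop()
        # Same shape of check as the `assert last == node` in the traceback.
        assert last == node, "nodes closed out of order"

    def finish(self):
        # Anything left above the sentinel is a node that was never closed.
        return self.node_stack[1:]


def run_out_of_order():
    txn = ToyTransaction()
    txn.push("outer")
    txn.push("inner")
    try:
        # Close the outer node while the inner one is still open.
        txn.pop("outer")
    except AssertionError:
        pass
    return txn.finish()
```

In this toy model, `run_out_of_order()` returns `["outer"]`: the failed pop discards the inner node, so the outer node is still on the stack when the transaction finishes.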
Blocks: 1277304
Depends on: 1278267
We've had several replies back and forth on the New Relic ticket - the New Relic Python agent maintainers are currently digging into it.
Assignee: nobody → emorley
We've tried a suggested workaround, but it didn't help. They are looking further.
Had three more "thank you for your patience, we're working on it" replies from New Relic support on the 9th, 13th and 15th of June.
I've since provided them with the exact Heroku slug archive used, since they were having problems reproducing.
We no longer hit the issue after bug 1281056 landed, since we no longer use our own API for jobs ingestion. Third-party submissions are unlikely to hit it, since their requests were not near the 30s threshold anyway (plus Taskcluster is now using Pulse ingestion). As such, this no longer blocks the Heroku migration; however, I'll leave the ticket open to track any resolution on the New Relic side.
No longer blocks: treeherder-heroku, 1277304
Priority: P1 → P2
""" Hi Ed, We realize it has been a while since our last update, but just wanted to let you know that our Engineering team is still investigating the issue. Since it has been a bit harder than expected to track down the cause of the error, we think the best course of action is to file a bug report, close this ticket, and update you outside of this ticket as we make progress. The good news is that the log message you are seeing when this error is triggered looks more worrisome than it really is. What we believe to be happening is that the agent may be dropping the one transaction when this edge-case is hit, but otherwise is functioning normally. Per our normal process, we will be continuing to work on this issue and updating you outside of this ticket as updates are available. Thank you as always for your patience. If you have any other follow ups or if we can clarify anything, please just let us know! Regards, Andrew """ Given we no longer hit this due to changes on our side (comment 5), marking INCOMPLETE for now.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE