Bug 1534016 (Closed): opened 6 years ago, closed 6 years ago

stackdriver error reporting combines unrelated errors

Categories: Taskcluster :: Operations and Service Requests
Type: task; Priority: Not set; Severity: normal

Tracking: Not tracked

Status: RESOLVED INCOMPLETE

People: Reporter: dustin; Assignee: Unassigned

Attachments: 2 files

The headline for this error is an error from the hooks service from a few days ago. The errors in the "recent samples" list come from the auth service, are about roles, and have a different error message. Yet these were combined.

https://console.cloud.google.com/errors/CJSTprrxmruauQE?time=PT1H&project=heroku-logging

Increasing the time range of that view does, indeed, show the hooks error. It appears that Stackdriver matches only on the first line of the stack:

    at ServerResponse.res.reply (schema.js:76)

Is that configurable?

Even then, it appears to allow some slop, so in practice errors are matched on "happened in a file named X somewhere near line Y". For example,

TypeError: Cannot read property 'user_id' of undefined
    at Handler.identityFromProfile (/app/services/login/src/handlers/mozilla-auth0.js:162:47)
    at Handler.getUser (/app/services/login/src/handlers/mozilla-auth0.js:90:26)

and

TypeError: Cannot read property 'fxa_sub' of undefined
    at Handler.identityFromProfile (/app/services/login/src/handlers/mozilla-auth0.js:172:47)
    at Handler.getUser (/app/services/login/src/handlers/mozilla-auth0.js:90:26)

are considered equivalent.

Which is sort of worse than useless.
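To make the collision concrete, here is a minimal Python sketch of the matching rule hypothesized above. It models the suspected rule only and is not Stackdriver's documented algorithm. `suspected_key` keys on the top frame's file name plus a coarse line bucket (the 20-line bucket is an arbitrary assumption standing in for the "slop"), while `stricter_key` is a hypothetical alternative that also uses the message line and every frame's function, file, and line. All function names here are illustrative.

```python
import re

# Matches Node stack frames such as "at fn (/path/file.js:162:47)" or
# "at /path/file.js:321:27"; the column number is optional.
FRAME_RE = re.compile(
    r"^\s*at (?:(?P<func>.+?) \()?(?P<file>[^()]+?):(?P<line>\d+)(?::\d+)?\)?\s*$"
)


def parse_frames(trace):
    """Return (function, file, line) for each 'at ...' line of a Node trace."""
    frames = []
    for raw in trace.splitlines():
        m = FRAME_RE.match(raw)
        if m:
            frames.append((m.group("func"), m.group("file"), int(m.group("line"))))
    return frames


def suspected_key(trace, bucket=20):
    """Hypothesized rule: top frame's file name plus a coarse line bucket."""
    _func, path, line = parse_frames(trace)[0]
    return (path.rsplit("/", 1)[-1], line // bucket)


def stricter_key(trace):
    """Hypothetical alternative: message line plus every (function, file, line)."""
    message = trace.splitlines()[0].strip()
    return (message, tuple(parse_frames(trace)))


A = """TypeError: Cannot read property 'user_id' of undefined
    at Handler.identityFromProfile (/app/services/login/src/handlers/mozilla-auth0.js:162:47)
    at Handler.getUser (/app/services/login/src/handlers/mozilla-auth0.js:90:26)"""

B = """TypeError: Cannot read property 'fxa_sub' of undefined
    at Handler.identityFromProfile (/app/services/login/src/handlers/mozilla-auth0.js:172:47)
    at Handler.getUser (/app/services/login/src/handlers/mozilla-auth0.js:90:26)"""

print(suspected_key(A) == suspected_key(B))  # True: both ('mozilla-auth0.js', 8)
print(stricter_key(A) == stricter_key(B))    # False: messages and lines differ
```

The tradeoff: exact line numbers and message text separate these two errors, but would also split a single bug into several groups whenever a deploy shifts line numbers or a message embeds variable data.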

Another instance:
https://console.cloud.google.com/errors/CMuy9diB767SgwE?time=P7D&project=heroku-logging

This one seems to match any error with "at process._tickCallback (next_tick.js:61)" in its stack, and anyone familiar with Node stacks will notice that this is the bottommost line of nearly every one. As a result, many unrelated errors are folded into this single error record and are never reported in a way that lets us see them.

One of the raw tracebacks is:

Time:2019-03-08T15:25:40.4539092Z
    at /app/node_modules/fast-azure-storage/lib/queue.js:321:27
    at tryCallOne (/app/node_modules/promise/lib/core.js:37:12)
    at /app/node_modules/promise/lib/core.js:123:15
    at flush (/app/node_modules/asap/raw.js:50:29)
    at process._tickCallback (internal/process/next_tick.js:61:11)

Note that this is also missing the first two (more useful!) lines of the error message.
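The `_tickCallback` problem can be sketched the same way. Below is a minimal Python sketch, again a hypothetical model rather than Stackdriver's actual behavior, contrasting a key built from the bottommost frame with one built from the first frame outside the Node runtime and `node_modules`. The `NOISE` substring list and all function names are assumptions. The first input is the raw Azure traceback quoted in this comment; the second is the login-service trace from earlier with a `process._tickCallback` frame appended purely for illustration.

```python
import re

# Matches Node stack frames such as "at fn (file:line:col)" or "at file:line:col".
FRAME_RE = re.compile(
    r"^\s*at (?:(?P<func>.+?) \()?(?P<file>[^()]+?):(?P<line>\d+)(?::\d+)?\)?\s*$"
)

# Assumed heuristic: paths containing these substrings belong to the Node
# runtime or to third-party packages, so they say little about which bug fired.
NOISE = ("internal/", "next_tick.js", "/node_modules/")


def frames(trace):
    """Return (function, file) for each stack frame line in the trace."""
    matches = (FRAME_RE.match(raw) for raw in trace.splitlines())
    return [(m.group("func"), m.group("file")) for m in matches if m]


def bottom_frame_key(trace):
    """Key on the bottommost frame: nearly every Node stack ends the same way."""
    return frames(trace)[-1]


def first_app_frame_key(trace):
    """Key on the first frame outside NOISE, else fall back to the top frame."""
    fs = frames(trace)
    for func, path in fs:
        if not any(marker in path for marker in NOISE):
            return (func, path)
    return fs[0]


AZURE = """Time:2019-03-08T15:25:40.4539092Z
at /app/node_modules/fast-azure-storage/lib/queue.js:321:27
at tryCallOne (/app/node_modules/promise/lib/core.js:37:12)
at /app/node_modules/promise/lib/core.js:123:15
at flush (/app/node_modules/asap/raw.js:50:29)
at process._tickCallback (internal/process/next_tick.js:61:11)"""

LOGIN = """TypeError: Cannot read property 'user_id' of undefined
    at Handler.identityFromProfile (/app/services/login/src/handlers/mozilla-auth0.js:162:47)
    at Handler.getUser (/app/services/login/src/handlers/mozilla-auth0.js:90:26)
    at process._tickCallback (internal/process/next_tick.js:61:11)"""

print(bottom_frame_key(AZURE) == bottom_frame_key(LOGIN))        # True
print(first_app_frame_key(AZURE) == first_app_frame_key(LOGIN))  # False
```

Every frame in the Azure traceback is runtime or library code, so the sketch falls back to its top frame; without the missing message lines, there is little else in that entry to tell it apart from other errors.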

It looks like the original error report in comment 0 has been replaced with
https://console.cloud.google.com/errors/CKeRk8nN5_jzxgE?time=P7D&project=heroku-logging&organizationId=442341870013

and is again logging several different errors that happen to be handled through the same codepath.

Now filed as a GCP support ticket.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE