event label for some sumo_suggest1 events is "https:" in GA

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

(Whiteboard: u=user c=feedback p=2 s=input.2015q2)

Attachments

(2 attachments)

Yesterday we pushed out the Thank You page suggestions. One of the things it does is push events to GA for the links users are clicking. This lets us get success metrics without tracking individual sessions or collecting PII.

Today I looked at the Event Flow diagram for the sumo_suggest1 event. There's a group of sumo_suggest1 events for category "view" that have a label of "https:". That's not a complete url, so it feels really fishy.

This bug covers figuring out what's going on here. It feels like a bug somewhere. Maybe an incorrect assumption or a bit of code that's ordered wrong?
Grabbing this to work on now.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
I can't figure out what might be happening and I can't reproduce it.

I threw some error logging in to see if that helps shed some light on the issue.

PR: https://github.com/mozilla/fjord/pull/597
Error logging landed in https://github.com/mozilla/fjord/commit/e840796a297b3e4601bc23659b3a345c83bc42fb

Pushed it out just now. We'll see what happens.
Still getting the weird events, but there's nothing in the journal.

PR for tweaking that code: https://github.com/mozilla/fjord/pull/599

Landed in https://github.com/mozilla/fjord/commit/26f54cec6bb444b72836cfc8548816eb6b7962b2

After landing that, it occurred to me that we're talking 1000 events in the last week. Might as well just log everything for a bit rather than this trial-and-error nonsense.

In a PR: https://github.com/mozilla/fjord/pull/600
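
For readers who want a picture of what "log everything" could look like, here is a minimal sketch. It is not the code from the PR; the logger name and the `log_redirect` helper are hypothetical, and it assumes the redirect url is available as a plain string at the point where the event is sent.

```python
import logging

# Hypothetical logger name, not taken from the Fjord codebase.
logger = logging.getLogger("suggest_debug")


def log_redirect(event_category, url):
    """Log the category and the exact label before it goes to GA.

    %r is used so that empty or truncated strings are easy to spot
    in the log output. The url is returned unchanged.
    """
    logger.info("sumo_suggest1 category=%s label=%r", event_category, url)
    return url
```

With something like this in place, the logged labels can be compared against what GA reports for the same period.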
Possible theories:

1. SUMO isn't sending a full url sometimes

I wrote a script that went through the last week of responses and asked SUMO for suggestions and it returned urls every time. Maybe there's some timing circumstance where this isn't true, but after running my test and looking at the code, I'm skeptical.

2. GA is truncating the event value

GA says that the label field is limited to 500 bytes and the value field has no limit. The urls are definitely not 500 bytes long, so I'm skeptical this is the problem. Even if it were, why truncate after "https:"?

https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference#eventLabel

3. Input is transforming the url into "https:" possibly due to a bad urljoin

I haven't completely ruled this out yet, but I'm skeptical.

4. Sending data to GA during a test run

We don't use "https:" as a string in the tests, but it's possible.

5. Nefarious people

It's possible. I'm not sure how, though. I don't know how to prove/disprove this theory, either.
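
Theories 1 through 3 all come down to whether the label is a complete url at the moment it's sent. A rough sketch of that kind of check is below. It is not the script mentioned above; `label_problems` and the example url are made up for illustration, and the 500-byte figure is the label limit quoted from the GA docs.

```python
from urllib.parse import urlsplit

# Label size limit as quoted from the GA field reference in theory 2.
GA_LABEL_LIMIT_BYTES = 500


def label_problems(url):
    """Return a list of reasons the url would be a suspicious GA event label."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("missing or unexpected scheme")
    if not parts.netloc:
        problems.append("missing host")
    if len(url.encode("utf-8")) > GA_LABEL_LIMIT_BYTES:
        problems.append("longer than the GA label limit")
    return problems


good = "https://support.mozilla.org/kb/example-article"  # made-up url
print(label_problems(good))      # []
print(label_problems("https:"))  # ['missing host']

# One way a full url can collapse to "https:": keeping only the
# first "/"-separated piece of the string.
print(good.split("/")[0])        # https:
```

The last line is only an illustration of how a string operation could produce "https:"; nothing here shows that Input actually does this.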
I spent a couple of hours poking through the GA interface. Other than the Event Flow report, I see no evidence that GA has a bunch of "https:" labels. I see a lot of evidence it's got valid values. The logging I added to Input all looks good.

I think at this point, I'm ready to declare this is some weirdo bug with GA. I can't know for sure because I can't seem to get the right data out of GA to verify one way or the other.

I'll let the error logging go another day and if it discovers nothing, I'll nix it and push off figuring out what to do here until some later date.
Created attachment 8622618 [details]
1434378200.png

Event flow report. That "293" section in the first step is primarily made up of "view" events with "https:" labels.
Created attachment 8622619 [details]
1434029600.png

Screenshot of the "https:" labels.
We logged all the redirection urls for the last day and they all look fine. GA is still showing "https:" labels in the Event Flow report, but nowhere else. I conclude that this must be a bug in GA.

Given that, I'm removing the logging code.

In a PR: https://github.com/mozilla/fjord/pull/603
PR 603 landed in https://github.com/mozilla/fjord/commit/9a0e8ec4adde04c5a237f7e2b0107283bf8a9bd5

Pushed it to prod just now.

Going to close this out because I'm pretty sure it's a GA bug and I don't think there's anything we can do about it.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Whiteboard: u=user c=feedback p= s=input.2015q2 → u=user c=feedback p=2 s=input.2015q2
Product: Input → Input Graveyard