[AMO] Track link sharing statistics

RESOLVED FIXED in Unreviewed

Status

--
blocker
RESOLVED FIXED
10 years ago
9 years ago

People

(Reporter: clouserw, Assigned: dre)

Tracking

unspecified
Unreviewed

Details

(Whiteboard: Permissions fixed, backprocessing older data. Should close July 30th EOD.)

(Reporter)

Description

10 years ago
With the landing of bug 475584 we're offering "share this" links on add-on pages which will add links to several popular sites (digg, etc.).  We'd like to track how many links are added for each service so we can show the number next to the link on AMO.

The format of the URL in the logs is:

https://addons.mozilla.org/$lang/$product/addon/$addonid/share?service=$service

INSERT INTO stats_share_counts
(`addon_id`, `service`, `date`, `count`)
VALUES
({$addon_id}, {$service}, now(), 1)
ON DUPLICATE KEY UPDATE count=count+1;

To avoid having to file new stats bugs whenever we change services, we'll handle removing bogus services on our end.  Let us know if there are any questions.  Ideally this is going into effect late on Mar 12th, so if we need to backfill stats, that is the date to start from.
(Reporter)

Comment 1

10 years ago
I left out the middle of my bug, but I meant to say we want to parse that URL and run the following query with correct parameters.
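For illustration, the parse-then-insert step described here might look like the following minimal sketch. It assumes numeric add-on IDs and the URL layout quoted in the description; the names `SHARE_PATH`, `parse_share_url`, and `UPSERT_SQL` are hypothetical, and the `%s` placeholders assume a MySQL driver that uses that parameter style.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical pattern for the path layout quoted in the description:
#   /$lang/$product/addon/$addonid/share?service=$service
# Assumes $addonid is numeric.
SHARE_PATH = re.compile(r"^/[^/]+/[^/]+/addon/(\d+)/share/?$")


def parse_share_url(url):
    """Return (addon_id, service) for a share URL, or None if it does not match."""
    parts = urlsplit(url)
    match = SHARE_PATH.match(parts.path)
    if match is None:
        return None
    services = parse_qs(parts.query).get("service")
    if not services:
        return None
    return int(match.group(1)), services[0]


# Parameterized form of the query from the description. Passing the values as
# bound parameters avoids interpolating log data directly into the SQL string.
UPSERT_SQL = (
    "INSERT INTO stats_share_counts (`addon_id`, `service`, `date`, `count`) "
    "VALUES (%s, %s, now(), 1) "
    "ON DUPLICATE KEY UPDATE count = count + 1"
)
```

Each tuple returned by `parse_share_url` could then be passed as the parameters for one execution of `UPSERT_SQL`; URLs that return `None` would simply be skipped.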
(Reporter)

Updated

10 years ago
Blocks: 482396
Mass update to QA Contact field.  Sorry for the bugspam.
QA Contact: justin → data-reports
(Reporter)

Comment 3

10 years ago
Upping this to blocker since I would like to get bug 482396 into the 5.0.4 push.
Severity: normal → blocker
There is currently a problem with bots crawling these links.  This pollutes the numbers and makes it difficult to count legitimate clicks.

I can't easily throw my current User Agent parsing library at this problem because it is already being used for the extension download parsing and the current implementation is very thread heavy.  Having two instances of it would flood the parsing machine.  Eventually, I'll have a better library, but we don't have a roadmap for that yet.

I could potentially implement a rudimentary bot UA check, but that would take more custom development time, and honestly, I believe all it would take is one or two bots that are not caught by the filter to pretty much ruin the statistics with one big spidering like what we saw on the 15th.

The best solution I could think of when I spoke with Wil was for AMO to implement a secondary redirect page that can be excluded by robots.txt.  If that was done and rolled out, I could have the tracker for this bug done in less than one day.  Otherwise, I wouldn't feel comfortable trying to schedule it before the end of the month.

Assigning to Wil for his thoughts/decision on the robot problem before we move further on our end.
Assignee: nobody → clouserw
Status: NEW → ASSIGNED
(Reporter)

Updated

10 years ago
Depends on: 483690
(Reporter)

Comment 5

10 years ago
We've added the URL to robots.txt on our end.  Note that this changes the format of the URL from:

/$lang/$product/addon/$id/share/

to

/$lang/$product/addon/share/$id/

This will go live on April 9th.  Regarding the problem as it currently stands, I don't think it's a big enough deal to delay stat processing.  I'm assuming the duplicate address filter is still running so at worst we're tracking 1 additional hit per robot.  You said on IRC you could filter out the big robots, so I don't see a problem.  We'll always have a margin of error in the stats and this feels like a comfortable one to me.
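Since logs from before and after the switch would contain different path layouts, a processor would presumably need to accept both. A minimal sketch, assuming numeric add-on IDs and that only the path segment order changes (the names `OLD_FORMAT`, `NEW_FORMAT`, and `addon_id_from_path` are hypothetical):

```python
import re

# Old layout: /$lang/$product/addon/$id/share/
OLD_FORMAT = re.compile(r"^/[^/]+/[^/]+/addon/(?P<id>\d+)/share/?$")
# New layout: /$lang/$product/addon/share/$id/
NEW_FORMAT = re.compile(r"^/[^/]+/[^/]+/addon/share/(?P<id>\d+)/?$")


def addon_id_from_path(path):
    """Return the add-on ID from either share path layout, or None if neither matches."""
    for pattern in (OLD_FORMAT, NEW_FORMAT):
        match = pattern.match(path)
        if match:
            return int(match.group("id"))
    return None
```

Trying the old pattern first and then the new one keeps a single code path usable for both back-processing and current logs.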
Assignee: clouserw → deinspanjer
I spoke with Justin about working this into our schedule.  During that conversation, he suggested the very reasonable alternative of a positive filter that only accepts requests with a common user agent (Gecko|MSIE|KHTML|Opera).  That should filter out almost all bots (unless a bot advertises one of those engines in its UA).  We'll turn that filter off after April 9th when we switch over to the new URL format.
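A positive filter of this kind could be sketched roughly as below. This is only an illustration, not the processor's actual code: it reads the engine list as four alternative tokens, and the names `COMMON_ENGINES` and `looks_like_browser` are hypothetical.

```python
import re

# Engine tokens named in this comment, treated as alternatives (an assumption).
COMMON_ENGINES = re.compile(r"Gecko|MSIE|KHTML|Opera")


def looks_like_browser(user_agent):
    """Return True only when the User-Agent advertises one of the common engines."""
    return bool(user_agent) and COMMON_ENGINES.search(user_agent) is not None
```

Requests with a missing or empty User-Agent are rejected along with unrecognized ones. As noted, any bot whose UA string includes one of these tokens would still pass.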

We're going to take a miss on one of our quarterly goals and I will spend all of next week working on AMO bugs/feature requests.
(Reporter)

Comment 7

10 years ago
Thanks.  Let us know how we can help.
(Reporter)

Comment 8

10 years ago
Can we get a status update or ETA?  Thanks
This work is implemented, but I need to perform one more run of testing against the old format vs the new format.  This new processor will go into the current processing starting Saturday, 4/18 UTC.
This processor is in production and inserts will start being sent to AMO master tomorrow.
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(Reporter)

Comment 11

9 years ago
This got marked fixed but I don't see any data...?

mysql> select * from stats_share_counts;
Empty set (0.03 sec)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Gah!  This thing was all ready to go but the metrics@10.2.72.28 account doesn't have GRANTs for addons_remora.stats_share_counts so it wasn't turned on.  This bug should have been updated to request those permissions, not closed.  I'm sorry for the mix-up.

We had a purge of the staging table that stores this data on June 9th so I don't have the data before that.  We have the data from June 10th to the current day ready to import as soon as the privs are fixed.  Please give me a date range to go back to and I'll reprocess to pick up that older data right away.
Actually, Comment #0 says March 12th was the turn-on date so I'll set up a back-processing from then to June 9th.
(Reporter)

Comment 14

9 years ago
If you need privileges, file a bug and I'll a+ it.  I don't know the specifics of what you need on which tables.
Whiteboard: Permissions fixed, backprocessing older data. Should close July 30th EOD.
Back-processed data to 2009-03-12 has been inserted and the current process has been exporting nightly.  Please let me know if there are any problems.
Status: REOPENED → RESOLVED
Last Resolved: 10 years ago → 9 years ago
Resolution: --- → FIXED