Closed Bug 462814 Opened 11 years ago Closed 11 years ago

Stats collection for Collections

Categories

(addons.mozilla.org Graveyard :: Collections, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fligtar, Assigned: clouserw)

Details

Attachments

(1 file)

We need a way to gather some basic information on the success of Collections.

Would like to add a downloads column to the collections table and the addons_collections table.

When a user is ready to install a bundle of add-ons from a collection, we should ping a URL like /collections/stats/download?ids=1865,983,1534,1234 with the IDs of the selected add-ons.

This URL will increment the counter in the "collections" table for the total number of downloads of that collection, and will increment the counter for each add-on in "addons_collections" for the number of downloads of that add-on from that collection.

The normal add-on download counter should still cover the download so that the add-on gets credit for the download too.
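
A minimal sketch of what the pinged URL could do, assuming SQLite stand-ins for the real tables. The helper names (make_demo_db, record_bundle_install) are hypothetical, and so is passing the collection ID as a separate argument, since the URL above only carries add-on IDs.

```python
import sqlite3
from urllib.parse import urlparse, parse_qs


def make_demo_db():
    """Build an in-memory stand-in for the two tables (schema is an assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE collections (id INTEGER PRIMARY KEY, downloads INTEGER DEFAULT 0)"
    )
    db.execute(
        "CREATE TABLE addons_collections ("
        "collection_id INTEGER, addon_id INTEGER, downloads INTEGER DEFAULT 0)"
    )
    db.execute("INSERT INTO collections (id) VALUES (1)")
    db.executemany(
        "INSERT INTO addons_collections (collection_id, addon_id) VALUES (1, ?)",
        [(1865,), (983,), (1534,), (1234,)],
    )
    db.commit()
    return db


def record_bundle_install(db, collection_id, url):
    """Bump the collection counter once and each selected add-on's counter once."""
    query = parse_qs(urlparse(url).query)
    # "ids" arrives as one comma-separated value, e.g. ids=1865,983,1534,1234
    raw_ids = query.get("ids", [""])[0].split(",")
    addon_ids = [int(x) for x in raw_ids if x.strip().isdigit()]
    if not addon_ids:
        return 0
    with db:  # commits on success, rolls back on error
        db.execute(
            "UPDATE collections SET downloads = downloads + 1 WHERE id = ?",
            (collection_id,),
        )
        db.executemany(
            "UPDATE addons_collections SET downloads = downloads + 1 "
            "WHERE collection_id = ? AND addon_id = ?",
            [(collection_id, addon_id) for addon_id in addon_ids],
        )
    return len(addon_ids)
```

The normal add-on download counter is deliberately not touched here; that would still be handled by the regular download path.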

Users with JS disabled will not have their bundle installs counted, but it's a rare enough case that it shouldn't matter.
We could, worst-case, add an event log in the app when this happens as well with serialized data.
Assignee: nobody → drolnitzky
Here are the stats we'd like to have for collections:

* Count of total users of the web application. This should include visitors to the web app main page, and how many users actually initiated creation of a collection bundle
* Total number of collection bundles downloaded
* Average number of add-ons selected per bundle
* Most popular add-ons selected per bundle (nice to have – not mandatory for launch)
* Number of active users that originated from the collections bundle application (nice to have – not mandatory). Hoping we can use the bandwagon reporting of the URL structure from Fligtar’s Feature Spec for this:
 - The URL structure needs to allow for determining an “active daily subscriber count” based on the number of feed pings each day. This URL would likely need to include the ping interval to eliminate multiple pings per day from the same user.
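
One possible reading of the ping-interval idea, as a rough sketch only: the feature spec itself is not quoted in this bug, so the weighting below is an assumption. If each feed ping reports its own interval in hours, a client pinging every 6 hours sends 4 pings a day, and weighting each ping by interval/24 (capped at 1) makes those 4 pings count as one subscriber. The function name is hypothetical.

```python
def estimate_daily_subscribers(ping_intervals_hours):
    """Estimate subscribers from one day's pings.

    Each element is the ping interval (in hours) reported by a single ping.
    A ping with interval h is weighted h/24, capped at 1, so a client that
    pings several times a day is not counted more than once.
    """
    return sum(min(1.0, hours / 24.0) for hours in ping_intervals_hours if hours > 0)


# Four pings from a 6-hour client plus one ping from a 24-hour client
print(estimate_daily_subscribers([6, 6, 6, 6, 24]))
```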

=====Fligtar's response via grouphub:
The fastest and easiest way to get this data is to ping a URL with the add-on IDs passed as GET parameters when the user installs. So when the user clicks to install a bundle, we just do an AJAX request to addons.mozilla.org/collections/statistics/install?ids=1865,1432,343,1234

Doing that will give us all the data required above except the last one. That pinged URL can increment counters in tables that already exist. (the collections table row for total downloads of that collection, and the addons_collections table for that add-on in that collection)

As for the last one, we can’t use the same mechanism we’re going to be using for Bandwagon subscription counts. The only way we could do this is to set a cookie from AMO that is read by the AMO update check script, and that will be a lot more work and a first in regards to add-on privacy issues.
Hi Fligtar -- is this happening with 4.0.3 or is this a 4.0.4 functionality?
I think we can run stats on these logs:
https://preview.addons.mozilla.org/en-US/firefox/collections/success?i=138,4999,5203

Success pages would get hits from people who completed the install with passed IDs.

That plus urchin should get us the top four bullet points.  Cross referencing this data with updates is difficult because we don't have unique identifiers.  So not sure how we would technically implement the last bullet point.
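
A rough sketch of how stats could be pulled from success-page hits, assuming the request URLs have already been extracted from the access logs (the log format is not shown here, so that step is left out). The function name is hypothetical; the "i" parameter is the one from the URL above.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def summarize_success_hits(urls):
    """Return (bundle count, average add-ons per bundle, add-ons by popularity)."""
    bundles = 0
    addon_counts = Counter()
    for url in urls:
        params = parse_qs(urlparse(url).query)
        # "i" holds the installed add-on IDs as one comma-separated value
        ids = [x for x in params.get("i", [""])[0].split(",") if x.isdigit()]
        if not ids:
            continue  # hit without IDs, not a completed bundle install
        bundles += 1
        addon_counts.update(ids)
    total_addons = sum(addon_counts.values())
    average = total_addons / bundles if bundles else 0.0
    return bundles, average, addon_counts.most_common()
```

This covers total bundles, average add-ons per bundle, and most popular add-ons; visitor counts for the main page would still have to come from somewhere else.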
Assignee: drolnitzky → clouserw
Target Milestone: 4.0.3 → 4.0.4
Checked out Urchin, and I'm not seeing any way to get visitor counts to any of the FYF related pages (e.g. success page, front page, etc.).  Do all these logs have to be processed or is there any data currently available?
Can someone summarize what needs to happen in this bug?  Are we doing the XMLHttpRequest or are we using the success page?
I don't have a preference on what method we use, as long as the stats are accurate and available.
(In reply to comment #6)
> Can someone summarize what needs to happen in this bug?  Are we doing the
> XMLHttpRequest or are we using the success page?

The success page has the advantage that its data is already available from the point of launch onward (provided we don't throw the logs away in the meantime).
I'm looking for some of the stats for this program to-date (as listed above):

Specifically, looking for:
* Daily traffic to the main FYF page from 11/11-present:
https://addons.mozilla.org/en-US/firefox/fashionyourfirefox/
* Number of pageviews/visitors to the FYF confirmation and first-run pages
* Referral traffic to the main FYF homepage (to see whether people were referred via AMO, Google, particular press pubs, etc).
* Average number of add-ons selected per bundle
* Most popular add-ons selected per bundle

If any of this data happens to be in Urchin and I'm missing it, let me know where I can find it and I'm happy to pull myself.
Attached patch: pile of rage
Wow, what a pain. This patch took forever.  The /parse_logs/ code really hates tests and I had to move some stuff around to add them.  Please make sure this patch works correctly since I'd rather not screw up our already shaky stats track record further.

You're going to need a couple new columns:

ALTER TABLE `collections` ADD COLUMN `downloads` int(11) DEFAULT 0;
ALTER TABLE `addons_collections` ADD COLUMN `downloads` int(11) DEFAULT 0;

And you'll want to re-import remora-test-data.sql for some default test data.

Some things to note:

1) I'm assuming the existing `collections`.`subscribers` is not the same as downloads.  Can someone verify that?

2) It seems like we should have a UNIQUE key across addon_id and date in `download_count`.  Otherwise you can get multiple rows for the same addon and the same date.

3) The collection id is currently hardcoded to 1.  We aren't passing a collection id back anywhere so I have no way of knowing which collection people are using.  Fligtar says this is something that needs fixing before bandwagon launches.  If someone can confirm this I'd recommend someone file a bug and make it a blocker of a bandwagon bug.  Who's doing bandwagon stuff?
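
To illustrate note 2, here is a small sketch of what a UNIQUE key across addon_id and date buys, using SQLite rather than the production database. The column names other than addon_id and date, and both helper names, are assumptions for the example.

```python
import sqlite3


def make_db():
    """In-memory stand-in for download_count with the suggested UNIQUE key."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE download_count ("
        "addon_id INTEGER NOT NULL, "
        "date TEXT NOT NULL, "
        "count INTEGER NOT NULL DEFAULT 0, "
        "UNIQUE (addon_id, date))"
    )
    return db


def add_downloads(db, addon_id, date, n):
    """Add n downloads to the single row for (addon_id, date), creating it if needed."""
    with db:
        # The UNIQUE key turns a second insert for the same pair into a no-op
        db.execute(
            "INSERT OR IGNORE INTO download_count (addon_id, date, count) "
            "VALUES (?, ?, 0)",
            (addon_id, date),
        )
        db.execute(
            "UPDATE download_count SET count = count + ? "
            "WHERE addon_id = ? AND date = ?",
            (n, addon_id, date),
        )
```

Without the key, running the log parser twice over the same day could leave two rows for the same add-on and date.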

Comments on requirements:

This patch implements fligtar's suggestions for adding two download columns.

> * Count of total users of the web application. This should include visitors to
> the web app main page, and how many users actually initiated creation of a
> collection bundle
You should be able to get part 1 from urchin or whatever we're using these days and the second can be found by SUM()ing all the downloads in the collections table.

> * Total number of collection bundles downloaded
SUM() of collections.downloads

> * Average number of add-ons selected per bundle
This patch can tell you the total number of add-ons in a bundle and the total number of each add-on installed from that bundle.  If you're asking for numbers regarding how many add-ons each person installed when they installed the bundle this patch doesn't track that.  Is that what you're talking about?

> * Most popular add-ons selected per bundle (nice to have – not mandatory for
> launch)
This sounds like: SELECT downloads from addons_collections where addon_id=X and collection_id=Y.  Is that what you're looking for?

> * Number of active users that originated from the collections bundle
> application (nice to have – not mandatory). Hoping we can use the bandwagon
> reporting of the URL structure from Fligtar’s Feature Spec for this:
>  - The URL structure needs to allow for determining an “active daily subscriber
> count” based on the number of feed pings each day. This URL would likely need
> to include the ping interval to eliminate multiple pings per day from the same
> user.
It sounds like fligtar's response in comment #2 says this isn't going to happen soon so I'm skipping it.

Sorry for the giant patch Fred but I think adding tests is worth it.
Attachment #350279 - Flags: review?(fwenzel)
> You should be able to get part 1 from urchin or whatever we're using these days
> and the second can be found by SUM()ing all the downloads in the collections
> table.

Actually, Urchin 5 has very spotty data (and is missing all of November). Urchin 6 doesn't have any page data available (at least I couldn't find any for the FYF site).  Perhaps that's a report we can enable.  In any case, we need to recommit to using Urchin for AMO web data or find an alternative solution.

> > * Average number of add-ons selected per bundle
> This patch can tell you the total number of add-ons in a bundle and the total
> number of each add-on installed from that bundle.  If you're asking for numbers
> regarding how many add-ons each person installed when they installed the bundle
> this patch doesn't track that.  Is that what you're talking about?

I'm not totally clear what you're saying here -- isn't this the same thing?  Specifically I'm interested in how many add-ons a user selected and installed from a given collection.  Are users selecting and downloading just 1?  5?  all of them?  Fligtar may have other requirements for Bandwagon.

> 
> > * Most popular add-ons selected per bundle (nice to have – not mandatory for
> > launch)
> This sounds like: SELECT downloads from addons_collections where addon_id=X and
> collection_id=Y.  Is that what you're looking for?

I think that's right.
(In reply to comment #11)
> > You should be able to get part 1 from urchin or whatever we're using these days
> > and the second can be found by SUM()ing all the downloads in the collections
> > table.
> 
> Actually, Urchin 5 has very spotty data (and is missing all of November).
> Urchin 6 doesn't have any page data available (at least I couldn't find any for
> the FYF site).  Perhaps that's a report we can enable.  In any case, we need to
> recommit to using Urchin for AMO web data or find an alternative solution.
> 
If data is sporadic that sounds like an IT bug.  The __utm include is on the page.

> > > * Average number of add-ons selected per bundle
> > This patch can tell you the total number of add-ons in a bundle and the total
> > number of each add-on installed from that bundle.  If you're asking for numbers
> > regarding how many add-ons each person installed when they installed the bundle
> > this patch doesn't track that.  Is that what you're talking about?
> 
> I'm not totally clear what you're saying here -- isn't this the same thing? 
> Specifically I'm interested in how many add-ons a user selected and installed
> from a given collection.  Are users selecting and downloading just 1?  5?  all
> of them?  Fligtar may have other requirements for Bandwagon.
> 

This patch can tell you:

Collection X has 5 add-ons in it.
Add-on 1 from collection X was installed 10 times.
Add-on 2 from collection X was installed 8 times.
etc.

You can add them all up and divide by the total number of collection downloads to get an average and you can use their install numbers to get popularity of each add-on in the collections.  Is that good enough?
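
As a worked sketch of that arithmetic: the function name is hypothetical, and the collection download total of 12 is made up for illustration, since only the per-add-on numbers appear above.

```python
def collection_summary(collection_downloads, addon_installs):
    """Return (average add-ons per collection download, add-ons ranked by installs).

    collection_downloads: value of collections.downloads for one collection
    addon_installs: {addon_id: addons_collections.downloads} for that collection
    """
    total_installs = sum(addon_installs.values())
    average = total_installs / collection_downloads if collection_downloads else 0.0
    ranking = sorted(addon_installs.items(), key=lambda item: item[1], reverse=True)
    return average, ranking


# Add-on 1 installed 10 times, add-on 2 installed 8 times, 12 collection downloads
average, ranking = collection_summary(12, {1: 10, 2: 8})
print(average)   # (10 + 8) / 12
print(ranking)   # most-installed add-on first
```

Note this gives an average across all downloads of the collection; it cannot say how many add-ons any individual user picked.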
So to confirm, we can get "collection downloads" -- meaning any one user that downloaded an add-on from a particular collection.

If so, then this is ok with me.
What does that mean?
In other words, I know that we had X number of users that downloaded an add-on from a specific Y collection.  I'm trying to determine the relative value of a certain collection.
Seems like Wil explained that in comment #12 -- but he used X instead of Y.  Wil?
Heh, sorry.

> In other words, I know that we had X number of users that downloaded an add-on
> from a specific Y collection.  

I'm with you here.

> I'm trying to determine the relative value of a certain collection.

I don't know what that means.  The values a collection has are:

1) Total downloads of the collection
2) Number of add-ons in the collection.

I'm on IM if you wanna talk.
Attachment #350279 - Flags: review?(fwenzel) → review+
Comment on attachment 350279
pile of rage

I tried it out, and as far as I can tell both the code and the tests are good.
Thanks Fred, this is r20420. 

David: We didn't talk again since comment #17.  If it's still not clear I'd suggest you run your computations against preview.amo to make sure you're getting whatever you want.

->FIXED
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Where on preview can I see it?
I think there is a phpmyadmin interface to the preview.amo database but I don't know where it is or who has access.  morgamic?

Also, we'll need to run the above SQL before it starts working.
Can anyone tell me where I can find these stats?  I'd like to start looking at how the collections program is performing.
(In reply to comment #22)
> Can anyone tell me where I can find these stats?  I'd like to start looking at
> how the collections program is performing.

Wil is troubleshooting some problems with the script in bug 469737
Any update here?  Do we have accessible stats yet for this?
As mentioned in comment 23, bug 469737 is tracking stats for this.  If the latest comment there ran successfully stats should be in the new database dumps.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
We should be collecting this data by day the way update and download counts are done.
Target Milestone: 4.0.4 → 5.0.1
(In reply to comment #26)
> We should be collecting this data by day the way update and download counts are
> done.

This data isn't tracked on a daily basis - it's just totals.  Comment 2 mentions this and has fligtar's response and I said I was ignoring it due to that in comment 10.

If that's a requirement now we should open a new bug and be specific about what is needed (I've heard everything from just put numbers on a page to use timeplot to draw graphs).
Since bug 473679 and bug 473685 are filed, I'm closing this one.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard