Created attachment 705430
aceproject stats csv
This follows on from an email conversation about AceProject.
The daily downloads listed in the statistics page don't match the entries in the users_install table. E.g. there is one record for an install in the DB table on 2013-01-19 but 32 shown on the statistics page.
See the attached CSV (permission has been given to disclose these figures in this bug). Output of the query that Andy ran:
mysql> select created from users_install where addon_id = 371245
order by created;
| created |
| 2012-05-15 10:31:33 |
| 2012-06-26 10:32:01 |
| 2012-07-17 21:53:17 |
| 2012-07-23 16:48:41 |
| 2012-08-13 16:23:22 |
| 2012-10-01 15:05:02 |
| 2012-10-27 23:55:54 |
| 2012-12-25 07:11:24 |
| 2013-01-09 12:56:37 |
| 2013-01-19 13:27:50 |
| 2013-01-22 09:27:10 |
| 2013-01-22 17:59:08 |
12 rows in set (0.02 sec)
Idle speculation on my part: the users_install table only records logged-in users, and on days without a logged-in user install the total installs aren't being recorded.
Or possibly all downloads between those dates are being 'rolled up' into that day, i.e. there were 32 downloads between 2013-01-10 and 2013-01-19 and they're all being shown on a single day.
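To compare the table against the statistics page, the rows from the query output above can be grouped by calendar day. This is only a minimal sketch: the timestamps are copied from that output, and the function name installs_per_day is made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Timestamps copied from the users_install query output above.
CREATED = [
    "2012-05-15 10:31:33",
    "2012-06-26 10:32:01",
    "2012-07-17 21:53:17",
    "2012-07-23 16:48:41",
    "2012-08-13 16:23:22",
    "2012-10-01 15:05:02",
    "2012-10-27 23:55:54",
    "2012-12-25 07:11:24",
    "2013-01-09 12:56:37",
    "2013-01-19 13:27:50",
    "2013-01-22 09:27:10",
    "2013-01-22 17:59:08",
]


def installs_per_day(timestamps):
    """Count rows per calendar day (hypothetical helper)."""
    days = (
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()
        for ts in timestamps
    )
    return Counter(days)


counts = installs_per_day(CREATED)
for day, n in sorted(counts.items()):
    print(day, n)
```

Any day where the statistics page shows more installs than this count is a day the table does not account for.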
We are logging each install once per user. And since we allow anonymous installs, this means that for the second installation by any anonymous user, request.amo_user=None, and we record the install only once: https://github.com/mozilla/zamboni/blob/master/mkt/receipts/views.py#L84
We should probably key off the user if the user is logged in. Otherwise, insert a new record for anonymous users.
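The proposed behaviour could look roughly like the sketch below. It is not the zamboni code: InstallLog is a hypothetical in-memory stand-in for the users_install table, and it assumes "key off user" means one row per logged-in user per app, with every anonymous install getting its own row.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class InstallLog:
    """Hypothetical in-memory stand-in for the users_install table."""

    rows: List[Tuple[int, Optional[int]]] = field(default_factory=list)

    def record(self, addon_id: int, user_id: Optional[int]) -> bool:
        """Record an install; return True if a new row was written."""
        if user_id is not None and (addon_id, user_id) in self.rows:
            # Logged-in user who already has a row for this app: skip.
            return False
        # First install by a logged-in user, or any anonymous install
        # (user_id is None): always write a new row.
        self.rows.append((addon_id, user_id))
        return True


log = InstallLog()
log.record(371245, 1)     # logged-in user, first install
log.record(371245, 1)     # same user again, not recorded twice
log.record(371245, None)  # anonymous install
log.record(371245, None)  # second anonymous install, recorded as well
print(len(log.rows))
```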
Aim for this week on this one since our stats stuff is changing and it would be good to have it accurate first.
If I understand the code correctly, the CSV file is being generated out of ES:
I have to go looking for where we are indexing this data to see if maybe that is the cause for the clumpiness.
Looking at the indexing task, everything appears fine there, and I can't find the flaw that causes only some days to have data while the rest are 0.
I am pretty sure, unless someone has some other data, that the anonymous user download/install tracking is a separate issue. I am going to file it as a separate ticket.
Filed bug 836586 for the anonymous user issue.
This is going to require dumping the stats index and reindexing all of the days so far. I'm not sure where I should put that information, but the bug causing this is fixed.
You're talking about reindexing just marketplace stats?
Yes, it was the mkt.stats.search that was wrong, only marketplace uses that.
Did you mean to mark this fixed rather than unconfirmed?
(In reply to Wraithan from comment #9)
> Yes, it was the mkt.stats.search that was wrong, only marketplace uses that.
How do we clear them? Just give us the commands we need to run when we push.
curl -XDELETE http://localhost:9200/amo_stats/users_install/
Replace localhost:9200 with wherever ES runs in prod, and amo_stats with the value of settings.ES_INDEXES['users_install'], which defaults to amo_stats.
This will repopulate data from all time. It has chunking set up, so it will do its best not to run queries etc. that are too large.
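For the delete step, the URL from the curl command can also be built from the host and index name rather than edited by hand. A minimal sketch, assuming Python 3; delete_request is a made-up helper, and its defaults mirror the values in the curl command above. It only constructs the request and does not send it.

```python
from urllib.request import Request


def delete_request(es_host, index="amo_stats", doc_type="users_install"):
    """Build (but do not send) the DELETE request (hypothetical helper).

    es_host is "host:port" for wherever ES runs; index should be the
    value of settings.ES_INDEXES['users_install'] in the target
    environment.
    """
    url = "http://{0}/{1}/{2}/".format(es_host, index, doc_type)
    return Request(url, method="DELETE")


req = delete_request("localhost:9200")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually issue the delete, so check the printed URL against the prod settings first.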
Going out today. Verify on Friday.