Closed Bug 965006 Opened 11 years ago Closed 11 years ago

Generate report: number of banners/user

Categories

(Firefox Affiliates Graveyard :: affiliates.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: hoosteeno, Unassigned)

Details

We'd like to know how many banners people have - the average, median and max are probably enough to work with. Affiliates developers can put SQL in this bug and then we can loop in the DBA team.
This query should be sufficient to get us the average and max: SELECT AVG(t.banner_count) AS average_banners, MAX(t.banner_count) as max_banners FROM (SELECT COUNT(badges_badgeinstance.user_id) AS banner_count FROM badges_badgeinstance LEFT JOIN auth_user ON auth_user.id = badges_badgeinstance.user_id GROUP BY auth_user.id) AS t; However, for the median, we need a larger beast that comes mostly from Stack Overflow: SELECT AVG(t1.banner_count) as median_banners FROM ( SELECT @rownum:=@rownum+1 as `row_number`, data.banner_count FROM (SELECT COUNT(badges_badgeinstance.user_id) AS banner_count FROM badges_badgeinstance LEFT JOIN auth_user ON auth_user.id = badges_badgeinstance.user_id GROUP BY auth_user.id) AS data, (SELECT @rownum:=0) AS r ORDER BY data.banner_count ) AS t1, ( SELECT COUNT(*) AS total_rows FROM (SELECT COUNT(badges_badgeinstance.user_id) AS banner_count FROM badges_badgeinstance LEFT JOIN auth_user ON auth_user.id = badges_badgeinstance.user_id GROUP BY auth_user.id) AS data ) AS t2 WHERE t1.row_number IN (floor((total_rows+1)/2), floor((total_rows+2)/2)); Yeah. But, testing locally, it works! And it's just SELECTs so nothing should happen to the data.
:sheeri, is this something you can help us with?
Flags: needinfo?(scabral)
Those queries are right and get good data (I double-checked the median). The queries aren't 100% optimal - the first one uses a derived table, and the 2nd one uses 4 derived tables - historically MySQL hasn't been good at these, but we haven't done much testing about how much faster these are now that we're on MySQL 5.6. However, the affiliates database in production has about 53,000 banners in total, so this query shouldn't take too long. As a one-time or once-a-month or even once-a-week query, this should be fine. Remember that you have a read-only instance you can use, so that just in case this does end up running away from you (if data size grows a lot) it won't affect writes. I'm happy to work on optimizing this query (my initial guess would be to use temporary tables to avoid all the in-memory derived tables, but I'd have to test if that actually makes things better), but before I can prioritize and figure out how much time I should spend on it, I'd need more information about how frequently it's going to be used. Right now the queries each take a second or 2, so they're fast even though they may not be optimal.
Flags: needinfo?(scabral)
Sheeri, thanks! As we discussed in IRC, we don't really need this to be scheduled. We just need it run once, against prod. You can paste the output here. It's not a big rush, but in the next couple of days would be most helpful.
+-----------------+-------------+ | average_banners | max_banners | +-----------------+-------------+ | 1.4471 | 54 | +-----------------+-------------+ 1 row in set (0.17 sec) mysql> SELECT AVG(t1.banner_count) as median_banners FROM ( -> SELECT @rownum:=@rownum+1 as `row_number`, data.banner_count -> FROM (SELECT COUNT(badges_badgeinstance.user_id) AS banner_count FROM badges_badgeinstance LEFT JOIN auth_user ON auth_user.id = badges_badgeinstance.user_id GROUP BY auth_user.id) AS data, -> (SELECT @rownum:=0) AS r -> ORDER BY data.banner_count -> ) AS t1, -> ( -> SELECT COUNT(*) AS total_rows -> FROM (SELECT COUNT(badges_badgeinstance.user_id) AS banner_count FROM badges_badgeinstance LEFT JOIN auth_user ON auth_user.id = badges_badgeinstance.user_id GROUP BY auth_user.id) AS data -> ) AS t2 -> WHERE t1.row_number IN (floor((total_rows+1)/2), floor((total_rows+2)/2)); +----------------+ | median_banners | +----------------+ | 1.0000 | +----------------+ 1 row in set (0.28 sec) Also some checking I did, for the average: mysql> SELECT sum(t.banner_count) as sm, count(t.banner_count) as cnt FROM (SELECT COUNT(badges_badgeinstance.user_id) AS banner_count FROM badges_badgeinstance LEFT JOIN auth_user ON auth_user.id = badges_badgeinstance.user_id GROUP BY auth_user.id) AS t; +-------+-------+ | sm | cnt | +-------+-------+ | 52362 | 36183 | +-------+-------+ 1 row in set (0.15 sec) This next query gets the # of users who have 1 banner (checking that median banner = 1 thing): mysql> SELECT COUNT(*) FROM (SELECT COUNT(badges_badgeinstance.user_id) AS banner_count FROM badges_badgeinstance LEFT JOIN auth_user ON auth_user.id = badges_badgeinstance.user_id GROUP BY auth_user.id) as banner_counts where banner_count=1; +----------+ | COUNT(*) | +----------+ | 27133 | +----------+ 1 row in set (0.20 sec) OK, so out of 36,183 total (as per the cnt in the previous query), 27,133 have 1 banner. that's almost 75%, and I verified using the same query "where banner_count>1" that the result was 9050. So, asked, answered and verified! Makes sense to me!
Resolving; I'm pretty sure you have what you need. If not, reopen and let us know!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Thanks Sheeri, I think so!
Status: RESOLVED → VERIFIED
Product: Firefox Affiliates → Firefox Affiliates Graveyard
You need to log in before you can comment on or make changes to this bug.