Closed Bug 611706 Opened 14 years ago Closed 13 years ago

Sometimes the line chart isn't drawn

Categories

(addons.mozilla.org Graveyard :: Statistics, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: clouserw, Assigned: jbalogh)

References

Details

This is more reproducible when I change to custom dates, but it's not 100%.

To reproduce:

1) Load something like https://addons.mozilla.org/z/en-US/firefox/addon/1865/statistics/?start=20100701&end=20101112

2) If you're (un)lucky the line chart won't appear.  Mousing over it will show the numbers as you'd expect, it's just not drawn.

A force refresh will make it show up.
Assignee: nobody → thepotch
Target Milestone: 5.12.4 → 5.12.7
Tried this today in minefield and chrome and couldn't reproduce.
Target Milestone: 5.12.7 → Q1 2011
Target Milestone: Q1 2011 → Q2 2011
(In reply to comment #1)
> Tried this today in minefield and chrome and couldn't reproduce.

I can no longer reproduce the "mousing over it will show the numbers" part, but on adblock-plus I have yet to see a graph today for any combination of dates (or just the defaults)
Target Milestone: Q2 2011 → Q3 2011
Alright, the mouse over part isn't relevant anymore, but it's true that graphs aren't getting drawn.  Looks like the HTTP requests timeout after 45s or so.  I suspect too slow of SQL queries
Assignee: thepotch → nobody
Assignee: nobody → ashort
Target Milestone: Q3 2011 → 6.1.8
Generating the JSON with downloads stats does two SQL queries. 

For the URL given in the reproduce instructions it does:

    SELECT `download_counts`.`id` FROM `download_counts` WHERE (`download_counts`.`date` BETWEEN '2010-07-01' and '2011-11-12' AND `download_counts`.`addon_id` = 1865 )


    SELECT `download_counts`.`id`, `download_counts`.`addon_id`, `download_counts`.`count`, `download_counts`.`date`, `download_counts`.`src` FROM `download_counts` WHERE (`download_counts`.`date` BETWEEN '2010-07-01' and '2010-11-12' AND `download_counts`.`addon_id` = 1865  AND `download_counts`.`id` IN (26384390, 26277676, 26375220, 25533443, 26013270, 25602550, 25731403, 26098479, 25996331, 26081796, 26393383, 25593805, 25290450, 25936430, 25340002, 25987851, 25273561, 25867377, 25576852, 26030431, 25611642, 26286886, 25374039, 25517084, 25222999, 26411093, 25823850, 25331878, 26140334, 25482718, 25231700, 25806371, 25256834, 26165711, 25491368, 26038932, 26348159, 25197989, 26330978, 26115108, 25567928, 26451040, 25919137, 25620249, 25962150, 25265211, 25391336, 26469694, 26251263, 26428818, 26322365, 25722902, 26056165, 25458065, 25499934, 25698241, 25953486, 26191228, 25441207, 25646057, 25789641, 26174238, 25910042, 26123530, 26233643, 26073261, 26242210, 26419779, 25239941, 26199783, 25797832, 25892235, 26460145, 25706410, 25525231, 26157160, 25206298, 26356844, 25508439, 25944962, 26304373, 25654233, 25875764, 25424290, 25214551, 26208239, 25689625, 25415923, 25382847, 26224910, 25979322, 25970703, 26268912, 26021745, 25281825, 25680843, 26148704, 25773021, 26339509, 26365719, 25764537, 25466370, 25306930, 25474507, 26402261, 26131993, 26295456, 25348484, 25663016, 25629078, 25671796, 26438003, 26216581, 25298637, 25365404, 25407708, 25815001, 25541725, 25585343, 25841520, 25550143, 26313313, 25637801, 25323768, 25740096, 25850273, 25884142, 25449538, 25714623, 25315280, 25399585, 25248429, 25432879, 26260037, 25558987));

The second query can take several seconds when the time range is large. I talked to jbalogh about this and he suggested the inclusion of the IDs in the query improves its cacheability.

These queries use the "addon_date_idx" index, so it's not clear how to immediately speed them up.
Target Milestone: 6.1.8 → Q3 2011
Combining the two queries is still slow for me, for what it's worth.  (30s on cm-webdev01-master01.  comparable to separate queries).
Are the file ids actually needed for rendering purposes? I tried running the following similar query in our datawarehouse:
select addon_id, sum(requests), date, src from amo_download_requests_by_day a join dates d on a.utc_date_id = d.date_id where date between '2010-07-01' and '2010-11-12' and addon_id = 1865 group by 1,3,4;

That query takes 3 seconds to fetch the first row.  Most of that time is spent doing the join to the dates table. If I plug in the date ids instead:
select addon_id, sum(requests), utc_date_id, src from amo_download_requests_by_day a where utc_date_id between 3835 and 3969 and addon_id = 1865 group by 1,3,4;

Then it takes only 500ms.  These queries aren't optimized.  If we wanted to investigate performing these statistics type queries directly on the datawarehouse then we could create some projections (Vertica's form of indicies) that would keep all of the queries at a consistently fast speed.
We don't use file ids in our graphs - See the link in comment 0 for an example of the stats we offer.

I'm very interested in outsourcing this to you as our data and offerings are just continuing to grow.  What are the next steps for investigating what that would take? :)
If we can't get interaction with the metrics cluster here we'll have to start looking at alternatives to mysql it sounds like.  Do you have any suggestions?
Assignee: ashort → jbalogh
Target Milestone: Q3 2011 → 6.2.2
13:43 <jbalogh> I want one of those war propaganda posters with kumar encouraging testing
13:43 <jbalogh> and deletion
13:43 <jbalogh> clouserw: I have stats serving from elastic
13:43 <jbalogh> and fast
13:44 <jbalogh> I need to make some scripts to index all the old data and handle backfilling and all that


-> next week for some additional scripts
Target Milestone: 6.2.2 → 6.2.3
This moves all the slow stuff to ES: https://github.com/jbalogh/zamboni/commit/e21f588 

Note that I have to kick off all the indexing before stats show up.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to Jeff Balogh (:jbalogh) from comment #10)
> This moves all the slow stuff to ES:
> https://github.com/jbalogh/zamboni/commit/e21f588 
> 
> Note that I have to kick off all the indexing before stats show up.

Let me know when this is done.
I think it's done
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.