Closed Bug 323904 Opened 19 years ago Closed 19 years ago

"Downloads this Week" is greater than "Total Downloads"

Categories

(addons.mozilla.org Graveyard :: Public Pages, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: will_bugzilla, Assigned: morgamic)

References

()

Details

Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8) Gecko/20051107 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8) Gecko/20051107 Firefox/1.5

The download figures seem to be broken at https://addons.mozilla.org/extensions/showlist.php
Extension which are still in their first week should have "downloads per week" equal to "total downloads".

"Downloads per week" is bigger than, and growing faster than, "total downloads".

Reproducible: Always

Steps to Reproduce:
visit https://addons.mozilla.org/extensions/showlist.php?application=firefox&numpg=10&category=Newest

Actual Results:  
Extension which are still in their first week have "downloads per week" greater than "total downloads".


Expected Results:  
in the first week "downloads per week" should equal "total downloads".


All of the newest extensions (in their first week) show this problem. Examples:
Concrete examples:
https://addons.mozilla.org/extensions/moreinfo.php?id=1854
https://addons.mozilla.org/extensions/moreinfo.php?id=1865
https://addons.mozilla.org/extensions/moreinfo.php?id=1864

There is 4 days worth of downloads figures for one extension at
http://viewmycurrency.wordpress.com/2006/01/18/download-figures-from-mozilla-update/
There are ~20 other addons affected by this.  I will run some tests on maintenance.php to see what's going on.  Thanks for submitting the bug.

If anybody has ideas, please comment here.  Here is some background information on the script that handles the counts.

First, our users download something -- which causes a request to go out to:
http://lxr.mozilla.org/update1.0/source/core/install.php

The install script logs the requests in the `downloads` table, but doesn't update any counts right away.  That is done later because it makes more sense to do it on a scheduled basis instead of everytime someone makes a request -- in the past, this actually brought AMO to its knees, which was why maintenance.php was written.

Script:
http://lxr.mozilla.org/update1.0/source/core/maintenance.php

The script itself is called on a scheduled basis (maybe justdave can add info about the actual schedule).  There are three possible modes: weekly, total and gc.  Weekly is responsible for getting all counts over the last 7 days and updating the value accordingly in main.downloadcount  Total gathers all non-counted download entries and icrements main.totaldownloads accordingly.  Gc is used for cleaning out records that are older than 8 days.

So -- something must be happening with scheduling, deleting totaldownloads counts that should be counted before deletion, or... ?
Assignee: Bugzilla-alanjstrBugs → morgamic
Status: UNCONFIRMED → NEW
Ever confirmed: true
Ok, so my theory is that during the total update, the process of marking things as 'counted' is lacking the WHERE clause for dID.  It assumes that after the update everything has been counted, but there is a delay between the point were things are selected, updated and marked as counted.

The solution would be to use subqueries or to add a "WHERE dID IN()" on the end of the set counted=1 query.  This would prevent records in `downloads` from being erroneously marked at counted.

Thoughts?
Status: NEW → ASSIGNED
OK, I just took a look at the maintenance.php code, and that's definitely what's happening.  You're not locking the downloads table between running the query and marking everything counted, and at the rate we're getting downloads, there's almost guaranteed to be new stuff in there before you clear them all.  using IN() is going to be insanely huge, so that's probably not a good idea.  The safe way would be to completely lock the table from the time you select until you mark them counted.  But that's not a realistic thing to do unless the query can run in under 5 seconds.  It doesn't, it takes over 5 minutes.

aha, we have a primary key on that table.

Snag MAX(dID) when you start.  Make your counting query select only records with that dID or lower.  Make your "this is now counted" update only update records with that dID or lower.
Sounds good -- that is what I suspected.  I was going to do an IN() on dID but MAX() is a great idea, so I'll do that instead.

Patch coming.
$max_did will always be set if affected_rows > 0, and the query for marking records as counted now has a clause in it so it only processes where dID < $max_did/
Attachment #208946 - Flags: first-review?(justdave)
[morgamic@chameleon tip]$ php -q core/maintenance.php total
Retrieving uncounted downloads ...
Updating download totals ...
Affected rows: 437    Time: 17799.299265146
Exiting ...

Seemed to work for me.
Comment on attachment 208946 [details] [diff] [review]
Added MAX(dID) to ensure only counted records are marked as counted.

does that get the real MAX(dID) in each row, or is it the max of the dID column that matched the rest of the results for that row?
Comment on attachment 208946 [details] [diff] [review]
Added MAX(dID) to ensure only counted records are marked as counted.

This query is incorrect.  The MAX(dID) needs to be pulled separately.  It should then be added to the master select, and added to the counted=1 update SQL.
Attachment #208946 - Flags: first-review?(justdave) → first-review-
Attachment #208946 - Attachment is obsolete: true
Dave is on vacation for now, but he can take a look at it when he gets back.  Should be fixed soon.  :)
Attachment #209819 - Flags: first-review?(justdave)
Comment on attachment 209819 [details] [diff] [review]
Added correct method to get max_id to ensure accuracy for total counts.

>+                `counted`=0 AND
>+                dID <= '{$max_id}'

Not sure how all the php stuff works, but I'm assuming the quotes here actually go into the SQL...   this is an integer and we know it's an integer, so it doesn't really need quotes.  MySQL won't care, so it doesn't hurt anything either.  I'm just used to cross DB stuff, and other databases hate quotes on integer fields :)

>             // Mark the downloads we just counted as counted if:
>-            //      a) it is a day count that is more than 8 days old
>-            //      b) it is a download log that has not been counted
>+            //      c) it has a key lower than max_id, because 
>+            //         all keys lower than max_id have been counted above

Do we even need a letter here if it's the only thing?  (it looks funny with a c) all be itself.

Otherwise this looks good, and it's all nits. :)
Attachment #209819 - Flags: first-review?(justdave) → first-review+
PHP/MySQL won't care about the '' but I'll clean that up.  I will also make my 1-item list not a list.  :)

Thanks for the r+.

Will -- over the next 7-8 days you should see the counts even out.  Feel free to graph that, too if you want.  :)

If you see that the problem has not gone away, please let us know by filing it on this bug.  Thanks for taking the time to fill out the bug -- we appreciated the heads-up.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
... and here are the pretty graphs to show that the fix worked:
http://viewmycurrency.wordpress.com/2006/02/04/fixed-download-figures-from-mozilla-update/

Good work guys!
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: