Bug 384084 - Create a script to parse AMO logs and update add-on counts
Status: RESOLVED FIXED
Product: addons.mozilla.org Graveyard
Classification: Graveyard
Component: Maintenance Scripts
Version: 3.0
Hardware: All All
Importance: -- blocker
Target Milestone: 3.1
Assigned To: Andrei Hajdukewycz [:sancus]
URL: http://blog.mozilla.com/webdev/2007/0...
Duplicates: 386851 386858 392451
Blocks: 384085

Reported: 2007-06-11 15:55 PDT by Michael Morgan [:morgamic]
Modified: 2016-02-04 14:55 PST
CC List: 18 users


Attachments
Parses Addons logs and inserts them into the downloads table. (3.63 KB, text/plain)
2007-07-12 17:47 PDT, Jeremy Orem [:oremj]
download count log parser (3.71 KB, patch)
2007-07-18 13:05 PDT, Andrei Hajdukewycz [:sancus]
updated download counter (5.32 KB, patch)
2007-07-26 12:24 PDT, Andrei Hajdukewycz [:sancus]
morgamic: review-
download counter with arguments (5.51 KB, patch)
2007-07-31 14:10 PDT, Andrei Hajdukewycz [:sancus]
morgamic: review+
weekly counting patch (1.73 KB, patch)
2007-08-14 16:57 PDT, Andrei Hajdukewycz [:sancus]
daily download counts table (501 bytes, text/plain)
2007-08-14 16:58 PDT, Andrei Hajdukewycz [:sancus]
maintenance.php weekly download patch (1.15 KB, patch)
2007-08-22 11:15 PDT, Andrei Hajdukewycz [:sancus]
wclouser: review+

Description Michael Morgan [:morgamic] 2007-06-11 15:55:10 PDT
Currently, install counts are logged in real time, which is a drain on resources.  Parsing logs nightly to update weekly or daily counts would be more useful.

The real-time logging method will be deprecated in order to save resources and make the app scale better during peak load times.

We need two things out of this script:
a) count unique log entries hitting the downloads controller to generate weekly counts for each add-on, storing the information in addons.weeklydownloads
b) the same script should update addons.totaldownloads as appropriate
Comment 1 Michael Morgan [:morgamic] 2007-06-29 09:57:32 PDT
Andrei said he's working on it...
Comment 2 Michel Gutierrez 2007-07-04 08:09:50 PDT
*** Bug 386851 has been marked as a duplicate of this bug. ***
Comment 3 Justin Scott [:fligtar] 2007-07-04 11:44:31 PDT
*** Bug 386858 has been marked as a duplicate of this bug. ***
Comment 4 Jeremy Orem [:oremj] 2007-07-12 17:47:24 PDT
Created attachment 272096
Parses Addons logs and inserts them into the downloads table.

Parses Addons logs and inserts them into the downloads table.

You will also need this table:

CREATE TABLE `logs_parsed` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `logname` varchar(300) NOT NULL default '',
  `done` tinyint(3) unsigned NOT NULL default '1',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `logname` (`logname`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Comment 5 Wil Clouser [:clouserw] 2007-07-12 18:35:16 PDT
It seems like the downloads table is kind of an unnecessary middle-man to me.

We're filling it with data, only to delete it after 8 days (via the maintenance script).  As far as I know, the only thing we use the table for is to calculate total downloads and store in the addons table.  Why not just go directly from logs -> addons table, and save the logs for 8 days?

My motivation for the suggestion is that the database dumps are becoming too large to import on khan-vm.  I'll write a script to strip the downloads/logs table before I import it if I have to, but I'd rather have an exact mirror of the production db.
Comment 6 Justin Scott [:fligtar] 2007-07-12 18:50:45 PDT
See also: bug 335569, having to do with never getting rid of data
Comment 7 Michael Morgan [:morgamic] 2007-07-12 23:48:48 PDT
Parsing 8 days of logs is not efficient.  A day takes about 15 minutes to get parsed.

An alternative would be to store per-day counts in a separate table, which would get us trends over time for later if we chose to offer graphs of d/l over time to users.  Then the parser could just worry about updating that table 1 day at a time and a maint script could just sum 7 records per add-on.  How's that sound?
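
A minimal sketch of the per-day table idea, assuming a hypothetical `download_counts` table keyed by add-on and day (the table and column names here are illustrative and do not come from any attached patch). SQLite stands in for MySQL so the example is self-contained; the maintenance step is just a sum over the last seven daily rows per add-on.

```python
import sqlite3
from datetime import date, timedelta


def weekly_counts(conn, today):
    """Sum the 7 most recent daily rows (today included) for each add-on."""
    start = (today - timedelta(days=6)).isoformat()
    rows = conn.execute(
        "SELECT addon_id, SUM(downloads) FROM download_counts "
        "WHERE day BETWEEN ? AND ? GROUP BY addon_id",
        (start, today.isoformat()),
    )
    return dict(rows.fetchall())


# Hypothetical schema: one row per add-on per day.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE download_counts ("
    " addon_id INTEGER NOT NULL,"
    " day TEXT NOT NULL,"
    " downloads INTEGER NOT NULL,"
    " PRIMARY KEY (addon_id, day))"
)

today = date(2007, 7, 13)
# Ten days of made-up history for add-on 1; only the last seven should count.
for offset in range(10):
    day = (today - timedelta(days=offset)).isoformat()
    conn.execute("INSERT INTO download_counts VALUES (?, ?, ?)", (1, day, 100))
# A single day of made-up data for add-on 2.
conn.execute(
    "INSERT INTO download_counts VALUES (?, ?, ?)", (2, today.isoformat(), 5)
)

weekly = weekly_counts(conn, today)
```

Because older rows are never touched by the weekly sum, the same table could later feed per-day trend graphs without any extra parsing.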
Comment 8 Wil Clouser [:clouserw] 2007-07-13 09:03:08 PDT
(In reply to comment #7)
> An alternative would be to store per-day counts in a separate table, which
> would get us trends over time for later if we chose to offer graphs of d/l over
> time to users.  Then the parser could just worry about updating that table 1
> day at a time and a maint script could just sum 7 records per add-on.  How's
> that sound?
> 

It sounds alright to me.  Is this going to be the never-purged table from bug 335569?
Comment 9 Michael Morgan [:morgamic] 2007-07-13 09:10:41 PDT
Would make sense to use the same method.  I think that sounds sane.  I'll post a table structure there.

One thing we didn't talk about was dupes.  Andrei, what do you have in mind about avoiding duplicates?  Could you post the PHP version so we can compare?  Do you think the Perl script would work?
Comment 10 Andrei Hajdukewycz [:sancus] 2007-07-14 03:00:55 PDT
The script that I've written just adjusts the download counts in the addons table directly. I was considering doing it the per-day way, but it seems to me that it's better to just do it directly. Is there a reason not to do that?
Comment 11 Andrei Hajdukewycz [:sancus] 2007-07-18 13:05:14 PDT
Created attachment 272846
download count log parser

This script parses the log, increments the download counts and "blacklists" IPs for 30 seconds to eliminate duplicate counting. Note that it does not deal with picking the log files to parse.
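
The 30-second blacklist could look roughly like the sketch below. This is not the attached PHP script; it assumes hits arrive in chronological order as (timestamp, ip, file_id) tuples, that the block is keyed on the IP alone, and that a skipped hit does not extend the window. All names are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=30)  # blacklist duration described in the comment


def count_downloads(hits):
    """Count hits per file, ignoring an IP for 30 seconds after a counted hit.

    `hits` is an iterable of (timestamp, ip, file_id) tuples in
    chronological order.
    """
    blocked_until = {}  # ip -> time at which the IP may be counted again
    counts = {}         # file_id -> number of counted downloads
    for ts, ip, file_id in hits:
        if ip in blocked_until and ts < blocked_until[ip]:
            continue  # still blacklisted: treat as a duplicate
        blocked_until[ip] = ts + WINDOW
        counts[file_id] = counts.get(file_id, 0) + 1
    return counts


base = datetime(2007, 7, 18, 13, 0, 0)
sample_hits = [
    (base, "10.0.0.1", 42),                          # counted
    (base + timedelta(seconds=5), "10.0.0.2", 42),   # different IP, counted
    (base + timedelta(seconds=10), "10.0.0.1", 42),  # inside window, skipped
    (base + timedelta(seconds=31), "10.0.0.1", 42),  # window expired, counted
]
sample_counts = count_downloads(sample_hits)
```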

I think it'd work fine to have a shell script grep a log file for "downloads/file" and then call this script on it -- that way the ~1GB log files become about 15-20MB before this script runs on them.
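
The comment suggests a shell grep for this step; the sketch below does the same pre-filtering in Python instead, purely for illustration. It assumes the logs are gzip-compressed text and that any line containing "downloads/file" is of interest. The function name and paths are hypothetical.

```python
import gzip

MARKER = "downloads/file"  # substring mentioned in the comment above


def filter_download_lines(src_path, dst_path, marker=MARKER):
    """Copy only lines containing `marker` from a gzipped log to a plain file.

    Returns the number of lines kept.
    """
    kept = 0
    with gzip.open(src_path, "rt", encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # stream line by line; the log may be around 1GB
            if marker in line:
                dst.write(line)
                kept += 1
    return kept
```

Streaming keeps memory use flat regardless of log size, which matters if the full logs are as large as described.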

Also, this script doesn't keep a record in the db of which log files have been parsed. That's actually a good idea that I'll add, to help avoid reparsing the same log file, although there is still an issue if parsing dies partway through a log file.

The script also prints to stdout if updating a count *failed* because no addon could be found for that file. This should never really happen on prod, but in case it does...
Comment 12 Justin Scott [:fligtar] 2007-07-25 23:19:14 PDT
From what I understand from morgamic, Andrei is fixing the two issues in his comment above and this will be fixed tomorrow (Thursday) at the latest. Is that correct Andrei?
Comment 13 Andrei Hajdukewycz [:sancus] 2007-07-26 12:24:54 PDT
Created attachment 274023
updated download counter

I've updated the PHP log parser to automatically parse any access_*.gz log files in a given directory, as the Perl one does, and to keep track of which log files have been parsed.

The script still modifies download counts directly, and really removes the need for the downloads table altogether. It also still blocks IPs from being counted for 30s windows, etc.
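
One way the file discovery and bookkeeping might fit together is sketched below. It is an illustration, not the attached patch: SQLite stands in for MySQL, the `logs_parsed` columns follow the table Jeremy posts in comment 24, and the helper names are hypothetical. A file is marked done only after parsing finishes, so a run that dies partway through would retry that file (and could double count unless the count updates are made idempotent).

```python
import glob
import os
import sqlite3


def open_tracking_db():
    """Create an in-memory stand-in for the logs_parsed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs_parsed ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL UNIQUE,"
        " done INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def pending_logs(conn, logdir):
    """Return access_*.gz paths in `logdir` not yet recorded as done."""
    done = {
        row[0]
        for row in conn.execute("SELECT name FROM logs_parsed WHERE done = 1")
    }
    paths = sorted(glob.glob(os.path.join(logdir, "access_*.gz")))
    return [p for p in paths if os.path.basename(p) not in done]


def mark_done(conn, path):
    """Record a log file as fully parsed."""
    conn.execute(
        "INSERT OR REPLACE INTO logs_parsed (name, done) VALUES (?, 1)",
        (os.path.basename(path),),
    )
    conn.commit()
```

A driver loop would call `pending_logs`, parse each returned file, and call `mark_done` only after that file completes.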
Comment 14 Michael Morgan [:morgamic] 2007-07-26 12:28:43 PDT
Have you tried this with a full day of logs?  How long does it take to parse a single day?
Comment 15 Andrei Hajdukewycz [:sancus] 2007-07-26 13:58:19 PDT
I've tested it on a VM, and it takes about 3 hours for a day's worth of logs. Most of that time is the database queries, so I would think it'll be significantly faster on the cluster.
Comment 16 Frank Lion 2007-07-28 12:49:28 PDT
From what I can find by searching Bugzilla, this seems to be the right bug to report on.

The download counter on my themes has been stuck on my stuff for weeks and weeks now i.e. showing 120 downloads per week, instead of 200,000 - 400,000 pw.

Two questions

1. Why does the present figure never move? i.e. go from 120 to 121 or something.

2. When this all gets resolved, what happens to the total download figures? Will they include all the download figures since this side broke down? i.e. true total figures from the start of this year, for instance.
Comment 17 mcdavis941 (sporadically reading bugmail) 2007-07-28 17:33:31 PDT
(In reply to comment #16)
> The download counter on my themes has been stuck on my stuff for weeks and
> weeks now i.e. showing 120 downloads per week, instead of 200,000 - 400,000 pw.

See the AMO web development blog:

http://blog.mozilla.com/webdev/2007/06/30/download-counts-halted/
Comment 18 Michael Morgan [:morgamic] 2007-07-30 13:30:24 PDT
Jeremy - will exec() work on any PHP install?  Doesn't redhat turn that off in its RPM?
Comment 19 Jeremy Orem [:oremj] 2007-07-30 13:36:18 PDT
Looks like safe_mode is off by default so you should be able to use exec().
Comment 20 Michael Morgan [:morgamic] 2007-07-30 17:05:45 PDT
Comment on attachment 274023
updated download counter

Hey Andrei, I think this looks great but we need to have some of those DEFINEs be arguments that can be passed from cron:
* LOGDIR
* TEMPDIR
Comment 21 Andrei Hajdukewycz [:sancus] 2007-07-31 14:10:05 PDT
Created attachment 274675
download counter with arguments

Okay, I changed the constants to required arguments, and I also fixed a small bug where a couple of non-download-related lines were getting through the grep when they shouldn't have.
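
For comparison, turning the two constants into required command-line arguments might look like this in Python. The actual patch is PHP and its argument order is not stated here, so the positional order (log directory, then temp directory) and the names are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two required directories; argparse exits if either is missing."""
    parser = argparse.ArgumentParser(
        description="Parse access logs and update download counts (sketch)."
    )
    # Positional arguments are required by default.
    parser.add_argument("logdir", help="directory containing access_*.gz logs")
    parser.add_argument("tempdir", help="scratch directory for filtered logs")
    return parser.parse_args(argv)
```

Passing the directories as arguments means the same script can run from cron against different log locations without editing the source.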
Comment 22 Michael Morgan [:morgamic] 2007-07-31 21:53:53 PDT
Comment on attachment 274675
download counter with arguments

Sweet -- looks good.  Jeremy, what's your schedule like?  Can we work on setting up an initial run tomorrow?
Comment 23 Jeremy Orem [:oremj] 2007-08-01 09:51:10 PDT
I'm up for it.
Comment 24 Jeremy Orem [:oremj] 2007-08-03 11:16:31 PDT
I have created this table in the add-ons database:

CREATE TABLE `logs_parsed` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(255) NOT NULL default '',
  `done` tinyint(3) unsigned NOT NULL default '1',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Comment 25 Jeremy Orem [:oremj] 2007-08-03 11:49:49 PDT
I am adding the following MySQL grants:

GRANT SELECT ON `addons_remora`.`versions`
GRANT SELECT ON `addons_remora`.`files`
GRANT UPDATE ON `addons_remora`.`addons`
GRANT SELECT, INSERT, UPDATE ON `addons_remora`.`logs_parsed`

Comment 26 Michael Morgan [:morgamic] 2007-08-03 11:55:56 PDT
That should work, don't see any other tables hit by the statements.
Comment 27 Jon Granrose 2007-08-13 10:28:33 PDT
not to be a pest but... is there an ETA for when this might go live?  Thx.
Comment 28 Michael Morgan [:morgamic] 2007-08-13 15:17:45 PDT
Hey Jon, total downloads are currently being updated.  Work is being done on:
a) backfilling June 27th -> Aug 9th from backed up logs
b) restoring weekly counts

We have an IT bug filed for the backlogs, and ETA on that is by Friday.  Andrei -- how is the weekly count patch coming?  What is the ETA for a patch?
Comment 29 Andrei Hajdukewycz [:sancus] 2007-08-14 16:10:35 PDT
I'm going to have a patch for weekly counts tomorrow, or Thursday at the latest. I'll try for sooner, though.

This will include a new table that'll hold daily counts for every add-on, to replace the old downloads table completely. We won't store each individual download anymore.

The counting script will update the daily counts in the database, and then the maintenance script will produce weekly counts from those totals.
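
A rough sketch of the counting side of that split, assuming a hypothetical `download_counts` table with one row per add-on per day (the real table is in the attachment and may differ). SQLite stands in for MySQL; the update-then-insert pattern is used here only because it is portable across database versions.

```python
import sqlite3


def add_daily_downloads(conn, addon_id, day, n):
    """Add `n` downloads to the add-on's row for `day`, creating it if needed."""
    cur = conn.execute(
        "UPDATE download_counts SET downloads = downloads + ? "
        "WHERE addon_id = ? AND day = ?",
        (n, addon_id, day),
    )
    if cur.rowcount == 0:  # no row yet for this add-on and day
        conn.execute(
            "INSERT INTO download_counts (addon_id, day, downloads) "
            "VALUES (?, ?, ?)",
            (addon_id, day, n),
        )
    conn.commit()


def daily_total(conn, addon_id, day):
    """Return the stored count for one add-on on one day (0 if absent)."""
    row = conn.execute(
        "SELECT downloads FROM download_counts WHERE addon_id = ? AND day = ?",
        (addon_id, day),
    ).fetchone()
    return row[0] if row else 0


# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE download_counts ("
    " addon_id INTEGER NOT NULL,"
    " day TEXT NOT NULL,"
    " downloads INTEGER NOT NULL,"
    " PRIMARY KEY (addon_id, day))"
)
add_daily_downloads(conn, 7, "2007-08-14", 120)
add_daily_downloads(conn, 7, "2007-08-14", 30)
```

The maintenance script would then only need to read this table to produce weekly figures, without touching the raw logs.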
Comment 30 Andrei Hajdukewycz [:sancus] 2007-08-14 16:57:08 PDT
Created attachment 276716
weekly counting patch
Comment 31 Andrei Hajdukewycz [:sancus] 2007-08-14 16:58:35 PDT
Created attachment 276717
daily download counts table

I've uploaded the patch to the counter script and the new count table. The only thing remaining is to change the maintenance script to update weekly counts from the new table.
Comment 32 Adam Kowalczyk 2007-08-16 10:43:25 PDT
*** Bug 392451 has been marked as a duplicate of this bug. ***
Comment 33 Andrei Hajdukewycz [:sancus] 2007-08-22 11:15:22 PDT
Created attachment 277749
maintenance.php weekly download patch

This fixes the weekly count updating in the maintenance script; total downloads are handled by the parser.
Comment 34 Michael Morgan [:morgamic] 2007-08-22 12:07:02 PDT
Comment on attachment 277749
maintenance.php weekly download patch

Wil, can you take a look and test with a log from stats?
Comment 35 Chuck Baker 2007-08-30 19:21:25 PDT
Any word yet on when we'll have the weekly counts?  It's been two months ...

And what about this (from http://blog.mozilla.com/webdev/category/amo):

"An improved API for aggregating add-on statistics and integrating it into your blog or external sites"

Is that still in the works?
Comment 36 Michael Morgan [:morgamic] 2007-09-05 10:38:55 PDT
This was pushed in the latest update (last Saturday).  Weekly counts are working along with total counts.  Thanks to Sancus and Oremj for their hard work.  :)
Comment 37 Adam Kowalczyk 2007-09-05 17:03:18 PDT
Have the backlogs been processed as well (see comment 28)?
Comment 38 Jon Granrose 2007-09-06 09:44:49 PDT
w00t!  Thanks, guys!
Comment 39 Chuck Baker 2007-09-06 19:22:53 PDT
Weekly and total counts are back, but the rating fields are not updating.  Was that fix supposed to be included in this bug?
Comment 40 Oliver Saier 2007-09-10 17:18:11 PDT
It's still not working in sandbox is it?
Comment 41 Ingo Müller 2007-09-12 13:28:02 PDT
I think that it really doesn't work in sandbox. I didn't have a single download in over two weeks (which is improbable, but not impossible). Since my extension became public, everything works great! Thanks a lot for fixing!!
Comment 42 Frank Lion 2007-11-02 20:29:41 PDT
Despite this being marked as RESOLVED FIXED, I do not feel it is.

Ever since this bug was claimed to have been fixed, the Weekly Download figures for Themes have been showing at 10% of the true figure. This has been confirmed with other themers. The Weekly Download figures for Extensions, however, appear about right.

As this 10% figure has been applied across the board with Themes, it has no effect on the relative positions of them, in terms of 'most popular'. However, it has made a nonsense of the Total Download figures for all the Themes. 

None of this is supposition on my part, as I have screenshot after screenshot going back to Jan 2006 that will confirm all this.

I would ask, therefore, that either this bug is fixed or else that AMO confirm here that this 10% is the position, for whatever reason. Otherwise, at some time in the future, I suspect that someone at Mozilla will be waving about a totally meaningless set of figures to support a position that third party Firefox Themes are not in great demand by Firefox users.

Thanks :) 
Comment 43 Matthew Gertner 2007-11-06 04:53:20 PST
The download count for our extension is also way too low, quite plausibly around 10% of the true figure. Both our own click count (which redirects to AMO) and the number of new registrations (which is a subset of the number of new downloads) are several times larger than the download count reported by AMO.
Comment 44 Chuck Baker 2007-11-07 11:29:43 PST
Same thing happening to me.  My extension presents a welcome page after installation is complete.  I have a counter on that page that indicates over 42,000 page loads for the last seven days (the page is only loaded once per install).  Dev cpanel recorded only about 5200 downloads during the same period.  While I don't expect an exact one-to-one correlation between the two, the numbers should be *much* closer than they are.
Comment 45 Justin Scott [:fligtar] 2007-11-07 11:41:46 PST
I'm trying to clean up the download counter bugs, several of which have comments on this issue. I'm considering bug 402796 the bug for figuring out if there's a problem with the counts, so please post further comments there.
