Closed
Bug 384084
Opened 17 years ago
Closed 17 years ago
Create a script to parse AMO logs and update add-on counts
Categories
(addons.mozilla.org Graveyard :: Maintenance Scripts, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
3.1
People
(Reporter: morgamic, Assigned: sancus)
Attachments
(5 files, 2 obsolete files)
- 3.63 KB, text/plain
- 5.51 KB, patch (morgamic: review+)
- 1.73 KB, patch
- 501 bytes, text/plain
- 1.15 KB, patch (clouserw: review+)
Currently install counts are logged in real time, which is a drain on resources. Parsing logs nightly to update weekly or daily counts would be more useful. The real-time method will be deprecated in order to save resources and make the app scale better during peak load times. We need two things out of this script:
a) count unique log entries hitting the downloads controller to generate weekly counts for each add-on, storing the information in addons.weeklydownloads
b) the same script should update addons.totaldownloads as appropriate
Reporter | ||
Updated•17 years ago
Target Milestone: --- → 3.1
Updated•17 years ago
Severity: normal → major
Updated•17 years ago
Severity: major → blocker
Comment 4•17 years ago
Parses Addons logs and inserts them into the downloads table. You will also need this table:
CREATE TABLE `logs_parsed` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `logname` varchar(300) NOT NULL default '',
  `done` tinyint(3) unsigned NOT NULL default '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `logname` (`logname`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Attachment #272096 -
Flags: review?(morgamic)
Comment 5•17 years ago
It seems like the downloads table is kind of an unnecessary middle-man to me. We're filling it with data, only to delete it after 8 days (via the maintenance script). As far as I know, the only thing we use the table for is to calculate total downloads and store in the addons table. Why not just go directly from logs -> addons table, and save the logs for 8 days? My motivation for the suggestion is that the database dumps are becoming too large to import on khan-vm. I'll write a script to strip the downloads/logs table before I import it if I have to, but I'd rather have an exact mirror of the production db.
Comment 6•17 years ago
See also: bug 335569, having to do with never getting rid of data
Reporter | ||
Comment 7•17 years ago
Parsing 8 days of logs is not efficient. A day takes about 15 minutes to get parsed. An alternative would be to store per-day counts in a separate table, which would get us trends over time for later if we chose to offer graphs of d/l over time to users. Then the parser could just worry about updating that table 1 day at a time and a maint script could just sum 7 records per add-on. How's that sound?
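A minimal sketch of the per-day table idea, written in Python with SQLite rather than the PHP/MySQL used on AMO. The table name `daily_counts`, its columns, and the helper names are assumptions for illustration; no table structure had been posted at this point in the thread.

```python
import sqlite3
from datetime import date, timedelta


def make_db():
    """Create an in-memory database with a hypothetical per-day counts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_counts ("
        " addon_id INTEGER NOT NULL,"
        " day TEXT NOT NULL,"          # ISO date, e.g. '2007-08-15'
        " downloads INTEGER NOT NULL,"
        " PRIMARY KEY (addon_id, day))"
    )
    return conn


def weekly_counts(conn, today):
    """Sum the 7 most recent daily records (today included) per add-on."""
    start = (today - timedelta(days=6)).isoformat()
    cursor = conn.execute(
        "SELECT addon_id, SUM(downloads) FROM daily_counts"
        " WHERE day BETWEEN ? AND ? GROUP BY addon_id",
        (start, today.isoformat()),
    )
    return dict(cursor.fetchall())
```

With this layout the parser only ever writes one day at a time, and the weekly figure is a cheap aggregate over at most seven rows per add-on.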
Comment 8•17 years ago
(In reply to comment #7)
> An alternative would be to store per-day counts in a separate table, which
> would get us trends over time for later if we chose to offer graphs of d/l over
> time to users. Then the parser could just worry about updating that table 1
> day at a time and a maint script could just sum 7 records per add-on. How's
> that sound?
It sounds alright to me. Is this going to be the never-purged table from bug 335569?
Reporter | ||
Comment 9•17 years ago
Would make sense to use the same method. I think that sounds sane. I'll post a table structure there. One thing we didn't talk about was dupes. Andrei, what do you have in mind about avoiding duplicates? Could you post the PHP version so we can compare? Do you think the Perl script would work?
Assignee | ||
Comment 10•17 years ago
The script that I've written just adjusts the download counts in the addons table directly. Although I was considering doing it this way, it seems to me that it's better to just do it directly. Is there a reason not to do that?
Assignee | ||
Comment 11•17 years ago
This script parses the log, increments the download counts and "blacklists" IPs for 30 seconds to eliminate duplicate counting. Note that it does not deal with picking the log files to parse. I think it'd work fine to have a shell script grep a log file for "downloads/file" and then call this script on it -- that way the ~1GB log files become about 15-20MB before this script runs on them. Also, this script doesn't keep a record in the db of which log files have been parsed. That's actually a good idea that I'll add to help avoid reparsing the same log file, although there is still an issue if parsing dies partway through a log file. The script also prints to stdout if updating a count *failed* because no addon could be found for that file. This should never really happen on prod, but in case it does...
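The 30-second "blacklist" can be sketched roughly as follows. This is a Python illustration, not the attached PHP script. It assumes the log lines have already been parsed into `(unix_time, ip, file_id)` tuples sorted by time, that the block applies per IP regardless of file, and that ignored hits do not extend the window; none of these details are confirmed by the thread.

```python
from collections import Counter

WINDOW_SECONDS = 30  # block length taken from the comment above


def count_downloads(hits, window=WINDOW_SECONDS):
    """Count downloads per file, ignoring repeat hits from a blocked IP.

    hits: iterable of (unix_time, ip, file_id) tuples, sorted by time
    (a hypothetical pre-parsed form of the access log).
    """
    blocked_until = {}   # ip -> time at which the IP may be counted again
    counts = Counter()   # file_id -> counted downloads
    for ts, ip, file_id in hits:
        if ts < blocked_until.get(ip, float("-inf")):
            continue                      # still inside the block window
        counts[file_id] += 1
        blocked_until[ip] = ts + window   # start a fresh window for this IP
    return counts
```

Keeping only one timestamp per IP keeps memory proportional to the number of distinct addresses, not to the number of log lines.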
Comment 12•17 years ago
From what I understand from morgamic, Andrei is fixing the two issues in his comment above and this will be fixed tomorrow (Thursday) at the latest. Is that correct Andrei?
Assignee | ||
Comment 13•17 years ago
I've updated the PHP log parser to automatically parse any access_*.gz log files in a given directory, as the Perl one does, and to keep track of which log files have been parsed. The script still modifies download counts directly, and really ends the need for the downloads table at all. It also still blocks IPs from being counted for 30s windows, etc.
Attachment #272846 -
Attachment is obsolete: true
Attachment #274023 -
Flags: review?(morgamic)
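A rough Python sketch of the bookkeeping described in comment 13: find access_*.gz files in a directory and skip those already recorded in `logs_parsed`. SQLite stands in for MySQL here, the column names follow the table posted in comment 4, and the function names are hypothetical. Recording a file only after it has been fully parsed is one possible way to handle a parse that dies partway through; the thread does not say how the real script deals with that.

```python
import glob
import os
import sqlite3


def open_tracking_db(path=":memory:"):
    """Open a database holding a SQLite approximation of logs_parsed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs_parsed ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " logname TEXT NOT NULL UNIQUE,"
        " done INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def pending_logs(conn, logdir):
    """Return access_*.gz paths in logdir that are not yet marked as parsed."""
    rows = conn.execute("SELECT logname FROM logs_parsed WHERE done = 1")
    seen = {row[0] for row in rows}
    paths = sorted(glob.glob(os.path.join(logdir, "access_*.gz")))
    return [p for p in paths if os.path.basename(p) not in seen]


def mark_parsed(conn, path):
    """Record a log file as parsed; call this only after parsing succeeds."""
    conn.execute(
        "INSERT OR IGNORE INTO logs_parsed (logname) VALUES (?)",
        (os.path.basename(path),),
    )
    conn.commit()
```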
Reporter | ||
Comment 14•17 years ago
Have you tried this with a full day of logs? How long does it take to parse a single day?
Assignee | ||
Comment 15•17 years ago
I've tested it on a VM, and it takes about 3 hours for a day's worth of logs. Most of that time is the database queries, so I would think it'll be significantly faster on the cluster.
Comment 16•17 years ago
From what I can find by searching Bugzilla, this seems to be the right bug to report on. The download counter on my themes has been stuck for weeks and weeks now, i.e. showing 120 downloads per week instead of 200,000 - 400,000 per week. Two questions:
1. Why does the present figure never move? i.e. go from 120 to 121 or something.
2. When this all gets resolved, what happens to the total download figures? Will they include all the download figures since this side broke down? i.e. true total figures from the start of this year, for instance.
Comment 17•17 years ago
(In reply to comment #16)
> The download counter on my themes has been stuck on my stuff for weeks and
> weeks now i.e. showing 120 downloads per week, instead of 200,000 - 400,000 pw.
See the AMO web development blog: http://blog.mozilla.com/webdev/2007/06/30/download-counts-halted/
Reporter | ||
Comment 18•17 years ago
Jeremy - will exec() work on any PHP install? Doesn't Red Hat turn that off in its RPM?
Comment 19•17 years ago
Looks like safe_mode is off by default so you should be able to use exec().
Reporter | ||
Comment 20•17 years ago
Comment on attachment 274023: updated download counter
Hey Andrei, I think this looks great, but we need some of those DEFINEs to be arguments that can be passed from cron:
* LOGDIR
* TEMPDIR
Attachment #274023 -
Flags: review?(morgamic) → review-
Assignee | ||
Comment 21•17 years ago
Okay, I changed the constants to required arguments, and I also fixed a small bug where a couple of non-download-related lines were getting through the grep when they shouldn't have.
Attachment #274675 -
Flags: review?(morgamic)
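For illustration, turning the two constants into required command-line arguments might look like the sketch below. It is Python rather than the PHP of the actual patch, and the argument names `logdir` and `tempdir` simply mirror the LOGDIR and TEMPDIR constants named in comment 20; the real script's argument order and names are not shown in this bug.

```python
import argparse


def parse_args(argv=None):
    """Parse the two required positional arguments (names are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Parse download logs (illustrative only)."
    )
    parser.add_argument("logdir", help="directory containing access_*.gz logs")
    parser.add_argument("tempdir", help="scratch directory for filtered logs")
    return parser.parse_args(argv)
```

A cron entry would then pass both directories on the command line, and a run with either one missing exits with a usage error instead of silently using a default.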
Assignee | ||
Updated•17 years ago
Attachment #274023 -
Attachment is obsolete: true
Reporter | ||
Comment 22•17 years ago
Comment on attachment 274675: download counter with arguments
Sweet -- looks good. Jeremy, what's your schedule like? Can we work on setting up an initial run tomorrow?
Attachment #274675 -
Flags: review?(morgamic) → review+
Comment 23•17 years ago
I'm up for it.
Comment 24•17 years ago
I have created this table in the add-ons database:
CREATE TABLE `logs_parsed` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(255) NOT NULL default '',
  `done` tinyint(3) unsigned NOT NULL default '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Comment 25•17 years ago
I am adding the following MySQL grants:
GRANT SELECT ON `addons_remora`.`versions`
GRANT SELECT ON `addons_remora`.`files`
GRANT UPDATE ON `addons_remora`.`addons`
GRANT SELECT, INSERT, UPDATE ON `addons_remora`.`logs_parsed`
Reporter | ||
Comment 26•17 years ago
That should work, don't see any other tables hit by the statements.
Comment 27•17 years ago
not to be a pest but... is there an ETA for when this might go live? Thx.
Reporter | ||
Comment 28•17 years ago
Hey Jon, total downloads are currently being updated. Work is being done on:
a) backfilling June 27th -> Aug 9th from backed up logs
b) restoring weekly counts
We have an IT bug filed for the backlogs, and ETA on that is by Friday. Andrei -- how is the weekly count patch coming? What is the ETA for a patch?
Assignee | ||
Comment 29•17 years ago
I'm going to have a patch for weekly counts tomorrow/Thursday at the latest. I'll try for sooner, though. This will include a new table that'll hold daily counts for every add-on, to replace the old downloads table completely; we won't store each individual download anymore. The counting script will update the daily counts in the database, and then the maintenance script will produce weekly counts from those totals.
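The write side of that plan, where the counting script adds to one row per add-on per day instead of storing individual downloads, could be sketched like this in Python with SQLite. The table and column names are placeholders, since the actual table structure lives in the attached patch and is not reproduced in this thread.

```python
import sqlite3


def make_daily_table():
    """Create an in-memory stand-in for the new daily counts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_counts ("
        " addon_id INTEGER NOT NULL,"
        " day TEXT NOT NULL,"
        " downloads INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (addon_id, day))"
    )
    return conn


def add_daily_downloads(conn, addon_id, day, n):
    """Add n downloads to the (addon_id, day) row, creating it if needed."""
    # Make sure the row exists, then increment it.
    conn.execute(
        "INSERT OR IGNORE INTO daily_counts (addon_id, day, downloads)"
        " VALUES (?, ?, 0)",
        (addon_id, day),
    )
    conn.execute(
        "UPDATE daily_counts SET downloads = downloads + ?"
        " WHERE addon_id = ? AND day = ?",
        (n, addon_id, day),
    )
    conn.commit()
```

Because each add-on contributes at most one row per day, the table grows with the number of add-ons and days rather than with raw download volume.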
Assignee | ||
Comment 30•17 years ago
Assignee | ||
Comment 31•17 years ago
I've uploaded the patch to the counter script and the new count table. The only thing remaining to do is to change the maintenance script to update weekly counts from the new table.
Assignee | ||
Comment 33•17 years ago
This fixes the weekly count updating in the maintenance script; total downloads are handled by the parser.
Assignee | ||
Updated•17 years ago
Attachment #277749 -
Flags: review?(morgamic)
Reporter | ||
Comment 34•17 years ago
Comment on attachment 277749: maintenance.php weekly download patch
Wil, can you take a look and test with a log from stats?
Attachment #277749 -
Flags: review?(morgamic) → review?(clouserw)
Updated•17 years ago
Attachment #277749 -
Flags: review?(clouserw) → review+
Comment 35•17 years ago
Any word yet on when we'll have the weekly counts? It's been two months ... And what about this (from http://blog.mozilla.com/webdev/category/amo): "An improved API for aggregating add-on statistics and integrating it into your blog or external sites" Is that still in the works?
Reporter | ||
Comment 36•17 years ago
This was pushed in the latest update (last Saturday). Weekly counts are working along with total counts. Thanks to Sancus and Oremj for their hard work. :)
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 37•17 years ago
Have the backlogs been processed as well (see comment 28)?
Comment 38•17 years ago
w00t! Thanks, guys!
Comment 39•17 years ago
Weekly and total counts are back, but the rating fields are not updating. Was that fix supposed to be included in this bug?
Comment 40•17 years ago
It's still not working in sandbox is it?
Comment 41•17 years ago
I think that it really doesn't work in sandbox. I didn't have a single download in over two weeks (which is improbable, but not impossible). Since my extension is public, everything works great! Thanks a lot for fixing!!
Reporter | ||
Updated•17 years ago
Attachment #272096 -
Flags: review?(morgamic)
Comment 42•17 years ago
Despite this being marked as RESOLVED FIXED, I do not feel it is. Ever since this bug was claimed to have been fixed, the Weekly Download figures for Themes have been showing at 10% of the true figure. This has been confirmed with other themers. The Weekly Download figures for Extensions, however, appear about right. As this 10% figure has been applied across the board with Themes, it has no effect on the relative positions of them, in terms of 'most popular'. However, it has made a nonsense of the Total Download figures for all the Themes. None of this is supposition on my part, as I have screenshot after screenshot going back to Jan 2006 that will confirm all this. I would ask, therefore, that either this bug is fixed or else that AMO confirm here that this 10% is the position, for whatever reason. Otherwise, at some time in the future, I suspect that someone at Mozilla will be waving about a totally meaningless set of figures to support a position that third party Firefox Themes are not in great demand by Firefox users. Thanks :)
Comment 43•17 years ago
The download count for our extension is also way too low, quite plausibly around 10% of the true figure. Both our own click count (which redirects to AMO) and the number of new registrations (which is a subset of the number of new downloads) are several times larger than the download count reported by AMO.
Comment 44•17 years ago
Same thing happening to me. My extension presents a welcome page after installation is complete. I have a counter on that page that indicates over 42,000 page loads for the last seven days (the page is only loaded once per install). Dev cpanel recorded only about 5200 downloads during the same period. While I don't expect an exact one-to-one correlation between the two, the numbers should be *much* closer than they are.
Comment 45•17 years ago
I'm trying to clean up the download counter bugs, several of which have comments on this issue. I'm considering bug 402796 the bug for figuring out if there's a problem with the counts, so please post further comments there.
Updated•8 years ago
Product: addons.mozilla.org → addons.mozilla.org Graveyard