Closed Bug 443937 Opened 17 years ago Closed 11 months ago

Take stats generation offline

Categories

(Webtools Graveyard :: Verbatim, defect, P1)

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: clouserw, Unassigned)

Currently Pootle builds stats on startup and on the fly as things are changed. To be scalable, we need the option of doing this offline (via cron is fine). Generation on startup should never happen. If stats don't exist, it should just say so and continue without them. We have too many projects, and they are too large, for it to generate all the stats at once. On-the-fly generation would be nice to have, but I don't think it can be done fast enough, so at a minimum we need a switch to disable it. If that's too complex, just remove it altogether and we'll cron it.
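The behaviour requested above (no generation at startup, a plain "not available" message when stats are missing, and a switch for on-the-fly generation) could be sketched roughly as follows. This is a minimal illustration, not Pootle's code: the `ON_THE_FLY_STATS` switch, the JSON cache file, and the `load_stats` and `render_stats` helpers are all hypothetical names.

```python
import json
import os

# Hypothetical switch: when False, stats are never computed during a request.
ON_THE_FLY_STATS = False


def load_stats(cache_path, compute=None):
    """Return cached stats, or None if they have not been generated yet."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    # Only fall back to computing on the fly when the switch allows it.
    if ON_THE_FLY_STATS and compute is not None:
        return compute()
    return None


def render_stats(stats):
    """Say so when stats are missing instead of blocking to build them."""
    if stats is None:
        return "Stats not available yet"
    return "%d of %d strings translated" % (stats["translated"], stats["total"])
```

With the switch off, a page for a project without cached stats renders immediately with the placeholder text, and an offline job is expected to fill the cache later.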
Assignee: nobody → dschafer
Status: NEW → ASSIGNED
Priority: -- → P1
friedel suggested running pocount on the appropriate files beforehand, which should generate the database before the first page load. I'll do some testing, and if this works, we should be able to create a cronjob to run pocount every so often and fix this bug.
Running pocount on just the pootle/jToolkit localizations takes a very, very long time:

$ time pocount . &> /dev/null
real 1m52.381s
user 1m27.881s
sys 0m2.611s
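A cron-driven job along the lines discussed here might be sketched as the script below. It runs `pocount <directory>` once per project directory, as in the timing above, discards the output, and records how long each run took. Whether this actually fills the stats database Pootle reads is the assumption being tested in this bug; the `warm_stats` function name and the overridable `command` parameter are hypothetical.

```python
import subprocess
import sys
import time


def warm_stats(project_dirs, command=("pocount",)):
    """Run the counting command once per directory and time each run."""
    results = {}
    for path in project_dirs:
        started = time.monotonic()
        proc = subprocess.run(
            list(command) + [path],
            stdout=subprocess.DEVNULL,  # only the side effect matters here
            stderr=subprocess.DEVNULL,
        )
        results[path] = (proc.returncode, time.monotonic() - started)
    return results


if __name__ == "__main__":
    # Directories to process are given on the command line.
    for path, (code, seconds) in warm_stats(sys.argv[1:]).items():
        print("%s: exit %d, %.1fs" % (path, code, seconds))
```

Processing one project directory per run keeps each invocation short and makes it easy to see which projects dominate the total time.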
I think this bug in our Bugzilla might be relevant: http://bugs.locamotion.org/show_bug.cgi?id=429
Assignee: dschafer → nobody
Status: ASSIGNED → NEW
(In reply to comment #2)
> Running pocount on just the pootle/jToolkit localizations takes a very, very
> long time:
> $ time pocount . &> /dev/null
>
> real 1m52.381s
> user 1m27.881s
> sys 0m2.611s

Is that just because the pocount script is somewhat inefficient, or because the multiple-PO-files data structure slows down the process a lot?
Blocks: 504073
Friedel: Has this changed in the meantime, or is the stats generation still a major time sink? How often is it run? Thanks.
Generating the stats and search indexes will always take long, but the question is more about when it happens and the consequences for the user(s). PootleServer --refreshstats can perhaps be called as part of any update script to ensure that the stats are immediately available after the script completes. We should consider doing stats updates from the GUI as well. I'm not sure if the current refreshing function will ensure that stats are in place.

The stats generation should be run (1) on demand, when the stats are required for a certain page, or (2) when a translation string is submitted. So currently it is not scheduled, but this can be done externally with cron or similar.

My ideal solution (after the next release) is to detect if the stats exist, and if not, generate a page that will retrieve them with JSON after the fact. This way the page is displayed quickly to the user, and the stats are shown as soon as they are available.
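The "detect, then fetch with JSON after the fact" idea can be illustrated with a small sketch. It is not Pootle code: the in-memory cache, the background thread, and the names `stats_json` and `count_strings` are all hypothetical, and `count_strings` merely stands in for the expensive computation. The page would request this JSON body repeatedly until `ready` is true.

```python
import json
import threading

_stats = {}       # project -> stats dict, filled in once computed
_pending = set()  # projects whose stats are being generated right now
_lock = threading.Lock()


def count_strings(project):
    """Hypothetical stand-in for the expensive stats computation."""
    return {"project": project, "total": 0, "translated": 0}


def _generate(project):
    result = count_strings(project)
    with _lock:
        _stats[project] = result
        _pending.discard(project)


def stats_json(project):
    """Return a JSON body for the stats request without waiting on generation."""
    with _lock:
        if project in _stats:
            return json.dumps({"ready": True, "stats": _stats[project]})
        if project not in _pending:
            # Start generation once; later requests just report "not ready".
            _pending.add(project)
            threading.Thread(target=_generate, args=(project,), daemon=True).start()
    return json.dumps({"ready": False})
```

The first request for a project returns `{"ready": false}` straight away, so the page itself is never held up by the computation.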
That sounds like a very good idea. I'll keep this bug open for now.
Any ideas on how much of a perf/load win this might be?
Product: Webtools → Webtools Graveyard
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → INCOMPLETE