Closed Bug 636520 Opened 14 years ago Closed 14 years ago

Set up Twitter Collage on stage

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: malexis, Assigned: oremj)

Attachments

(1 file)

The agency Quodis is developing the Twitter collage app and will hand off the code to Mozilla on 2/28. The stack they're using is PHP, MySQL and ImageMagick.

The code repo is here, but the app is not yet code complete: https://github.com/quodis/Twitter-Collage

Quodis will deliver instructions for integrating onto our servers this week. The URL will be a mozilla.com subdomain; the PMM team will provide that this week.
OS: Windows 7 → All
Hardware: x86 → All
Assignee: server-ops → jeremy.orem+bugs
FWIW, I've also forked this code into this repo: https://github.com/mozilla/Twitter-Collage For now, the quodis repo will be authoritative, but at some point (eg. post-handover) we may want to switch over to the mozilla fork for auto-updates on staging.
Comments from Quodis:

We've just tested the attached deployment instructions in a single-machine deployment. We used Ubuntu 10.04 32-bit.

- About the system

The system requires at least one instance each of mysql, memcache and lighttpd. There are also 3 worker PHP scripts, doing respectively:
- twitter-search (feeds on twitter > db)
- image-process (db > download > process tile > db)
- mosaic-build (db > full mosaic pages in json)

We usually have these PHP scripts supervised by daemontools: http://cr.yp.to/daemontools.html

Installation details are provided in the attached twitterparty.shell.txt. The file contains shell commands for installation, code updates, service control and logs, assuming a single-machine scenario (the svscan.conf file is supervise-related).

- Production infrastructure

For a production deployment we believe the ideal is:
- 2 load-balanced http instances (failover)
- 1 memcache instance
- 1 mysql (possibly replicated to have full backup and failover)
- 1 worker instance (running the job scripts)
- 1 network storage for caching data

For example, in a 2-machine deployment:
- cpu A: memcache + mysql master + storage + http (load-balanced) + supervised job scripts
- cpu B: mysql slave (backup/failover) + http (load-balanced)

With this infrastructure, if A fails, one can switch master to B within reasonable downtime by reconfiguring and restarting some servers.

We believe the system can be configured for a wide range of scenarios, but let us know what you're thinking so we can cook the right config.yaml file.

- Performance and Scaling

We're still optimizing caching and i/o, but the bottleneck is clearly in the image-process worker. Anyway, we tested in a single virtual machine using Twitter's most trending topics and the 3 workers keep things up to date, so we believe there's not much to worry about there. There are also still some TODOs in the code, and some stuff is not yet fully memcached.
Let us know what you think. We could also manage image processing with a job queue and have more than one worker going (we'd use the gearman job server for that). On the other hand, that would be adding yet another component (and more failure modes) for probably no real gain.
When will this be code complete? Should I stage it before it is?
(In reply to comment #4)
> When will this be code complete? Should I stage it before it is?

We need this staged sooner rather than later. It looks like it's got a number of moving parts (e.g. cronjobs), so it's probably best to get a start on working through the surprises now.
FWIW, the Quodis code repo is here:
https://github.com/quodis/Twitter-Collage

I've forked it into these repos:
https://github.com/mozilla/Twitter-Collage
https://github.com/lmorchard/Twitter-Collage

I think it would be good to use the github.com/mozilla repo for staging and code auto-updates, since that would let us pull from Quodis and push to our repo to maintain code control on our servers.
(In reply to comment #6)
> I think it would be good to use the github.com/mozilla repo for staging and
> code auto-updates, since that would let us pull from Quodis and push to our
> repo to maintain code control on our servers.

Jeremy, we can go ahead and stage it as Les suggested. Having it staged sooner will help us start testing and work out the kinks. Thanks!
The site will live at twitterparty.mozilla.org so that should help in setting up the staging URL.
PHP >= 5.3.2

Is this a hard requirement? We run with 5.2.6, and changing that is going to be difficult.
Guys - the page load size for this is pretty extreme. It's coming in at nearly 700KB and sometimes as high as 1.5MB. That's really large for any site. I don't want to host a site that has that sort of page size requirement. What can be done to reduce this before launch?
I've also noticed that if I run my mouse across the image it will fire off about 450 48x48 image requests to twitter.
Corey, we confirm the system is ok with PHP 5.2.6
This site drives CPU load up quite a bit. Just loading this one site in a single tab takes Firefox to > 80% CPU. Duplicated on two different systems.
(In reply to comment #13)
> This site drives CPU load up quite a bit. Just loading this one site in a
> single tab takes Firefox to > 80% CPU. Duplicated on two different systems.

We're tweaking the performance across all browsers. The first minute will always push the CPU the hardest and then drop to low usage. Will do new testing tomorrow (March 4).
Looks like this was staged already (from reading bug 631350), but not mentioned in this bug: http://twittercollage.allizom.org
Please set up the localization files by

* checking out http://svn.mozilla.org/projects/l10n-misc/trunk/twitterparty/locale/ in the root directory, and by

* setting up a cronjob that will `svn up` the locale dir and run ./locale/compile.sh (without any args) every 15 minutes.

Leo: do we need to modify anything in the code to start supporting any given locale?
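A minimal sketch of what that cron job could run, assuming the app root is /data/www/twittercollage.allizom.org (the path that shows up in later log output on this bug; the real path may differ) and that `svn` is on cron's PATH. The function name, the file location and the crontab line are hypothetical.

```shell
#!/bin/sh
# Sketch only: APP_ROOT is an assumed install path, not confirmed for this task.
APP_ROOT="${APP_ROOT:-/data/www/twittercollage.allizom.org}"

# Refresh the SVN checkout in ./locale, then recompile the language files.
# The body runs in a subshell so the caller's working directory is untouched.
update_locale() (
    cd "$APP_ROOT" || exit 1
    svn up locale || exit 1
    ./locale/compile.sh          # no arguments, as requested above
)

# Hypothetical crontab entry, assuming this file is saved as
# /usr/local/lib/twitterparty-locale.sh:
# */15 * * * * /bin/sh -c '. /usr/local/lib/twitterparty-locale.sh && update_locale'
```

Stopping at the first failing step means a broken `svn up` never triggers a compile against a half-updated checkout.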
(In reply to comment #16)
> Leo: do we need to modify anything in the code to start supporting any given
> locale?

No modification required on our side for now - the server searches for the requested language file on the disk. Currently the language select on the UI is hard-coded, but we'll change that, too.
Is the stage going to be

* http://twittercollage.allizom.org, or
* http://twitterparty.allizom.org ?

Neither of these works ATM, either.
(In reply to comment #18)
> Is the stage going to be
>
> * http://twittercollage.allizom.org, or
> * http://twitterparty.allizom.org ?
>
> Neither of these works ATM, either.

Currently it is twittercollage.allizom.org.

It is broken because we updated the code yesterday, and the new code does not work with our version of PHP on RHEL5.
Hi,

We're moving on with optimizations and security issues and, at the same time, downgrading yet again to get around the PHP bug bugs.php.net/50772.

We will update the code as soon as we've tested it on a 5.2.6 box. If, meanwhile, you could have a staging environment running PHP > 5.2.13 or > 5.3, these versions do not have the mysqli bug.
Hi! We have tested everything in php 5.2.6 and it's now working. Let us know if you need further help on setting it up. André
OK, it's not crashing anymore, but it does give the following warning when accessing the top level URL path:

2011-03-09 14:18:21 | 1299709101 > index.php > ERROR
Undefined index: locale (/data/www/twittercollage.allizom.org/lib/localization.php:49)
------
Array
(
    [0] => 8
    [1] => Undefined index: locale
    [2] => /data/www/twittercollage.allizom.org/lib/localization.php
    [3] => 49
    [4] => Array
        (
            [available_locales] => a:25:{s:2:"ar";a:2:{s:7:"English";s:6:"Arabic";s:6:"native";s:8:"عربي";}s:2:"ca";a:2:{s:7:"Engli
            [output] => s:0:"";
            [value] => a:2:{s:7:"English";s:6:"Arabic";s:6:"native";s:8:"عربي";}
            [key] => s:2:"ar";
        )

)
ERROR
Undefined index: locale (/data/www/twittercollage.allizom.org/lib/localization.php:49)

It is still (in my browser anyway) redirecting to /en-US, which seems to be the expected behavior.

The page doesn't currently seem to render quite right for me, though. Can others check it?
(In reply to comment #22)
> The page doesn't currently seem to render quite right for me, though. Can
> others check it?

Yes:

Error: Z is undefined
Source File: http://twittercollage.allizom.org/assets/js/party-min.js?v=1
Line: 1
The grid.js file appears to be outdated. Can you please verify that you have the latest code and, if not, pull the latest from our repo? Thanks
We are up to date:

git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	config/config.yaml.backup
#	locale/
#	store/
nothing added to commit but untracked files present (use "git add" to track)

git pull
Already up-to-date.

[root@mrapp-stage04 twittercollage.allizom.org]# git branch -v
* master 3f275b3 fix overreplacing stuff
Okay, sorry: the grid.js json structure has changed. Since you already have the up-to-date code, you need to run:

php util/mosaic-configure.php

to regenerate the grid.js file.
OK, that looks better!
Blocks: 640397
(In reply to comment #16)
> Please set up the localization files by
>
> * checking out
> http://svn.mozilla.org/projects/l10n-misc/trunk/twitterparty/locale/ in the
> root directory, and by
>
> * setting up a cronjob that will `svn up` the locale dir and run
> ./locale/compile.sh (without any args) every 15 minutes.

I filed bug 640397 to deal with this.
(In reply to comment #6)
> FWIW, the Quodis code repo is here:
> https://github.com/quodis/Twitter-Collage
>
> I've forked it into these repos:
> https://github.com/mozilla/Twitter-Collage
> https://github.com/lmorchard/Twitter-Collage
>
> I think it would be good to use the github.com/mozilla repo for staging and
> code auto-updates, since that would let us pull from Quodis and push to our
> repo to maintain code control on our servers.

Do you know which repo has been used for the staging server in the end?
(In reply to comment #29)
> (In reply to comment #6)
> > FWIW, the Quodis code repo is here:
> > https://github.com/quodis/Twitter-Collage
>
> Do you know which repo has been used for the staging server in the end?

As far as I understand, it's still the Quodis repo, since development has been ongoing on their end.
Summary: Set up Twitter collage on stage → Set up Twitter Collage on stage
Jeremy, can you confirm if/when the staging server is up to date and working with the latest code from Quodis? Security needs to review fixes for the last few bugs.
I've updated the staging host:

Updating 5c13bd7..35026c0
Fast-forward
 CHANGELOG.txt | 6 +
 bootstrap.php | 21 ++-
 config/config.develop2.yaml | 2 +-
 config/locale.yaml | 9 +
 job/mosaic-build.php | 12 +-
 lib/Image.class.php | 37 ++++
 lib/Mosaic.class.php | 28 +++-
 lib/l10n.lib.php | 358 +++++++++++++++++++++++++++++++++++++++
 lib/localeDetails.class.php | 321 -----------------------------------
 lib/localization.php | 80 ---------
 public/assets/css/ftp.css | 1 +
 public/assets/css/mosaic.css | 10 +-
 public/assets/css/party-min.css | 2 +-
 public/assets/js/dashboard.js | 22 ++-
 public/assets/js/general.js | 120 ++++++++------
 public/assets/js/party-min.js | 2 +-
 public/dashboard/dashboard.php | 4 +-
 public/index.php | 47 +++---
 util/configure.php | 2 +-
 19 files changed, 577 insertions(+), 507 deletions(-)
 create mode 100644 config/locale.yaml
 create mode 100644 lib/l10n.lib.php
 delete mode 100644 lib/localeDetails.class.php
 delete mode 100644 lib/localization.php
Please make sure you follow the notes in the CHANGELOG.

Remember to recompile the configuration by running:

php util/configure.php

A memcached restart and a job restart are also advisable.

I think you could also drop the db, re-install, and purge all the data/store files written by the system, so you can thoroughly test all system components.
The latest update in stage gives us the following, and things look ok from my perspective:

remote: Counting objects: 21, done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 12 (delta 8), reused 0 (delta 0)
Unpacking objects: 100% (12/12), done.
From https://github.com/quodis/Twitter-Collage
   2c44ed1..99cb79f  master -> origin/master
Updating 2c44ed1..99cb79f
Fast-forward
 public/assets/js/general.js | 40 ++++++++++++++++++----------------------
 public/assets/js/global.js | 6 +++---
 public/assets/js/party-min.js | 24 +-----------------------
 util/minify.php | 3 ++-
 4 files changed, 24 insertions(+), 49 deletions(-)
Yes, everything looks swell, except for data freshness. Are the jobs running either under cron or supervised?
The jobs had previously been running more-or-less by hand. They're now running under supervisord with the following configuration:

[program:image-process]
command=/usr/bin/php /data/www/twittercollage.allizom.org/job/image-process.php
directory=/data/www/twittercollage.allizom.org
user=root
numprocs=1
stdout_logfile=/var/log/twitterparty/image-process.log
stderr_logfile=/var/log/twitterparty/image-process.err
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 30

[program:mosaic-build]
command=/usr/bin/php /data/www/twittercollage.allizom.org/job/mosaic-build.php
directory=/data/www/twittercollage.allizom.org
user=root
numprocs=1
stdout_logfile=/var/log/twitterparty/mosaic-build.log
stderr_logfile=/var/log/twitterparty/mosaic-build.err
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 30

[program:twitter-search]
command=/usr/bin/php /data/www/twittercollage.allizom.org/job/twitter-search.php
directory=/data/www/twittercollage.allizom.org
user=root
numprocs=1
stdout_logfile=/var/log/twitterparty/twitter-search.log
stderr_logfile=/var/log/twitterparty/twitter-search.err
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 30
Looks nice. I don't think the .log files are necessary, since whatever the script writes to stdout it also logs to /var/log/twitterparty/JOBNAME-msg.log.

Error log files (/var/log/twitterparty/JOBNAME-error.log) are also created by the scripts, but these are only needed for troubleshooting. For instance, image-process will log every time it defaults to the egg picture, and if too many of those show up on the mosaic we can take a look at this log (image-process.error.log) to figure out the cause.
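With the jobs under supervisord, the update steps Quodis listed earlier (pull, recompile the configuration, restart memcached and the jobs) could be scripted roughly as follows. This is only a sketch: the function name is hypothetical, APP_ROOT is taken from the configuration above, and the `memcached` service name is an assumption about how memcached is installed on the staging host.

```shell
#!/bin/sh
# Sketch only: the memcached service name is an assumption.
APP_ROOT="${APP_ROOT:-/data/www/twittercollage.allizom.org}"

# Pull new code and bring every component back up in a consistent state.
# Runs in a subshell so the caller's working directory is left alone.
update_stage() (
    cd "$APP_ROOT" || exit 1
    git pull || exit 1                     # fetch the latest code
    php util/configure.php || exit 1       # recompile the configuration
    service memcached restart || exit 1    # drop stale cache entries
    # Restart the three supervised jobs so they pick up the new code.
    supervisorctl restart image-process mosaic-build twitter-search
)
```

Aborting on the first failure keeps the jobs running on the old code rather than restarting them against a half-applied update.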
OK. Is any sort of garbage collection performed on the image directories? I want to make sure that we won't continue to fill up the disks if we leave this running long-term.
No garbage collection is done: it wasn't planned, really. In the short term we believe there's no downside other than taking up some disk space, and there are some advantages.

After the hype is gone and the tweet rate goes back to normal, we'll have a notion of how many data/* files were generated and how much disk space they are using, and can decide whether we should:

a) purge data/original/* after processing each image (for as long as the tweet rate remains high these files are very useful, because if a user tweets more than once there's no overhead of downloading his/her avatar again from twitter)
b) purge old data/processed/* tiles (but these will no longer show in search)

Let us know if you think differently, and rest assured we'll be here to help you tweak the system for the long run.
Does the staging server auto-pull from github?
(In reply to comment #40)
> Does the staging server auto-pull from github?

Not for twitterparty.
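If disk usage does become a concern, an age-based variant of option a) could look something like the sketch below. It is not part of the Quodis code: the function name and the 30-day default are hypothetical, and it assumes the cached avatars live under a data/original directory as described above.

```shell
#!/bin/sh
# Delete cached original avatars not modified for more than max_age_days days.
# Both arguments are illustrative; nothing here is part of the Quodis code.
purge_originals() {
    data_dir="$1"
    max_age_days="${2:-30}"    # hypothetical default retention
    find "$data_dir/original" -type f -mtime +"$max_age_days" -exec rm -f {} +
}
```

Keeping recent files preserves the benefit Quodis mentions (no repeat avatar downloads while the tweet rate is high) while bounding growth over the long run.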
Ah, now that explains everything :) Can you update it please? I pushed a change to config/locale.yaml.
Please note that staging at http://twittercollage.allizom.org/en-US is now on a debug branch (see Bug 641657). For that reason, changes to locale.yaml won't get there without explicitly being merged.

@Stas: let me know if there's something I need to do for you on the quodis dev site http://dev.twitterparty.quodis.com ... has it been useful so far?
I've pulled Andre's changes to the debug branch through commit 8f73fc194.

I've merged the changes to the master branch (locale.yaml) into the debug branch locally (commit 318c943a).

I've run configure.php and am starting the jobs now. Will monitor to make sure image-process is well behaved.
Thanks Noah. I chatted with Andre and we've worked out a system in which such changes to locale.yaml will not be required anymore. This way I won't have to bother you about pulling from github :)
Pressed Enter too soon. I'd actually like to ask you one last time to git pull. The following changeset makes the code look for locale.yaml in /locale/, which is an SVN checkout that I have access to:

> https://github.com/quodis/Twitter-Collage/commit/318c943a80db7d57ef409540e70c08416820b381

Andre landed this on both branches.
Mosaic is not loading on http://twittercollage.allizom.org/en-US at this time. Would someone please look at it?
Today and tomorrow morning we are going to work on moving the twitterparty stage (and eventual prod) to a couple of new servers in phx running RHEL6.

I'm not happy with this at all, especially having to do it 4 days after we said we would stop accepting code, but the code we are getting is not fit to run in the environment we have set up for it.

We will make this work. In the future I'll work with the engagement team on improving the lead time for projects like this and properly communicating the requirements to any contractors.

In the meantime, sit tight... this is a moving target.
Got an update from Noah that stage will be running RHEL6 by end of today. When updating the stage, please pull in the latest code from the Quodis repo (see comment 0) so we'll have the fixes Quodis has been working on. Thanks!
Could we please get an update on the move to rhel6? I'm not sure if it was completed last night. Thanks.
(In reply to comment #50)
> Could we please get an update on the move to rhel6? I'm not sure if it was
> completed last night. Thanks.

Yes, the staging site is currently running on RHEL6. I believe it is entirely up to date, but will pull from git right now to be sure.
Thanks Noah! Can you make sure config.yaml has these parameters set? We'll be using them on production as well, but I'm not sure if they're configured on stage so I wanted to check.

terms: "#fx4 OR #ff4 OR #teamfirefox"
blacklist:
  - firefoxsux

Also, are there any remaining IT concerns with stage? If not, let's mark this as Resolved.
Yes, the blacklist is in place.

We have two remaining issues when running under RHEL 6:

1. Currently memcached support is disabled in the twitterparty config. I was unable to get the code to run without crashing with a protocol error in the memcache communication. I'll post the details on this ASAP, since it's possible that code changes will be needed in order to fix it.

2. The app does not run with the default RHEL 6 php.ini settings. I've installed a RHEL 5 php.ini, which let the app run, but this is suboptimal. I will need to compare the two files to figure out what exactly has changed. I'll follow up with more details if needed.
Please post details on 1) and 2).

1) error(s) output + php version + memcached api version + memcached version

2) where does it break? under httpd or cli scripts? can you show us the output of the error(s)? feel free to send us both php.ini files.
(In reply to comment #54)
> 2) where does it break? under httpd or cli scripts? can you show us the output
> of the error(s)? feel free to send us both php.ini.

The issue here appears to be that short_open_tag is assumed by your code to be enabled, but it is disabled by default in RHEL 6.
The memcache issue occurred in the twitter-search job. The log message was:

2011-03-17 13:52:48 | 1300395168 > twitter-search.php > ERROR
MemcachePool::delete(): Server localhost (tcp 11211, udp 0) failed with: CLIENT_ERROR bad command line format. Usage: delete <key> [noreply] (0) (/data/www/twittercollage.allizom.org/lib/tiny.lib.php:386)
------
Array
(
    [0] => 8
    [1] => MemcachePool::delete(): Server localhost (tcp 11211, udp 0) failed with: CLIENT_ERROR bad command line format. Usage: delete <key> [noreply] (0)
    [2] => /data/www/twittercollage.allizom.org/lib/tiny.lib.php
    [3] => 386
    [4] => Array
        (
            [cacheKey] => s:32:"TWITTERPARTY::Twitter::resetFlag";
        )

)
ERROR
MemcachePool::delete(): Server localhost (tcp 11211, udp 0) failed with: CLIENT_ERROR bad command line format. Usage: delete <key> [noreply] (0) (/data/www/twittercollage.allizom.org/lib/tiny.lib.php:386)
2011-03-17 13:52:48 | 1300395168 > twitter-search.php > TERMINATE (ERROR)
Sorry, something went terribly wrong.
TERMINATE (ERROR)

However, I have re-enabled caching now, and things are working ok. The only difference is that the database is now populated. It is possible that this bug is only triggered when the code is run against an empty database.
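A quick way to confirm this on a given host is to read the directive straight out of the ini file. The sketch below is illustrative only: the function name is made up, it assumes the last uncommented assignment in the file is the one in effect, and it does not look at any additional ini files PHP may load.

```shell
#!/bin/sh
# Print the short_open_tag value from a php.ini file, taking the last
# uncommented assignment. Prints nothing if the directive is not set.
short_open_tag_setting() {
    sed -n 's/^[[:space:]]*short_open_tag[[:space:]]*=[[:space:]]*\([A-Za-z0-9]*\).*/\1/p' "$1" | tail -n 1
}
```

For example, `short_open_tag_setting /etc/php.ini` (path assumed) can be run against both the RHEL 5 and RHEL 6 files to compare the two values.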
The code with memcache::delete() is triggered when "lastId" is not set. Indeed, this happens when the database is empty, but it will also happen if you run "php util/reset-twitter-api.php". The reset script skips the use of "lastId" in the next call to twitter search and might be useful in some situations. With memcache::delete() broken, running this utility is extremely dangerous.

I believe this is the issue: http://framework.zend.com/issues/browse/ZF-9376

Can you please try the following:

edit tiny.lib.php#386:
$ok = self::$_handler->delete($cacheKey, 0);

restart the twitter-search job to refresh code
php util/reset-twitter-api.php to trigger the code
tail /var/log/twitter-search.msg.log for something like:
twitter-search.php > http://search.twitter.com/search.json?q=...

There should be one call to twitter api without the "since_id" param.

If you confirm success I will push this mod to master.
(In reply to comment #57)
> Can you please try the following:
>
> edit tiny.lib.php#386:
> $ok = self::$_handler->delete($cacheKey, 0);
>
> restart the twitter-search job to refresh code
> php util/reset-twitter-api.php to trigger the code
> tail /var/log/twitter-search.msg.log for something like:
> twitter-search.php > http://search.twitter.com/search.json?q=...
>
> There should be one call to twitter api without the "since_id" param.
>
> If you confirm success I will push this mod to master.

That seems to have worked.
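The last check (one search call without "since_id") can also be done by counting rather than eyeballing the tail output. This is a rough sketch with a hypothetical function name; it assumes each request is logged on a single line containing the search URL, as in the example above.

```shell
#!/bin/sh
# Count logged search requests that carry no since_id parameter.
# Assumes one request per log line, with the full URL on that line.
count_unbounded_searches() {
    grep 'search\.twitter\.com/search\.json' "$1" | grep -vc 'since_id'
}
```

Running it against the message log before and after the reset script should show the count go up by one if the fix works.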
pushed, then... ", 0" ... what a difference 3 characters make :-)
Pulled the latest code into stage:

remote: Counting objects: 52, done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 39 (delta 29), reused 0 (delta 0)
Unpacking objects: 100% (39/39), done.
From https://github.com/quodis/Twitter-Collage
   b4dbe6a..592493f  master -> origin/master
Merge made by recursive.
 lib/tiny.lib.php | 2 +-
 public/assets/css/ftp.css | 12 ++++++++++++
 public/assets/css/mosaic.css | 31 ++++++++++++++++++-------------
 public/assets/css/party-min.css | 2 +-
 public/assets/js/general.js | 6 ++++--
 public/assets/js/party-min.js | 2 +-
 public/index.php | 4 ++--
 7 files changed, 39 insertions(+), 20 deletions(-)

I believe the deployment issues in stage on rhel 6 are resolved at this point and am closing this bug. Will re-open if problems come up.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
great! please confirm that staging address is still http://twittercollage.allizom.org/
Confirmed.
I think either the language packs are not yet installed, or you don't have the /locale dir updated. Following Stas's instructions, all you need to do is:

* check out http://svn.mozilla.org/projects/l10n-misc/trunk/twitterparty/locale/ in the root directory, and

* set up a cronjob that will `svn up` the locale dir and run ./locale/compile.sh (without any args) every 15 minutes.
Confirmed stage is updated, and locales display correctly on allizom.org
Status: RESOLVED → VERIFIED
Stage is down: http://twittercollage.allizom.org/ Is there any known reason for that, or is some help needed with troubleshooting?
(In reply to comment #65)
> Stage is down: http://twittercollage.allizom.org/ Is there any known reason
> for that, or is some help needed with troubleshooting?

It's back up, thanks to Noah.
Product: mozilla.org → mozilla.org Graveyard