Deploy new development env for the Input project

Status: VERIFIED FIXED
Product: Infrastructure & Operations
Component: WebOps: Other
Priority: --
Severity: major
Reported: 7 years ago
Last updated: 4 years ago

People: (Reporter: mbrandt, Assigned: jd)

Attachments: 1 attachment

(Reporter)

Description

7 years ago
We'd like to deploy a new development environment for the Input project. I'm not sure if this would require new hardware. 

New env: input.dev.allizom.org.

Dave can you add additional notes as to what is needed in this environment?

Cheers, Matt
Comment 1

Yes, more details would be nice..

We are already working on moving sites to a dev-stage-prod environment but the 'dev' portion of this is more like what our current-day 'stage' is like.  It wouldn't be an environment for you to develop on directly, if that is what the request here is for.
(In reply to comment #1)
> Yes, more details would be nice..
> 
> We are already working on moving sites to a dev-stage-prod environment but the
> 'dev' portion of this is more like what our current-day 'stage' is like.  It
> wouldn't be an environment for you to develop on directly, if that is what the
> request here is for.

We are thinking d-s-p.

input.stage.mozilla.com could be renamed to input-dev.allizom.com
and a new VM input.allizom.com could be our "stage" environment that is pinned to the "prod" tag.

The domains aren't important if there's a better convention in place.

The latter (new) environment should just auto-update from our "prod" tag - no need to make that a manual process for IT.

On our end we're willing to package up the app in whatever way you think would help.
Comment 3

Ok, fyi when we do implement this for input the dev and stage systems will be on different gear, so this is more than just shuffling around dns changes.

Right now we are working on puppet groundwork to prototype this with the engagement and generic clusters.  Other sites will follow.

Updated

7 years ago
Assignee: server-ops → jdow

Updated

7 years ago
Blocks: 632571
(In reply to comment #3)
> Ok, fyi when we do implement this for input the dev and stage systems will be
> on different gear, so this is more than just shuffling around dns changes.
> 
> Right now we are working on puppet groundwork to prototype this with the
> engagement and generic clusters.  Other sites will follow.

Any ETA for this?  I'm trying to plan out how long I'll be doing code freezes.

Comment 5

7 years ago
I plan on working on this this week. This is still relatively new territory, so I don't want to give a definite ETA, as I'm not sure what kind of problems we'll run into, but I'm hoping to have something solid by the end of the week. At least to a point where I can give you a better ETA.
Depends on: 656189
FYI, this is going to sit for a bit until we get db servers going for it (bug 656189).

Updated

7 years ago
Component: Server Operations → Server Operations: Projects
jdow/cshields can you CC me and fwenzel on bug 656189?

Comment 8

7 years ago
done.

Comment 9

7 years ago
Handing this to jeremy since I'll be out the next week, should be a pretty easy puppet re-write with the webapp module. A rewrite will be needed anyway since we are moving from rhel5 -> rhel6. I suggest not turning down the old stage environment until the new one is up and running and prod is moved to rhel6 as well to keep a matching dev->stage->prod environment.
Assignee: jdow → jeremy.orem+bugs
Component: Server Operations: Projects → Server Operations
Looks like this is still waiting on the db servers, but in the meantime has the hardware been set up for dev/stage?

Comment 11

7 years ago
I don't think anything has been set up yet, other than perhaps allocating some seamicros in DNS for it.
Depends on: 666678
Need a dev and staging seamicro for this.
Assignee: jeremy.orem+bugs → cshields
input1.dev.seamicro.phx1.mozilla.com
input2.dev.seamicro.phx1.mozilla.com
input1.stage.seamicro.phx1.mozilla.com
input2.stage.seamicro.phx1.mozilla.com
input3.stage.seamicro.phx1.mozilla.com
input4.stage.seamicro.phx1.mozilla.com

All set.  You'll want to set up DBs on dev1.db.phx1.mozilla.com
Assignee: cshields → jeremy.orem+bugs
Blocks: 664318
Is this still slated for this week?  We can't really take the sporadically-failing tests much longer :-(  Thanks!
Comment 15

No, it never was slated for this week (sorry if I miscommunicated that).

Where are the sporadically failing tests happening?  I'd like to save this adm/dev/stage buildout for a new hire starting at the work week.
(In reply to Corey Shields [:cshields] from comment #15)
> No, it never was slated for this week (sorry if I miscommunicated that).
> 
> Where are the sporadically failing tests happening?  I'd like to save this
> adm/dev/stage buildout for a new hire starting at the work week.

Since it's such a data-driven app, I think we typically see "Query has timed out," which causes Search Unavailable errors -- at times, too, we see Zeus 500s, which probably means some timeout-threshold there has been hit, waiting for a response from the backend.  Again, it's random, so hard to pinpoint.  Can we try to grep through the logs and see why/when it's happening?
Created attachment 556960 [details]
Screenshot of a very recent search/query failure, just loading staging's homepage
Assignee: jeremy.orem+bugs → cshields
Depends on: 688322
new admin node is up and ready.. Docs are updated..  Waiting on netops now to open up new flows.  The following will need to be done after that:

- get input to work in -dev using the new puppet environment and pulling from the new admin node (meaning this will be the first time we try it in RHEL6, expect some bumps in the road)
- roll out stage (same as above)
- test and certify them both as good 
- take each prod node out one at a time and rebuild with RHEL6 and with proper hostnames and puppet environment (currently they are using a legacy name scheme, and legacy puppet class)
--- when this happens we will have to be careful of directory placement; the new method will change where some of the directories end up versus where they are today.  The tsv_exports dir will also need to be handled
Whiteboard: [waiting on netops]
Assignee: cshields → server-ops
Component: Server Operations → Server Operations: Web Operations
QA Contact: mrz → cshields
Assignee: server-ops → cshields
When setting up the db for the new input-dev environment I ran into this:

# ./manage.py syncdb
Creating tables ...
Creating table feedback_opinion_terms
Creating table feedback_opinion
Creating table feedback_term
Creating table theme
Creating table theme_item
Creating table django_admin_log
Creating table auth_permission
Creating table auth_group_permissions
Creating table auth_group
Creating table auth_user_user_permissions
Creating table auth_user_groups
Creating table auth_user
Creating table auth_message
Creating table django_content_type
Creating table django_session
Creating table django_site
Traceback (most recent call last):
  File "./manage.py", line 56, in <module>
    execute_manager(settings)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/core/management/__init__.py", line 438, in execute_manager
    utility.execute()
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/core/management/__init__.py", line 379, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/core/management/base.py", line 191, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/core/management/base.py", line 220, in execute
    output = self.handle(*args, **options)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/core/management/base.py", line 351, in handle
    return self.handle_noargs(**options)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/core/management/commands/syncdb.py", line 109, in handle_noargs
    emit_post_sync_signal(created_models, verbosity, interactive, db)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/core/management/sql.py", line 190, in emit_post_sync_signal
    interactive=interactive, db=db)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/dispatch/dispatcher.py", line 172, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/contrib/contenttypes/management.py", line 24, in update_contenttypes
    ct.save()
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/db/models/base.py", line 460, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/db/models/base.py", line 553, in save_base
    result = manager._insert(values, return_id=update_pk, using=using)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/db/models/manager.py", line 195, in _insert
    return insert_query(self.model, values, **kwargs)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/db/models/query.py", line 1436, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/db/models/sql/compiler.py", line 791, in execute_sql
    cursor = super(SQLInsertCompiler, self).execute_sql(None)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/db/models/sql/compiler.py", line 735, in execute_sql
    cursor.execute(sql, params)
  File "/data/input-dev/src/input-dev.allizom.org/reporter/vendor/packages/Django/django/db/backends/mysql/base.py", line 86, in execute
    return self.cursor.execute(query, args)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
django.db.utils.IntegrityError: (1062, "Duplicate entry 'feedback-opinion' for key 'app_label'")
Severity: minor → major
Whiteboard: [waiting on netops]
Comment 20

This doesn't make sense, I tried this on a fresh db and didn't hit that error - but I might be missing something crucial.

In my trial, the django_site was the last table... did this get created?

Also we could re-run syncdb to see if it recurs.  Maybe syncdb has some options.
Whiteboard: [blocked:davedash]
(In reply to Dave Dash [:davedash] from comment #20)
> This doesn't make sense, I tried this on a fresh db and didn't hit that
> error - but I might be missing something crucial.

I wiped the db, recreated it, and got the same error.  django_site is the last table listed but that is where it crashes out.

syncdb gives the same error on subsequent runs.
Comment 22

Oddly enough I can not find an "app_label" key in the db at all.  In fact the only tables that have anything in them when this error comes up are auth_permission and content_type
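For reference: the "app_label" key in that IntegrityError is just MySQL's name for the composite unique index on django_content_type (app_label, model), so the duplicate row being inserted is the (app_label='feedback', model='opinion') content type, not anything in the feedback tables themselves. A minimal diagnostic sketch (hypothetical - not a command anyone ran in this bug) that could be run from ./manage.py shell on the dev box:

from django.contrib.contenttypes.models import ContentType

# List the content types already registered for the 'feedback' app; if
# 'opinion' is already there, the post-syncdb signal is trying to insert
# the same (app_label, model) pair a second time.
for ct in ContentType.objects.filter(app_label='feedback'):
    print ct.pk, ct.app_label, ct.model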
Blocks: 667068
(In reply to Corey Shields [:cshields] from comment #22)
> Oddly enough I can not find an "app_label" key in the db at all.  In fact
> the only tables that have anything in them when this error comes up are
> auth_permission and content_type

I've heard strange things about content_type getting in the way.

Okay I have a hunch - at the allhands I discovered a bug in Django (https://code.djangoproject.com/ticket/16353) which may be related.  It's since been fixed.
Whiteboard: [blocked:davedash]
If it's been fixed, do we just need to have you update the vendor lib?
yeah, I'm updating it now, and seeing if my tests pass, and then I'll commit it.

Not sure that this solves this problem, but I'm hoping it does.
Okay I pushed a new vendor library.

If there are failures, can you run ./manage.py sqlall and paste it somewhere?  That might give us some clues.
Whiteboard: [blocked:davedash]
Depends on: 661979
Picking this up again..  While trying to update code and run migrations I run into this (we may need to start with a fresh db again, remembering that we had to run it half a dozen times to get past the errors - and that doesn't seem right):

################################################## 

Running migration 2:
UPDATE feedback_opinion
    SET os='winxp' WHERE os='win' AND user_agent LIKE '%Windows NT 5.1%';
UPDATE feedback_opinion
    SET os='vista' WHERE os='win' AND user_agent LIKE '%Windows NT 6.0%';
UPDATE feedback_opinion
    SET os='win7' WHERE os='win' AND user_agent LIKE '%Windows NT 6.1%';


Error: Had trouble running this: BEGIN;
UPDATE feedback_opinion
    SET os='winxp' WHERE os='win' AND user_agent LIKE '%Windows NT 5.1%';
UPDATE feedback_opinion
    SET os='vista' WHERE os='win' AND user_agent LIKE '%Windows NT 6.0%';
UPDATE feedback_opinion
    SET os='win7' WHERE os='win' AND user_agent LIKE '%Windows NT 6.1%';


UPDATE schema_version SET version = 2;
COMMIT;
stdout: 
stderr: ERROR 1054 (42S22) at line 2: Unknown column 'os' in 'where clause'

returncode: 1
Also, I noticed that common-* css files are missing from the repo, whereas they seem to be on the old site.  Where are these pulled from?
Why are we using syncdb anyway? It's bound to collide with the migrations. Dave, is Input set up to get into a consistent state if you run schematic on an empty database?
Depends on: 699126
I don't know. I thought about this this morning; I'm going to switch this project to south.  schematic has constantly given us trouble in input, especially since our instructions are a mish-mash of syncdb and migrations, and south actually works.

Filing a blocker.
For posterity's sake: the syncdb issue was due to having the slave db configured while running syncdb.

I commented the slave db config and syncdb worked fine.

The css problems were due to missing compress_assets in the update script.

making progress now..
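In case anyone else trips over the same thing, here is a rough sketch of the settings change described above (assuming the usual multidb-style settings layout; the database name, user and password are illustrative, not the real dev config):

# settings_local.py - illustrative sketch only, not the actual dev config
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'input_dev',                 # hypothetical name
        'HOST': 'dev1.db.phx1.mozilla.com',  # dev DBs live here per the comment above
        'USER': 'input_dev',                 # hypothetical
        'PASSWORD': '...',
    },
    # 'slave': {...},            # leave the read slave commented out while syncdb runs
}
# SLAVE_DATABASES = ['slave']    # re-enable along with the slave entry afterwards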
./manage.py cron get_highcharts
./manage.py compress_assets


Should do the trick.

actually the update_stage script might have a good explanation of what needs to happen and when.
No longer depends on: 699126
Thanks for the headway today.  I think what's left is:

- getting sphinx hooked up
- getting production-like data online (?)
- automating pulls
- taking all our lessons and throwing them in a commander script (see the sketch below)?  I might need Oremj's help with that.
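A rough sketch of what that commander script could look like, pulled together from the steps already worked out in this bug (this is not the script that got deployed; the commander.deploy usage is assumed from other Mozilla webapp deploy scripts, and the checkout path is the one from the tracebacks above):

# deploy.py - hypothetical commander tasks for input-dev (sketch only)
from commander.deploy import task

SRC_DIR = '/data/input-dev/src/input-dev.allizom.org/reporter'  # path taken from the tracebacks above


@task
def update_code(ctx, tag):
    """Update the app checkout and its vendor submodule to the given tag."""
    with ctx.lcd(SRC_DIR):
        ctx.local('git fetch origin -q')
        ctx.local('git checkout -f %s -q' % tag)
        ctx.local('git submodule sync -q')
        ctx.local('git submodule update --init -q')


@task
def update_assets(ctx):
    """The two manage.py steps Dave lists above (highcharts + compressed assets)."""
    with ctx.lcd(SRC_DIR):
        ctx.local('python manage.py cron get_highcharts')
        ctx.local('python manage.py compress_assets')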
Depends on: 699490
Comment 34

sphinx is done

celery is set up

everything seems to be working, -except- for the very front page.  I can't get results from sphinx to show up in the front page.  (they show up if you select a category).  I imagine we might be missing a cron.

Speaking of crons, prod has one called "update_index" that is not found in manage.py anymore.. should we remove that?
(In reply to Corey Shields [:cshields] from comment #34)
> everything seems to be working, -except- for the very front page.  I can't
> get results from sphinx to show up in the front page.  (they show up if you
> select a category).  I imagine we might be missing a cron.

This was due to not having enough data (needed 1000 reports).

works great now.
Thanks this looks nice.
Comment 37

http://input-dev.allizom.org/en-US/sites is bombing out, but might be a missing cron or something.
crons are in place..

I'm about to set this up on an automatic update schedule, but if we do that now it is going to immediately choke on the first migration (causing you guys a lot of cron spam).  Can we get this fixed?

Running migration 1:
ALTER TABLE feedback_opinion ENGINE=InnoDB;

DROP TABLE IF EXISTS feedback_cluster;
DROP TABLE IF EXISTS feedback_clusteritem ;
DROP TABLE IF EXISTS feedback_clustertype;

CREATE TABLE `feedback_cluster` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `type_id` int(11) NOT NULL,
  `pivot_id` int(11) NOT NULL,
  `num_opinions` int(11) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `feedback_cluster_777d41c8` (`type_id`),
  KEY `feedback_cluster_c360d361` (`pivot_id`),
  KEY `feedback_cluster_af507caf` (`num_opinions`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `feedback_clusteritem` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cluster_id` int(11) NOT NULL,
  `opinion_id` int(11) NOT NULL,
  `score` double NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `feedback_clusteritem_2777883f` (`cluster_id`),
  KEY `feedback_clusteritem_ac81e047` (`opinion_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `feedback_clustertype` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `feeling` varchar(20) NOT NULL,
  `platform` varchar(255) NOT NULL,
  `version` varchar(255) NOT NULL,
  `frequency` varchar(255) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `feeling` (`feeling`,`platform`,`version`,`frequency`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `feedback_cluster`
  ADD CONSTRAINT FOREIGN KEY (`pivot_id`) REFERENCES `feedback_opinion` (`id`),
  ADD CONSTRAINT FOREIGN KEY (`type_id`) REFERENCES `feedback_clustertype` (`id`);

ALTER TABLE `feedback_clusteritem`
  ADD CONSTRAINT FOREIGN KEY (`cluster_id`) REFERENCES `feedback_cluster` (`id`),
  ADD CONSTRAINT FOREIGN KEY (`opinion_id`) REFERENCES `feedback_opinion` (`id`);


That took 2.18 seconds
################################################## 

Running migration 2:
UPDATE feedback_opinion
    SET os='winxp' WHERE os='win' AND user_agent LIKE '%Windows NT 5.1%';
UPDATE feedback_opinion
    SET os='vista' WHERE os='win' AND user_agent LIKE '%Windows NT 6.0%';
UPDATE feedback_opinion
    SET os='win7' WHERE os='win' AND user_agent LIKE '%Windows NT 6.1%';


Error: Had trouble running this: BEGIN;
UPDATE feedback_opinion
    SET os='winxp' WHERE os='win' AND user_agent LIKE '%Windows NT 5.1%';
UPDATE feedback_opinion
    SET os='vista' WHERE os='win' AND user_agent LIKE '%Windows NT 6.0%';
UPDATE feedback_opinion
    SET os='win7' WHERE os='win' AND user_agent LIKE '%Windows NT 6.1%';


UPDATE schema_version SET version = 2;
COMMIT;
stdout: 
stderr: ERROR 1054 (42S22) at line 2: Unknown column 'os' in 'where clause'
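For what it's worth, a small diagnostic sketch (hypothetical, run from ./manage.py shell) that would show which version schematic thinks the dev db is at and whether feedback_opinion actually has an 'os' column yet - the table and column names come straight from the migration output above:

from django.db import connection

cursor = connection.cursor()
# schematic stores its pointer in schema_version (see the UPDATE in the paste above)
cursor.execute("SELECT version FROM schema_version")
print 'schema_version:', cursor.fetchone()[0]
# migration 2 assumes feedback_opinion already has an 'os' column
cursor.execute("DESCRIBE feedback_opinion")
print 'columns:', [row[0] for row in cursor.fetchall()]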
Comment 39

(In reply to Stephen Donner [:stephend] from comment #37)
> http://input-dev.allizom.org/en-US/sites is bombing out, but might be a
> missing cron or something.

this one is causing:

[Tue Nov 08 18:14:42 2011] [error] [client 10.8.33.248] mod_wsgi (pid=18907): Exception occurred processing WSGI script '/data/www/input-dev.allizom.org/reporter/wsgi/reporter.wsgi'.
[Tue Nov 08 18:14:42 2011] [error] [client 10.8.33.248] Traceback (most recent call last):
[Tue Nov 08 18:14:42 2011] [error] [client 10.8.33.248]   File "/data/www/input-dev.allizom.org/reporter/wsgi/reporter.wsgi", line 33, in application
[Tue Nov 08 18:14:42 2011] [error] [client 10.8.33.248]     return django_app(env, start_response)
[Tue Nov 08 18:14:42 2011] [error] [client 10.8.33.248]   File "/data/www/input-dev.allizom.org/reporter/vendor/src/django/django/core/handlers/wsgi.py", line 250, in __call__
[Tue Nov 08 18:14:42 2011] [error] [client 10.8.33.248]     self.load_middleware()
[Tue Nov 08 18:14:42 2011] [error] [client 10.8.33.248]   File "/data/www/input-dev.allizom.org/reporter/vendor/src/django/django/core/handlers/base.py", line 51, in load_middleware
[Tue Nov 08 18:14:42 2011] [error] [client 10.8.33.248]     raise exceptions.ImproperlyConfigured('Middleware module "%s" does not define a "%s" class' % (mw_module, mw_classname))
[Tue Nov 08 18:14:42 2011] [error] [client 10.8.33.248] ImproperlyConfigured: Middleware module "commonware.response.middleware" does not define a "GraphiteMiddleware" class
(In reply to Corey Shields [:cshields] from comment #39)
> (In reply to Stephen Donner [:stephend] from comment #37)
> > http://input-dev.allizom.org/en-US/sites is bombing out, but might be a
> > missing cron or something.
> 
> this one is causing:
> 

oremj had this input to offer:

18:20 <@oremj> cshields: it appears that either 1) they didn't update vendor or 2) vendor isn't up to date on our side
18:21 <@oremj> cshields: yeah, it's using this version https://github.com/jsocol/commonware/blob/27646ecaca40a89024cc581c3ecf5eb0fa87ee11/commonware/middleware.py
18:21 <@oremj> which doesn't have those classes imported
Is our vendor not up to date?  I thought I updated it.  Or do we need to do another pull?  If it's the former, let me know and I'll update.
Depends on: 703177
Depends on: 703099
Assignee: cshields → jcrowe

Comment 42

6 years ago
Hmm... I was thinking this may be an issue with 'vendor' not being checked out properly, but based on the update script I think it's doing it correctly:

....
    cd $CODE_DIR
    git fetch origin -q
    checkretval
    git checkout -f origin/master -q
    git submodule sync -q
    git submodule update --init -q
    checkretval

    git log -3

    echo -e "Updating vendor..."
    cd $VENDOR_DIR
    git pull -q
    git submodule sync -q
    git submodule update --init -q
    checkretval
....


Perhaps this is fixed by now, or maybe we're updating improperly? I think the 'git pull' in vendor may be problematic... ISTR other apps where that was the case.
Comment 43

Sounds like that's done -- for now.  Is this just a tracking bug now?
(Assignee)

Comment 44

6 years ago
(In reply to Dave Dash [:davedash] from comment #43)
> Sounds like that's done -- for now.  Is this just a tracking bug now?

Is anything broken or still in need of configuration?  FWIW I do not see those errors in the logs any longer.
Comment 45

I guess the only questions I have are:

1. how often is dev being updated?
2. do we have a mechanism for loading/clearing data?

Perhaps commander and chief can be used to help make 1 as quick as possible.

Perhaps we can make some commander tasks for 2 that we can leave off by default but trigger via Chief as needed.

But as far as everything else goes, I think we're done.  We can close this out, and we can open new bugs to discuss updating and the like.

-d
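On question 2 above, a rough sketch of the kind of management command that could seed dev with throwaway data (entirely hypothetical - the command name and the Opinion fields are illustrative and would need to be checked against the real feedback models before this would run):

# apps/feedback/management/commands/load_sample_opinions.py (hypothetical)
from django.core.management.base import BaseCommand

from feedback.models import Opinion


class Command(BaseCommand):
    help = 'Seed a dev database with fake feedback so the dashboards render.'

    def handle(self, *args, **options):
        # the front page needed roughly 1000 reports before sphinx results showed up
        for i in range(1000):
            Opinion.objects.create(
                description='sample feedback #%d' % i,  # field names are illustrative
                user_agent='Mozilla/5.0 (Windows NT 6.1; rv:7.0) Gecko/20100101 Firefox/7.0',
            )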
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Reporter)

Comment 46

6 years ago
Bumping to verified per comment 45.  Many thanks.
Status: RESOLVED → VERIFIED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations