Closed Bug 571228 Opened 14 years ago Closed 14 years ago

Push Socorro 1.7 to production

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

All
Other
task
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: laura, Assigned: aravind)

Details

The tag for this release is http://socorro.googlecode.com/svn/tags/releases/1.7_r2148_20100610/

Upgrade instructions are at 
http://code.google.com/p/socorro/wiki/SocorroUpgrade#Socorro_1.7

We have a pre-release meeting in #362 at 10 PT, and will push immediately after, probably around 11 PT.
Assignee: server-ops → aravind
Working on collector configs, waiting for Daniel to turn off crash submissions into HBase.
Downloaded and extracted new hbase version
Symlinked to 0.20.5
copied CDH 2 hadoop jars to hadoop/lib
LZO jar and .so copied to /usr/lib/hbase-0.20.5/lib and .so symlinked.
Official production configs for hbase 0.20.5 copied from ~deinspanjer/hbase_conf to /usr/lib/hbase-0.20.5/conf
hbase-0.20.5-20100602 pushed to the entire production cluster
chowned to hadoop.hadoop
symlinked to hbase-0.20.5
Turned off crash submissions to HBase.  Still collecting into nfs.

Still working on other config files.
Copied the maintenance page over the index page; that should propagate out in about 10 minutes.
Thrift stopped.  Starting hbase master shutdown
Master stopped
updated /usr/lib/hbase to point at hbase-0.20.5
Starting master
Master started.  Waiting for regions to be assigned.
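The repointing of /usr/lib/hbase at hbase-0.20.5, done between stopping and starting the master, can be sketched roughly as below. This is a minimal illustration, not the commands actually run: the `switch_version` helper, the scratch directory, and the `hbase-previous` directory name are all hypothetical, and the demo works in a temporary directory rather than /usr/lib.

```shell
# Repoint a version-independent symlink at a versioned install directory.
# Usage: switch_version <prefix> <link_name> <versioned_dir>
# (hypothetical helper, not part of HBase or Socorro)
switch_version() {
    local prefix="$1" link="$2" target="$3"
    if [ ! -d "$prefix/$target" ]; then
        echo "missing install directory: $prefix/$target" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a file, not a directory
    ln -sfn "$target" "$prefix/$link"
}

# Demonstration in a scratch directory standing in for /usr/lib.
demo_root="$(mktemp -d)"
mkdir "$demo_root/hbase-previous" "$demo_root/hbase-0.20.5"
switch_version "$demo_root" hbase hbase-previous   # state before the upgrade
switch_version "$demo_root" hbase hbase-0.20.5     # the switch itself
readlink "$demo_root/hbase"                        # prints hbase-0.20.5
```

Keeping the old directory in place means a rollback would be the same call with the previous directory name, assuming the on-disk data format allows it.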
Almost done with expensive migrations.  Create table statements will be fast.
I was wrong about being almost done.  The second alter is taking much longer than the first.  Unfortunately, I don't have *any* way to tell progress at the moment.

Need one more migration to up the max region file size for the existing crash_reports table. Not in the migration file currently.

alter 'crash_reports', {METHOD => 'table_att', MAX_FILESIZE => '1073741824'}
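For reference, the MAX_FILESIZE value in that alter is given in bytes, and 1073741824 is exactly 1 GiB (1024 * 1024 * 1024). A small sketch of deriving it rather than typing the digits by hand; the `gib_to_bytes` helper is hypothetical:

```shell
# Convert a whole number of GiB to bytes (hypothetical helper).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

max_filesize="$(gib_to_bytes 1)"
# Print the attribute in the form used by the alter statement above.
echo "MAX_FILESIZE => '${max_filesize}'"   # prints MAX_FILESIZE => '1073741824'
```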
Second one took 35 minutes.
Doing the table_att MAX_FILESIZE one now.
I think the third alter might be quicker like the first.  The second was probably very slow because the flags: column family already existed.
MAX_FILESIZE alter done. 792 seconds.
Doing last alter now.
Last alter took 162 seconds.
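Adding up the alter timings reported so far gives a rough floor for the migration window. The sketch below only sums the three durations stated in the comments; the first alter's duration was not recorded here, so the real total is somewhat higher.

```shell
# Durations as reported in the comments, converted to seconds.
second_alter=$(( 35 * 60 ))   # "Second one took 35 minutes"
max_filesize_alter=792        # "MAX_FILESIZE alter done. 792 seconds."
last_alter=162                # "Last alter took 162 seconds."

total=$(( second_alter + max_filesize_alter + last_alter ))
# Prints: 3054 seconds (50 min 54 s)
printf '%d seconds (%d min %d s)\n' "$total" $(( total / 60 )) $(( total % 60 ))
```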
Enabling crash reports table now.
Done enabling big table.
Creating new tables now.
Done with all migrations.
Started thrift servers.
HBase upgrade complete
Done with the collectors, monitor, processors and the middleware layer.

Working on the php front end next.
I am now done with webapp and cron jobs as well.  All done.
Please restart web service layer. webapp-php can't talk to it.
Did a crash me now for d62d8c64-206c-4523-852d-ba5b12100610

it's in hadoop

http://crash-stats.mozilla.com/dumps/d62d8c64-206c-4523-852d-ba5b12100610.jsonz
is a 500 system error
"Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, root@localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.
Apache/2.2.3 (Red Hat) Server at dm-bp-mware01.mozilla.org Port 80"
The advanced search page at http://crash-stats.mozilla.com/query is encountering connection reset errors with every single query and has not completed a successful query yet.
Upon the first page load at http://crash-stats.mozilla.com/query, I am receiving this error:

"The maximum query date range you can perform is days. Admins may log in to increase query date range limits. Query results have been narrowed to the default range of ."

This leads me to believe that a config file is missing certain values.  Please ensure this array is found at the bottom of application/config/application.php:

/**
 * The query range limit for users who have the role of user and admin.
 *
 * @see My_SearchReportHelper->normalizeDateUnitAndValue()
 */
$config['query_range_defaults'] = array(
    'admin' => array(
        'range_default_value' => 14,
        'range_default_unit' => 'days',
        'range_limit_value_in_days' => 120
    ),
    'user' => array(
        'range_default_value' => 14,
        'range_default_unit' => 'days',
        'range_limit_value_in_days' => 30
    )
);
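A quick way to confirm whether a deployed config file defines that array is a fixed-string search for the key. This is only a sketch: the `has_query_range_defaults` helper is hypothetical, and the demo runs against a temporary file rather than the real application/config/application.php.

```shell
# Succeed if the given PHP config file assigns $config['query_range_defaults'].
# (hypothetical helper; -F matches the key literally, -q suppresses output)
has_query_range_defaults() {
    grep -qF "\$config['query_range_defaults']" "$1"
}

# Demonstration against a scratch file standing in for application.php.
demo_conf="$(mktemp)"
printf '%s\n' '<?php' "\$config['query_range_defaults'] = array();" > "$demo_conf"

if has_query_range_defaults "$demo_conf"; then
    echo "query_range_defaults present"
else
    echo "query_range_defaults missing"
fi
```

A check like this only shows that the key exists, not that the nested 'admin' and 'user' values are populated.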
(In reply to comment #22)
(In reply to comment #23)
These are fixed.
Running the daily crash job now.  It's set to run in cron at 00:15.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Verification status:

Many features are working except:

1) ADU Report as noted in Comment #27
2) Most Search Queries are timing out

Performance has been improving and load on Postgres has dropped from 8 to 6 to 1 in the last hour. We're going to build out #1 and regroup at 6:40pm to see how #2 is looking.
(In reply to comment #28)
WTF... #2 advanced search is fixed.
http://crash-stats.mozilla.com/daily is working now.

Verification complete.
I can't verify all the bugs pushed in 1.7, but I've run through a series of post-push tests, and it's looking good to me (plus comment 29 and comment 30; thanks, Austin!)

Verified.
Status: RESOLVED → VERIFIED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard