Closed
Bug 520307
Opened 15 years ago
Closed 15 years ago
Push Sphinx changes, and cron new indexing scripts on preview
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: davedash, Assigned: chizu)
References
Details
Push the new sphinx changes according to this: https://intranet.mozilla.org/WebSites#Sphinx
Ping me for any clarifications.
Updated•15 years ago
Assignee: server-ops → jeremy.orem+bugs
Reporter
Comment 1•15 years ago
Note, if there's an easier way to automate this, let me know. I can write config files differently or write build scripts to build config files... just not sure what the easiest way is... *-dist sort of sucks when the file changes regularly.
Looking over prime_sphinx_index.py, this new setup forces us to choose just one AMO slave db for search? That seems like a bad single point of failure to introduce. Is there a good reason for it?
Reporter
Comment 3•15 years ago
hmm you make a good point regarding single point of failure ... go ahead and point it to the master and have it replicate. It just didn't seem efficient to have a table that gets rebuilt every 5 mins distributed across the network.
Index feed table initialized.
Priming index with addons/addon_id/locale data
done
Adding author information.
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'authors' at row 1
  c2.execute("""
done
Adding tag information.
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 1
  c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 2
  c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 3
  c2.execute("""
done
Adding version info
done
Adding translated data
done
Updating max/min versions
done
Adding date modified
done
Sphinx 0.9.9-rc2 (r1785)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/data/sphinx/preview.addons/sphinx.conf'...
indexing index 'addons'...
collected 22482 docs, 11.9 MB
collected 263919 attr values
sorted 0.3 Mvalues, 100.0% done
sorted 26.7 Mhits, 99.8% done
total 22482 docs, 11915375 bytes
total 14.387 sec, 828147 bytes/sec, 1562.55 docs/sec
total 271 reads, 0.101 sec, 493.7 kb/call avg, 0.3 msec/call avg
total 319 writes, 0.878 sec, 941.8 kb/call avg, 2.7 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=21851).
Note: /data/sphinx/preview.addons/preindex is a symlink to prime_sphinx_index.py

I made two changes to prime_sphinx_index.py and put the configuration in a settings.py. First to execute it directly, second for Python 2.4:

+#!/usr/bin/python

-class SphinxIndexPrimer():
+class SphinxIndexPrimer:
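Background on the second change: empty parentheses in a class statement are only accepted from Python 2.5 onward, so `class SphinxIndexPrimer():` is a syntax error on 2.4. A minimal sketch of a spelling that parses on 2.4 and on later versions follows. The method body is a hypothetical placeholder, not the real script.

```python
class SphinxIndexPrimer(object):
    """Parses on Python 2.4 and later; `class SphinxIndexPrimer:` parses too."""

    def prime_index(self):
        # Hypothetical placeholder for the real priming logic.
        return "primed"
```

Naming `object` explicitly also makes this a new-style class on Python 2, which the bare `class SphinxIndexPrimer:` form does not.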
Comment 6•15 years ago
(In reply to comment #5)
> I made two changes to prime_sphinx_index.py and put the configuration in a
> settings.py. First to execute it directly, second for Python 2.4:
> +#!/usr/bin/python

The amo boxes should have Python 2.6. I think the executable is called python2.6 or python26.
(In reply to comment #6)
> The amo boxes should have Python 2.6. I think the executable is called
> python2.6 or python26.

This runs on the sphinx cluster, which is separate. Python 2.6 can be installed if it's required, but I'd rather use upstream Python when we can.
Comment 8•15 years ago
This cronjob is horked and spewing emails.

Cron <daemon@pm-app-sphinx02> /data/bin/sphinx-reindex preview.addons > /dev/null

  File "/data/sphinx/preview.addons/preindex", line 29
    class SphinxIndexPrimer():
                            ^
SyntaxError: invalid syntax

and,

Cron <daemon@pm-app-sphinx01> /data/bin/sphinx-reindex preview.addons > /dev/null

/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'authors' at row 1
  c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 1
  c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 2
  c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 3
  c2.execute("""
Reporter
Comment 9•15 years ago
Trevor, if possible let's use the same python26 on these boxes too - I can back down to 2.5 usually, but breaking 2.4 syntax is all too easy. These warnings can be ignored; I'll clean them up in a commit: r52806
Assignee
Comment 10•15 years ago
Updated and switched to Python 2.6, there's a new set of errors and warnings.

Always this:
/usr/lib64/python2.6/site-packages/MySQLdb/__init__.py:34: DeprecationWarning: the sets module is deprecated
  from sets import ImmutableSet

Sometimes:
Traceback (most recent call last):
  File "/data/sphinx/preview.addons/preindex", line 259, in <module>
    s.prime_index()
  File "/data/sphinx/preview.addons/preindex", line 36, in prime_index
    self.populate_feed()
  File "/data/sphinx/preview.addons/preindex", line 249, in populate_feed
    self.add_authors()
  File "/data/sphinx/preview.addons/preindex", line 151, in add_authors
    self.add_data(query=gq, msg=msg, field='authors')
  File "/data/sphinx/preview.addons/preindex", line 193, in add_data
    """ % field, (items, addon_id))
  File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 166, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.ProgrammingError: (1146, "Table 'addons_reskin.sphinx_index_feed' doesn't exist")
Comment 11•15 years ago
Maybe it has something to do with bug 520343. Every cron job that runs on preview has been failing since 6pm yesterday.
Reporter
Comment 12•15 years ago
Wait... we have 2 indexer nodes? They aren't both dropping and creating the same table, are they? We'll need to adjust this... a lot. I might need some time on Monday to figure this out. Basically only one node should be recreating sphinx_index_feed (or dropping it), and I can see if the sphinx queries can issue a read lock on the sphinx index table and then release it after the indexing is complete... Trevor, if that makes sense, let me know and I'll code it up Monday.
Assignee
Comment 13•15 years ago
Yes, both nodes are indexing since they're supposed to be redundant. Sorry for that being unclear. Yesterday we had some stacked indexing too, as puppet activated a copy of the cron job.
Reporter
Comment 14•15 years ago
Okay, wow... yeah I can see why the view approach was well-loved. Well, I think this is still workable. I'll just have sphinx lock the tables while it indexes. And then we can run the pre-indexer on a single node at 2,7,12,17,22,27,32,...,57 after the hour. Then it should all work smoothly most of the time. Also on the python script I'll try to have it do some pid checking. Lastly, I do know we can't escape the sets warnings... it's some weird issue between MySQLdb and python2.6. jbalogh - know a way around it? We can dial back to python2.5. heh... glad this is all in the preview environment...
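One way the pid checking could look is sketched below, using only the standard library and modern Python 3 rather than the 2.x of this thread. The `single_instance` helper and the lock path are hypothetical names for illustration, and this is not the code that went into prime_sphinx_index.py.

```python
import contextlib
import os


@contextlib.contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock file for the duration of the block.

    Raises FileExistsError if another run already holds the lock.
    """
    # O_CREAT | O_EXCL makes creation atomic: only one process can succeed.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record our pid so a leftover lock can be inspected by hand.
        os.write(fd, str(os.getpid()).encode("ascii"))
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

A cron-driven script would wrap its main routine in `with single_instance(path):` and exit quietly on `FileExistsError`. The trade-off is that a crashed run can leave a stale lock file behind, so the recorded pid would still need checking before the lock is cleared.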
Comment 15•15 years ago
(In reply to comment #14)
> Also on the python script I'll try to have it do some pid checking.

I like http://pypi.python.org/pypi/lockfile/ for script locking.

> Lastly, I do know we can't escape the sets warnings... it's some weird issue
> between MySQLdb and python2.6. jbalogh - know a way around it?

import warnings
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    import MySQLdb as mysql

You can also throw ``warnings.simplefilter('ignore')`` at the top of your script and be rid of all warnings, but then you might miss out on problems in your own code.
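A narrower variant of the same idea is sketched here: ignore only `DeprecationWarning`, and only while the noisy call runs. `noisy_import` is a hypothetical stand-in for `import MySQLdb` so the example runs without that package installed.

```python
import warnings


def noisy_import():
    # Hypothetical stand-in for `import MySQLdb`, which emitted a
    # DeprecationWarning about the sets module on Python 2.6.
    warnings.warn("the sets module is deprecated", DeprecationWarning)
    return "module"


def quiet_call(func):
    """Run func with DeprecationWarning ignored, then restore the filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return func()
```

Because `catch_warnings()` restores the previous filter state on exit, warnings raised elsewhere in the script are still reported.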
Reporter
Comment 16•15 years ago
Okay, I'll take care of these items... in the meantime bug 520516 implies that we somehow broke search. Can we get to the bottom of that? None of these issues should have broken search. If someone in IT wants to share a screen session to me, we can see what's going wrong.
Blocks: 520516
Assignee
Comment 17•15 years ago
The error in comment 10 zeros the index, breaking search until the job next runs successfully (with one try every half an hour, that's not very often).
Reporter
Comment 18•15 years ago
Alright, r52880 and r52883 are an attempt at doing this better.

* Everything should be a bit faster as I cleaned up the queries a lot
* sphinx.conf-dist now locks the tables (and unlocks them) in order to block the .py file.
* The .py file now does everything in sphinx_index_feed_tmp and at the end renames the table to sphinx_index_feed

So here's what needs to be done:

* Get the new Py file and the new sphinx-dist.py file and update the sphinx cluster.
* Run: "alter table addons_users add key(listed);" to add a key to speed things up
* pip install lockfile, it's a requirement for the prime_sphinx_index.py
* run index-sphinx.sh only on a single node - we need to watch the cron log to make sure it keeps running - if need be I can have it touch a file each time it runs
* on each other node just run indexer --all --rotate as we did before

The cron should run */5 mins for the indexer --all --rotate

For the index-sphinx.sh let's do 2,7,12,17,22,27,32,37,42,47,52,57 - that'll give the other nodes enough time to fetch a new version of the sphinx_index_feed.

Let me know if this will work, or how we can do this better.
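The build-in-a-temp-table-then-rename step can be sketched as follows. This uses the standard-library sqlite3 module as a stand-in for MySQL so it runs anywhere; the column names and the `rebuild_feed` helper are hypothetical, and it is not the code from r52880/r52883.

```python
import sqlite3


def rebuild_feed(conn, rows):
    """Build the feed in a scratch table, then swap it into place."""
    # Slow part: readers still see the old sphinx_index_feed meanwhile.
    conn.execute("DROP TABLE IF EXISTS sphinx_index_feed_tmp")
    conn.execute(
        "CREATE TABLE sphinx_index_feed_tmp (addon_id INTEGER, authors TEXT)"
    )
    conn.executemany("INSERT INTO sphinx_index_feed_tmp VALUES (?, ?)", rows)
    # Fast part: drop and rename inside one transaction, so the feed table
    # is never observed missing or half-filled.
    conn.execute("BEGIN")
    conn.execute("DROP TABLE IF EXISTS sphinx_index_feed")
    conn.execute("ALTER TABLE sphinx_index_feed_tmp RENAME TO sphinx_index_feed")
    conn.execute("COMMIT")


# isolation_level=None lets the explicit BEGIN/COMMIT above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
rebuild_feed(conn, [(1, "alice")])
rebuild_feed(conn, [(2, "bob"), (3, "carol")])
```

On MySQL the swap would more likely be a single multi-table `RENAME TABLE` statement, which performs the renames atomically. Either way, the point is to close the window in which an indexer can hit "Table ... doesn't exist".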
Comment 19•15 years ago
(In reply to comment #16)
> If someone in IT wants to share a screen session to me, we can see what's going
> wrong.

This and other search bugs are blocking the fennec release. Can we get together on IRC and figure this out asap?
Assignee
Comment 20•15 years ago
(In reply to comment #18)
> * run index-sphinx.sh only on a single node - we need to watch the cron
> log to make sure it keeps running - if need be I can have it touch a file each
> time it runs

I've set this up for the time being, but I think we should try to make it possible for any node (including the one running index-sphinx.sh) to go away. Changes are live.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard