Closed Bug 520307 Opened 15 years ago Closed 15 years ago

Push Sphinx changes, and cron new indexing scripts on preview

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

All
Other
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: davedash, Assigned: chizu)

Push the new sphinx changes according to this:

https://intranet.mozilla.org/WebSites#Sphinx

Ping me for any clarifications.
Assignee: server-ops → jeremy.orem+bugs
Assignee: jeremy.orem+bugs → thardcastle
Note, if there's an easier way to automate this, let me know.  I can write config files differently or write build scripts to build config files...

just not sure what the easiest way is... *-dist sort of sucks when the file changes regularly.
Looking over prime_sphinx_index.py, this new setup forces us to choose just one
AMO slave db for search? That seems like a bad single point of failure to
introduce. Is there a good reason for it?
hmm you make a good point regarding single point of failure ... go ahead and point it to the master and have it replicate.

It just didn't seem efficient to have a table that gets rebuilt every 5 minutes distributed across the network.
Index feed table initialized.
Priming index with addons/addon_id/locale data
done
Adding author information.
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'authors' at row 1
  c2.execute("""
done
Adding tag information.
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 1
  c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 2
  c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 3
  c2.execute("""
done
Adding version info
done
Adding translated data
done
Updating max/min versions
done
Adding date modified
done
Sphinx 0.9.9-rc2 (r1785)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/data/sphinx/preview.addons/sphinx.conf'...
indexing index 'addons'...
collected 22482 docs, 11.9 MB
collected 263919 attr values
sorted 0.3 Mvalues, 100.0% done
sorted 26.7 Mhits, 99.8% done
total 22482 docs, 11915375 bytes
total 14.387 sec, 828147 bytes/sec, 1562.55 docs/sec
total 271 reads, 0.101 sec, 493.7 kb/call avg, 0.3 msec/call avg
total 319 writes, 0.878 sec, 941.8 kb/call avg, 2.7 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=21851).
Note: /data/sphinx/preview.addons/preindex is a symlink to prime_sphinx_index.py

I made two changes to prime_sphinx_index.py and put the configuration in a settings.py. First to execute it directly, second for Python 2.4:
+#!/usr/bin/python

-class SphinxIndexPrimer():
+class SphinxIndexPrimer:
(In reply to comment #5)
> I made two changes to prime_sphinx_index.py and put the configuration in a
> settings.py. First to execute it directly, second for Python 2.4:
> +#!/usr/bin/python

The amo boxes should have Python 2.6.  I think the executable is called python2.6 or python26.
(In reply to comment #6)
> The amo boxes should have Python 2.6.  I think the executable is called
> python2.6 or python26.
This runs on the sphinx cluster, which is separate. Python 2.6 can be installed if it's required, but I'd rather use upstream Python when we can.
This cronjob is horked and spewing emails.

Cron <daemon@pm-app-sphinx02> /data/bin/sphinx-reindex preview.addons > /dev/null

File "/data/sphinx/preview.addons/preindex", line 29
   class SphinxIndexPrimer():
                           ^
SyntaxError: invalid syntax

and,

Cron <daemon@pm-app-sphinx01> /data/bin/sphinx-reindex preview.addons > /dev/null

/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'authors' at row 1
 c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 1
 c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 2
 c2.execute("""
/data/sphinx/preview.addons/preindex:190: Warning: Data truncated for column 'tags' at row 3
 c2.execute("""
Trevor, if possible let's use the same python26 on these boxes too - I can usually back down to 2.5, but breaking 2.4 syntax is all too easy.

These warnings can be ignored; I'll clean them up in a commit:

r52806
Updated and switched to Python 2.6; there's a new set of errors and warnings. Always this:
/usr/lib64/python2.6/site-packages/MySQLdb/__init__.py:34: DeprecationWarning: the sets module is deprecated
  from sets import ImmutableSet

Sometimes:
Traceback (most recent call last):
  File "/data/sphinx/preview.addons/preindex", line 259, in <module>
    s.prime_index()
  File "/data/sphinx/preview.addons/preindex", line 36, in prime_index
    self.populate_feed()
  File "/data/sphinx/preview.addons/preindex", line 249, in populate_feed
    self.add_authors()
  File "/data/sphinx/preview.addons/preindex", line 151, in add_authors
    self.add_data(query=gq, msg=msg, field='authors')
  File "/data/sphinx/preview.addons/preindex", line 193, in add_data
    """ % field, (items, addon_id))
  File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 166, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.ProgrammingError: (1146, "Table 'addons_reskin.sphinx_index_feed' doesn't exist")
Maybe it has something to do with bug 520343.  Every cron job that runs on preview has been failing since 6pm yesterday.
Wait... we have 2 indexer nodes?  They aren't both dropping and creating the same table are they?

We'll need to adjust this... a lot.  I might need some time on Monday to figure this out.

Basically only one node should be recreating sphinx_index_feed (or dropping it), and I can see if the sphinx queries can issue a read lock on the sphinx index table and then release it after the indexing is complete...

Trevor, if that makes sense, let me know and I'll code it up Monday.
Yes, both nodes are indexing since they're supposed to be redundant. Sorry for that being unclear. Yesterday we had some stacked indexing too, as puppet activated a copy of the cron job.
Okay, wow... yeah I can see why the view approach was well-loved.

Well I think this is still workable.  I'll just have sphinx lock the tables while it indexes.  And then we can run the pre-indexer on a single node at 2,7,12,17,22,27,32,...,57 after the hour.  Then it should all work smoothly most of the time.

Also on the python script I'll try to have it do some pid checking.
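
The PID checking mentioned here could take roughly the following shape. This is a minimal sketch, not the code that went into prime_sphinx_index.py: the function names and the PID-file path are hypothetical, it assumes a POSIX system, and the check-then-write sequence is not atomic, so two runs that start at the same instant could both proceed.

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except OSError as exc:
        # EPERM: the process exists but belongs to another user.
        # Anything else (typically ESRCH): no such process.
        return exc.errno == errno.EPERM
    return True


def acquire_pidfile(path):
    """Write our PID to `path` unless another live run already owns it.

    Returns True if this process now owns the PID file, False otherwise.
    """
    try:
        with open(path) as fh:
            old_pid = int(fh.read().strip())
    except (IOError, OSError, ValueError):
        old_pid = None  # missing or unreadable PID file: treat it as stale
    if old_pid is not None and old_pid != os.getpid() and pid_is_running(old_pid):
        return False
    with open(path, "w") as fh:
        fh.write(str(os.getpid()))
    return True


def release_pidfile(path):
    """Remove the PID file once the run has finished."""
    os.remove(path)
```

A script would call acquire_pidfile() with some agreed path (for example a hypothetical /tmp/preindex.pid) before priming the index, exit early if it returns False, and call release_pidfile() in a finally block.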

Lastly, I do know the set warnings we can't escape... it's some weird issue between MySQLdb and python2.6.  jbalogh - know a way around it?  We can dial back to python2.5.

heh... glad this is all in the preview environment...
(In reply to comment #14)
> Also on the python script I'll try to have it do some pid checking.

I like http://pypi.python.org/pypi/lockfile/ for script locking.

> Lastly, I do know the set warnings we can't escape... it's some weird issue
> between MySQLdb and python2.6.  jbalogh - know a way around it?

import warnings

with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    import MySQLdb as mysql

You can also throw ``warnings.simplefilter('ignore')`` at the top of your script and be rid of all warnings, but then you might miss out on problems in your own code.
Okay, I'll take care of these items... in the meantime bug 520516 implies that we somehow broke search.  Can we get to the bottom of that?  None of these issues should have broken search.

If someone in IT wants to share a screen session to me, we can see what's going wrong.
Blocks: 520516
The error in comment ten zeros the index, breaking search until it runs successfully (with one try every half an hour, that's not very often).
Alright, r52880 and r52883 are an attempt at doing this better.

* Everything should be a bit faster as I cleaned up the queries a lot
* sphinx.conf-dist now locks the tables (and unlocks them) in order to block the .py file.
* The .py file now does everything in sphinx_index_feed_tmp and at the end renames the table to sphinx_index_feed
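
The build-then-rename step in the last bullet could look roughly like this. It is a minimal sketch under loud assumptions: sqlite3 stands in for MySQL so the example is self-contained, the three-column schema is hypothetical, and rebuild_feed is an illustrative name rather than anything from prime_sphinx_index.py. On MySQL the swap would more likely be a single RENAME TABLE statement.

```python
import sqlite3


def rebuild_feed(conn, rows):
    """Build the feed in a scratch table, then swap it into place.

    `conn` must be in autocommit mode (isolation_level=None) so that the
    explicit BEGIN/COMMIT below controls the swap transaction.
    """
    # All the slow work happens on the scratch table; readers of
    # sphinx_index_feed are unaffected while it is being filled.
    conn.execute("DROP TABLE IF EXISTS sphinx_index_feed_tmp")
    conn.execute(
        "CREATE TABLE sphinx_index_feed_tmp "
        "(addon_id INTEGER, locale TEXT, authors TEXT)"  # hypothetical schema
    )
    conn.executemany(
        "INSERT INTO sphinx_index_feed_tmp VALUES (?, ?, ?)", rows
    )

    # The swap itself is short: drop the old table and rename the new one
    # inside one transaction, so the live name never points at a half-built table.
    conn.execute("BEGIN")
    conn.execute("DROP TABLE IF EXISTS sphinx_index_feed")
    conn.execute("ALTER TABLE sphinx_index_feed_tmp RENAME TO sphinx_index_feed")
    conn.execute("COMMIT")
```

The point of the pattern is that a failed or interrupted rebuild leaves the previous sphinx_index_feed untouched instead of leaving it missing or half-filled.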

So here's what needs to be done:
* Get the new Py file and the new sphinx-dist.py file and update the sphinx cluster.
* Run: "alter table addons_users add key(listed);" to add a key to speed things up
* pip install lockfile, it's a requirement for the prime_sphinx_index.py
* Run index-sphinx.sh only on a single node - we need to watch the cron log to make sure it keeps running - if need be I can have it touch a file each time it runs
* on each other node just run indexer --all --rotate as we did before

The cron for indexer --all --rotate should run every 5 minutes (*/5).
For the index-sphinx.sh let's do 2,7,12,17,22,27,32,37,42,47,52,57 - that'll give the other nodes enough time to fetch a new version of the sphinx_index_feed.
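
The two schedules can be written out as crontab entries. The sketch below just generates them; the bare command names are taken from this comment, and the real entries would need whatever paths and arguments the sphinx cluster actually uses, which are not given here.

```python
def minute_field(start, step):
    """Comma-separated crontab minute field: start, start+step, ... below 60."""
    return ",".join(str(m) for m in range(start, 60, step))


# The primer runs 2 minutes after each indexer slot, so every indexer run
# starts 3 minutes after the most recent primer run began.
primer_minutes = minute_field(2, 5)

crontab = "\n".join([
    "# every node except the one running the primer",
    "*/5 * * * * indexer --all --rotate",
    "# one node only",
    "%s * * * * index-sphinx.sh" % primer_minutes,
])
print(crontab)
```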

Let me know if this will work, or how we can do this better.
(In reply to comment #16)

> If someone in IT wants to share a screen session to me, we can see what's going
> wrong.

This and other search bugs are blocking the fennec release.  Can we get together on IRC and figure this out asap?
(In reply to comment #18)
> * Run index-sphinx.sh only on a single node - we need to watch the cron
> log to make sure it keeps running - if need be I can have it touch a file each
> time it runs

I've set this up for the time being, but I think we should try to make it possible for any node (including the one running index-sphinx.sh) to go away.

Changes are live.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard