Closed Bug 488374 Opened 11 years ago Closed 11 years ago

Please reindex Sphinx on support-stage.mozilla.org

Categories

(mozilla.org Graveyard :: Server Operations, task, P1, blocker)

x86
All

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: stephend, Assigned: justdave)

Attachments

(2 files)

Hi Trevor,

Laura will fill in details later, but basically, per https://bugzilla.mozilla.org/show_bug.cgi?id=485386#c9, we will need Sphinx reindexing on support-stage.mozilla.org.

Thanks!
Blocks: 485386
Blocks: 485300
Severity: normal → critical
As per https://bugzilla.mozilla.org/show_bug.cgi?id=464851#c6:

1. Execute "php manual_index.php" in webroot/ from the shell command line.

2. Execute "php manual_index_forums.php" in webroot/ from the shell command line.

3. As the webserver user, execute: /usr/local/sphinx/bin/indexer --all --rotate
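A minimal sketch of the three steps above as one bash function. It assumes `php` is on the PATH, that the caller sets WEBROOT to the webroot/ directory (the thread never gives its absolute path), and that it runs as root so `su` works. WEBROOT, INDEXER, WEB_USER and the function name are hypothetical; the indexer default is the path from step 3, and apache is the webserver user seen later on mrapp-stage02.

```shell
#!/bin/bash
# Hypothetical wrapper around the three reindex steps.
#   WEBROOT   (required) path to the webroot/ directory; deliberately no default
#   INDEXER   Sphinx indexer binary; defaults to the path given in step 3
#   WEB_USER  account the web server runs as; defaults to apache
reindex_sumo() {
    if [ -z "${WEBROOT:-}" ]; then
        echo "reindex_sumo: set WEBROOT to the webroot/ directory" >&2
        return 2
    fi
    local indexer="${INDEXER:-/usr/local/sphinx/bin/indexer}"
    local web_user="${WEB_USER:-apache}"

    # Steps 1 and 2: regenerate the XML sources. The subshell keeps the
    # caller's working directory unchanged, and && stops at the first failure.
    ( cd "$WEBROOT" && php manual_index.php && php manual_index_forums.php ) || return 1

    # Step 3: rebuild every index as the web server user and ask the
    # running searchd to switch to the new files (--rotate).
    su - "$web_user" -s /bin/bash -c "$indexer --all --rotate"
}
```

For example, as root: WEBROOT=/path/to/webroot reindex_sumo (the path is a placeholder). Later in this bug the indexer is run from /usr/bin/indexer on mrapp-stage02, which is what the INDEXER override is for.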
Reassigning to server-ops so the bug is more visible.  It's best done by chizu or justdave, I would think.
Assignee: thardcastle → server-ops
Assignee: server-ops → thardcastle
The indexer --all --rotate gave this output:
indexing index 'documents'...
ERROR: index 'documents': source 'documents': XML parse error: no element found (line=2396, pos=19, docid=521).
total 0 docs, 0 bytes
total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
indexing index 'forums'...
ERROR: index 'forums': source 'forums': XML parse error: undefined entity (line=408351, pos=16120, docid=164096).
total 57558 docs, 65925094 bytes
total 6.506 sec, 10133582.06 bytes/sec, 8847.45 docs/sec
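Each failing index shows up as an "ERROR: index '...'" line, and a skipped rotation shows up as "indices NOT rotated" (as happens further down in this bug). A small sketch that scans captured indexer output for both, so a scripted run can flag problems without someone reading the log. The function names are hypothetical, and the sketch deliberately does not rely on the indexer's exit status, which this thread never shows.

```shell
#!/bin/bash
# Print the name of every index that reported an ERROR.
# Reads indexer output on stdin.
failed_indexes() {
    sed -n "s/^ERROR: index '\([^']*\)'.*/\1/p"
}

# Return 0 only if no index failed and the rotation was not skipped.
# Reads the complete indexer output on stdin; explains failures on stderr.
indexer_run_ok() {
    local log failed
    log=$(cat)
    failed=$(printf '%s\n' "$log" | failed_indexes)
    if [ -n "$failed" ]; then
        # One "index failed: NAME" line per failing index.
        printf '%s\n' "$failed" | sed 's/^/index failed: /' >&2
        return 1
    fi
    if printf '%s\n' "$log" | grep -q 'indices NOT rotated'; then
        echo "indexes were built but not rotated; is searchd running?" >&2
        return 1
    fi
    return 0
}
```

Usage, merging both output streams since it isn't clear from the bug which one the errors go to: su - apache -s /bin/bash -c "/usr/bin/indexer --all --rotate" 2>&1 | indexer_run_ok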
Steps 1 and 2 executed without error?

I'm currently investigating/trying to replicate.
(In reply to comment #3)
> The indexer --all --rotate gave this output:
> indexing index 'documents'...
> ERROR: index 'documents': source 'documents': XML parse error: no element found
> (line=2396, pos=19, docid=521).
> total 0 docs, 0 bytes
> total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
> indexing index 'forums'...
> ERROR: index 'forums': source 'forums': XML parse error: undefined entity
> (line=408351, pos=16120, docid=164096).
> total 57558 docs, 65925094 bytes
> total 6.506 sec, 10133582.06 bytes/sec, 8847.45 docs/sec

The error that occurred while indexing 'documents' could have been caused by something fixed in bug 485326. The undefined entity error shouldn't happen at all if /trunk/scripts/ on staging is up to date with what's in the repo (I don't know whether scripts is synced the way webroot is on staging).

Could we try this again?

1. Make sure scripts is up to date with the svn repo
2. Delete webroot/lastindexingtime-f.txt and webroot/lastindexingtime-f.txt
3. Follow steps in comment 1

Sorry for the troubles.
Forums error only now:

using config file '/etc/sphinx.conf'...
indexing index 'documents'...
collected 1386 docs, 8.1 MB
sorted 1.0 Mhits, 100.0% done
total 1386 docs, 8103206 bytes
total 0.835 sec, 9698805.10 bytes/sec, 1658.92 docs/sec
indexing index 'forums'...
ERROR: index 'forums': source 'forums': XML parse error: undefined entity (line=408351, pos=16120, docid=164096).
total 57557 docs, 65923305 bytes
total 6.719 sec, 9811585.19 bytes/sec, 8566.40 docs/sec
rotating indices: succesfully sent SIGHUP to searchd (pid=3423).
Can the output of "sed -n '408351 p' /usr/local/sphinx/index/data-forums.xml" (or head piped into tail -n 1 due to the size of the file) be attached or posted to the bug?
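Both approaches in the comment above work; a slightly cheaper variant tells sed to quit right after printing, so it doesn't read the rest of a file this size. Because the failing line is at least 16,120 characters long (pos=16120), printing a window around the reported position is often more useful than pasting the whole line. A sketch with hypothetical function names; it treats pos as an approximate column (cut counts bytes here), which is enough for eyeballing the offending entity.

```shell
#!/bin/bash
# Print line N of FILE, quitting as soon as it has been printed.
show_line() {
    sed -n "${1}{p;q;}" "$2"
}

# Print about WIDTH characters on either side of column POS on line N of FILE.
# Usage: show_context LINE POS FILE [WIDTH]
show_context() {
    local line=$1 pos=$2 file=$3 width=${4:-40}
    local start=$(( pos > width ? pos - width : 1 ))
    show_line "$line" "$file" | cut -c "${start}-$(( pos + width ))"
}
```

For the error in this bug that would be: show_context 408351 16120 /usr/local/sphinx/index/data-forums.xml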
Ping, chizu?
Assignee: thardcastle → server-ops
Severity: critical → blocker
I've paged chizu three or four times now, with no response yet (it's daytime on a weekday; he may be in class or something), so I'm attempting to poke...

Based on the referenced bugs, I'm assuming this is on mrapp-stage02.
Assignee: server-ops → justdave
Great! Thanks.

Knocking this back down. We'll need this run again shortly, but for the immediate future it doesn't need to be marked as a blocker.
Severity: blocker → major
Blocks: 488429
(In reply to comment #11)
> Great! Thanks.
> 
> Knocking this back down. We'll need this run again shortly, but for the
> immediate future it doesn't need to be marked as a blocker.

The way the indexers work has been updated, and it needs to be tested for a freeze tomorrow.

Can we get this run again (see comment 5)? After this, we should be set.
Severity: major → critical
Severity: critical → blocker
Sorry for the delay, FF release kept me busy, and I also slept a lot this afternoon.  Working on this now...
Attached file transcript
Comment 1 step 1 crashed (hard).  Transcript attached.
Looks like we poked a bug in libxml2.  Dave, what version of PHP and libxml2 do we have on there?  Eric, what are you running locally?
Yeah, I spent the last half-hour digging through the internets. It does look like a libxml2 problem. <http://bugs.php.net/bug.php?id=42858>

On the dev instance I tested this on, I'm at PHP 5.2.8 and libxml2 2.7.2.

According to tiki, stage is at PHP 5.1.6 and libxml2 2.6.26    

I'm willing to bet updating libxml2 will solve this. If not, I can use an alternative method in the indexer that doesn't use DOM or XPath.
> According to tiki, stage is at PHP 5.1.6 and libxml2 2.6.26    
> 
> I'm willing to bet updating libxml2 will solve this. If not, I can use an
> alternative method in the indexer that doesn't use DOM or XPath.

I just updated php (to 23.2.el5_3) but it's still at 5.1.6 and there's no RHEL update to libxml2.

If that's the only fix, it looks like it'll take a custom build to get there, since RHEL isn't up to date enough.
If that's the case, I'll work up an alternative solution for the indexer so it avoids using libxml2.
Whiteboard: looking for libxml2 alternative
Eric, I think that's the wrong approach.  We'll need libxml2 for more and more, and the indexer is working with the up-to-date version.  That's a pretty old version in RHEL; I can't even download one that old.  (2.6.30 is from August 2007.)  I also don't think rewriting the (working) indexer at this stage is a good idea.  It will set us back in QA terms.

Matthew, how long will it take to make a custom build?
I don't know, I'll let justdave comment.
Whiteboard: looking for libxml2 alternative
It's been about 4 hours, can we get an update?
Reassigning to server-ops so someone will get paged.
Assignee: justdave → server-ops
Priority: -- → P1
I'll see if I can find something now...  is this going to go in a production environment at some point or will this only be on the staging box?

Even in production, I'll feel much better about a custom build if it's only on one or two boxes and not the whole web cluster, for example.
Assignee: server-ops → justdave
Updated: libxml2.i386 0:2.7.2-1.rhel5 libxml2-python.i386 0:2.7.2-1.rhel5
Complete!

re-attempting the re-index now.
Woot!  Thanks, Dave!
New version of libxml2 did not fix the crash.

http://pastebin.mozilla.org/644116 (will only be there for 24 hours)
Ugh. I'll continue to investigate a workaround.

At the risk of sounding dumb (since I don't know the infrastructure behind staging), phpinfo() through tiki on staging says that php is still using 2.6.26.

System 	Linux mrapp-stage02 2.6.18-128.1.1.el5PAE #1 SMP Mon Jan 26 14:18:23 EST 2009 i686 

libxml
libXML support 	active
libXML Version 	2.6.26
libXML streams 	enabled
It's probably loaded in memory in the apache process since you're using mod_php.  Running it from the command line I would assume would load it from disk when I run it...
Dave, did you reconfigure php --with-libxml-dir (to wherever you installed it)?  If it's not the distro default, this may be needed.
P.S. You can check which version is being invoked on the command line by reading the output you get from
php -i
This will do phpinfo() for the version that's being invoked.
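Following that suggestion, a sketch that pulls the libxml version out of `php -i` and compares it against a threshold. The parser matches the "libXML Version => ..." line format shown later in this bug; other PHP builds may label that line differently. The 2.7.2 in the usage line is only the version on the working dev instance, not an established minimum, and the function names are hypothetical. This inspects the command-line php only, which is what the indexer scripts run under.

```shell
#!/bin/bash
# Read `php -i` output on stdin and print the libxml version it reports.
libxml_version() {
    sed -n 's/^libXML Version => //p'
}

# Succeed if dotted version $1 is >= dotted version $2, comparing numerically
# component by component (missing components count as 0).
version_at_least() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            if ((x[i] + 0) > (y[i] + 0)) exit 0
            if ((x[i] + 0) < (y[i] + 0)) exit 1
        }
        exit 0
    }'
}
```

Usage: v=$(php -i | libxml_version); version_at_least "$v" 2.7.2 || echo "php CLI is using libxml $v"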
Reassigning to server-ops so this gets noticed again today.
Assignee: justdave → server-ops
Assignee: server-ops → justdave
nice, so libxml2 seems to be statically linked into php... :(

Recompiled php with libxml2 2.7.2 present on the build machine.

Updated: php.i386 0:5.1.6-23.2.rhel5+libxml2 php-cli.i386 0:5.1.6-23.2.rhel5+libxml2 php-common.i386 0:5.1.6-23.2.rhel5+libxml2 php-devel.i386 0:5.1.6-23.2.rhel5+libxml2 php-gd.i386 0:5.1.6-23.2.rhel5+libxml2 php-ldap.i386 0:5.1.6-23.2.rhel5+libxml2 php-mbstring.i386 0:5.1.6-23.2.rhel5+libxml2 php-mysql.i386 0:5.1.6-23.2.rhel5+libxml2 php-pdo.i386 0:5.1.6-23.2.rhel5+libxml2 php-pgsql.i386 0:5.1.6-23.2.rhel5+libxml2 php-xml.i386 0:5.1.6-23.2.rhel5+libxml2
Complete!

[root@mrapp-stage02 ~]# php -i | grep -i libxml
...
libXML Version => 2.7.2

the re-index script is running again now.
P.S. I'm doing this now on the staging box because it's blocking QA work.  Putting a custom build of PHP on the production servers does NOT excite me.
OK, so the next step is to try and reproduce this on khan, so we can try and code around the bug. I think that's probably less work than rewriting the whole indexer. Eric, what do you think? (And are you set up to work on khan?)
Also, reassigning to Eric until we work out what to do next.  Will reassign to IT when we have a proposed solution.
Assignee: justdave → smirkingsisyphus
(In reply to comment #35)
> OK, so next step is to try and reproduce this on khan, so we can try and code
> around the bug.  I think that's probably less work than rewriting the whole
> indexer.  Eric, what do you think?  (And are you set up to work on khan?)

I'm currently testing a limited alternative to clear up the bugs blocking search. If it's able to close those to an extent, I'd like to move on and make reworking the indexer something done in tandem with my other Q2 goals. Undoubtedly, there will be things we want to improve with the search anyway.

AFAIK, I'm not set up for khan. If I am, no one has told me yet. ;)
Yeah, we shouldn't do custom builds of php... that's just not cool.
OK, the libxml2 stuff has all been ripped out of the indexer (as per https://bugzilla.mozilla.org/show_bug.cgi?id=488429#c10).

Time to try again.
Assignee: smirkingsisyphus → server-ops
Assignee: server-ops → justdave
running now...
[root@mrapp-stage02 webroot]# php manual_index.php 
Data retrieved from the db
1386 wiki pages successfully parsed in 703.683 s
[root@mrapp-stage02 webroot]# php manual_index_forums.php 
Data retrieved from the db
111656 topics successfully parsed in 1038.722 s
[root@mrapp-stage02 webroot]# su - apache -s /bin/bash -c "/usr/bin/indexer --all --rotate"
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/etc/sphinx.conf'...
indexing index 'documents'...
collected 1386 docs, 8.1 MB
sorted 1.0 Mhits, 100.0% done
total 1386 docs, 8122729 bytes
total 0.862 sec, 9425065.63 bytes/sec, 1608.22 docs/sec
indexing index 'forums'...
collected 111656 docs, 129.8 MB
collected 0 attr values
sorted 0.1 Mvalues, 100.0% done
sorted 21.2 Mhits, 100.0% done
total 111656 docs, 129765432 bytes
total 19.687 sec, 6591301.55 bytes/sec, 5671.45 docs/sec
WARNING: no process found by PID 3423.
WARNING: indices NOT rotated.
Hmm... it seems searchd has gone away, which explains why we can't get search results on staging right now. The paths might need to be adjusted, since I don't know the setup on staging. This should get us going, though.

/usr/bin/searchd --stop
/usr/bin/searchd
/usr/bin/indexer --all --rotate
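The rotation above was skipped because the searchd PID the indexer had ("no process found by PID 3423") no longer existed; as far as the Sphinx docs describe it, --rotate signals the PID recorded in the pid_file from /etc/sphinx.conf. A sketch that checks that PID first and only starts searchd when it is missing, then rebuilds and rotates. It assumes bash and root privileges (for `su`, and for `kill -0` on a process owned by apache); the pid file path is passed in because it isn't shown in this bug, and the function names and SEARCHD/INDEXER overrides are hypothetical. Binary defaults follow the /usr/bin paths used above.

```shell
#!/bin/bash
# Succeed if PIDFILE is readable and names a live process.
searchd_running() {
    local pid
    [ -r "$1" ] || return 1
    pid=$(head -n 1 "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Hypothetical helper: make sure searchd is up, then rebuild and rotate.
# Usage: rotate_with_searchd PIDFILE
rotate_with_searchd() {
    local searchd="${SEARCHD:-/usr/bin/searchd}"
    local indexer="${INDEXER:-/usr/bin/indexer}"
    if ! searchd_running "$1"; then
        # Missing or stale PID: start a fresh searchd as the web server user.
        su - apache -s /bin/bash -c "$searchd" || return 1
    fi
    su - apache -s /bin/bash -c "$indexer --all --rotate"
}
```

Unlike the stop/start sequence above, this leaves a healthy searchd alone and only starts one when the PID check fails.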
The sphinx RPM doesn't appear to install an initscript, so the searchd process probably never came back after the last reboot.

[root@mrapp-stage02 ~]# searchd --stop
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/etc/sphinx.conf'...
stop: succesfully sent SIGTERM to pid 3193
[root@mrapp-stage02 ~]# su - apache -s /bin/bash -c "/usr/bin/searchd"
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/etc/sphinx.conf'...
creating server socket on 0.0.0.0:3312
[root@mrapp-stage02 ~]# su - apache -s /bin/bash -c "/usr/bin/indexer --all --rotate"
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/etc/sphinx.conf'...
indexing index 'documents'...
collected 1386 docs, 8.1 MB
sorted 1.0 Mhits, 100.0% done
total 1386 docs, 8122729 bytes
total 0.834 sec, 9738806.09 bytes/sec, 1661.75 docs/sec
indexing index 'forums'...
collected 111656 docs, 129.8 MB
collected 0 attr values
sorted 0.1 Mvalues, 100.0% done
sorted 21.2 Mhits, 100.0% done
total 111656 docs, 129765432 bytes
total 19.725 sec, 6578783.07 bytes/sec, 5660.68 docs/sec
rotating indices: succesfully sent SIGHUP to searchd (pid=3902).
Everything looks good.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Product: mozilla.org → mozilla.org Graveyard