Bug 485329
Opened 15 years ago
Closed 15 years ago
Sphinx indexing fails
Categories
(support.mozilla.org :: General, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
1.2.0.1
People
(Reporter: djst, Assigned: chizu)
Details
(Whiteboard: See comment 51 for description of current bug and steps sumo_only)
Attachments
(3 files, 1 obsolete file)
775 bytes, patch | laura: review+
719 bytes, patch | laura: review+
4.80 KB, patch | laura: review-
In the specified URL http://support-stage.mozilla.org/tiki-newsearch.php?locale=en-US&q=cookies&sa=, the item "Websites say cookies are blocked" appears twice.
Comment 1•15 years ago
Indeed this happens on my local machine too. Will investigate...
Comment 2•15 years ago
Those are two separate articles. When we changed the style guide to tell contributors that "website" is one word instead of two ("web site"), there were a few articles that needed to be renamed, with the old URLs made to redirect. See bug 469029.
Comment 3•15 years ago
Sounds like the search is right then. Should this be marked invalid? Or should I just include a special case for "web site"?
Reporter | ||
Comment 4•15 years ago
Hm, good question. Do we really want dupes like that to appear in the search results? I can think of cases where we'd want that (such as when two distinct symptoms have the same solution), but in this case it's just a corrected title, and those should probably not show up.
Summary: Search results appearing more than once → Redirect article results appearing in search results
Target Milestone: 1.0 → Future
Comment 5•15 years ago
I'll need to look into the bug Chris mentioned for more info. Chris, got any ideas on how to determine if two articles fall under the scope of bug 469029? Then newsearch would just be able to show only one of them.
Comment 6•15 years ago
If we're going to exempt articles from appearing in search results, I'd rather it be done using an admin panel, instead of hard-coding it.
Reporter | ||
Updated•15 years ago
Severity: normal → enhancement
Summary: Redirect article results appearing in search results → Ability to remove certain articles from appearing in search results
Comment 7•15 years ago
Alrighty then. I'll look into ways of doing it. I know there's a file called admin_newsearch... should it just go in there? Should we spec out an admin panel for newsearch?
Comment 8•15 years ago
I think Nelson would have a better idea of where to put it. CCing him.
Updated•15 years ago
Assignee: paul.craciunoiu → bmo2008
Comment 9•15 years ago
Renaming bug for 1.0.2. I'll file a separate one for excluding specific articles.
Summary: Ability to remove certain articles from appearing in search results → Remove "redirect" articles that appear as dupes in search results
Comment 10•15 years ago
In that case this sounds like a dupe of bug 489046. Maybe you should keep it?
Comment 11•15 years ago
Here's a list of redirects that we should be able to remove:
Passwords
Importing from Netscape
Importing from Safari
Importing from Internet Explorer
Importing from Opera
Importing from SeaMonkey
Importing from Mozilla Application Suite
Importing
Autoconnect
Master Password
Windows Media Player - Mac-Linux
Windows Media Player - Windows Vista-XP
Web sites look wrong
Web site colors are wrong
Customizing your Firefox with add-ons
Flash
Shockwave
Java
Adobe Reader
QuickTime
RealPlayer
Windows Media Player
CPU usage
Clear Location bar history
Clear Search bar history
Exporting data to Opera
Bookmarks, Toolbar buttons, and History not working
How to disable the Smart Location Bar
Firefox prints pages incorrectly
Web sites say cookies are blocked
Firefox cannot load web sites but other programs can
Web sites or add-ons incorrectly report incompatible browser
No programs can load web sites
Firefox never finishes loading certain web sites
Using Firefox
Software Update Failed - One or more files could not be updated
Comment 12•15 years ago
Here's an updated list, with the articles they redirect to:
Windows Media Player - Information --> Using the Windows Media Player plugin with Firefox
Passwords --> Remembering passwords
Importing from Netscape --> Importing bookmarks and other data from Netscape
Importing from Safari --> Importing bookmarks and other data from Safari
Importing from Internet Explorer --> Importing bookmarks and other data from Internet Explorer
Importing from Opera --> Importing bookmarks and other data from Opera
Importing from SeaMonkey --> Importing bookmarks and other data from SeaMonkey
Importing from Mozilla Application Suite --> Importing bookmarks and other data from Mozilla Application Suite
Importing --> Importing bookmarks and other data from other browsers
Autoconnect --> How to make Firefox automatically dial up
Master Password --> Protecting stored passwords using a master password
Windows Media Player - Mac-Linux --> Using the Windows Media Player plugin with Firefox
Windows Media Player - Windows Vista-XP --> Using the Windows Media Player plugin with Firefox
Web sites look wrong --> Websites look wrong
Web site colors are wrong --> Website colors are wrong
Customizing your Firefox with add-ons --> Customizing Firefox with add-ons
Flash --> Using the Flash plugin with Firefox
Shockwave --> Using the Shockwave plugin with Firefox
Java --> Using the Java plugin with Firefox
Adobe Reader --> Using the Adobe Reader plugin with Firefox
QuickTime --> Using the QuickTime plugin with Firefox
RealPlayer --> Using the RealPlayer plugin with Firefox
Windows Media Player --> Using the Windows Media Player plugin with Firefox
CPU usage --> Firefox consumes a lot of CPU resources
Change the e-mail program used by Firefox --> Changing the e-mail program used by Firefox
Clear Location bar history --> Clearing Location bar history
Clear Search bar history --> Clearing Search bar history
Exporting data to Opera --> Exporting bookmarks to Opera
Bookmarks, Toolbar buttons, and History not working --> Bookmarks and toolbar buttons not working after upgrading
How to disable the Smart Location Bar --> Hiding bookmarks in the Smart Location Bar
Firefox prints pages incorrectly --> Firefox prints pages in a different layout
Web sites say cookies are blocked --> Websites say cookies are blocked
Firefox cannot load web sites but other programs can --> Firefox cannot load websites but other programs can
Web sites or add-ons incorrectly report incompatible browser --> Websites or add-ons incorrectly report incompatible browser
No programs can load web sites --> No programs can load websites
Firefox never finishes loading certain web sites --> Firefox never finishes loading certain websites
Using Firefox --> Browsing basics
Software Update Failed - One or more files could not be updated --> Software Update Failed

Redirects to keep:
Removing Internet Explorer --> Uninstalling Internet Explorer
Upgrading Firefox --> Updating Firefox
Hiding bookmarks in the Smart Location Bar --> Cannot clear Location bar history
Comment 13•15 years ago
If anyone thinks there should be an article removed or added to the list, let him/her speak now or forever hold their peace. Or speak before this gets pushed to production. :-)
Comment 14•15 years ago
How many of the total aren't listed here? I'm thinking maybe it's easier to do it the other way, exclude some, if there are fewer.
Comment 15•15 years ago
Nvm, I read the second part of comment 12 just now.
Comment 16•15 years ago
Do you mean exclude all redirecting articles by default, then create a whitelist of articles to include?
Comment 17•15 years ago
Well, I was confusing this with my other bug, I see this is assigned to you. How do you plan to do it? I was going to change the indexer to completely ignore articles where {REDIRECT} shows up in $document['data']. It could check against a white/blacklist (we could put it in an admin panel even) and decide that way.
Comment 18•15 years ago
Right now, the plan is to remove those articles.
Comment 19•15 years ago
Does this need to be done on staging by the end of the day?
Comment 20•15 years ago
Okay, I've removed the "Web sites say cookies are blocked" article on staging; so after the next re-indexing, you should only see "Websites say cookies are blocked" and not "Web sites say cookies are blocked" from comment 0.
Comment 21•15 years ago
(In reply to comment #13)
> If anyone thinks there should be an article removed or added to the list, let
> him/her speak now or forever hold their peace. Or speak before this gets pushed
> to production. :-)

I think removing all of those redirecting articles is a terrible idea. Many of those redirects are for articles that were renamed long after they were originally created. The "Software Update Failed - One or more files could not be updated" --> "Software Update Failed" redirect, for example, was created just last month, for an article in existence since Oct, 2007. See http://support.mozilla.com/tiki-view_forum_thread.php?comments_parentId=310169&forumId=3

Removing the redirects will break the links that currently exist on other forums, help pages, newsgroup posts, etc. There's a discussion on that going on here: http://support.mozilla.com/tiki-view_forum_thread.php?comments_parentId=333166&forumId=3
Comment 22•15 years ago
I agree with not removing the redirects. Paul proposed a great solution in bug 489046 comment 5, in which we change the indexer to not index articles that start with "{REDIRECT". This would remove those articles from search results and keep old URLs working.
Reporter | ||
Comment 23•15 years ago
Paul, how hard would that solution be to implement, and how long would it take before this was actually live on prod? 1.1? We would have to deal with duplicate search results listings in the meantime if we went for the proposed solution in 489046 comment 5.
Comment 24•15 years ago
Chris: It would be really easy to implement. I think I can land a patch tonight if need be. I would simply not index articles that start with "{REDIRECT", and then you could edit those that you want to show up, and they would automatically show up after the next reindexing. In terms of how "sensitive" the check should be, I can ignore spaces before the "{REDIRECT", or not. I was thinking to ignore leading spaces, but if I do not, then an article starting with " {REDIRECT", for example, would still be indexed. Let me know what you think.
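The check discussed in this comment can be sketched roughly as below. This is only an illustration in Python, not the attached patch (the real indexer is sphinx/indexer.php). The names `is_redirect`, `articles_to_index` and the `ignore_leading_whitespace` flag are hypothetical; only the "{REDIRECT" marker and the 'data' field come from this bug.

```python
REDIRECT_MARKER = "{REDIRECT"  # marker text quoted in this bug


def is_redirect(data, ignore_leading_whitespace=True):
    """Return True if an article body starts with the redirect marker.

    With ignore_leading_whitespace=False, a body such as " {REDIRECT ..."
    is not treated as a redirect and would still be indexed.
    """
    if data is None:
        return False
    text = data.lstrip() if ignore_leading_whitespace else data
    return text.startswith(REDIRECT_MARKER)


def articles_to_index(articles):
    """Keep only the articles whose 'data' field is not a redirect."""
    return [a for a in articles if not is_redirect(a.get("data"))]
```

Under this sketch, editing a redirect article so that it no longer begins with the marker would bring it back into the index on the next run, which matches the workflow described in the comment.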
Comment 25•15 years ago
This is a 3-line patch that removes articles starting with "{REDIRECT" from search indexing.
Assignee: bmo2008 → paul.craciunoiu
Attachment #377249 -
Flags: review?(laura)
Comment 26•15 years ago
Oops, forgot to exclude those articles that have text before {REDIRECT.
Attachment #377249 -
Attachment is obsolete: true
Attachment #377253 -
Flags: review?(laura)
Attachment #377249 -
Flags: review?(laura)
Updated•15 years ago
Attachment #377253 -
Flags: review?(laura) → review+
Comment 27•15 years ago
r25729 / r25730
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: Future → 1.1
https://support-stage.mozilla.org/tiki-newsearch.php?locale=en&q=firefox+consumes+a+lot+of&where=all&sa=&filter_lang=1&l=en&en_too=1&lastmodif=0&type=0&author= lists "CPU usage" as its first article, which redirects to "Firefox consumes a lot of CPU resources" (https://support-stage.mozilla.org/en-US/kb/Firefox+consumes+a+lot+of+CPU+resources). Does this just need a new run of the indexer? IIRC, it runs at 4am or something, daily.
Looks like reindexing, which happened this morning, fixed this.
Status: RESOLVED → VERIFIED
Comment 30•15 years ago
https://support.mozilla.com/tiki-newsearch.php?locale=en&q=firefox+consumes+a+lot+of&where=all&sa=&filter_lang=1&l=en&en_too=1&lastmodif=0&type=0&author still lists "CPU usage".
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Comment 31•15 years ago
This doesn't occur locally, i.e. the article exists but does not get listed in search results for the query you gave. Might be some issue with production. Any ideas, Laura, Eric?
Comment 32•15 years ago
Is bug 498119 the reason why this doesn't work?
Comment 33•15 years ago
(In reply to comment #32)
> Is bug 498119 the reason why this doesn't work?

Doesn't seem to be, but there is something going on here... Eric, any ideas?
Comment 34•15 years ago
FTR, this doesn't happen on staging: https://support-stage.mozilla.org/tiki-newsearch.php?locale=en&q=firefox+consumes+a+lot+of&where=all&sa=&filter_lang=1&l=en&en_too=1&lastmodif=0&type=0&author
Comment 35•15 years ago
(In reply to comment #32)
> Is bug 498119 the reason why this doesn't work?

Depending on how long the prod-only problem at <https://bugzilla.mozilla.org/show_bug.cgi?id=498119#c5> has been around, this is potentially the problem.
Comment 36•15 years ago
This doesn't occur on stage anymore. Is it possible that sphinx/indexer.php wasn't updated on prod, only on stage? E.g. if indexer.php is being run from somewhere else. (see r25730) Otherwise the code is the same... Reed, can you check please please? :)
Updated•15 years ago
Target Milestone: 1.1 → 1.3
Comment 37•15 years ago
Can we get a little help from IT here? Just trying to understand why staging works properly and production does not. I suspect an indexer issue.
Assignee: paul.craciunoiu → server-ops
Severity: enhancement → normal
Comment 39•15 years ago
Thanks mrz. fox2mike, check out comment 36. Start by looking at the differences between the two indexer.php's and we can go from there. I'm on IRC as paulc if you want to speed this up there.
Comment 40•15 years ago
(In reply to comment #39)
> Thanks mrz. fox2mike, check out comment 36. Start by looking at the differences
> between the two indexer.php's and we can go from there. I'm on IRC as paulc if
> you want to speed this up there.

Well, doesn't seem to be an issue with indexer.php, as it's the exact same file on stage and prod:

Stage:
[root@mrapp-stage02 sphinx]# md5sum indexer.php
7b83dd8fff32322abbb31358bbb3b3f8 indexer.php

Prod:
[root@mradm02 sphinx]# md5sum indexer.php
7b83dd8fff32322abbb31358bbb3b3f8 indexer.php
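For anyone repeating this comparison without shell access to both hosts, the same checksum test can be scripted once the two copies have been fetched locally. A minimal Python sketch follows; the helper names `md5_of_file` and `same_contents` are made up for illustration, and the file paths passed in are whatever local copies you have.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files hash to the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Matching digests only show that the two files are byte-identical. They say nothing about whether the script is actually being run on a given host, which is where this bug went next.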
Comment 41•15 years ago
Okay, Paul and I took a look at this...and right now it looks like there might not be a cron job on prod for sphinx. I'll get in touch with Trevor and Paul again on Monday and we'll sort this out.
Comment 42•15 years ago
There is a cron... it's on pm-app-memcache01. However, netapp isn't mounted there, so the scripts dir is missing, which is causing issues with the script. Quick fix is to re-mount netapp... long term fix is to move it to its own VM (actually, pm-app01 might be a good place for it).
Comment 43•15 years ago
pm-app-memcache doesn't even sound like the right place to run it. Is it?
Comment 44•15 years ago
(In reply to comment #43)
> pm-app-memcache doesn't even sound like the right place to run it. Is it?

pm-app-memcache01 used to be mrapp01, which was used as a general "random things go here" webhead.
Comment 46•15 years ago
The indexing needs to happen ASAP since people can't find the 3.5 support articles.
Severity: normal → blocker
Comment 47•15 years ago
FWIW: manual indexing should happen /now/ (especially since the slow loading issue has hit press) but resolving this longer term can be done next week.
Assignee: shyam → server-ops
Whiteboard: Please reindex per bug 503526 ASAP
Assignee | ||
Comment 48•15 years ago
Reindexing is fixed for now. Will be moving it to its own VM soon.
Severity: blocker → major
Comment 49•15 years ago
This is an issue I isolated locally that will cause indexer.php to fail on certain articles, causing a malformed data.xml. Not confirmed if this is the same issue happening on prod or stage. Full error: http://ecooper.pastebin.mozilla.org/662026
Attachment #387979 -
Flags: review?(laura)
Updated•15 years ago
Attachment #387979 -
Flags: review?(laura) → review+
Comment 50•15 years ago
(In reply to comment #49)
> Created an attachment (id=387979) [details]
> Fix conflict between rss and screencasts
>
> This is an issue I isolated locally that will cause indexer.php to fail on
> certain articles, causing a malformed data.xml. Not confirmed if this is the
> same issue happening on prod or stage.
>
> Full error: http://ecooper.pastebin.mozilla.org/662026

r29609 in trunk.
Comment 51•15 years ago
So a basic summary of what happened today:

As per comment #48, chizu reindexed. This actually failed. From IRC:

chizu: laura: I ran indexing a couple times, both failed with a "no element found" error at the end of the file.
laura: ever seen this before on stage etc?
chizu: Yes, several times on production and stage. But in the past it's either gone away with subsequent reindexing.
chizu: or code fixes that are already done.
chizu: Looks like the final </sphinx:docset> just gets cut off sometimes.

We tried full manual index, also failed, same problem.

Eric reproed locally and got:

( ! ) Fatal error: Cannot redeclare class HTTP_Request in /home/ecooper/projects/sumo/local/trunk/webroot/lib/pear/HTTP/Request.php on line 108

This appears to be a regression we introduced when we added screencasts. It appears fixed by r29609.

Actions for today:
- Chizu to restore backups of out of date indexes

Actions to be done ASAP:
- IT to run indexing on stage to confirm that the fix in comment #50 fixes the problem
- If yes, we will patch the prod tag and reindex on prod
- IT to then re-set up nightly cron job for indexing.
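One way to catch the truncated output described in the IRC excerpt before it reaches the Sphinx indexer is to check that the generated XML is well-formed. The sketch below is an assumption about how such a guard could look, written in Python rather than in the PHP of the real indexer. The function name `is_well_formed` is hypothetical, and the sample document in the usage note is invented, not real data.xml content.

```python
import xml.parsers.expat


def is_well_formed(xml_bytes):
    """Return True if xml_bytes parses as a complete XML document.

    The parser is created without namespace processing, so the
    "sphinx:" prefix does not need a namespace declaration.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True
```

For example, a docset whose closing `</sphinx:docset>` has been cut off fails this check, so a wrapper script could refuse to replace the existing indexes and keep serving the old ones instead.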
Severity: major → critical
Comment 52•15 years ago
Scope of bug has completely shifted. Re-summarizing.
Summary: Remove "redirect" articles that appear as dupes in search results → Sphinx indexing fails
Whiteboard: Please reindex per bug 503526 ASAP → See comment 51 for description of current bug and steps
Comment 53•15 years ago
re comment #51, chizu is afk and search is broken on prod - we need to get those backed up indexes in place ASAP. Upping to blocker and reassigning to server-ops so this can happen faster.
Assignee: thardcastle → server-ops
Severity: critical → blocker
Assignee | ||
Comment 55•15 years ago
searchd died on the most recent rerun because of a permission issue. Fixed this and search is working on the old indexes again. New indexes continue to fail as in comment #51.
Severity: blocker → major
Comment 56•15 years ago
Um... did anyone verify this because it's not returning any results at all.
Comment 57•15 years ago
Yeah, it's completely broken again. (If it was working Friday, we should make sure that a cron job didn't run and break it.)
Assignee: thardcastle → nobody
Severity: major → blocker
Assignee | ||
Comment 58•15 years ago
Fixed again, reverted the cron job all the way to the version we had before Friday.
Assignee: server-ops → thardcastle
Severity: blocker → major
Comment 59•15 years ago
(In reply to comment #51)
> - IT to run indexing on stage to confirm that the fix in comment #50 fixes the
> problem

This did not solve the problem on stage - think it may have been a local path issue, as we did not see the same error on stage.

In summary, we cannot reproduce locally the problem we are seeing on stage and prod. The code has not changed and used to work (since we have indexes). Is it possible the indexing job is bumping up against Apache or OS resource limits? We set it to not time out in the script but this may be getting overridden elsewhere.
Comment 60•15 years ago
I should add that we need to try and diagnose this today.
Assignee: thardcastle → server-ops
Severity: major → critical
Comment 61•15 years ago
(In reply to comment #59)
> (In reply to comment #51)
> > - IT to run indexing on stage to confirm that the fix in comment #50 fixes the
> > problem
>
> This did not solve the problem on stage - think it may have been a local path
> issue, as we did not see the same error on stage.
>
> In summary, we cannot reproduce locally the problem we are seeing on stage and
> prod. The code has not changed and used to work (since we have indexes). Is
> it possible the indexing job is bumping up against Apache or OS resource
> limits? We set it to not time out in the script but this may be getting
> overridden elsewhere.

I'm currently downloading yesterday's db dump from sumotools to a local dev to see if maybe some new wiki data could be responsible for this.
Comment 62•15 years ago
For some reason, empty articles have begun appearing on sumo (which shouldn't happen). This caused the indexer to die during the mysql queries that flesh out the 'Did you mean' table. Yesterday's sumo dump reveals 8 articles with null data. All of them are non-en locales, or so it seems.
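The idea of skipping empty articles can be sketched as below. This is a Python illustration under assumed names (`has_content`, `split_skipping_empty`, and rows shaped as dicts with 'id' and 'data' keys), not the contents of the attached patch. Python strings are Unicode-aware, so the multibyte concern that applies to PHP string functions does not come up in this sketch.

```python
def has_content(data):
    """True if an article body is neither NULL (None) nor blank."""
    return data is not None and data.strip() != ""


def split_skipping_empty(rows):
    """Split rows into (indexable rows, ids of skipped empty rows)."""
    indexable, skipped_ids = [], []
    for row in rows:
        if has_content(row.get("data")):
            indexable.append(row)
        else:
            # Record the id so empty articles can be reported and cleaned up
            skipped_ids.append(row.get("id"))
    return indexable, skipped_ids
```

Returning the skipped ids alongside the indexable rows keeps the indexer running while still leaving a trail for whoever follows up on the empty articles.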
Attachment #388290 -
Flags: review?(laura)
Comment 63•15 years ago
bug 503935 has been filed to deal with the NULLs.
Comment 64•15 years ago
Comment on attachment 388290
Skip empty articles and words

Need to use mb safe string functions.
Attachment #388290 -
Flags: review?(laura) → review-
Comment 65•15 years ago
(In reply to comment #64)
> (From update of attachment 388290 [details] [diff] [review])
> Need to use mb safe string functions.

An updated patch is in trunk at r29686 after a brief chat with Laura in #sumodev.
Comment 66•15 years ago
Both patches are in prod at r29729.
Comment 67•15 years ago
Committed patches to new tag https://svn.mozilla.org/projects/sumo/tags/1.2/1.2.0.2_20090714/ in r29741. Filed bug 504148 to push it out.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.2.0.1
Comment 68•15 years ago
Verifying as fixed.

<https://support.mozilla.com/tiki-newsearch.php?locale=en&q=CPU+usage&where=all&sa=&filter_lang=1&l=en&en_too=1&lastmodif=0&type=0&author=> does not list "CPU usage" (which is good)

<https://support.mozilla.com/tiki-newsearch.php?locale=en&q=cookies&where=all&sa=&filter_lang=1&l=en&en_too=1&lastmodif=0&type=0&author=> does list "Websites say cookies are blocked" and not "Web sites say cookies are blocked". (a la comment 0)

<https://support.mozilla.com/tiki-newsearch.php?locale=en&q=Smart+Location+Bar&where=all&sa=&filter_lang=1&l=en&en_too=1&lastmodif=0&type=0&author=> does list "Hiding bookmarks in the Smart Location Bar", which is a redirect article with a summary.

<https://support.mozilla.com/tiki-newsearch.php?locale=en&q=tab&where=all&sa=&filter_lang=1&l=en&en_too=1&lastmodif=0&type=0&author=> lists "Closing the only tab closes the window", which is an article created very recently.

<https://support.mozilla.com/tiki-newsearch.php?locale=en&q=3.5&where=all&l=en&filter_lang=1&author=&filter_author=0&en_too=1&type=0&answered=0&lastmodif=0&offset=10> lists forum threads from July.

Thanks to everyone for helping get all this sorted out and fixed.
Status: RESOLVED → VERIFIED
Reporter | ||
Updated•15 years ago
Whiteboard: See comment 51 for description of current bug and steps → See comment 51 for description of current bug and steps sumo_only