Closed Bug 369976 Opened 17 years ago Closed 15 years ago

Nutch: pages that are not reachable from the front page are not indexed

Categories

(developer.mozilla.org Graveyard :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: Gijs, Unassigned)

Details

STR:
1. Open http://developer.mozilla.org/en/docs/
2. Search for "Linux Compatibility Reference" (without quotes)

Actual results:
3. Wonder why Nutch doesn't find what you're asking it for.

Expected results:
3. Redirect to, or at least a search result for, http://developer.mozilla.org/en/docs/Linux_Compatibility_Reference


Workaround:

Use http://www.google.com/ and search using site:developer.mozilla.org Linux Compatibility Reference. At least that works.

Marking as major because not finding what is clearly *there* is rather ridiculous and can waste a lot of time for devs trying to find documentation.
I suppose this bug has to do with the fact that this page was only linked from [[Reference Build Configurations]], which in turn was not linked from any other page. I have put both pages into [[Category:Build Documentation]], but it would be interesting to know how Nutch crawls the site.
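To illustrate the suspected cause, here is a toy sketch of a crawler that only follows links outward from a single root page. It is not Nutch's actual implementation, and the link graph is hypothetical: only Reference_Build_Configurations and Linux_Compatibility_Reference come from this bug, the other page names are invented.

```python
from collections import deque

# Hypothetical link graph, for illustration only. Only the last two page
# names appear in this bug; the others are invented.
LINKS = {
    "Main_Page": ["DOM", "JavaScript"],
    "DOM": ["Main_Page"],
    "JavaScript": [],
    "Reference_Build_Configurations": ["Linux_Compatibility_Reference"],
    "Linux_Compatibility_Reference": [],
}


def reachable(links, root):
    """Return the set of pages a link-following crawl finds from root."""
    seen = {root}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


indexed = reachable(LINKS, "Main_Page")
# Pages that exist on the wiki but are never visited by the crawl.
orphans = sorted(set(LINKS) - indexed)
print(orphans)
```

Under these assumptions both pages end up in `orphans`: the reference page is linked, but only from a page that nothing else links to, so a crawl rooted at the front page never sees either of them.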
OS: Linux → All
Hardware: PC → All
This search now WFM (duplicated, bug 353210), but I guess fixing the general case (though somewhat degenerate) would be nice.
Resummarizing.  An easy fix might be to point Nutch at Special:AllPages as its root.
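A rough sketch of why changing the root could help, assuming the crawler simply follows links from one root page and that Special:AllPages links to every page. This is a toy model, not Nutch configuration; the page graph is hypothetical, and a real all-pages listing may be paginated, which the sketch ignores.

```python
def crawl(links, root):
    """Link-following crawl from a single root; returns the pages visited."""
    seen, stack = set(), [root]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        stack.extend(links.get(page, []))
    return seen


# Hypothetical content pages; Main_Page and DOM are invented names.
pages = {
    "Main_Page": ["DOM"],
    "DOM": [],
    "Reference_Build_Configurations": ["Linux_Compatibility_Reference"],
    "Linux_Compatibility_Reference": [],
}

# Assumption: model Special:AllPages as one page linking to every page.
site = dict(pages)
site["Special:AllPages"] = list(pages)

missed_from_front = set(pages) - crawl(site, "Main_Page")
missed_from_index = set(pages) - crawl(site, "Special:AllPages")
print(sorted(missed_from_front))
print(sorted(missed_from_index))
```

In this model the crawl rooted at the front page misses the two unlinked pages, while the crawl rooted at the all-pages listing misses none, because every page is one link away from the root regardless of how the wiki's own pages link to each other.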
Severity: major → normal
Summary: Nutch: Searching mdc for "Linux Compatibility Reference" doesn't find that exact page → Nutch: pages that are not reachable from the front page are not indexed
Pretty sure this is WFM now that we switched wiki software (a couple of times?) and got rid of Nutch as the search engine (I think?). If I'm wrong, please reopen.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
Component: Deki Infrastructure → Other
Product: developer.mozilla.org → developer.mozilla.org Graveyard