Nutch: pages that are not reachable from the front page are not indexed



12 years ago
7 years ago


(Reporter: Gijs, Unassigned)






12 years ago
1. Open
2. Search for "Linux Compatibility Reference" (without quotes)

Actual results:
3. Wonder why Nutch doesn't find what you're asking it for.

Expected results:
3. Redirect to, or at least a search result for,


Use and search using Linux Compatibility Reference. At least that works.

Marking as major because not finding what is clearly *there* is rather ridiculous and can waste a lot of time for devs trying to find documentation.

Comment 1

12 years ago
I suppose this bug has to do with the fact that this page was only linked from [[Reference Build Configurations]], which in turn was not linked from any other page. I have put both pages into [[Category:Build Documentation]], but it would be interesting to know how Nutch crawls the site.
OS: Linux → All
Hardware: PC → All
This search now WFM (duplicated, bug 353210), but I guess fixing the general case (though somewhat degenerate) would be nice.
Resummarizing.  An easy fix might be to point Nutch at Special:AllPages as its root.
Severity: major → normal
Summary: Nutch: Searching mdc for "Linux Compatibility Reference" doesn't find that exact page → Nutch: pages that are not reachable from the front page are not indexed
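
To illustrate the resummarized problem, here is a minimal sketch, assuming a crawler that discovers pages only by following links from its seed. It is not Nutch's actual code or configuration. The link graph is hypothetical apart from the two page names mentioned in this bug, and Special:AllPages is simplified to a single page that links to everything.

```python
from collections import deque


def reachable(links, seed):
    """Return the set of pages a link-following crawler would visit from seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: the two reference pages link only to each other's
# chain and are not linked from anything reachable from the front page.
links = {
    "Main Page": ["Build Documentation"],
    "Build Documentation": [],
    "Reference Build Configurations": ["Linux Compatibility Reference"],
    "Linux Compatibility Reference": [],
}
# Simplification: treat Special:AllPages as one page linking to every page.
links["Special:AllPages"] = [page for page in links]

from_front = reachable(links, "Main Page")
from_all = reachable(links, "Special:AllPages")

print("Linux Compatibility Reference" in from_front)
print("Linux Compatibility Reference" in from_all)
```

With the front page as the seed, the orphaned pages are never visited. Seeding from an index of all pages makes every page reachable regardless of how the wiki is interlinked.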

Comment 4

9 years ago
Pretty sure this is WFM now that we switched wiki software (a couple of times?) and got rid of Nutch as the search engine (I think?). If I'm wrong, please reopen.
Last Resolved: 9 years ago
Resolution: --- → WORKSFORME
Component: Deki Infrastructure → Other
Product: Mozilla Developer Network → Mozilla Developer Network