Nutch: pages that are not reachable from the front page are not indexed

RESOLVED WORKSFORME

Status

Mozilla Developer Network
General
11 years ago
5 years ago

People

(Reporter: Gijs, Unassigned)

Description

11 years ago
STR:
1. Open http://developer.mozilla.org/en/docs/
2. Search for "Linux Compatibility Reference" (without quotes)

Actual results:
3. Wonder why Nutch doesn't find what you're asking it for.

Expected results:
3. Redirect to, or at least a search result for, http://developer.mozilla.org/en/docs/Linux_Compatibility_Reference


Workaround:

Use http://www.google.com/ and search for "site:developer.mozilla.org Linux Compatibility Reference" (without quotes). At least that works.

Marking as major because not finding what is clearly *there* is rather ridiculous and can waste a lot of time for devs trying to find documentation.

Comment 1

11 years ago
I suppose this bug has to do with the fact that this page was only linked from [[Reference Build Configurations]], which in turn was not linked from any other page. I have put both pages into [[Category:Build Documentation]], but it would be interesting to know how Nutch crawls the site.
OS: Linux → All
Hardware: PC → All
This search now WFM (duplicated, bug 353210), but I guess fixing the general case (though somewhat degenerate) would be nice.
Resummarizing.  An easy fix might be to point Nutch at Special:AllPages as its root.
Severity: major → normal
Summary: Nutch: Searching mdc for "Linux Compatibility Reference" doesn't find that exact page → Nutch: pages that are not reachable from the front page are not indexed
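
The reachability problem described in the comments above can be illustrated with a small sketch. This is a simplified model of a link-following crawler, not Nutch's actual implementation; the link graph is hypothetical, and "Main_Page" and "Build_Documentation" are stand-in names. It only shows why a page linked solely from an orphan page is never visited from the front page, and why seeding the crawl at a page that links to everything (as Special:AllPages would) avoids that.

```python
from collections import deque


def reachable(links, seed):
    """Return the set of pages a link-following crawl reaches from seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph modelled on this bug: the reference page is linked
# only from an orphan page that nothing else links to.
links = {
    "Main_Page": ["Build_Documentation"],
    "Build_Documentation": [],
    "Reference_Build_Configurations": ["Linux_Compatibility_Reference"],
    "Linux_Compatibility_Reference": [],
}

# Assumption: Special:AllPages links to every page on the wiki.
links["Special:AllPages"] = list(links)

all_pages = set(links) - {"Special:AllPages"}
missed_from_front = all_pages - reachable(links, "Main_Page")
missed_from_all = all_pages - reachable(links, "Special:AllPages")

print("Missed when seeded at Main_Page:", sorted(missed_from_front))
print("Missed when seeded at Special:AllPages:", sorted(missed_from_all))
```

In this model the crawl seeded at the front page misses both the orphan page and the page linked only from it, while the crawl seeded at the all-pages listing misses nothing.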
(Reporter)

Comment 4

8 years ago
Pretty sure this is WFM now that we switched wiki software (a couple of times?) and got rid of Nutch as the search engine (I think?). If I'm wrong, please reopen.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → WORKSFORME
(Assignee)

Updated

5 years ago
Component: Deki Infrastructure → Other
Product: Mozilla Developer Network → Mozilla Developer Network