Looks like it maybe didn't come back up after the maint window? I thought it was under nagios, though, so maybe there's something else at play. http://developer.mozilla.org/en/docs/Special:Nutch?language=en&start=0&hitsPerPage=10&query=prototype&fulltext=Search should return a bunch of hits, as an example.
Related to bug 444502?
Any thoughts? MDC search is still busted, it's been a couple of days.
Nutch is back up and running.
Thanks. Is it under nagios' supervision, or should we open a ticket on that?
reed can add that off this bug too...good call.
Seems to have been caused by a hung tomcat process and a busted crawl. I moved an older crawl into place and restarted the tomcat process. Please re-open if this continues to be busted.
This was reopened for nagios, I think. Aravind: can we get the log somewhere, so we can report it to the nutch guys and see if they're interested?
Nagios checks were already added.
I am not sure this is a nutch-related bug. My guess is that it's more related to the crawl, or to tomcat serving pages out of that crawl. We had this happen to us once before, and moving an older crawl into place fixed it. If this happens again, we can dig into it some more and follow up depending on what seems to be busted.
URL still returns no hits.
Over to aravind, the resident nutch expert.
reed - can you make sure there are monitors to catch this next time automatically?
Here is the error from the nightly cron job that seems to be the culprit:

CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: /home/nutch/crawl_new/segments/20080715181708
Generator: filtering: false
Generator: topN: 2147483647
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: /home/nutch/crawl_new/segments/20080715181708
Exception in thread "main" java.io.IOException: Target /tmp/hadoop-nutch/mapred/local/localRunner/job_local_25.xml already exists
        at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:269)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:133)
        at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:55)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1064)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:86)
        at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:281)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:590)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:805)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:526)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:125)

CCing Fredric Wenzel and Wil, who are the developers for this thing.
I am clearing out that tmp directory and starting another scan. Let's see if this one finishes correctly.
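For the record, a sketch of the cleanup step described above, using the path from the stack trace (/tmp/hadoop-nutch/mapred/local). A demo directory stands in for the real one here so nothing live is touched; on the actual host you'd stop the crawl cron job and tomcat first so nothing recreates the files mid-cleanup.

```shell
# Demo stand-in for the real stale directory from the stack trace.
STALE_DIR="/tmp/hadoop-nutch-demo/mapred/local/localRunner"

# Simulate the leftover state from a crashed run: a stale job file that
# makes the next Fetcher run fail with "Target ... already exists".
mkdir -p "$STALE_DIR"
touch "$STALE_DIR/job_local_25.xml"

# The actual fix: remove the stale local-runner state so the next
# nightly crawl can start cleanly.
rm -rf "$STALE_DIR"
```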
(In reply to comment #15)
> I am clearing out that tmp directory and starting another scan. Let's see if
> this one finishes correctly.

Ah, this is what I wanted to suggest. Maybe it crashed one time before it had a chance to clean up after itself, and now it's confused that its "workspace" is not cleaned up (very robust, indeed). Let us know if it works. Thanks.
Looks like clearing that tmp directory did it. Seems to be working now.
Yup, works. Great. Reed, did you write your Nagios check?
(In reply to comment #18)
> Reed, did you write your Nagios check?

I did, indeed. :) https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=dyna-nutch.nslb.sj
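For anyone landing here later, a hedged sketch of what such a Nagios service definition might look like, using check_http against the search URL from comment 0. The template name and exact arguments are assumptions; the real check lives on the host linked above and may differ.

```
define service {
    use                 generic-service      ; hypothetical template name
    host_name           dyna-nutch.nslb.sj
    service_description Nutch search
    ; Fetch the search page for a known query and require the word
    ; "Search" in the response body.
    check_command       check_http!-H developer.mozilla.org -u "/en/docs/Special:Nutch?language=en&query=prototype&fulltext=Search" -s "Search"
}
```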