Closed
Bug 444905
Opened 17 years ago
Closed 17 years ago
nutch not working/running on MDC
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: shaver, Assigned: aravind)
Details
Looks like it maybe didn't come back up after the maint window? I thought it was under nagios, though, so maybe there's something else at play.
http://developer.mozilla.org/en/docs/Special:Nutch?language=en&start=0&hitsPerPage=10&query=prototype&fulltext=Search should return a bunch of hits, as an example.
Comment 1•17 years ago
Related to bug 444502?
Comment 2•17 years ago
(In reply to comment #1)
> Related to bug 444502?
Nope, separate issues.
Comment 3•17 years ago (Reporter)
Any thoughts? MDC search is still busted, it's been a couple of days.
Updated•17 years ago
Assignee: aravind → reed
Comment 4•17 years ago
Nutch is back up and running.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 5•17 years ago (Reporter)
Thanks. Is it under nagios' supervision, or should we open a ticket on that?
Comment 6•17 years ago
reed can add that off this bug too...good call.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 7•17 years ago (Assignee)
Seems to have been caused by a hung tomcat process and a busted crawl. I moved an older crawl into place and restarted the tomcat process. Please re-open if this continues to be busted.
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Comment 8•17 years ago (Reporter)
This was reopened for nagios, I think.
Aravind: can we get the log somewhere, so we can report it to the nutch guys and see if they're interested?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 9•17 years ago
Nagios checks were already added.
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Comment 10•17 years ago (Assignee)
I am not sure this is a nutch-related bug. My guess is that this is more related to the crawl, or to tomcat serving pages out of that crawl. We had this happen to us once before, and moving an older crawl into place fixed it.
If this happens again, we can dig into it some more and follow up depending on what seems to be busted.
Comment 11•17 years ago
URL still returns no hits.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 12•17 years ago
Over to aravind, the resident nutch expert.
Assignee: reed → aravind
Status: REOPENED → NEW
Comment 13•17 years ago
reed - can you make sure there are monitors to catch this next time automatically?
Comment 14•17 years ago (Assignee)
Here is the error from the nightly cron job that seems to be the culprit.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: /home/nutch/crawl_new/segments/20080715181708
Generator: filtering: false
Generator: topN: 2147483647
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: /home/nutch/crawl_new/segments/20080715181708
Exception in thread "main" java.io.IOException: Target /tmp/hadoop-nutch/mapred/local/localRunner/job_local_25.xml already exists
    at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:269)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:133)
    at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:55)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1064)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:86)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:281)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:590)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:805)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:526)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:125)
CCing fredric wenzel and wil, who are the developers for this thing.
Comment 15•17 years ago (Assignee)
I am clearing out that tmp directory and starting another scan. Let's see if this one finishes correctly.
Comment 16•17 years ago
(In reply to comment #15)
> I am clearing out that tmp directory and starting another scan. Let's see if
> this one finishes correctly.
Ah, this is what I wanted to suggest. Maybe it crashed one time before it had a chance to clean up after itself and now it's confused how its "workspace" is not cleaned up (very robust, indeed).
Let us know if it works. Thanks.
Comment 17•17 years ago (Assignee)
Looks like clearing that tmp directory did it. Seems to be working now.
Status: NEW → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
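The cleanup described in comments 15 to 17 could be scripted so that the nightly cron job does not trip over a leftover job file again. What follows is a minimal Python sketch, not what was actually run on the server. It assumes the stale files are the `job_local_*.xml` entries under the `localRunner` directory named in the stack trace, and that no crawl is running when it is invoked. The helper name `clear_stale_job_files` is hypothetical.

```python
import glob
import os

# Directory taken from the stack trace in comment 14.
LOCAL_RUNNER_DIR = "/tmp/hadoop-nutch/mapred/local/localRunner"


def clear_stale_job_files(runner_dir=LOCAL_RUNNER_DIR):
    """Delete leftover job_local_*.xml files and return the removed paths.

    Hypothetical helper: only safe to call while no crawl is in progress,
    since a running local job would still need its own job file.
    """
    removed = []
    pattern = os.path.join(runner_dir, "job_local_*.xml")
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    for path in clear_stale_job_files():
        print("removed stale job file:", path)
```

Running this as the first step of the cron job, before the crawl starts, would be one way to apply it; whether that ordering fits the real crawl script is an assumption.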
Comment 18•17 years ago
Yup, works. Great.
Reed, did you write your Nagios check?
Status: RESOLVED → VERIFIED
Comment 19•17 years ago
(In reply to comment #18)
> Reed, did you write your Nagios check?
I did, indeed. :)
https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=dyna-nutch.nslb.sj
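The check itself is not shown in this bug. As an illustration only, a Nagios-style plugin for this failure mode might fetch the example search URL from the description and go critical when the page has no hits. The sketch below is not Reed's check. It uses the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN); the `hit_marker` argument, a string that appears only on a results page with hits, is an assumption that would have to be taken from the real page markup.

```python
import sys
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Example search URL from the bug description.
SEARCH_URL = (
    "http://developer.mozilla.org/en/docs/Special:Nutch"
    "?language=en&start=0&hitsPerPage=10&query=prototype&fulltext=Search"
)


def evaluate(http_status, body, hit_marker):
    """Map a fetched search page to a (exit_code, message) pair.

    hit_marker is an ASSUMPTION: some text that only shows up when the
    results page actually lists hits.
    """
    if http_status != 200:
        return CRITICAL, "CRITICAL: search returned HTTP %d" % http_status
    if hit_marker not in body:
        return CRITICAL, "CRITICAL: search page contains no hits"
    return OK, "OK: search returned hits"


def main(argv):
    if len(argv) != 2:
        print("UNKNOWN: usage: check_nutch_search.py HIT_MARKER")
        return UNKNOWN
    try:
        with urllib.request.urlopen(SEARCH_URL, timeout=10) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except Exception as exc:  # network errors, HTTP errors, timeouts
        print("CRITICAL: search request failed: %s" % exc)
        return CRITICAL
    code, message = evaluate(status, body, argv[1])
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Checking for hits rather than only for a live tomcat process matters here, because the outage in this bug left the service up while the search returned nothing.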
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard