Bug 878052
Opened 12 years ago
Closed 12 years ago
sp-admin01.phx1.mozilla.com:Socorro Admin - crash analysis is CRITICAL: CRITICAL: File 20130529-modulelist.txt
Categories: Socorro :: Backend, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: Usul, Assigned: rhelmer
Details
Nagios is alerting since this morning about the file not being present.
ashish: long running cron job, killed it and started a new one by hand
[10:43am] ashish: let's see how this goes
[10:43am] ashish: it was running for 8h, a little unusual
ashish: no, running one by hand
[11:21am] ashish: can downtime it
ashish: Usul: fyi - 20130530-modulelist.txt is now generated by the hand run
[12:29pm] ashish: i'll work on generating 20130529 and step away for a coffee
nagios-pek1 joined the chat room.
[12:57pm] ashish: eh 0 byte file
[12:57pm] • ashish reruns
ashish: nope, i'm unable to get a good 20130529-modulelist.txt
[1:28pm] ashish: Usul: can you check with laura on #breakpad in about 3h if that is ok or whether it needs a bug filed?
lars is looking through the crontabber logs
[3:57pm] lars: Usul: file a bug about this issue - assign to rhelmer, he'll likely either solve it or reassign to a more appropriate person...
Comment 1 (Assignee) • 12 years ago
This is the "cron_modulelist.sh" cron job, I see this in my cron email:
Fatal exit code: 143
pig run failed
sp-admin01:/var/log/socorro/cron_modulelist.log says:
2013-05-31 04:00:57,932 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
getmerge: Target /mnt/crashanalysis/crash_analysis/modulelist/20130529-modulelist.txt already exists
Moved to trash: hdfs://hp-admin01.phx1.mozilla.com/user/socorro/modulelist-20130529-20130529
Maybe it got re-run for the wrong day?
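The log above shows getmerge refusing to write over an existing target. A minimal sketch of a guard a wrapper could run before the merge step, assuming a non-empty file means the day is already done and a 0-byte file should be replaced. The function name `modulelist_action` is hypothetical and is not taken from cron_modulelist.sh.

```shell
#!/bin/sh
# Hypothetical helper, not part of the real cron_modulelist.sh.
# Prints the action a wrapper could take for a given target file:
#   skip        a non-empty file already exists
#   replace     the file exists but is empty (0 bytes)
#   regenerate  the file is missing
modulelist_action() {
    target=$1
    if [ -s "$target" ]; then
        echo skip
    elif [ -e "$target" ]; then
        echo replace
    else
        echo regenerate
    fi
}

# Possible wiring, left commented out. The local path comes from the log
# above; the relative HDFS source path is an assumption.
# target=/mnt/crashanalysis/crash_analysis/modulelist/20130529-modulelist.txt
# case "$(modulelist_action "$target")" in
#     skip)    exit 0 ;;
#     replace) rm -f "$target" ;;
# esac
# hadoop fs -getmerge modulelist-20130529-20130529 "$target"
```

With a guard like this, a re-run for a day that already has a good file would exit cleanly instead of failing in getmerge.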
Comment 2 (Assignee) • 12 years ago
Can you try re-running this job now? It looks like the earlier run tried to regenerate the 29th rather than the 30th for some reason; see comment 1.
This should do it I think:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0; /data/socorro/application/scripts/crons/cron_modulelist.sh
Flags: needinfo?(ashish)
Comment 3 (Assignee) • 12 years ago
This is a pig (map reduce) job that runs against hadoop, may take a while, use screen :)
Comment 4 (Reporter) • 12 years ago
(In reply to Robert Helmer [:rhelmer] from comment #3)
> This is a pig (map reduce) job that runs against hadoop, may take a while,
> use screen :)
Ashish is on week-end right now.
Comment 5 (Assignee) • 12 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #4)
> (In reply to Robert Helmer [:rhelmer] from comment #3)
> > This is a pig (map reduce) job that runs against hadoop, may take a while,
> > use screen :)
>
> Ashish is on week-end right now.
Oh ok, anyone should be able to do it. Can you, or should I start casting about? :)
Flags: needinfo?(ashish)
Comment 6 (Reporter) • 12 years ago
I'm on the verge of week-end too :-)
Comment 7 • 12 years ago
I'm running this now.
Comment 8 • 12 years ago
[socorro@sp-admin01.phx1 ~]$ export JAVA_HOME=/usr/lib/jvm/java-1.6.0; /data/socorro/application/scripts/crons/cron_modulelist.sh
warn (lock): removing lock for cron_modulelist, 17941 not running
Fatal exit code: 255
hadoop getmerge failed
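The "removing lock ... 17941 not running" warning suggests a PID-based lock. A rough sketch of how a stale lock could be detected, assuming the lock is a file holding a single PID. The lock layout and the name `lock_is_stale` are assumptions, not the actual Socorro implementation.

```shell
#!/bin/sh
# Hypothetical stale-lock test; assumes the lock file contains one PID.
# Returns 0 when the lock looks stale, 1 when it is absent or still valid.
lock_is_stale() {
    lockfile=$1
    [ -f "$lockfile" ] || return 1      # no lock file: nothing to clean up
    pid=$(cat "$lockfile")
    [ -n "$pid" ] || return 0           # empty lock file: treat as stale
    # kill -0 sends no signal; it only reports whether we could signal the PID.
    # Caveat: it also fails for a live process owned by another user.
    if kill -0 "$pid" 2>/dev/null; then
        return 1                        # process is alive: lock is valid
    fi
    return 0                            # PID not running: lock is stale
}
```

A wrapper could call `lock_is_stale /path/to/lock && rm -f /path/to/lock` before starting, where the path is whatever the job really uses.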
Comment 9 (Assignee) • 12 years ago
(In reply to Shyam Mani [:fox2mike] from comment #8)
> [socorro@sp-admin01.phx1 ~]$ export JAVA_HOME=/usr/lib/jvm/java-1.6.0;
> /data/socorro/application/scripts/crons/cron_modulelist.sh
> warn (lock): removing lock for cron_modulelist, 17941 not running
> Fatal exit code: 255
> hadoop getmerge failed
Looks like it already completed:
getmerge: Target /mnt/crashanalysis/crash_analysis/modulelist/20130530-modulelist.txt already exists
$ ls -latr /mnt/crashanalysis/crash_analysis/modulelist/20130530-modulelist.txt
-rwxrwxrwx 1 socorro socorro 4376709 May 31 03:20 /mnt/crashanalysis/crash_analysis/modulelist/20130530-modulelist.txt
So everything seems fine? Did nagios stop alerting?
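For reference, the kind of check that would answer this question can be sketched as below. It follows the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL) and also treats a 0-byte file as a failure, since one turned up earlier in this bug. The function `check_modulelist` and its message wording are hypothetical; the real Nagios check on sp-admin01 may differ.

```shell
#!/bin/sh
# Hypothetical Nagios-style check, not the actual check used on sp-admin01.
# Usage: check_modulelist DIR YYYYMMDD
# Prints one status line; returns 0 (OK) or 2 (CRITICAL).
check_modulelist() {
    dir=$1
    day=$2
    file="$dir/$day-modulelist.txt"
    if [ ! -e "$file" ]; then
        echo "CRITICAL: File $day-modulelist.txt missing"
        return 2
    fi
    if [ ! -s "$file" ]; then
        echo "CRITICAL: File $day-modulelist.txt is empty"
        return 2
    fi
    echo "OK: File $day-modulelist.txt present"
    return 0
}
```

For example, `check_modulelist /mnt/crashanalysis/crash_analysis/modulelist 20130530` would report on the file listed above.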
Comment 10 (Reporter) • 12 years ago
(In reply to Robert Helmer [:rhelmer] from comment #9)
> So everything seems fine? Did nagios stop alerting?
Yes
Updated (Assignee) • 12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED