Closed
Bug 819881
Opened 11 years ago
Closed 11 years ago
Socorro - No crash correlations since Dec 3 except on Dec 6 - needs manual cron re-run
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task, P2)
Infrastructure & Operations Graveyard
WebOps: Other
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: scoobidiver, Assigned: dmaher)
References
Details
(Whiteboard: [triaged 20121210][push interrupt])
Crash correlations have not been computed since December 3, except for December 6. See https://crash-analysis.mozilla.com/crash_analysis/
Updated•11 years ago
Severity: normal → critical
Comment 1•11 years ago
This doesn't strictly depend on bug 817718, but that bug is the root cause of this problem.
Depends on: 817718
Comment 2•11 years ago
These need to be re-run manually. For December 3, 4, 5, 7, 8 and 9, we need to run scripts/crons/cron_libraries.sh. The catch is that the script does not accept parameters; instead, it begins by calculating the dates:

    WEEK=`date -d 'last monday' '+%Y%m%d'`
    DATE=`date '+%Y%m%d'`

You will need to manually change these to the appropriate Monday (3/12, i.e. December 3) and the appropriate date (from the range above) for each run. Sorry for the inconvenience.
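A minimal sketch of how the WEEK/DATE calculation described above could be made overridable, so that a backfill does not require editing the script for every run. It assumes bash and GNU date; reusing the WEEK and DATE names as environment overrides is a hypothetical change, not something cron_libraries.sh is known to support.

```bash
#!/usr/bin/env bash
# Hypothetical replacement for the hard-coded date calculation at the top of
# cron_libraries.sh: honour WEEK/DATE from the environment when they are set,
# otherwise fall back to the original GNU date expressions.
set -euo pipefail

compute_dates() {
  DATE="${DATE:-$(date '+%Y%m%d')}"
  WEEK="${WEEK:-$(date -d 'last monday' '+%Y%m%d')}"
  # Reject anything that is not exactly eight digits (YYYYMMDD).
  local v
  for v in "$DATE" "$WEEK"; do
    if [[ ! "$v" =~ ^[0-9]{8}$ ]]; then
      echo "invalid date: $v" >&2
      return 1
    fi
  done
  export DATE WEEK
}
```

If the script called `compute_dates` instead of its two hard-coded lines, a backfill for December 4 could be run as `DATE=20121204 WEEK=20121203 bash cron_libraries.sh`, leaving the normal nightly cron behaviour unchanged.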
Assignee: nobody → server-ops-webops
Component: General → Server Operations: Web Operations
Product: Socorro → mozilla.org
QA Contact: nmaul
Version: unspecified → other
Updated•11 years ago
Summary: No crash correlations since Dec 3 except on Dec 6 → Socorro - No crash correlations since Dec 3 except on Dec 6 - needs manual cron re-run
Updated•11 years ago
Assignee: server-ops-webops → eziegenhorn
Updated•11 years ago (Assignee)
Assignee: eziegenhorn → dmaher
Severity: critical → normal
Status: NEW → ASSIGNED
Priority: -- → P2
Whiteboard: [triaged 20121210][push interrupt]
Comment 3•11 years ago (Assignee)
I have so far been unsuccessful in actually getting this script to run. It enters the first loop (Firefox 17.0.1) and executes the psql command. This creates "/tmp/Firefox_17.0.1.log", which reveals this:

    DEBUG Ooid: "0700f620-d0c5-4fee-bd95-c8df62121209"
    DEBUG Ooid: "211d0aff-abe2-4da5-bb7a-ccc9e2121209"
    DEBUG MainThread - retry_wrapper: handled exception, timed out
    DEBUG MainThread - retry_wrapper: about to retry connection
    DEBUG make_connection, timeout = 5000
    DEBUG connection successful
    DEBUG MainThread - retry_wrapper: handled exception, timed out

That is where it stops, hung, forever.
Comment 4•11 years ago
The psql command you're referring to pipes its output to thrift (HBase) at 10.8.81.209:9090. The "retry_wrapper" error is really coming from the hbaseclient library, not Postgres.
Comment 5•11 years ago (Assignee)
Today, the first phase of the cron_libraries.sh script runs, but it hangs indefinitely on the second run. The key difference from yesterday is that the script does NOT hang on the psql command (which is good); instead, for Firefox version 18.0, it hangs on this:

    $PYTHON /data/crash-data-tools/per-crash-core-count.py -p ${I} -r ${J} -f /tmp/${I}_${J}.tar > /tmp/${DATE}_${I}_${J}-core-counts.txt

It's worth noting that for this version there are no OOIDs in the psql-generated log, so I suspect that per-crash-core-count.py can't handle having "nothing" to do. Still working on it w/ :laura.
Comment 6•11 years ago (Assignee)
Never mind, the psql step is hanging forever after all:

    DEBUG Ooid: "c387c476-52a7-42f6-9daa-8b3752121210"
    DEBUG Ooid: "ec77f931-0b1d-445e-bcc7-ed4e22121210"
    DEBUG MainThread - retry_wrapper: handled exception, timed out
    DEBUG MainThread - retry_wrapper: about to retry connection
    DEBUG make_connection, timeout = 5000
    DEBUG connection successful
    DEBUG MainThread - retry_wrapper: handled exception, timed out
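One way to keep a stuck step from hanging the whole cron run forever is to bound it with coreutils `timeout`, which exits with status 124 when the limit is hit. The helper below is a hypothetical sketch, not part of cron_libraries.sh; the step names and limits in the usage line are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one pipeline step with a wall-clock limit so a
# hung thrift/HBase call fails the step instead of blocking forever.
set -uo pipefail

run_step() {
  local limit="$1" name="$2"
  shift 2
  timeout "$limit" "$@"
  local rc=$?
  if [ "$rc" -eq 124 ]; then
    # timeout(1) returns 124 when it killed the command for running too long.
    echo "$(date '+%F %T') step '$name' timed out after $limit" >&2
  elif [ "$rc" -ne 0 ]; then
    echo "$(date '+%F %T') step '$name' failed with status $rc" >&2
  fi
  return "$rc"
}
```

Because the psql step is a pipeline, it would need to be wrapped as a single command, e.g. `run_step 1h "psql Firefox 17.0.1" bash -c '<original psql pipeline>'`, so that `timeout` can kill the whole thing. The script can then log the failure and move on to the next version instead of stalling.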
Comment 7•11 years ago
Troubleshooting notes regarding the thrift side of things: We started out by taking a look at the logs on the thrift cluster and discovered several OOMEs (Java OutOfMemoryErrors) recorded on all the nodes. It looks like they might have gotten backlogged at some point and then never fully recovered (possibly due to a logjam). We bounced the thrift process on all the nodes, and things quickly recovered and looked clean for the normal crashmover, monitor and processors. We then kicked off the script in question here, and it seems to be running fine now.
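Since the hangs traced back to the thrift layer, a cheap preflight check before the long run can at least catch a thrift server that is not accepting connections at all. This is a hypothetical sketch using bash's `/dev/tcp` redirection under coreutils `timeout`; the host and port come from comment 4. Its limit matters: a JVM that is thrashing on OOMEs may still accept TCP connections, so a passing check does not prove thrift is healthy.

```bash
#!/usr/bin/env bash
# Hypothetical preflight: succeed only if host:port accepts a TCP connection
# within the given number of seconds. Uses bash's /dev/tcp pseudo-device.
set -uo pipefail

port_open() {
  local host="$1" port="$2" secs="${3:-5}"
  # Connect in a child bash ($0 = host, $1 = port) so timeout(1) can kill it
  # if the connect stalls instead of being refused.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example, using the thrift endpoint from comment 4:
# port_open 10.8.81.209 9090 5 || { echo "thrift unreachable, aborting" >&2; exit 1; }
```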
Comment 8•11 years ago
Daniel restarted thrift and the cron is now running apparently successfully. Let's see if that fully solves the problem. Check back in 24 hours.
Comment 9•11 years ago
cron_libraries.sh just finished and looks to have completed without error. The following is the stdout from the run:

    [root@sp-admin01.phx1 socorro]# bash cron_libraries_819881.sh | tee /tmp/cron_libraries-`date`.log
    [2012-12-11 19:08:39] Phase 1 start: Firefox
    [2012-12-11 19:08:40] Phase 1.1 start: 17.0.1
    [2012-12-11 19:08:40] ++ psql; generating /tmp/Firefox_17.0.1.log
    [2012-12-11 19:43:03] per-crash-core-count.py; generating /tmp/20121211_Firefox_17.0.1-core-counts.txt
    [2012-12-11 19:49:32] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_17.0.1-interesting-modules.txt
    [2012-12-11 19:58:09] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_17.0.1-interesting-modules-with-versions.txt
    [2012-12-11 20:07:56] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_17.0.1-interesting-addons.txt
    [2012-12-11 20:13:12] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_17.0.1-interesting-addons-with-versions.txt
    [2012-12-11 20:18:08] Phase 1.1 end: 17.0.1
    [2012-12-11 20:18:08] Phase 1.1 start: 18.0
    [2012-12-11 20:18:08] ++ psql; generating /tmp/Firefox_18.0.log
    [2012-12-11 20:30:17] per-crash-core-count.py; generating /tmp/20121211_Firefox_18.0-core-counts.txt
    [2012-12-11 20:32:08] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_18.0-interesting-modules.txt
    [2012-12-11 20:34:41] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_18.0-interesting-modules-with-versions.txt
    [2012-12-11 20:37:32] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_18.0-interesting-addons.txt
    [2012-12-11 20:39:13] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_18.0-interesting-addons-with-versions.txt
    [2012-12-11 20:40:53] Phase 1.1 end: 18.0
    [2012-12-11 20:40:53] Phase 1.1 start: 16.0.2
    [2012-12-11 20:40:53] ++ psql; generating /tmp/Firefox_16.0.2.log
    [2012-12-11 20:49:05] per-crash-core-count.py; generating /tmp/20121211_Firefox_16.0.2-core-counts.txt
    [2012-12-11 20:50:18] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_16.0.2-interesting-modules.txt
    [2012-12-11 20:51:59] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_16.0.2-interesting-modules-with-versions.txt
    [2012-12-11 20:53:49] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_16.0.2-interesting-addons.txt
    [2012-12-11 20:54:53] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_16.0.2-interesting-addons-with-versions.txt
    [2012-12-11 20:55:56] Phase 1.1 end: 16.0.2
    [2012-12-11 20:55:56] Phase 1 end: Firefox
    [2012-12-11 20:55:56] Phase 1 start: Thunderbird
    [2012-12-11 20:55:56] Phase 1.1 start: 17.0
    [2012-12-11 20:55:56] ++ psql; generating /tmp/Thunderbird_17.0.log
    [2012-12-11 20:59:45] per-crash-core-count.py; generating /tmp/20121211_Thunderbird_17.0-core-counts.txt
    [2012-12-11 21:00:25] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_17.0-interesting-modules.txt
    [2012-12-11 21:01:21] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_17.0-interesting-modules-with-versions.txt
    [2012-12-11 21:02:20] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_17.0-interesting-addons.txt
    [2012-12-11 21:02:55] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_17.0-interesting-addons-with-versions.txt
    [2012-12-11 21:03:30] Phase 1.1 end: 17.0
    [2012-12-11 21:03:30] Phase 1.1 start: 16.0.2
    [2012-12-11 21:03:30] ++ psql; generating /tmp/Thunderbird_16.0.2.log
    [2012-12-11 21:06:29] per-crash-core-count.py; generating /tmp/20121211_Thunderbird_16.0.2-core-counts.txt
    [2012-12-11 21:07:03] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_16.0.2-interesting-modules.txt
    [2012-12-11 21:07:46] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_16.0.2-interesting-modules-with-versions.txt
    [2012-12-11 21:08:29] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_16.0.2-interesting-addons.txt
    [2012-12-11 21:08:59] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_16.0.2-interesting-addons-with-versions.txt
    [2012-12-11 21:09:29] Phase 1.1 end: 16.0.2
    [2012-12-11 21:09:29] Phase 1.1 start: 15.0.1
    [2012-12-11 21:09:29] ++ psql; generating /tmp/Thunderbird_15.0.1.log
    [2012-12-11 21:09:45] per-crash-core-count.py; generating /tmp/20121211_Thunderbird_15.0.1-core-counts.txt
    [2012-12-11 21:09:47] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_15.0.1-interesting-modules.txt
    [2012-12-11 21:09:50] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_15.0.1-interesting-modules-with-versions.txt
    [2012-12-11 21:09:53] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_15.0.1-interesting-addons.txt
    [2012-12-11 21:09:55] per-crash-interesting-modules.py; generating /tmp/20121211_Thunderbird_15.0.1-interesting-addons-with-versions.txt
    [2012-12-11 21:09:56] Phase 1.1 end: 15.0.1
    [2012-12-11 21:09:56] Phase 1 end: Thunderbird
    [2012-12-11 21:09:56] Phase 1 start: SeaMonkey
    [2012-12-11 21:09:58] Phase 1.1 start: 2.14.1
    [2012-12-11 21:09:58] ++ psql; generating /tmp/SeaMonkey_2.14.1.log
    [2012-12-11 21:10:18] per-crash-core-count.py; generating /tmp/20121211_SeaMonkey_2.14.1-core-counts.txt
    [2012-12-11 21:10:20] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.14.1-interesting-modules.txt
    [2012-12-11 21:10:24] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.14.1-interesting-modules-with-versions.txt
    [2012-12-11 21:10:28] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.14.1-interesting-addons.txt
    [2012-12-11 21:10:31] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.14.1-interesting-addons-with-versions.txt
    [2012-12-11 21:10:33] Phase 1.1 end: 2.14.1
    [2012-12-11 21:10:33] Phase 1.1 start: 2.13.2
    [2012-12-11 21:10:33] ++ psql; generating /tmp/SeaMonkey_2.13.2.log
    [2012-12-11 21:10:37] per-crash-core-count.py; generating /tmp/20121211_SeaMonkey_2.13.2-core-counts.txt
    [2012-12-11 21:10:38] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.13.2-interesting-modules.txt
    [2012-12-11 21:10:39] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.13.2-interesting-modules-with-versions.txt
    [2012-12-11 21:10:40] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.13.2-interesting-addons.txt
    [2012-12-11 21:10:41] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.13.2-interesting-addons-with-versions.txt
    [2012-12-11 21:10:42] Phase 1.1 end: 2.13.2
    [2012-12-11 21:10:42] Phase 1.1 start: 2.0.14
    [2012-12-11 21:10:42] ++ psql; generating /tmp/SeaMonkey_2.0.14.log
    [2012-12-11 21:10:43] per-crash-core-count.py; generating /tmp/20121211_SeaMonkey_2.0.14-core-counts.txt
    [2012-12-11 21:10:43] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.0.14-interesting-modules.txt
    [2012-12-11 21:10:44] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.0.14-interesting-modules-with-versions.txt
    [2012-12-11 21:10:45] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.0.14-interesting-addons.txt
    [2012-12-11 21:10:46] per-crash-interesting-modules.py; generating /tmp/20121211_SeaMonkey_2.0.14-interesting-addons-with-versions.txt
    [2012-12-11 21:10:47] Phase 1.1 end: 2.0.14
    [2012-12-11 21:10:47] Phase 1 end: SeaMonkey
    [2012-12-11 21:10:47] Phase 1 start: Camino
    [2012-12-11 21:10:47] Phase 1.1 start: 2.1.2
    [2012-12-11 21:10:47] ++ psql; generating /tmp/Camino_2.1.2.log
    [2012-12-11 21:10:50] per-crash-core-count.py; generating /tmp/20121211_Camino_2.1.2-core-counts.txt
    [2012-12-11 21:10:50] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.1.2-interesting-modules.txt
    [2012-12-11 21:10:51] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.1.2-interesting-modules-with-versions.txt
    [2012-12-11 21:10:52] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.1.2-interesting-addons.txt
    [2012-12-11 21:10:53] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.1.2-interesting-addons-with-versions.txt
    [2012-12-11 21:10:54] Phase 1.1 end: 2.1.2
    [2012-12-11 21:10:54] Phase 1.1 start: 2.0.4
    [2012-12-11 21:10:54] ++ psql; generating /tmp/Camino_2.0.4.log
    [2012-12-11 21:10:54] per-crash-core-count.py; generating /tmp/20121211_Camino_2.0.4-core-counts.txt
    [2012-12-11 21:10:54] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.0.4-interesting-modules.txt
    [2012-12-11 21:10:55] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.0.4-interesting-modules-with-versions.txt
    [2012-12-11 21:10:55] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.0.4-interesting-addons.txt
    [2012-12-11 21:10:56] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.0.4-interesting-addons-with-versions.txt
    [2012-12-11 21:10:56] Phase 1.1 end: 2.0.4
    [2012-12-11 21:10:56] Phase 1.1 start: 2.0.7
    [2012-12-11 21:10:56] ++ psql; generating /tmp/Camino_2.0.7.log
    [2012-12-11 21:10:57] per-crash-core-count.py; generating /tmp/20121211_Camino_2.0.7-core-counts.txt
    [2012-12-11 21:10:57] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.0.7-interesting-modules.txt
    [2012-12-11 21:10:57] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.0.7-interesting-modules-with-versions.txt
    [2012-12-11 21:10:58] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.0.7-interesting-addons.txt
    [2012-12-11 21:10:58] per-crash-interesting-modules.py; generating /tmp/20121211_Camino_2.0.7-interesting-addons-with-versions.txt
    [2012-12-11 21:10:59] Phase 1.1 end: 2.0.7
    [2012-12-11 21:10:59] Phase 1 end: Camino
    [2012-12-11 21:10:59] Phase 2 start: Firefox
    [2012-12-11 21:10:59] Phase 2.1 start: 18.0
    [2012-12-11 21:10:59] ++ psql; generating /tmp/Firefox_18.0.log
    [2012-12-11 21:19:25] per-crash-core-count.py; generating /tmp/20121211_Firefox_18.0-core-counts.txt
    [2012-12-11 21:21:19] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_18.0-interesting-modules.txt
    [2012-12-11 21:23:53] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_18.0-interesting-modules-with-versions.txt
    [2012-12-11 21:26:46] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_18.0-interesting-addons.txt
    [2012-12-11 21:28:26] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_18.0-interesting-addons-with-versions.txt
    [2012-12-11 21:30:05] Phase 2.1 end: 18.0
    [2012-12-11 21:30:05] Phase 2.1 start: 19.0a2
    [2012-12-11 21:30:05] ++ psql; generating /tmp/Firefox_19.0a2.log
    [2012-12-11 21:30:50] per-crash-core-count.py; generating /tmp/20121211_Firefox_19.0a2-core-counts.txt
    [2012-12-11 21:30:57] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_19.0a2-interesting-modules.txt
    [2012-12-11 21:31:06] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_19.0a2-interesting-modules-with-versions.txt
    [2012-12-11 21:31:16] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_19.0a2-interesting-addons.txt
    [2012-12-11 21:31:22] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_19.0a2-interesting-addons-with-versions.txt
    [2012-12-11 21:31:28] Phase 2.1 end: 19.0a2
    [2012-12-11 21:31:28] Phase 2.1 start: 20.0a1
    [2012-12-11 21:31:28] ++ psql; generating /tmp/Firefox_20.0a1.log
    [2012-12-11 21:32:28] per-crash-core-count.py; generating /tmp/20121211_Firefox_20.0a1-core-counts.txt
    [2012-12-11 21:32:36] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_20.0a1-interesting-modules.txt
    [2012-12-11 21:32:47] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_20.0a1-interesting-modules-with-versions.txt
    [2012-12-11 21:32:58] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_20.0a1-interesting-addons.txt
    [2012-12-11 21:33:05] per-crash-interesting-modules.py; generating /tmp/20121211_Firefox_20.0a1-interesting-addons-with-versions.txt
    [2012-12-11 21:33:12] Phase 2.1 end: 20.0a1
    [2012-12-11 21:33:12] Phase 2 end: Firefox
    [2012-12-11 21:33:12] find /tmp -name 20121211\* -type f -size +500k | xargs gzip -9
    [2012-12-11 21:33:58] mkdir /mnt/crashanalysis/crash_analysis/20121211
    [2012-12-11 21:33:58] cp /tmp/20121211* /mnt/crashanalysis/crash_analysis/20121211/
    [2012-12-11 21:33:59] rm -f /tmp/20121211*
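The last four log lines are the publish step. Below is a more defensive sketch of that step; it is hypothetical, not the script's actual code. It limits `find` to the top level of /tmp (a recursive `find /tmp` descends into other users' private directories, a plausible source of the "Permission denied" noise reported in comment 24), skips files that are already gzipped, tolerates an existing target directory, and explicitly makes the published files world-readable. That last point rests on an assumption, not confirmed in this bug, that the 403 errors in comment 10 came from file modes the web server could not read.

```bash
#!/usr/bin/env bash
# Hypothetical publish step for one day's correlation output.
set -euo pipefail

publish_day() {
  local day="$1" src="${2:-/tmp}" dest_root="${3:-/mnt/crashanalysis/crash_analysis}"
  local dest="$dest_root/$day"

  # Compress large outputs. -maxdepth 1 keeps find out of other users'
  # subdirectories; -print0/-0 survive odd names; -r skips gzip if nothing matches.
  find "$src" -maxdepth 1 -name "${day}*" ! -name '*.gz' -type f -size +500k -print0 \
    | xargs -0 -r gzip -9

  # -p: do not fail if the directory already exists (e.g. on a re-run).
  mkdir -p "$dest"
  find "$src" -maxdepth 1 -name "${day}*" -type f -exec cp {} "$dest"/ \;

  # Directories get r+x, files get r, so the web server can serve them.
  chmod -R a+rX "$dest"

  # Clean up only the top-level files this run produced.
  find "$src" -maxdepth 1 -name "${day}*" -type f -delete
}
```

A nightly run would call it as `publish_day 20121211`; the optional second and third arguments exist only so the step can be exercised against scratch directories.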
Comment 10•11 years ago (Reporter)
Each 20121211_Firefox_<n>* file (n = 16.0.2, 17.0.1, 18.0) in https://crash-analysis.mozilla.com/crash_analysis/20121211/ returns a 403 Forbidden error.
Comment 11•11 years ago
Failed again today, so we'll need to re-run cron_libraries.sh (and fix the permissions if needed). cron_submitter is also failing. I also have cronmail from cron_daily_matviews and cron_bugzilla and cron_daily_adus, so we should see if those have recovered and if not re-run them by hand. Another thrift restart might be in order. (deinspanjer?)
Comment 12•11 years ago
https://crash-analysis.mozilla.com/crash_analysis/ is missing the newest rounds of CSV (for yesterday) and correlations (for today) - there are multiple directories missing completely there, btw.
Comment 13•11 years ago
Laura, was the failure still Thrift timeouts? tmary, could you check whether we can determine any possible underlying cause in the Thrift layer? We might need to step up the plans to try to get some better debugging into the Python Thrift client layer to sort this out.
Comment 14•11 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)
> https://crash-analysis.mozilla.com/crash_analysis/ is missing the newest
> rounds of CSV (for yesterday) and correlations (for today) - there are
> multiple directories missing completely there, btw.

Is there a separate bug for the daily CSV dump?

The status on both correlations and CSV is:
* working on backfilling now
* watching for any network or thrift problems (still trying to isolate the problem)
* adding debugging and backfill support to these jobs

Right now these need to be done by hand and it's somewhat laborious.
Comment 15•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #14)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #12)
> > https://crash-analysis.mozilla.com/crash_analysis/ is missing the newest
> > rounds of CSV (for yesterday) and correlations (for today) - there are
> > multiple directories missing completely there, btw.
>
> Is there a separate bug for the daily CSV dump?
>
> The status on both correlations and CSV is:

CSVs have been backfilled and pushed out for missing days. Correlations should be done tomorrow. Hopefully everything will run as expected overnight for today's report.
Comment 16•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #14)
> Is there a separate bug for the daily CSV dump?

I was told to just mention the CSVs here. We can have a separate bug if you like, though.
Comment 17•11 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #16)
> (In reply to Robert Helmer [:rhelmer] from comment #14)
> > Is there a separate bug for the daily CSV dump?
>
> I was told to just mention the CSVs here. We can have a separate bug if
> you like, though.

Nope, fine with me; I will use this bug to report status :)
Comment 18•11 years ago (Reporter)
(In reply to Robert Helmer [:rhelmer] from comment #15)
> CSVs have been backfilled and pushed out for missing days.

They are still missing for December 3 and 4.
Comment 19•11 years ago
This morning's sitrep:
* https://crash-analysis.mozilla.com/crash_analysis/ is missing reports for 12/13

Errors from cronmail overnight:
* error (lock): lock already exists for cron_update_adus pid 23074
* Cron job startReportsClean exited non-zero: 1
* /bin/sh: /data/bin/cron_daily_reports.sh: No such file or directory
* Cron job startDailyMatviews exited non-zero: 9

phrawtzy: can we see which of these jobs later ran successfully, and re-run by hand anything that didn't? I already know we need to do the following:
* run cron_libraries for 12/3, 12/4 and 12/13. (I wonder why this one didn't send cron mail.)
Comment 20•11 years ago
(In reply to Laura Thomson :laura from comment #19)
> This morning's sitrep:
> * https://crash-analysis.mozilla.com/crash_analysis/ is missing reports for
> 12/13
>
> Errors from cronmail overnight:
> * error (lock): lock already exists for cron_update_adus pid 23074
> * Cron job startReportsClean exited non-zero: 1
> * /bin/sh: /data/bin/cron_daily_reports.sh: No such file or directory

I don't see this for production (but I do for stage). Running it by hand, I don't get any errors, but no output either. Debugging it now.
Comment 21•11 years ago
I figured out the problem(s) with cron_reports.sh (which generates the daily CSV report), and why they have anything to do with the totally separate cron_libraries.sh (correlation reports):

1) cron_reports.sh does not create its own output directory on crash_analysis, so if cron_libraries fails, it can't write
2) cron_reports.sh logs in such a way that we don't see problems in the main log, cron mail, or even when running with bash -x

Going to fix both of these and get it checked into the IT repo. We'll pull this into the main Socorro repo as a separate bug.
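Both fixes can be sketched in a few lines. This is a hypothetical illustration, not the change that was checked in: `LOGFILE` and its default path are assumed names. The idea is that the report job creates its own output directory instead of relying on cron_libraries.sh having done so, and that every log line also goes to stderr so failures surface in cron mail.

```bash
#!/usr/bin/env bash
# Hypothetical skeleton for a daily report job that does not depend on
# another job having created its output directory.
set -euo pipefail

LOGFILE="${LOGFILE:-/var/log/socorro/cron_reports.log}"   # assumed path

log() {
  # Timestamped line to the log file *and* stderr; cron mails stderr output.
  local line
  line="[$(date '+%F %T')] $*"
  printf '%s\n' "$line" >>"$LOGFILE"
  printf '%s\n' "$line" >&2
}

prepare_outdir() {
  local outdir="$1"
  # Create the directory ourselves rather than assuming cron_libraries.sh did.
  if ! mkdir -p "$outdir"; then
    log "ERROR: cannot create output directory $outdir"
    return 1
  fi
  log "output directory ready: $outdir"
}
```

The job would call something like `prepare_outdir "/mnt/crashanalysis/crash_analysis/$DATE"` (the directory layout seen in the cron_libraries.sh output) before writing its CSV, so a failed correlation run no longer takes the CSV report down with it.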
Comment 22•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #21)
> I figured out the problem(s) with cron_reports.sh (which generates the daily
> CSV report), and why they have anything to do with the totally separate
> cron_libraries.sh (correlation reports):
>
> 1) cron_reports.sh does not create its own output directory on
> crash_analysis, if cron_libraries fails then it can't write
> 2) cron_reports.sh logs in such a way that we don't see problems in the main
> log, cron mail, or even running w/ bash -x
>
> Going to fix both of these and get it checked into the IT repo. We'll pull
> this into the main Socorro repo as a separate bug.

I've also added a trivial backfill feature to this script, testing it now for rerunning 12/03 and 12/04.
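The backfill feature itself is not shown in this bug. As a rough sketch of the general idea, a driver can take explicit YYYYMMDD dates on the command line, run the job once per date, keep going past failures, and report them at the end. `run_for_date` is a hypothetical stand-in for whatever the real per-day work is.

```bash
#!/usr/bin/env bash
# Hypothetical backfill driver: run a per-day job for each date given.
set -uo pipefail

run_for_date() {
  # Placeholder for the real per-day work (e.g. generating one day's CSV).
  echo "would generate report for $1"
}

backfill() {
  local d failed=()
  for d in "$@"; do
    if [[ ! "$d" =~ ^[0-9]{8}$ ]]; then
      echo "skipping malformed date: $d" >&2
      failed+=("$d")
      continue
    fi
    # One bad day should not stop the remaining days from being backfilled.
    run_for_date "$d" || failed+=("$d")
  done
  if ((${#failed[@]})); then
    echo "failed: ${failed[*]}" >&2
    return 1
  fi
}
```

For the two days mentioned here, the call would be `backfill 20121203 20121204`; a non-zero exit status tells the operator that at least one date still needs attention.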
Comment 23•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #22)
> (In reply to Robert Helmer [:rhelmer] from comment #21)
> > I figured out the problem(s) with cron_reports.sh (which generates the daily
> > CSV report), and why they have anything to do with the totally separate
> > cron_libraries.sh (correlation reports):
> >
> > 1) cron_reports.sh does not create its own output directory on
> > crash_analysis, if cron_libraries fails then it can't write
> > 2) cron_reports.sh logs in such a way that we don't see problems in the main
> > log, cron mail, or even running w/ bash -x
> >
> > Going to fix both of these and get it checked into the IT repo. We'll pull
> > this into the main Socorro repo as a separate bug.
>
> I've also added a trivial backfill feature to this script, testing it now
> for rerunning 12/03 and 12/04.

OK, daily CSV done for 12/03, 12/04 and 12/13.
Comment 24•11 years ago
I don't know how many of these are related to bug 822106, but here's what I see this morning:
* cron_status is giving errors but appears to be running (looking at https://crash-stats.mozilla.com/status). It might be taking longer than 5 minutes to run, but that would seem odd. Error is: error (lock): lock already exists for cron_status pid 31847
* cron_bugzilla is reporting lock errors, last time at 11.11am ET, message: error (lock): lock already exists for cron_bugzilla pid 28682
* cron_ftp_scraper is reporting lock errors, last time at 11.05am ET, message: error (lock): lock already exists for cron_ftpscraper pid 25538
* cron_duplicates at 11.02 ET: error (lock): lock already exists for cron_duplicates pid 23685
* cron_reportsclean at 10.45 ET: error (lock): lock already exists for cron_reportsclean pid 13258
* cron_daily_reports at around 6.55 ET:
  /bin/sh: /data/bin/cron_daily_reports.sh: No such file or directory
  and, on /data/bin/cron_daily_reports.sh >> /var/log/socorro/cron_daily_reports.log:
  /bin/sh: /data/bin/cron_daily_reports.sh: Permission denied
* cron_daily_matviews at 5.00 ET: error (lock): lock already exists for cron_daily_matviews pid 13259

On the bright side, cron_libraries.sh appears to have run, because the output is up on crash-analysis. It did report this error in logging, though, at 3.50am:

    Cron <socorro@sp-admin01> /data/socorro/application/scripts/crons/cron_libraries.sh >> /var/log/socorro/cron_libraries.log
    find: `/tmp/ssh-IvjMpz9481': Permission denied
    find: `/tmp/socorro-install-29324-pdH': Permission denied
    find: `/tmp/ssh-MBSAu10078': Permission denied
    find: `/tmp/ssh-eQOLEe6397': Permission denied
    find: `/tmp/ssh-IbkxaQ4548': Permission denied
    find: `/tmp/socorro-install-30477-r3S': Permission denied
    find: `/tmp/atop.d': Permission denied
    find: `/tmp/hsperfdata_infrasec': Permission denied
    find: `/tmp/socorro-install-30431-38m': Permission denied
    find: `/tmp/socorro-install-12739-bRN': Permission denied
    find: `/tmp/ssh-VPCZt11503': Permission denied
    find: `/tmp/socorro-install-13837-vuJ': Permission denied
    find: `/tmp/ssh-VEGSEm9925': Permission denied
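The repeated "lock already exists for ... pid N" errors point to a PID-based lock. How Socorro's cron wrapper actually implements its lock is not shown in this bug, so the following is only a generic sketch of the technique: an atomic `mkdir` lock that records the holder's PID and reclaims the lock when that process is no longer alive, so that a crashed or killed run does not block every later run. The lock path in the usage line is hypothetical.

```bash
#!/usr/bin/env bash
# Generic mkdir-based lock with stale-lock recovery (not Socorro's own code).
set -uo pipefail

acquire_lock() {
  local lockdir="$1" holder
  if mkdir "$lockdir" 2>/dev/null; then        # mkdir is atomic: only one winner
    echo "$$" >"$lockdir/pid"
    return 0
  fi
  holder=$(cat "$lockdir/pid" 2>/dev/null || true)
  # kill -0 only tests whether the PID exists. Caveat: it also fails (EPERM)
  # for a live process owned by another user, which this sketch treats as stale.
  if [ -n "$holder" ] && kill -0 "$holder" 2>/dev/null; then
    echo "error (lock): lock already exists for $(basename "$lockdir") pid $holder" >&2
    return 1
  fi
  # Holder is gone: take over the lock. (Not race-free if two runs take over at once.)
  echo "$$" >"$lockdir/pid"
  return 0
}

release_lock() {
  rm -rf "$1"
}
```

Typical use at the top of a cron job would be `acquire_lock /var/lock/cron_status.lock || exit 1` followed by `trap 'release_lock /var/lock/cron_status.lock' EXIT`. With stale-lock recovery, a lock error then means another run really is still going, which is the "running more slowly than usual" case rather than a leftover lock.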
Comment 25•11 years ago
(In reply to Laura Thomson :laura from comment #24)
> On the bright side, cron_libraries.sh appears to have run, because the
> output is up on crash-analysis.

Of course, that output is also missing entries because of bug 822102 and probably should be recreated once (re)processing has caught up.
Comment 26•11 years ago
(In reply to Laura Thomson :laura from comment #24)
> I don't know how many of these are related to bug 822106 but here's what I
> see this morning:

I'll take a look at the various locking complaints. Things could be running more slowly than usual for some reason.

> /bin/sh: /data/bin/cron_daily_reports.sh: No such file or directory
> and on /data/bin/cron_daily_reports.sh >>
> /var/log/socorro/cron_daily_reports.log
> /bin/sh: /data/bin/cron_daily_reports.sh: Permission denied

/data/bin/cron_daily_reports.sh is not executable and should be.

> On the bright side, cron_libraries.sh appears to have run, because the
> output is up on crash-analysis. It did report this error in logging though,
> at 3.50am:
> Cron <socorro@sp-admin01>
> /data/socorro/application/scripts/crons/cron_libraries.sh >>
> /var/log/socorro/cron_libraries.log
> find: `/tmp/ssh-IvjMpz9481': Permission denied
> find: `/tmp/socorro-install-29324-pdH': Permission denied
> find: `/tmp/ssh-MBSAu10078': Permission denied
> find: `/tmp/ssh-eQOLEe6397': Permission denied
> find: `/tmp/ssh-IbkxaQ4548': Permission denied
> find: `/tmp/socorro-install-30477-r3S': Permission denied
> find: `/tmp/atop.d': Permission denied
> find: `/tmp/hsperfdata_infrasec': Permission denied
> find: `/tmp/socorro-install-30431-38m': Permission denied
> find: `/tmp/socorro-install-12739-bRN': Permission denied
> find: `/tmp/ssh-VPCZt11503': Permission denied
> find: `/tmp/socorro-install-13837-vuJ': Permission denied
> find: `/tmp/ssh-VEGSEm9925': Permission denied

This is nothing new, unfortunately; it does not do a very good job of cleaning up after itself.
Comment 27•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #26)
> > /bin/sh: /data/bin/cron_daily_reports.sh: No such file or directory
> > and on /data/bin/cron_daily_reports.sh >>
> > /var/log/socorro/cron_daily_reports.log
> > /bin/sh: /data/bin/cron_daily_reports.sh: Permission denied
>
> /data/bin/cron_daily_reports.sh is not executable and should be.

Completed (on sp-admin01.phx1).
Comment 28•11 years ago
Hi, I haven't wanted to pollute this bug but I haven't gotten any daily crash dumps (%Y%m%d-crashdata.csv.gz) since 2012-12-13 on either fs1.corpdmz.scl3.mozilla.com in /data/security_group/crash_urls/ or on sisyphus.bughunter.ateam.phx1.mozilla.com in /work/mozilla/crash-reports/. I assume this is due to bug 817718.

rhelmer: These appear to be generated in scripts/crons/cron_daily_reports.sh using scripts/startDailyUrl.py. Would it be possible to backfill the daily dumps before everyone bails for Christmas?
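A quick way to spot gaps like this is to check, for each expected date, whether the file following the `%Y%m%d-crashdata.csv.gz` naming above exists, and print the dates that are absent. The sketch assumes only that naming; the function name is hypothetical, and the caller chooses the directory and the dates.

```bash
#!/usr/bin/env bash
# Hypothetical gap check: print each YYYYMMDD whose crash-data dump is missing.
set -uo pipefail

missing_dumps() {
  local dir="$1" d rc=0
  shift
  for d in "$@"; do
    if [ ! -f "$dir/${d}-crashdata.csv.gz" ]; then
      echo "$d"
      rc=1          # non-zero exit if anything is missing
    fi
  done
  return "$rc"
}
```

For example, running `missing_dumps /data/security_group/crash_urls 20121213 20121214 20121215` on fs1 would list whichever of those days still need to be backfilled.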
Comment 29•11 years ago
(In reply to Jake Maul [:jakem] from comment #27)
> (In reply to Robert Helmer [:rhelmer] from comment #26)
> > > /bin/sh: /data/bin/cron_daily_reports.sh: No such file or directory
> > > and on /data/bin/cron_daily_reports.sh >>
> > > /var/log/socorro/cron_daily_reports.log
> > > /bin/sh: /data/bin/cron_daily_reports.sh: Permission denied
> >
> > /data/bin/cron_daily_reports.sh is not executable and should be.
>
> Completed (on sp-admin01.phx1).

Looks like this is still not executable :(
Comment 30•11 years ago
(In reply to Bob Clary [:bc:] from comment #28)
> Hi, I haven't wanted to pollute this bug but I haven't gotten any daily
> crash dumps (%Y%m%d-crashdata.csv.gz) since 2012-12-13 on either
> fs1.corpdmz.scl3.mozilla.com in /data/security_group/crash_urls/ or on
> sisyphus.bughunter.ateam.phx1.mozilla.com in /work/mozilla/crash-reports/. I
> assume this is due to bug 817718.
>
> rhelmer: These appear to be generated in scripts/crons/cron_daily_reports.sh
> using scripts/startDailyUrl.py. Would it be possible to backfill the daily
> dumps before everyone bails for Christmas?

Sure thing, will start generating these now.
Comment 31•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #30)
> (In reply to Bob Clary [:bc:] from comment #28)
> > Hi, I haven't wanted to pollute this bug but I haven't gotten any daily
> > crash dumps (%Y%m%d-crashdata.csv.gz) since 2012-12-13 on either
> > fs1.corpdmz.scl3.mozilla.com in /data/security_group/crash_urls/ or on
> > sisyphus.bughunter.ateam.phx1.mozilla.com in /work/mozilla/crash-reports/. I
> > assume this is due to bug 817718.
> >
> > rhelmer: These appear to be generated in scripts/crons/cron_daily_reports.sh
> > using scripts/startDailyUrl.py. Would it be possible to backfill the daily
> > dumps before everyone bails for Christmas?
>
> Sure thing, will start generating these now.

Done, how does it look now?
Comment 32•11 years ago
All the missing CSV files seem to be back. Thanks!
Comment 33•11 years ago
yes, thanks!
Comment 34•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #29)
> (In reply to Jake Maul [:jakem] from comment #27)
> > (In reply to Robert Helmer [:rhelmer] from comment #26)
> > > > /bin/sh: /data/bin/cron_daily_reports.sh: No such file or directory
> > > > and on /data/bin/cron_daily_reports.sh >>
> > > > /var/log/socorro/cron_daily_reports.log
> > > > /bin/sh: /data/bin/cron_daily_reports.sh: Permission denied
> > >
> > > /data/bin/cron_daily_reports.sh is not executable and should be.
> >
> > Completed (on sp-admin01.phx1).
>
> Looks like this is still not executable :(

Fixed in Puppet:

    notice: /File[/data/bin/cron_daily_reports.sh]/mode: mode changed '0644' to '0755'
    [root@sp-admin01.phx1 bin]# ls -lah /data/bin/cron_daily_reports.sh
    -rwxr-xr-x 1 root root 1.2K Dec 14 11:16 /data/bin/cron_daily_reports.sh

The next run should work properly; ping me on IRC if it does not.
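Because the same "Permission denied" failure came back after a manual fix and only stuck once Puppet managed the file mode, a small guard in whatever invokes the cron script can make this failure mode explicit in cron mail. This is a hypothetical sketch with an illustrative function name; Puppet remains the place where the mode is actually enforced.

```bash
#!/usr/bin/env bash
# Hypothetical guard: refuse to run a cron script that is missing or not
# executable, and report which of the two it is.
set -uo pipefail

require_executable() {
  local path="$1"
  if [ ! -e "$path" ]; then
    echo "ERROR: $path does not exist" >&2
    return 1
  fi
  if [ ! -x "$path" ]; then
    # stat -c is GNU/busybox syntax; fall back to '?' elsewhere.
    echo "ERROR: $path is not executable (mode $(stat -c '%a' "$path" 2>/dev/null || echo '?'))" >&2
    return 1
  fi
}
```

A crontab entry could then read `require_executable /data/bin/cron_daily_reports.sh && /data/bin/cron_daily_reports.sh >> /var/log/socorro/cron_daily_reports.log`, turning the two confusing `/bin/sh` errors into one clear message.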
Comment 35•11 years ago
OK thanks everybody! This *should* work without hand-holding again, please reopen if not.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 36•11 years ago (Reporter)
The csv and module files are missing for December 19. In addition, there are three unexpected files in the parent directory: https://crash-analysis.mozilla.com/crash_analysis/
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 37•11 years ago
(In reply to Scoobidiver from comment #36)
> The csv and module files are missing for December 19.

Hmm I see them here, am I missing something? :) https://crash-analysis.mozilla.com/crash_analysis/20121219/

> In addition, there are three unexpected files in the parent directory:
> https://crash-analysis.mozilla.com/crash_analysis/

Judging by the dates on these files this was probably due to debugging, I just cleaned this up.
Comment 38•11 years ago (Reporter)
(In reply to Robert Helmer [:rhelmer] from comment #37)
> (In reply to Scoobidiver from comment #36)
> > The csv and module files are missing for December 19.
> Hmm I see them here, am I missing something? :)
> https://crash-analysis.mozilla.com/crash_analysis/20121219/

It was built 26 hours after the other files, but that seems to be usual.

Anyway, two module files are missing: https://crash-analysis.mozilla.com/crash_analysis/modulelist/
Comment 39•11 years ago
(In reply to Scoobidiver from comment #38)
> (In reply to Robert Helmer [:rhelmer] from comment #37)
> > (In reply to Scoobidiver from comment #36)
> > > The csv and module files are missing for December 19.
> > Hmm I see them here, am I missing something? :)
> > https://crash-analysis.mozilla.com/crash_analysis/20121219/
> It's built 26 hours after other files but it seems usual.
>
> Anyway, two module files are missing:
> https://crash-analysis.mozilla.com/crash_analysis/modulelist/

Hmm I see this as the latest, which I'd expect:

    20121218-modulelist.txt 19-Dec-2012 17:35

This job doesn't run until 17:00 (5 PM) Pacific, so I'd expect to see yesterday's data get uploaded in about 7 hours. Not sure why it runs so late; we could look into it and possibly change that.
Comment 40•11 years ago
It worked for me.

    ssh fs1.corpdmz.scl3.mozilla.com 'ls /data/security_group/crash_urls/'

shows 20121219-crashdata.csv.gz and sisyphus.bughunter.ateam.phx1.mozilla.com also got 20121219-crashdata.csv.gz.
Comment 41•11 years ago (Reporter)
The issues reported in comment 36 are unrelated to the original issue.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard