Closed Bug 911585 Opened 9 years ago Closed 9 years ago

Missing crash dumps since 2013-08-28

Categories

(Socorro :: Backend, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: bc, Assigned: rhelmer)

No crash dumps have been produced on fs1 after the one for 2013-08-27.
See bug 794176 for background on when this has occurred in the past.

ssh fs1.corpdmz.scl3.mozilla.com 'ls /data/security_group/crash_urls/'

20130816-crashdata.csv.gz
20130817-crashdata.csv.gz
20130818-crashdata.csv.gz
20130819-crashdata.csv.gz
20130820-crashdata.csv.gz
20130821-crashdata.csv.gz
20130822-crashdata.csv.gz
20130823-crashdata.csv.gz
20130824-crashdata.csv.gz
20130825-crashdata.csv.gz
20130826-crashdata.csv.gz
20130827-crashdata.csv.gz
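
For anyone checking a listing like the one above, here is a rough sketch of how the missing days could be enumerated. It assumes GNU date and the YYYYMMDD-crashdata.csv.gz naming shown in the listing; the function name missing_days is made up for illustration and is not part of Socorro. It only reports days after the newest file, not holes earlier in the listing.

```shell
#!/bin/bash
# Hypothetical helper: print every day after the newest dump in a listing,
# up to and including END_DAY (YYYYMMDD). Requires GNU date.
missing_days() {
    local end_day=$1 listing=$2
    local last day
    # The YYYYMMDD prefix makes file names sort chronologically.
    last=$(printf '%s\n' "$listing" \
        | grep -E '^[0-9]{8}-crashdata\.csv\.gz$' \
        | sort | tail -n 1 | cut -c1-8)
    # Nothing to compare against if the listing has no dump files.
    [ -n "$last" ] || return 1
    day=$(date -u -d "$last +1 day" +%Y%m%d)
    while [ "$day" -le "$end_day" ]; do
        echo "$day"
        day=$(date -u -d "$day +1 day" +%Y%m%d)
    done
}

# Example call using the last two entries of the listing above.
missing_days 20130903 "20130826-crashdata.csv.gz
20130827-crashdata.csv.gz"
```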
Still not getting new crash dumps. Was this related to the outage of crash-stats on 2013-08-28 due to a configuration change?
Assignee: server-ops-webops → nobody
Group: mozilla-corporation-confidential
Component: Server Operations: Web Operations → Backend
Product: mozilla.org → Socorro
QA Contact: nmaul
Assignee: nobody → rhelmer
Ah ok so we took care of this in the socorro repo, but this job is actually running out of this location (not from the Socorro install):

/data/bin/cron_daily_reports.sh

Solarce, can you help us change this please? All you should need to do is change "python" to "$PYTHON" inside that script; the include file points to the right place (which is now a virtualenv).
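
To illustrate the idea, a minimal sketch of a script picking up the interpreter from an include file instead of calling a bare "python". The temporary file here is a stand-in for the real include file (/etc/socorro/socorrorc per the trace later in this bug), and the fallback to "python" is my own addition, not something the production script is known to do.

```shell
#!/bin/bash
# Stand-in for the include file; the virtualenv path is the one the
# trace in this bug reports for PYTHON.
rc_file=$(mktemp)
echo 'PYTHON=/data/socorro/socorro-virtualenv/bin/python' > "$rc_file"

# Source the include file so PYTHON is defined for the rest of the script.
. "$rc_file"

# Use "$PYTHON" rather than whatever "python" happens to be first on PATH.
# The ":-python" fallback is an assumption added for illustration.
interpreter="${PYTHON:-python}"
cmd="$interpreter /data/socorro/application/scripts/startDailyUrl.py --day=2013-09-03"

# Only print the command; this sketch does not run anything.
echo "$cmd"
rm -f "$rc_file"
```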
Sorry mid-air ate my needinfo? request, solarce please see below:

(In reply to Robert Helmer [:rhelmer] from comment #2)
> Ah ok so we took care of this in the socorro repo, but this job is actually
> running out of this location (not from the Socorro install):
> 
> /data/bin/cron_daily_reports.sh
> 
> Solarce can you help us change this please? All you should need to do is
> change "python" to "$PYTHON" inside that script, the include file points to
> the right place (which is now a virtualenv)
Status: NEW → ASSIGNED
Flags: needinfo?(bburton)
(In reply to Robert Helmer [:rhelmer] from comment #3)
> Sorry mid-air ate my needinfo? request, solarce please see below:

Do you think we should switch the cron job to run the script from the open source repo? https://github.com/mozilla/socorro/blob/master/scripts/crons/cron_daily_reports.sh
Flags: needinfo?(bburton)
(In reply to Brandon Burton [:solarce] from comment #4)
> Do you think we should switch the cron job to run the script from the open
> source repo?
> https://github.com/mozilla/socorro/blob/master/scripts/crons/cron_daily_reports.sh

So this has actually been replaced *twice* and we apparently still have not switched over ;) I am not sure that the one above has been tested, not anytime lately at least. We'd need to configure all the locations to push to etc. too which I am sure is not done.

The latest is a crontabber job, I'd rather switch to that one really, and not waste the config/testing effort on the above one. I'll file a bug for this.

For now though I think we should fix the old one in-place (if you don't mind and agree), and I can handle backfilling the missing days.
(In reply to Robert Helmer [:rhelmer] from comment #5)
> So this has actually been replaced *twice* and we apparently still have not
> switched over ;) I am not sure that the one above has been tested, not
> anytime lately at least. We'd need to configure all the locations to push to
> etc. too which I am sure is not done.
> 
> The latest is a crontabber job, I'd rather switch to that one really, and
> not waste the config/testing effort on the above one. I'll file a bug for
> this.
> 
> For now though I think we should fix the old one in-place (if you don't mind
> and agree), and I can handle backfilling the missing days.

Sounds good, I'll fix it in Puppet's svn tree and have it pushed live shortly.
Change committed and pushed

-> % svn diff
Index: modules/socorro/files/prod/data-bin/cron_daily_reports.sh
===================================================================
--- modules/socorro/files/prod/data-bin/cron_daily_reports.sh	(revision 74408)
+++ modules/socorro/files/prod/data-bin/cron_daily_reports.sh	(working copy)
@@ -14,7 +14,7 @@
 fi

 SCRIPT_RUN_DATE=`date -d "$REPORT_DATE" '+%Y-%m-%d'`
-python /data/socorro/application/scripts/startDailyUrl.py --day=$SCRIPT_RUN_DATE
+$PYTHON /data/socorro/application/scripts/startDailyUrl.py --day=$SCRIPT_RUN_DATE

 DATA_FILE=`date -d "$REPORT_DATE" '+%Y%m%d-crashdata.csv.gz'`
 scp ${HOME}/${DATA_FILE} bacula@10.22.72.131:/data/security_group/crash_urls/
bburton@althalus [01:57:45] [~/code/mozilla/sysadmins/puppet/trunk]
-> % svn ci -m "updating python binary path to use virtualenv, bug 911585"
Sending        trunk/modules/socorro/files/prod/data-bin/cron_daily_reports.sh
Transmitting file data .
Committed revision 74409.

Info: Applying configuration version '74410'
Info: FileBucket adding {md5}97c878b7df5c3e29beab0a82ddfd9ab7
Info: /File[/data/bin/cron_daily_reports.sh]: Filebucketed /data/bin/cron_daily_reports.sh to main with sum 97c878b7df5c3e29beab0a82ddfd9ab7
Notice: /File[/data/bin/cron_daily_reports.sh]/content: content changed '{md5}97c878b7df5c3e29beab0a82ddfd9ab7' to '{md5}66c2fe8f6889ba13d9414cc83c6ab4b1'
Notice: Finished catalog run in 198.27 seconds

Script ran successfully, sensitive info REDACTED

[socorro@sp-admin01.phx1 ~]$ bash -x /data/bin/cron_daily_reports.sh
+ [[ -f /tmp/daily_urls.lock ]]
+ touch /tmp/daily_urls.lock
+ . /etc/socorro/socorrorc
++ SOCORRO_DIR=/data/socorro
++ APPDIR=/data/socorro/application
++ PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/socorro/bin:/home/socorro/bin:/home/socorro/python_extras/bin
++ PYTHONPATH=/home/socorro/python_extras/lib:/data/socorro/application
++ PYTHON=/data/socorro/socorro-virtualenv/bin/python
++ SOCORRO_CONFIG=/etc/socorro/common.conf
++ export APPDIR PATH PYTHONPATH SOCORRO_CONFIG
++ '[' -f /etc/socorro/common.conf ']'
++ . /etc/socorro/common.conf
+++ export databaseHost=REDACTED
+++ databaseHost=REDACTED
+++ export databaseName=REDACTED
+++ databaseName=REDACTED
+++ export databaseUserName=REDACTED
+++ databaseUserName=REDACTED
+++ export databasePassword=REDACTED
+++ databasePassword=REDACTED
+++ export hbaseHost=REDACTED
+++ hbaseHost=REDACTED
+++ export hbaseTimeout=5000
+++ hbaseTimeout=5000
+++ export stderrErrorLoggingLevel=10
+++ stderrErrorLoggingLevel=10
+++ export syslogErrorLoggingLevel=10
+++ syslogErrorLoggingLevel=10
+++ export persistentDataPathname=/home/socorro/persistent/bugzilla.pickle
+++ persistentDataPathname=/home/socorro/persistent/bugzilla.pickle
+++ export localFS=/home/socorro/primaryCrashStore
+++ localFS=/home/socorro/primaryCrashStore
+++ export fallbackFS=/home/socorro/primaryCrashStore
+++ fallbackFS=/home/socorro/primaryCrashStore
+++ export hbaseFallbackFS=/home/socorro/primaryCrashStore
+++ hbaseFallbackFS=/home/socorro/primaryCrashStore
+++ export temporaryFileSystemStoragePath=/home/socorro/temp
+++ temporaryFileSystemStoragePath=/home/socorro/temp
+++ export minidump_stackwalkPathname=/data/socorro/stackwalk/bin/stackwalk.sh
+++ minidump_stackwalkPathname=/data/socorro/stackwalk/bin/stackwalk.sh
+++ export syslogFacilityString=local2
+++ syslogFacilityString=local2
+++ export processorSymbolsPathnameList=/mnt/socorro/symbols/symbols_ffx,/mnt/socorro/symbols/symbols_sea,/mnt/socorro/symbols/symbols_tbrd,/mnt/socorro/symbols/symbols_mob,/mnt/socorro/symbols/symbols_penelope,/mnt/socorro/symbols/symbols_sbrd,/mnt/socorro/symbols/symbols_camino,/mnt/socorro/symbols/symbols_os,/mnt/socorro/symbols/symbols_solaris,/mnt/socorro/symbols/symbols_opensuse,/mnt/socorro/symbols/symbols_ubuntu,/mnt/socorro/symbols/symbols_fedora,/mnt/socorro/symbols/symbols_adobe,/mnt/socorro/symbols/symbols_b2g
+++ processorSymbolsPathnameList=/mnt/socorro/symbols/symbols_ffx,/mnt/socorro/symbols/symbols_sea,/mnt/socorro/symbols/symbols_tbrd,/mnt/socorro/symbols/symbols_mob,/mnt/socorro/symbols/symbols_penelope,/mnt/socorro/symbols/symbols_sbrd,/mnt/socorro/symbols/symbols_camino,/mnt/socorro/symbols/symbols_os,/mnt/socorro/symbols/symbols_solaris,/mnt/socorro/symbols/symbols_opensuse,/mnt/socorro/symbols/symbols_ubuntu,/mnt/socorro/symbols/symbols_fedora,/mnt/socorro/symbols/symbols_adobe,/mnt/socorro/symbols/symbols_b2g
+++ export primaryStorageClass=socorro.storage.crashstorage.CrashStorageSystemForLocalFS
+++ primaryStorageClass=socorro.storage.crashstorage.CrashStorageSystemForLocalFS
+++ export wsgiInstallation=True
+++ wsgiInstallation=True
+++ export modwsgiInstallation=True
+++ modwsgiInstallation=True
+++ export WSGIPythonPath=/data/socorro/application:/data/socorro/thirdparty:/data/socorro/application/scripts:/usr/lib64/python2.6/site-packages
+++ WSGIPythonPath=/data/socorro/application:/data/socorro/thirdparty:/data/socorro/application/scripts:/usr/lib64/python2.6/site-packages
+++ export smtpHostname=smtp.socketlabs.com
+++ smtpHostname=smtp.socketlabs.com
+++ export smtpPort=25
+++ smtpPort=25
+++ export smtpUsername=REDACTED
+++ smtpUsername=REDACTED
+++ export smtpPassword=REDACTED
+++ smtpPassword=REDACTED
+++ export fromEmailAddress=no-reply@crash-stats.mozilla.com
+++ fromEmailAddress=no-reply@crash-stats.mozilla.com
+++ export unsubscribeBaseUrl=http://crash-stats.mozilla.com/email/subscription/%s
+++ unsubscribeBaseUrl=http://crash-stats.mozilla.com/email/subscription/%s
+++ export product_uris=firefox/nightly/latest-mozilla-1.9.1/,firefox/nightly/latest-mozilla-1.9.2/,firefox/nightly/latest-mozilla-central/,seamonkey/nightly/latest-comm-1.9.1/,seamonkey/nightly/latest-comm-central-trunk/,thunderbird/nightly/latest-comm-1.9.2/,thunderbird/nightly/latest-comm-central/,mobile/nightly/latest-mobile-1.9.2/,mobile/nightly/latest-mobile-trunk/,camino/nightly/latest-2.1-M1.9.2/
+++ product_uris=firefox/nightly/latest-mozilla-1.9.1/,firefox/nightly/latest-mozilla-1.9.2/,firefox/nightly/latest-mozilla-central/,seamonkey/nightly/latest-comm-1.9.1/,seamonkey/nightly/latest-comm-central-trunk/,thunderbird/nightly/latest-comm-1.9.2/,thunderbird/nightly/latest-comm-central/,mobile/nightly/latest-mobile-1.9.2/,mobile/nightly/latest-mobile-trunk/,camino/nightly/latest-2.1-M1.9.2/
+++ export persistentBrokenDumpPathname=/home/socorro/persistent/fixbrokendumps.pickle
+++ persistentBrokenDumpPathname=/home/socorro/persistent/fixbrokendumps.pickle
+++ export brokenFennecFixer=/data/bin/minidump_hack-fennec
+++ brokenFennecFixer=/data/bin/minidump_hack-fennec
+++ export brokenFirefoxLinuxFixer=/data/bin/minidump_hack-firefox_linux
+++ brokenFirefoxLinuxFixer=/data/bin/minidump_hack-firefox_linux
+++ export elasticSearchHostname=
+++ elasticSearchHostname=
+++ export searchImplClass=socorro.search.postgresql.PostgresAPI
+++ searchImplClass=socorro.search.postgresql.PostgresAPI
+++ export product=Firefox,Fennec,FennecAndroid
+++ product=Firefox,Fennec,FennecAndroid
+++ export statsdHost=graphite1.dmz.phx1.mozilla.com
+++ statsdHost=graphite1.dmz.phx1.mozilla.com
+++ export statsdPrefix=socorro-prod
+++ statsdPrefix=socorro-prod
+++ export brokenBoot2GeckoFixer=/data/bin/minidump_hack-b2g
+++ brokenBoot2GeckoFixer=/data/bin/minidump_hack-b2g
+ REPORT_DATE='1 days ago'
+ '[' -n '' ']'
++ date -d '1 days ago' +%Y-%m-%d
+ SCRIPT_RUN_DATE=2013-09-03
+ /data/socorro/socorro-virtualenv/bin/python /data/socorro/application/scripts/startDailyUrl.py --day=2013-09-03
2013-09-04 14:23:16,186 INFO - current configuration:
2013-09-04 14:23:16,187 INFO - 	databaseHost=REDACTED
2013-09-04 14:23:16,187 INFO - 	databaseName=REDACTED
2013-09-04 14:23:16,187 INFO - 	databasePassword=REDACTED
2013-09-04 14:23:16,188 INFO - 	databaseUserName=REDACTED
2013-09-04 14:23:16,188 INFO - 	day=2013-09-03 00:00:00+00:00
2013-09-04 14:23:16,188 INFO - 	outputPath=.
2013-09-04 14:23:16,188 INFO - 	product=Firefox,Fennec,FennecAndroid
2013-09-04 14:23:16,189 INFO - 	publicOutputPath=.
2013-09-04 14:23:16,189 INFO - 	stderrErrorLoggingLevel=10
2013-09-04 14:23:16,189 INFO - 	stderrLineFormatString=%(asctime)s %(levelname)s - %(message)s
2013-09-04 14:23:16,189 INFO - 	syslogErrorLoggingLevel=10
2013-09-04 14:23:16,190 INFO - 	syslogFacilityString=local2
2013-09-04 14:23:16,190 INFO - 	syslogHost=localhost
2013-09-04 14:23:16,190 INFO - 	syslogLineFormatString=Socorro Daily URL (pid %(process)d): %(asctime)s %(levelname)s - %(threadName)s - %(message)s
2013-09-04 14:23:16,190 INFO - 	syslogPort=514
2013-09-04 14:23:16,191 INFO - 	utc_now=<function utc_now at 0x7f952461c398>
2013-09-04 14:23:16,191 INFO - 	version=
2013-09-04 14:23:16,191 INFO -
2013-09-04 14:23:16,198 DEBUG - config.day = 2013-09-03 00:00:00+00:00; now = 2013-09-04 00:00:00+00:00; yesterday = 2013-09-03 00:00:00+00:00
2013-09-04 14:23:16,198 DEBUG - config.day = 2013-09-03 00:00:00+00:00; now = 2013-09-04; yesterday = 2013-09-03
2013-09-04 14:23:16,198 DEBUG - SQL is:
      select
        r.signature,  -- 0
        r.url,        -- 1
        'http://crash-stats.mozilla.com/report/index/' || r.uuid as uuid_url, -- 2
        to_char(r.client_crash_date,'YYYYMMDDHH24MI') as client_crash_date,   -- 3
        to_char(r.date_processed,'YYYYMMDDHH24MI') as date_processed,         -- 4
        r.last_crash, -- 5
        r.product,    -- 6
        r.version,    -- 7
        r.build,      -- 8
        '' as branch, -- 9
        r.os_name,    --10
        r.os_version, --11
        r.cpu_name || ' | ' || r.cpu_info as cpu_info,   --12
        r.address,    --13
        array(select ba.bug_id from bug_associations ba where ba.signature = r.signature) as bug_list, --14
        r.user_comments, --15
        r.uptime as uptime_seconds, --16
        case when (r.email is NULL OR r.email='') then '' else r.email end as email, --17
        (select sum(adu_count) from raw_adu adu
           where adu.date = '2013-09-04'
             and r.product = adu.product_name and r.version = adu.product_version
             and substring(r.os_name from 1 for 3) = substring(adu.product_os_platform from 1 for 3)
             and r.os_version LIKE '%'||adu.product_os_version||'%') as adu_count, --18
        r.topmost_filenames, --19
        case when (r.addons_checked is NULL) then '[unknown]'when (r.addons_checked) then 'checked' else 'not' end as addons_checked, --20
        r.flash_version, --21
        r.hangid, --22
        r.reason, --23
        r.process_type, --24
        r.app_notes, --25
        r.install_age, --26
        rd.duplicate_of, --27
        r.release_channel, --28
        r.productid --29
      from
        reports r left join reports_duplicates rd on r.uuid = rd.uuid
      where
        '2013-09-03' <= r.date_processed and r.date_processed < '2013-09-04'
        and r.product in ('Firefox','Fennec','FennecAndroid')
      order by 5 -- r.date_processed, munged

2013-09-04 14:24:58,301 DEBUG - MainThread - killing database connections
2013-09-04 14:24:58,301 DEBUG - MainThread - connection MainThread closed
2013-09-04 14:24:58,352 INFO - done.
++ date -d '1 days ago' +%Y%m%d-crashdata.csv.gz
+ DATA_FILE=20130903-crashdata.csv.gz
+ scp /home/socorro/20130903-crashdata.csv.gz bacula@REDACTED:/data/security_group/crash_urls/
20130903-crashdata.csv.gz    100%   51MB  25.3MB/s   00:02
+ scp /home/socorro/20130903-crashdata.csv.gz mozauto@sisyphus.bughunter.ateam.phx1.mozilla.com:/work/mozilla/crash-reports/
20130903-crashdata.csv.gz    100%   51MB  50.6MB/s   00:01
+ ssh bacula@10.22.72.131 'chmod 640 /data/security_group/crash_urls/*'
+ ssh mozauto@sisyphus.bughunter.ateam.phx1.mozilla.com 'chmod 640 /work/mozilla/crash-reports/*'
+ mv /home/socorro/20130903-crashdata.csv.gz /tmp
++ date -d '1 days ago' +%Y%m%d-pub-crashdata.csv.gz
+ DATA_FILE=20130903-pub-crashdata.csv.gz
++ date -d '1 days ago' +%Y%m%d
+ SCRIPT_RUN_DATE=20130903
+ mkdir /mnt/crashanalysis/crash_analysis/20130903/
mkdir: cannot create directory `/mnt/crashanalysis/crash_analysis/20130903/': File exists
+ cp 20130903-pub-crashdata.csv.gz /mnt/crashanalysis/crash_analysis/20130903/
+ mv /home/socorro/20130903-pub-crashdata.csv.gz /tmp
+ rm -f /tmp/daily_urls.lock
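
Side note on the lock handling visible at the start and end of the trace (check for /tmp/daily_urls.lock, touch it, rm -f at the end): if a step in between dies, the lock could be left behind and block later runs. Below is a rough sketch of one way to guard against that with a trap. The function name run_locked is hypothetical and this is not what the production script does; the check-then-touch step is also still not atomic.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command under a lock file and remove the
# lock when the command finishes, whether it succeeds or fails.
run_locked() {
    local lock=$1
    shift
    if [[ -f "$lock" ]]; then
        echo "lock $lock exists, skipping" >&2
        return 1
    fi
    touch "$lock"
    # Run in a subshell so the EXIT trap fires as soon as the command ends.
    (
        trap 'rm -f "$lock"' EXIT
        "$@"
    )
}

# Example: the lock is removed even though the command fails.
demo_lock=$(mktemp -u)
run_locked "$demo_lock" false || echo "command failed, lock cleaned up"
```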
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
I am going to kick off some backfill for this
All the backfill jobs have finished, I did 2013-08-24 - 2013-09-03, as I did 2013-09-04 earlier when testing

Please let us know if anything is missing
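
For reference, a dry-run sketch of what a per-day backfill over that range could look like. How the backfill was actually run is not shown in this bug, so this is an assumption: it only prints one startDailyUrl.py invocation per day, using the --day flag and the virtualenv path seen in the trace above. The function name backfill_commands is made up, and GNU date is required.

```shell
#!/bin/bash
# Dry run: print one startDailyUrl.py invocation per day in an inclusive
# date range. Nothing is executed.
PYTHON=${PYTHON:-/data/socorro/socorro-virtualenv/bin/python}

backfill_commands() {
    local day=$1 last=$2
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    while [[ "$day" < "$last" || "$day" == "$last" ]]; do
        echo "$PYTHON /data/socorro/application/scripts/startDailyUrl.py --day=$day"
        day=$(date -u -d "$day +1 day" +%Y-%m-%d)
    done
}

# The range mentioned in the comment above.
backfill_commands 2013-08-24 2013-09-03
```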
looks good. I see the files on fs1 and sisyphus. thanks!
Status: RESOLVED → VERIFIED
Blocks: 923595
No longer blocks: 923595
See Also: → 923595