Closed Bug 715343 Opened 13 years ago Closed 13 years ago

Audit other cronjobs to use correct timestamps

Categories

(Socorro :: Database, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jberkus, Assigned: rhelmer)

References

Details

(Whiteboard: [qa-])

We need to check cronjobs in addition to matview generators to see if they are using correct timestamps (timestamptz or UTC times).

This needs to start with a list of cronjobs which touch Postgres, so handing over to Rob to provide that list.
Blocks: 714938
Component: Socorro → Database
Product: Webtools → Socorro
QA Contact: socorro → database
Here is a first-pass at auditing the cronjob code in socorro/cron:

* bugzilla.py
** uses datetime.now for it's own internal tracking (pickled last time it hit bugzilla)
** probably does not need modification
*** would not hurt to make this tz-aware

* builds.py
** replaced by ftpscraper.py
** deprecated, should be removed

* dailyMatviews.py
** only calls SPs and already uses datetime.utcnow()
** probably does not need modification

* dailyUrl.py
** generates nightly CSV for upload to crash-analysis
** does some date math (using tz-naive datetime.now())
** probably needs modification

* duplicates.py
** calls SPs to do duplicate detection
** does some date math (using tz-naive datetime.now())
** probably needs modification

* fixBrokenDumps.py
** post-process broken fennec crashes from bug 643201
** pickles last_run_time as tz-naive datetime.now() and uses it for reports query
** probably needs modification

* ftpscraper.py
** picks up info about releases from ftp.m.o
** uses tz-naive datetime.now() to pick correct dated dirs on FTP
** probably does not need modification
*** but, assumes that server is in PST (as FTP is), so making it tz-aware would be good

* hangreport.py
** calls SP to populate hangreport matview
** does some date math with tz-naive datetime.now to determine yesterday's date
** probably needs modification

* hbaseResbumit.py
** I think this is replaced by newCrashMover
** probably deprecated and should be removed

* mtbf.py
** looks like an obsolete report? not sure what this is
** probably deprecated and should be removed

* reportsClean.py
** runs hourly SP to populate reports_clean
** does some date math with tz-naive datetime.now()
** probably needs modification

* serverstatus.py
** populates server_status table used by /status page
** uses tz-naive datetime.now() to track when it last ran
** probably does not need modification
*** would not hurt to make this tz-aware

* signatures.py
** runs SP to populate signature_build matview
** uses tz-naive datetime.now()
** probably needs modification

* topCrashesBy{Signature,Url}
** deprecated and ignored

* util.py
** looks problematic, but date-related functions only used by TCBS AFAICT
** ignored
Some comments:

> * bugzilla.py
> ** uses datetime.now for it's own internal tracking (pickled last time it
> hit bugzilla)
> ** probably does not need modification
> *** would not hurt to make this tz-aware

The current bugs table in Postgres does not have a timestamp column.  Possibly it should, but currently it doesn't.

> * duplicates.py
> ** calls SPs to do duplicate detection
> ** does some date math (using tz-naive datetime.now())
> ** probably needs modification

Date math here is just for what interval to call it over, FYI.  May still need fixing; this will be consistent for several hourly matview generators.

> * hangreport.py
> ** calls SP to populate hangreport matview
> ** does some date math with tz-naive datetime.now to determine yesterday's
> date
> ** probably needs modification

Why is this separate from daily_matviews?  It should be part of that.

> * mtbf.py
> ** looks like an obsolete report? not sure what this is
> ** probably deprecated and should be removed

Good lord, this was deprecated over a year ago.  Please disable and delete the code.

> * reportsClean.py
> ** runs hourly SP to populate reports_clean
> ** does some date math with tz-naive datetime.now()
> ** probably needs modification
> 
> * serverstatus.py
> ** populates server_status table used by /status page
> ** uses tz-naive datetime.now() to track when it last ran
> ** probably does not need modification
> *** would not hurt to make this tz-aware

Current plan is to give server_status timestamptz columns.

> * signatures.py
> ** runs SP to populate signature_build matview
> ** uses tz-naive datetime.now()
> ** probably needs modification

Which matview function does this call?
(In reply to [:jberkus] Josh Berkus from comment #2)
> Some comments:
> 
> > * bugzilla.py
> > ** uses datetime.now for it's own internal tracking (pickled last time it
> > hit bugzilla)
> > ** probably does not need modification
> > *** would not hurt to make this tz-aware
> 
> The current bugs table in Postgres does not have a timestamp column. 
> Possibly it should, but currently it doesn't.

I would be surprised if the date column is actually used, since we care about bug<->ooid mappings and not really when the bug was added or found.

The only query I see that uses this (in the PHP code) is:
SELECT ba.signature, bugs.id FROM bugs

So this is probably safe to ignore for now.

 
> > * duplicates.py
> > ** calls SPs to do duplicate detection
> > ** does some date math (using tz-naive datetime.now())
> > ** probably needs modification
> 
> Date math here is just for what interval to call it over, FYI.  May still
> need fixing; this will be consistent for several hourly matview generators.
> 
> > * hangreport.py
> > ** calls SP to populate hangreport matview
> > ** does some date math with tz-naive datetime.now to determine yesterday's
> > date
> > ** probably needs modification
> 
> Why is this separate from daily_matviews?  It should be part of that.


Good point, I think this is from when we called daily_matviews "newtcbs", maybe before we decided to just make it the omnibus SP caller.

It doesn't do anything fancy so we could just move it in there as part of this bug.
 
> > * mtbf.py
> > ** looks like an obsolete report? not sure what this is
> > ** probably deprecated and should be removed
> 
> Good lord, this was deprecated over a year ago.  Please disable and delete
> the code.
> 
> > * reportsClean.py
> > ** runs hourly SP to populate reports_clean
> > ** does some date math with tz-naive datetime.now()
> > ** probably needs modification
> > 
> > * serverstatus.py
> > ** populates server_status table used by /status page
> > ** uses tz-naive datetime.now() to track when it last ran
> > ** probably does not need modification
> > *** would not hurt to make this tz-aware
> 
> Current plan is to give server_status timestamptz columns.
> 
> > * signatures.py
> > ** runs SP to populate signature_build matview
> > ** uses tz-naive datetime.now()
> > ** probably needs modification
> 
> Which matview function does this call?

update_signature_matviews
Status: NEW → ASSIGNED
> > > * signatures.py
> > > ** runs SP to populate signature_build matview
> > > ** uses tz-naive datetime.now()
> > > ** probably needs modification
> > 
> > Which matview function does this call?
> 
> update_signature_matviews

That SP is obsolete and should be dropped.  Is there a call to update_signatures() as part of daily_matviews?
Rob,

SPs which take a DATE and not a TIMESTAMP should require no modification from your end to get the right periods.
(In reply to [:jberkus] Josh Berkus from comment #4)
> > > > * signatures.py
> > > > ** runs SP to populate signature_build matview
> > > > ** uses tz-naive datetime.now()
> > > > ** probably needs modification
> > > 
> > > Which matview function does this call?
> > 
> > update_signature_matviews
> 
> That SP is obsolete and should be dropped.  Is there a call to
> update_signatures() as part of daily_matviews?

Yes there is.
(In reply to [:jberkus] Josh Berkus from comment #5)
> Rob,
> 
> SPs which take a DATE and not a TIMESTAMP should require no modification
> from your end to get the right periods.

Good to know (and makes sense), thanks.
Landed on integration branch https://github.com/mozilla/socorro/commit/940d8cae30364588427114813517533f5412c380

Made notes of which cron jobs to disable in crontab:
https://wiki.mozilla.org/Socorro:Releases/2.4
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Looks like we missed some, because the following cron jobs are now failing with "socorro.lib.ConfigurationManager.CannotConvert: date not a parsable string":

aggregates (aka old tcbs)
bugzilla
duplicates
reportsclean
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Hmm ok looking a little closer now, I see why this wasn't caught by any unit tests - it's not in the library code, it's the startup scripts. Here's the stack:

"""
Traceback (most recent call last):
  File "/data/socorro/application/scripts/startBugzilla.py", line 17, in <module>
    config = configurationManager.newConfiguration(configurationModule=configModule, applicationName="Bugzilla Associations 0.1")
  File "/data/socorro/application/socorro/lib/ConfigurationManager.py", line 95, in newConfiguration
    return Config(**kw)
  File "/data/socorro/application/socorro/lib/ConfigurationManager.py", line 278, in __init__
    raise CannotConvert(str(x))
socorro.lib.ConfigurationManager.CannotConvert: date not a parsable string
"""

It's really the call to "configurationManager.newConfiguration(...)" that is failing here, so this probably isn't the right bug to reopen.

I'll leave it reopened until we figure out which one is.
OK the bug here is from https://github.com/mozilla/socorro/commit/b725d79be1aee1bd90c6846f5ccd939185a0b711#diff-4

I traced the behavior here and what's happening is that dateTimeConverter() used to return inputString if it didn't match anything, now it throws an exception (for some reason startDate is being set as None in the default commonconfig.py.dist ?)

Anyway a quick fix would be:

"""
diff --git a/socorro/lib/ConfigurationManager.py b/socorro/lib/ConfigurationManager.py
index b9af44e..ead3690 100644
--- a/socorro/lib/ConfigurationManager.py
+++ b/socorro/lib/ConfigurationManager.py
@@ -427,6 +427,8 @@ def ioConverter(inputString):
 def dateTimeConverter(inputString):
   """ a conversion function for datetimes
   """
+  if inputString is None:
+    return inputString
   return string_to_datetime(inputString)
 
 #------------------------------------------------------------------------------------------
"""

But someone who knows this code better might know of a more correct way to fix this.

For reference, it looks like startDate in commonconfig.py.dist is simply:

startDate.fromStringConverter = cm.dateTimeConverter
(In reply to Robert Helmer [:rhelmer] from comment #12)
> startDate.fromStringConverter = cm.dateTimeConverter

Oops I mean:

startDate = cm.Option()
startDate.doc = 'The start of the overall/outer aggregation window (YYYY-MM-DD [hh:mm])'
startDate.fromStringConverter = cm.dateTimeConverter

The only place .default is ever set is in createpartitionsconfig.py.dist, so I guess it'll be "None" for anything else that tries to start configuration this way (I wonder why mware, collector, processor et. al. do not hit this?)
here's the problem and the solution:

you're right about the defaults not being set and therefore having a None value.  These crons jobs do a lazy thing in their config.dist files: they use "from commonconfig import *" rather than cherry picking just the values they need.  So the None parameter values are brought into the config namespace of each failing cron.  The config manager tries to apply the conversion routine, which is now raising an exception.  More about that in a moment.

The other crons don't experience this problem because they don't import the offending None config parameters.

With release 2.4, we've replaced the default string-to-datetime function with one that understands offsets.  What it doesn't understand, which the old one did, is None as an input value.  The new one raises a ValueError when it gets the unanticipated None.

The quick solution is to add a None case to string_to_datetime in the .../socorro/lib/datetimeutil.py modules.  It should just echo the None back. 

The "correct" solution is to replace the config system with something not vulnerable to this silly problem.
(In reply to K Lars Lohn [:lars] [:klohn] from comment #14)
> The quick solution is to add a None case to string_to_datetime in the
> .../socorro/lib/datetimeutil.py modules.  It should just echo the None back. 

r? https://github.com/mozilla/socorro/pull/269
 
> The "correct" solution is to replace the config system with something not
> vulnerable to this silly problem.

Assuming we're covered here with configman :)
Commits pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/81d85866d117f86b93d0b770082c67e0bd722ef7
bug 715343 - most of the existing cron jobs start scripts do not override the default dates from commonconfig, so return None in this case to match the old behavior

https://github.com/mozilla/socorro/commit/43fdc8e6ad8b78a16dcc32ad67d157bc7909ef9f
Merge pull request #269 from rhelmer/bug715343-cron-broken-date

fixes bug 715343 - most of the existing cron jobs start scripts do not overrid...
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/83243d443b96e915d00d8b90b27354aa1d6abd44
bug 715343 - most of the existing cron jobs start scripts do not override the default dates from commonconfig, so return None in this case to match the old behavior
Since the cron jobs already missed the matviews from yesterday, I'm going to run through all the matview generators manually so that I can make sure they don't fail or produce wrong datetimes.
(In reply to [:jberkus] Josh Berkus from comment #18)
> Since the cron jobs already missed the matviews from yesterday, I'm going to
> run through all the matview generators manually so that I can make sure they
> don't fail or produce wrong datetimes.

I just sent you an email, but just in case you see it here first - the hourly jobs seem ok, I tried kicking off daily_matviews but get:

psycopg2.ProgrammingError: function update_tcbs(unknown) is not unique
LINE 1: SELECT * FROM update_tcbs('2012-01-07')
                      ^
HINT:  Could not choose a best candidate function. You might need to add explicit type casts.
See update to bug: 715335.  The above issue has been fixed.
Depends on: 715335
Marking [qa-] however automation is passing and manual bfts pass.
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.