Closed Bug 749462 Opened 12 years ago Closed 12 years ago

add new crontabber config to puppet

Categories

(Socorro :: Infra, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Assigned: rhelmer)

References

Details

(Whiteboard: [qa-])

For an upcoming Socorro release, we're starting to switch to a new configuration system (configman). The first configman app will be a crontab manager called crontabber, and we'll need (at least) the following changes on stage/prod puppet manifests:

---
1) /etc/socorro/crontabber.ini needs to be managed like the existing common.conf
2) new entry in /etc/cron.d/socorro, something like:

*/5 * * * * socorro PYTHONPATH=".:thirdparty" python socorro/cron/crontabber.py --admin.conf=/etc/socorro/crontabber.ini > /var/log/socorro/crontabber.log 2>&1

---

We're going to be migrating over time away from the current /etc/socorro/{common,processor,monitor}.conf files (which are shell include files) and to something more structured like INI (configman supports several formats).

It might be simplest to duplicate the info from the existing config system into crontabber.ini, and worry about refactoring it into common/app-specific configs later.

We could try to make configman read the old configs (or perhaps have configman be the authoritative config source and generate the old-style configs?) but this might be more complex than it's worth. The downside to not doing something like this is that we will have duplicate information in configman/non-configman apps until the transition period is over.
BTW I have access to the puppet SVN repo for socorro, so I can make a patch for this.
Actually let's move this to the Socorro product so we can track it better.
Assignee: server-ops → nobody
Component: Server Operations → Infra
Product: mozilla.org → Socorro
QA Contact: phong → infra
Target Milestone: --- → 9
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
Summary: [socorro] add new crontabber config to puppet → add new crontabber config to puppet
r? https://github.com/mozilla/socorro/pull/540

peterbe, can you try out the above? It seems to work but it's logging this, what am I missing?:

Traceback (most recent call last):
  File "/data/socorro/application/socorro/cron/crontabber.py", line 614, in <module>
    sys.exit(main(CronTabber))
  File "/data/socorro/application/socorro/app/generic_app.py", line 132, in main
    values_source_list=values_source_list,
  File "/data/socorro/application/configman/config_manager.py", line 193, in __init__
    self._walk_expanding_class_options()
  File "/data/socorro/application/configman/config_manager.py", line 469, in _walk_expanding_class_options
    parent_namespace=source_namespace)
  File "/data/socorro/application/configman/config_manager.py", line 488, in _walk_expanding_class_options
    self._overlay_value_sources(ignore_mismatches=True)
  File "/data/socorro/application/configman/config_manager.py", line 549, in _overlay_value_sources
    ignore_mismatches=this_source_ignore_mismatches)
  File "/data/socorro/application/configman/config_manager.py", line 574, in _overlay_value_sources_recurse
    sub_destination.set_value(val)
  File "/data/socorro/application/configman/option.py", line 107, in set_value
    self.value = self.from_string_converter(val)
  File "/data/socorro/application/socorro/cron/crontabber.py", line 218, in class_list_converter
    class InnerClassList(RequiredConfig):
  File "/data/socorro/application/socorro/cron/crontabber.py", line 246, in InnerClassList
    raise JobNotFoundError(class_list_element)
__main__.JobNotFoundError: InnerClassList
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/a30dab9496a4ecd05e3df30d9da694aabcd2eba2
bug 749462 - crontabber support for puppet

https://github.com/mozilla/socorro/commit/9f3d35b4d596a7e0d7ae9664e4868b4cfbc5c9df
Merge pull request #540 from rhelmer/bug749462-add-crontabber-config-puppet

Bug749462 add crontabber config puppet
Target Milestone: 9 → 10
(In reply to Robert Helmer [:rhelmer] from comment #3)
> r? https://github.com/mozilla/socorro/pull/540
> 
> peterbe, can you try out the above? It seems to work but it's logging this,
> what am I missing?:
> 
> Traceback (most recent call last):
>   File "/data/socorro/application/socorro/cron/crontabber.py", line 614, in
> <module>
>     sys.exit(main(CronTabber))
>   File "/data/socorro/application/socorro/app/generic_app.py", line 132, in
> main
>     values_source_list=values_source_list,
>   File "/data/socorro/application/configman/config_manager.py", line 193, in
> __init__
>     self._walk_expanding_class_options()
>   File "/data/socorro/application/configman/config_manager.py", line 469, in
> _walk_expanding_class_options
>     parent_namespace=source_namespace)
>   File "/data/socorro/application/configman/config_manager.py", line 488, in
> _walk_expanding_class_options
>     self._overlay_value_sources(ignore_mismatches=True)
>   File "/data/socorro/application/configman/config_manager.py", line 549, in
> _overlay_value_sources
>     ignore_mismatches=this_source_ignore_mismatches)
>   File "/data/socorro/application/configman/config_manager.py", line 574, in
> _overlay_value_sources_recurse
>     sub_destination.set_value(val)
>   File "/data/socorro/application/configman/option.py", line 107, in
> set_value
>     self.value = self.from_string_converter(val)
>   File "/data/socorro/application/socorro/cron/crontabber.py", line 218, in
> class_list_converter
>     class InnerClassList(RequiredConfig):
>   File "/data/socorro/application/socorro/cron/crontabber.py", line 246, in
> InnerClassList
>     raise JobNotFoundError(class_list_element)
> __main__.JobNotFoundError: InnerClassList

THis happens because the `jobs` option isn't configured in your crontabber.ini file.
The default is a weird thing called "InnerClassList"
(In reply to Peter Bengtsson [:peterbe] from comment #5)
> THis happens because the `jobs` option isn't configured in your
> crontabber.ini file.
> The default is a weird thing called "InnerClassList"

Ah ok got it.. so I now have a crontabber.ini that looks like this:

jobs = socorro.cron.jobs.weekly_reports_partitions.WeeklyReportsPartitionsCronApp|1d

What's this going to look like with multiple jobs?

Also, the more I think about it, the job definitions should probably be shipped in the tarball with Socorro releases, and the configs in /etc/socorro should just have site-specific info like DB host/user/password ... that way we can enable/disable jobs without having to ask IT to do it.

Is that possible to do with the current override system ^? Is there an assumption that the override file lives in the same dir as the regular file, and if so are absolute paths allowed?
(In reply to Robert Helmer [:rhelmer] from comment #6)
> (In reply to Peter Bengtsson [:peterbe] from comment #5)
> > THis happens because the `jobs` option isn't configured in your
> > crontabber.ini file.
> > The default is a weird thing called "InnerClassList"
> 
> Ah ok got it.. so I now have a crontabber.ini that looks like this:
> 
> jobs =
> socorro.cron.jobs.weekly_reports_partitions.WeeklyReportsPartitionsCronApp|1d
> 
> What's this going to look like with multiple jobs?
Like this:

jobs='''socorro.cron.jobs.weekly_reports_partitions.WeeklyReportsPartitionsCronApp|1d
        socorro.cron.jobs.other_job.OtherJobCronApp|2d'''

> 
> Also, the more I think about it, the job definitions should probably be
> shipped in the tarball with Socorro releases, and the configs in
> /etc/socorro should just have site-specific info like DB host/user/password
> ... that way we can enable/disable jobs without having to ask IT to do it.
> 

We have two options:

1. Instead of relying on it being in the /etc/socorro/crontabber.ini we can rely on the default value and maintain that as code. 

2. Make some improvements so you just have to specify the full class path, we can make it relevant plus it can perhaps auto-discover the class from the module name. 



> Is that possible to do with the current override system ^? Is there an
> assumption that the override file lives in the same dir as the regular file,
> and if so are absolute paths allowed?

No, it assumes sys.path relevant "paths".
BTW just to make sure we're all on the same page, the goals I am trying to get to are:

tl;dr - /etc/socorro is for IT's benefit (and non-mozilla sites too), don't put anything there they don't need/want - conversely, don't put anything there that we want to retain control over. We should consider non-mozilla site's override needs too though, they may not want to run the set of jobs we do (bugzilla, ftpscraper, etc)

Long form is:

1) we need something outside of git that holds things like db username and db password, and doesn't need to be edited after every release (this is how socorro used to be)

2) as much as possible socorro should set reasonable defaults, and manage things in git, but some things like db hostname and password is not appropriate for public git

3) for stuff that has to live outside socorro, it should be easily discoverable/use conventions, like /etc/socorro/, and be managed via puppet so we can spin up new servers easily

I think the cron jobs probably fall under #2 - we want to ship Socorro with the list of jobs, that shouldn't be something IT has to manage. It might be nice for non-mozilla users to be able to override it in /etc (our current system is completely generic - we can override anything via environment variables set in /etc/socorro/common.conf).. for example a non-mozilla user might like to exclude the bugzilla cronjob in a way that won't get blown away next time they upgrade :)

Right now, we have to ask IT to make changes to the system crontab when we add/remove jobs or change the time they run. I think that ideally the only thing the system socorro crontab will run in the near future is crontabber, and the list of jobs and timing will go out with Socorro releases. The ability to override these would be nice-to-have for reasons above but not something we absolutely need for crash-stats.m.o
(In reply to Peter Bengtsson [:peterbe] from comment #7)
> (In reply to Robert Helmer [:rhelmer] from comment #6)
> > (In reply to Peter Bengtsson [:peterbe] from comment #5)
> > What's this going to look like with multiple jobs?
> Like this:
> 
> jobs='''socorro.cron.jobs.weekly_reports_partitions.
> WeeklyReportsPartitionsCronApp|1d
>         socorro.cron.jobs.other_job.OtherJobCronApp|2d'''

OK, as long as we can have some kind of formatting that is fine w/ me :)

 
> > Also, the more I think about it, the job definitions should probably be
> > shipped in the tarball with Socorro releases, and the configs in
> > /etc/socorro should just have site-specific info like DB host/user/password
> > ... that way we can enable/disable jobs without having to ask IT to do it.
> > 
> 
> We have two options:
> 
> 1. Instead of relying on it being in the /etc/socorro/crontabber.ini we can
> rely on the default value and maintain that as code. 
> 
> 2. Make some improvements so you just have to specify the full class path,
> we can make it relevant plus it can perhaps auto-discover the class from the
> module name. 


Either of these is fine w/ me.. re: comment 8 I think for non-mozilla sites it'd be nice-to-have a way to override the job list, although this might get a bit complex (really I think they'd want to exclude stuff like bugzilla/ftpscraper but otherwise follow changes we made in the source).

If you care about the above, then I'd think it'd be nice to use the same config format both in-tree and also in /etc/socorro


> > Is that possible to do with the current override system ^? Is there an
> > assumption that the override file lives in the same dir as the regular file,
> > and if so are absolute paths allowed?
> 
> No, it assumes sys.path relevant "paths".


We could put /etc/socorro on the PYTHONPATH is that's helpful? That might be the clearest way to say "use /data/socorro/application and thirdparty, but override with /etc/socorro/" ?
I do not doubt that the list of jobs needs to be in a .ini file (or .conf or .json or whatever).
And I do not doubt that we need to manage this list with git as opposed to telling IT ("please add socorro.cron.job.some_long_name.SomeLongAppName|6h to the crontabber")

I'm still a bit confused about how puppet and /etc work together. 

If it helps (...you to help me), with configman it's possible to use includes in the .ini files. E.g. we could have::
`/data/application/socorro/socorro/cron/crontabber.ini` which could look like this::

 +include /etc/socorro/postgres.ini
 +include /etc/socorro/logging.ini

 jobs='''socorro.cron.jobs.FooJob|1d
         socorro.cron.jobs.HandJob|1d'''
           
As far as I can see, this would give us the best of both works. Devs happy and IT happy. (however, devs will have to put up with writing these loooong class paths for now)
I believe that the +include idea is the correct solution to this problem.  It requires no changes to any existing code.  It separates IT from Devs.  It also support a deeper fragmentation, which I hope is positive. Other apps, in their config files can also import peterbe's example postgres and logging ini files.  That centralizes common aspects of configuration to whatever degree the adopter of Socorro desires.

Please excuse my willful ignorance of the installation/puppet system, but the following seem logical to me.  As for the location of the ini files, why not split them into '/etc/socorro/changeable_config' and '/etc/socorro/static_config'.  we keep .dist versions in our git repo.  During installation, the .dists get copied to their proper locations as bare ini files.  Documentation sends IT to the 'changeable_config', while to adjust functionality (classes) for an installation, we devs can get IT to tweek the the static_config or we can change it between versions in the corresponding .dist.

Now this has opens the door to difficult-to-manage ini files.  If you change a cronjob class or add one, you may end up with a very different ini structure that's incompatible with existing ini files.  You'd have to be careful that you don't end up having to recreate your ini hierarchy for every class change.
Target Milestone: 10 → 11
Target Milestone: 11 → 10
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/669b0f8d0cea48337293609ca1e09677524ecb75
bug 749462 - add crontabber configs, in new top-level config dir

https://github.com/mozilla/socorro/commit/c16cebaff8e4818615dd55e83552a37628c2f970
Merge pull request #592 from rhelmer/bug749462-crontabber-puppet

bug 749462 - add crontabber configs, in new top-level config dir
Blocks: 751298
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/006f9c7b985dd9cfbe9e76fda1d7ac5b742bfeb3
bug 749462 - docs for crontabber on new installs

https://github.com/mozilla/socorro/commit/55fc93e59f59a8c688b9e8b41aae375d1c15efa1
Merge pull request #593 from rhelmer/bug749462-crontabber-puppet

bug 749462 - docs for crontabber on new installs
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [qa?]
Whiteboard: [qa?] → [qa-]
You need to log in before you can comment on or make changes to this bug.