Closed Bug 1393138 Opened 7 years ago Closed 5 years ago

repo mirror files are non-deterministic and change constantly between puppet runs

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: dividehex, Assigned: dhouse)

References

Details

Attachments

(1 file)

repo mirror files contain a list of yum repo urls which is generated by sort_servers_by_group function.  The list is suppose to start with the clients 'closest' puppet masters.  This is randomly shuffled based on the clients fqdn so as to be deterministic per host.  Following that, is a list of the rest of the puppet masters also shuffled with the fqdn as the seed.

This give the list the randomness need for load balancing but makes it deterministic per host so it doesn't change on every puppet run.

The problem is this is not working.  From what I can tell the first group of closest servers is deterministic but the second group is not.

https://hg.mozilla.org/build/puppet/file/tip/modules/config/lib/puppet/parser/functions/sort_servers_by_group.rb#l30

I wonder if this could be fixed by simply resetting the seed in between shuffling group_servers and all_servers
This was bothering me. so I wanted to fix it for my re-runs of puppet. Thanks Jake for describing the likely fix (confirmed). I ran five times without this and got the re-ordering 4/5 times, and with this patch I did not get reordering in seven runs.
Assignee: relops → dhouse
Attachment #8901975 - Flags: review?(jwatkins)
Attachment #8901975 - Flags: review?(dcrisan)
Comment on attachment 8901975 [details] [diff] [review]
re-seed between shuffles to make all_servers addition deterministic per fqdn

LGTM
Attachment #8901975 - Flags: review?(dcrisan) → review+
Comment on attachment 8901975 [details] [diff] [review]
re-seed between shuffles to make all_servers addition deterministic per fqdn

Review of attachment 8901975 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome! thanks
Attachment #8901975 - Flags: review?(jwatkins) → review+
Pushed to default:
remote:   https://hg.mozilla.org/build/puppet/rev/8deb341ca1bafd232328c00d98d1e2d75874a12f
passed travis and merged
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Attachment #8901975 - Flags: checked-in+
Dragos found that the host order changes for a '--test --noop' run of puppet sometimes. I ran puppet agent with a noop and regular apply many times on rejh1.mdc1 and found that the '--test --noop' run shows a different ordering of the hosts about 1 in 5 times. Applying never changes the order when run multiple times in a row or after a '--noop' run reports a change.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I think the problem is the uniq() call. It appears to be nondeterministic in which non-unique elements it removes from a list. So instead of using uniq(), we can remove the elements when we find them and create the group_servers.
```
all_servers = groups.values.flatten
groups.each_pair do |re, servers|
    if Regexp.new(re) =~ fqdn
        group_servers += servers
        all_servers -= servers
    end
end
[...]
old_seed = rand()
srand(Digest::MD5.hexdigest([fqdn].join(':')).hex)
result = group_servers.shuffle + all_servers.shuffle
function_notify(result)
srand(old_seed)
```
Sep  8 06:34:44 releng-puppet2.srv.releng.scl3.mozilla.com puppet-master[2509]: (Scope(Class[Config])) result:'releng-puppet2.srv.releng.mdc1.mozilla.com','releng-puppet1.srv.releng.mdc1.mozilla.com','releng-puppet1.srv.releng.usw2.mozilla.com','releng-puppet1.srv.releng.use1.mozilla.com','releng-puppet2.srv.releng.scl3.mozilla.com','releng-puppet1.srv.releng.scl3.mozilla.com'
uniq should not be the problem:
"+self+ is traversed in order, and the first occurrence is kept." (https://github.com/ruby/ruby/blob/f0ae63b072f3404f377f8fae0121c943ce33fc5d/array.c#L4363)
Jake suggested that puppet's multiple threads could be causing the system random seed to be changed elsewhere (out from under us). So I'll try setting up a separate pseudo random number generator.

Testing with manually removing the elements so we didn't need uniq seemed to fix it (ran about twenty times with no change, but not over enough time and only tested on one system).
This might be stating the obvious but I think this might be a bug somewhere in the ocean of bugs that exist in our OLD versions of ruby 1.8.7 and puppet 3.7

I'm sure this wouldn't be an issue at all if/when we upgrade.

Masterless puppet will end up replacing puppetagain and this wont be an issue.

Status: REOPENED → RESOLVED
Closed: 7 years ago5 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: