Closed Bug 1006954 Opened 11 years ago Closed 11 years ago

Intermittent "NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials" when sccache auth token expires during the build

Categories

(Firefox Build System :: General, defect)

x86
Windows Server 2008
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: glandium)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 2 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=38897992&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=39175913&tree=B2g-Inbound

Traceback (most recent call last):
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/server.py", line 312, in run_command
    for result in _run_command(job):
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/server.py", line 244, in _run_command
    storage = Storage.from_environment()
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/storage.py", line 53, in from_environment
    os.environ.get('SCCACHE_NAMESERVER'))
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/storage.py", line 126, in __init__
    https_connection_factory=(self._https_connection_class, ()))
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/boto/s3/connection.py", line 176, in __init__
    validate_certs=validate_certs)
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/boto/connection.py", line 559, in __init__
    host, config, self.provider, self._required_auth_capability())
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/boto/auth.py", line 875, in get_auth_handler
    'Check your credentials' % (len(names), str(names)))
NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials

<philor> glandium: what does https://tbpl.mozilla.org/php/getParsedLog.php?id=38897992&tree=B2g-Inbound mean?
<philor> besides "I'm supposed to remember what it means, because I've seen it before"
<glandium> philor: it means file a bug and retrigger
<glandium> philor: essentially, this means the build happened at the wrong moment, and the auth token expired during the build, and we don't support that situation, apparently
<glandium> well, at least, in all likeliness
So, reading the code and the error, it turns out this is not exactly what I was thinking, but it is related. When starting a subprocess to process compilations, we create an S3 connection, and doing so makes boto fetch the temporary IAM credentials. For some reason, that sometimes fails, and when it does, we're essentially screwed. When the temporary credentials are near expiration, though, boto tries to renew them but keeps the old ones if that fails. This means it retries on each new S3 access made on the same connection, which is likely to happen many times before the credentials actually expire. So renewal is more or less handled; the initial fetch is the real problem. Fortunately, boto has options to retry getting the temporary credentials.
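The renewal behavior described above (renew when close to expiry, keep the old credentials if renewal fails, try again on the next S3 access) can be modelled with a small sketch. This is not boto's code: `Credentials`, `RefreshingCredentials`, the injected `fetch` callable (standing in for the instance-metadata request) and the 300-second `refresh_window` are all hypothetical, chosen only for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Credentials:
    access_key: str
    secret_key: str
    token: str
    expiration: float  # UNIX timestamp at which these credentials expire


class RefreshingCredentials:
    """Hypothetical model of the renewal behavior, not boto's implementation.

    `fetch` stands in for the instance-metadata request (an assumption).
    """

    def __init__(self, fetch: Callable[[], Credentials],
                 refresh_window: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch
        self._refresh_window = refresh_window
        self._clock = clock
        # Initial fetch: if this raises, there is nothing to fall back on,
        # which is the failure mode seen in this bug.
        self._creds = fetch()

    def get(self) -> Credentials:
        """Return credentials, trying to renew them when close to expiry."""
        if self._creds.expiration - self._clock() < self._refresh_window:
            try:
                self._creds = self._fetch()
            except Exception:
                # Renewal failed: keep the old, still valid credentials.
                # The next call to get() will try to renew again.
                pass
        return self._creds
```

Because every access near expiry retries the renewal, a transient metadata failure only matters if it persists until the old credentials actually expire; a failure in the constructor, by contrast, has no fallback.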
Attached file boto.cfg (obsolete) —
This makes boto retry getting the metadata, including the temporary IAM credentials, up to 10 times. This needs deployment on all AWS build slaves (and need not replace the existing boto.cfg on non-AWS build slaves). Mike, could you take care of putting that in puppet?
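The attached boto.cfg itself is not reproduced in this bug. Assuming it sets boto 2's `metadata_service_num_attempts` option in the `[Boto]` section (an assumption; check the boto documentation for the version in use), the effect of "retry 10 times" can be sketched as below. `num_attempts` and `fetch_with_retries` are hypothetical helpers, not boto functions, and boto's real retry loop may differ, for example by waiting between attempts.

```python
import configparser
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed boto.cfg content: only the attempt count (10) comes from this bug;
# the option name is the one boto 2 is believed to read from [Boto].
BOTO_CFG = """
[Boto]
metadata_service_num_attempts = 10
"""


def num_attempts(cfg_text: str) -> int:
    """Hypothetical helper: read the attempt count, defaulting to 1."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback covers both a missing [Boto] section and a missing option.
    return parser.getint("Boto", "metadata_service_num_attempts", fallback=1)


def fetch_with_retries(fetch: Callable[[], T], attempts: int) -> T:
    """Hypothetical helper: call `fetch` up to `attempts` times.

    Returns the first successful result; re-raises the last error if every
    attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
    raise last_error
```

With a single attempt (the fallback when nothing is configured), one transient metadata failure is fatal, matching the traceback above; with 10, a fetch that fails a few times in a row still succeeds.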
Attachment #8419182 - Flags: review?(mshal)
Blocks: 1007991
Depends on: 1007976
Attachment #8419817 - Flags: review?(mshal)
Assignee: nobody → mh+mozilla
Status: NEW → ASSIGNED
Attachment #8419182 - Attachment is obsolete: true
Attachment #8419182 - Flags: review?(mshal)
Blocks: 1008015
Comment on attachment 8419817
Make boto retry getting instance metadata several times

I don't think I can review puppet changes, so -> dustin. I'd be happy to land it if it gets r+, though.

That said, why do you want aws_dot_boto.erb for !='in-house' rather than =='aws'? The default node location ends up being 'unknown', and we probably don't want the file installed in that case.
Attachment #8419817 - Flags: review?(mshal) → review?(dustin)
Comment on attachment 8419817
Make boto retry getting instance metadata several times

Review of attachment 8419817:
-----------------------------------------------------------------

A patch I just r-'d will bitrot this. Please re-request review when that one is at least updated. That said, this is probably going to be good.
Attachment #8419817 - Flags: review?(dustin)
mshal's comment is good and should be addressed in the next version, though (thanks Michael!)
Attachment #8419817 - Attachment is obsolete: true
Comment on attachment 8421430
Make boto retry getting instance metadata several times

Review of attachment 8421430:
-----------------------------------------------------------------

This will have to land After Bug 1007976

::: modules/slave_secrets/manifests/ceph_config.pp
@@ +8,5 @@
>
> if ($ensure == 'present' and $config::install_ceph_cfg) {
> if ($config::node_location == 'in-house' and $slave_trustlevel == 'try') {
> $boto_content = template("$module_name/try_dot_boto.erb")
> + } elsif ($config::node_location == 'aws') {

based on earlier patch and convo, I wonder if we should limit this explicitly to "slave_trustlevel=='try'"

I'll let you decide though, since there is no secret involved here anyway.
Attachment #8421430 - Flags: review?(bugspam.Callek) → review+
(In reply to Justin Wood (:Callek) from comment #9)
> based on earlier patch and convo, I wonder if we should limit this
> explicitly to "slave_trustlevel=='try'" I'll let you decide though, since
> there is no secret involved here anyway.

This config is actually needed on all aws nodes, not only try.
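The template selection discussed in the reviews can be modelled in Python to show why `== 'aws'` is stricter than the first draft's `!= 'in-house'` when the node location defaults to 'unknown'. This is a hypothetical sketch of the puppet conditional quoted in the review, not the manifest itself: the function names and the `'core'` trust level used in the usage loop are assumptions, and the manifest's remaining branches are not shown in the bug, so the sketch simply returns `None` for them.

```python
from typing import Optional


def boto_template(node_location: str, slave_trustlevel: str) -> Optional[str]:
    """Hypothetical model of the reviewed conditional in ceph_config.pp."""
    if node_location == "in-house" and slave_trustlevel == "try":
        return "try_dot_boto.erb"
    if node_location == "aws":
        # Needed on all AWS nodes, whatever the trust level.
        return "aws_dot_boto.erb"
    # Remaining branches of the real manifest are not shown in the bug.
    return None


def first_draft_installs_aws_cfg(node_location: str) -> bool:
    """The earlier predicate questioned in review: != 'in-house'."""
    return node_location != "in-house"


if __name__ == "__main__":
    for location, trust in [("aws", "try"), ("aws", "core"),
                            ("in-house", "try"), ("unknown", "try")]:
        print(location, trust, boto_template(location, trust),
              first_draft_installs_aws_cfg(location))
```

For a node left at the default 'unknown' location, the first-draft predicate would still select the AWS config, while the `== 'aws'` check installs nothing.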
In production.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Target Milestone: mozilla32 → ---
Product: Core → Firefox Build System
See Also: → 1597531