Closed
Bug 1006954
Opened 11 years ago
Closed 11 years ago
Intermittent "NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials" when sccache auth token expires during the build
Categories
(Firefox Build System :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: glandium)
References
Details
(Keywords: intermittent-failure)
Attachments
(1 file, 2 obsolete files)
1.51 KB, patch, Callek: review+
https://tbpl.mozilla.org/php/getParsedLog.php?id=38897992&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=39175913&tree=B2g-Inbound
Traceback (most recent call last):
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/server.py", line 312, in run_command
    for result in _run_command(job):
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/server.py", line 244, in _run_command
    storage = Storage.from_environment()
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/storage.py", line 53, in from_environment
    os.environ.get('SCCACHE_NAMESERVER'))
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/storage.py", line 126, in __init__
    https_connection_factory=(self._https_connection_class, ()))
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/boto/s3/connection.py", line 176, in __init__
    validate_certs=validate_certs)
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/boto/connection.py", line 559, in __init__
    host, config, self.provider, self._required_auth_capability())
  File "/builds/slave/b2g-in-linux32_g-0000000000000/build/sccache/boto/auth.py", line 875, in get_auth_handler
    'Check your credentials' % (len(names), str(names)))
NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials
<philor> glandium: what does https://tbpl.mozilla.org/php/getParsedLog.php?id=38897992&tree=B2g-Inbound mean?
<philor> besides "I'm supposed to remember what it means, because I've seen it before"
<glandium> philor: it means file a bug and retrigger
<glandium> philor: essentially, this means the build happened at the wrong moment, and the auth token expired during the build, and we don't support that situation, apparently
<glandium> well, at least, in all likeliness
Reporter
Comment 1•11 years ago
Assignee
Comment 2•11 years ago
So, reading the code and the error, it turns out this is not exactly what I was thinking, but it is related. When starting a subprocess to process compilations, we create an S3 connection, and doing so makes boto fetch the temporary IAM credentials. For some reason, that sometimes fails, and when it does, the connection can't be created and the job fails with the error above. Renewal is a different case: when the temporary credentials are near expiration, boto tries to renew them but keeps the old ones if that fails. That means it retries on each new S3 access made on the same connection, which is likely to happen many times before the actual expiration. So renewal is more or less handled; the initial fetch is not.
Fortunately, boto has options to retry getting the temporary credentials.
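To illustrate the idea of retrying the initial credential fetch, here is a minimal sketch in Python. It is not boto's code: `fetch_with_retries`, `make_flaky_fetch`, and `CredentialsUnavailable` are hypothetical names, and the exponential backoff is an assumption made for the example rather than something stated in this bug.

```python
import time


class CredentialsUnavailable(Exception):
    """Raised when every attempt to fetch credentials has failed."""


def fetch_with_retries(fetch, attempts=10, base_delay=0.0, sleep=time.sleep):
    """Call fetch() until it returns credentials or attempts run out.

    `fetch` is a hypothetical callable standing in for the metadata lookup:
    it returns a dict of credentials, or raises IOError on failure.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except IOError as error:
            last_error = error
            # Back off before the next try; no sleep after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise CredentialsUnavailable(
        "no credentials after %d attempts: %s" % (attempts, last_error))


def make_flaky_fetch(failures):
    """Build a fake fetcher that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def fetch():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise IOError("metadata service timed out")
        return {"access_key": "EXAMPLE", "calls": state["calls"]}

    return fetch
```

With a single attempt, one transient failure is fatal, which matches the intermittent failure described above; with several attempts, the same transient failure is absorbed.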
Assignee
Comment 3•11 years ago
This makes boto retry getting the metadata, including the temporary IAM credentials, up to 10 times.
This needs deployment on all AWS build slaves (and need not replace the existing boto.cfg on the non-AWS build slaves).
Mike, could you take care of putting that in puppet?
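For readers who have not seen the attachment, the following Python sketch shows what such a setting could look like and how it parses. The attachment text is not reproduced in this bug, so the file contents here are an assumption: `metadata_service_num_attempts` in the `[Boto]` section is the boto 2 option for the number of metadata fetch attempts, and `metadata_attempts` is a hypothetical helper that only reads the value back with the standard library.

```python
import configparser

# Assumed contents of the boto configuration file; the attached patch is
# not quoted in this bug, so treat the exact text as illustrative.
BOTO_CFG = """\
[Boto]
metadata_service_num_attempts = 10
"""


def metadata_attempts(cfg_text, default=1):
    """Return the configured number of metadata fetch attempts.

    `default` is what this helper reports when the option is unset; it is
    an assumption for the example, not a statement about boto itself.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint(
        "Boto", "metadata_service_num_attempts", fallback=default)
```

Calling `metadata_attempts(BOTO_CFG)` returns the configured count, and an empty configuration falls back to the default.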
Attachment #8419182 -
Flags: review?(mshal)
Assignee
Comment 4•11 years ago
Attachment #8419817 -
Flags: review?(mshal)
Assignee
Updated•11 years ago
Assignee: nobody → mh+mozilla
Status: NEW → ASSIGNED
Assignee
Updated•11 years ago
Attachment #8419182 -
Attachment is obsolete: true
Attachment #8419182 -
Flags: review?(mshal)
Comment 5•11 years ago
Comment on attachment 8419817 [details] [diff] [review]
Make boto retry getting instance metadata several times
I don't think I can review puppet changes, so -> dustin. I'd be happy to land it if it gets r+ though.
That said, why do you want aws_dot_boto.erb for !='in-house' rather than =='aws'? The default node location ends up being 'unknown', and we probably don't want the file installed in that case.
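The difference between the two conditions can be sketched in Python. This is only an illustration of the comparison, not the puppet manifest: both function names are hypothetical, and the location values are the ones mentioned in this bug ('in-house', 'aws', and the default 'unknown').

```python
def installs_aws_boto_cfg_negated(node_location):
    """Condition as first proposed: any node that is not in-house."""
    return node_location != "in-house"


def installs_aws_boto_cfg_explicit(node_location):
    """Condition suggested in review: only nodes known to be on AWS."""
    return node_location == "aws"


def compare_conditions(locations=("in-house", "aws", "unknown")):
    """Map each location to (negated result, explicit result)."""
    return {
        location: (
            installs_aws_boto_cfg_negated(location),
            installs_aws_boto_cfg_explicit(location),
        )
        for location in locations
    }
```

The two conditions agree for 'in-house' and 'aws' and differ only for 'unknown', where the negated form would install the file and the explicit form would not.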
Attachment #8419817 -
Flags: review?(mshal) → review?(dustin)
Comment 6•11 years ago
Comment on attachment 8419817 [details] [diff] [review]
Make boto retry getting instance metadata several times
Review of attachment 8419817 [details] [diff] [review]:
-----------------------------------------------------------------
A patch I just r-'d will bitrot this. Please re-request review when that one is at least updated. That said, this is probably going to be good.
Attachment #8419817 -
Flags: review?(dustin)
Comment 7•11 years ago
mshal's comment is good and should be addressed in the next version, though (thanks Michael!)
Assignee
Comment 8•11 years ago
Attachment #8421430 -
Flags: review?(bugspam.Callek)
Assignee
Updated•11 years ago
Attachment #8419817 -
Attachment is obsolete: true
Comment 9•11 years ago
Comment on attachment 8421430 [details] [diff] [review]
Make boto retry getting instance metadata several times
Review of attachment 8421430 [details] [diff] [review]:
-----------------------------------------------------------------
This will have to land after Bug 1007976.
::: modules/slave_secrets/manifests/ceph_config.pp
@@ +8,5 @@
>
> if ($ensure == 'present' and $config::install_ceph_cfg) {
> if ($config::node_location == 'in-house' and $slave_trustlevel == 'try') {
> $boto_content = template("$module_name/try_dot_boto.erb")
> + } elsif ($config::node_location == 'aws') {
based on earlier patch and convo, I wonder if we should limit this explicitly to "slave_trustlevel=='try'" I'll let you decide though, since there is no secret involved here anyway.
Attachment #8421430 -
Flags: review?(bugspam.Callek) → review+
Assignee
Comment 10•11 years ago
(In reply to Justin Wood (:Callek) from comment #9)
> based on earlier patch and convo, I wonder if we should limit this
> explicitly to "slave_trustlevel=='try'" I'll let you decide though, since
> there is no secret involved here anyway.
This config is actually needed on all aws nodes, not only try.
Assignee
Comment 11•11 years ago
Assignee
Comment 12•11 years ago
In production.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Target Milestone: --- → mozilla32
Updated•11 years ago
Target Milestone: mozilla32 → ---
Updated•7 years ago
Product: Core → Firefox Build System