Closed
Bug 588229
Opened 15 years ago
Closed 14 years ago
repo_setup fails out on non-existent staging repos
Categories
(Release Engineering :: General, enhancement, P5)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mozilla, Assigned: rail)
References
Details
(Whiteboard: [automation][releases])
Attachments
(2 files, 1 obsolete file)
5.28 KB, patch | mozilla: review+ | rail: checked-in+
3.33 KB, patch | mozilla: review+ | rail: checked-in+
The delete_repo step in the repo_setup factory fails (but doesn't halt) if the repo doesn't exist.
This means a first-run staging release can often fail.
It would make it less annoying to run staging releases if it was smart about this (checked the url, maybe? http://hg.mozilla.org/users/stage-ffxbld/foobarbaz gives both "Not found: foobarbaz" and "The specified repository "foobarbaz" is unknown, sorry." in its response page).
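A minimal sketch of the URL check suggested above, assuming the page for a missing repository contains one of the two messages quoted in this report. The name `repo_exists` and the marker tuple are hypothetical, and whether the server answers with a non-200 status for a missing repo is not confirmed here, so the sketch looks at both the status code and the body.

```python
# Hypothetical helper, not part of buildbotcustom.
# The marker strings come from the responses quoted in this report.
MISSING_REPO_MARKERS = ("Not found:", "is unknown, sorry.")


def repo_exists(status_code, body):
    """Return True if an HTTP response looks like a real repository page."""
    # Any non-200 answer is treated as "no such repository".
    if status_code != 200:
        return False
    # A 200 page may still carry one of the "unknown repository" messages.
    return not any(marker in body for marker in MISSING_REPO_MARKERS)
```

Matching on body text is fragile (a real repo page could contain the same words), so this is only an illustration of the idea, not a reliable existence test.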
Comment 1 • 15 years ago (Reporter)
More concerning: a bad configuration didn't cause a failure?
bash -c ssh -l stage-ffxbld -i ~cltbld/.ssh/ffxbld_dsa hg.mozilla.org clone mobile-browser releases/mobile-browser
I think what happened is it didn't delete the previous clone of mobile-browser due to the command line
bash -c wget -O- http://hg.mozilla.org/releases/mobile-browser >/dev/null && ssh -l stage-ffxbld -i ~cltbld/.ssh/ffxbld_dsa hg.mozilla.org edit mobile-browser delete YES
The first wget failed since there is no releases/mobile-browser, so it didn't proceed with the delete. Then I think the clone failed silently.
To fix comment 0, the command line could add a 2nd wget -O- :
bash -c wget -O- http://hg.mozilla.org/mobile-browser >/dev/null && wget -O- http://hg.mozilla.org/users/stage-ffxbld/mobile-browser && ssh -l stage-ffxbld -i ~cltbld/.ssh/ffxbld_dsa hg.mozilla.org edit mobile-browser delete YES
To fix this comment, a) I can be less stupid going forward, but b) gotta think about it some more.
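The two-wget guard proposed above could be sketched as a small command builder. This is an illustration only: `build_delete_command` and its parameters are hypothetical names, and unlike the command line quoted above, the sketch sends the output of both wget calls to /dev/null. The delete only runs when every wget exits with status 0, because `&&` stops at the first failing command.

```python
# Hypothetical builder for the guarded delete command; not the real factory code.
def build_delete_command(repo_name, source_url, user_repo_url,
                         ssh_user="stage-ffxbld",
                         ssh_key="~cltbld/.ssh/ffxbld_dsa",
                         hg_host="hg.mozilla.org"):
    """Return an argv list that deletes the user repo only if both URLs resolve."""
    # Each wget must succeed before the next command in the && chain runs.
    checks = ["wget -O- %s >/dev/null" % url
              for url in (source_url, user_repo_url)]
    delete = "ssh -l %s -i %s %s edit %s delete YES" % (
        ssh_user, ssh_key, hg_host, repo_name)
    return ["bash", "-c", " && ".join(checks + [delete])]
```

For example, `build_delete_command("mobile-browser", "http://hg.mozilla.org/mobile-browser", "http://hg.mozilla.org/users/stage-ffxbld/mobile-browser")` yields a single shell string with two `&&` separators ending in the `edit mobile-browser delete YES` call.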
Comment 2 • 15 years ago
I think I hit this during a staging run for Firefox 3.6.9 build 1 for a bunch of locales.
bash -c wget -O- http://hg.mozilla.org/releases/l10n-mozilla-1.9.2/id >/dev/null && ssh -l stage-ffxbld -i ~cltbld/.ssh/ffxbld_dsa hg.mozilla.org edit id delete YES
<snip env>
--22:52:43-- http://hg.mozilla.org/releases/l10n-mozilla-1.9.2/id
Resolving hg.mozilla.org... 10.2.74.67
Connecting to hg.mozilla.org|10.2.74.67|:80... connected.
HTTP request sent, awaiting response... 200 Script output follows
Length: 26397 (26K) [text/html]
Saving to: `STDOUT'
0% [ ] 0 --.-K/s
100%[=======================================>] 26,397 --.-K/s in 0.02s
22:52:43 (1.65 MB/s) - `-' saved [26397/26397]
Could not find the repository at /users/stage-ffxbld/id.
Please check the list at https://hg.mozilla.org/users/stage-ffxbld
Updated • 14 years ago (Reporter)
Whiteboard: [automation][releases]
Comment 4 • 14 years ago
I hit this on my staging release.
One of the delete_repo steps failed but the job did not go red.
I assumed that a green repo_setup job would trigger the tagging builder.
http://hg.mozilla.org/build/buildbotcustom/file/tip/process/factory.py#l3378
The code does not set haltOnFailure or flunkOnFailure for this step, so both keep their default values, which seem to be False.
http://hg.mozilla.org/build/buildbot/file/5a049fbe224b/master/buildbot/process/buildstep.py#l576
> haltOnFailure = False
> flunkOnWarnings = False
> flunkOnFailure = False
Are these default values correct?
Shouldn't haltOnFailure be True?
It seems that the value is set to False since the import of 0.8.1:
http://hg.mozilla.org/build/buildbot/rev/42babfd9ed35#l301.579
Doesn't this mean that a failing step, unless haltOnFailure is changed to True, would NOT change the state of the job and would NOT abort it?
Comment 5 • 14 years ago
Wow, it seems that it has been False by default even in 0.7.12:
https://github.com/buildbot/buildbot/blob/buildbot-0.7.12/buildbot/process/buildstep.py#L575
All my Mozilla life I thought that a failing step turns the job red and aborts it by default.
I just noticed that we set haltOnFailure to True in sooooo many places.
Shame on me!
It seems that we have to add haltOnFailure after all.
Sorry for the noise.
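To make the effect of these flags concrete, here is a toy model, assuming the defaults discussed above (both False). It is a simplified illustration, not buildbot's implementation: the `Step` class and `run_build` function are hypothetical, and real builds also handle warnings, exceptions and other flags.

```python
# Toy model of haltOnFailure / flunkOnFailure; not buildbot's real code.
SUCCESS, FAILURE = "success", "failure"


class Step:
    def __init__(self, name, result, haltOnFailure=False, flunkOnFailure=False):
        self.name = name
        self.result = result              # pretend outcome of running the step
        self.haltOnFailure = haltOnFailure
        self.flunkOnFailure = flunkOnFailure


def run_build(steps):
    """Return (overall result, names of the steps that actually ran)."""
    overall = SUCCESS
    executed = []
    for step in steps:
        executed.append(step.name)
        if step.result == FAILURE:
            if step.flunkOnFailure:
                overall = FAILURE         # the job goes red
            if step.haltOnFailure:
                break                     # remaining steps are skipped
    return overall, executed
```

With the defaults, `run_build([Step("delete_repo", FAILURE), Step("clone_repo", SUCCESS)])` stays green and runs both steps, which matches the behaviour described in comment 4. Setting both flags to True on the failing step turns the job red and stops after that step.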
Comment 6 • 14 years ago
(In reply to comment #4)
> http://hg.mozilla.org/build/buildbot/file/5a049fbe224b/master/buildbot/process/buildstep.py#l576
> > haltOnFailure = False
> > flunkOnWarnings = False
> > flunkOnFailure = False
>
> Are these default values correct?
> Shouldn't haltOnFailure be True?
I assume you're talking about RepositorySetupFactory's steps, not the default.
In any case, we do *not* want haltOnFailure for the deletions, because they
will "fail" if the repository doesn't exist at all. In that situation, we
should be proceeding.
I don't think there's much we can do to improve the situation until bug 626641
is fixed, because we can't accurately judge existence of a repository at this
point.
Comment 7 • 14 years ago
I am sorry I went off on a tangent. What you say makes sense.
We don't want to stop because the deletion failed; the problem is that the job did not trigger the tag builder.
Updated • 14 years ago
Assignee: nobody → catlee
Updated • 14 years ago
Assignee: catlee → rail
Comment 8 • 14 years ago (Assignee)
* Add another wget against the users repo (requires another releaseConfig variable :( )
Attachment #528563 - Flags: review?(aki)
Comment 9 • 14 years ago (Assignee)
Attachment #528564 - Flags: review?(aki)
Comment 10 • 14 years ago (Assignee)
Staging tests have passed.
Comment 11 • 14 years ago (Reporter)
Comment on attachment 528563: buildbotcustom
Now that we have a check to make sure the user repo exists before trying to delete it, should we haltOnFailure=True ?
Attachment #528563 - Flags: review?(aki) → review+
Comment 12 • 14 years ago (Reporter)
Comment on attachment 528564: configs
Thanks for fixing this, Rail!
Attachment #528564 - Flags: review?(aki) → review+
Comment 13 • 14 years ago (Assignee)
(In reply to comment #11)
> Now that we have a check to make sure the user repo exists before trying to
> delete it, should we haltOnFailure=True ?
Yeah. Interdiff is just 1 line.
Attachment #528563 - Attachment is obsolete: true
Attachment #528648 - Flags: review?(aki)
Updated • 14 years ago (Reporter)
Attachment #528648 - Flags: review?(aki) → review+
Comment 14 • 14 years ago (Assignee)
Comment on attachment 528564: configs
http://hg.mozilla.org/build/buildbot-configs/rev/7e96916bb262
Attachment #528564 - Flags: checked-in+
Comment 15 • 14 years ago (Assignee)
Comment on attachment 528648: buildbotcustom
http://hg.mozilla.org/build/buildbotcustom/rev/304da956f42b
Attachment #528648 - Flags: checked-in+
Comment 16 • 14 years ago (Assignee)
All done here. Closing.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated • 12 years ago
Product: mozilla.org → Release Engineering