Closed Bug 588229 Opened 14 years ago Closed 13 years ago

repo_setup fails out on non-existent staging repos

Categories: Release Engineering :: General, enhancement, P5

Tracking: Not tracked

Status: RESOLVED FIXED

People: Reporter: mozilla, Assigned: rail

Whiteboard: [automation][releases]

Attachments: 2 files, 1 obsolete file

The delete_repo step in the repo_setup factory fails (but doesn't halt) if the repo doesn't exist.

This means a first-run staging release can often fail.

It would make it less annoying to run staging releases if it was smart about this (checked the url, maybe? http://hg.mozilla.org/users/stage-ffxbld/foobarbaz gives both "Not found: foobarbaz" and "The specified repository "foobarbaz" is unknown, sorry." in its response page).
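
A minimal sketch of the URL check suggested above, assuming the two strings quoted in this comment are reliable markers of a missing repository. The function name `repo_looks_missing` and the treatment of a 404 status are assumptions for illustration, not part of buildbotcustom; the server may well answer with a different status code.

```python
def repo_looks_missing(status, body, repo_name):
    """Guess whether an hg.mozilla.org response describes a missing repo.

    Heuristic only: the markers are the two messages quoted in the bug
    report, and the 404 check is an assumption about the server.
    """
    markers = (
        "Not found: %s" % repo_name,
        'The specified repository "%s" is unknown, sorry.' % repo_name,
    )
    if status == 404:
        return True
    # Fall back to scanning the page text for either known message.
    return any(marker in body for marker in markers)
```

A caller would fetch the page first and pass the status code and body in, which keeps the decision logic testable without network access.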
More concerning: a bad configuration didn't cause a failure?

bash -c ssh -l stage-ffxbld -i ~cltbld/.ssh/ffxbld_dsa hg.mozilla.org clone mobile-browser releases/mobile-browser

I think what happened is that it didn't delete the previous clone of mobile-browser because of this command line:

bash -c wget -O- http://hg.mozilla.org/releases/mobile-browser >/dev/null && ssh -l stage-ffxbld -i ~cltbld/.ssh/ffxbld_dsa hg.mozilla.org edit mobile-browser delete YES

The first wget failed since there is no releases/mobile-browser, so it didn't proceed with the delete.  Then I think the clone failed silently.

To fix comment 0, the command line could add a 2nd wget -O- :

bash -c wget -O- http://hg.mozilla.org/mobile-browser >/dev/null && wget -O- http://hg.mozilla.org/users/stage-ffxbld/mobile-browser && ssh -l stage-ffxbld -i ~cltbld/.ssh/ffxbld_dsa hg.mozilla.org edit mobile-browser delete YES

To fix this comment, a) I can be less stupid going forward, but b) gotta think about it some more.
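
The two-wget guard proposed above could be assembled roughly like this. This is a sketch, not the patch that landed: `build_delete_command` and its parameter names are hypothetical, and unlike the command line in this comment it sends the output of both wget calls to /dev/null.

```python
def build_delete_command(repo_path, user_repo_path, name, username,
                         ssh_key, host="hg.mozilla.org"):
    """Build a shell command that deletes a user repo only if both the
    source repo and the user repo answer over HTTP.

    All names here are illustrative; only the wget/ssh shapes come from
    the command lines quoted in this bug.
    """
    checks = [
        "wget -O- http://%s/%s >/dev/null" % (host, path)
        for path in (repo_path, user_repo_path)
    ]
    delete = "ssh -l %s -i %s %s edit %s delete YES" % (
        username, ssh_key, host, name)
    # '&&' short-circuits: the delete runs only if every wget succeeds.
    return " && ".join(checks + [delete])


command = build_delete_command(
    repo_path="mobile-browser",
    user_repo_path="users/stage-ffxbld/mobile-browser",
    name="mobile-browser",
    username="stage-ffxbld",
    ssh_key="~cltbld/.ssh/ffxbld_dsa",
)
```

The resulting string would be passed as the single argument to `bash -c`. Note that when either wget fails the whole command exits non-zero, so the step still reports a failure even though nothing was deleted.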
I think I hit this during a staging run for Firefox 3.6.9 build 1 for a bunch of locales.


bash -c wget -O- http://hg.mozilla.org/releases/l10n-mozilla-1.9.2/id >/dev/null && ssh -l stage-ffxbld -i ~cltbld/.ssh/ffxbld_dsa hg.mozilla.org edit id delete YES
<snip env>
--22:52:43--  http://hg.mozilla.org/releases/l10n-mozilla-1.9.2/id
Resolving hg.mozilla.org... 10.2.74.67
Connecting to hg.mozilla.org|10.2.74.67|:80... connected.
HTTP request sent, awaiting response... 200 Script output follows
Length: 26397 (26K) [text/html]
Saving to: `STDOUT'


 0% [                                        ] 0           --.-K/s             
100%[=======================================>] 26,397      --.-K/s   in 0.02s  

22:52:43 (1.65 MB/s) - `-' saved [26397/26397]

Could not find the repository at /users/stage-ffxbld/id.
Please check the list at https://hg.mozilla.org/users/stage-ffxbld
Whiteboard: [automation][releases]
Depends on: 626641
I hit this on my staging release.

One of the delete_repo steps failed but the job did not go red.
I assumed that a green repo_setup job would trigger the tagging builder.

http://hg.mozilla.org/build/buildbotcustom/file/tip/process/factory.py#l3378
The code doesn't set haltOnFailure or flunkOnFailure for these steps, so they keep the default values, which seem to be False.

http://hg.mozilla.org/build/buildbot/file/5a049fbe224b/master/buildbot/process/buildstep.py#l576
> haltOnFailure = False
> flunkOnWarnings = False
> flunkOnFailure = False

Are these default values correct?
Shouldn't haltOnFailure be True?

It seems that the value is set to False since the import of 0.8.1:
http://hg.mozilla.org/build/buildbot/rev/42babfd9ed35#l301.579

Doesn't this mean that a failing step (without haltOnFailure changed to True) would NOT change the state of the job and would NOT abort the job?
Wow, it seems that it has been False by default even in 0.7.12:
https://github.com/buildbot/buildbot/blob/buildbot-0.7.12/buildbot/process/buildstep.py#L575

All my Mozilla life I thought that a failing step turns the job red and aborts it (by default).

I just noticed that we set haltOnFailure to True in sooooo many places.
Shame on me!

It seems that we have to add haltOnFailure after all.
Sorry for the noise.
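
To illustrate the behaviour discussed above, here is a toy model of how the two flags affect a job. It is a simplified sketch, not Buildbot's code: the `Step` class and `run_build` function are hypothetical, and warnings, alwaysRun and the other flags are ignored.

```python
SUCCESS, FAILURE = "success", "failure"


class Step(object):
    """Toy stand-in for a build step with a predetermined result."""

    def __init__(self, name, result,
                 haltOnFailure=False, flunkOnFailure=False):
        self.name = name
        self.result = result
        self.haltOnFailure = haltOnFailure
        self.flunkOnFailure = flunkOnFailure


def run_build(steps):
    """Return (overall result, names of the steps that ran)."""
    overall = SUCCESS
    ran = []
    for step in steps:
        ran.append(step.name)
        if step.result == FAILURE:
            if step.flunkOnFailure:
                overall = FAILURE  # the job goes red
            if step.haltOnFailure:
                break              # later steps are skipped
    return overall, ran


# With both flags left at False, a failed delete is invisible to the job.
quiet = run_build([Step("delete_repo", FAILURE),
                   Step("clone_repo", SUCCESS)])

# With both flags set, the job goes red and stops at the failed step.
strict = run_build([Step("delete_repo", FAILURE,
                         haltOnFailure=True, flunkOnFailure=True),
                    Step("clone_repo", SUCCESS)])
```

In this model the first job stays green and runs every step, which matches what was observed in the staging release.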
Blocks: 563942
(In reply to comment #4)
> http://hg.mozilla.org/build/buildbot/file/5a049fbe224b/master/buildbot/process/buildstep.py#l576
> > haltOnFailure = False
> > flunkOnWarnings = False
> > flunkOnFailure = False
> 
> Are these default values correct?
> Shouldn't haltOnFailure be True?

I assume you're talking about RepositorySetupFactory's steps, not the default.

In any case, we do *not* want haltOnFailure for the deletions, because they
will "fail" if the repository doesn't exist at all. In that situation, we
should be proceeding.

I don't think there's much we can do to improve the situation until bug 626641
is fixed, because we can't accurately judge existence of a repository at this
point.
I am sorry I went off on a tangent. What you say makes sense.
We don't want to stop because the deletion failed; the problem is that the job did not trigger the tag builder.
Blocks: 627307
Assignee: nobody → catlee
Assignee: catlee → rail
Attached patch buildbotcustom (obsolete)
* Add another wget against the users repo (requires another releaseConfig variable :( )
Attachment #528563 - Flags: review?(aki)
Attached patch configs
Attachment #528564 - Flags: review?(aki)
Staging tests have passed.
Comment on attachment 528563
buildbotcustom

Now that we have a check to make sure the user repo exists before trying to delete it, should we haltOnFailure=True ?
Attachment #528563 - Flags: review?(aki) → review+
Comment on attachment 528564
configs

Thanks for fixing this, Rail!
Attachment #528564 - Flags: review?(aki) → review+
Attached patch buildbotcustom
(In reply to comment #11) 
> Now that we have a check to make sure the user repo exists before trying to
> delete it, should we haltOnFailure=True ?

Yeah. Interdiff is just 1 line.
Attachment #528563 - Attachment is obsolete: true
Attachment #528648 - Flags: review?(aki)
Attachment #528648 - Flags: review?(aki) → review+
All done here. Closing.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering