Closed Bug 846104 Opened 10 years ago Closed 6 years ago

Self-serve nightly wildcard matching is not strict enough (can't retrigger UX nightlies)

Categories

(Release Engineering :: General, defect)

All
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: philor, Unassigned)

Attachments

(1 file, 2 obsolete files)

MattN triggered nightlies for the UX branch, like he always does, and it triggered every sort of nightly, each claiming to be both on a UX revision and for another tree. When he killed them, they uploaded the aborted logs to the other trees, as though they really were mozilla-central and mozilla-esr17 and mozilla-b2g18 nightlies, e.g. ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013-02-27-13-14-44-mozilla-esr17 and ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013-02-27-13-14-44-mozilla-b2g18 and ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013-02-27-13-14-44-mozilla-central.

Given the uncertainty about whether that would have ended up moving users from those branches onto the ux branch, we really need to shut off the ability for anyone to trigger any nightlies, immediately.
Attached image Screenshot of UX TBPL
To be clear, I hadn't manually triggered UX nightlies for a few months so I don't know whether this is a new issue.

OS X and Windows appeared to be unaffected on TBPL.
fwiw looking at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013-02-27-13-14-44-mozilla-b2g18/mozilla-b2g18-linux64-nightly-bm35-build1-build25.txt.gz it actually looks like it was failing to pull/clone due to the wrong revision (understandable given it was meant for UX) but does seem to have been infinitely retrying until the job was killed.
triaged after ping in irc.


(In reply to Matthew N. [:MattN] from comment #1)
> Created attachment 719263 [details]
> Screenshot of UX TBPL
> 
> To be clear, I hadn't manually triggered UX nightlies for a few months so I
> don't know whether this is a new issue.
Multiple months... hmm... wonder if there is something UX-branch specific broken? 

MattN: can you give explicit steps on what you did to trigger the nightlies, and what you did to cancel them?



> OS X and Windows appeared to be unaffected on TBPL.

The concern is that this could be a problem for tonight's nightlies. We believe last night's nightlies were ok. 

Did anything change in our infrastructure today which could be involved?
Component: Release Engineering → Release Engineering: Developer Tools
QA Contact: hwine
(In reply to John O'Duinn [:joduinn] from comment #3)
> (In reply to Matthew N. [:MattN] from comment #1)
> MattN: can you give explicit steps on what you did to trigger the nightlies,
> and what you did to cancel them?

1) Notice today's UX Nightly (on 4ec2b1c83ed0) has a bug  
2) Push a fix (https://hg.mozilla.org/projects/ux/rev/fc287e8c97a1 )
3) Waited a few minutes and then went to https://secure.pub.build.mozilla.org/buildapi/self-serve/ux/rev/fc287e8c97a1
4) Pasted fc287e8c97a1 in the "Create new nightly builds on ux revision" field and clicked Submit
5) Went back to work on other things for a while and then checked the UX TBPL tab
6) Noticed way more nightlies than I expected beside Fedora *.
7) Asked for advice in #releng at 14:46:47 PST. Philor replies at 14:56:09 PST
8) Clicked cancel on the pending and running non-UX builds at https://secure.pub.build.mozilla.org/buildapi/self-serve/ux/rev/fc287e8c97a1

> > OS X and Windows appeared to be unaffected on TBPL.
> The concern is that this could be a problem for tonight's nightlies. We
> believe last night's nightlies were ok. 

The other concern is that this will happen again on manually triggered nightlies on any tree and that the self-serve UI for it should be removed until we know more.  This could be as simple as commenting out the HTML on the self-serve page.
Right now my hypothesis is that our wildcard matching in self-serve is making %ux%nightly match all Linux*nightly jobs.

Not sure whether http://hg.mozilla.org/build/buildapi/rev/45500847fd99 affected this or not.

If this is true, I think this is specific to the ux branch only.
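
To illustrate the hypothesis above: a minimal sketch of how a LIKE pattern of the form %ux%nightly can overmatch, because "Linux" itself contains "ux". This uses an in-memory SQLite table rather than the real buildapi database, and every builder name below is made up for illustration.

```python
import sqlite3

# Hypothetical builder names, made up for illustration only.
BUILDERS = [
    "Linux ux nightly",
    "Linux mozilla-central nightly",
    "Linux x86-64 mozilla-esr17 nightly",
    "OS X 10.7 mozilla-central nightly",
    "WINNT 5.2 ux nightly",
]


def matching(pattern):
    """Return the builder names that match a SQL LIKE pattern."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE builders (buildername TEXT)")
    conn.executemany("INSERT INTO builders VALUES (?)", [(b,) for b in BUILDERS])
    rows = conn.execute(
        "SELECT buildername FROM builders WHERE buildername LIKE ? ORDER BY rowid",
        (pattern,),
    )
    result = [row[0] for row in rows]
    conn.close()
    return result


# '%' + branch + '%nightly' with branch == 'ux'
overmatched = matching("%ux%nightly")
for name in overmatched:
    print(name)
```

In this sketch every Linux nightly is picked up regardless of branch, while the OS X mozilla-central builder is not, which would be consistent with only Linux being affected on TBPL.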
I think this bug was introduced in http://hg.mozilla.org/build/buildapi/rev/0bb3e16ba8cb , only for the ux branch.
I think we have to allow for [ -_] on either side of the branch name, but I'm not sure how to do that in a sql query.  I know how in regexes.
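
For reference, a rough sketch of the regex version of that idea, in Python. The helper name branch_regex and the builder names are hypothetical. One thing to watch: written literally as [ -_], the character class is a range from space to underscore, so the hyphen has to go last (or be escaped) to be taken literally.

```python
import re


def branch_regex(branch):
    """Match `branch` only when delimited by space, '-', '_' or a string edge."""
    # '-' goes last in the class so it is a literal hyphen, not a range.
    delim = r"[ _-]"
    return re.compile(r"(?:^|%s)%s(?:%s|$)" % (delim, re.escape(branch), delim))


ux = branch_regex("ux")
print(bool(ux.search("Linux ux nightly")))               # True
print(bool(ux.search("Linux mozilla-central nightly")))  # False

# The literal class [ -_] is a range (0x20 to 0x5F) and also matches 'A'.
print(bool(re.fullmatch(r"[ -_]", "A")))                 # True
```

Allowing '-' as a delimiter means a branch such as mozilla-b2g18 would still match inside a name containing mozilla-b2g18-v1.0.1, so some exclusion would presumably still be needed for that case.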
(In reply to Aki Sasaki [:aki] from comment #5)
> Right now my hypothesis is that our wildcard matching in self-serve is
> making %ux%nightly match all Linux*nightly jobs.

That seems reasonable and aligns with the fact that I didn't notice extra non-Linux nightlies being built. I guess the priority of this can probably be lowered if someone updates the tree status on the UX repo to tell people not to trigger nightlies. I'll notify jwein and mconley directly about this too.
(In reply to Matthew N. [:MattN] from comment #8)
> I guess the priority of this can probably be lowered…

if that change only affects self-serve and not the scheduled nightlies.
Correct, it only affects self-serve.
Not sure if we can add treestatus to UX since it's not in the treestatus page.
(In reply to Aki Sasaki [:aki] from comment #11)
> N/m, added. 

Thanks. I sent my email too.
I think this also means that if a Nightly is triggered on "Mozilla-B2g18", one will also be created on "Mozilla-B2g18-v1.0.1" because it is a substring of the former repo name?
Summary: Shut off self-serve triggering of nightlies until we know why it triggered nightlies on all branches for a ux nightly → Self-serve nightly wildcard matching is not strict enough
You'd think I'd remember that, since we've actually hit that very same overmatching before, but The Fear overtook me first.
Severity: blocker → normal
OS: All → Linux
Priority: P1 → --
Nope, we actually explicitly look for BRANCH_v and exclude that.

I propose we put spaces around the branch name for b2g builds and back out these two self-serve patches.
Changing base_name in mozilla/b2g_config.py to replace all underscores with spaces gave me this:

  File "/src/selfservespace/buildbotcustom/misc.py", line 1025, in generateBranchObjects
    'slavebuilddir': normalizeName('%s_dep' % pf['base_name']),
  File "/src/selfservespace/buildbotcustom/common.py", line 160, in normalizeName
    raise ValueError(msg)
ValueError: Cannot shorten "b2g mozilla-b2g18 unagi_eng_dep" to maximum length (30). Got to: b2g mozilla-b2g18 unagi_eng_dep

So easier said than done.
Product: mozilla.org → Release Engineering
Summary: Self-serve nightly wildcard matching is not strict enough → Self-serve nightly wildcard matching is not strict enough (can't retrigger UX nightlies)
I poked at this briefly; the following SQL snippets match all 49 of our nightly builders on mozilla-central (where we have b2g and desktop enabled):
   WHERE buildername LIKE '%\_mozilla-central\_%nightly'    # b2g
   WHERE buildername LIKE '% mozilla-central %nightly%'     # desktop + android

The b2g case has to handle the two forms
  b2g_<branch>_<platform> nightly             # desktop b2g
  b2g_<branch>_<platform>_nightly             # device b2g
NB: escaping of _ since that's a single char match with LIKE.

For the rest, everything ends in ' nightly' except the likes of 'Android mozilla-central l10n nightly 1/5', which is why there is a trailing %. XULRunner and ASAN builds push you towards 'mozilla-central % nightly', but builds like 'Android 2.2 mozilla-central nightly' only have a single space between branch and nightly.
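
A small sketch of the two patterns above, run against an in-memory SQLite table instead of the real database. Only the two Android names come from this comment; the other builder names are hypothetical. SQLite needs an explicit ESCAPE clause for the backslash, whereas MySQL treats backslash as the default LIKE escape character.

```python
import sqlite3

BUILDERS = [
    "b2g_mozilla-central_unagi_nightly",          # hypothetical device b2g name
    "b2g_mozilla-central_linux32_gecko nightly",  # hypothetical desktop b2g name
    "Linux mozilla-central nightly",              # hypothetical
    "Android 2.2 mozilla-central nightly",        # named in the comment above
    "Android mozilla-central l10n nightly 1/5",   # named in the comment above
    "Linux ux nightly",                           # hypothetical
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builders (buildername TEXT)")
conn.executemany("INSERT INTO builders VALUES (?)", [(b,) for b in BUILDERS])


def like(pattern):
    """Return builder names matching a LIKE pattern, with backslash as escape."""
    sql = ("SELECT buildername FROM builders "
           "WHERE buildername LIKE ? ESCAPE '\\' ORDER BY rowid")
    return [row[0] for row in conn.execute(sql, (pattern,))]


b2g = like(r"%\_mozilla-central\_%nightly")    # escaped: literal underscores
desktop = like("% mozilla-central %nightly%")  # spaces around the branch
unescaped = like("%_ux_nightly")               # bare _ matches any single char

print(b2g)
print(desktop)
print(unescaped)
```

The last query shows why the escaping matters: with a bare _ the pattern also matches the space-delimited 'Linux ux nightly', because _ stands for any single character.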

I'm not going any further with this right now because of other priorities. To make this work, self-serve-agent.py's _create_build_for_revision() may need to take a list of builder_expression values instead of a single one.
I *think* this is what comment 17 was getting at, but I'm totally not at home in these waters. I just want to be able to run nightly builds when necessary... :-)
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Attachment #809482 - Attachment description: self-serve nightly wildcard matching should be stricter, f?nthomas → self-serve nightly wildcard matching should be stricter
Attachment #809482 - Flags: feedback?(nthomas)
Comment on attachment 809482 [details] [diff] [review]
self-serve nightly wildcard matching should be stricter

Overall looks good, but this should definitely get tested on a staging instance of buildapi by someone in RelEng.

>     def do_new_nightly_at_revision(self, message_data, message):
...
>-            '%' + branch + '%nightly',
>+            ['%\_' + branch + '\_%nightly', '% ' + branch + ' nightly%'],

I was expecting this line to end:   ... branch + ' %nightly%'],
ie a % prefixing nightly. Did you drop that for some reason ?

>             ['%' + branch + '_v%nightly', '%l10n nightly'])

I hadn't paid attention to this exclusion list when writing comment #17. The interaction of this with the inclusions is important because we only want to create en-US nightlies, and let them trigger the dependent l10n builds. In particular we need to make sure the right thing happens for the likes of 'Android 2.2 mozilla-central l10n nightly 1/5'.
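
To make the inclusion/exclusion interaction concrete, here is a rough sketch. It assumes a builder is selected when it matches any inclusion pattern and no exclusion pattern; that rule, the helpers like_to_regex and selected, and the builder names other than the Android 2.2 one are assumptions for illustration, not buildapi code. Matching here is case-sensitive, which a real database collation may not be.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (backslash as escape) into a compiled regex."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: take the next character literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        out.append(".*" if ch == "%" else "." if ch == "_" else re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def selected(name, includes, excludes):
    """Assumed rule: match any inclusion pattern and no exclusion pattern."""
    inc = any(like_to_regex(p).fullmatch(name) for p in includes)
    exc = any(like_to_regex(p).fullmatch(name) for p in excludes)
    return inc and not exc


branch = "mozilla-central"
includes = ["%\\_" + branch + "\\_%nightly", "% " + branch + " %nightly%"]
excludes = ["%" + branch + "_v%nightly", "%l10n nightly"]

for name in [
    "Linux mozilla-central nightly",                # hypothetical en-US builder
    "Android mozilla-central l10n nightly",         # hypothetical, no chunk suffix
    "Android 2.2 mozilla-central l10n nightly 1/5", # named in the comment above
]:
    print(name, "->", selected(name, includes, excludes))

# One possible tightening: let the l10n exclusion tolerate a chunk suffix.
tighter = excludes + ["%l10n nightly%"]
```

Under these assumptions the chunked l10n builder matches the inclusion with the trailing % but is not caught by '%l10n nightly', since its name does not end there. An exclusion with a trailing % would cover it in this sketch; whether that is right for the real builder list would need checking.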
Attachment #809482 - Flags: feedback?(nthomas) → feedback+
No, I just screwed up when copying from comment 17, good catch. Here's an updated patch, and I'll try to find someone in releng to test this.
Attachment #809482 - Attachment is obsolete: true
Comment on attachment 810497 [details] [diff] [review]
self-serve nightly wildcard matching should be stricter

Carrying over f+
Attachment #810497 - Attachment description: self-serve nightly wildcard matching should be stricter, f?nthomas → self-serve nightly wildcard matching should be stricter
Attachment #810497 - Flags: feedback+
Chris, word in #releng is that you may be able to help test this on buildapi staging?
Flags: needinfo?(coop)
Sorry, missed the NEEDINFO and then was out on PTO.

Sadly, we don't have an official buildapi staging instance to test this out, but the instructions for setting one up seem reasonably complete. Let me talk to people in releng and see if someone already has such an env setup and can validate your change.
Flags: needinfo?(coop)
(In reply to Chris Cooper [:coop] from comment #23)
> Sorry, missed the NEEDINFO and then was out on PTO.
> 
> Sadly, we don't have an official buildapi staging instance to test this out,
> but the instructions for setting one up seem reasonably complete. Let me
> talk to people in releng and see if someone already has such an env setup
> and can validate your change.

Ping?
Flags: needinfo?(coop)
(In reply to :Gijs Kruitbosch from comment #24) 
> Ping?

I don't know if you've had a peek at https://wiki.mozilla.org/ReleaseEngineering/BuildAPI/Setup_Local_Virtualenv_for_BuildAPI, but getting a staging env setup is not straightforward. I didn't get any takers when I polled releng last week, so I've been trying to get the env setup myself. I just made some db slices today that I should be able to use for testing, provided the rest of the env cooperates.

I also filed bug 937781 to get a proper staging environment setup for buildapi.
Depends on: 937781
Flags: needinfo?(coop)
(In reply to Chris Cooper [:coop] from comment #25)
> (In reply to :Gijs Kruitbosch from comment #24) 
> > Ping?
> 
> I don't know if you've had a peek at
> https://wiki.mozilla.org/ReleaseEngineering/BuildAPI/
> Setup_Local_Virtualenv_for_BuildAPI, but getting a staging env setup is not
> straightforward. I didn't get any takers when I polled releng last week, so
> I've been trying to get the env setup myself. I just made some db slices
> today that I should be able to use for testing, provided the rest of the env
> cooperates.

Oh, wow. That is crazy. Thank you for taking the time. Sorry if I'm naggy, but every time I've wanted to retrigger UX nightlies I go back and look at this bug and grumble. It's annoying not to be able to do anything about how/when these builds are created. :-(

> I also filed bug 937781 to get a proper staging environment setup for
> buildapi.

Awesome, I'll follow along there, too.
(In reply to :Gijs Kruitbosch from comment #26) 
> Oh, wow. That is crazy. Thank you for taking the time. Sorry if I'm naggy,
> but every time I've wanted to retrigger UX nightlies I go back and look at
> this bug and grumble. It's annoying not to be able to do anything about
> how/when these builds are created. :-(

No worries. You've found a real bug in the parsing and I'd love to get this fixed. It's proving harder than I imagined though.
Comment on attachment 810497 [details] [diff] [review]
self-serve nightly wildcard matching should be stricter

After getting rabbitmq setup locally, I can verify that the patch doesn't break buildapi, and does actually submit only new nightlies for the ux branch when used in that context.

I'd still like catlee to take a look since he has a broader understanding of buildapi and what else this patch might tickle.
Attachment #810497 - Flags: review?(catlee)
Attachment #810497 - Flags: feedback+
Attachment #810497 - Flags: review?(catlee) → review+
Comment on attachment 810497 [details] [diff] [review]
self-serve nightly wildcard matching should be stricter

Review of attachment 810497 [details] [diff] [review]:
-----------------------------------------------------------------

https://hg.mozilla.org/build/buildapi/rev/0b68a3d395b3
Attachment #810497 - Flags: checked-in+
Buildapi should pick up the code automatically within the next 30min.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Depends on: 975191
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 810497 [details] [diff] [review]
self-serve nightly wildcard matching should be stricter

Backed out in https://hg.mozilla.org/build/buildapi/rev/e7baa3e519f6 for bug 975191.
Attachment #810497 - Flags: checked-in+ → checked-in-
Attachment #810497 - Attachment is obsolete: true
Assignee: gijskruitbosch+bugs → nobody
Bug 1325371 may have made a difference here, but UX is dead now so it'll be hard to tell.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 6 years ago
Resolution: --- → WORKSFORME
Component: Tools → General