Closed Bug 449607 Opened 16 years ago Closed 13 years ago

change dated dirs on ftp.m.o to use new longer BuildID

Categories

(Release Engineering :: General, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: coop)

References

Details

(Whiteboard: [automation][ftp])

Attachments

(3 files, 2 obsolete files)

Currently, we use the shorter YYYYMMDDHH BuildID when creating dated dirs for nightly builds, e.g.:

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2008-07-07-02-mozilla-central

Now that we have a longer YYYYMMDDHHMMSS BuildID, we should use all of that in the dated dir name, e.g.:

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2008-07-07-02-30-59-mozilla-central

The reason for the longer BuildID, and the finer resolution in the name of the dated dir, is to avoid collisions.

We hit a problem earlier this summer where two nightly/clobber builds happened within an hour, and hence were both placed into the same dated directory. This caused problems for people who downloaded one and then got the updates from the other.

We recovered from that at the time by generating custom updates to un-jam the affected users, and having some users do a fresh install. However, the problem would have been completely avoided in the first place if we had used a more precise BuildID in the name of the dated dir.
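To illustrate the collision, here is a minimal Python sketch of the two naming schemes. The function names are hypothetical and this is not the code in transfer.py; it only assumes a 14-digit YYYYMMDDHHMMSS BuildID and the directory layout shown in the URLs above.

```python
from datetime import datetime


def short_dated_dir(build_id, branch):
    """Hour-resolution dir name (the current scheme). Hypothetical helper."""
    ts = datetime.strptime(build_id, "%Y%m%d%H%M%S")
    return ts.strftime("%Y-%m-%d-%H-") + branch


def long_dated_dir(build_id, branch):
    """Second-resolution dir name (the proposed scheme). Hypothetical helper."""
    ts = datetime.strptime(build_id, "%Y%m%d%H%M%S")
    return ts.strftime("%Y-%m-%d-%H-%M-%S-") + branch


# Two builds started within the same hour (the first BuildID is made up).
first = "20080707020500"
second = "20080707023059"

# The short scheme puts both builds in one directory...
same_short = short_dated_dir(first, "mozilla-central") == short_dated_dir(second, "mozilla-central")
# ...while the long scheme keeps them apart.
same_long = long_dated_dir(first, "mozilla-central") == long_dated_dir(second, "mozilla-central")
```

With these inputs, same_short is True and same_long is False, which is the difference this bug is asking for.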
Assignee: nobody → joduinn
Priority: -- → P3
This was originally posted in bug#431270, but then moved out to here as something that could be landed after the new improved BuildID changes were landed. This also needed explicit notifications on newsgroups, separate to the BuildID notifications.
Attachment #332757 - Flags: review?(bhearsum)
Comment on attachment 332757 [details] [diff] [review]
make transfer.py use longer BuildID as dirname on ftp.m.o

Please set review on this after it has been discussed (preferably in the newsgroups) and decided upon.
Attachment #332757 - Flags: review?(bhearsum)
While I still think this is something we should do, I've not had time to
look at this in months. Pushing to Future until I (or anyone else) gets time
for this.
Assignee: joduinn → nobody
Component: Release Engineering → Release Engineering: Future
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Not sure if this is similar to, or directly conflicts with bug 538540 -- probably depends on how you read them.
Whiteboard: [automation]
Whiteboard: [automation] → [automation][ftp]
Depends on: 431905
Found in triage.
Assignee: nobody → joduinn
Priority: P3 → P5
I vote for WONTFIX based on comments in bug 538540, and bug 584178.
I'm untangling these 4 overlapping bugs:

https://bugzilla.mozilla.org/show_bug.cgi?id=449607
https://bugzilla.mozilla.org/show_bug.cgi?id=487036
https://bugzilla.mozilla.org/show_bug.cgi?id=538540
https://bugzilla.mozilla.org/show_bug.cgi?id=584178


After some study, I believe these are about two interwoven but orthogonal issues:

1) improve regression hunting
=============================
There are a few ways to do this:
a) beltzner's tagging proposal.
b) look on ftp at the txt file containing the changeset, which RelEng already posts alongside the build.
c) create a website that reports from the buildbot db and presents that in a public UI.

2) organize builds on ftp.m.o so that each build is in a unique and consistent location
=======================================================================================
This means build systems can programmatically calculate locations of builds and their log files, which is needed to remove a dependency on TinderboxServer. This needs to handle rebuilds of the same changeset (like for respins of nightlies). Using changesets in the dirname is unique per changeset, but does not sort chronologically, and does not handle respins. However, using the full BuildID does handle both these use cases.
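A rough Python sketch of the sorting and respin argument follows. The build list is invented for illustration (the two changeset hashes are borrowed from the pushlog example in this comment, the BuildIDs are made up), and dirname_from_build_id is a hypothetical helper, not existing RelEng code.

```python
def dirname_from_build_id(build_id, branch):
    """Split a 14-digit BuildID into YYYY-MM-DD-HH-MM-SS-branch."""
    parts = (build_id[0:4], build_id[4:6], build_id[6:8],
             build_id[8:10], build_id[10:12], build_id[12:14])
    return "-".join(parts + (branch,))


# (BuildID, changeset) pairs, deliberately out of order; values are made up.
builds = [
    ("20101215031500", "f11f7ed625ba"),
    ("20101214030000", "66036625795f"),
    ("20101214094500", "66036625795f"),  # respin of the same changeset
]

# Plain string sorting of BuildID-based names is already chronological,
# and every build, including the respin, gets its own directory.
by_build_id = sorted(dirname_from_build_id(b, "mozilla-central") for b, _ in builds)

# Changeset-based names collapse the respin into one entry, and sorting
# the hashes says nothing about build order.
by_changeset = sorted({changeset for _, changeset in builds})
```

Here by_build_id has three entries in build order, while by_changeset has only two.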



I propose the following:

* morph https://bugzilla.mozilla.org/show_bug.cgi?id=487036 into "write tool to read buildbot db for BuildID+changesets of nightlies, and then construct URL to feed to hg pushlog". The buildbot status db contains a list of changesets and buildids. RelEng would write a dashboard which queries the db and generates a URL to send to hgpushlog. For example, this URL shows all changes between Tues nightly (early morning of 14dec) and Wed nightly (early morning of 15dec): http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=66036625795f&tochange=f11f7ed625ba

* FIX https://bugzilla.mozilla.org/show_bug.cgi?id=449607 "change dated dirs on ftp.m.o to use new longer BuildID", using the full BuildID as outlined above.

* WONTFIX https://bugzilla.mozilla.org/show_bug.cgi?id=538540 "stop putting hour number in nightly directories" because of bug#449607

* WONTFIX https://bugzilla.mozilla.org/show_bug.cgi?id=584178 "list hourly tinderbox builds by changeset on ftp.mozilla.org" because of bug#449607
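As a sketch of the URL construction in the first item above: given an ordered list of nightlies with their changesets, a tool could pair up neighbours and build one pushlog link per pair. The function names and the shape of the input list are assumptions for illustration; only the pushloghtml URL and its fromchange/tochange parameters come from the example above, and nothing here reads the real buildbot db.

```python
from urllib.parse import urlencode

PUSHLOG_BASE = "http://hg.mozilla.org/mozilla-central/pushloghtml"


def pushlog_url(from_change, to_change, base=PUSHLOG_BASE):
    """Build a pushlog URL covering the changes between two changesets."""
    query = urlencode({"fromchange": from_change, "tochange": to_change})
    return base + "?" + query


def nightly_ranges(nightlies):
    """Yield (older_build_id, newer_build_id, url) for consecutive nightlies.

    `nightlies` is a list of (build_id, changeset) tuples, oldest first.
    """
    for (old_id, old_cs), (new_id, new_cs) in zip(nightlies, nightlies[1:]):
        yield old_id, new_id, pushlog_url(old_cs, new_cs)


# Hypothetical rows as they might come out of the buildbot status db.
rows = [
    ("20101214030000", "66036625795f"),
    ("20101215030000", "f11f7ed625ba"),
]
ranges = list(nightly_ranges(rows))
```

For the two rows above, the single generated URL is the same one quoted in the first item.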


If I've missed anything in all this detangling, please let me know.
See Also: → 487036, 538540, 584178
(In reply to comment #8)
...
> * FIX https://bugzilla.mozilla.org/show_bug.cgi?id=449607 "change dated dirs on
> ftp.m.o to use new longer BuildID", using the full BuildID as outlined above.

Refreshed patch coming.
Attachment #501761 - Attachment is patch: true
The solution in comment 8 is fine with me. John and I talked via Skype, and I set out the core priorities which were to:

 - make it possible to map from nightly range to changes within that range
 - make it possible to enter a date range and get a list of nightlies (and links to the sets of changes in that nightly)

For the purposes of testing.

Added bonus marks for having some indication in the normal HTMLPushLog display that indicates "2010 01 10 NIGHTLY GENERATED HERE" so that appears in the normal flow of things, too.

I guess I don't care how you do it all, only that you do.
The war-on-orange database scrapes buildbot logs from the FTP site.  This change will require changes to that scraper, and will require that they go live at about the same time, in order to avoid losing any data.  For this reason, can we get 24-48 hours advance notice before this change goes live?
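One way a scraper could ride out the switchover is to accept both directory formats for a while. The following Python sketch is only an illustration of that idea, not the war-on-orange code; the regex, the function name, and the assumption that branch names start with a letter are all mine.

```python
import re

# Matches both YYYY-MM-DD-HH-branch and YYYY-MM-DD-HH-MM-SS-branch.
# Assumes (hypothetically) that branch names begin with a letter, so the
# optional minute/second fields cannot be confused with the branch.
DATED_DIR_RE = re.compile(
    r"^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})-(?P<hour>\d{2})"
    r"(?:-(?P<minute>\d{2})-(?P<second>\d{2}))?"
    r"-(?P<branch>[A-Za-z].*)$"
)


def parse_dated_dir(name):
    """Return the parsed fields as a dict, or None if the name does not match.

    For old-style names, 'minute' and 'second' are None.
    """
    match = DATED_DIR_RE.match(name)
    return match.groupdict() if match else None


old_style = parse_dated_dir("2008-07-07-02-mozilla-central")
new_style = parse_dated_dir("2008-07-07-02-30-59-mozilla-central")
```

Accepting both forms would mean the scraper change does not have to land at exactly the same moment as the directory change, though advance notice is still the safer plan.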
(In reply to comment #12)
> The war-on-orange database scrapes buildbot logs from the FTP site.  This
> change will require changes to that scraper, and will require that they go live
> at about the same time, in order to avoid losing any data.  For this reason,
> can we get 24-48 hours advance notice before this change goes live?

Yes, of course. We're still at the planning stage right now, but follow this bug to stay in the loop.
Haven't gotten to this in ages - flagging for triage after talking with coop.
Whiteboard: [automation][ftp] → [automation][ftp][triagefollowup]
Per discussion with joduinn, I'm taking this bug, but I still want to talk about it in triage next week to figure out a plan.
Assignee: joduinn → coop
Whiteboard: [automation][ftp][triagefollowup] → [automation][ftp]
I'd like to make this change next Thursday, June 9th unless someone raises an objection.

jgriffin: does that give you enough lead time to change your scripts?
(In reply to comment #17)
> I'd like to make this change next Thursday, June 9th unless someone raises an
> objection.

I'm going to hold off on this until next week to avoid colliding with b5. New target will be Tuesday, June 14th.
(In reply to comment #18) 
> I'm going to hold off on this until next week to avoid colliding with b5.
> New target will be Tuesday, June 14th.

...and of course now we're going to beta for 3.6.18 tomorrow. We'll play it by ear, but this may get pushed out until Thursday.
talked w/coop earlier today - a downtime would be safest just in case anything goes not-to-plan.
Flags: needs-treeclosure?
Downtime is scheduled for Wednesday this week after the 5.0 release. Exact time TBD.
Comment on attachment 501761 [details] [diff] [review]
refreshed, as previous version was > 2 years old, when we were still in cvs!

a) this patch has no review
b) i don't think we use MozillaStageUpload anymore
c) we probably need a patch to change LONG_DATED_URL_PATH in post_upload.py, and that patch needs review.
d) this bug is not ready for downtime.
(In reply to comment #22)
> c) we probably need a patch to change LONG_DATED_URL_PATH in post_upload.py,
> and that patch needs review.

*LONG_DATED_URL_PATH and LONG_DATED_DIR
Removing the needs-downtime flag as this bug does not appear to be ready for deployment early Wednesday morning.  Please renominate this bug when there is a reviewed patch.

Also, could a patch for this bug be landed during a non-tree-closing maintenance window?  I ask because, as far as I can tell, it only touches nightly directories.
Flags: needs-treeclosure?
(In reply to comment #24)
> Removing the needs-downtime flag as this bug does not appear to be ready for
> deployment early Wednesday morning.  Please renominate this bug when there
> is a reviewed patch.

Yeah, doubtful. I thought this bug was ready to go when I inherited it. If Aki's list in comment #22 is right (or even close to right), there's a lot more investigation that needs to happen before I can try to deploy this.

> Also, could a patch for this bug be landed during a non-tree-closing
> maintenance window?  I ask because, as far as I can tell, it only touches
> nightly directories.

Probably not. There are other groups who rely on the current directory structure that need a downtime window to update their scripts.
This seems to get the job done. I'm testing this in staging right now along with a patch for bug 517428.
Attachment #501761 - Attachment is obsolete: true
Attachment #540669 - Flags: review?(aki)
Comment on attachment 540669 [details] [diff] [review]
Add minutes and seconds to LONG_DATED_DIR and SHORT_DATED_DIR

I think you also need to update LONG_DATED_URL_PATH, for the property to be set correctly for sendchanges and emails (not sure exactly what uses this).

Bonus points for also updating the commented out LONG_DATED_URL_PATH for staging-stage.

r=me with those updated.

(Lines 19 and 27 need updating.)
Attachment #540669 - Flags: review?(aki) → review+
Flags: needs-treeclosure?
Comment on attachment 540669 [details] [diff] [review]
Add minutes and seconds to LONG_DATED_DIR and SHORT_DATED_DIR

(In reply to comment #27) 
> I think you also need to update LONG_DATED_URL_PATH, for the property to be
> set correctly for sendchanges and emails (not sure exactly what uses this).
> 
> Bonus points for also updating the commented out LONG_DATED_URL_PATH for
> staging-stage.

Done.

http://hg.mozilla.org/build/tools/rev/a0a4efc1bf7d

Waiting for downtime to deploy the change to stage.mozilla.org.
Attachment #540669 - Flags: checked-in+
Let's do this during today's downtime.
Flags: needs-treeclosure? → needs-treeclosure+
tools checkout is updated on stage. Removing tree-closure flag.
Flags: needs-treeclosure+
Blocks: 666268
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Does this bug now require users to download/install nightlies because installations prior to this changeover cannot "handle" this gracefully? Thanks.
(In reply to comment #31)
> Does this bug now require users to download/install nightlies because
> installations prior to this changeover cannot "handle" this gracefully?
> Thanks.

I mean just for one time until our installations have the necessary changes in place on our systems.
Updates are failing on the new path. Investigating.
Blocks: 666581
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Also adds fix for MozillaStageUpload, JIC.
Attachment #541379 - Flags: review?(armenzg)
(In reply to comment #32) 
> I mean just for one time until our installations have the necessary changes
> in place on our systems.

It shouldn't. The code that generates the update snippets needs updating (attachment 541379 [details] [diff] [review]), after which updates should just work.

I'll be updating the current batch of broken snippets as well.
Attachment #541379 - Flags: review?(armenzg) → review+
FYI-I was able to auto-update my Aurora to the 6/23/11 nightly build.
Reconfig is still chugging along on the test masters, but the build masters all have the updated snippet code now.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to comment #37)
> Reconfig is still chugging along on the test masters, but the build masters
> all have the updated snippet code now.

Thanks for your hard work. I was able to auto-update my Nightly nightly on another system of mine so that seems to be OK, too.

(The only thing I am hearing is that the Win64 Nightlies are not auto-updating, but I don't know how they fit in with this.)
> (The only thing I am hearing is that the Win64 Nightlies are not
> auto-updating, but I don't know how they fit in with this.)

Correct. The problem with win64 is that we don't have stable build platforms for it yet.
Of course Android uses a different script for snippet generation. Patch incoming.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Luckily it's a tools patch so I won't have to reconfig the world for this one.
Attachment #541678 - Flags: review?(armenzg)
Comment on attachment 541678 [details] [diff] [review]
Add minutes and seconds to dated dir for Android snippets.

*stamp*
Attachment #541678 - Flags: review?(armenzg) → review+
Comment on attachment 541678 [details] [diff] [review]
Add minutes and seconds to dated dir for Android snippets.

http://hg.mozilla.org/build/tools/rev/ab3da3255cff
Attachment #541678 - Flags: checked-in+
(In reply to comment #39)
> Correct. The problem with win64 is that we don't have stable build platforms
> for it yet.

Nightly here is still stuck at Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:7.0a1) Gecko/20110621 Firefox/7.0a1 ID:20110621030205

Should I just forget about this being fixed for win64 and download a new Nightly?
(In reply to comment #44)
> (In reply to comment #39)
> > Correct. The problem with win64 is that we don't have stable build platforms
> > for it yet.
> 
> Nightly here is still stuck at Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> rv:7.0a1) Gecko/20110621 Firefox/7.0a1 ID:20110621030205
> 
> Should I just forget about this being fixed for win64 and download a new
> Nightly?

Yes.
Blocks: 666945
(In reply to comment #44)
> (In reply to comment #39)
> > Correct. The problem with win64 is that we don't have stable build platforms
> > for it yet.
> 
> Nightly here is still stuck at Mozilla/5.0 (Windows NT 6.1; Win64; x64;
> rv:7.0a1) Gecko/20110621 Firefox/7.0a1 ID:20110621030205
> 
> Should I just forget about this being fixed for win64 and download a new
> Nightly?

tl;dr: you should be able to update now.

These nightlies are being run on a staging buildbot master, so they were missed in the initial pass on reconfigs. 

I've updated the snippets and the master is being reconfig-ed to pick up the change.
(In reply to comment #46)
> I've updated the snippets and the master is being reconfig-ed to pick up the
> change.

Thanks! Nightly just updated to Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:7.0a1) Gecko/20110623 Firefox/7.0a1 ID:20110623030205
I think we got all the stragglers. Didn't need to update any snippets by hand/script this morning.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 668726
I've filed bug#671450 and bug#671455 for followup work.
Blocks: 671450, 671455
Product: mozilla.org → Release Engineering