Closed Bug 405816 Opened 17 years ago Closed 16 years ago

try server occasionally resubmits old patches

Categories

(Release Engineering :: General, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

The latest set of builds on the try server (#152 (linux), #132 (windows), and #137 (mac)) could not find the diff that was supposedly uploaded. I'm not sure whether this is a bug in the web interface or whether the patches are getting deleted too soon.
Assignee: nobody → bhearsum
Status: NEW → ASSIGNED
Priority: P3 → P2
So, it turns out that _something_ happens on build.mozilla.org around/at midnight that causes the timestamps on these patches to be wrong.

Here's a timestamp from a build before midnight: 1199345804 - Thu, 03 Jan 2008 07:36:44 GMT
Here's a timestamp from a build just after midnight: 1196597865 - Sun, 02 Dec 2007 12:17:45 GMT

It turns out that the cron jobs on build.mozilla.org are deleting the patches as "old" before the Buildbot master gets to download them.
I don't think this is caused by the system clock. Here's a list of the 3 builds with weird timestamps:
(actual time of submission) (unix timestamp used) (timestamp in human readable format)
03:00:16 PST - 1196597865 - Sun, 02 Dec 2007 12:17:45 GMT
03:00:23 PST - 1196611045 - Sun, 02 Dec 2007 15:57:25 GMT
03:00:23 PST - 1196624466 - Sun, 02 Dec 2007 19:41:06 GMT
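
For reference, here is a minimal Python sketch for checking the numbers above: it converts each Unix timestamp to GMT and reports how far it lags behind the actual submission time. The submission date (Jan 3, 2008) and the fixed UTC-8 offset for PST are assumptions taken from the surrounding comments, and the helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PST is a fixed UTC-8 offset (no DST in early January).
PST = timezone(timedelta(hours=-8))


def to_gmt(ts):
    """Convert a Unix timestamp to a timezone-aware GMT datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def age_in_days(submitted, ts):
    """Days between the real submission time and the recorded timestamp."""
    return (submitted - to_gmt(ts)).total_seconds() / 86400


# (actual time of submission, unix timestamp used), from the list above.
# Assumption: all three were submitted on Jan 3, 2008.
SUBMISSIONS = [
    (datetime(2008, 1, 3, 3, 0, 16, tzinfo=PST), 1196597865),
    (datetime(2008, 1, 3, 3, 0, 23, tzinfo=PST), 1196611045),
    (datetime(2008, 1, 3, 3, 0, 23, tzinfo=PST), 1196624466),
]

if __name__ == "__main__":
    for submitted, ts in SUBMISSIONS:
        stamp = to_gmt(ts).strftime("%a, %d %b %Y %H:%M:%S GMT")
        print(f"{ts} - {stamp} - {age_in_days(submitted, ts):.1f} days old")
```

Under those assumptions, all three timestamps come out a little under 32 days older than the submission time.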

build.mozilla.org's cron.daily runs at 4:02am, so it looks like nothing there is the problem.

I wonder if it's something to do with multiple patches at once.
Can someone from IT take a look at build.m.o (dm-wwwbuild01) and see if anything funny/interesting happened with the clock lately? Particularly around 3am this morning (Jan 3).
Assignee: bhearsum → server-ops
Status: ASSIGNED → NEW
Component: Build & Release → Server Operations
QA Contact: build → justin
VMWare clock sync is not set up correctly on this box.  I need to shut it down and restart it to fix it.  When's a good time?
Assignee: server-ops → justdave
Anytime should be fine, can you catch me on IRC before you take it down to confirm, though?
OK, this is done.  I'm assuming this will probably fix the problems, if not, feel free to reopen.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Moving try server bugs to the new component.
Component: Server Operations → Try Server
Product: mozilla.org → Webtools
QA Contact: justin → try-server
Seeing this again. A build was submitted around 3am last night but got this timestamp: 1200225154 (Sun, 13 Jan 2008 11:52:34 GMT). Interestingly, this is almost exactly one month earlier than it should be.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I've seen the VMWare clock sync wobble occasionally, but usually only by 30 or 60 seconds and only when the machine is under high load.  I've never seen it be off by a month.

I just enabled the time-of-day server in xinetd on that box and set up nagios to watch it, just so we can make sure.  It'll page build if it's off by more than 60 seconds.
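
A rough sketch of what such a check boils down to, assuming the "time-of-day server" is the RFC 868 time service (a 4-byte big-endian count of seconds since 1900-01-01). The function names are hypothetical and this is not the actual nagios plugin; only the 60-second threshold comes from the comment above.

```python
import struct

# Seconds between the RFC 868 epoch (1900-01-01) and the Unix epoch (1970-01-01).
RFC868_EPOCH_OFFSET = 2208988800


def decode_time_reply(reply):
    """Turn a 4-byte RFC 868 reply into a Unix timestamp."""
    (seconds_since_1900,) = struct.unpack("!I", reply)
    return seconds_since_1900 - RFC868_EPOCH_OFFSET


def clock_is_off(remote_unix, local_unix, threshold=60):
    """True if the two clocks differ by more than `threshold` seconds."""
    return abs(remote_unix - local_unix) > threshold
```

A monitor along these lines would read the 4-byte reply from the box, decode it, and alert when `clock_is_off` returns True against a trusted clock.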

With it being off by almost exactly a month though, I've got to at least have a little suspicion of the program code that's generating the date...  is something setting a timestamp to match when it checked out or anything, and perhaps does some funny calculations?
I don't think it's something code related. The time is simply grabbed with 'time()' in Perl. If you're curious, the code is here: http://lxr.mozilla.org/mozilla/source/webtools/buildbot-try/sendchange.cgi.
It happened again last night, same bat time, same bat channel. Dave, can you have a look at sm-try-master, too? I wonder if it's something funny on it.
This happened again Tuesday at 3am. However, it didn't happen this morning at 3am.
OK, sm-try-master is a VM, it does have vmware tools installed on it, and the kernel command line is configured to do the host clock sync.  The host config is also set to supply the clock sync to the client OS.  So looks like that's a dead-end.

I have no further ideas at the moment.
This has happened pretty much every night for the past 2 weeks. I'm going to try to do some more investigation today. Is it possible for me to get access to the system logs on build.mozilla.org?
Assignee: justdave → bhearsum
Status: REOPENED → NEW
Status: NEW → ASSIGNED
I've got a new hypothesis about this:
Both sm-try-master and build.mozilla.org have cronjobs that clean up old patches after 30 days. I wonder if the job on sm-try-master runs slightly before the one on build.m.o, and then the master re-downloads the patches before build.m.o deletes them. I don't think the patches are vital to keep around (certainly not on both machines), so I'm going to get build.m.o to delete them after 29 days and see if that helps.
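
To make the hypothesis concrete, here is a small Python model of the suspected race. It is only a sketch: the function name, the parameters, and the example run times are made up for illustration, and the real cron jobs and sendchange logic are not shown in this bug.

```python
DAY = 86400  # seconds


def resubmitted(patch_time, master_cleanup, web_cleanup, master_days, web_days):
    """Model of the suspected race.

    master_cleanup and web_cleanup are the Unix times of each host's most
    recent cleanup run before the master next polls for patches.
    A patch is re-downloaded (and resubmitted) if the master has already
    forgotten it while the web host is still serving it.
    """
    deleted_on_master = master_cleanup - patch_time > master_days * DAY
    still_on_web = web_cleanup - patch_time <= web_days * DAY
    return deleted_on_master and still_on_web


# Hypothetical example: the patch is 30 days and 1 hour old when the master's
# cleanup runs, and the web host's last cleanup ran 23 hours before that.
patch_time = 0
master_cleanup = 30 * DAY + 3600
web_cleanup = master_cleanup - 23 * 3600

print(resubmitted(patch_time, master_cleanup, web_cleanup, 30, 30))  # both at 30 days
print(resubmitted(patch_time, master_cleanup, web_cleanup, 30, 29))  # web host at 29 days
```

In this model, with both hosts at 30 days the patch slips through the window. Once the web host's retention is shorter than the master's by at least the gap between the two cron runs, it does not.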
Summary: try server occasionally cannot find an uploaded diff → try server occasionally resubmits old patches
Happened again.

A while back I changed the way the tryserver identifies already submitted patches, and I think there may be some fallout from this. I'm going to schedule some try server downtime and manually clean out a bunch of old patches from the try server master. I'm also going to change build.m.o to clean up old patches (not builds) after 25 days, to put it far ahead of the master.
This did not happen last night. I bet bumping build.m.o to 25 day cleanup fixed it, but I'm still going to do a manual cleanup of patches just in case.
There were several hundred old patches dating as far back as the 5th of January. I've cleaned them all out, and with the adjusted cronjob on build.m.o I don't expect this to be a problem again.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago → 16 years ago
Resolution: --- → FIXED
Component: Try Server → Release Engineering
Product: Webtools → mozilla.org
QA Contact: try-server → release
Product: mozilla.org → Release Engineering