Closed
Bug 758114
Opened 12 years ago
Closed 12 years ago
tbpl-dev site not updating
Categories
(Developer Services :: General, task)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: philor, Assigned: fox2mike)
bug 756723 - it was stuck from April 27th to 23:00 May 20th.
bug 757875 - got stuck again 12 hours later, unstuck and "ran one by hand" 10:00ish May 23rd
this bug - https://tbpl-dev.allizom.org/cache/revision_info.txt says it last ran 10:08 May 23rd, so that one by hand was apparently the only one that managed to run (and with the day hg.m.o had today, it may well have hit hg bustage on the very next automatic run)

Is it running with a really old version of hg, that doesn't manage to recover pulling from a dead or busted hg.m.o? I pull a lot over a terrible connection during times when hg.m.o is busted (because someone said it was, and I'm pulling to see if they're right), and I've never had a pull hang for 12 hours, much less 23 days.
Comment 1 (Assignee) • 12 years ago
I'll look into this and make sure it's fixed for good. No idea why it hangs so often :)
Assignee: server-ops-devservices → shyam
Comment 2 (Assignee) • 12 years ago
so...

May 23 22:00:01 genericadm CROND[1187]: (root) CMD (/usr/bin/flock -w 10 /var/lock/tbpl-dev-update /data/genericrhel6-dev/src/tbpl-dev.allizom.org/update > /dev/null 2>&1)
May 23 22:15:01 genericadm CROND[4065]: (root) CMD (/usr/bin/flock -w 10 /var/lock/tbpl-dev-update /data/genericrhel6-dev/src/tbpl-dev.allizom.org/update > /dev/null 2>&1)
May 23 22:30:01 genericadm CROND[5139]: (root) CMD (/usr/bin/flock -w 10 /var/lock/tbpl-dev-update /data/genericrhel6-dev/src/tbpl-dev.allizom.org/update > /dev/null 2>&1)
May 23 22:45:02 genericadm CROND[8144]: (root) CMD (/usr/bin/flock -w 10 /var/lock/tbpl-dev-update /data/genericrhel6-dev/src/tbpl-dev.allizom.org/update > /dev/null 2>&1)
May 23 23:00:02 genericadm CROND[9260]: (root) CMD (/usr/bin/flock -w 10 /var/lock/tbpl-dev-update /data/genericrhel6-dev/src/tbpl-dev.allizom.org/update > /dev/null 2>&1)

It's been running fine... The date only gets printed if there's an update to tbpl. Is there an update after:

+ hg tip
changeset: 788:e5d599c52396
tag: tip
user: Phil Ringnalda <philringnalda@gmail.com>
date: Wed May 16 22:18:41 2012 -0700
summary: Bug 756724 - Drop Firefox3.6 tree

? If not, then there's nothing to worry about, it's running fine.

[shyam@genericadm.private.phx1 ~]$ hg --version
Mercurial Distributed SCM (version 2.2)

That's also pretty up to date. So I'm not sure if there's a problem here.
Comment 3 (Reporter) • 12 years ago
There has been, http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/276c25fd49e8 was pushed at 10:11, and it's not running on tbpl-dev (though I forgot to check that before). And while my memory does frequently lie to me, I would have sworn that the last time I looked while things were working, the second line of https://tbpl-dev.allizom.org/cache/revision_info.txt showed a new time after each run, not just after each pull that got something.
Comment 4 • 12 years ago
I encountered this issue first-hand while completing bug 730667. From a purely empirical perspective, it's clear that dev updates are _not_ successfully getting pushed out every 15 minutes as expected. There are logs which are generated on the admin server every five minutes, but it takes totally random amounts of time for them to show up as web-facing...
Comment 5 (Reporter) • 12 years ago
Pushed http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/9dda434d1e40 so we could get back to where even with something to pull, it's not updating.
Comment 6 • 12 years ago
The cron in comment #2 should exit with code 1 if there is a hung process holding the lock. Is there cron mail to confirm that ?
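
For anyone following along, here is a minimal sketch of the locking behavior in question, assuming the usual `flock -w 10` semantics: wait up to 10 seconds for the lock, then give up with status 1 without running the command. It uses Python's `fcntl.flock` rather than the `flock(1)` binary the cron job actually calls, and the function name, lock file name, and polling interval are all made up for illustration.

```python
import fcntl
import os
import tempfile
import time


def run_with_lock(lock_path, job, wait_seconds=10.0, poll_interval=0.1):
    """Run job() while holding an exclusive lock on lock_path.

    Returns job()'s return value, or 1 if the lock could not be acquired
    within wait_seconds (the status flock -w reports by default when
    another process still holds the lock).
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        deadline = time.monotonic() + wait_seconds
        while True:
            try:
                # Non-blocking attempt; raises BlockingIOError if held elsewhere.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    return 1  # a hung earlier run still holds the lock
                time.sleep(poll_interval)
        return job()
    finally:
        # Closing the descriptor also releases the lock if we held it.
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical lock file, not the real /var/lock/tbpl-dev-update.
    lock_file = os.path.join(tempfile.gettempdir(), "tbpl-dev-update.example.lock")
    status = run_with_lock(lock_file, lambda: 0, wait_seconds=1.0)
    print("exit status:", status)
```

If an earlier run were hung while holding the lock, every later run would return 1 after the wait, and with all output sent to /dev/null nothing would show up except the CROND start lines.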
Comment 7 • 12 years ago
(In reply to Nick Thomas [:nthomas] from comment #6)
> The cron in comment #2 should exit with code 1 if there is a hung process
> holding the lock. Is there cron mail to confirm that ?

The very first non-commented line in the cron file for tbpl-dev is as follows:

MAILTO="cron-tbpl@mozilla.com"

That said, the output from the "tbpl-dev-update" job is redirected to /dev/null (including stderr), which would explain why that command doesn't result in an email. I can change that for you, if you want - let me know.
Comment 8 • 12 years ago
Who will get the cron-tbpl@ emails?
Comment 9 (Assignee) • 12 years ago
(In reply to Ed Morley [:edmorley] from comment #8)
> Who will get the cron-tbpl@ emails?

Seems like right now only laura and rhelmer get them. Who all would like to be added?
Comment 10 • 12 years ago
(In reply to Daniel Maher [:phrawzty] from comment #7)
> (In reply to Nick Thomas [:nthomas] from comment #6)
> > The cron in comment #2 should exit with code 1 if there is a hung process
> > holding the lock. Is there cron mail to confirm that ?
>
> The very first non-commented line in the cron file for tbpl-dev is as
> follows :
>
> MAILTO="cron-tbpl@mozilla.com"
>
> That said, the output from the "tbpl-dev-update" job is redirected to
> /dev/null (include stderr), which would explain why that command doesn't
> result in an email. I can change that for you, if you want - let me know.

Yes we should do this - ideally the script shouldn't output anything on success, only on failure so we can see what's going on.
Comment 11 • 12 years ago
(In reply to Robert Helmer [:rhelmer] from comment #10)
> Yes we should do this - ideally the script shouldn't output anything on
> success, only on failure so we can see what's going on.

I have adjusted the cron job such that stdout continues to be suppressed, but stderr will output as normal.
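
A rough sketch of what that change means in practice, assuming the standard cron behavior of mailing MAILTO only when a job produces output. This is Python standing in for the shell redirection, not the actual crontab entry; `run_like_cron`, `would_send_mail`, and the sample messages are invented for the example.

```python
import subprocess
import sys


def run_like_cron(argv):
    """Run argv with stdout discarded and stderr captured.

    Returns (exit_code, stderr_text). With stdout sent to /dev/null,
    stderr_text is the only output left for cron to mail.
    """
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.returncode, result.stderr


def would_send_mail(stderr_text):
    """cron sends mail only when the job produced some output."""
    return bool(stderr_text)


if __name__ == "__main__":
    # Hypothetical jobs: one chatty but successful, one failing on stderr.
    chatty_ok = [sys.executable, "-c", "print('pulled some changesets')"]
    failing = [
        sys.executable,
        "-c",
        "import sys; print('pull failed', file=sys.stderr); sys.exit(1)",
    ]
    for argv in (chatty_ok, failing):
        code, err = run_like_cron(argv)
        print("exit", code, "mail sent:", would_send_mail(err))
```

The successful run stays silent because its only output went to stdout, while the failing run leaves something on stderr for cron-tbpl@ to receive.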
Comment 12 (Reporter) • 12 years ago
Not sure what to make of the deploy schedule as reflected by the timestamps in https://tbpl-dev.allizom.org/cache/, but at least it is unstuck, pulling and deploying.
Comment 13 (Assignee) • 12 years ago
(In reply to Phil Ringnalda (:philor) from comment #12)
> Not sure what to make of the deploy schedule as reflected by the timestamps
> in https://tbpl-dev.allizom.org/cache/, but at least it is unstuck, pulling
> and deploying.

Can't see the URL coz it's behind auth. Feel free to close this out if it's fixed :)
Comment 14 • 12 years ago
(In reply to Phil Ringnalda (:philor) from comment #12)
> Not sure what to make of the deploy schedule as reflected by the timestamps
> in https://tbpl-dev.allizom.org/cache/, but at least it is unstuck, pulling
> and deploying.

We actually discussed those timestamps on IRC last week, documented in bug 730677: https://bugzilla.mozilla.org/show_bug.cgi?id=730677#c6

The epoch stamp in the filename is the source of truth, not the mtime as reported by Apache - as you can see, the epoch stamps are occurring on a regular, predictable basis now, which would seem to indicate that the updates are no longer hanging.
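
To make the "epoch stamp in the filename" check concrete, here is a small sketch of how one might read those stamps and look at the spacing between runs. The exact filename layout in cache/ is not shown in this bug, so the pattern below (a standalone 10-digit Unix timestamp somewhere in the name) and the sample names in the usage lines are assumptions, as are the function names.

```python
import re
from datetime import datetime, timezone

# Assumption: the stamp is a standalone 10-digit Unix timestamp in the name.
EPOCH_IN_NAME = re.compile(r"(?<!\d)(\d{10})(?!\d)")


def run_time_from_filename(filename):
    """Return the UTC datetime encoded in filename, or None if there is none."""
    match = EPOCH_IN_NAME.search(filename)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


def gaps_in_minutes(filenames):
    """Minutes between consecutive runs, based only on the filename stamps."""
    times = sorted(
        t for t in map(run_time_from_filename, filenames) if t is not None
    )
    return [
        (later - earlier).total_seconds() / 60
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    # Made-up names; only the embedded timestamps matter here.
    listing = ["run-1337731200.json", "run-1337732100.json", "run-1337733000.json"]
    for name in listing:
        print(name, run_time_from_filename(name))
    print("gaps (minutes):", gaps_in_minutes(listing))
```

Evenly spaced gaps would support the "updates are no longer hanging" reading, independent of whatever mtime the webserver reports.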
Comment 15 (Reporter) • 12 years ago
The stamp in the filename is one source of truth, truth about when import-buildbot-data.py ran. Apache's mtime is a, or at least my only available, source of truth for when changes got synced to whatever webserver I'm hitting at the moment. It looks to me like those get updated between 5 and 30 minutes after the `update` script runs, depending on what server (sometimes when I refresh cache/, I get a list that's consistently :05 :20 :35 :50, other times it's variable like :50, :21, :56, :05, :31, :45, :15, :25, :35). So probably this was not at all about the update script being stuck, but about whatever drives syncing, but if you're okay with that sync having magically restarted and being rather variable, we can certainly learn that tbpl-dev actually updates 45 minutes after a push.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Summary: tbpl-dev update script stuck → tbpl-dev site not updating
Updated • 10 years ago
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services